Selecting a Model & Schema
After deciding to use JATS, a project or organization is faced with many decisions:
- Which Tag Set?
- Which options within that Tag Set?
- What expression language or version of the model to use?
In this essay we describe these choices and help you make those decisions.
There are three JATS Tag Sets:
- Journal Archiving and Interchange is the most permissive of the Tag Sets. The Archiving and Interchange Tag Set defines elements
and attributes that describe the content and metadata of journal articles, including research and non-research articles, letters,
editorials, and book and product reviews. The Tag Set allows for descriptions of the full article content or just the article
header metadata. It also allows for preservation of the sequence of content and generated text. Read more about the intent
of this Tag Set at: https://jats.nlm.nih.gov/archiving/
- The Journal Publishing Tag Set is a moderately prescriptive set, optimized for
archives that wish to regularize and control their content, rather than accept the sequence and arrangement presented to them
by any particular publisher. Publishing is also intended for use by publishers for the initial XML tagging of journal material,
usually as converted from an authoring form like Microsoft Word. Read more about the intent of this Tag Set at: https://jats.nlm.nih.gov/publishing/
- Article Authoring is the most prescriptive of the Tag Sets. The Article Authoring Tag Set is optimized for authorship of new
journal articles, where regularization and control of content is important, and where it is useful rather than harmful to
have only one way to tag a structure. This Tag Set is more prescriptive than descriptive and includes many elements whose
content must occur in a specified order. Read more about the intent of this Tag Set at: https://jats.nlm.nih.gov/articleauthoring/
In consultation with the partners with which you want to interchange JATS documents, you should select the model that best
meets your needs.
The Archiving and Publishing Tag Sets provide
options for table modeling; they are available with either:
- a table model based on the XHTML table model, or
- both the XHTML and OASIS/CALS table models.
The Authoring Tag Set allows only the XHTML table model.
Many JATS users prefer the XHTML-based table model because it is easy to use in web-based and other electronic publications.
Some users prefer the OASIS/CALS table model because they have tools that require this model or because they believe that
it is easier to format complex print tables using it.
All three Tag Sets provide two options for math modeling. Each Tag Set is available with either:
- MathML version 2.0, or
- MathML version 3.0.
If you do not have existing MathML content, we recommend starting with version 3.0. If you are not using MathML at all, we
recommend leaving the modules in and ignoring (never referencing) them.
We provide versions of the Tag Sets with MathML 2.0 for backwards compatibility. Some users of
MathML 2.0 find that their MathML 2.0 is also valid MathML 3.0, and they can move
to MathML 3.0 with little effort. However, MathML 3.0 enforces rules that were not enforced in the version of the MathML
2.0 DTD used in JATS. The rules were in the textual documentation for MathML 2.0 and in some
versions of the MathML 2.0 DTD, but some users may find that their existing documents are not
forwards-compatible with MathML 3.0.
Note: MathML 2.0 that is not valid against MathML 3.0 may render incorrectly on display.
Note: It is likely that JATS will move to MathML 3.0, and
stop supporting MathML 2.0 in a future version.
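The Tag Set, table, and math options described so far can be combined as in the following minimal sketch. The names `TAG_SETS`, `TABLE_MODELS`, `MATHML_VERSIONS`, `is_valid`, and `valid_combinations` are hypothetical and are not part of JATS. The sketch encodes only the constraints stated in this essay. It does not say whether every combination is distributed as a separate schema file.

```python
from itertools import product

# Option labels are illustrative; they are not official JATS identifiers.
TAG_SETS = ("Archiving", "Publishing", "Authoring")
TABLE_MODELS = ("XHTML", "XHTML+OASIS/CALS")
MATHML_VERSIONS = ("2.0", "3.0")


def is_valid(tag_set: str, tables: str, mathml: str) -> bool:
    """Check one combination against the rules stated in this essay."""
    if tag_set not in TAG_SETS or tables not in TABLE_MODELS:
        return False
    if mathml not in MATHML_VERSIONS:
        return False
    # The Authoring Tag Set allows only the XHTML table model.
    if tag_set == "Authoring" and tables != "XHTML":
        return False
    return True


def valid_combinations() -> list:
    """List every (tag set, table model, MathML version) that passes is_valid."""
    all_combos = product(TAG_SETS, TABLE_MODELS, MATHML_VERSIONS)
    return [combo for combo in all_combos if is_valid(*combo)]


if __name__ == "__main__":
    for combo in valid_combinations():
        print(" / ".join(combo))
```

Writing the constraints down this way makes it easy to confirm with interchange partners that a proposed combination is one the Tag Sets actually allow.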
XML Tag Sets may be expressed in schema languages (also called “Constraint Languages”). These schemas are used by XML software
to enforce the rules of the language and to guide authoring applications. The formal rules of JATS are expressed in the prose
of ANSI/NISO Z39.96-2012. In addition, for the convenience of users, schemas for several options of each of the Tag Sets are
provided by the NLM.
- DTDs, or Document Type Definitions, are the oldest of the XML modeling languages. DTDs are used in many environments that work
primarily with textual documents. DTDs are defined in the XML Specification.
- XSD, or W3C XML Schema, is a modeling language specified by the W3C. Among the strengths of XSD are strong datatyping, context-based
models, and the fact that XSD documents are in XML document syntax.
- RNG, or RELAX NG, is a modeling language for XML documents that enables strong datatyping, a wide variety of modeling
types, and both a compact and an XML document syntax.
Users usually choose the modeling language that best suits the tools they are using. It is not unusual for an organization to
use one modeling language for authoring or for receiving documents from partners and another in its database
environment.
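When schema files arrive from several sources, it can help to check which modeling language a given file uses. The following is a rough sketch using only the Python standard library. The function name `guess_schema_language` and its return labels are hypothetical. It relies on XSD and the XML syntax of RELAX NG being well-formed XML with well-known namespaces, while DTDs and the RELAX NG compact syntax are not XML documents. It does not validate anything against JATS.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the W3C XML Schema and RELAX NG specifications.
XSD_NS = "http://www.w3.org/2001/XMLSchema"
RNG_NS = "http://relaxng.org/ns/structure/1.0"


def guess_schema_language(text: str) -> str:
    """Guess the modeling language of a schema from its text content.

    Returns "XSD", "RNG", "non-XML" (a DTD or RELAX NG compact syntax,
    for example), or "unknown" for XML in some other vocabulary.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # DTDs and RELAX NG compact syntax are not well-formed XML.
        return "non-XML"
    if root.tag == "{%s}schema" % XSD_NS:
        return "XSD"
    if root.tag.startswith("{%s}" % RNG_NS):
        return "RNG"
    return "unknown"
```

A file reported as "non-XML" still needs a second look, for example at its file extension, to tell a DTD from RELAX NG compact syntax.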
At each level of the tree below, select only one option. When you get to the end node, you will see the name of the appropriate
model for you and a link to that model on the NCBI FTP site:
Archiving
- XHTML-based tables only
- OASIS/CALS and XHTML-based tables
Publishing
- XHTML-based tables only
- OASIS/CALS and XHTML-based tables