Introduction to Archiving Tag Set
The intent of JATS is to provide a common format in which publishers and archives can exchange journal content. The Suite provides a set of XML schema modules that define elements and attributes for describing the textual and graphical content of journal articles as well as some non-article material such as letters, editorials, and book and product reviews.
The Archiving and Interchange Tag Set defines elements and attributes that describe the content and metadata of a journal article, including research articles; subject-review articles; non-research articles; letters; editorials; book, software, and product reviews; and peer reviews and author responses included with an article. The Tag Set allows for descriptions of the full article content or just the article header metadata.
The intent of the Archiving Tag Set is to provide a standardized format in which to preserve the intellectual content of previously published journal articles, capture structural and semantic components, and provide a single format into which content from many providers can be translated easily, with minimal loss.
This focus on being a conversion target for multiple sources has made this Tag Set a large and inclusive one. The Tag Set includes many loose structures—including some with nearly all content structures optional—and many elements that were created explicitly to avoid discarding information tagged by users when the material is converted into this Tag Set from another format. Most of the attribute values in the Tag Set are character data values, accommodating any source value. Care has also been taken to provide several mechanisms (for example, user-defined name/value pairs in the metadata and information-classing attributes for many structural elements) to preserve the intellectual content of a document structure when that structure is converted from another tag set or schema to this one and there is no exact element equivalent of the structural or semantic element.
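As a minimal sketch of the two preservation mechanisms mentioned above, the following Python snippet parses a small hypothetical fragment and reads back a user-defined name/value pair and an information-classing attribute. The element and attribute names (custom-meta-group, custom-meta, meta-name, meta-value, content-type) are used here as they are commonly understood in JATS; the values, such as "source-element" and "legacy-pull-quote", are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: all values are placeholders, not taken from a real article.
FRAGMENT = """\
<article>
  <front>
    <article-meta>
      <custom-meta-group>
        <custom-meta>
          <meta-name>source-element</meta-name>
          <meta-value>legacy-funding-note</meta-value>
        </custom-meta>
      </custom-meta-group>
    </article-meta>
  </front>
  <body>
    <p content-type="legacy-pull-quote">Placeholder text.</p>
  </body>
</article>
"""


def custom_meta_pairs(root):
    """Collect every meta-name/meta-value pair found under the given element."""
    pairs = {}
    for item in root.iter("custom-meta"):
        name = item.findtext("meta-name", default="")
        value = item.findtext("meta-value", default="")
        pairs[name] = value
    return pairs


root = ET.fromstring(FRAGMENT)
pairs = custom_meta_pairs(root)

# Classing attributes carry free-text values, so any source label can be kept.
classes = [p.get("content-type") for p in root.iter("p")]
```

In this sketch, information that has no exact element equivalent survives conversion either as a name/value pair in the metadata or as an attribute value on an ordinary structural element.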
Although the presentation order (reading order) of a published journal article cannot always be preserved, particularly within the metadata, the Archiving Tag Set is the most flexible of the NISO JATS Tag Sets, allowing observed content to be preserved without resorting to stylesheets or generated text. For that reason, the labels, numbers, and symbols of tables, figures, sidebars, and the like can be recorded as elements, as can the punctuation and spaces inside bibliographic references and lists.
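To illustrate what recording observed content means in practice, the sketch below parses a hypothetical bibliographic reference in which the label and all punctuation and spaces are part of the tagged content. The element names (ref, label, mixed-citation, string-name, and so on) are used as they are commonly understood in JATS; the reference itself is invented. Because nothing is generated, concatenating the text of the citation reproduces the reference as it appeared.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference: punctuation and spaces sit between the elements
# as ordinary text, exactly where they were observed in the source.
REF = (
    '<ref id="r1">'
    '<label>1.</label>'
    '<mixed-citation>'
    '<string-name><surname>Doe</surname>, <given-names>J.</given-names></string-name> '
    '<article-title>Placeholder title</article-title>. '
    '<source>Placeholder Journal</source>. '
    '<year>2000</year>;<volume>1</volume>:'
    '<fpage>1</fpage>-<lpage>9</lpage>.'
    '</mixed-citation>'
    '</ref>'
)

ref = ET.fromstring(REF)

# The label is stored as content rather than generated by a stylesheet.
label = ref.findtext("label")

# Joining all text nodes recovers the citation with its original punctuation.
citation_text = "".join(ref.find("mixed-citation").itertext())
```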
This Tag Set describes an article model that is an easy conversion target for content originally tagged in other article models.
By design, this is a model for journal articles, such as the typical research article found in an STM journal, and not a model for complete journals. This Tag Set does not include an overarching model for a collection of articles. In addition, the following journal material is not described by this Tag Set:
- Company, product, or service display advertising
- Job search or classified advertising
- Calendars, meeting schedules, and conference announcements (except as these can be tagged as ordinary articles, sub-articles, or sections within articles)
- Material specific to an individual journal, such as Author Guidelines, Policy and Scope statements, editorial or advisory boards, detailed indicia, etc.
Article Structural Overview
The Journal Archiving and Interchange Tag Set defines a document that is a top-level component of a journal such as an article, a book or product review, or a letter to the editor. Each such document is composed of one or more parts; if there is more than one part, they must appear in the following order:
- Processing Metadata (optional). The metadata that concerns the XML file rather than the contents of the article.
- Front matter (required). The article front matter contains the metadata for the article (also called article header information), for example, the article title, the journal in which it appears, the date and issue of publication for that issue of that journal, a copyright statement, etc. This is not textual front matter as appears in books; rather, it is bibliographic information about the article and the journal in which it was published.
- Body of the article (optional). The body of the article is the main textual and graphic content of the article. This usually consists of paragraphs and sections, which may themselves contain figures, tables, sidebars (boxed text), etc. The body of the article is optional to accommodate those repositories that just keep article header information and do not tag the textual content.
- Back matter for the article (optional). If present, the article back matter contains information that is ancillary to the main text, such as a glossary, appendix, or list of cited references.
- Floating Material (optional). A publisher may choose to place all the floating objects in an article (such as tables, figures, boxed text sidebars, etc.) into a separate container element outside the narrative flow for convenience of processing.
- Responses and sub-articles (optional). Following the front, body, back, and floating material, there may be one or more responses to the article or one or more subordinate articles:
  - Response. A response is a commentary on the article itself, for example, an opinion from an editor on the importance of the article or a reply from the original author to a letter concerning the article.
  - Sub-article. A sub-article is a small article that is completely contained inside another article.
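The ordering described above can be sketched as a small check in Python. This is an illustration under stated assumptions, not a substitute for validating against the schemas: the element names (processing-meta, front, body, back, floats-group, response, sub-article) are used as they are commonly understood in JATS, the skeleton contains only placeholder content, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton with one instance of each top-level part.
SKELETON = """\
<article article-type="research-article">
  <processing-meta/>
  <front>
    <article-meta>
      <title-group>
        <article-title>Placeholder title</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <p>Placeholder paragraph.</p>
  </body>
  <back>
    <ref-list/>
  </back>
  <floats-group/>
  <sub-article>
    <front-stub/>
    <body><p>Placeholder sub-article text.</p></body>
  </sub-article>
</article>
"""

# Parts in the order given by the overview; only "front" is required.
PART_ORDER = ["processing-meta", "front", "body", "back", "floats-group"]
# Parts that may follow all of the above.
TRAILING = {"response", "sub-article"}


def top_level_parts(xml_text):
    """Return the tag names of the direct children of the root element."""
    return [child.tag for child in ET.fromstring(xml_text)]


def follows_overview_order(parts):
    """Return True if the parts respect the order listed in the overview."""
    if "front" not in parts:
        return False
    last_index = -1
    seen_trailing = False
    for tag in parts:
        if tag in TRAILING:
            seen_trailing = True
        elif tag in PART_ORDER:
            index = PART_ORDER.index(tag)
            # Each part may appear once, in order, and before any trailing part.
            if seen_trailing or index <= last_index:
                return False
            last_index = index
        else:
            return False
    return True


parts = top_level_parts(SKELETON)
is_ordered = follows_overview_order(parts)
```

A header-only record, as kept by repositories that do not tag the textual content, would contain just the front matter and still pass this check.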
Tag Sets Developed from the Suite
XML schemas (DTDs, XSDs, and RNGs) are provided for four variations of the Archiving Tag Set:
- Archiving Tag Set using XHTML tables and MathML 2.0
- Archiving Tag Set using XHTML tables and MathML 3.0
- Archiving Tag Set using both XHTML tables and OASIS Exchange CALS tables with MathML 2.0
- Archiving Tag Set using both XHTML tables and OASIS Exchange CALS tables with MathML 3.0
This Archiving Tag Set is one of several created from the Suite. Information about the other Tag Sets may be found at the following site: https://jats.nlm.nih.gov
Acknowledgments
Many people and organizations have contributed to the development, maintenance, and documentation of JATS. In naming some in these Acknowledgments we want to make it clear that any omissions are accidents of history, and we appreciate all contributions.
We thank bmj.com, Molecular Biology of the Cell, and The Proceedings of the National Academy of Sciences of the U.S.A. for providing many of the sample articles used in this Tag Library.
We thank AIP, John Benjamins Publishing Company, and Tony Graham of Antenna House for examples used throughout the documentation.
We thank the members of the NLM DTD working group:
- Jeff Beck, Moderator. National Library of Medicine
- Alex Brown. Griffin Brown
- Mark Doyle. American Physical Society
- Beth Friedman. Data Conversion Laboratory
- Linda Good. Cadmus Communications
- Kathryn Henniss. HighWire Press
- Laura Kelly. National Library of Medicine
- Debbie Lapeyre, Tag Set Secretariat. Mulberry Technologies, Inc.
- Nikos Markantonatos. Atypon Systems, Inc.
- John Meyer. Portico
- Jules Milner-Brage. HighWire Press
- Tom Mowlam. BioMed Central
- Evan Owens. Portico
- Bruce Rosenblum. Inera, Inc.
- B. Tommie Usdin. Mulberry Technologies, Inc.
We thank the past and current members of the NISO JATS Committee, now the NISO JATS Standing Committee:
- Ardie Bausenbach. Library of Congress
- Jeffrey Beck. National Library of Medicine (NLM)
- Brooke Begin. Silverchair Information Systems
- Franziska Buehring. De Gruyter
- Paul Donohoe. Macmillan Science and Education
- Thomas Dowling. OhioLINK
- Mark Doyle. American Physical Society (APS)
- Patricia Feeney. Crossref
- Gustavo Fonseca. SciELO
- Kevin Hawkins. University of North Texas Libraries
- Kathryn Henniss. HighWire Press
- Diane Hillmann. Metadata Management Associates
- Debbie Lapeyre. Mulberry Technologies, Inc.
- Vincent Lizzi. Taylor & Francis Group
- Nikos Markantonatos. Atypon
- Mary McRae. Orbis Technologies
- John Meyer. ITHAKA/JSTOR/Portico
- Nick Nunes. HighWire Press
- Evan Owens. Evan Owens
- Laura Randall. National Library of Medicine (NLM)
- Kennett Rawson. IEEE
- Bruce Rosenblum. Inera Inc.
- Kathleen Sheedy. American Psychological Association (APA)
- Soichi Tokizane. Aichi University
- B. Tommie Usdin. Mulberry Technologies, Inc.
- Alex Wade. Microsoft Corporation
We thank OASIS for use of the OASIS/CALS table model, the MathML committee for use of MathML, NISO for hosting this work, and the National Library of Medicine for hosting the non-normative but absolutely essential user documentation that makes JATS work.