Welcome and Introductions
JATS and NISO — the value of community standardization
Todd Carpenter, National Information Standards Organization (NISO)
Since their earliest days, the publishing structures that became the Journal
Article Tag Suite (JATS) standards have proven themselves valuable as a method
for exchanging journal content. Designed by a team with decades of
typesetting and markup expertise, the specifications were quickly adopted by
preservation communities and as a basis for many of the largest publishers'
production processes. As digital publishing evolves, the importance of common
vocabulary structures like JATS will only increase, because exchanging digital
files is a critical component in a functioning digital content ecosystem.
NISO plays a critical role in bringing together content creators,
intermediaries, and consumers to develop interoperability standards for the
creation, discovery, distribution and preservation of content. With the
support and engagement of the National Library of Medicine, NISO has engaged a
broader community of participants in the standardization of JATS, and it
continues to support the ongoing development and expansion of the standard.
During this brief talk, Todd will discuss the value of national
standardization of JATS, and the future of interoperable content standards in
When the "One Size Fits Most" tagset doesn't fit you
Tommie Usdin, Mulberry Technologies, Inc.
JATS does not actually claim to be a "one size fits all" specification. However, many information content consumers (libraries,
archives, on-line services) accept only content that is valid to one of the JATS models, and in many cases specify a subset
of the model defined in one of the JATS instantiations (Archiving, Publishing, or Authoring). Thus, content creators find
that their vendors and tools often assume that they will be using one of the JATS models "out of the box". This can present
a real problem when a publisher has, and wants, information that is not modeled in JATS, or is not modeled in the JATS DTD
their vendors and publishing partners require. In this case, the publisher has several options: drop the inconvenient information,
use "Custom Metadata", hide the inconvenient information in prose, abuse a tag, suggest a modification of the standard, or
modify the tag set to encode the information that matters to them. None of these options is ideal, and which to choose depends
in large part on circumstances.
The Challenges and Benefits of Automating NLM-to-ePub3 File Conversion
Mike Dean, CFA Institute
While converting NLM book tag XML to an ePub seems like a relatively straightforward process (hey, an ePub is mostly just
HTML, right?), setting up a workflow to do just that is quite challenging. It turns out writing the XSLT could be considered
the "easy" part. Other problems, such as dealing with ePub display issues across ebook readers (anything from minor CSS differences
to major MathML display problems), deciding what tagging makes the most sense semantically, and figuring out how to give semantic
meaning to visual formatting such as table cell shading add a layer of complexity to the process. This paper discusses the
challenges, rewards, and as-yet unresolved problems encountered in the process of creating an NLM to ePub3 workflow.
Tracking Changes to JATS XML in an Online Proofing System
Charles O'Connor, Dartmouth Journal Services
Antony Gnanapiragasam, Dartmouth Journal Services
Michael Hepp, Dartmouth Journal Services
When Dartmouth Journal Services began building ProofExpress, an online, XML-based proofing and editing system for STM journals,
we knew that the most difficult challenge would be creating an accurate change-tracking mechanism. Change tracking is an essential
feature, both to ensure that author corrections conform to journal style and to catch any changes to data or claims. The system
must not only track each insertion, deletion, and formatting change, it must also give production editors the ability to accept
or reject changes without breaking the XML.
ProofExpress is built on SDL LiveContent Create (formerly Xopus). We use its extensive API to add custom elements and attributes
to mark changes in the XML. The XML is then transformed through XSLT to group and nest changes so that they can be acted upon
by the production editor. To prevent breaking the XML during this process, a rule engine enforces the order of acceptance
and rejection of changes.
Case Study on Redlining application using JATS XML at the International Organization for Standardization
Chandi Perera, Typefi Systems
Redlining is the process of comparing two datasets and displaying the changes in a meaningful and human-readable way. Comparing
XML files and rendering the results is more complex than just identifying the differences between two files. Using the experiences
of the International Organization for Standardization (ISO) as a case study, this paper will describe the process of comparing
two versions of a JATS XML file, filtering out changes that have no meaningful impact (e.g. changes in the order of article-id
tags) and ignoring changes that the business requirements deem trivial. The paper will go on to cover identifying and rendering
changes to content ranging from simple paragraphs to tables, equations, figures, and lists. The case study will cover how differences
are rendered so that the reader can easily understand and follow the changes. The paper will describe the easy wins,
the difficulties, and the impossibilities of a JATS XML redlining workflow. The paper will conclude with what changes can be made
to process and content structure to make redlining more effective.
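As a minimal sketch of the kind of filtering mentioned above (not the ISO or Typefi implementation, which the abstract does not describe), a comparison can treat the article-id elements as an unordered set so that a change in tag order alone is not reported. The function name article_ids and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def article_ids(xml_text):
    """Collect (pub-id-type, value) pairs as an order-insensitive set."""
    root = ET.fromstring(xml_text)
    return {
        (elem.get("pub-id-type"), (elem.text or "").strip())
        for elem in root.iter("article-id")
    }


# Two hypothetical versions that differ only in the order of article-id tags.
old = """<article-meta>
  <article-id pub-id-type="doi">10.1000/example</article-id>
  <article-id pub-id-type="publisher-id">A1</article-id>
</article-meta>"""
new = """<article-meta>
  <article-id pub-id-type="publisher-id">A1</article-id>
  <article-id pub-id-type="doi">10.1000/example</article-id>
</article-meta>"""

# Comparing sets ignores ordering, so this reordering is not flagged.
ids_changed = article_ids(old) != article_ids(new)
```

A real redlining workflow would apply a similar normalization to each element type whose order is deemed insignificant before running the main comparison.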
Ontology based Biomedical Research Paper Authoring Support Tool
Senator Jeong, National Center for Medical Information and Knowledge, Korea National Institute of Health
Sejin Nam, Biomedical Knowledge Engineering Laboratory, Seoul National University
Hyun-Young Park, National Center for Medical Information and Knowledge, Korea National Institute of Health
Biomedical research papers often follow the IMRAD (Introduction, Methods, Results, and Discussion) structure. Lexical bundles
(also known as formulaic patterns) function as basic building blocks of this discourse structure. They are combinations of
three or more words that frequently occur in a corpus. For example, the lexical bundle "the purpose of this study was" indicates
the research purpose in the Introduction section.
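The definition above can be illustrated with a short sketch that counts word n-grams across sentences and keeps those that recur. This is only an illustration under simple assumptions (whitespace tokenization, a fixed n, a hypothetical min_count threshold, and a two-sentence toy corpus); it is not the extraction method used in the study.

```python
from collections import Counter


def lexical_bundles(sentences, n=4, min_count=2):
    """Return word n-grams that occur at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Slide a window of n words across the sentence.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical toy corpus; a real corpus would hold many thousands of sentences.
corpus = [
    "The purpose of this study was to evaluate drug safety",
    "The purpose of this study was to compare two methods",
]
bundles = lexical_bundles(corpus, n=6, min_count=2)
```

With this toy corpus, only the six-word sequences shared by both sentences survive the frequency threshold.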
The goal of this study is to develop a biomedical research paper authoring support tool that provides writers with appropriate
expressions for a specific discourse purpose in a section.
We extracted lexical bundles from sentences in 160,150 structured abstracts of the PubMed Central Open Access Subset and
analyzed their distribution by IMRAD section. We designed the Lexical Bundle Ontology (LBO), which semantically organizes lexical
bundles according to their rhetorical purposes in each IMRAD section of a biomedical research paper. Then, a JATS-compliant
authoring support tool was implemented. This tool lists candidate lexical bundles corresponding to authors' discourse purposes
in a specific section and helps them complete sentences. We will present use case scenarios of this authoring support tool. We
expect that this tool will help authors conveniently organize their ideas and arguments and lower the language barrier for non-English
If you insist on InDesign typesetting and NLM-style tagging, we're gonna generate XML as an afterthought, bloated with layout
Gerrit Imsieke, le-tex publishing services
This paper covers two major topics: a schema that extends and slightly modifies BITS and a conversion pipeline from InDesign
via this modified BITS dialect to EPUB. This schema, HoBoTS XML, has been developed for Hogrefe Publishing Group's STM, specialist,
test, and self-help books. Apart from minor deviations/enhancements with respect to BITS 0.2, HoBoTS' main features are the
orthogonal layers of RDFa for conveying additional semantics and CSSa for conveying layout information. While RDFa provides
extension points for future application, CSSa (CSS properties expressed as XML attributes) is already being applied successfully.
It provides a neutral layer for conveying the presentational semantics, for example when converting from print publications
to EPUB. HoBoTS XML and, from there, EPUB output are produced from InDesign input by means of an XML-last conversion pipeline.
This pipeline is based on a sophisticated XProc/XSLT2/Schematron framework that is available under an open source license.
The paper/talk will also focus on this open source framework, the reasons and prerequisites for choosing XML last, how extending
BITS with CSSa facilitates EPUB creation, the framework's suitability for XML first workflows, and why it's straightforward
to convert from HoBoTS to plain JATS.
JATS and the Standards Ecosystem
Bruce Rosenblum, Inera, Inc.
JATS, BITS, and publishing live in an ecosystem of interrelated standards and initiatives. If you don't know what the acronyms
ORCID, PIE-J and JAV stand for, this talk will describe what they are and why JATS implementers should be familiar with them
and many standards, recommended practices, and other initiatives in the JATS neighborhood.
Inconsistent XML as a barrier to reuse of Open Access Content
Daniel Mietchen, Open Knowledge Foundation, Germany
Chris Maloney, PMC (Contractor with A-Tek, Inc.)
Nils Dagsson Moskopp,
In this paper, we will describe the current state of some of the tagging of articles within the PMC Open Access subset. As
a case study, we will use our experiences developing the Open Access Media Importer, a tool to harvest content from the OA
subset and automatically upload it to Wikimedia Commons.
Tagging inconsistencies stretch across several aspects of the articles, ranging from licensing to keywords to the MIME types
of supplementary materials. While all of these complicate large-scale reuse, the unclear licensing statements required us
to implement text mining-like algorithms in order to accurately determine whether or not specific content was compatible with
reuse on Wikimedia Commons.
Besides presenting examples of incorrectly tagged XML from a range of publishers, we will also explore past and current efforts
towards standardization of license tagging, and we will describe a set of recommendations for generators of content on how
best to tag certain data so that it is both compatible with existing standards, and consistent and machine-readable.
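To illustrate why consistent license tagging matters, the sketch below reads a license URL from the xlink:href attribute of a JATS license element and, when that attribute is absent, falls back to searching the license text for a Creative Commons URL. It is a hedged example, not the Open Access Media Importer's actual code; the function name license_url, the regular expression, and the sample fragments are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
# Assumed pattern for Creative Commons license URLs such as .../by/3.0/
CC_URL = re.compile(r"https?://creativecommons\.org/licenses/[a-z\-]+/\d\.\d/?")


def license_url(xml_text):
    """Return a license URL from a <license> element, or None."""
    root = ET.fromstring(xml_text)
    for lic in root.iter("license"):
        href = lic.get(XLINK_HREF)
        if href:
            # Machine-readable case: the URL is tagged explicitly.
            return href
        # Fallback: search the prose of the license statement.
        match = CC_URL.search("".join(lic.itertext()))
        if match:
            return match.group(0)
    return None


# Hypothetical fragments: one tagged with xlink:href, one with the URL only in prose.
tagged = (
    '<permissions xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<license license-type="open-access" '
    'xlink:href="http://creativecommons.org/licenses/by/3.0/">'
    "<license-p>Open access.</license-p></license></permissions>"
)
untagged = (
    "<permissions><license><license-p>Distributed under "
    "http://creativecommons.org/licenses/by/2.0/ terms.</license-p>"
    "</license></permissions>"
)
```

The first fragment needs only an attribute lookup, while the second requires pattern matching on prose, which is more fragile and is the kind of work that consistent tagging avoids.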
The Web, the W3C and the Future of Publishing
Liam Quin, The World Wide Web Consortium (W3C)
As custodians of the World Wide Web, the Web Consortium (W3C) is both a
leader and a follower. We follow because you can't standardise a process or
technology until it is in use. We lead, because we guide the new
technologies from technical, business, and social perspectives.
The Web has already changed publishing, and we are on the brink of even
bigger changes. What happens when Web technologies are good enough to
replace existing authoring tools? What happens when the Web includes SVG
and MathML and can support typography powerful enough to produce printed
books? What happens when electronic books and Web sites converge?
We're not quite there yet, but W3C is working in this area, working with
commercial publishers, with IDPF and other organizations, listening to
industry experts and tool-makers, and gently nudging the Web forward all
over the world.
The difficulty facing publishers today is how to manage when the Web
isn't quite ready. The right question to ask is, how do we make the Web
In this session Liam Quin from the W3C will describe what W3C is doing
in its new Publishing Activity, how it will affect you, and how you can
Extending JATS to include the NISO/NFAIS Recommended Practices for Online Supplemental Journal Article Materials
Karen Gutzman, National Library of Medicine
Kimberly A. Tryka, NCBI, National Library of Medicine
This paper discusses our experience of creating an extension for JATS that incorporates the NISO "Recommended Practices for
Online Supplemental Journal Article Materials" (NISO RP-15-2013). We will discuss our analysis of the recommendations and our
comparison of the recommendations with JATS, as well as our thrashing over language and terminology associated with supplementary
materials and our eventual creation of the extension. The extension is not part of the official JATS specification; it is
a local extension that will be made publicly available for community use and discussion.
NLM Conversion to Build "Atomic" Physics Content in an Agile Fashion
M. Scott Dineen, The Optical Society
Mark Gross, Data Conversion Laboratory, Inc.
Devorah Ashlem, Data Conversion Laboratory, Inc.
Beth Friedman, Data Conversion Laboratory, Inc.
Alexander Schwarzman, The Optical Society
Gitty Kupferstein, Data Conversion Laboratory, Inc.
When faced with the challenge of converting 8 highly technical journals spanning 95 years, how do you divide responsibility
between the content owner and the conversion vendor? Do you spend a year on document analysis and developing conversion specifications,
or do you hand the project over to a well-regarded service provider and rely on their expertise entirely? This paper demonstrates
how an agile approach to content conversion with close collaboration between the publisher and the conversion vendor has allowed
The Optical Society (OSA) and Data Conversion Laboratory, Inc. (DCL) to navigate between the two extremes and create a high-quality
digital archive that will serve OSA's strategic aims for developing innovative products and services.
What JATS Users should Know about the Book Interchange Tag Suite (BITS)
Jeffrey Beck, NCBI/NLM/NIH
The Book Interchange Tag Suite (BITS) is a book model based on the JATS article model. There are many things that can be
structured the same way in both a Journal Article and a Book (or a part of a book), and some things that are very different.
We'll review the things you 'get for free' if you are already familiar with the article model, and what parts of the book
model you will need to pay a little more attention to.
Perspective on application of journal article tag extensible markup language for scholarly journal articles written in Korean
Sun Huh, Department of Parasitology, College of Medicine, Hallym University
Tae Jin Choi, National Research Foundation, Korea
So hyeong Kim, National Research Foundation, Korea
Korea ranks fifth among countries in the number of PMC journals. As of May 2013, 73 journals from Korea were included in PMC.
Beginning in 2013, a variety of agencies that fund research and journal publication began to introduce open access full text
databases in the fields of medicine, science, and the social sciences and humanities. JATS 1.0 will be used in those databases, since
Korean articles can be easily handled as full text XML. Editors and publishers therefore need to produce full text XML
files based on JATS 1.0. I would like to present the current state of the application of JATS 1.0 to academic
journals, not only in English but also in Korean: technology, training programs, and the policy of the Korean Government on open access
full text XML. This experience in Korea can serve as a model for constructing mother-language open access full text journal databases
based on JATS 1.0. The usefulness of JATS is stressed for scholarly journal publication and the dissemination of
information in Korea in all fields, not only medicine.
Journal Article Tag Suite Update and Open Discussion
mPach: Integrated Publishing and Archiving of Journals in HathiTrust
Kevin S. Hawkins, Michigan Publishing, University of Michigan
Seth Johnson, Michigan Publishing, University of Michigan
Carrick Rogers, Michigan Publishing, University of Michigan
Bryan Smith, Michigan Publishing, University of Michigan
mPach is a package of tools being developed to provide a modular platform to enable the publication of born-digital open-access
journals in the HathiTrust repository. One of the chief technological challenges for this system is the conversion of edited
manuscripts to an archivable format. We selected JATS as our preservation format because of the increasing coalescence of
the publishing industry around this open, non-proprietary standard. This paper provides a technical overview of the mPach
platform, with special attention paid to the design and functionality of Norm, a tool being developed to convert Microsoft
Word documents to JATS.
Allen Renear, Graduate School of Library and Information Science,
University of Illinois at Urbana-Champaign