JATS-Con 2017 Schedule with Abstracts

April 25, 2017

8:00-9:00

Registration

9:00-9:15

Welcome and Introductions

9:15-9:45

Conference Opening

9:45-10:30

NISO STS ― An Update

Bruce Rosenblum, Inera, Inc.
Robert Wheeler, ASME (American Society of Mechanical Engineers)
Lesley West, ASTM International

10:30-11:00

Coffee Break

11:00-11:45

Implementation of JATS at Taylor & Francis

Vincent Lizzi, Taylor & Francis

In 2012 Taylor & Francis began a project to upgrade our proprietary DTD for journal article content. At that point, we had been using the Taylor & Francis Journal Article (TFJA) DTD for many years and had developed automated validation, screens for production staff to quality-check content, XSLT transformations for delivering content, and integration with our vendors. As we are a journal publisher that publishes new content and also retrodigitizes content archives, we chose JATS Archiving & Interchange (Green) as the basis for our new DTD. Our ambitious plan for TF JATS was set on a one-year timetable, and we got to work. We added several extension elements to JATS, including a model for Issue XML. A key realization was that we would need to support both TFJA and JATS content formats within our systems for the foreseeable future.

It is now four years later and we are upgrading to JATS 1.1. In approaching our second iteration, we are reflecting on what we have learned from TF JATS 1.0, examining more recent requirements, and looking for opportunities to align tagging practice between journals and books.
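
For readers unfamiliar with such extensions, a hypothetical sketch of an issue-level record built from standard JATS journal metadata elements might look like the following. The <issue-xml> wrapper is invented for illustration; TF JATS's actual Issue XML model is not shown in this abstract and may differ.

    <!-- Hypothetical sketch only: <issue-xml> is an invented wrapper;
         the child elements are standard JATS journal/issue metadata. -->
    <issue-xml>
      <journal-meta>
        <journal-title-group>
          <journal-title>Example Journal</journal-title>
        </journal-title-group>
        <issn pub-type="epub">0000-0000</issn>
      </journal-meta>
      <volume>12</volume>
      <issue>4</issue>
      <pub-date date-type="pub"><year>2017</year></pub-date>
    </issue-xml>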

11:45-12:30

JATS4R Roadmap and Update

JATS4R Working Group

The JATS for Reuse group will review its progress and plans and seek feedback and contributions from the JATS-Con crowd.

12:30-1:30

Lunch

1:30-2:15

In pursuit of family harmony: Introducing the JATS Compatibility Meta Model

B. Tommie Usdin, Mulberry Technologies, Inc.
Deborah A. Lapeyre, Mulberry Technologies, Inc.
Laura Randall, NCBI/NLM/NIH
Jeffrey Beck, NCBI/NLM/NIH

JATS is an Open Standard. Users may modify it by adding or removing elements and attributes to suit their needs. Some publishers have extended (added to) JATS based on their own requirements, and there are some public extensions like BITS, STS, and TaxPub. Users expect significant efficiencies from vocabularies based on JATS, including the ability to intermingle the documents in databases, to use tools created for JATS for their new vocabulary with minimal additional work, and to adopt rendering/formatting applications and change only those aspects specific to the new vocabulary. Some model changes create compatible documents, which interoperate gracefully with JATS documents; other model changes are disruptive. We discuss which types of changes to the JATS models can be integrated into existing XML environments and which may be disruptive. We propose a set of criteria to evaluate whether a proposed change will be seamless or might cause problems.
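
A hedged illustration of the distinction (the <funding-score> element below is invented for this example): adding a new element that JATS-aware tools can simply ignore is usually compatible, while reusing an existing JATS element with a different meaning is disruptive, because existing tools will misinterpret it.

    <!-- Likely compatible: a new element, invented here for illustration,
         added alongside standard JATS metadata; JATS tools can skip it. -->
    <article-meta>
      ...
      <funding-score>0.87</funding-score>
    </article-meta>

    <!-- Likely disruptive: reusing the JATS <volume> element to carry
         a proceedings title rather than a volume number. -->
    <volume>Proceedings of the 9th Workshop</volume>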

2:15-3:00

Circling in on the JATS Compatibility Meta-Model

B. Tommie Usdin, Mulberry Technologies, Inc.
Deborah A. Lapeyre, Mulberry Technologies, Inc.
Laura Randall, NCBI/NLM/NIH
Jeffrey Beck, NCBI/NLM/NIH

The JATS Meta-Model was developed to guide people who want to customize JATS to meet local needs and have their JATS-based vocabularies work gracefully with existing JATS-based infrastructure. From analyzing content models to defining "social behaviors" of XML elements, the process of defining the JATS Compatibility Meta-Model was rarely straightforward and very often led us to surprising conclusions. Why, for instance, is whether or not something is metadata not a defining property of compatibility? This paper aims to explain the process and thinking behind the model: how we came to our conclusions about compatibility, and what we even mean by compatibility. We'll look at some of the assertions we started out absolutely knowing to be important, and discuss why they're ultimately not in the Meta-Model. By examining the process behind the model and sharing our successes and failures, we hope to improve understanding of the model and its broader implications.

3:00-3:30

Coffee Break

3:30-4:15

JATS Subset and Schematron: Achieving the Right Balance

Alexander B. Schwarzman, OSA—The Optical Society

Ensuring that published content adheres to the publisher's business and style rules requires the implementation of quality-control solutions that encompass the entire enterprise, including vendors and in-house staff. The solutions must span the entire life cycle of the manuscript, from XML conversion to production to post-publication enhancements. Two techniques that may help in achieving this goal are 1) developing Schematron and 2) making a JATS subset. Both come with costs: Schematron change management requires the development and maintenance of an extensive test base; making a subset requires comprehensive content analysis and knowledge of the publishing program's direction. Achieving the right balance between the two techniques may reduce the costs associated with them.

In this paper, we revisit the notion of "appropriate layer validation" at the current state of technology. We share the experience of running a successful large-scale quality-control operation that has been accomplished by using a combination of JATS subset and Schematron. After demonstrating what Schematron change management entails, analyzing the advantages and costs associated with building Schematron and with creating a subset, and considering several validation scenarios, we conclude with the suggestion that the two techniques, when used in tandem, may complement one another and help control software development costs.
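
To make the Schematron side concrete, a business or style rule of the kind described here might be expressed along these lines (the rule itself is invented for illustration, not an actual OSA rule):

    <!-- Hypothetical Schematron sketch: enforce a house rule that
         every figure carries a caption with a title. -->
    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern>
        <rule context="fig">
          <assert test="caption/title">
            Every figure must have a caption with a title.
          </assert>
        </rule>
      </pattern>
    </schema>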

4:15-5:00

PubMed: Redesigning citation data management

Kathleen Gollner, NCBI/NLM/NIH
Kathi Canese, NCBI/NLM/NIH

Over the last couple of years, we have drastically changed the systems and processes used to manage PubMed citation data. It began with revising long-standing NLM policies and reducing reliance on manual citation corrections, and culminated with the release of the PubMed Data Management (PMDM) system in October 2016. With PMDM, we introduced a single system for managing citation data, with a UI for correcting citation data errors. In this brave new world, the responsibility for correcting citation data shifted from NLM Data Review to PubMed Data Providers. Any errors reported in PubMed citations are now forwarded to the publisher ― a strategy that publishers have enthusiastically upheld. This presentation will outline how the systems and processes for managing PubMed citation data have changed, and detail the outcomes of these changes since PMDM was launched.

April 26, 2017

9:00-9:45

Adoption without Disruption: NCBI's Experience in Switching to BITS

Martin Latterner, NCBI/NLM/NIH
Marilu Hoeppner, NCBI/NLM/NIH

The NCBI Bookshelf is an online archive of books and documents in life science and healthcare. Its growing collection comprises over 5,000 titles, the majority of which are stored as full-text XML. In the fall of 2014, Bookshelf began work to adopt the Book Interchange Tag Suite (BITS) DTD and to replace the NCBI Book Tag Set Version 2.3 as its XML format of choice. It became immediately apparent that NCBI could not simply perform a one-time "switch" to BITS; it needed to support the new schema alongside the old one. The project was simply too complex: focusing all energy on BITS would have meant bringing regular production workflows to a complete halt. This was particularly inconceivable because the benefits of adopting BITS were judged to be mostly long term rather than immediate. Released only in December 2013, BITS was also still very young: while there was no doubt that the format was superior to the NCBI Book DTD, the prospect of further revisions to the Tag Suite cautioned against acting too quickly. Therefore, adoption of BITS was conceived as a longer-term project of small, incremental steps designed to neither disrupt the regular production cycle nor consume all resources.

By the time version 2.0 of BITS was released in December 2015, Bookshelf had the ability to load, render, and index books tagged in BITS. A number of in-house XML converters were updated to output BITS, and the first titles in BITS were released. While the majority of new content was still tagged in the NCBI Book Tag Set v2.3, Bookshelf now had a solid foundation to complete adoption using the new version of BITS. By the end of 2016, all workflows had switched to BITS, including Bookshelf's Word authoring program, external vendors providing BITS XML, and over 20 in-house XML converters.

This paper describes Bookshelf's experience in adopting BITS: the challenges Bookshelf faced, the solutions it developed, and the lessons learned along the way. Special emphasis is placed on issues related to markup and XML conversion.
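
For orientation, a minimal BITS book instance has roughly the following shape (a sketch only, assuming BITS 2.0; Bookshelf's actual tagging conventions are considerably richer):

    <!-- Minimal sketch of a BITS 2.0 book; content is invented. -->
    <book dtd-version="2.0" xml:lang="en">
      <book-meta>
        <book-title-group>
          <book-title>Example Handbook</book-title>
        </book-title-group>
      </book-meta>
      <book-body>
        <book-part book-part-type="chapter" id="ch1">
          <book-part-meta>
            <title-group><title>Introduction</title></title-group>
          </book-part-meta>
          <body><p>Chapter text.</p></body>
        </book-part>
      </book-body>
    </book>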

9:45-10:30

Beware of the laughing horse

Laurent Galichet, ISO

In 2011, ISO embarked on their XML journey. The base DTD chosen was JATS, and customizations were made to capture standards metadata and content. This became known as the ISO STS (Standards Tag Set). The first acid test was to convert ISO's legacy content from Word/PDF to ISO STS-compliant XML. How ISO went about this task is the subject of this paper.

Aim: To convert over 30,000 standards (650,000 pages; EN and FR) into XML in two years.

Method: An RFP was launched in early 2011 for potential providers of XML conversion. Two providers were shortlisted, and after site visits by the then-project manager, the director of IT, the director of standards development, and the Secretary-General of ISO, one provider was chosen.

Theory: The contract and pricing had already been agreed upon; they specified a set-up period of two months, with mass conversion to commence on January 1st, 2012 and end in December 2013.

Practice: A project manager was appointed on 1st November 2011 to lead the project. The set-up period consisted of an iterative process of marking up the same set of content, reviewing it, and sending feedback to the provider, who would then re-mark it up and send it back. Source files were MS Word, cPDFs, and scanned PDFs. The set-up period took six months.

Mass conversion: Batches were prepared for the mass conversion according to certain criteria. It was agreed that batches should be composed of documents totaling about 625 pages. Also, in order for the team to get used to the structure of standards, short documents (fewer than 20 pages) were batched up first. Equally, easier file formats were prioritized, so the initial batches were short MS Word documents. This amounted to roughly 10,000 documents. Larger documents followed, and once the MS Word source files had all been batched up, cPDFs were sent and, lastly, image PDFs. In retrospect, that was probably a mistake.

Results: Conversion ended in February 2014; the project overran by two months. The budget was also fully spent, including the 25% contingency amount. The quality obtained was very good, and the XML was fed directly into ISO's online browsing platform, which is regularly used by many.

Conclusion: You cannot just expect anyone to get it right without getting your hands dirty. The old adage "you get out what you put in" is very much appropriate if you are considering a legacy conversion project.

10:30-11:00

Coffee Break

11:00-11:45

HTML First? Testing an alternative approach to producing JATS from arbitrary (unconstrained or "wild") .docx (WordML) format

Wendell Piez, Independent Consultant

XSweet, a toolkit under development by the Coko Foundation, takes a novel approach to converting .docx (MS Word) data. Instead of trying to produce a correct and full-fledged representation of the source data in a canonical form such as JATS, XSweet attempts a less ambitious task: to produce a faithful rendering of a Word document's appearance (conceived of as a "typescript"), translated into a vernacular HTML/CSS. It is interesting what comes out of such a process, and what doesn't. And while the results are barely adequate for reviewing in your browser, they might be "good enough to improve" using other applications.

One such application would produce JATS. Indeed, it might be easier to produce clean, descriptive JATS or BITS from such HTML than to wrestle into shape whatever nominal JATS came back from a conversion processor that aimed to do more. This idea will be tested with a real-world example.
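
To make the "good enough to improve" idea concrete, a follow-on pass might look something like this minimal XSLT sketch (not XSweet's actual code; the 'BlockQuote' class name is invented), promoting vernacular HTML from the typescript rendering to JATS equivalents:

    <!-- Hypothetical sketch, not XSweet itself. -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- italics become JATS <italic> -->
      <xsl:template match="i | em">
        <italic><xsl:apply-templates/></italic>
      </xsl:template>
      <!-- paragraphs styled as block quotes become <disp-quote> -->
      <xsl:template match="p[@class = 'BlockQuote']">
        <disp-quote><p><xsl:apply-templates/></p></disp-quote>
      </xsl:template>
    </xsl:stylesheet>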

11:45-12:30

Presenting Texture: A Role-based, JATS-enforcing, Open Source WYSIWYG Editor

Alex Garnett, Simon Fraser University
Juan Pablo Alperin, Simon Fraser University
Michael Aufreiter, Substance Software GmbH

The primary goal of Texture is to provide a solution for publishers to bring accepted papers to production more efficiently. Texture serves as a text-editor-like application that allows users to turn raw content into structured content iteratively, adding as much semantic information as is needed for the production of scientific publications. Texture is open source software built with Substance, an advanced JavaScript library for building web-based editor applications, which supports XML as a primary exchange format natively in the browser.

With JATS being the de facto standard exchange format for scholarly publishing, Texture reads and produces valid JATS files. This allows Texture to work seamlessly in existing publishing workflows. For instance, Texture can take the output of a Word-to-JATS converter and enhance the content until it is ready to be published and can be consumed by existing output toolchains. Texture produces normalized JATS, applying a nearly lossless conversion and following strict rules and best practices (JATS4R).

Texture can be customized for different user roles to provide different levels of complexity. For editors, the interface could be toggled to much more closely resemble something like Oxygen, allowing a user to pop out a raw attribute editor for any given element. Texture ships with a fairly complete set of document elements already implemented in the user interface, including support for tables, figures, citations, equations, and so on. Rendering of each of these elements is implemented on a modular basis, so that any supported JATS element can be added as a local customization and merged back into the upstream codebase. As-yet unimplemented elements can still be added or edited in raw XML as needed. The ultimate goal is to use Texture as an integral building block in modern and customized end-to-end publishing systems, where the document sits in the center (single source) and is edited by all involved parties (author, editor, reviewer) in a collaborative way.
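
For instance, a structured reference of the kind such an editor can round-trip might be tagged as follows (a generic JATS sketch with invented content, not Texture's literal output):

    <!-- Generic JATS structured reference; details may differ in Texture. -->
    <ref id="bib1">
      <element-citation publication-type="journal">
        <person-group person-group-type="author">
          <name><surname>Doe</surname><given-names>J.</given-names></name>
        </person-group>
        <article-title>An example article</article-title>
        <source>Journal of Examples</source>
        <year>2016</year>
        <volume>12</volume>
        <fpage>1</fpage>
        <lpage>10</lpage>
      </element-citation>
    </ref>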

12:30-1:30

Lunch

1:30-3:00

JATS Open Session

3:00-3:30

Coffee Break

3:30-4:15

SWISS - The Semantic Web for Interoperable Specifications and Standards

Rupert Hopkins, XSB

SWISS - The Semantic Web for Interoperable Specifications and Standards - is a linked data model built on top of NISO STS XML. In SWISS, each concept in a document is given a unique address that can be linked to other documents and systems.

SWISS is motivated by the interoperability requirements of standards users. It is about making it easier for engineers to accurately link concepts (instead of whole documents) from authoritative sources to concepts in their enterprise systems, and about ensuring that enterprise systems are 'aware' of changes made by those authorities. The combination reduces engineering time to knowledge as well as the cost of configuration management.

In summary, this model-based approach results in documents that 'understand' who they are connected to, 'why' they are connected, and the state of that connection. We call these smart, connected documents. You can think of this as the Internet of Things for the concepts inside a document.
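
As a hedged sketch of the idea (the element and attribute names below are invented to illustrate per-concept addressing, not SWISS's actual markup):

    <!-- Invented illustration of a smart, connected concept: a clause
         with its own address, linked to an enterprise system, with the
         state of the connection recorded. -->
    <concept id="iso-0000-req-4.2.1"
             uri="https://example.org/iso-0000/req/4.2.1">
      <label>4.2.1</label>
      <text>Bolts shall conform to property class 8.8.</text>
      <links-to uri="https://example.org/plm/parts/bolt-m12"
                relation="constrains" status="current"/>
    </concept>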

This is a powerful concept for users of technical data derived from specifications and standards.

4:15-5:00

JATS and CrossRef

Chuck Koscher, CrossRef