Jeffrey Beck
National Center for Biotechnology Information, National Library of Medicine, US National Institutes of Health
beck@ncbi.nlm.nih.gov
Presented at JATS-Con, April 2, 2014
The Book Interchange Tag Suite (BITS) is a book model based on the JATS article model.
Version 1.0 was released at the end of 2013.
The schemas are available in DTD, XSD, and RELAX NG.
There is a complete Tag Library available: http://jats.nlm.nih.gov/extensions/bits/tag-library/1.0/
Aren't there already plenty of Book tagging models?
Designed for NCBI Bookshelf project
Was written as an extension to the NLM DTDs
Much in need of a re-examination
Not updated since NLM Version 3.0.
First meeting in person in the Summer of 2010 at NLM
The first task was to define the scope of the project
The scope for BITS is a single completed book or a complete book component such as a chapter, part, or module.
It should define both new and legacy book material, and be able to describe both book sets and series.
For documents where time is a factor such as continually updated books or editions that represent a work in multiple versions, the XML model will represent a snapshot of the book or book component at a single point in time.
Document versioning should be handled by the publication system and not by the document XML.
Atlases and Field Guides are not considered to be out-of-scope, but these may need additional semantic metadata to be represented completely.
STM textbooks also are not explicitly out of scope, but they typically have semantic tagging and processing requirements that are beyond what is available in BITS version 1.0.
BITS is a project of NCBI at the US National Library of Medicine (NLM). While BITS is an extension of NISO Z39.96 JATS, it is not a NISO Standard.
The first design decision that the BITS Working Group made was that the Tag Set will be based on the most recent version of NISO JATS, including the multi-language capabilities of this structure.
This means if JATS has a named structure and that structure occurs in book content, then the JATS name (and to the extent possible, the JATS model) will be used.
This implies that the NISO JATS model will not be improved by the BITS working group. However, the BITS working group made many comments on NISO JATS version 1.0 that were included in JATS version 1.1d1.
Besides deciding that BITS would be JATS-based, the Working Group also reached the following conclusions that informed details of BITS:
Any element with the same name in JATS and BITS has the same model**
** This is not 100% true. But it's OK. Trust me :)
Now that we know that the content area of a book chapter is similar to the content area of an article, we only have to worry about how all of the chapters fit together.
Books can have many different levels of content. Compare the Tables of Contents (TOCs) of a book divided into Parts with one divided simply into chapters.
From the reader's, writer's, publisher's, and editor's perspective, this is not a problem at all. A book can have chapters or a book can have parts that have chapters, or a book can have sections that have parts that have chapters (that then, of course, have sections inside of chapters).
But this gives the XML modeler something to think about.
Do you create an explicit <Section> element that allows <Part>, and <Part> that allows <Chapter> (which gets us down to our basic unit)?
If you do this, then the root element (let's call it <Book>) will either need to allow <Section> and/or <Part> and/or <Chapter> or you will force anyone tagging a just-chapter book to tag an "empty" copy of each level until they get to the level that has any content.
The simpler strategy (from a model writing point of view) is to have one element (<book-part> in our case) that has a type attribute (@book-part-type) that describes the type (or level) of the book part.
In this way, you can make as many levels as needed, and if @book-part-type is not a controlled list (which it is not in BITS), then you can call them whatever you like.
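The single-element approach can be sketched as follows. This is a minimal illustration, not an excerpt from the Tag Library: the titles and the @book-part-type values ("part", "chapter") are invented for the example, and the exact content models should be checked against the BITS 1.0 Tag Library.

```xml
<!-- Sketch only: titles and book-part-type values are assumptions made for illustration -->
<book>
  <book-meta>
    <book-title-group>
      <book-title>An Example Book</book-title>
    </book-title-group>
  </book-meta>
  <book-body>
    <!-- Outer level: a "part" -->
    <book-part book-part-type="part">
      <book-part-meta>
        <title-group>
          <title>Part I</title>
        </title-group>
      </book-part-meta>
      <body>
        <!-- Inner level: the same element, with a different type value -->
        <book-part book-part-type="chapter">
          <book-part-meta>
            <title-group>
              <title>Chapter 1</title>
            </title-group>
          </book-part-meta>
          <body>
            <sec>
              <title>A Section Inside the Chapter</title>
              <p>Chapter content uses the familiar JATS body elements.</p>
            </sec>
          </body>
        </book-part>
      </body>
    </book-part>
  </book-body>
</book>
```

A book with only chapters would simply place the chapter-level <book-part> elements directly in <book-body>, with no empty wrapper levels.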
A book represented by a single XML document does not need to be written in a single XML file.
This is a problem that can be solved with entities.
But this is not a great solution, and it has processing repercussions.
In BITS 1.0, we added the elements <xi:include> and <xi:fallback>, so that books and book parts can be managed as separate files and “included” as needed into a final document.
Just be sure you have an XInclude parser.
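A rough sketch of a book assembled from separate chapter files might look like this. The file names are hypothetical, and what BITS permits inside <xi:fallback> is assumed here rather than confirmed, so check the Tag Library before relying on it. The namespace URI is the standard W3C XInclude namespace.

```xml
<!-- Sketch only: chapter-01.xml and chapter-02.xml are hypothetical file names -->
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <book-body>
    <!-- Each chapter lives in its own file with <book-part> as its root element -->
    <xi:include href="chapter-01.xml">
      <!-- Used only if chapter-01.xml cannot be retrieved (fallback content is an assumption) -->
      <xi:fallback>
        <book-part book-part-type="chapter">
          <book-part-meta>
            <title-group>
              <title>Chapter 1 (file not available)</title>
            </title-group>
          </book-part-meta>
        </book-part>
      </xi:fallback>
    </xi:include>
    <xi:include href="chapter-02.xml"/>
  </book-body>
</book>
```

An XInclude-aware processor, for example xmllint run with its --xinclude option, replaces each <xi:include> with the content of the referenced file before further processing.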
There are two new book- (and book-part-) level objects that have been added: <toc> and <index>.
These will most likely be used to tag EXISTING Tables of Contents and Indices.
<index-term> is an element used in the document flow to tag terms for indexing.
It will most likely be used to CREATE indexes from content.
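As a small illustration of tagging a term in the flow of the text, the sketch below assumes that <index-term> holds a <term> and that a nested <index-term> expresses a sub-entry. The sentence and the terms are invented, and the content model should be confirmed in the Tag Library.

```xml
<!-- Sketch only: the nesting shown for sub-entries is an assumption to verify against the Tag Library -->
<p>Each chapter is tagged as a book part<index-term>
    <term>book parts</term>
    <index-term>
      <term>chapters</term>
    </index-term>
  </index-term> with its own metadata.</p>
```

An indexing process could then collect these terms and generate an index entry "book parts" with the sub-entry "chapters" pointing back to this paragraph.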
The BITS Working Group proposed a simple but flexible Question and Answer model to the JATS Standing Committee as a comment on JATS 1.0. It included new elements for <question>, <question-wrap>, <answer>, <answer-set>, and <explanation>.
It models only questions and answers and not quizzes or Continuing Medical Education exams, but the elements can be used to build these items.
The JATS Standing Committee did not add the Question and Answer model to JATS 1.1d1 because it was not yet a proven model and there was not a lot of call for it in journals. The BITS Working Group added these elements to BITS 1.0 because of the need for them in book content. Details on questions and answers are available in the Tag Library.
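A single question with its answer might be tagged roughly as follows. This sketch uses only the element names listed above; the order and nesting of the elements, the id value, and the sample text are assumptions, so the Tag Library remains the authority on the actual content models.

```xml
<!-- Sketch only: element order and nesting are assumed, not taken from the Tag Library -->
<question-wrap>
  <question id="q1">
    <label>1.</label>
    <p>Which BITS element is used for every level of a book's hierarchy?</p>
  </question>
  <answer>
    <p>The book-part element.</p>
  </answer>
  <explanation>
    <p>The level is named by the book-part-type attribute rather than by a separate element for each level.</p>
  </explanation>
</question-wrap>
```

A quiz or exam would be built by the publisher from a series of such units, since the model itself covers only questions and answers.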
BITS will continue to be supported by NCBI.
We expect more changes to come about as the model gets more use.
Please take it, play with it, tag stuff in it, break it, complain about it, and give it a good going-over.