Modifying This Tag Set
This section contains implementor’s instructions for using this Tag Set, customizing
this Tag Set, or making derivative tag sets based on this one.
First Steps: Using This Tag Library
If you want to learn about this Tag Set in order to write a new tag set based on this
Tag Set or to modify the current Tag Set:
- Skim the first two chapters of this Tag Library, which cover navigation and the structure of the Tag Library.
- Read the parameter entities that name the classes (in the module %default-classes.ent;).
- If you do not know the symbols used in the Hierarchy Diagrams, then read the “Key to the Near & Far® Diagrams”.
- Use the Hierarchy Diagrams to give you a good sense of the top-level elements and their contents.
- Pick an element from one of the diagrams. (Look up the element in the Elements Section to find the full name of the element, the definition, usage notes, content allowed, and attributes list. Look up one of the attributes to find its full name, usage notes, and potential values.)
- Read the Tag Set Modules. New Tag Sets are created by writing, at a minimum, a new
DTD module and new customization modules, so you might want to read the modules in
this order:
- The DTD module (BITS-book2-1.dtd or BITS-book-oasis2-1.dtd);
- The module that names all the BITS-Tag-Set-specific customization and element modules (BITS-bookcustom-modules2-1.ent or BITS-book-oasis-custom-modules2-1.ent);
- The module that names all the other modules in the Suite (%modules.ent;);
- The BITS customization modules (“BITS-bookcustom-classes2-1.ent”, “BITS-book-oasis-custom-classes2-1.ent”, “BITS-bookcustom-mixes2-1.ent”, and “BITS-bookcustom-models2-1.ent”).
You might also wish to familiarize yourself with the relationship between the “customization” modules and the “default” modules for classes, mixes, and models, so you might read the Suite class and mix modules next (%default-classes.ent; and %default-mixes.ent;);
- Any one of the many class-defining modules from the Suite (for example, the %list.ent; module).
Modular DTD Design
BITS has been written as a series of XML DTD modules that can be combined into a
number of different tag sets. The modules are separate physical files that, taken
together,
define all element structures (such as tables, math, chemistry, paragraphs, sections,
figures, footnotes, and reference elements), as well as attributes and entities in
the
Suite.
Modules in the Suite are primarily intended to group elements for maintenance. There
are
different kinds of modules. A module may:
- Be a building block for a base Tag Set (such as the module “Module to Name the JATS Modules”).
- Define the elements inside a particular structure. For example, the Bibliography References (Citation) Elements Module names all the potential components of bibliographic reference lists.
- Name the members of a “class” of elements, where class is a named grouping of elements that share a similar usage or potential location. For example, the Phrase-Level Content Elements Module defines small floating elements that may occur within text, such as inside a paragraph or a title, or that describe textual content, for example, a disease name, drug name, or the name of a discipline.
- Be a module of “editorial convenience”. For example, the module Common (Shared) Element Declarations Module holds elements and attributes used in the content models of the various class elements.
The major disadvantage of a modular system is the longer learning curve, since it
may
not be immediately obvious where within the system to find
a particular element or attribute cluster. To help with this, each element page includes
an
expanded content model.
There are many advantages to such a modular approach. The smaller units are written
once, maintained in one place, and used in many different tag sets. This makes it
much
easier to keep lower level structures consistent across document types, while allowing
for
any real differences that analysis identifies. A tag set for a new function (such
as an
authoring tag set) or a new publication type can be built quickly, since most of the
necessary components will already be defined in the Suite. Editorial and production
personnel can bring the experience gained on one tagging project directly to the next
with
very little loss or retraining. Customized software (including authoring, typesetting,
and
electronic display tools) can be written once, shared among projects, and modified
only for
real distinctions.
BITS and Linked Data
The primary purpose of BITS is to support book and book-part markup for production,
interchange, and archiving of the complete text of most books, with the associated
metadata necessary to support those activities. Therefore, BITS is not defined in
an RDF-enabled way. In addition, BITS is a descriptive rather than a prescriptive standard, so it does not mandate a way to make ontological or linked-data connections.
Nevertheless, the BITS/JATS Suite provides several tagging constructs that are useful
in making a BITS document as RDF-friendly as is practical in an application specifically
designed for full text document production:
- Every element in BITS and JATS has either an optional or a required attribute of type ID. These attributes were added to enable document creators to provide URIs at any level they choose in the document. These IDs can be used to make the document, or a portion of the document at any level of specificity, directly addressable.
- Every element in BITS and JATS can take an @xml:base attribute. This attribute provides a base URI for identifiers within the XML document. While this mechanism provides an inward-facing linkability rather than a pointer to an external ontology, @xml:base can be used to support link-bases into the XML and external semantic interpretations layered over the XML.
- There is also an easy mechanism to add RDFa attributes (or any other attributes) to every BITS or JATS element. The BITS and JATS DTDs provide two parameter entities (%jats-common-atts; and %jats-common-atts-id-required;) that can be used to add any attributes a user may prefer to all of the elements in the Tag Suite (except those out of our control, such as MathML elements). These are the two parameter entities that give each BITS and JATS element an ID and an @xml:base. Among current BITS users, these parameter entities have been used to add RDFa attributes to each element in a BITS document collection.
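As a hedged sketch of that last mechanism, the override below redeclares %jats-common-atts; so that it carries two RDFa attributes alongside the ID and @xml:base described above. The two existing attribute definitions are inferred from the description in this section, so check them against the Suite's own declaration before copying. The choice of `property` and `typeof` is only an illustration.

```dtd
<!-- Place this in a Tag-Set-specific customization module that is read
     BEFORE the Suite module declaring the default %jats-common-atts;.
     The first declaration of a parameter entity is the one that binds. -->
<!ENTITY % jats-common-atts
    "id          ID          #IMPLIED
     xml:base    CDATA       #IMPLIED
     property    CDATA       #IMPLIED
     typeof      CDATA       #IMPLIED" >
```

A matching override of %jats-common-atts-id-required; would presumably be needed as well, so that elements with a required ID receive the same additional attributes.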
How To Make New Tag Sets
Parameter Entities Modules to Customize and Change
Parameter entities are the major mechanism for customizing this Tag Set or creating
a
new tag set from the modules in the Suite. Individual tag sets will be constructed
by (1)
establishing element and attribute combinations and content models using parameter
entities in one of the Tag-Set-specific customizing modules and (2) choosing appropriate
modules from the Suite that declare the elements needed. For example, if a base tag
set
contained 6 kinds of lists and 2 table models, a more specific tag set, such as an
authoring tag set, might use a Customize Classes Module to redefine the List Class
to name
only 3 lists and redefine the Display Class to allow only one table model.
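A minimal sketch of step (1), assuming the Suite default for the List Class is the two-member declaration quoted later in this section. It relies on the XML rule that the first declaration of a parameter entity is binding, so the override must be read before the default.

```dtd
<!-- In the Tag-Set-specific class override module (read first) -->
<!ENTITY % list.class "list" >

<!-- In the Suite default classes module (read later). Because
     %list.class; is already declared, this declaration is ignored. -->
<!ENTITY % list.class "def-list | list" >
```

Every content model that refers to %list.class; now allows only `list`, without any change to the Suite modules themselves.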
The standard modules to create a customized tag set are: the DTD itself, a module
to
name its components, and as many override modules and new elements modules as necessary.
Typical modules for a new tag set are:
- DTD — The DTD module (.dtd) for the new tag set base DTD (At a minimum, this module declares the top-level element (such as book or book part wrapper) and any other structural elements unique to the new document type.)
- Tag-Set-specific Module of Modules — Module to name all the new modules created expressly for the new Tag Set
- Class overrides — Tag-Set-specific overrides of the Suite default element classes
- Mix overrides — Tag-Set-specific overrides of the Suite default class mixes
- Model overrides — Tag-Set-specific content model overrides for the content models in the modules of the Suite (using “-elements” and “-model” parameter entities)
- New Models — Tag-Set-specific new elements (For example, a new legislative Book Tag Set might add legislation-specific metadata elements.)
Element Classes Concept
Many of the elements in the BITS Book Tag Set, together with the elements it takes from the JATS Suite, have
been grouped into loose
element classes. There is no hard and fast rule for what constitutes a class; each
one is
a design decision, a matter of judgment. These classes are designed to ease customization
to meet the particular needs of new tag sets. Base classes for the BITS DTD Suite
are
defined in a separate custom classes module (BITS-bookcustom-classes2-1.ent).
Base classes for the JATS DTD Suite are
defined in a separate default classes module
(%default-classes.ent;).
Content models are built using sequences of elements, and OR groups that are classes
(typically) or mixes. As an example, the content model for a Paragraph element is declared to be an OR group (that is, a
choice) of data characters and any of the elements named
in the Paragraph Elements mix. The mix %p-elements; is declared to be a large OR group of many other
element-defining classes: the block display class (Display Class No Alternatives Elements), the Mathematical Expressions Class Elements, the List Class Elements, the Citation Class Elements, etc.
Implementor’s Note: These element classes can be
viewed as building blocks that will be used to build larger parameter entities for
element
mixes. A mix describes a usage circumstance for a group of elements, such as all the
paragraph-level elements, all the elements allowed inside a table cell, all the elements
inside a paragraph, or all the inline elements. For example, to add another block
display
item to the Block Display Class Elements, you would make a new
%block-display.class; parameter entity (and probably also a new Display Class No Alternatives Elements parameter entity) in your
Tag-Set-specific Class Override Module. This overrides the BITS redefinition (in BITS-bookcustom-classes2-1.ent) of the JATS default parameter entities (defined in the Suite’s Default Element Classes Module). You would also need to create a new module containing the Element Declaration for
the new block display item (typically a custom models module).
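The note above might look roughly like the following. This is a sketch only: `margin-note` is a hypothetical new element, and the member list shown for %block-display.class; is abbreviated for illustration. Because an override replaces the whole entity, a real override should copy the complete list from BITS-bookcustom-classes2-1.ent and then add the new name.

```dtd
<!-- In your Tag-Set-specific class override module (read before the
     BITS and JATS class modules). Member list abbreviated here. -->
<!ENTITY % block-display.class
    "boxed-text | fig | table-wrap | margin-note" >

<!-- In a new Tag-Set-specific element module: declare the new
     (hypothetical) block display element and give it the common
     attributes used across the Suite. -->
<!ELEMENT margin-note (p+) >
<!ATTLIST margin-note
    %jats-common-atts; >
```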
Parameter Entity Names for Classes and Mixes
PARAMETER ENTITY: SAME FUNCTION, SAME NAME — The
Suite modules and initial Tag Sets have used a series of parameter entity naming
conventions consistently. While parsing software cannot enforce these parameter entity
naming or usage conventions, these conventions can make it much easier for a person
to know how the content models work and what must be modified to make a Tag Set change.
CLASSES — Classes are functional groupings of
elements used together in an OR group. Each class is named with a parameter entity,
and all class parameter entity names end in the suffix “.class”:
<!ENTITY % list.class "def-list | list">
A class, by definition, should never be made empty; instead, remove the class from
every model in which you do not want the class elements included.
MIXES — Mixes are functional OR groups of classes;
mixes should never contain element names directly. All mixes must be declared after
all classes, since mixes are composed of classes. Mix names have no set suffix; for
example,
they may end in “-mix” or
“-elements”. Content models and content model overrides use mixes and classes for all OR groups.
Only content model sequences are made up of element
names directly.
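To make the layering concrete, here is a small sketch that builds a mix from two classes and then uses that mix in a content model. Apart from %list.class;, every name in it (`example-note.class`, `example-block-mix`, `example-section`, `example-note`) is hypothetical and is not part of the Suite.

```dtd
<!-- Classes: OR groups of element names -->
<!ENTITY % list.class          "def-list | list" >
<!ENTITY % example-note.class  "example-note" >

<!-- Mix: an OR group of classes only, declared after the classes
     it is composed of -->
<!ENTITY % example-block-mix
    "%list.class; | %example-note.class;" >

<!-- Content model: element names appear directly only in the
     sequence; the OR group is supplied by the mix -->
<!ELEMENT example-section (title, (%example-block-mix;)+) >
```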
MODEL OVERRIDES — Parameter entity mixes for overriding a
content model are of two styles: (1) inline mixes and (2) full content model
replacements. These two groupings have been defined and named separately to preserve
the
mixed-content or element-content nature of the models in Tag Sets derived from the
Suite.
The inline parameter entities to be intermingled with character data
(#PCDATA) in a mixed content model are named with a suffix
“-elements”. For example,
“%institution-elements;” would be used in the content model
for the element <institution>:
<!ENTITY % institution-elements "| %all-phrase; | %break.class;" >
<!ELEMENT institution (#PCDATA %institution-elements;)* >
All inline mixes begin with an OR bar, so that the mix can be removed leaving just
character data (#PCDATA):
<!ENTITY % rendition-plus "| %all-phrase;" >
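The effect of that leading OR bar can be seen by overriding an inline mix with an empty string. The sketch below reuses the %institution-elements; example from above and assumes the override sits in a customization module that is read before the Suite declaration.

```dtd
<!-- Override: remove every inline element from <institution> -->
<!ENTITY % institution-elements "" >

<!-- The Suite declaration is left untouched ... -->
<!ELEMENT institution (#PCDATA %institution-elements;)* >

<!-- ... and now resolves to (#PCDATA)*, which is still a valid
     mixed-content model because no stray OR bar is left behind. -->
```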
The override of a complete content model will be named with a suffix
“-model” and should include the entire content model,
including the enclosing parentheses:
<!ENTITY % kwd-group-model "(label?, title?, ((%kwd.class; | %x.class;)+ | %unstructured-kwd-group.class;))" >
<!ELEMENT kwd-group %kwd-group-model; >
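A full content model override follows the same pattern. As an illustrative sketch (the stricter model here is invented for the example, not taken from any published tag set), a derived tag set could restrict keyword groups to an optional title followed by keywords:

```dtd
<!-- In the Tag-Set-specific models override module. The enclosing
     parentheses are part of the entity value. -->
<!ENTITY % kwd-group-model "(title?, (%kwd.class;)+)" >

<!-- The Suite's element declaration is unchanged and picks up
     the override -->
<!ELEMENT kwd-group %kwd-group-model; >
```

Because a parameter entity reference inside an entity value is expanded when the entity is declared, %kwd.class; must already be declared at the point where the override is read. This is one reason the model overrides are called in after the class and mix modules.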
How To Build a New Custom Tag Set
The Concept
The basic idea for a new Tag Set is that all lower-level elements (paragraphs,
lists, figures, etc. and as much of the metadata as possible) will be defined in modules
— either the modules of the base Suite
or in new Tag-Set-specific modules rather than in the DTD itself. The new DTD will
be
fairly short and include only definitions of the topmost elements, at least the document
element and maybe its children.
Modules are declared using external parameter entities in the Suite’s Module to Name the JATS Modules or in the Tag-Set-specific Module of Modules.
Modules are referenced in the DTD proper, in the order
needed to define the parameter entities in sequence.
This BITS Book Tag Set was written as an example of the new best-practice
customization technique. A new variant tag set that follows this plan will probably
consist of the following modules:
- A DTD module to define the top-level elements (for example, BITS-book2-1.dtd);
- A Tag-Set-specific Module of Modules to name new non-Suite modules in this Tag Set (for example, BITS-bookcustom-modules2-1.ent);
- A Tag-Set-specific definition of element classes to add new classes and override the default classes (for example, BITS-bookcustom-classes2-1.ent);
- A Tag-Set-specific definition of element mixes to add new mixes and override the default mixes (for example, BITS-bookcustom-mixes2-1.ent);
- A Tag-Set-specific module of content model overrides (for example, BITS-bookcustom-models2-1.ent);
- Tag-Set-specific modules to hold new element declarations; and
- All or most of the modules in the Suite.
Making a Variant Tag Set
To show the process, here is a series of instructions for making a new tag set, illustrated
by showing how the BITS Book Tag Set was created from the modules of the whole Suite.
This list may not contain quite all the steps current BITS takes, so after you understand the principles, read the BITS
DTD and copy what you think you need.
- Modules — Write a new Tag-Set-specific Module of Modules, which defines all new customization modules the tag set needs. As an example, the BITS Book Tag Set created the module BITS-bookcustom-modules2-1.ent, which contains the definitions of the class-override module BITS-bookcustom-classes2-1.ent, the mix-override module BITS-bookcustom-mixes2-1.ent, the models-override module BITS-bookcustom-models2-1.ent, and the BITS specific modules such as BITS-book-part2-1.ent.
- Class overrides — Write a Tag-Set-specific class-override module, defining any overrides to the Suite classes. These classes are defined in the default classes module, JATS-default-classes1-3.ent. As an example, the BITS Book Tag Set created the module BITS-bookcustom-classes2-1.ent, in which parameter entities such as %date.class;, %rest-of-para.class;, %emphasis.class;, and %attrib.class; were redefined, and new BITS-specific parameter entities such as %content-version.class; were created.
- Mix overrides — Write a Tag-Set-specific mix-override module, defining any overrides to the Suite mixes. These mixes are defined in the default mixes module, JATS-default-mixes1-3.ent. As an example, the BITS Book Tag Set created the module BITS-bookcustom-mixes2-1.ent, in which mix overrides such as %emphasized-text; and %para-level; were declared.
- Model overrides — Create a Tag-Set-specific content-model-override module, defining any overrides to the content models and attribute lists for the Suite. As an example, the BITS Book Tag Set created the module BITS-bookcustom-models2-1.ent, in which element collections (suffixed “-elements”) that will be mixed with #PCDATA were redefined, full content models overrides (suffixed “-model”) were redefined, and some new attributes and attribute lists were added.
- New Elements — Write any new element modules needed. These will define any new block-level or phrase-level elements. For the BITS Book Tag Set, there were several such modules, for book-specific metadata, to define the models for book parts, and for structural indexes and Tables of Contents.
- DTD Module — With those modules in place, construct a new DTD module. Within that module:
  - Use an external parameter entity Declaration to name and then call the Tag-Set-specific module of modules, for the BITS Book Tag Set, the module BITS-bookcustom-modules2-1.ent.
  - Use an external parameter entity Declaration to name and then call the Suite Module of Modules, which names all the potential modules, for the full JATS, the module JATS-modules1-3.ent.
  - Use an external parameter entity reference to call the Tag-Set-specific class overrides, for the BITS Book Tag Set, the module BITS-bookcustom-classes2-1.ent.
  - Use an external parameter entity reference to call the Suite default classes, for the BITS Book Tag Set, the module JATS-default-classes1-3.ent.
  - Use an external parameter entity reference to call the Tag-Set-specific mix overrides, for the BITS Book Tag Set, the module BITS-bookcustom-mixes2-1.ent.
  - Use an external parameter entity reference to call the Suite default mixes, for the BITS Book Tag Set, the module JATS-default-mixes1-3.ent.
  - Use an external parameter entity reference to call the Tag-Set-specific content models and attribute list overrides, for the BITS Book Tag Set, the module BITS-bookcustom-models2-1.ent.
  - Use an external parameter entity reference to call in the standard Common Module (JATS-common1-3.ent) that defines elements and attributes so common they are used by many modules.
  - Use an external parameter entity reference to call any new Tag-Set-specific module defining new block-level or phrase-level elements. For the BITS Book Tag Set, there are many such modules, for example, the module for book-specific metadata.
  - Select, from the Module of Modules, those modules which contain the elements needed for your Tag Set (for instance, selecting lists and not selecting math elements) and call in each of the modules needed. The BITS Book Tag Set calls these in alphabetical order, since the order does not matter.
  - Define the document element and any other unique elements and entities needed for this Tag Set. For example, the BITS Book Tag Set declares only a few elements including: <book> [the top-level element] and its potential components: <book-meta>, <front-matter>, <book-body>, and <book-back>.
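The order of calls in the DTD module can be sketched as a skeleton. Everything named "mybook" below is hypothetical, both formal public identifiers are placeholders rather than real FPIs, and the entity name %common.ent; is an assumption based on the naming convention for external parameter entities. The names %modules.ent;, %default-classes.ent;, %default-mixes.ent;, and %list.ent; are the ones used elsewhere in this Tag Library. Treat this as an outline to compare against the BITS DTD, not as a working tag set.

```dtd
<!-- 1. Name and call the Tag-Set-specific Module of Modules -->
<!ENTITY % mybook-modules.ent PUBLIC
    "-//EXAMPLE//DTD Placeholder FPI MyBook Module of Modules//EN"
    "mybook-custom-modules1-0.ent" >
%mybook-modules.ent;

<!-- 2. Name and call the Suite Module of Modules -->
<!ENTITY % modules.ent PUBLIC
    "-//EXAMPLE//DTD Placeholder FPI Suite Module of Modules//EN"
    "JATS-modules1-3.ent" >
%modules.ent;

<!-- 3. Classes: overrides first, then Suite defaults -->
%mybook-classes.ent;
%default-classes.ent;

<!-- 4. Mixes: overrides first, then Suite defaults -->
%mybook-mixes.ent;
%default-mixes.ent;

<!-- 5. Content model and attribute list overrides -->
%mybook-models.ent;

<!-- 6. Common module, new element modules, selected Suite modules -->
%common.ent;
%mybook-meta.ent;
%list.ent;

<!-- 7. Document element and other elements unique to this tag set -->
<!ELEMENT mybook (mybook-meta?, mybook-body) >
```

Each override module comes before the default module it overrides, because the first declaration of a parameter entity is the one that takes effect.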
Namespaces and MathML
When JATS was first designed, many software tools did not handle multiple redefinitions
of the same namespace cleanly and correctly. BITS follows JATS in how namespaces are
handled. The following namespace prefixes, namespace URIs, and xmlns declarations are declared in the MathML DTD setup modules or in the BITS MathML 3.0
QName modules (as well as in the MathML 3.0 schema modules for XSD and RNG):
- XLink
  - The XLink prefix is set to “xlink”.
  - The XLink namespace URI is set to http://www.w3.org/1999/xlink
  - The XLink xmlns pseudo-attribute is set as follows, for use in attribute lists:
    xmlns:xlink CDATA #FIXED 'http://www.w3.org/1999/xlink'
- MathML
  - The MathML namespace prefix is set to “mml”.
  - The MathML namespace URI is set to http://www.w3.org/1998/Math/MathML
  - The MathML xmlns pseudo-attribute is set as follows, for use in attribute lists:
    xmlns:mml CDATA #FIXED 'http://www.w3.org/1998/Math/MathML'
- W3C Schema Instance
  - The W3C Schema namespace prefix is set to “xsi”.
  - The W3C Schema namespace URI is set to http://www.w3.org/2001/XMLSchema-instance
  - The W3C Schema xmlns pseudo-attribute is set as follows, for use in attribute lists:
    xmlns:xsi CDATA #FIXED 'http://www.w3.org/2001/XMLSchema-instance'
The math and linking namespaces are defined inside the MathML modules, rather than
defined as part of the JATS modules, as the ali namespace (for example) is defined.
This has annoying subsetting implications. It means that if you do not include the
MathML setup modules and MathML modules in your tag set, you will not have the math
and linking namespaces defined, unless you define them yourself.
Thus, if you want to use the BITS modules to create a tag set that does not include
MathML, there are two options open to you:
- Include the MathML setup modules and MathML DTD modules and ignore them in your tagging and in your documentation; or
- Write your own namespace setup module that declares the namespaces mentioned above.
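For the second option, a namespace setup module could be as small as the sketch below. The module name and the three parameter entity names are hypothetical. Only the attribute definitions themselves are taken from the list above, and the MathML modules may use different entity names for the same purpose.

```dtd
<!-- example-namespaces.ent (hypothetical module): declares the xmlns
     pseudo-attributes that the MathML setup modules would otherwise
     supply -->
<!ENTITY % example-xlink-xmlns
    "xmlns:xlink CDATA #FIXED 'http://www.w3.org/1999/xlink'" >

<!ENTITY % example-mml-xmlns
    "xmlns:mml CDATA #FIXED 'http://www.w3.org/1998/Math/MathML'" >

<!ENTITY % example-xsi-xmlns
    "xmlns:xsi CDATA #FIXED 'http://www.w3.org/2001/XMLSchema-instance'" >
```

If the Suite modules you keep refer to the namespace parameter entities by the names the MathML setup modules give them, your module would need to use those names instead, so check the attribute lists in the modules you include.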
BITS Naming Conventions
Element and Attribute Naming Rules
- CASE — All XML names originating in this BITS Tag Set or in the JATS Suite used by this Tag Set are to be in lower case. Such names include element names, attribute names, parameter entity names, notation names, and IDs. The casing and interior punctuation of element-type names, attribute names, and parameter entity names inherited from PUBLIC models (such as the JATS XHTML-inspired table model, the MathML Tag Set, or the OASIS XML Exchange (CALS) table model) are unchanged. That means that they occur in the case in which they were found in the original module, and so may be in mixed case or upper case (e.g., %Flow.mix;).
- MULTI-WORD NAMES — When two or more words are concatenated into an element name, attribute name, or parameter entity name, a hyphen is placed between the words, for example, <verse-group> and <book-title>.
- WORD STANDARDIZATION — Abbreviations may be used when words are used in combination. When a word stands alone as a name, it is not abbreviated. Thus, the element <conference> uses the full word “conference”, but the conference combinations (such as <conf-theme> and <conf-num>) use the abbreviation “conf”. Abbreviations are standardized so that, for example, “figure” is always used as “fig” (as in the element <fig-group>) and “group” is not abbreviated (as in the elements <fig-group>, <kwd-group>, and <fn-group>).
- EMPHASIS — The typographic emphasis elements are usually spelled out in full, for example, <bold> (rather than “<b>”), <italic>, and <underline> instead of being differentiated using attribute values on an element such as <emphasis>. Superscript and subscript are not considered to be purely typographic, and each is a separate element: <sub> and <sup>.
The following table contains a growing list of abbreviations/names to be used in combined
tag names (element type names), attribute names, and parameter entity names. (Words
in the list below that are not abbreviated appear unchanged in the ABBREVIATION column.)
ORIGINAL WORD | ABBREVIATION |
---|---|
acknowledgment | ack |
address | addr |
affiliation | aff |
alternate/alternative-something | alt |
alternatives | alternatives |
answer | answer |
article | article |
attribution | attrib |
author | author |
biography | bio |
book | book |
chemical | chem |
communication | communication |
conference | conf |
contributor/contribution | contrib |
collection | collection |
corresponding | corresp |
count | count |
cross | x (no hyphen) |
definition | def |
description | desc |
display | disp |
division | div (as in <index-div>) |
editor | editor |
end | end |
equal | equal |
external | ext |
figure | fig |
first | f (no hyphen; <fpage>) |
footnote | fn |
formula | formula |
government | gov |
graphic | graphic |
group/grouping | group |
heading/header | head |
identifier/ID | id |
index | index |
inline | inline |
institution | institution |
item | item |
journal | journal |
keyword | kwd |
last | l (no hyphen; <lpage>) |
link | link |
list | list |
location | loc |
material | material |
metadata | meta |
navigation | nav (as in <nav-pointer>) |
number | num |
page | page |
paragraph | p |
part | part (as in <book-part>) |
pointer | pointer (as in <nav-pointer>) |
prefix | prefix |
proceedings | proceedings |
publication | pub |
publisher | publisher |
question | question |
quote | quote |
reference | ref |
related | related |
section | sec |
sequence/sequential | seq |
series | series |
size | size |
standard | std |
start | start |
statement | statement |
structure | struct |
subject | subj |
subscript | sub (rather than “inferior”) |
suffix | suffix |
superscript | sup (rather than “superior”) |
supplement | supplement |
supplementary | supplementary |
table | table |
table of contents | toc (as in <toc-div>) |
title | title |
translated/translator | trans |
type | type |
underline | underline |
version | version |
volume | vol |
word | word |
wrapper | wrap (Note that <book-part-wrapper> is an exception to this abbreviation rule, made because <book-part-wrapper> is a top-level element, not an ordinary container element like <table-wrap>.) |
File Naming Conventions
This Tag Library describes the components for BITS. BITS is distributed as a DTD,
an XSD schema, and a RELAX NG schema. For the DTD, the base DTD module (delivered
as the files BITS-book2-1.dtd and BITS-book-oasis2-1.dtd) calls in all the other DTD fragment modules as external parameter entities. Each
module specific to this Tag Set (therefore, not part of the JATS Suite) takes the
prefix “BITS”. The same prefix has been followed in the other two constraint languages/schemas.
Each DTD and DTD fragment module has been assigned a unique formal public identifier
(fpi). File names are never referenced directly in the DTD; the file is
referred to by the name of an external parameter entity, which names the fpi and a
system name for the file. The external parameter entity name has been set to the initial delivery filename, that is, with the version number not
part of the name.
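As a sketch of this convention, the declaration and reference for the JATS list module would take roughly the following form. The formal public identifier shown is a placeholder, not the module's real fpi.

```dtd
<!-- Declaration: the entity name carries no version number, while
     the system identifier (the file name) does -->
<!ENTITY % list.ent PUBLIC
    "-//EXAMPLE//DTD Placeholder FPI for the List Module//EN"
    "JATS-list1-3.ent" >

<!-- Reference: pulls the module's declarations into the DTD -->
%list.ent;
```

Keeping the version number out of the entity name means that a DTD referring to %list.ent; does not have to change when the file behind it is renamed for a new release.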
The two BITS DTDs, the individual BITS and JATS DTD-fragment modules of this Tag Set
and the Suite, the XSD schema modules, and the RELAX NG schema modules have been given
DOS/Windows three-character suffixes indicating their type:
SUFFIX | MODULE TYPE |
---|---|
*.dtd | A module that can be used as the top level of an XML hierarchy. Used for the two BITS top level modules (BITS-book2-1.dtd and BITS-book-oasis2-1.dtd) but also as the suffix for several public tag set modules that have been included in this Tag Set such as the MathML Tag Set and the JATS XHTML-inspired Table Module. |
*.ent | A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc., for example, BITS-index2-1.ent. |
*.mod | A DTD fragment for incorporation into a full DTD. May contain element declarations, entity declarations, etc. This extension has the same meaning as *.ent and is only used to maintain the extension names dictated by the inclusion of PUBLIC tag set and/or schema fragments, for example, mathml3-qname-1.mod. |
*.xsd | A W3C XML Schema (XSD) schema module, for example, BITS-book2-1.xsd and BITS-book-oasis2-1.xsd. |
*.rng | A RELAX NG schema module, for example, BITS-book2-1.rng and BITS-book-oasis2-1.rng. |
All BITS and JATS modules reflect the full version number in the file name. For example,
a module that in the previous version of the BITS Tag Set was BITS-book2.dtd is now named BITS-book2-1.dtd
(to indicate that it is part of the BITS 2-1 release). As an additional example, the
JATS module for list elements is currently named JATS-list1-3.ent (to indicate that it is part of the ANSI/NISO JATS 1.3 release). (Note: The parameter
entity that references the file is still called %list.ent;, but it now references
a file with the JATS- prefix and the version number embedded, namely JATS-list1-3.ent.)
The current plan is that, if there are future dot releases (BITS 2.2, 2.3, etc.),
these will be reflected in the filename (for example, BITS-index2-2.ent, JATS-para1-4.ent, etc.). Filenames will no longer carry a single digit that stays unchanged across
dot releases, as was done in earlier versions of the BITS and JATS tag sets, when,
for example, the JATS list module was named JATS-list1.ent for JATS versions 1.0, 1.1, 1.2, 1.2d1, etc.