The Authoring Tag Set comprises a handful of Tag-Set-specific modules that set up parameter entity overrides and uses (by reference) the base modules of the full Journal Archiving and Interchange Suite. The modules of that Suite were developed as part of an effort to create XML applications through which materials on health-related disciplines could be shared and reused electronically. Although the full Suite was developed to support electronic production, the structures should be adequate to support some print production as well. The Suite has been used to construct many tag sets in addition to this one.
Because this is an Authoring Tag Set, and thus optimized for creating new content, the Tag Set is far smaller (fewer elements, and fewer choices in many contexts) than either the Archiving or the Publishing Tag Sets. Where, in the Archiving Tag Set, there may have been several ways to express the same information, the goal was to allow only one way in this Authoring Tag Set. It was not the intention to limit the expressive power licensed by this Tag Set, but rather to limit the meaningless choices that a full interchange Tag Set needs to offer to accommodate conversion from as wide a variety of formats as possible. The philosophy for the Archiving Tag Set was to accept as many varied forms of many structures as possible unchanged. The philosophy for the Publishing Tag Set was to accept a wide variety of structures and to regularize those that matter to the archive. The philosophy of this Authoring Tag Set is to prefer a single structural form, or at least a single style of tagging, whenever possible. Similarly, the Archiving and Publishing Tag Sets allow formatting such as list numbering and citation references to be preserved, whereas this Tag Set assumes that such objects will need to be generated as part of production.
The Authoring Tag Set has been written as a set of DTD “modules” that make use of the modules of the Archiving and Interchange Suite. Each module is a separate physical file, no module is an entire DTD by itself, and modules can be combined into a number of different tag sets. Taken together, the modules define all element structures (such as tables, math, chemistry, paragraphs, sections, figures, footnotes, and reference elements), as well as the attributes and entities in the Suite. The module files are primarily intended to make it easier to construct new tag sets and to maintain existing ones.
Modules in the Suite are primarily intended to group elements for maintenance. There are different kinds of modules. A module may:
The major disadvantage of a modular system is the longer learning curve, since it may not be immediately obvious where within the system to find a particular element or attribute cluster. To help with this, each element page includes an expanded content model and also names the module in which that element is defined.
There are many advantages to such a modular approach. The smaller units are written once, maintained in one place, and used in many different tag sets. This makes it much easier to keep lower level structures consistent across document types, while allowing for any real differences that analysis identifies. A tag set for a new function (such as a Repository Tag Set) or a new publication type can be built quickly, since most of the necessary components will already be defined in the Suite. Editorial and production personnel can bring the experience gained on one tagging project directly to the next with very little loss or retraining. Customized software (including authoring, typesetting, and electronic display tools) can be written once, shared among projects, and modified only for real distinctions.
If you want to learn about the Tag Set in order to write a new Tag Set based on this Tag Set or to modify this Tag Set:
Many of the elements in the Authoring Tag Set have been grouped into loose element classes. There is no hard and fast rule for what constitutes a class; each one is a design decision, a matter of judgment. These classes are designed to ease customization to meet the particular needs of new tag sets. Base classes for the Archiving and Interchange Suite are defined in a separate Default Element Classes Module (%default-classes.ent;).
Content model sequences are built from element names, but OR groups typically do not use element names directly; instead, an OR group offers a choice of classes (the usual case) or mixes. As an example, the content model for a Paragraph element is declared to be an OR group (that is, a choice) of data characters and any of the elements named in the Paragraph Elements mix (%p-elements;). The mix %p-elements; is declared to be a large OR group of many other element-defining classes: the Block Display Class Elements, the Mathematical Expressions Class Elements, the List Class Elements, the Citation Class Elements, and others.
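The relationship between classes, a mix, and a mixed content model can be sketched as follows. This is a minimal illustration, not the Suite's actual declaration: the mix is shown with only two classes (List Class and Phrase Class, using the contents quoted elsewhere in this section), whereas the real %p-elements; names many more.

```dtd
<!-- Classes: OR groups of element names -->
<!ENTITY % list.class    "def-list | list">
<!ENTITY % phrase.class  "abbrev | named-content">

<!-- Mix: an OR group of classes, declared after the classes it uses.
     Abbreviated for illustration; the real mix names more classes. -->
<!ENTITY % p-elements    "| %list.class; | %phrase.class;">

<!-- Content model: data characters or any element named in the mix -->
<!ELEMENT p              (#PCDATA %p-elements;)*>
```

With these declarations the paragraph model expands to a choice of #PCDATA, def-list, list, abbrev, and named-content, repeated any number of times.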
Implementor’s Note: Element classes can be viewed as building blocks used to build larger parameter entities for element mixes. A mix describes a usage circumstance for a group of elements, such as all the paragraph-level elements, all the elements allowed inside a table cell, all the elements inside a paragraph, or all the inline elements. For example, to add another block display item to the Block Display Class Elements, you would take two steps. First, edit the %block-display.class; parameter entity in your Tag-Set-specific Article Authoring Class override Module, so that it overrides the default parameter entity defined in the Suite’s Default Element Classes Module. Second, create a new module containing the Element Declaration of the new block display item.
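A hedged sketch of the two steps in this note follows. The class contents are abbreviated, and the element name pull-quote and its content model are hypothetical, chosen only to show the shape of the change. The override works because an XML parser treats the first declaration of an entity as binding, so the override module must be read before the Suite default.

```dtd
<!-- Step 1: in the Tag-Set-specific class override module, which is
     read before the Suite's Default Element Classes Module.
     Class contents are abbreviated; pull-quote is hypothetical. -->
<!ENTITY % block-display.class
                         "fig | table-wrap | pull-quote">

<!-- Step 2: in a new module that declares the added element.
     The content model is an assumption made for illustration. -->
<!ELEMENT pull-quote     (p+)>
```

Every mix that includes %block-display.class; would then admit the new element without any content model being edited directly.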
The classes described here, with a few exceptions noted below, are defined in the Archiving and Interchange Suite Default Element Classes Module (%default-classes.ent;) and have been used to divide the elements into physical modules. The documentation for the classes and their current default element contents are listed in the parameter entity Section toward the end of this Tag Library. In the parameter entity Section, the names of the elements in a group or class are listed within quotation marks, separated by vertical bars. For example, Phrase Class will be listed as “%phrase.class;” and shown to contain:
"abbrev | named-content"
which means that the two elements <abbrev> and <named-content> are defined as Phrase Class Elements.
| Class | Parameter entities and description |
|---|---|
| Accessibility Class | (%access.class;) Elements added to make the processing of journal articles more accessible to people with special needs and the devices that meet those needs, for example, people with visual impairments. Includes, for example, the element <alt-text>, which is a short phrase name or description of an object, usually a graphical object, that can be used “behind the picture” on a website or pronounced in an audio system. |
| Address Class | (%address.class;) Potential element components of an <address>, such as <country> or <fax> |
| Appearance Class | (%appearance.class;) Formatting elements used primarily in tables, for example, a horizontal rule (usage discouraged) |
| Appendix Class | (%app.class;) A construct containing only the appendix for use in the back matter of an article |
| Break Class | (%break.class;) Formatting element used to force a line break, primarily in tables and titles (usage discouraged) |
| Citation Class | (%citation.class;) Reference (a citation) to an external document as used within, for example, the text of a paragraph |
| Contributor Information Class | (%contrib-info.class;) Metadata about a contributor |
| Definition Class | (%def.class;) Definitions (<def>) and other elements to match with terms and abbreviations |
| Degree Class | (%degree.class;) The academic or professional degrees that accompany a person’s name |
| Display Class | (Several parameter entities: %caption.class;, %block-display.class;, %display-back-matter.class;, %inline-display.class;, %simple-display.class;, %simple-intable-display.class;) Graphical or other display-related elements, including figures, chemical formulas, and images [parameter entities %block-display.class;, %inline-display.class;, and %simple-display.class; defined in the %articleauthcustom-classes.ent; module] |
| Emphasis Class | (%emphasis.class;, %subsup.class;) Used to produce rendering/typographical distinctions, such as superscript, subscript, or bold text [parameter entity %emphasis.class; defined in the %articleauthcustom-classes.ent; module] |
| Identifier Class | (%id.class;) DOIs and other identifiers used by publishers at many levels, for example, for an <abstract> or a <fig> |
| Keyword Class | (%kwd.class;) Keywords and other elements which name a subject term, critical expression, key phrase, etc. associated with an entire document and used for identification and indexing purposes |
| Link Class | (Several parameter entities: %address-link.class;, %article-link.class;, %simple-link.class;, %fn-link.class;) Elements that associate one location with another, including cross references and URIs for links to the World Wide Web |
| List Class | (%list.class;) The types of lists used in text, including numbered lists and bulleted lists |
| Math Class | (Several parameter entities: %math.class;, %block-math.class;, %inline-math.class;) The mathematical element (<mml:math>) and the elements that can contain it (such as <inline-formula> and <disp-formula>) [parameter entity %math.class; defined in the %articleauthcustom-classes.ent; module] |
| Name Class | (%name.class;) The various types of names (such as <collab>) for people who produce products or articles [Defined in the %articleauthcustom-classes.ent; module] |
| Paragraph Class | (Several parameter entities: %just-para.class;, %rest-of-para.class;, %intable-para.class;) Information for the reader that is at the same structural level as a paragraph, including both regular paragraphs and specially-named paragraphs that may have distinctive uses or different displays, such as dialogs and formal statements [parameter entities %rest-of-para.class; and %intable-para.class; defined in the %articleauthcustom-classes.ent; module] |
| Personal Name Class | (%person-name.class;) The element components of a person’s name (such as <surname>), which can be used, for example, inside the name of a contributor |
| Phrase Class | (%phrase.class;) Inline elements that surround a word or phrase in text because the subject (content) should be identified to support some kind of display, searching, or processing (such as <abbrev> to identify an abbreviation) |
| Reference Class | (%references.class;) The elements that may be included inside a <mixed-citation> (bibliographic reference) [Defined in the %articleauthcustom-classes.ent; module] |
| Reference List Class | (%ref-list.class;) A construct containing only the reference list (defined in References Module) for use in the back matter of an article |
| Section Class | (%sec.class;) The elements that are at the same hierarchical level as a section |
| Table Class | (Several parameter entities: %table.class;, %just-table.class;, and %table-foot.class;) Elements that contain the rows and columns inside the Table Wrapper element (<table-wrap>). The following XHTML Tables Module elements can be set up for inclusion: <table>. |
Parameter entities are the major mechanism for customizing a tag set or creating a new tag set from the modules in the full Suite. Individual tag sets will be constructed by 1) establishing element and attribute combinations and content models using parameter entities in one of the Tag-Set-specific customizing modules and 2) choosing appropriate modules from the Suite that declare the elements needed. For example, if the base tag set contained 6 kinds of lists and 2 table models, a more specific tag set might use a Customize Classes Module to redefine the List Class to name only 3 lists and redefine the Display Class to allow only one table model.
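The list example in the paragraph above could be sketched as follows. The six list element names are hypothetical, invented only for this illustration. The Tag-Set-specific override is read first; because an XML parser keeps the first declaration of an entity, the later Suite default is ignored.

```dtd
<!-- Tag-Set-specific Customize Classes Module (read first).
     All element names here are hypothetical. -->
<!ENTITY % list.class    "bullet-list | numbered-list | def-list">

<!-- Suite default (read later). This second declaration of the same
     entity is ignored because the first declaration is binding. -->
<!ENTITY % list.class    "bullet-list | numbered-list | def-list |
                          gloss-list | check-list | simple-list">
```

The same pattern, applied to the relevant Display Class parameter entity, would restrict the tag set to a single table model.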
The standard modules to create a customized tag set are: 1) the DTD itself, 2) a module to name its component modules, and 3) as many override modules (class, mix, and/or model) and new elements modules as necessary. Thus, typical modules for a new Tag Set are:
PARAMETER ENTITY: SAME FUNCTION, SAME NAME — The Suite modules and initial DTDs have used a series of parameter entity naming conventions consistently. While parsing software cannot enforce these parameter entity naming or usage conventions, these conventions can make it much easier for a person to know how the content models work and what must be modified to make a Tag Set change.
CLASSES — Classes are functional groupings of elements used together in an OR group. Each class is named with a parameter entity, and all class parameter entity names end in the suffix “.class”:
<!ENTITY % list.class "def-list | list">
A class, by definition, should never be made empty; the class should be removed from all models where you do not want the class elements included.
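One way to see why a class should not be emptied: an empty class would expand to nothing inside an OR group and leave a dangling OR bar, which is not a legal content model. The sketch below instead removes the class from the mix where it is not wanted. The mix name example-mix and the element example are hypothetical.

```dtd
<!ENTITY % list.class    "def-list | list">
<!ENTITY % phrase.class  "abbrev | named-content">

<!-- Avoid: <!ENTITY % list.class ""> would turn the default mix
     below into "abbrev | named-content | " with nothing after
     the last OR bar. -->

<!-- Override (read first): the mix no longer names %list.class; -->
<!ENTITY % example-mix   "%phrase.class;">

<!-- Default (read later, therefore ignored) -->
<!ENTITY % example-mix   "%phrase.class; | %list.class;">

<!ELEMENT example        (%example-mix;)*>
```

The List Class itself stays intact, so other models that still want lists are unaffected.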
MIXES — Mixes are functional OR groups of classes; mixes should never contain element names directly. All mixes must be declared after all classes, since mixes are composed of classes. Mix names have no set suffix; for example, they may end in “-mix” or “-elements”. Content models and content model overrides use mixes and classes for all OR groups. Only content model sequences are made up of element names directly.
MODEL OVERRIDES — Parameter entities for overriding a content model are of two styles: 1) inline mixes and 2) full content model replacements. These two groupings have been defined and named separately to preserve the mixed-content or element-content nature of the models in DTDs derived from the Suite.
The inline parameter entities to be intermingled with character data (#PCDATA) in a mixed content model are named with a suffix “-elements”. For example, “%institution-elements;” would be used in the content model for the element <institution>:
<!ENTITY % institution-elements "| %subsup.class;" >
<!ELEMENT institution (#PCDATA %institution-elements;)* >
All inline mixes begin with an OR bar, so that the mix can be removed leaving just character data (#PCDATA):
<!ENTITY % rendition-plus "| %emphasis.class; | %subsup.class;" >
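As a minimal sketch of removing an inline mix, the override below redeclares %institution-elements; as an empty string. It assumes the override is read before the Suite's own declaration of the same entity, so that the override is the binding one.

```dtd
<!-- Override (read first): no inline elements at all -->
<!ENTITY % institution-elements "">

<!-- Unchanged element declaration from the Suite -->
<!ELEMENT institution    (#PCDATA %institution-elements;)*>

<!-- Effective content model after expansion: (#PCDATA)* -->
```

Because the leading OR bar lives inside the mix rather than in the element declaration, emptying the mix leaves a well-formed model containing only character data.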
The override of a complete content model will be named with a suffix “-model” and should include the entire content model, including the enclosing parentheses:
<!ENTITY % kwd-group-model "(title?, (%kwd.class;)+ )" >
<!ELEMENT kwd-group %kwd-group-model; >
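A full model replacement might look like the sketch below, which drops the optional title from keyword groups. This is an illustration of the technique, not a change the Authoring Tag Set is known to make, and the contents shown for %kwd.class; are an abbreviated assumption.

```dtd
<!-- Assumed, abbreviated class contents -->
<!ENTITY % kwd.class       "kwd">

<!-- Override in a Tag-Set-specific models module (read first):
     the whole model is replaced, parentheses included -->
<!ENTITY % kwd-group-model "((%kwd.class;)+)">

<!-- Suite default (read later, therefore ignored) -->
<!ENTITY % kwd-group-model "(title?, (%kwd.class;)+ )">

<!ELEMENT kwd-group        %kwd-group-model;>
```

Since the element declaration uses the parameter entity as its entire content model, the override needs no change to the module that declares <kwd-group>.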
The basic idea for a new tag set is that all lower-level elements (paragraphs, lists, figures, etc.) will be defined in modules, either in the modules of the base Suite or in new tag-set-specific modules, rather than in the DTD itself. The new DTD will be fairly short and will include only definitions of the topmost elements: at least the document element and perhaps its children.
Modules are declared using external parameter entities in the Suite’s Module to Name the Modules or in the tag-set-specific Module of Modules. Modules are referenced in the DTD proper, in the order needed to define the parameter entities in sequence.
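The declare-then-reference pattern can be sketched as below. The entity names %articleauthcustom-classes.ent; and %default-classes.ent; come from this Tag Library, but the system identifiers (file names) and the content model for the document element are assumptions made for illustration. For brevity the declarations are shown in one place, although they would normally live in a module that names the modules.

```dtd
<!-- Declare each module as an external parameter entity.
     File names are hypothetical. -->
<!ENTITY % articleauthcustom-classes.ent
                         SYSTEM "articleauthcustom-classes.ent">
<!ENTITY % default-classes.ent
                         SYSTEM "default-classes.ent">

<!-- Reference the modules in the order needed: overrides first,
     so that their parameter entities are the binding ones -->
%articleauthcustom-classes.ent;
%default-classes.ent;

<!-- The DTD itself declares only the topmost elements.
     This content model is an assumption. -->
<!ELEMENT article        (front, body?, back?)>
```

Declaring a module does nothing by itself; its contents are read only at the point where the parameter entity is referenced, which is why reference order determines which declarations win.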
This Authoring Tag Set was written as an example of the new best-practice customization technique. A new variant Tag Set that follows this plan will probably consist of the following modules:
To show the process, here is a series of instructions for making a new Tag Set, illustrated by showing how the Authoring Tag Set was created from the modules of the whole Suite.