Tagging Keywords

Keywords are words and phrases used to name an article’s or section’s key concepts for search and retrieval purposes. Typically an author, publisher, or indexing service will assign a small number of key terms to expand lookup beyond full text, to point up the most important topics described in an article, or to map an article to a taxonomy. Indexers assigning keywords can make sure that someone searching for “this topic” will find this article or section, even if the exact words are not present in the text. Thus keywords may be key words taken from the text, from an outside vocabulary or taxonomy, or selected by authors, indexers, or publishers.

In this Tag Set, keywords come in sets (<kwd-group>), each of which may come from a particular source or ontology (such as “author-created” or the “MeSH Subject Headings”). The sets are named using the @kwd-group-type attribute. Here are some sample tagged keywords that a contributor chose as best describing an article:

<kwd-group kwd-group-type="author-created">
  <kwd>acid precipitation</kwd>
  <kwd>acid rainfall</kwd>    
  <kwd>smelting region</kwd>
  <kwd>Aluminum residues</kwd>
  <kwd>Sulphur dioxide</kwd>
  <kwd>Copper-nickel smelters</kwd>
</kwd-group>

Tagging Complex/Compound Keywords

Keywords can possess an internal structure of their own; for example, a keyword may include both a textual phrase and its corresponding code (“863 Icelandic sagas”). Many styles of such compound keywords can be handled in this Tag Set with the <compound-kwd> element, which is modeled as a series of repeatable parts (<compound-kwd-part>). These parts can differentiate a text/code pair, divide a coded keyword into multiple code segments, describe a hierarchy, and name a variety of other compound structures. The @content-type attribute on the <compound-kwd-part> element is used to name each part, describe the role it plays, or otherwise define how each part functions within the keyword as a whole.

Keywords with Codes

The simplest case of a compound keyword is a keyword that includes both a textual phrase and its corresponding code, for example, “863 Icelandic sagas”. Both the code and the text can be tagged as keyword parts (<compound-kwd-part>) inside the element <compound-kwd>, with the @content-type attribute used to name the role or type of each part:

<kwd-group kwd-group-type="ISO-463">
  <compound-kwd>
    <compound-kwd-part 
      content-type="ISO-463-code">863</compound-kwd-part>
    <compound-kwd-part
      content-type="ISO-463-text">Icelandic sagas</compound-kwd-part>
  </compound-kwd>
  ...
</kwd-group>
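
The same mechanism can divide a coded keyword into multiple code segments. The following is a sketch only: the @kwd-group-type value and every @content-type value are hypothetical names chosen for illustration, not values defined by this Tag Set, and the segmented code is used merely as a familiar example of a multi-part classification symbol:

```xml
<kwd-group kwd-group-type="ipc">
  <compound-kwd>
    <!-- hypothetical content-type names; a publisher would define its own -->
    <compound-kwd-part content-type="section">A</compound-kwd-part>
    <compound-kwd-part content-type="class">61</compound-kwd-part>
    <compound-kwd-part content-type="subclass">K</compound-kwd-part>
    <compound-kwd-part content-type="group">31/00</compound-kwd-part>
  </compound-kwd>
</kwd-group>
```

Because each segment is its own <compound-kwd-part>, a receiving system can search or display the segments separately without parsing the code string.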

Abbreviation and Expansion Keywords

Compound keywords can also be used to handle keywords that hold an abbreviation and its expansion. Both the abbreviation and the expansion are tagged as <compound-kwd-part> in a single <compound-kwd>. The @kwd-group-type attribute on <kwd-group>, which is sometimes used to name the source or the descriptor for the keywords, can be used instead to name the type of information, such as “abbreviations”. For example:

<kwd-group kwd-group-type="abbreviations">
  <compound-kwd>
    <compound-kwd-part content-type="abbrev">WT</compound-kwd-part>
    <compound-kwd-part content-type="expansion">WildType</compound-kwd-part>
  </compound-kwd>
  <compound-kwd>
    <compound-kwd-part content-type="abbrev">CFU</compound-kwd-part>
    <compound-kwd-part content-type="expansion">Colony-forming unit</compound-kwd-part>
  </compound-kwd>
</kwd-group>

Tagging Nested or Hierarchical Keywords

Some publishers assign hierarchical topics to articles. For example, a publisher might tag selected topics (“Blood–brain barrier”), nested inside themes (“Cellular and Molecular Biology”), grouped into larger units like “Neuroscience”, and grouped into still larger units such as “Biological Sciences”, forming the following hierarchy:

Biological Sciences
  Neuroscience
    Cellular and Molecular Biology
      Blood–brain barrier

This kind of structure places an article in context or sorts articles into categories. It is commonly seen in tables of contents, where all the Neuroscience articles are grouped together, all the Biochemistry articles are grouped together, and so on. Since keywords are intended to aid search and retrieval rather than to establish an article’s context, best practice is to tag this topic structure as subject groups:

<article-categories>
  <subj-group subj-group-type="classification"> 
    <subject>Biological Sciences</subject>
    <subj-group>
      <subject>Neuroscience</subject>
      <subj-group>
        <subject>Cellular and Molecular Biology</subject>  
        <subj-group>
          <subject>Blood&ndash;brain barrier</subject>
        </subj-group>
      </subj-group>
    </subj-group>
  </subj-group> 
</article-categories>

Hierarchical (nested) keyword structures are also possible. These, while still rare, are becoming more common as taxonomies are used to tag keywords for articles. Each <nested-kwd> contains a single keyword and any levels of nesting under that keyword. Since nested keywords are recursive, these lower levels are inside an inner <nested-kwd>. The same example is used below to show how nested keywords work:

<kwd-group kwd-group-type="author" xml:lang="en">
  <nested-kwd>   
    <kwd>Biological Sciences</kwd>
    <nested-kwd>
      <kwd>Neuroscience</kwd>
      <nested-kwd> 
        <kwd>Cellular and Molecular Biology</kwd>  
        <nested-kwd>
          <kwd>Blood&ndash;brain barrier</kwd> 
        </nested-kwd>
      </nested-kwd>
    </nested-kwd> 
  </nested-kwd>
</kwd-group>
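
A hierarchy may also branch. The sketch below assumes that the content model permits more than one inner <nested-kwd> under the same parent keyword; it reuses the “Neuroscience” and “Biochemistry” categories mentioned above as siblings, and the arrangement is illustrative rather than taken from a real article:

```xml
<kwd-group kwd-group-type="author" xml:lang="en">
  <nested-kwd>
    <kwd>Biological Sciences</kwd>
    <!-- two sibling branches under the same parent keyword -->
    <nested-kwd>
      <kwd>Neuroscience</kwd>
    </nested-kwd>
    <nested-kwd>
      <kwd>Biochemistry</kwd>
    </nested-kwd>
  </nested-kwd>
</kwd-group>
```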

Keywords from a Formal Taxonomy

If the content of a keyword or group of keywords contains terms from a known term source (ontology, taxonomy, term list, vocabulary, industry glossary, thesaurus, or similar), the JATS vocabulary attributes can be used to name the vocabulary, providing additional semantics for the term. The semantic source of a keyword should be named when it is a formal ontology and can be named when it is an informal field of study.

Two JATS vocabulary attributes name the controlled or uncontrolled vocabulary. They can be used on individual keywords (<kwd>, <compound-kwd>) or on groups of keywords to refer to each of the keywords in the group (<kwd-group> and <unstructured-kwd-group>).

vocab	Name of the controlled or uncontrolled vocabulary, taxonomy, ontology, index, database, term list, thesaurus, industry glossary, or similar that is the source of the keyword term. For example, a keyword might be taken from the IPC Codes (“ipc”), MeSH headings (“mesh”), or NISO CRediT, Contributor Roles Taxonomy (“credit”). For an uncontrolled keyword term, the value might be an area of study such as “medical-devices” or merely the word “uncontrolled”.
vocab-identifier	Unique identifier of the vocabulary, such as (but not limited to) a URI or DOI. For example, for Dublin Core (DCC), the identifier might be “http://dublincore.org/documents/2012/06/14/dces/”.
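
As a sketch of how these two attributes might appear on a keyword group, the first group below names a controlled vocabulary for all of its keywords and the second marks a set of author keywords as uncontrolled. The @kwd-group-type value “indexer” and the @vocab-identifier URL are assumptions made for illustration and should be replaced with the values a publisher actually uses:

```xml
<!-- every keyword in this group is taken from the named vocabulary -->
<kwd-group kwd-group-type="indexer" vocab="mesh"
  vocab-identifier="https://www.nlm.nih.gov/mesh/">
  <kwd>Blood-Brain Barrier</kwd>
  <kwd>Neurosciences</kwd>
</kwd-group>

<!-- author keywords that come from no formal vocabulary -->
<kwd-group kwd-group-type="author-created" vocab="uncontrolled">
  <kwd>acid precipitation</kwd>
  <kwd>smelting region</kwd>
</kwd-group>
```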

Two JATS vocabulary attributes name the term from a controlled or uncontrolled vocabulary. They can be used on individual keywords (<kwd>, <compound-kwd>) to further identify the semantics of the term.

Where the @vocab and @vocab-identifier attributes described above identify the vocabulary (ontology, taxonomy, etc.) of a keyword, these two vocabulary attributes identify the individual term from such a controlled or an uncontrolled vocabulary:

vocab-term	The content of the element is the free prose version of the vocabulary or taxonomic term. The @vocab-term attribute holds the canonical version of the same term, as it appears in the vocabulary. For example, if the @vocab-term attribute value was “digitized-vor”, the element might contain a display text such as “Digitized Version of Record”.
vocab-term-identifier	The unique identifier of the term within a specific vocabulary, such as (but not limited to) an item number, a URI, DOI, etc.
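
A minimal sketch of a single keyword that carries term-level attributes follows. The element content is the free prose form, while @vocab-term holds the canonical form. The @kwd-group-type value and all identifier values here are assumptions shown for illustration and should be checked against the vocabulary itself before use:

```xml
<kwd-group kwd-group-type="indexer">
  <!-- content is display text; the attributes point at the canonical term -->
  <kwd vocab="mesh"
       vocab-identifier="https://www.nlm.nih.gov/mesh/"
       vocab-term="Blood-Brain Barrier"
       vocab-term-identifier="D001812">blood-brain barrier</kwd>
</kwd-group>
```

Keeping the canonical term in an attribute lets the displayed keyword follow the author's wording while still matching the vocabulary entry exactly.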

String Keywords

Although it is not considered best practice, in some tag sets a single keyword element has been used to hold an entire list of keywords, with connecting punctuation used to mark the boundary between individual keywords rather than distinguishing individual keywords with markup. To untangle this single list into individual keywords might require publisher-specific string parsing or even human judgment, so some receiving archives have chosen not to break the list into single keywords. The element <unstructured-kwd-group> has been added as an alternative to <kwd> inside <kwd-group> to handle these lists. The element is named <unstructured-kwd-group> to indicate that it is a grouping of keywords and not a single keyword. Such lists can then be tagged as shown in the following examples:

<kwd-group>
  <unstructured-kwd-group>XML, DTD, schema, RELAX NG, XSD, models, 
   UML, Schematron</unstructured-kwd-group>
</kwd-group>

<kwd-group>
  <unstructured-kwd-group>molecular chaperones; surface plasmon resonance; 
   dynamic light scattering; trypsin digestion; citrate synthase;
   <italic>Neurospora crassa</italic>; prot&eacute;ines chaperonnes, 
   r&eacute;sonance des plasmons de surface; diffusion dynamique 
   de la lumi&egrave;re; digestion par la trypsine; citrate synthase;
   <italic>Neurospora crassa</italic></unstructured-kwd-group>
</kwd-group>
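
For comparison, when the boundaries between keywords are known, the first list above could instead be tagged with one <kwd> per keyword. This is only a sketch of the structured alternative, assuming that each comma in the original string separates exactly one keyword:

```xml
<kwd-group>
  <!-- one element per keyword; no separator punctuation is stored -->
  <kwd>XML</kwd>
  <kwd>DTD</kwd>
  <kwd>schema</kwd>
  <kwd>RELAX NG</kwd>
  <kwd>XSD</kwd>
  <kwd>models</kwd>
  <kwd>UML</kwd>
  <kwd>Schematron</kwd>
</kwd-group>
```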

Journal Archiving and Interchange Tag Library NISO JATS Version 1.3 (ANSI/NISO Z39.96-2021)
