Multiple Languages/Scripts

The JATS community is becoming increasingly aware of the need to support multi-language journal articles. Multi-lingual content is much more than block quotes in a second language. Uses include more than one original language, an original language and one or more translations, transcriptions, and more.
This section describes the JATS multi-lingual mechanism, a largely attribute-based encoding solution designed to handle documents that are written in more than one language, where the JATS user wishes to record the relationship between the language variants. The JATS mechanism covers:
Rationale for JATS Multi-lingual Mechanism
When substantial portions of document content are in more than one language, the same structures are often repeated, once in each language. These equivalent structures (the same content, differing only by language) need not be co-located in the document, and the content in a single language need not be contiguous. As an example, alternate sections could repeat the same content in French and English. As another, a paragraph could be repeated, first in Greek, then in Romanian, and then in Italian. The same figure or table could be presented in Spanish, English, and Portuguese.
Therefore, in a multi-lingual document, when there are two or more same-content objects (sections, figures, boxed-texts, tables, etc.) differing only in language, few assumptions can be made about the locations of and interrelationships between these “same-content” objects. Portions in a single language need not be contiguous, and objects with the same content need not be located anywhere near each other in the document. For these reasons, the JATS multi-lingual mechanism cannot simply enclose (wrap up) all the same-content objects. Wrapping is the approach taken by the current JATS element <block-alternatives>, which can hold multiple copies of a block object, such as a figure, in multiple languages. For true multi-lingualism, a wrapper-style mechanism is not sufficient.
It was also the hope of the JATS Standing Committee that the JATS multi-lingual mechanism not be bulky and intrusive. Ideally, any mechanism should enable users to create true multi-lingual documents while not requiring changes to the tagging of mono-lingual documents. Ideally, any multi-lingual mechanisms should be completely ignorable by creators/users of mono-lingual documents.
Attribute @lang-group (JATS Multi-lingual Mechanism)
In the JATS multi-lingual mechanism, same-content structures in different languages can be flagged as belonging to the same “language group”. The phrase “language group” is not a term of art; we use it to mean the collection of objects that are the same in content and vary only in language. JATS calls alternate language versions of the same content “variants”, and they are collected into a “language group” by the values of the @lang-group attribute. The members of a language group need not be contiguous. That is, they may appear next to each other, but they may also appear in different places within a document.
How @lang-group builds language groups:
  • The value of the @lang-group attribute is an IDREF, and the attribute is used to tie the members of a language group together. The value of @lang-group must be the same for all members of a language group to support processing. The variant content objects in a language group are bound together only by the IDREF.
  • The value of the @lang-group must be the @id attribute value of one of the variant objects in the group. (It does not matter which, and there is no significance to the selection of which @id is used.)
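The two rules above are easy to check mechanically. The sketch below, using Python's standard xml.etree.ElementTree, collects language groups by their @lang-group value and reports any group whose value is not the @id of one of its own members. It also flags single-member groups, on the assumption that a group of one is probably a tagging error. The function names are hypothetical, not part of JATS or any JATS toolkit.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def language_groups(root):
    """Collect the members of each language group, keyed by @lang-group value."""
    groups = defaultdict(list)
    for elem in root.iter():
        key = elem.get("lang-group")
        if key is not None:
            groups[key].append(elem)
    return dict(groups)

def check_groups(root):
    """Report groups whose @lang-group is not the @id of one of their members,
    plus (an extra assumption) groups that have only one member."""
    problems = []
    for key, members in language_groups(root).items():
        if len(members) < 2:
            problems.append(f"group {key!r} has only one member")
        if not any(m.get("id") == key for m in members):
            problems.append(f"group {key!r} does not use the @id of a member")
    return problems

body = ET.fromstring("""<body>
  <fig id="f0001" lang-group="f0001" xml:lang="en"/>
  <fig id="f0005" lang-group="f0001" xml:lang="es"/>
  <fig id="f0009" lang-group="f9999" xml:lang="fr"/>
</body>""")

print(check_groups(body))
```

Here the f0001 group passes, while f9999 is reported twice: it has one member, and no member carries that @id.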
Desirable functionality for language groups in an online environment might include allowing the user to choose whether to see a particular language version or all the variants, and allowing a user to find an article in a search using a filter specifying the language(s). For example, in an article in both English and Spanish, a user could opt to see only the Spanish, only the English, or both. Such behavior belongs to the display application; the JATS markup supplies the information the application needs.
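As an illustration of the language filter described above, here is a minimal sketch assuming the reader picks a set of language codes. Content outside any language group is always shown, and grouped variants are shown only if their @xml:lang is selected. The helper name is hypothetical; note that ElementTree reports @xml:lang under the XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def visible_ids(parent, wanted_langs):
    """Return the @id of each child of `parent` to show for the reader's choice.
    Children outside any language group are always shown; members of a
    language group are shown only if their xml:lang is in `wanted_langs`."""
    shown = []
    for child in parent:
        if child.get("lang-group") is None or child.get(XML_LANG) in wanted_langs:
            shown.append(child.get("id"))
    return shown

body = ET.fromstring("""<body>
  <sec id="s0005" xml:lang="en" lang-group="s0005"/>
  <sec id="s0011"/>
  <sec id="s0013" xml:lang="es" lang-group="s0005"/>
</body>""")

print(visible_ids(body, {"es"}))        # ['s0011', 's0013']
print(visible_ids(body, {"en", "es"}))  # ['s0005', 's0011', 's0013']
```

This sketch has no fallback: if no member of a group matches the chosen languages, the whole group disappears. A real display application would probably fall back to, for example, the member marked as primary.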
Alternative Spanish and English Variants of a Figure
This example shows two same-content figures differing only in language: one figure is the original (in Spanish) and one is a translation (in English). The figure element has been repeated, and both copies have been placed in the same language group using the @lang-group attribute. The @lang-variant attribute marks one variant as the original and the other as a translation; the @lang-focus attribute marks one as primary and the other as secondary. The figures need not be co-located in the article, and may be presented in any order.
...
<fig id="f0001" lang-group="f0001"
  position="float" fig-type="scatter-graph" 
  xml:lang="en" lang-variant="translation" 
  lang-source="translator" lang-focus="secondary">
 <label>Figure 1.</label>
 <caption><p>Evolution of the repetition rate in all sequences</p></caption>
 <graphic xlink:href="RIYA_A_1889289_F0001_OC-en.jpg" 
    content-type="color" specific-use="web-only"/>
</fig>

<fig id="f0005" lang-group="f0001"  
  position="float" fig-type="scatter-graph" 
  xml:lang="es" lang-variant="original" 
  lang-source="author" lang-focus="primary">
 <label>Figura 1.</label>
 <caption><p>Evolución de la tasa de repetición en  
  todas las secuencias</p></caption>
 <graphic xlink:href="RIYA_A_1889289_F0005_OC-es.jpg" 
    content-type="color" specific-use="web-only"/> 
</fig>
...
Alternative Spanish and English Variants of a Table
This example shows two same-content tables differing only in language: one table is the original (in Spanish) and one is a translation (in English). The table element has been repeated, and both copies have been placed in the same language group using the @lang-group attribute. The @lang-variant attribute marks one variant as the original and the other as a translation; the @lang-focus attribute marks one as primary and the other as secondary. The tables need not be co-located in the article, and may be presented in any order.
...
<table-wrap id="t0001" lang-group="t0001"  
  position="float" orientation="portrait" 
  xml:lang="es" lang-variant="original" 
  lang-source="author" lang-focus="primary">
 <label>Tabla 1.</label>
 <caption><p>Estadísticos descriptivos por edad</p></caption>
 <table>...</table>
</table-wrap>

<table-wrap id="t0006" lang-group="t0001"  
  position="float" orientation="portrait" 
  xml:lang="en" lang-variant="translation" 
  lang-source="translator" lang-focus="secondary">
 <label>Table 1.</label>
 <caption><p>Descriptive statistics by age</p></caption>
 <table>...</table>
</table-wrap>
...
Alternative Spanish and English Variants of a Section
This example shows two same-content sections differing only in language: one section is the original (in Spanish) and one is a translation (in English). The section element has been repeated, and both copies have been placed in the same language group using the @lang-group attribute. The @lang-variant attribute marks one variant as the original and the other as a translation; the @lang-focus attribute marks one as primary and the other as secondary. The sections need not be co-located in the article, and may be presented in any order.
...
<sec id="s0005" xml:lang="en" lang-group="s0005"
  lang-variant="translation" lang-source="translator"
  lang-focus="secondary">
 <title>Method</title>
 ...
</sec>

<sec id="s0011">...</sec>

<sec id="s0013" xml:lang="es" lang-group="s0005"
  lang-variant="original" lang-source="author"
  lang-focus="primary">
 <title>Método</title>
 ...
</sec>
...
Overview of Multi-lingual Attributes/Elements
The JATS Multi-lingual mechanism was created by adding multi-lingual attributes, creating a new multi-language metadata element, and changing existing JATS element models, usually to make structures repeatable to allow for different language variants.
JATS Multi-lingual Mechanism Attributes/Element
Attribute Meaning and Typical Values
@lang-grouping Placed on the <processing-meta> element as a flag to indicate that this document uses the @lang-group attribute and associated multi-language attributes to group and describe multiple language content.
@lang-group Placed on two or more elements to indicate that these elements are part of the same language group (that is, they represent the same content in different languages). The heart of the JATS multi-lingual mechanism.
@lang-focus How members of a language group are related to each other, for example, one of the languages might be the primary language and the other language(s) secondary. (Note: If the values provided in the @lang-focus value list do not include a needed value, select “custom” for the @lang-focus value (“@lang-focus='custom'”) and put the desired value into the new attribute @lang-focus-custom.)
@lang-source What was the role of the person/entity who created this language variant? For example: author, translator, a machine translation. (Note: If the values provided in the @lang-source value list do not include a needed value, select “custom” for the @lang-source value (“@lang-source='custom'”) and put the desired value into the new attribute @lang-source-custom.)
@lang-variant Names the type of language variant for a member of a language group, for example, a translation or an original. (Note: If the values provided in the @lang-variant value list do not include a needed value, select “custom” for the @lang-variant value (“@lang-variant='custom'”) and put the desired value into the new attribute @lang-variant-custom.)
@lang-translate Should the content of this element be translated? Possibilities are yes or no.
@xml:lang The language of the content of the element on which this attribute appears. The value of this attribute must conform to IETF RFC 5646.
NOT @language Not part of the multi-lingual mechanism. The @language attribute names a programming or scripting language in which code is written, e.g., “javascript”. (In contrast to the @xml:lang attribute, which names the language of an element.)
NOT @language-version Not part of the multi-lingual mechanism. The attribute @language-version names the version of the programming or scripting language in which code is written, e.g., “3.0”, for code written in “JavaScript 3.0”.
NOT @hreflang Not part of the multi-lingual mechanism. The attribute @hreflang names the language of the target to which an external link is pointing. A processor following the link would expect to find a document in the language named.
JATS Multi-lingual Element
<content-language> Identifies one primary language used in this document by containing a language code (for example, a two-letter ISO 639 code such as “fr”). The element should be repeated for every primary language in the document.
Language Grouping Attribute (@lang-grouping) Flags Use of the Mechanism
As part of the processing metadata (<processing-meta>), the Language Grouping Flag attribute (@lang-grouping) is a flag indicating that this document uses the @lang-group attribute and associated multi-language attributes to group and describe multiple language content.
The Language Grouping Flag attribute has two possible values:
Value Meaning
yes This document is in more than one language and uses the @lang-group attribute mechanism to associate related material in different languages.
no This document does not mark multi-lingual content using the @lang-group attribute.
Using @lang-grouping to flag that the document uses language groups, with two figures grouped (by @lang-group) as the same logical figure, the original in Spanish and an English translation:
<article dtd-version="1.4d1" xml:lang="mul">
<processing-meta lang-grouping="yes"/>
<front>...
<article-meta>...
<content-language>es</content-language>
<content-language>en</content-language>...
</article-meta>
</front>
<body>
...
<fig id="f0001" lang-group="f0001"  
  xml:lang="es" lang-variant="original" 
  lang-source="author" lang-focus="primary"
  position="float" fig-type="scatter-graph" >
 <label>Figura 1.</label>
 <caption><p>Evolución de la tasa de repetición en  
  todas las secuencias</p></caption>
...
</fig>

<fig id="f0005" lang-group="f0001"  
  xml:lang="en" lang-variant="translation" 
  lang-source="translator" lang-focus="secondary"
  position="float" fig-type="scatter-graph" >
 <label>Figure 1.</label>
 <caption><p>Evolution of the repetition rate in all sequences</p></caption>
 ...
</fig>
...
</body>
</article>
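A processor can read the flag before deciding whether to run any language-group logic. The sketch below reads @lang-grouping from <processing-meta> and checks that the flag agrees with the markup. The helper names are hypothetical, and treating a missing flag the same as “no” is an assumption of this sketch, not a JATS rule.

```python
import xml.etree.ElementTree as ET

def lang_grouping_flag(article):
    """Return the @lang-grouping value on <processing-meta>, or None if absent."""
    pm = article.find("processing-meta")
    return None if pm is None else pm.get("lang-grouping")

def flag_matches_markup(article):
    """True when the flag says "yes" exactly when some element has @lang-group.
    Assumption: a missing flag is treated the same as "no"."""
    has_groups = any(e.get("lang-group") is not None for e in article.iter())
    return (lang_grouping_flag(article) == "yes") == has_groups

flagged = ET.fromstring("""<article>
  <processing-meta lang-grouping="yes"/>
  <body>
    <fig id="f0001" lang-group="f0001" xml:lang="es"/>
    <fig id="f0005" lang-group="f0001" xml:lang="en"/>
  </body>
</article>""")
unflagged = ET.fromstring("""<article>
  <processing-meta lang-grouping="no"/>
  <body><p id="p1" lang-group="p1"/></body>
</article>""")

print(lang_grouping_flag(flagged), flag_matches_markup(flagged))      # yes True
print(lang_grouping_flag(unflagged), flag_matches_markup(unflagged))  # no False
```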
Language Variant Attribute (@lang-variant)
The Language Variant attribute (@lang-variant) names the relationship between the alternate language variants of the same content. For example, a section in French could be the “original” language variant and the same section translated into English could be a “translation” variant.
Multi-lingual documents can be complex. There may, for example, be two “original” language sections with the same content, one original and one translation, or an original and an interpretation. In Canada (and elsewhere), for instance, both the English and the French same-content constructs could be “original” material from the author. Other possible variant relationships include “transliteration” and “transcription”.
Using @lang-variant
While it is always possible to create a language grouping (using @lang-group) to record alternative languages, some JATS elements, particularly in the metadata, are repeatable. The @lang-variant attribute can be used on such elements even if no language groupings are created.
Since repeatable metadata elements already take @xml:lang to name their content language, providing multiple language versions merely means repeating the element and adding the @lang-variant and possibly the @lang-source attributes. So two <title-group> elements, or two <abstract>s, or two of any repeatable metadata element, could vary only in language.
For example, here are Spanish and English alternative language versions of the article abstract inside <article-meta>:
<article-meta>
 ...
 <abstract xml:lang="es" 
    lang-variant="original" lang-source="author">
  <title>RESUMEN</title>
  <p>La repetición verbal espontánea forma parte de la
   interacción temprana adulto–niño, estando enmarcadas 
   en el seno de conversaciones. ...</p>
 </abstract>

 <abstract xml:lang="en" 
    lang-variant="translation" lang-source="translator">
  <title>ABSTRACT</title>
  <p>Spontaneous verbal repetition is part of early adult–child 
   conversational interchanges. ...</p>
 </abstract>
 ...
</article-meta>
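Since repeated metadata elements are distinguished only by @xml:lang (plus @lang-variant), a display or indexing tool can choose among them by language preference. The sketch below is one possible policy, assumed here for illustration. It takes the first preferred language that is present, falls back to the variant marked “original”, and then falls back to the first occurrence. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def pick_by_language(meta, tag, preferred):
    """Return the `tag` child of `meta` in the most preferred language.
    Falls back to the variant marked lang-variant="original", then to the
    first occurrence; returns None if there is no such child."""
    candidates = meta.findall(tag)
    for lang in preferred:
        for elem in candidates:
            if elem.get(XML_LANG) == lang:
                return elem
    for elem in candidates:
        if elem.get("lang-variant") == "original":
            return elem
    return candidates[0] if candidates else None

meta = ET.fromstring("""<article-meta>
  <abstract xml:lang="es" lang-variant="original" lang-source="author">
    <title>RESUMEN</title>
  </abstract>
  <abstract xml:lang="en" lang-variant="translation" lang-source="translator">
    <title>ABSTRACT</title>
  </abstract>
</article-meta>""")

print(pick_by_language(meta, "abstract", ["en"]).findtext("title"))  # ABSTRACT
print(pick_by_language(meta, "abstract", ["fr"]).findtext("title"))  # RESUMEN
```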
@lang-variant Value List
In the table below, note that although this is a restricted value list, both “custom” (which requires @lang-variant-custom to provide a value) and “unknown” are included among the values.
@lang-variant Value List
Value Meaning
original A passage in its original language (from the author)
translation A translation of a passage into another language
interpretation A rewording of a passage into another language
transcription A representation of spoken language in a written form
transliteration A mapping from one system of writing into another
phonetic A representation of speech sounds using phonetic symbols
spoken A passage spoken aloud
custom Any variant not on this list or any combination of the above values
unknown The language relationship is unknown. This is not the same as omitting the @lang-variant attribute; this is an application statement “we do not know”.
@lang-variant-custom
JATS contains many attribute value lists that include the value “custom” and the attribute @custom-type to provide an escape hatch and allow any attribute value to be given. This mechanism, using “@lang-variant='custom'” and the value of the @lang-variant-custom attribute, can provide any value not on the list above for @lang-variant.
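The “custom” escape hatch implies a simple validation rule: a value must come from the list, and “custom” must be accompanied by the companion attribute. This is a minimal Python sketch of that rule for @lang-variant, using the value list above. The function name and the sample custom value “gloss” are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Values from the @lang-variant table above.
LANG_VARIANT_VALUES = {"original", "translation", "interpretation", "transcription",
                       "transliteration", "phonetic", "spoken", "custom", "unknown"}

def check_value(elem, attr, allowed):
    """Check one multi-lingual attribute against its value list. The value
    "custom" must come with a non-empty companion attribute @<attr>-custom."""
    value = elem.get(attr)
    if value is None:
        return []  # attribute omitted: nothing to check
    if value not in allowed:
        return [f"@{attr}={value!r} is not on the value list"]
    if value == "custom" and not elem.get(f"{attr}-custom"):
        return [f"@{attr}='custom' needs @{attr}-custom"]
    return []

# "gloss" is a made-up custom value, used only for illustration.
good = ET.fromstring('<p lang-variant="custom" lang-variant-custom="gloss"/>')
missing = ET.fromstring('<p lang-variant="custom"/>')
typo = ET.fromstring('<p lang-variant="orginal"/>')

for elem in (good, missing, typo):
    print(check_value(elem, "lang-variant", LANG_VARIANT_VALUES))
```

Because the attribute name is a parameter, the same function could be reused for @lang-source and @lang-focus by passing their value lists.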
Language Source Attribute (@lang-source)
The Language Source attribute (@lang-source) names the source (origin) of a language variant. The intended meaning for “source” is “who created this variant?” Possible answers include: the author, a translator, or a machine translation by computer algorithm.
Here are two abstracts with different language sources:
...
 <abstract xml:lang="es" 
    lang-variant="original" lang-source="author">
  <title>RESUMEN</title>
  <p>...</p>
 </abstract>

 <abstract xml:lang="en" 
    lang-variant="translation" lang-source="translator">
  <title>ABSTRACT</title>
  <p>...</p>
 </abstract>
   ...
@lang-source Value List
Value Meaning
author An author or authors created this variant
editor An editor or editors created this variant
translator A human translator or translators created this variant
machine Computer processing created this language variant, for example, a machine translation
custom The creator of this variant is not any of the listed values. The @lang-source-custom attribute should be used to specify the relationship.
@lang-source-custom
JATS contains many attribute value lists that include the value “custom” and the attribute @custom-type to provide an escape hatch and allow any attribute value to be given. This mechanism, using “@lang-source='custom'” and the value of the @lang-source-custom attribute, can provide any value not on the list for @lang-source.
Language Focus Attribute (@lang-focus)
The @lang-focus attribute indicates how members of a language group are related to each other, and can serve as a hint for how multiple language variants might be displayed. How the members of a language group are actually displayed is application-specific, perhaps driven by reader preference. However, the @lang-focus attribute can be used to provide hints in markup about the author’s intention for how the members of a language group should be considered and displayed. In the example below, a Latin original is marked as primary and an English interpretation as secondary.
...
<p id="para011" lang-group="para011" xml:lang="la" 
   lang-variant="original" lang-focus="primary"
   lang-translate="no" 
>Si hortum in bibliotheca habes, nihil deerit.</p>

<p id="para011-b" lang-group="para011" xml:lang="en" 
   lang-variant="interpretation" lang-focus="secondary"
   >Literally "if you have a garden in a library, nothing will be lacking", usually paraphrased as:
 "If you have a garden and a library, you have everything you need."</p>
...
@lang-focus Value List
Value Meaning
primary The text has a more central focus than other language variants in the group. In display, such text is typically made more prominent or is the only variant displayed.
secondary The text is not the primary textual focus in the language group. In display, such text is typically less prominent.
custom The language relationship is not any of the specific listed values. The @lang-focus-custom attribute should be used to specify the relationship.
undefined No recommendation is made concerning the relative focus of the language variants. No display or relevance relationship is indicated, so all variants are intended to be displayed in the same way; in display, all alternative language variants are typically shown in document order. (Note: This is the default behavior, though not a default mandated by the XML grammar, and it may be assumed if the @lang-focus attribute is not provided with a specific listed value.)
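One way a display application might act on these hints is to sort the members of a language group before rendering them. The ranking in this sketch is an assumption, not something JATS prescribes: “primary” first, “secondary” last, and “undefined”, “custom”, or a missing @lang-focus in between. Python's sort is stable, so members with equal rank keep their document order, which matches the “undefined” behavior described above.

```python
import xml.etree.ElementTree as ET

# Assumed ranking: primary first, secondary last; "undefined", "custom",
# or a missing @lang-focus sit in between.
FOCUS_RANK = {"primary": 0, "secondary": 2}

def display_order(members):
    """Sort language-group members for display. sorted() is stable, so members
    with the same rank stay in document order."""
    return sorted(members, key=lambda m: FOCUS_RANK.get(m.get("lang-focus"), 1))

body = ET.fromstring("""<body>
  <p id="para011-b" lang-group="para011" xml:lang="en" lang-focus="secondary"/>
  <p id="para011" lang-group="para011" xml:lang="la" lang-focus="primary"/>
</body>""")

print([p.get("id") for p in display_order(list(body))])  # ['para011', 'para011-b']
```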
@lang-focus-custom Attribute
JATS contains many attribute value lists that include the value “custom” and the attribute @custom-type to provide an escape hatch and allow any attribute value to be given. This mechanism, using “@lang-focus='custom'” and the value of the @lang-focus-custom attribute, can provide any value not on the list above for @lang-focus.
Should it Translate Attribute (@lang-translate)
The @lang-translate attribute specifies whether an element’s content is to be translated, for example, to exempt a paragraph from automatic translation or to leave some content unchanged when the content is localized. Text that should not be translated (for example, automatically by Google Translate) can be tagged with “@lang-translate='no'” to indicate that the text should not be translated or localized. A value of “@lang-translate='yes'” or the absence of this attribute indicates that it would be appropriate to translate the content.
<p>...As Horace wrote  
<named-content lang-translate="no" content-type="foreign-phrase"
>Carpe Diem</named-content> ...</p>
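A translation pipeline could honor this attribute by collecting only the translatable text runs. The sketch below makes one assumption that the text above does not state: lang-translate="no" also covers an element's descendants unless a descendant sets lang-translate="yes" again, similar to the HTML translate attribute. Text that follows a protected element (its ElementTree “tail”) belongs to the parent, so it is still collected. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def translatable_text(elem, inherited="yes"):
    """Collect text runs that may be sent for translation, in document order.
    Assumption: lang-translate="no" also covers descendants unless a
    descendant sets lang-translate="yes" again."""
    setting = elem.get("lang-translate", inherited)
    parts = []
    if setting != "no" and elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.extend(translatable_text(child, setting))
        # Tail text belongs to the parent, so the parent's setting applies.
        if setting != "no" and child.tail:
            parts.append(child.tail)
    return parts

p = ET.fromstring('<p>As the saying goes, <named-content lang-translate="no" '
                  'content-type="foreign-phrase">Carpe Diem</named-content>, '
                  'so act now.</p>')
print(translatable_text(p))  # ['As the saying goes, ', ', so act now.']
```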
Element <content-language> Records Primary Languages
The <content-language> element is a repeatable article-metadata element, adopted from NISO STS, that names the primary languages present in the article. JATS is agnostic on the meaning of “primary” and also on whether “primary language” describes the narrative content of the article, the article metadata, or both.
Unlike any other element in JATS, the content of <content-language> is prescribed, although not in a way JATS can enforce. For best practice, the entire content of <content-language> should be one language code (with possible subtags). The element <content-language> should repeat, once for each primary language.
 <content-language>fr</content-language>
 <content-language>en</content-language>
The use of @xml:lang with a language value on a top-level element (such as “fr” for French) implies a mono-lingual document. A multi-lingual document should use the @xml:lang value “mul” or not use @xml:lang on the top article level. No purpose is served by the element <content-language> in a mono-lingual document.
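Following the paragraph above, a quality-control step might warn when the top-level @xml:lang names a single language while <content-language> lists several. This sketch is one possible check, not a JATS requirement: it accepts either “mul” or no @xml:lang on a multi-lingual article. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def content_languages(article):
    """Language codes from <content-language> in <article-meta>, in document order."""
    return [c.text.strip()
            for c in article.iterfind("front/article-meta/content-language")
            if c.text]

def language_warnings(article):
    """Warn when a single-language xml:lang on <article> contradicts a
    multi-lingual <content-language> list ("mul" or no xml:lang is accepted)."""
    top = article.get(XML_LANG)
    codes = content_languages(article)
    if len(codes) > 1 and top not in (None, "mul"):
        return [f"<article xml:lang={top!r}> implies one language, "
                f"but <content-language> lists {codes}"]
    return []

META = ("<front><article-meta><content-language>fr</content-language>"
        "<content-language>en</content-language></article-meta></front>")
tagged_en = ET.fromstring(f'<article xml:lang="en">{META}</article>')
tagged_mul = ET.fromstring(f'<article xml:lang="mul">{META}</article>')

print(language_warnings(tagged_en))
print(language_warnings(tagged_mul))  # []
```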
Citations in Multiple Languages
Reference counting and tracking are critical in the current journal ecosystem. Great care must be taken to make sure that a citation coded in multiple languages is counted as a single citation and referenced as one from within the document. Tagging the same citation in multiple languages requires more than just the multi-language mechanism; it also requires using the <citation-alternatives> element. The element <citation-alternatives> is used to “hold alternative versions of a single citation, for example, the same citation in multiple languages”.
A bibliographic reference (<ref>) may contain more than one citation (<mixed-citation> or <element-citation>); this is not an uncommon publishing practice. So two citations inside a reference may indicate two separate items cited. When the same citation is provided in multiple languages, the equivalent language-varying citations should all be inside one <citation-alternatives> element.
How to Tag the Same Citation in Multiple Languages
  • Place a <citation-alternatives> element inside a bibliographic reference (<ref>), to hold two or more equivalent language-variant citations.
  • Each single-language citation should be tagged with a citation element (<mixed-citation> or <element-citation>). The citations need not all be tagged using the same type of citation element.
  • Each single-language citation should use the @xml:lang attribute to name the language of the citation.
    Note that an @xml:lang attribute on a citation element identifies the language of the citation itself, not that of the cited work.
  • If the article uses the multi-language functionality, it should be used on the citations as well, to provide guidance for customized user display. For example, each single-language citation may use the multi-language attribute @lang-variant to flag the citation as an “original” or a “translation”.
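The tagging steps above also determine how citations should be counted. This is a minimal sketch, assuming that each <citation-alternatives> inside a <ref> counts as one cited item and that every other citation element directly inside the <ref> counts separately. It also checks that each alternative set holds at least two citations and that each of them carries @xml:lang. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
CITATIONS = ("mixed-citation", "element-citation")

def cited_item_count(ref):
    """Each <citation-alternatives> counts once; other citations count individually."""
    alternatives = ref.findall("citation-alternatives")
    loose = [c for c in ref if c.tag in CITATIONS]
    return len(alternatives) + len(loose)

def alternative_problems(ref):
    """Check the tagging steps above for each <citation-alternatives>."""
    problems = []
    for alt in ref.findall("citation-alternatives"):
        cites = [c for c in alt if c.tag in CITATIONS]
        if len(cites) < 2:
            problems.append("fewer than two citations in <citation-alternatives>")
        if any(c.get(XML_LANG) is None for c in cites):
            problems.append("a citation in <citation-alternatives> lacks xml:lang")
    return problems

ref = ET.fromstring("""<ref id="r1">
  <citation-alternatives>
    <mixed-citation xml:lang="es" lang-variant="original">...</mixed-citation>
    <element-citation xml:lang="en" lang-variant="translation">...</element-citation>
  </citation-alternatives>
  <mixed-citation xml:lang="en">...</mixed-citation>
</ref>""")

print(cited_item_count(ref), alternative_problems(ref))  # 2 []
```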
Multiple Languages within a Single Citation
It is also possible to tag multiple language elements within a citation (<mixed-citation> or <element-citation>). In this practice the <article-title> and other citation components would be present once for each language, with the attribute @xml:lang used to name the language and the attribute @lang-group used to associate the equivalent components, marking them as a single real-world object. (For details, see Citations in Multiple Languages.)
A Full Article, Published in Two Languages
Users with a complete article in two or more languages may associate them by tagging each article as a <sub-article> and putting all of these sub-articles into an <article>. The <article> element serves mostly as a vessel for the multiple equivalent logical articles, each of which is tagged as a <sub-article>. Each <sub-article> takes an @xml:lang attribute and can include full metadata (in <front-stub>), a complete body, back matter, references, etc. The containing <article> may have little or no content beyond a small amount of metadata.
A few suggestions for the multi-language use of <sub-article>:
  • Both the overarching article and the sub-articles should contain the <content-language> element as part of their metadata. This element names at least one primary language of an article or sub-article.
  • The single-language <sub-article>s are assumed to be equivalent (same content, different language), and each can be given a DOI.
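The suggestions above, together with the requirement that each <sub-article> take @xml:lang, can be checked with a short script. This sketch assumes that every language named in a sub-article's <content-language> should also appear in the containing article's <content-language> list. That consistency rule is an assumption made for illustration. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def codes(parent, path):
    """Set of <content-language> codes found at `path` under `parent`."""
    return {c.text.strip() for c in parent.iterfind(path) if c.text}

def sub_article_problems(article):
    """List sub-articles missing xml:lang or <content-language>, or naming
    languages the containing article does not declare (an assumed rule)."""
    top_codes = codes(article, "front/article-meta/content-language")
    problems = []
    for i, sub in enumerate(article.iterfind("sub-article"), start=1):
        sub_codes = codes(sub, "front-stub/content-language")
        if sub.get(XML_LANG) is None:
            problems.append(f"sub-article {i}: no xml:lang")
        if not sub_codes:
            problems.append(f"sub-article {i}: no <content-language>")
        elif not sub_codes <= top_codes:
            problems.append(f"sub-article {i}: {sorted(sub_codes - top_codes)} "
                            "missing from the article's <content-language>")
    return problems

article = ET.fromstring("""<article xml:lang="mul">
  <front><article-meta>
    <content-language>fr</content-language>
    <content-language>en</content-language>
  </article-meta></front>
  <sub-article xml:lang="fr">
    <front-stub><content-language>fr</content-language></front-stub>
  </sub-article>
  <sub-article>
    <front-stub><content-language>de</content-language></front-stub>
  </sub-article>
</article>""")

print(sub_article_problems(article))
```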
Note: Two different-language <sub-article>s do not imply an article and a translation; they may be two parallel/equivalent original sub-articles. For example, in the article in French and the same article in English below, the titles of both sub-articles have been marked as “original” using the @lang-variant attribute.
<article dtd-version="1.4d1" xml:lang="mul"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:mml="http://www.w3.org/1998/Math/MathML"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:ali="http://www.niso.org/schemas/ali/1.0/"  >
 <processing-meta lang-grouping="yes"/>

 <front>...
  <article-meta>
   <title-group><article-title>...</article-title></title-group>
   ...
   <content-language>fr</content-language>
   <content-language>en</content-language>
  </article-meta>
 </front>

 <sub-article xml:lang="fr">
  <processing-meta lang-grouping="yes"/>
  <front-stub>
   ...
   <title-group xml:lang="fr" 
      lang-variant="original" lang-source="author">
    <article-title>Chirurgie en période COVID, étude 
      observationnelle</article-title>
   </title-group>
   ...
   <content-language>fr</content-language>
  </front-stub>
  ...
 </sub-article>

 <sub-article xml:lang="en">
  <processing-meta lang-grouping="yes"/>
  <front-stub>
   ...
   <title-group xml:lang="en" 
      lang-variant="original" lang-source="author">
    <article-title>Surgery under COVID: An observational 
      study</article-title>
   </title-group>
   <content-language>en</content-language>
   ...
  </front-stub>
  ...
 </sub-article> 
</article>
Use of @xml:lang on the Main Article
There is no general consensus as to the value of the @xml:lang attribute on the single, overarching article that contains two or more single-language sub-articles. IETF RFC 5646 (https://tools.ietf.org/html/rfc5646) allows the 3-letter code “mul”, meaning a multi-language document, but acceptance is not universal, particularly since this is a 3-letter code whereas the most commonly used language codes have only 2 letters.
Some publishers choose not to use @xml:lang on the top-level article. That the article is multi-lingual can then be detected by the presence of the <content-language> element in the article metadata, but this is not a perfect solution, since it leaves no way to detect a multi-language article right up front, on the <article> element. The only wrong answer (and one publishers have had no choice but to use in the past) is to tag the full article with only one of the languages. Multi-lingual JATS is still too new to know how this will resolve, and this version of JATS provides no guidance.
Changes to JATS to Enable Multi-lingual Documents
Surprisingly few changes to JATS were needed to allow for multi-language articles. If you publish mono-lingual articles, you can ignore the new attributes and the new element repeatability, and your processing should continue unchanged.
The minor, backwards-compatible changes include:
  • Title Groups — All elements that wrap related titles and subtitles are now allowed to repeat, typically once per language, varying only by @xml:lang. This includes <title-group>, <journal-title-group>, and <issue-title-group>. Best practice is to place the @xml:lang at the outer title group level, except in the very rare cases in which a title and the corresponding subtitle are in different languages.
  • Trans-xxx Elements — All of the elements named “trans-xxx” are now deprecated, because titles, abstracts, etc. will now just repeat and use @xml:lang to record the language. This means there will be no need to call a second title, abstract, etc. a translation in order to tag a second language; each title might be an original.
  • Face Markup — The attribute @xml:lang is now allowed on all of the face markup elements (<bold>, <underline>, etc.), on <sub> and <sup>, and on the ruby elements (<rb>, <rt>).
  • Permissions — The attribute @xml:lang and the multi-lingual attributes were added to both <permissions> and the permissions content elements: <copyright-statement>, <copyright-holder>, <license>, and <license-p>.
    [Note: Concern has been expressed that, by making all these permissions objects repeatable for languages, users will be able to repeat them for other reasons, and implementations that count on there being a single object will be impaired. The JATS Standing Committee worried that these repetitions may be abused. For best practice, the permissions elements should repeat only when the language information differs. JATS recommends that an application that requires only one instance of a permissions element control its input with Schematron or other content-based checking.]
  • Repeating Metadata Elements — Nearly all elements inside <article-meta> (such as <title-group>, <abstract>, <kwd-group>, etc.) are repeatable and so can handle multiple languages by using an @xml:lang attribute on each of the repeating elements. Metadata elements that did not repeat in JATS 1.3 were made repeatable, including <author-notes>, <permissions>, and <supplement>.
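The permissions note above recommends content-based checking for applications that expect a single instance. Schematron would be the more usual tool. As a rough Python stand-in, the sketch below reports any @xml:lang value that is shared by two or more repetitions of an element, using None for repetitions with no @xml:lang. Comparing only @xml:lang, and not @lang-variant, is a simplifying assumption. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def duplicate_languages(parent, tag):
    """xml:lang values shared by two or more `tag` children of `parent`,
    in first-seen order. None stands for a child with no xml:lang."""
    counts = Counter(e.get(XML_LANG) for e in parent.findall(tag))
    return [lang for lang, n in counts.items() if n > 1]

by_language = ET.fromstring('<article-meta><permissions xml:lang="en"/>'
                            '<permissions xml:lang="fr"/></article-meta>')
repeated = ET.fromstring('<article-meta><permissions xml:lang="en"/>'
                         '<permissions xml:lang="en"/></article-meta>')

print(duplicate_languages(by_language, "permissions"))  # []
print(duplicate_languages(repeated, "permissions"))     # ['en']
```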

Retiring the “Trans-xxx” Elements

The elements named “trans-xxx” (translations) were originally created so that JATS could record article titles, abstracts, sources, etc. in more than one language: one in the original language (for example, <article-title>) and one a translation of that original (for example, <trans-title>). This technique had several disadvantages, including the proliferation of “trans-wrapper” elements (to hold translated components together) and the need to name one language as the original and the other as a translation, even if both languages were original languages.
Starting with JATS release 1.4d1, all elements whose names start with “trans” are deprecated. Metadata titles, abstracts, etc. can repeat, with each repetition using the attribute @xml:lang to name its language and the attribute @lang-variant to say whether these titles (for example) are both originals, or an original and a translation. There is no need to call a second title, abstract, etc. a translation in order to tag a second language.
The following table lists the translation elements and describes their replacements.
What to Tag Instead of the Deprecated “trans-” Elements:
Trans Element Replacements
<trans-abstract> Repeat <abstract>
<trans-title> In citations and related articles, repeat <article-title> or <part-title>
<trans-title-group> For <title-group>, repeat the <title-group>
<trans-subtitle> In citations and related articles, the element <trans-subtitle> has never been allowed. The subtitle should be merged with the appropriate title element.
<trans-source> Repeat <source>
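As an example of the first replacement in the table, here is a minimal migration sketch that renames each <trans-abstract> to <abstract> and, unless the element already has one, adds lang-variant="translation". That default is an assumption of this sketch, since the new mechanism also allows both versions to be originals, so migrated files deserve review. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def migrate_trans_abstracts(meta):
    """Rename <trans-abstract> elements to <abstract> and, unless already set,
    mark them lang-variant="translation". Assumption: each old trans-abstract
    really was a translation; review cases where both versions are originals."""
    for elem in list(meta.iter("trans-abstract")):
        elem.tag = "abstract"
        if elem.get("lang-variant") is None:
            elem.set("lang-variant", "translation")
    return meta

meta = ET.fromstring('<article-meta>'
                     '<abstract xml:lang="es"><p>Resumen</p></abstract>'
                     '<trans-abstract xml:lang="en"><p>Abstract</p></trans-abstract>'
                     '</article-meta>')
migrate_trans_abstracts(meta)
print([(e.tag, e.get(XML_LANG), e.get("lang-variant")) for e in meta])
```

Renaming in place keeps the element's position, @xml:lang, and content, so the result is simply two repeated <abstract> elements that differ in language.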