Multiple Languages/scripts
The JATS community is becoming increasingly aware of the need to support multi-language
journal articles. Multi-lingual content is much more than block quotes in a second
language. Uses include more than one original language, an original language and one
or more translations, transcriptions, and more.
This section describes the JATS multi-lingual mechanism, a largely attribute-based
encoding solution designed to handle documents that are written in more than one
language, where the JATS user wishes to record the relationship between the language
variants. The JATS mechanism covers:
- An entire document in 2 or more languages
- Substantial portions of content in 2 or more languages
- Article metadata in 2 or more languages
- Selected block structures such as figures and tables in 2 or more languages
Rationale for JATS Multi-lingual Mechanism
When substantial portions of document content are in more than one language, the same
structures are often repeated, once in each language. These equivalent structures
(the same content, differing only by language) need not be co-located in the document,
and the content in a single language need not be contiguous. As an example, alternate
sections could repeat the same content in French and English. As another, a paragraph
could be repeated, first in Greek, then in Romanian, and then in Italian. The same
figure or table could be presented in Spanish, English, and Portuguese.
Therefore, in a multi-lingual document, when there are two or more same-content
objects (sections, figures, boxed texts, tables, etc.) differing only in language,
few assumptions can be made about the locations of these “same-content” objects or
the interrelationships between them. Portions in a single language need not be
contiguous, and objects with the same content need not be located anywhere near each
other in the document. For these reasons, the JATS multi-lingual mechanism cannot
simply enclose (wrap up) all the same-content objects. For true multi-lingualism, a
wrapper-style mechanism is not sufficient.
The JATS Standing Committee also hoped that the JATS multi-lingual mechanism would
not be bulky or intrusive. Ideally, any mechanism should enable users to create
true multi-lingual documents without requiring changes to the tagging of mono-lingual
documents, and should be completely ignorable by creators and users of mono-lingual
documents.
Attribute @lang-group (JATS Multi-lingual Mechanism)
In the JATS multi-lingual mechanism, same-content structures in different languages
can be flagged as belonging to the same “language group”. The phrase “language group”
is not a term of art; we use it to mean the collection of objects that are the same
in content and vary only in language. JATS calls alternate language versions of the
same content “variants”, and they are collected into a “language group” by the values
of the @lang-group attribute. The members of a language group need not be contiguous. That is, they
may appear next to each other, but they may also appear in different places within
a document.
How @lang-group builds language groups:
- The value of the @lang-group attribute is an IDREF, and the attribute ties the members of a language group together. To support processing, the value of @lang-group must be the same for all members of a language group. The variant content objects in a language group are bound together only by this IDREF.
- The value of @lang-group must be the @id attribute value of one of the variant objects in the group. (It does not matter which, and there is no significance to the selection of which @id is used.)
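The two rules above lend themselves to automated checking. The following is a minimal Schematron sketch, not part of JATS itself: the pattern id and the message wording are illustrative assumptions, and the JATS elements are assumed to be in no namespace. Because @lang-group is an IDREF, a validating parser would already confirm that the value matches some @id in the document; the sketch adds the check that the matching element belongs to the same language group.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <!-- Illustrative pattern; the id and messages are not defined by JATS -->
  <pattern id="lang-group-integrity">
    <!-- Fires on every element that claims membership in a language group -->
    <rule context="*[@lang-group]">
      <let name="group" value="string(@lang-group)"/>
      <!-- The group value must be the @id of one member of that same group -->
      <assert test="//*[@id = $group and @lang-group = $group]">
        The lang-group value "<value-of select="$group"/>" should be the id of
        one of the members of the language group.
      </assert>
      <!-- A group with only one member ties nothing together -->
      <report test="count(//*[@lang-group = $group]) = 1">
        Only one element uses lang-group "<value-of select="$group"/>".
      </report>
    </rule>
  </pattern>
</schema>
```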
Desirable functionality for language groups in an online environment might include:
allowing the user to choose whether to see a particular language version or all the
variants, and allowing a user to find an article in a search using a filter specifying
the language(s). For example, in an article in both English and Spanish, a user could
opt to see only the Spanish, only the English, or both. This would be a function of
the display application supported by the JATS markup.
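One way a display application might offer a single-language view is sketched below as an XSLT 1.0 stylesheet. This is an illustration only, not something JATS prescribes: the parameter name show-lang is hypothetical, and the elements are assumed to be in no namespace. Members of a language group are kept only when their language matches the requested one; everything else is copied unchanged.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical parameter: the language the reader has chosen -->
  <xsl:param name="show-lang" select="'es'"/>

  <!-- Identity template: copy everything that is not filtered below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Language-group members: keep only the variant in the chosen language.
       lang() honors inherited xml:lang values and subtags such as "es-MX". -->
  <xsl:template match="*[@lang-group]">
    <xsl:if test="lang($show-lang)">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

A view showing all variants would simply omit the second template. A fuller application would probably also consult @lang-focus when deciding what to display.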
Alternative Spanish and English Variants of a Figure
This example shows two same-content figures differing only in language: one figure
is the original (in Spanish) and one figure is a translation (in English). The figure
element has been repeated, placed into an attribute-named language group using the
@lang-group attribute. One variant has been marked as the original and the other as a translation
using the @lang-variant attribute. One variant has been marked as primary and the other as secondary using
the @lang-focus attribute. The figures need not be co-located in the article, and may be presented
in any order.
...<front> ...</front>
<body>...
  <fig id="f0001" lang-group="f0001" position="float" fig-type="scatter-graph"
       xml:lang="en" lang-variant="translation" lang-source="translator"
       lang-focus="secondary">
    <caption><p>Evolution of the repetition rate in all sequences</p></caption>
    <graphic xlink:href="RIYA_A_1889289_F0001_OC.jpg" content-type="color"
             specific-use="web-only"/>
  </fig>
  <fig id="f0005" lang-group="f0001" position="float" fig-type="scatter-graph"
       xml:lang="es" lang-variant="original" lang-source="author"
       lang-focus="primary">
    <caption><p>Evolución de la tasa de repetición en todas las secuencias</p></caption>
    <graphic xlink:href="RIYA_A_1889289_F0005_OC.jpg" content-type="color"
             specific-use="web-only"/>
  </fig>...
</body>
</article>
Alternative Spanish and English Variants of a Table
This example shows two same-content tables differing only in language: one table is
the original (in Spanish) and one table is a translation (in English). The table element
has been repeated, placed into an attribute-named language group using the @lang-group attribute. One variant has been marked as the original and the other as a translation
using the @lang-variant attribute. One variant has been marked as primary and the other as secondary using
the @lang-focus attribute. The tables need not be co-located in the article, and may be presented
in any order.
...
<table-wrap id="t0001" lang-group="t0001" position="float" orientation="portrait"
            xml:lang="es" lang-variant="original" lang-source="author"
            lang-focus="primary">
  <caption><p>Estadísticos descriptivos por edad</p></caption>
  <table>...</table>
</table-wrap>
<table-wrap id="t0006" lang-group="t0001" position="float" orientation="portrait"
            xml:lang="en" lang-variant="translation" lang-source="translator"
            lang-focus="secondary">
  <caption><p>Descriptive statistics by age</p></caption>
  <table>...</table>
</table-wrap>
...
Alternative Spanish and English Variants of a Section
This example shows two same-content sections differing only in language: one section
is the original (in Spanish) and one section is a translation (in English). The section
element has been repeated, placed into an attribute-named language group using the
@lang-group attribute. One variant has been marked as the original and the other as a translation
using the @lang-variant attribute. One variant has been marked as primary and the other as secondary using
the @lang-focus attribute. The sections need not be co-located in the article, and may be presented
in any order.
...
<sec id="s0005" xml:lang="en" lang-group="s0005" lang-variant="translation"
     lang-source="translator" lang-focus="secondary">
  <title>Method</title>
  ...
</sec>
<sec id="s0011">...</sec>
<sec id="s0013" xml:lang="es" lang-group="s0005" lang-variant="original"
     lang-source="author" lang-focus="primary">
  <title>Método</title>
  ...
</sec>
...
Overview of Multi-lingual Attributes/Elements
The JATS Multi-lingual mechanism was created by adding multi-lingual attributes, creating
a new multi-language metadata element, and changing existing JATS element models,
usually to make structures repeatable to allow for different language variants.
JATS Multi-lingual Mechanism Attributes/Element
Attribute | Meaning and Typical Values |
---|---|
@lang-grouping | Placed on the <processing-meta> element as a flag to indicate that this document uses the @lang-group attribute and associated multi-language attributes to group and describe multiple language content. |
@lang-group | Placed on two or more elements to indicate that these elements are part of the same language group (that is, they represent the same content in different languages). The heart of the JATS multi-lingual mechanism. |
@lang-focus | How members of a language group are related to each other, for example, one of the languages might be the primary language and the other language(s) secondary. (Note: If the values provided in the @lang-focus value list do not include a needed value, select “custom” for the @lang-focus value (“@lang-focus='custom'”) and put the desired value into the new attribute @lang-focus-custom.) |
@lang-source | What was the role of the person/entity who created this language variant? For example: author, translator, a machine translation. (Note: If the values provided in the @lang-source value list do not include a needed value, select “custom” for the @lang-source value (“@lang-source='custom'”) and put the desired value into the new attribute @lang-source-custom.) |
@lang-variant | Names the type of language variant for a member of a language group, for example, a translation or an original. (Note: If the values provided in the @lang-variant value list do not include a needed value, select “custom” for the @lang-variant value (“@lang-variant='custom'”) and put the desired value into the new attribute @lang-variant-custom.) |
@lang-translate | Should the content of this element be translated? Possibilities are yes or no. |
@xml:lang | The language of the content of the element for which this is an attribute. The value of this attribute must conform to IETF RFC 5646. |
NOT @language | Not part of the multi-lingual mechanism. The @language attribute names a programming or scripting language in which code is written, e.g., “javascript”. (In contrast to the @xml:lang attribute, which names the language of an element.) |
NOT @language-version | Not part of the multi-lingual mechanism. The attribute @language-version names the version of the programming or scripting language in which code is written, e.g. “3.0”, for code written in “JavaScript 3.0”. |
NOT @hreflang | Not part of the multi-lingual mechanism. The attribute @hreflang names the language of the target to which an external link is pointing. A processor following the link would expect to find a document in the language named. |
JATS Multi-lingual Element | |
<content-language> | Identifies one language used in this document, by containing an ISO 639 code. The element should be repeated for every primary language in the document. |
Language Grouping Attribute (@lang-grouping) Flags Use of the Mechanism
As part of the processing metadata (<processing-meta>), the Language Grouping Flag attribute (@lang-grouping) is a flag to indicate that this document uses the @lang-group
attribute and associated multi-language attributes to group and describe multiple
language content.
The Language Grouping Flag attribute has two possible values:
Value | Meaning |
---|---|
yes | This document is in more than one language and uses the @lang-group attribute mechanism to associate related material in different languages. |
no | This document does not mark multi-lingual content using the @lang-group attribute. |
An article flagged with @lang-grouping, in which @lang-group ties two figures together as the same
logical figure, the original in Spanish and an English translation:
<article dtd-version="1.4" xml:lang="en">
  <processing-meta lang-grouping="yes"/>
  <front>
    <article-meta>...
      <content-language>es</content-language>
      <content-language>en</content-language>...
    </article-meta>
  </front>
  <body> ...
    <fig id="f0001" lang-group="f0001" xml:lang="es" lang-variant="original"
         lang-source="author" lang-focus="primary" position="float"
         fig-type="scatter-graph">
      <caption><p>Evolución de la tasa de repetición en todas las secuencias</p></caption>
      ...
    </fig>
    <fig id="f0002" lang-group="f0001" xml:lang="en" lang-variant="translation"
         lang-source="translator" lang-focus="secondary" position="float"
         fig-type="scatter-graph">
      <caption><p>Evolution of the repetition rate in all sequences</p></caption>
      ...
    </fig>
    ...
  </body>
</article>
Language Variant Attribute (@lang-variant)
The Language Variant attribute (@lang-variant)
names the relationship between the alternate language variants of the same content.
For example, a section in French could be the “original” language variant and the
same section translated into English could be a “translation” variant.
Multi-lingual documents can be complex. A language group may, for example, contain two
“original” sections with the same content, an original and a translation, or an original
and an interpretation. For example, in Canada (and elsewhere) both the English and
the French same-content constructs could be “original” material from the author.
Other possible variant relationships include “transliteration” and “transcription”.
Using @lang-variant
While it is always possible to create a language grouping (using @lang-group) to record alternative languages, some JATS elements, particularly in the metadata,
are repeatable. The @lang-variant attribute can be used on such elements even if no language groupings are created.
Since repeatable metadata elements already take @xml:lang to name their content language, providing multiple language versions merely means
repeating the element and adding the @lang-variant and possibly the @lang-source attributes. So two <title-group> elements, or two <abstract>s, or two of any repeatable metadata element, could vary only in language.
For example, here are Spanish and English alternative language versions of the article
abstract inside <article-meta>:
<article-meta> ...
  <abstract xml:lang="es" lang-variant="original" lang-source="author">
    <title>RESUMEN</title>
    <p>La repetición verbal espontánea forma parte de la interacción temprana adulto–niño, estando enmarcadas en el seno de conversaciones. ...</p>
  </abstract>
  <abstract xml:lang="en" lang-variant="translation" lang-source="translator">
    <title>ABSTRACT</title>
    <p>Spontaneous verbal repetition is part of early adult–child conversational interchanges. ...</p>
  </abstract>
  ...
</article-meta>
@lang-variant Value List
In the table below, note that although this is a restricted value list, both “custom” (which requires @lang-variant-custom to provide a value) and “unknown” are included among the values.
Value | Meaning |
---|---|
original | A passage in its original language (from the author) |
translation | A translation of a passage into another language |
interpretation | A rewording of a passage into another language |
transcription | A representation of spoken language in a written form |
transliteration | A mapping from one system of writing into another |
phonetic | A representation of speech sounds using phonetic symbols |
spoken | A passage spoken aloud |
custom | Any variant not on this list or any combination of the above values |
unknown | The language relationship is unknown. This is not the same as omitting the @lang-variant attribute; it is an explicit statement that “we do not know”. |
@lang-variant-custom
JATS contains many attribute value lists that include the value “custom”, paired with the attribute @custom-type, as an escape hatch that allows any attribute value to be given. The same mechanism
applies here: “@lang-variant='custom'” together with the value of the @lang-variant-custom attribute can provide any value not on the list above for @lang-variant.
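As a sketch of this escape hatch, the abstract below records a variant type that is not on the list. The value “back-translation” is a hypothetical example, not a value defined by JATS, and the abstract text is elided with a comment.

```xml
<!-- "back-translation" is an illustrative custom value, not defined by JATS -->
<abstract xml:lang="es" lang-variant="custom"
          lang-variant-custom="back-translation" lang-source="translator">
  <title>RESUMEN</title>
  <p><!-- text of the abstract --></p>
</abstract>
```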
Language Source Attribute (@lang-source)
The Language Variant Source (@lang-source) attribute names the source (origin) of a language variant. The intended meaning
for “source” is “who created this variant?” Possible answers include: the author,
a translator, or a machine translation by computer algorithm.
Here are two abstracts with different language sources:
...
<abstract xml:lang="es" lang-variant="original" lang-source="author">
  <title>RESUMEN</title>
  <p>...</p>
</abstract>
<abstract xml:lang="en" lang-variant="translation" lang-source="translator">
  <title>ABSTRACT</title>
  <p>...</p>
</abstract>
...
@lang-source Value List
Value | Meaning |
---|---|
author | An author or authors created this variant |
editor | An editor or editors created this variant |
translator | A human translator or translators created this variant |
machine | Computer processing created this language variant, for example, a machine translation |
custom | The creator of this variant is not any of the listed values. The @lang-source-custom attribute should be used to specify the relationship. |
@lang-source-custom
JATS contains many attribute value lists that include the value “custom”, paired with the attribute @custom-type, as an escape hatch that allows any attribute value to be given. The same mechanism
applies here: “@lang-source='custom'” together with the value of the @lang-source-custom attribute can provide any value not on the list for @lang-source.
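For illustration, the abstract below was produced by machine translation and then revised by a person, a source that none of the listed values describes on its own. The value “machine-post-edited” is a hypothetical example, not a value defined by JATS.

```xml
<!-- "machine-post-edited" is an illustrative custom value, not defined by JATS -->
<abstract xml:lang="en" lang-variant="translation"
          lang-source="custom" lang-source-custom="machine-post-edited">
  <title>ABSTRACT</title>
  <p><!-- text of the abstract --></p>
</abstract>
```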
Language Focus Attribute (@lang-focus)
The @lang-focus attribute indicates how members of a language group are related to each other, which
may be used as a hint for how multiple language variants might be displayed. The manner
in which the members of a language group are related is application specific, perhaps
driven by reader preference. However, the @lang-focus attribute can be used to provide hints in markup about the author’s intention for
how the members of a language group should be considered and displayed.
...
<p id="para011" lang-group="para011" xml:lang="la" lang-variant="original"
   lang-focus="primary" lang-translate="no">Si hortum in bibliotheca habes, nihil deerit.</p>
<p id="para011-b" lang-group="para011" xml:lang="en" lang-variant="interpretation"
   lang-focus="secondary">Literally "if you have a garden in a library, nothing will be lacking", usually paraphrased as: "If you have a garden and a library, you have everything you need."</p>
...
@lang-focus Value List
Value | Meaning |
---|---|
primary | The text has a more central focus than other language variants in the group. In display, such text is typically made more prominent or is the only focus displayed. |
secondary | The text is not the primary textual focus in the language group. In display, such text is typically less prominent. |
custom | The language relationship is not any of the specific listed values. The @lang-focus-custom attribute should be used to specify the relationship. |
undefined | No recommendation is made concerning the relative focus of the language variants. No display or relevance relationship is indicated, thus all variants are intended to be displayed in the same way. In display, all alternative language variants are typically displayed in document order. (Note: This is the default behavior (not an XML-grammatically-mandated default), which may be assumed if the @lang-focus attribute is not provided with a specific listed value.) |
@lang-focus-custom Attribute
JATS contains many attribute value lists that include the value “custom”, paired with the attribute @custom-type, as an escape hatch that allows any attribute value to be given. The same mechanism
applies here: “@lang-focus='custom'” together with the value of the @lang-focus-custom attribute can provide any value not on the list above for @lang-focus.
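A sketch of the custom value on a pair of paragraphs follows. The value “parallel” (suggesting side-by-side display) and the id values are hypothetical, not defined by JATS; whatever meaning a custom value carries would have to be agreed with the processing application.

```xml
<!-- "parallel" is an illustrative custom value, not defined by JATS -->
<p id="para020" lang-group="para020" xml:lang="fr" lang-variant="original"
   lang-focus="custom" lang-focus-custom="parallel"><!-- French text --></p>
<p id="para020-b" lang-group="para020" xml:lang="en" lang-variant="original"
   lang-focus="custom" lang-focus-custom="parallel"><!-- English text --></p>
```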
Should it Translate Attribute (@lang-translate)
The @lang-translate attribute
specifies whether an element’s content is to be translated, for example, to exempt
a paragraph from automatic translation or to leave some content unchanged when the
content is localized. Text that should not be translated (for example, automatically
by Google Translate) can be tagged with “@lang-translate='no'”. A value of “@lang-translate='yes'”, or the absence of the attribute, indicates that it would be appropriate to translate
the content.
<p>...As Horace wrote <named-content lang-translate="no" content-type="foreign-phrase">Carpe Diem</named-content> ...</p>
Element <content-language> Records Primary Languages
The <content-language> is a repeatable article-metadata element, adopted from NISO STS, that names the primary
languages present in the article. JATS is agnostic on the meaning of “primary” and
also on whether “primary language” describes the narrative content of the article,
the article metadata, or both.
Unlike any other element in JATS, the content of <content-language> is prescribed, although not in a way JATS can enforce. For best practice, the entire
content of <content-language> should be one language code (with possible subtags). The element <content-language> should repeat, once for each primary language.
<content-language>fr</content-language> <content-language>en</content-language>
The use of @xml:lang with a language value on a top-level element (such as “fr” for French) implies a mono-lingual document. A multi-lingual document should use
the @xml:lang value “mul” or not use @xml:lang on the top article level. No purpose is served by the element <content-language> in a mono-lingual document.
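Put together, the top of a bilingual French and English article might look like the following sketch. It assumes the same placement of <processing-meta> and <content-language> used in the @lang-grouping example earlier in this section; elided content is marked with comments.

```xml
<!-- xml:lang="mul" signals multiple languages at the top level -->
<article dtd-version="1.4" xml:lang="mul">
  <processing-meta lang-grouping="yes"/>
  <front>
    <article-meta>
      <!-- other article metadata -->
      <content-language>fr</content-language>
      <content-language>en</content-language>
      <!-- other article metadata -->
    </article-meta>
  </front>
  <!-- body and back matter -->
</article>
```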
Citations in Multiple Languages
Reference counting and tracking are critical in the current journal ecosystem. Great
care must be taken to ensure that a citation coded in multiple languages is counted
as a single citation and referenced as one from within the document. Tagging the same
citation in multiple languages requires more than just the multi-language mechanism;
it also requires using the <citation-alternatives> element. The element <citation-alternatives> is used to “hold alternative versions of a single citation, for example, the same
citation in multiple languages”.
A bibliographic reference (<ref>) may contain more than one citation (<mixed-citation> or <element-citation>); this is not an uncommon publishing practice. So two citations inside a reference
may indicate two separate items cited. When the same citation is provided in multiple
languages, the equivalent language-varying citations should all be inside one <citation-alternatives> element.
How to Tag the Same Citation in Multiple Languages
- Place a <citation-alternatives> element inside a bibliographic reference (<ref>), to hold two or more equivalent language-variant citations.
- Each single-language citation should be tagged with a citation element (<mixed-citation> or <element-citation>). The citations need not all be tagged using the same type of citation element.
- Each single-language citation should use the @xml:lang attribute to name the language of the citation.
Note that an @xml:lang attribute on a citation element identifies the language of the citation itself, not that of the cited work.
- If the article uses the multi-language functionality, it should be used on the citations as well, to provide guidance for customized user display. For example, each single-language citation may use the multi-language attribute @lang-variant to flag the citation as an “original” or a “translation”.
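The steps above can be sketched as follows. The reference id and all citation text are placeholders invented for illustration; they do not describe a real publication.

```xml
<!-- One <ref>, one cited work, two language variants of the citation -->
<ref id="r012">
  <citation-alternatives>
    <mixed-citation xml:lang="es" lang-variant="original">Apellido N.
      <article-title>Título del artículo</article-title>.
      <source>Nombre de la revista</source>.</mixed-citation>
    <mixed-citation xml:lang="en" lang-variant="translation">Apellido N.
      <article-title>Title of the article</article-title>.
      <source>Name of the journal</source>.</mixed-citation>
  </citation-alternatives>
</ref>
```

Because both citations sit inside a single <citation-alternatives>, a reference counter can treat them as one cited item rather than two.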
Multiple Languages within a Single Citation
It is also possible to tag multiple language elements within a citation (<mixed-citation> or <element-citation>). In this practice the <article-title> and other citation components would be present once for each language, with the
attribute @xml:lang used to name the language and the attribute @lang-group used to associate the equivalent components, marking them as a single real-world
object. (For details, see Citations in Multiple Languages.)
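A sketch of this practice follows. It assumes that @id and the multi-lingual attributes are available on <article-title> inside a citation; the id values and citation text are placeholders invented for illustration.

```xml
<!-- One citation; the article title appears once per language -->
<mixed-citation>Apellido N.
  <article-title id="at012" lang-group="at012" xml:lang="es"
                 lang-variant="original">Título del artículo</article-title>
  [<article-title id="at012-b" lang-group="at012" xml:lang="en"
                  lang-variant="translation">Title of the article</article-title>].
  <source>Nombre de la revista</source>.</mixed-citation>
```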
Changes to JATS to Enable Multi-lingual Documents
There were surprisingly few changes needed to JATS to allow for multi-language articles.
If you publish mono-lingual articles, you can ignore the new attributes and new element
repeatability and your processing should continue unchanged.
The minor, backwards-compatible changes include:
- Title Groups: The element <title-group> is now allowed to repeat, typically once per language, varying only by @xml:lang. Best practice is to place the @xml:lang at the outer title group level, except in the very rare cases in which a title and the corresponding subtitle are in different languages.
- Trans-xxx Elements: All of the elements named “trans-xxx” are now deprecated, because titles, abstracts, etc. now simply repeat and use @xml:lang to record the language. There is no need to call a second title, abstract, etc. a translation in order to tag a second language; each title might be an original.
- Face Markup: The attribute @xml:lang is now allowed on all of the face markup elements (<bold>, <underline>, etc.), on <sub> and <sup>, and inside Ruby (<rb>, <rt>).
- Permissions: The attribute @xml:lang and the multi-lingual attributes were added to both <permissions> and the permissions content elements: <copyright-statement>, <copyright-holder>, <license>, and <license-p>.
[Note: The concern has been expressed that, by making all these copyright objects repeatable for languages, users will be able to repeat them for other reasons, and implementations that count on there being a single object will be impaired. The JATS Standing Committee is concerned that these repetitions may be abused. For best practice, the permissions elements should repeat only when the language information differs. JATS recommends that an application requiring a single instance of a permissions element control its input with Schematron or other content-based checking.]
- Repeating Metadata Elements: Nearly all elements inside <article-meta> (such as <title-group>, <abstract>, <kwd-group>, etc.) are repeatable and so can handle multiple languages by using an @xml:lang attribute on each of the repeating elements. Metadata elements that did not repeat in JATS 1.3 were made repeatable, for example, <supplement>.
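The content-based checking recommended in the permissions note could take a form like the following Schematron sketch. It encodes one plausible policy (repetitions of <copyright-statement> must differ in language), which is an assumption rather than a rule defined by JATS. The pattern id and messages are illustrative, the elements are assumed to be in no namespace, and only @xml:lang values set directly on the statements are compared.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <!-- Illustrative policy check; not defined by JATS -->
  <pattern id="permissions-language-repeats">
    <rule context="permissions/copyright-statement">
      <!-- Two statements should not declare the same language -->
      <report test="@xml:lang = preceding-sibling::copyright-statement/@xml:lang">
        More than one copyright-statement is tagged with xml:lang
        "<value-of select="@xml:lang"/>".
      </report>
      <!-- At most one statement should omit xml:lang -->
      <report test="not(@xml:lang) and preceding-sibling::copyright-statement[not(@xml:lang)]">
        More than one copyright-statement has no xml:lang attribute.
      </report>
    </rule>
  </pattern>
</schema>
```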
Retiring the “Trans-xxx” Elements
The elements named “trans-xxx” (translations) were originally created so that JATS
could record article titles, abstracts, sources, etc. in more than one language:
one in the original language (for example, <article-title>) and one a translation of that original (for example, <trans-title>). This technique had several disadvantages, including the proliferation
of “trans-wrapper” elements (to hold translated components together) and the necessity
to name one language as the original and the other as a translation, even if both
were original languages.
Starting with JATS release 1.4, all elements whose names start with “trans” are deprecated.
Metadata titles, abstracts, etc. can repeat, with each repetition using the attribute @xml:lang to name its language and the attribute @lang-variant to say whether the repetitions (two titles, for example) are both originals or an original and
a translation. There is no need to call a second title, abstract, etc. a translation
in order to tag a second language.
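The difference can be sketched with an article title in Spanish and English. The first fragment uses the deprecated translation elements (here <trans-title-group> and <trans-title>); the second repeats <title-group> and marks both titles as originals. The titles are placeholders.

```xml
<!-- Deprecated style: the second language has to be called a translation -->
<title-group>
  <article-title xml:lang="es">Título del artículo</article-title>
  <trans-title-group xml:lang="en">
    <trans-title>Title of the article</trans-title>
  </trans-title-group>
</title-group>

<!-- JATS 1.4 style: repeat the title group, once per language -->
<title-group xml:lang="es" lang-variant="original">
  <article-title>Título del artículo</article-title>
</title-group>
<title-group xml:lang="en" lang-variant="original">
  <article-title>Title of the article</article-title>
</title-group>
```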
The following table lists the translation elements and describes their replacements.
What to Tag Instead of Deprecated “trans-”:
Trans Element | Replacements |
---|---|
<trans-title> | In citations and related articles, repeat <article-title> or <part-title>. In <title-group>, repeat <article-title>. |
<trans-source> | Repeat <source> |