xml:lang

Language

The language of the intellectual content of the element for which this is an attribute.

Typical values are the two-letter, lowercase language codes described in IETF RFC 5646, such as “fr” (French), “en” (English), “de” (German), and “zh” (Chinese). These values are NOT case sensitive, but current best practice writes language codes in all lowercase. Values can be obtained from the IANA Language Subtag Registry: http://www.iana.org/assignments/language-subtag-registry.

Remarks

Inheritance: The language value inherits down the tree, so an @xml:lang attribute names the language of the element and all its descendants, unless a descendant sets its own @xml:lang attribute. The default value of English (“en”) is set at the top-level element and can be overridden there or anywhere lower in the document.
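
The inheritance rule can be sketched in Python with the standard library's ElementTree. This is a minimal illustration, not part of the Tag Set: the function name effective_languages, its default parameter, and the abbreviated sample document are all assumptions made for the example. ElementTree does not read DTD defaults, so the top-level default of “en” is passed in by hand.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root, default="en"):
    """Return (tag, language) pairs in document order.

    `default` stands in for the DTD-supplied default on the top-level
    element (an assumption of this sketch; it is not read from a DTD).
    """
    pairs = []

    def walk(element, inherited):
        # An element's own @xml:lang wins; otherwise it inherits.
        language = element.get(XML_LANG, inherited)
        pairs.append((element.tag, language))
        for child in element:
            walk(child, language)

    walk(root, default)
    return pairs


# Hypothetical, heavily abbreviated document (not valid against the DTD).
SAMPLE = """
<article>
  <front>
    <article-title>Rivers of Europe</article-title>
    <trans-title-group xml:lang="fr">
      <trans-title>Fleuves d'Europe</trans-title>
    </trans-title-group>
  </front>
  <body>
    <p>The <named-content xml:lang="de">Rhein</named-content> flows north.</p>
  </body>
</article>
"""

for tag, language in effective_languages(ET.fromstring(SAMPLE.strip())):
    print(f"{tag}: {language}")
```

In this sample, <trans-title> picks up “fr” from its parent <trans-title-group>, <named-content> overrides with “de”, and every other element falls back to the top-level default.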

Script and Language: For some languages, script codes are also critically important. In Japanese, for example, sort keys depend on whether a name is written in Kanji or in Kana (Hiragana or Katakana). Best practice is to use the full language code plus script code as the value of @xml:lang. This use of both language and script tagging in @xml:lang follows the IETF (Internet Engineering Task Force) best practice guideline, Network Working Group Request for Comments: 5646 [Tags for Identifying Languages, A. Phillips and M. Davis, Editors, September 2009]. That document defines a language tag as composed of (in part):

  1. A language code (typically the shortest ISO 639 code)
  2. Potentially followed by a hyphen and a script code (an ISO 15924 code)
  3. Potentially followed by a hyphen and a region code (an ISO 3166-1 code or a UN M.49 code)

Some sample values of @xml:lang for Chinese and Serbian illustrate this complexity:

Thus, for example, the following are among the expected values of @xml:lang for Japanese, incorporating both a language (“ ja ”) and a script type:
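
The language, script, and region structure described above can be sketched as a small Python parser. This is a simplified illustration under stated assumptions: the names TAG_PATTERN and split_language_tag are hypothetical, the pattern accepts only the three subtags discussed here (RFC 5646 also allows extended-language, variant, extension, and private-use subtags), and the sample values other than “ja-Kana” were chosen for the example rather than taken from this page. Case is normalized to the usual RFC 5646 conventions: lowercase language, title-case script, uppercase region.

```python
import re

# Simplified pattern: language[-script][-region]. Anything beyond these
# three subtags is deliberately rejected by this sketch.
TAG_PATTERN = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def split_language_tag(tag):
    """Split a tag into (language, script, region), normalizing case."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    language = match.group("language").lower()
    script = match.group("script")
    region = match.group("region")
    return (
        language,
        script.title() if script else None,
        region.upper() if region else None,
    )


# Matching is case-insensitive, so "SR-CYRL" and "sr-Cyrl" are equivalent.
for value in ["en", "ja-Kana", "zh-Hant-TW", "SR-CYRL"]:
    print(value, "->", split_language_tag(value))
```

A production system would more likely validate subtags against the IANA Language Subtag Registry than rely on shape alone; the pattern here checks only the form of each subtag, not whether it is registered.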

Historical Note: This attribute was significantly remodeled in NISO Version 0.4 of the Tag Set (equivalent to JATS Version 3.1 Draft). In Versions 3.0 and below, @xml:lang defaulted to English (“en”) on several elements below the top level, including the structural elements <response> and <sub-article> as well as the metadata elements <journal-title>, <journal-subtitle>, and <abbrev-journal-title>. In the interests of internationalization, all defaults except that for the top-level <article> have been dropped. While this is not strictly backwards-compatible, it is unlikely to cause many production problems.

Attribute Values

In Element

<article>
Value: An alphanumeric string, which may include hyphens
Meaning: An abbreviation for a natural language (such as “en” for English or “de” for German) or for a language and a script (such as “ja-Kana”)
Default value: en

In Elements

<abbrev>, <abbrev-journal-title>, <abstract>, <ack>, <addr-line>, <address>, <aff>, <alt-text>, <alt-title>, <annotation>, <anonymous>, <app>, <app-group>, <array>, <article-title>, <attrib>, <author-comment>, <award-group>, <award-id>, <bio>, <boxed-text>, <caption>, <chapter-title>, <chem-struct>, <chem-struct-wrap>, <collab>, <comment>, <conf-acronym>, <conf-date>, <conf-loc>, <conf-name>, <conf-num>, <conf-sponsor>, <conf-theme>, <conference>, <copyright-holder>, <copyright-statement>, <corresp>, <country>, <custom-meta>, <date-in-citation>, <day>, <def>, <def-item>, <def-list>, <degrees>, <disp-formula>, <disp-formula-group>, <disp-quote>, <edition>, <element-citation>, <email>, <etal>, <ext-link>, <fig>, <fig-group>, <fn>, <fn-group>, <fpage>, <funding-group>, <funding-source>, <funding-statement>, <glossary>, <gov>, <graphic>, <inline-formula>, <inline-graphic>, <inline-supplementary-material>, <institution>, <issue>, <issue-id>, <issue-part>, <issue-sponsor>, <issue-title>, <journal-id>, <journal-subtitle>, <journal-title>, <kwd-group>, <label>, <license>, <list>, <list-item>, <long-desc>, <lpage>, <media>, <milestone-end>, <milestone-start>, <mixed-citation>, <month>, <name>, <named-content>, <nlm-citation>, <note>, <notes>, <on-behalf-of>, <open-access>, <p>, <page-range>, <part-title>, <patent>, <person-group>, <prefix>, <preformat>, <price>, <principal-award-recipient>, <principal-investigator>, <product>, <publisher-loc>, <publisher-name>, <ref>, <ref-list>, <related-article>, <related-object>, <response>, <role>, <season>, <sec>, <self-uri>, <series>, <series-text>, <series-title>, <sig>, <size>, <source>, <speaker>, <speech>, <statement>, <std>, <string-conf>, <string-date>, <string-name>, <styled-content>, <sub-article>, <subj-group>, <subtitle>, <suffix>, <supplement>, <supplementary-material>, <table-wrap>, <table-wrap-group>, <target>, <term>, <textual-form>, <trans-abstract>, <trans-source>, <trans-subtitle>, <trans-title>, <trans-title-group>, <uri>, <verse-group>, <verse-line>, <volume>, <volume-id>, <volume-series>, <x>, <xref>, <year>
Value: An alphanumeric string, which may include hyphens
Meaning: An abbreviation for a natural language (such as “en” for English or “de” for German) or for a language and a script (such as “ja-Kana”)
Restriction: This attribute may be specified if the element is used.