xml:lang Language

The language of the intellectual content of the element for which this is an attribute.
The value of this attribute must conform to IETF RFC 5646 (http://tools.ietf.org/html/rfc5646). For most uses, a primary-language subtag such as “fr” (French), “en” (English), “de” (German), or “zh” (Chinese) is sufficient. These values are not case sensitive, but current best practice writes them in all lower case. In addition to the primary-language subtag, the value of this attribute may contain other subtags. Values for the various subtags (which can be used only in certain combinations) can be obtained from the IANA Language Subtag Registry: http://www.iana.org/assignments/language-subtag-registry

Usage/Remarks

Element-level Inheritance
In the words of the W3C report “xml:lang in XML document schemas”: “Content … either contained within [a] document directly or considered part of [a] document when it is processed or rendered … should use the @xml:lang attribute to indicate the language of that content.” (https://www.w3.org/International/questions/qa-when-xmllang)
According to W3C specifications, the @xml:lang attribute, when placed on an element, applies by inheritance to:
  • all of an element’s textual content,
  • all of the element’s children and descendants (even the multimedia ones),
  • all of the element’s attributes.
Thus the @xml:lang attribute is scoped: all the elements inside the element that contains the @xml:lang attribute (as well as their attributes) inherit the language value named. Any descendant element can name its own @xml:lang value, thus overriding the inheritance. Therefore it is good practice to name the primary language of the document at the top level, and override it when needed for inclusions in other languages.
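The scoping rule can be illustrated with a small fragment. This is a minimal sketch, not a sample from the Tag Library: the section title and the paragraph text are invented for illustration. The value on <book> is inherited by everything inside it, and the <disp-quote> overrides that value for its own content only.

```xml
<book xml:lang="en">
 ...
 <sec>
  <title>Dedication</title>
  <!-- inherits "en" from <book> -->
  <p>The volume opens with a short dedication.</p>
  <!-- overrides the inherited value for itself and its descendants -->
  <disp-quote xml:lang="fr">
   <p>À tous ceux qui liront ces lignes.</p>
  </disp-quote>
  <!-- "en" again: the override ended with </disp-quote> -->
  <p>...</p>
 </sec>
 ...
</book>
```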
Related Attributes
The @xml:lang attribute names the language used for the content of an element in the current document. The @hreflang attribute suggests the language of the target (external document) to which a link is pointing.
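The difference between the two attributes can be sketched as follows. This fragment assumes that @hreflang is permitted on <ext-link> in the version of the Tag Set being used and that the xlink namespace prefix has been declared on an ancestor element; the URL is a placeholder. The link text itself is English (inherited from the paragraph), while the document at the other end of the link is suggested to be in German.

```xml
<p xml:lang="en">The German edition is available
 <ext-link ext-link-type="uri" hreflang="de"
  xlink:href="https://example.org/de/ausgabe">as a web page</ext-link>.</p>
```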
Related element: How to Tag the Language
In BITS, there are two ways to describe the natural language of the content of a document:
  • XML Lang Language Attribute: The @xml:lang attribute can be put on many elements, to indicate the language of that element and its descendants. This is an inherited value, so that element and all of its children will be in the named language, unless specifically overridden with another @xml:lang attribute. Thus a language code on the top level element of a document (in the case of BITS, <book> or <book-part-wrapper>) names the only primary language in a mono-lingual document or can be the value "mul" to indicate that the document has multiple primary languages.
  • Content Language Element: The <content-language> element, in the metadata of a book or book-part, identifies the primary language(s) used in the document. The element appears once for each primary language used in the document. For Best Practice, the <content-language> content should be the two-letter ISO 639 code for the language, for example, “en” for English, “de” for German, or “es” for Spanish.

Best Practice @xml:lang and <content-language>

For multi-lingual documents, the @xml:lang attribute may be omitted from the top document-level element, or the document may use the three-letter ISO 639-2 code “mul” (xml:lang="mul"), indicating that multiple primary languages are used.
This tag set is agnostic on how “primary” is defined, leaving that decision to each producer. However, the intent of this element is to record the principal languages used in a multi-lingual document, not to state that a few quotations in another language occur in an essentially mono-lingual document.
In addition:
  • There is no value to using <content-language> on a mono-lingual document.
  • The use of a language code value for @xml:lang on a top-level element strongly implies a mono-lingual document.
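For a multi-lingual book, the two mechanisms might be combined along the following lines. This is a sketch under stated assumptions: the choice of English and French is hypothetical, and the other contents of <book-meta> (and the position of <content-language> among them) are elided rather than specified here.

```xml
<book xml:lang="mul">
 <book-meta>
  ...
  <!-- one <content-language> per primary language -->
  <content-language>en</content-language>
  <content-language>fr</content-language>
  ...
 </book-meta>
 <book-body>
  <!-- each part names its own language, overriding "mul" -->
  <book-part xml:lang="en">...</book-part>
  <book-part xml:lang="fr">...</book-part>
 </book-body>
</book>
```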
Script and Language
In some languages, script codes are also critically important; for example, in Japanese, sort keys depend on whether a name is written in Kanji or in Kana (Hiragana or Katakana). Best practice is to use the full language code plus script code as the value for @xml:lang. In using both language and script subtags as values for @xml:lang, JATS follows the IETF (Internet Engineering Task Force) best practice guideline Network Working Group Request for Comments: 5646 [Tags for Identifying Languages, A. Phillips and M. Davis, Editors, September 2009]. That document defines a language tag as composed of (in part):
  1. A language code (Language, typically the shortest available ISO 639 code)
  2. Potentially followed by a hyphen and a script code (Script, using the ISO 15924 code)
  3. Potentially followed by a hyphen and a region code (Region, using the ISO 3166-1 code or the UN M.49 code)
Some sample values of @xml:lang for Chinese and Serbian illustrate this complexity:
  • Language subtag plus Script subtag: xml:lang="zh-Hant" (Chinese written using the Traditional Chinese script)
  • Language subtag plus Script subtag: xml:lang="zh-Hans" (Chinese written using the Simplified Chinese script)
  • Language-Script-Region: xml:lang="zh-Hans-CN" (Chinese written using the Simplified script as used in mainland China)
  • Language-Script-Region: xml:lang="sr-Latn-RS" (Serbian written using the Latin script as used in Serbia)
Thus, for example, the following are among the expected values of @xml:lang for Japanese, incorporating both a language (“ja”) and a script type:
  • xml:lang="ja-Hira" (Japanese written in Hiragana)
  • xml:lang="ja-Hrkt" (Japanese written in Hiragana + Katakana)
  • xml:lang="ja-Jpan" (Japanese written in Han + Hiragana + Katakana)
  • xml:lang="ja-Hani" (Japanese written in Kanji (Hanzi, Hanja, Han))
  • xml:lang="ja-Kana" (Japanese written in Katakana)
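As a sketch of how script subtags might be applied to a personal name, the fragment below records the same name twice, once in Kanji and once in Katakana. The name and its spellings are illustrative only, and wrapping the two forms in <name-alternatives> is an assumption about how a producer might group them, not a requirement stated in this section.

```xml
<name-alternatives>
 <!-- the name written in Kanji -->
 <name name-style="eastern" xml:lang="ja-Hani">
  <surname>園田</surname>
  <given-names>直子</given-names>
 </name>
 <!-- the Katakana reading, which could serve as a sort key -->
 <name name-style="eastern" xml:lang="ja-Kana">
  <surname>ソノダ</surname>
  <given-names>ナオコ</given-names>
 </name>
</name-alternatives>
```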
OPTIONAL (defaults to en) on element: <book>
Value | Meaning
An alphanumeric string, which may include hyphens | An abbreviation for a natural language (such as “en” for English or “de” for German) or for a language and a script (“ja-Kana”).
Default value | en
OPTIONAL on many elements:

<abbrev>, <abstract>, <ack>, <addr-line>, <address>, <aff>, <alt-text>, <alt-title>, <annotation>, <anonymous>, <answer>, <answer-set>, <app>, <app-group>, <array>, <article-title>, <attrib>, <author-comment>, <award-desc>, <award-group>, <award-id>, <award-name>, <bio>, <book-app>, <book-app-group>, <book-id>, <book-part>, <book-part-id>, <book-part-wrapper>, <boxed-text>, <caption>, <chapter-title>, <chem-struct>, <chem-struct-wrap>, <city>, <code>, <collab>, <collection-id>, <collection-meta>, <comment>, <conf-acronym>, <conf-date>, <conf-loc>, <conf-name>, <conf-num>, <conf-sponsor>, <conf-theme>, <conference>, <content-language>, <contrib-id>, <contributed-resource-group>, <copyright-holder>, <copyright-statement>, <corresp>, <country>, <custom-meta>, <custom-meta-group>, <data-title>, <date-in-citation>, <day>, <dedication>, <def>, <def-item>, <def-list>, <degrees>, <disp-formula>, <disp-formula-group>, <disp-quote>, <edition>, <element-citation>, <email>, <era>, <etal>, <event>, <event-desc>, <explanation>, <ext-link>, <fig>, <fig-group>, <fn>, <fn-group>, <foreword>, <fpage>, <front-matter-part>, <funding-group>, <funding-source>, <funding-statement>, <glossary>, <gov>, <graphic>, <index>, <index-div>, <index-entry>, <index-group>, <index-term>, <inline-formula>, <inline-graphic>, <inline-media>, <inline-supplementary-material>, <institution>, <institution-id>, <issue>, <issue-id>, <issue-part>, <issue-title>, <journal-id>, <kwd-group>, <label>, <legend>, <license>, <list>, <list-item>, <long-desc>, <lpage>, <media>, <milestone-end>, <milestone-start>, <mixed-citation>, <month>, <name>, <named-content>, <nav-pointer>, <note>, <notes>, <on-behalf-of>, <open-access>, <option>, <p>, <page-range>, <part-title>, <patent>, <person-group>, <postal-code>, <preface>, <prefix>, <preformat>, <price>, <principal-award-recipient>, <principal-investigator>, <product>, <pub-date>, <publisher-loc>, <publisher-name>, <question>, <question-preamble>, <question-wrap>, 
<question-wrap-group>, <rb>, <ref>, <ref-list>, <related-article>, <related-object>, <resource-group>, <resource-id>, <resource-name>, <role>, <rt>, <season>, <sec>, <see>, <see-also>, <see-also-entry>, <see-entry>, <self-uri>, <series>, <sig>, <size>, <source>, <speaker>, <speech>, <state>, <statement>, <std>, <std-organization>, <string-conf>, <string-date>, <string-name>, <styled-content>, <subj-group>, <subtitle>, <suffix>, <supplement>, <supplementary-material>, <support-description>, <support-group>, <support-source>, <table-wrap>, <table-wrap-group>, <target>, <term>, <textual-form>, <toc>, <toc-div>, <toc-entry>, <toc-group>, <trans-abstract>, <trans-source>, <trans-subtitle>, <trans-title>, <trans-title-group>, <unstructured-kwd-group>, <uri>, <verse-group>, <verse-line>, <version>, <volume>, <volume-id>, <volume-in-collection>, <volume-series>, <x>, <xref>, <year>

Value | Meaning
An alphanumeric string, which may include hyphens | An abbreviation for a natural language (such as “en” for English or “de” for German) or for a language and a script (“ja-Kana”).
Restriction | @xml:lang is an optional attribute; there is no default.
Tagged Samples
The attribute at the top level, describing an entire book
<book
  xmlns:mml="http://www.w3.org/1998/Math/MathML"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  book-type="gov"
  dtd-version="2.1"
  indexed="yes"
  xml:lang="en">
 <book-meta>...</book-meta>
 <book-body>...</book-body>
 <book-back>...</book-back>
</book>
Within a citation, names the language of the book title (German) and a translated title (English)
...  
<ref>
 <element-citation publication-type="book">
  <person-group person-group-type="author">
   <name><surname>Hartmeier</surname>
    <given-names>Winifried</given-names></name>
  </person-group>
  <source xml:lang="de">Immobilisierte Biokatalysatoren</source>
  <trans-source xml:lang="en">Immobilized biocatalysts</trans-source>
  <publisher-loc>Berlin</publisher-loc>
  ...
 </element-citation>
</ref>  
...
Romanized Japanese name referred to as an “English” name
...
<name name-style="western" xml:lang="en">
 <surname>Sonoda</surname>
 <given-names>Naoko</given-names>
</name>
...