xml:lang

Language

The language of the intellectual content of the element for which this is an attribute.

Typical values are the two-letter, lower-case language codes described by IETF RFC 5646, such as “fr” (French), “en” (English), “de” (German), and “zh” (Chinese). These values are NOT case sensitive, but current best practice uses all lower case. Values can be obtained from the IANA Language Subtag Registry: http://www.iana.org/assignments/language-subtag-registry.

Remarks

Inheritance: The language value is inherited down the tree, so an @xml:lang attribute names the language of the element and all its descendants, unless a descendant sets its own @xml:lang attribute. The default value of English (“en”) is set at the top-level element, and can be overridden there or anywhere lower in the document.
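
The inheritance rule can be sketched in a few lines of Python using the standard library's ElementTree. This is an illustration only: the function name effective_languages and the SAMPLE fragment are hypothetical, the fragment is a simplified skeleton rather than a valid article, and the default of “en” is passed in by hand because ElementTree does not read default attribute values from a DTD.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified fragment (not a complete or valid article).
SAMPLE = """
<article>
  <body>
    <sec xml:lang="fr"><p>Bonjour</p></sec>
    <sec><p>Hello</p></sec>
  </body>
</article>
"""

def effective_languages(root, default="en"):
    """Map each element to the language it carries or inherits."""
    result = {}

    def walk(element, inherited):
        # An element's own xml:lang wins; otherwise use the ancestor's value.
        lang = element.get(XML_LANG, inherited)
        result[element] = lang
        for child in element:
            walk(child, lang)

    walk(root, default)
    return result

root = ET.fromstring(SAMPLE)
langs = effective_languages(root)
for element in root.iter():
    print(element.tag, langs[element])
```

The first <sec> and its <p> resolve to “fr”, while every other element falls back to the assumed top-level default.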

Script and Language: In some languages, script codes are also critically important; for example, in Japanese, it is necessary to indicate whether a name is in Kanji or in Kana (Hiragana or Katakana) in order to determine sort keys. Best practice is to use the full language-code-plus-script-code as the value for @xml:lang. In using both language and script tagging as values for @xml:lang, we follow the IETF (Internet Engineering Task Force) best practice guideline: Network Working Group Request for Comments: 5646 [Tags for Identifying Languages, A. Phillips and M. Davis, Editors, September 2009]. That document defines a language tag as composed of (in part):

  1. A language code (typically the shortest ISO 639 code)
  2. Potentially followed by a hyphen and then a script code (using the ISO 15924 code)
  3. Potentially followed by a hyphen and a region code (using the ISO 3166-1 code)
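
The three-part structure listed above can be sketched as a small Python parser. This is a simplified illustration, not a full RFC 5646 implementation: the names TAG_RE and split_lang_tag are hypothetical, and extended language subtags, variants, extensions, and private-use subtags are deliberately rejected. The case normalization applied to script and region (title case and upper case) follows the formatting convention recommended in RFC 5646; that choice is an assumption of this sketch.

```python
import re

# Simplified pattern: language[-script][-region]. Other RFC 5646 parts
# (extended language, variants, extensions, private use) are not handled.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)

def split_lang_tag(tag):
    """Split a tag such as 'ja-Kana' into language, script, and region."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    parts = match.groupdict()
    # Tags are not case sensitive, so normalize to the conventional casing.
    parts["language"] = parts["language"].lower()
    if parts["script"] is not None:
        parts["script"] = parts["script"].capitalize()
    if parts["region"] is not None:
        parts["region"] = parts["region"].upper()
    return parts

# Illustrative inputs only; they are not taken from this page's sample tables.
for tag in ("en", "ja-Kana", "zh-Hans-CN", "fr-CA"):
    print(tag, split_lang_tag(tag))
```

A missing script or region is reported as None, which keeps a bare language code such as “en” distinct from a fully qualified tag.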

Some sample values of @xml:lang for Chinese and Serbian illustrate this complexity:

Thus, for example, the following are among the expected values of @xml:lang for Japanese, incorporating both a language (“ja”) and a script type:

Attribute Values

In Element

<article>
Value: An alphanumeric string, which may include hyphens
Meaning: An abbreviation for a natural language (such as “en” for English or “de” for German) or for a language and a script (“ja-Kana”)
Default value: en

In Elements

<abbrev>, <abstract>, <ack>, <addr-line>, <address>, <aff>, <alt-text>, <annotation>, <anonymous>, <app>, <app-group>, <article-title>, <attrib>, <author-comment>, <award-group>, <award-id>, <bio>, <boxed-text>, <caption>, <chapter-title>, <chem-struct>, <chem-struct-wrap>, <collab>, <comment>, <conf-acronym>, <conf-date>, <conf-loc>, <conf-name>, <conf-num>, <conf-sponsor>, <conf-theme>, <conference>, <copyright-holder>, <copyright-statement>, <country>, <date-in-citation>, <day>, <def>, <def-item>, <def-list>, <degrees>, <disp-formula>, <disp-formula-group>, <disp-quote>, <edition>, <element-citation>, <email>, <etal>, <ext-link>, <fig>, <fn>, <fn-group>, <fpage>, <funding-group>, <funding-source>, <funding-statement>, <glossary>, <gov>, <graphic>, <inline-formula>, <inline-graphic>, <inline-supplementary-material>, <institution>, <issue>, <issue-id>, <issue-part>, <issue-title>, <journal-id>, <kwd-group>, <label>, <license>, <list>, <list-item>, <long-desc>, <lpage>, <media>, <mixed-citation>, <month>, <name>, <named-content>, <nlm-citation>, <on-behalf-of>, <open-access>, <p>, <page-range>, <part-title>, <patent>, <person-group>, <prefix>, <preformat>, <price>, <principal-award-recipient>, <principal-investigator>, <product>, <publisher-loc>, <publisher-name>, <ref>, <ref-list>, <related-article>, <related-object>, <role>, <season>, <sec>, <self-uri>, <series>, <size>, <source>, <speaker>, <speech>, <statement>, <std>, <string-name>, <styled-content>, <subtitle>, <suffix>, <supplement>, <supplementary-material>, <table-wrap>, <target>, <term>, <textual-form>, <trans-source>, <trans-title>, <uri>, <verse-group>, <verse-line>, <volume>, <volume-id>, <volume-series>, <xref>, <year>
Value: An alphanumeric string, which may include hyphens
Meaning: An abbreviation for a natural language (such as “en” for English or “de” for German) or for a language and a script (“ja-Kana”)
Restriction: This attribute may be specified if the element is used.