MathML and JATS

The 2-minute Version

  • JATS tag sets available with MathML 2.0 and MathML 3.0: All three JATS Tag Sets (Archiving, Authoring, and Publishing) come in two different versions: one using MathML 2.0 and one using MathML 3.0. Each JATS user must select one of these models, even if they do not use MathML at all!
    In the JATS family: NISO STS comes in the same two versions requiring the same choice. BITS uses only MathML 3.0.
  • Why?: MathML 2.0 and MathML 3.0 are not compatible. The differences between MathML 2.0 and MathML 3.0 are such that XML created using MathML 2.0 is not guaranteed to be valid in MathML 3.0. Some users of MathML 2.0 report that moving from MathML 2.0 to MathML 3.0 would necessitate re-proofing all of their mathematical expressions, because they fear that some math expressions that render as desired using MathML 2.0 might not look the same if rendered from MathML 3.0.
  • Using MathML 2.0 and 3.0 together: There is not now, nor is there likely to be, a JATS Tag Set that includes both MathML 2.0 and MathML 3.0 in a single DTD. MathML 2.0 and MathML 3.0 are not intended to be used together; among other challenges, they use the same namespace URI.
  • Best Practice: If a JATS user has a choice between MathML 2.0 and 3.0, or does not intend to use MathML, they should select a JATS model that uses MathML 3.0, in case they ever need MathML in the future. MathML 3.0 is more complete and correct than MathML 2.0. The MathML 2.0 tag set was only included in JATS to accommodate JATS users who were already using MathML 2.0 and have a backfile of MathML 2.0.

Abbreviated MathML History

(With thanks to Inera, whose blog is paraphrased here)
A brief timeline for MathML
  • MathML 1.0 W3C Recommendation in 1998
  • MathML 2.0 (Second edition) in 2003
  • MathML 3.0 (Second edition) in 2014
  • MathML 4.0 in process in 2019
Minor Complication: While MathML 1.0 came out as a DTD, later releases were delivered as W3C XSD Schemas which were then converted to DTD form for DTD users. The XSD Schemas enforce rules the DTDs cannot, and some of the early-release DTDs did not even enforce restrictions such as ID-type attributes that DTDs are able to enforce.

MathML 1.0 to MathML 2.0 Backward Compatibility

As reported by Inera: “MathML 2.0 was fully backwards-compatible with MathML 1.0, such that an equation validated to the MathML 1.0 DTD would also be valid when parsed with the MathML 2.0 DTD.”

MathML 2.0 to MathML 3.0 Backward Compatibility

Unfortunately, MathML 2.0 and MathML 3.0 are not backward compatible. An equation valid to MathML 2.0 may or may not be valid to MathML 3.0. Many, possibly all, of the attributes, attribute values, and practices that were deprecated in MathML 2.0 have been removed from MathML 3.0. Therefore, the more closely your MathML 2.0 followed the documentation about best practices and deprecated attributes, and the more error-free your MathML 2.0 was, the higher the likelihood that your equation will convert validly.

Some tag set differences

The two tag sets differ in several ways, including the following (an illustrative, not an exhaustive, list):
  • In MathML 2.0, some attributes and constructions were “deprecated” in the recommendation. These deprecated components were removed in MathML 3.0. For example:
    • The @name attribute on the element <mml:math> was used to provide a human-readable name for an equation. There is no @name attribute in MathML 3.0.
    • The @type attribute on <mml:math> only exists in MathML 2.0.
  • Some attribute value lists that were CDATA (any text value accepted) in MathML 2.0 have become constrained value lists in MathML 3.0.
  • Some of the styling, sizing, and spacing parameters take slightly different value lists or have different defaults in MathML 3.0 than they had in MathML 2.0.
  • Some constructs are modeled differently between the two (e.g., fractions).
Would this be a problem? How much MathML 2.0 would not be valid MathML 3.0? We could not find solid statistics, but when PubMed Central ran such a test on their MathML-2-encoded journal articles, they found thousands of valid MathML 2.0 equations that had XML validation errors when parsed with the MathML 3.0 DTD. It is not known how many additional equations may have been DTD-valid, but incorrect in MathML 3.0.
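As a rough first pass before running a full DTD validation, the two removed attributes named above can be looked for directly. The following is a minimal sketch in Python, not a validator: it only checks for @name and @type on <mml:math>, and the function name, the sample document, and the value "eq1" are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# MathML 2.0 and MathML 3.0 share this namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Attributes on <mml:math> that the text above describes as absent from MathML 3.0.
MATHML2_ONLY_MATH_ATTRS = ("name", "type")


def find_mathml2_only_attrs(xml_text):
    """Return (attribute, value) pairs found on <math> elements.

    Hypothetical helper: it covers only the two attributes listed above
    and is no substitute for parsing against the MathML 3.0 DTD.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for math in root.iter(f"{{{MATHML_NS}}}math"):
        for attr in MATHML2_ONLY_MATH_ATTRS:
            if attr in math.attrib:
                hits.append((attr, math.attrib[attr]))
    return hits


# Illustrative input: one equation carries @name, the other does not.
sample = """<p xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:math name="eq1"><mml:mi>x</mml:mi></mml:math>
  <mml:math><mml:mi>y</mml:mi></mml:math>
</p>"""

print(find_mathml2_only_attrs(sample))
```

A scan like this can estimate how much of a backfile is affected by these two attributes, but only a parse against the MathML 3.0 DTD shows the full set of validation errors.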

Converting MathML 2.0 to MathML 3.0

There is no tool to convert MathML 2.0 to MathML 3.0 automatically. Many of the situations in which MathML 2.0 cannot be converted programmatically to MathML 3.0 represent errors in the MathML 2.0 coding that coincidentally happen to render properly, or for which the rules were not checked by the grammar. These errors would need to be corrected, which is unlikely to be a 100% programmable process. Even if the correction could be automated, it might alter the rendition and hence require reproofing.
  • The tightening of attribute value lists is one of the most problematic aspects for conversion. For example, the attribute @mathvariant on the element <mml:mi> used to be unconstrained; any text value was acceptable, and there was no way to validate that the value used had any meaning. The MathML 3.0 values for @mathvariant are a constrained list. So if a bad typist keyed the value
    <mml:mi mathvariant="blood-italic">Q</mml:mi>
    instead of the correct value of
    <mml:mi mathvariant="bold-italic">Q</mml:mi>
    the MathML 2.0 equation sailed through processing without a hitch. This error has always been a math coding error, even in MathML 2.0, but not an error that parsing MathML 2.0 would catch, and while it probably does have formatting consequences, they might be missed in proofing. If this <mml:mi mathvariant="blood-italic"> were converted to MathML 3.0, the error would be caught by the parser, and the equation would be invalid. The fix might not be as obvious, nor as programmatically convertible, as this example.
  • If a list of attribute values used to have 6 choices and now has 5, maybe the correct conversion of the value that is now illegal can be accomplished programmatically, and maybe it takes human judgment.
  • Some differences between MathML 2.0 and MathML 3.0 could be auto-converted easily in the DTD or schema, but the changes might create havoc in later stages of processing. For example, both the @name and @type attributes on <mml:math> could be discarded easily during conversion, and then the equation might parse cleanly. But a major reason to assign names and types is to base processing or display on the values. The equation is now fine, but the system may break.
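The checks described in the list above can be sketched in Python. This is an illustrative audit, not a conversion tool: it flags @mathvariant values outside the MathML 3.0 list and removes @name and @type from <mml:math>, recording what was removed so that downstream systems relying on those values can be reviewed. The function name, the sample equation, and the value "eq1" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# The @mathvariant values enumerated by MathML 3.0.
MATHVARIANT_VALUES = {
    "normal", "bold", "italic", "bold-italic", "double-struck",
    "bold-fraktur", "script", "bold-script", "fraktur", "sans-serif",
    "bold-sans-serif", "sans-serif-italic", "sans-serif-bold-italic",
    "monospace", "initial", "tailed", "looped", "stretched",
}


def audit_equation(xml_text):
    """Return (tree root, bad @mathvariant values, attributes dropped).

    Hypothetical helper: it reports problems for human review and does
    not attempt to guess the intended @mathvariant value.
    """
    root = ET.fromstring(xml_text)
    bad_variants = []
    dropped = []
    for el in root.iter():
        if not el.tag.startswith(f"{{{MATHML_NS}}}"):
            continue  # skip elements outside the MathML namespace
        local = el.tag.split("}", 1)[1]
        value = el.get("mathvariant")
        if value is not None and value not in MATHVARIANT_VALUES:
            bad_variants.append((local, value))
        if local == "math":
            # Discarding these is easy; record them because later
            # processing or display may depend on the values.
            for attr in ("name", "type"):
                if attr in el.attrib:
                    dropped.append((attr, el.attrib.pop(attr)))
    return root, bad_variants, dropped


sample = (
    '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" name="eq1">'
    '<mml:mi mathvariant="blood-italic">Q</mml:mi>'
    '<mml:mi mathvariant="bold-italic">R</mml:mi>'
    '</mml:math>'
)

root, bad_variants, dropped = audit_equation(sample)
print(bad_variants)
print(dropped)
```

Reporting rather than silently rewriting is a deliberate choice here: the correct replacement for a bad value may take human judgment, and any change can alter the rendition and call for reproofing.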
Whether your MathML 2.0 will convert cleanly to MathML 3.0 may depend on the tool or process used to create the math. As soon as possible after MathML 3.0 came out, some vendors of math editing software changed their products to avoid the deprecated parts of MathML 2.0, so their MathML 2.0 should be compatible with MathML 3.0’s “strict math” approach. However, in other cases, MathML 2.0 may have been created by manual keying or automated conversion, and the keying or production instructions in the workflow may not have had these consistency features added.