All Aboard!

Wendell Piez

JATS-Con 2015

April 21, 2015

Contact: wapiez@wendellpiez.com

From JATS to HTML, and back again

The general problem is ...

... but getting “nice, clean, pretty” JATS out of “messy, sloppy, ambiguous” HTML ...

Why the problem is a problem

(An outline of) a solution

No pressure

Crypto-JATS HTML

<div class="book-part-meta">
  <div class="title-group">
    <div class="title">Ongeont Douq Iukt</div>
  </div>
</div>
<div class="body">
  <p class="p">Ok u fvonaronan unz setoquqh lonqo...</p>
  <p class="p">Lankezoqenl tvo velv fqamumeseth aw u Hoqkeun....</p>

Where to hide the JATS in the HTML

We can determine which element types are block (div) or inline (span) based on discoverable information.

Everything shouldn't be div and span elements, should it? (Arguments for and against.)

We can improve mappings based on prior knowledge:

Since we will be looking at @class values, not element types, we have flexibility.
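To illustrate how block versus inline might be discovered, here is a minimal Python sketch (the talk's own tooling is XSLT; Python is used here only for illustration). The heuristic is an assumption, not the rule stated in the slides: an element counts as inline (span) if it ever appears as a child in mixed content, and as block (div) otherwise. The function name `classify_elements` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def classify_elements(jats_xml):
    """Guess 'div' (block) or 'span' (inline) for each element name in a sample.

    Assumed heuristic: an element is inline if it ever occurs as a child
    in mixed content, i.e. next to non-whitespace text in the same parent.
    """
    root = ET.fromstring(jats_xml)
    kinds = {root.tag: "div"}
    for parent in root.iter():
        children = list(parent)
        # Mixed content: real text before the first child or after any child.
        mixed = bool((parent.text or "").strip()) or any(
            (child.tail or "").strip() for child in children
        )
        for child in children:
            if mixed:
                kinds[child.tag] = "span"        # inline wins once observed
            else:
                kinds.setdefault(child.tag, "div")
    return kinds

SAMPLE = """<body><sec><title>Heading</title>
<p>Some <italic>emphasized</italic> text.</p></sec></body>"""

kinds = classify_elements(SAMPLE)
```

On this sample, `italic` is classified as span and `body`, `sec`, `title` and `p` as div. A schema-driven version would inspect content models instead of instances, and prior knowledge could then override individual entries.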

Producing HTML from JATS: down the hill

Rules are very simple:

Exceptions can be made for mappings in cases where “functional semantics” must be respected

The “exception layer” (XSLT) can sit on top of the “generic layer” (XSLT) for generality and reusability.
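The layering can be sketched as follows, in Python rather than the XSLT used in the project. The generic layer maps every JATS element to div or span with @class set to the JATS name; the exception layer is a lookup consulted first. The `INLINE` and `EXCEPTIONS` tables are hypothetical: `p` mapping to `p` matches the crypto-JATS sample shown earlier, while `ext-link` mapping to `a` is an assumption. Attributes are deliberately left out of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical block/inline table; in practice derived from the schema.
INLINE = {"italic", "bold", "ext-link"}

# Hypothetical exception layer: cases where the HTML element itself matters.
EXCEPTIONS = {"p": "p", "ext-link": "a"}

def to_html(jats):
    """Generic layer: JATS element -> div or span, with class = JATS name."""
    name = jats.tag
    default = "span" if name in INLINE else "div"
    # The exception layer takes precedence over the generic default.
    html = ET.Element(EXCEPTIONS.get(name, default), {"class": name})
    html.text = jats.text
    for child in jats:
        converted = to_html(child)
        converted.tail = child.tail      # keep text that follows the child
        html.append(converted)
    return html

jats = ET.fromstring("<body><p>See <italic>this</italic>.</p></body>")
html_text = ET.tostring(to_html(jats), encoding="unicode")
# html_text == '<div class="body"><p class="p">See <span class="italic">this</span>.</p></div>'
```

Because the JATS name always survives in @class, the exception table can grow or change without affecting the ability to read the JATS back.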

Representing JATS attributes in the HTML

First we tried cramming these into @class attributes also:

<book-part book-part-number="B1" book-part-type="monograph"> ... </book-part>

becomes

<div class="book-part book-part-number..B1 book-part-type..monograph">

Although this works (yes, it supports round-tripping), there were ... concerns ...
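A small Python sketch of the class-token encoding shows both the round trip and one concern that is easy to demonstrate. Which concerns were actually raised is not stated in the slides, so the whitespace case below is an assumption; the function names are hypothetical.

```python
def attrs_to_class(name, attrs):
    """Encode an element name plus attributes as class tokens: key..value."""
    tokens = [name] + [f"{key}..{value}" for key, value in attrs.items()]
    return " ".join(tokens)

def class_to_attrs(class_value):
    """Decode: the first token is the element name, key..value tokens are attributes."""
    name, *rest = class_value.split()
    attrs = dict(token.split("..", 1) for token in rest if ".." in token)
    return name, attrs

original = {"book-part-number": "B1", "book-part-type": "monograph"}
encoded = attrs_to_class("book-part", original)
# encoded == "book-part book-part-number..B1 book-part-type..monograph"
decoded = class_to_attrs(encoded)

# @class is a whitespace-separated token list, so a value containing
# spaces does not survive the trip.
lossy = class_to_attrs(attrs_to_class("sec", {"sec-type": "materials and methods"}))
```

Here `decoded` reproduces the original name and attributes, while `lossy` comes back with `sec-type` truncated to `materials`.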

HTML5 offers a better solution

<div class="book-part" data-book-part-number="B1" data-book-part-type="monograph">

(Our HTML team preferred this, and who wouldn't?)

Plus there are exceptions for certain attributes with generalized or global semantics such as @id and @lang, which can be mapped directly.
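The data-* mapping and its inverse can be sketched in Python (again a stand-in for the project's XSLT). The `DIRECT` table is an assumption: the slides name @id and @lang as directly mapped, and this sketch assumes JATS @xml:lang is what becomes HTML @lang. Function names are hypothetical.

```python
# Attributes mapped directly instead of through data-*.
# Assumption: JATS xml:lang corresponds to HTML lang.
DIRECT = {"id": "id", "xml:lang": "lang"}
REVERSE_DIRECT = {html: jats for jats, html in DIRECT.items()}

def jats_to_html_attrs(name, attrs):
    """JATS element name and attributes -> HTML attributes."""
    html = {"class": name}
    for key, value in attrs.items():
        html[DIRECT.get(key, "data-" + key)] = value
    return html

def html_to_jats_attrs(html):
    """HTML attributes -> JATS element name and attributes."""
    name = html["class"].split()[0]
    attrs = {}
    for key, value in html.items():
        if key == "class":
            continue
        if key in REVERSE_DIRECT:
            attrs[REVERSE_DIRECT[key]] = value
        elif key.startswith("data-"):
            attrs[key[len("data-"):]] = value
    return name, attrs

jats_attrs = {"id": "bp1", "book-part-number": "B1", "book-part-type": "monograph"}
html_attrs = jats_to_html_attrs("book-part", jats_attrs)
round_trip = html_to_jats_attrs(html_attrs)
```

Unlike the class-token encoding, attribute values with spaces survive here because each value lives in its own attribute.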

Reading JATS back again (pulling back up)

“JATS sniffing”

Imperfect by design

Fun problems

As usual, much time can be spent working at the edges. (How good is good enough?)

Perhaps surprisingly, these did not bog us down.

What works

Constraining the HTML crypto format

“JATS Architectural Form”?

Validation in the application has its limits. We want to know of issues before runtime!

However, JATS models map into HTML only as co-occurrence constraints.

(Combinations of values of @class or other attributes, assigned to parents and children, with allowances.)

This can be complex! Especially projected into crypto-JATS.

Just for instance (constraining the HTML crypto-JATS)

For example, report (using Schematron) if any data-* attributes other than data-content-type or data-specific-use appear on a div with @class of “disp-quote”*:

<!-- a:classes() and a:data-attr() are helper functions defined elsewhere in the schema (not shown) -->
<rule context="*[a:classes(.)='disp-quote']">
  <let name="jats-attrs"       value="('content-type','specific-use')"/>
  <let name="wrong-data-attrs" value="a:data-attr(.)[not(@name=$jats-attrs)]"/>
  <assert test="empty($wrong-data-attrs)">
    Wrong data attribute found on 'disp-quote':
    <value-of select="string-join($wrong-data-attrs/@name,', ')"/>
  </assert>
</rule>
<div class="disp-quote inner" data-inner-value="x"> ... </div>

(We wish to see an error because <disp-quote inner-value="x"> will be invalid in JATS.)

* The analogous JATS rule: the only attributes permitted on disp-quote are @content-type and @specific-use (and others accounted for otherwise, such as @id and @xml:lang).
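For readers without a Schematron processor at hand, the same co-occurrence check can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the project's validator: the `ALLOWED` table holds only the disp-quote entry taken from the rule above, the function name is hypothetical, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Permitted JATS attributes per class value; only disp-quote comes from the slide.
ALLOWED = {"disp-quote": {"content-type", "specific-use"}}

def wrong_data_attrs(html_xml):
    """Return (class, attribute) pairs for data-* attributes with no JATS counterpart."""
    problems = []
    for element in ET.fromstring(html_xml).iter():
        for cls in element.get("class", "").split():
            allowed = ALLOWED.get(cls)
            if allowed is None:
                continue                 # no rule known for this class token
            for key in element.attrib:
                if key.startswith("data-") and key[len("data-"):] not in allowed:
                    problems.append((cls, key))
    return problems

bad = wrong_data_attrs('<div class="disp-quote inner" data-inner-value="x">...</div>')
good = wrong_data_attrs('<div class="disp-quote" data-content-type="x">...</div>')
```

The first call reports `data-inner-value` on `disp-quote`; the second reports nothing.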

“Ascetic HTML”

“Our discipline is strict so that life may be easy” (Idries Shah)

Externalizing the problem

Will HTML-based systems buy it?

Yes, to the extent we can make things easier for them ...?

Yes, this means an HTML (subset) schema! And probably other semi-auto-generated tools as well ...?

  • RelaxNG framework for basic structures
  • Schematron (generated via schema inspection/query)
  • CSS as validation technology? (Using fallbacks to detect unwarranted markup)
  • Eric van der Vlist's Examplotron
  • Use an XML database to provide examples to test against?
  • Generate a schema from the home (JATS) schema?
  • A stopgap: run a mini-application (XProc) to perform transformation and validate to JATS directly

Is it worth the effort?

Can we have our JATS cake and eat it too?

Acknowledgements