Introduction

Class attributes, sometimes known by different names and with slightly different purposes, are ubiquitous in all common vocabularies. In DocBook they are called “role”, in TEI they are called “type” or “rend”, in HTML and DITA they are actually called “class”, and in JATS they are called “content-type”, “list-type”, “sec-type”, etc. With the exception of TEI’s @type attribute, they may contain not just a single value but a list of space-separated tokens. TEI is special in that it distributes what goes into @class over (at least) four attributes: @type, @subtype, @rend, and @rendition. In the generic attributes, for example @role, @class, and @content-type, each of these tokens can mean anything. Often they influence whether or how the element is displayed or converted. In TEI, this is mainly the purpose of the @rend attribute.

There are typically no strict sets of available values. If there were, the vocabulary might as well offer a dedicated element with that name. (Customized schemas may restrict these values though.)

DITA, in this regard, is quite extreme, as it moves much of the semantics from the element and attribute names into class attribute tokens. Processing documents with many space-separated class attribute tokens can incur a significant performance penalty, as John Lumley presented at XML London in 2015 [Lumley Kay 2015].

Creating class attributes, on the other hand, is not particularly difficult or demanding in terms of computing power. So why this paper?

In his JATS-Con 2010 contribution [Piez 2010], Wendell Piez described two lanes of “vertical customization” for what would later become the JATS Preview Stylesheets: one lane for CSS and the other for XSLT customizations. He calls them vertical because both pile overridden or additional rules upon off-the-shelf basic rules. CSS adaptation is often the first customization method at hand, he argues, but a diverse repertoire of XSLT customizations is often needed when CSS means have been exhausted.

Class tokens are the most widespread and versatile tools available when it comes to applying CSS formatting to content. Therefore generating class tokens with XSLT can be seen as the middle ground between pure CSS styling and more elaborate XSLT content transformations.

So one should assume that manipulating class attributes can be done without writing too much code. This may be true if “code” means “new code”, but, as we will see when examining popular XML to HTML conversion stylesheets, it often involves copying, and only slightly changing, sometimes large extents of existing code, which can become a maintenance burden.

This paper is primarily about writing customizable stylesheets with a particular focus on “don’t repeat yourself” or, more specifically, on avoiding a copy&paste approach of re-defining large functions or templates in the importing stylesheets. The techniques presented in this paper are not particularly new, but apparently they are underappreciated in this problem space. They all work with XSLT 2. They are about fine-grained templates that match in auxiliary, aspect-oriented modes, where one “aspect” can be “creating class attributes.”

Most conversions from one content-centric XML application to another follow rule-based and sometimes computational programming patterns, as described in Chapter 17 of Michael Kay’s XSLT and XPath 2.0 book [Kay XSLT 2.0]. These are appropriate and versatile techniques to complete the task. However, it turns out that templates or functions in these stylesheets are often too coarse-grained to allow, for example, the addition of a custom, computed token to the class list of a result element. We will look at these templates or functions and occasionally suggest tweaks to the off-the-shelf stylesheets so that they offer better customization hooks to the custom, importing stylesheets.

This is not so much about rekindling a push vs. pull approach discussion. It is taken for granted that the commonly used XSLT stylesheets here operate in push mode (the transformation is driven by the source document). It is more about offering finer-grained, context-dependent customization hooks for rule-based push-mode templates or computational functions/named templates.

The proposed hooks are almost exclusively transformations of the context element in a certain XSLT mode. An example is a mode that, applied to an element, produces a class attribute with a token list by processing the element’s attributes in the same mode, where each attribute in turn yields class attribute tokens.

The overall approach suggested in this paper is not limited to class attribute generation. It can also be useful, for example, when mapping element names between XML applications or in determining whether a given element is meant to be rendered inline.

Common Vocabularies and Conversion Stylesheets

We will examine in more or less detail how the class generation templates or functions work for certain widely used conversion stylesheets. We will in particular test whether it is easy to modify, by means of xsl:import and overriding templates or functions, the class generation in certain ways:

  • add a token that represents the source element’s (local) name;

  • make sure that this source element name token will be created even in the absence of the source attribute that will become a class attribute by default;

  • add a token for a subtype that stems from another attribute (and potentially suppress processing this attribute otherwise);

  • add a token that is somehow calculated by processing nodes relative to the context (example: a table row shall get a class token that toggles whether the row is hidden);

  • for certain elements, suppress certain values of the class token that is produced by default.

DocBook to HTML

We transform this source document:

<section xmlns="http://docbook.org/ns/docbook" version="5.1">
  <title>Title</title>
  <para role="foo">Para</para>
</section>

using the XSLT 2.0 version of the DocBook stylesheets [DocBook XSLT 2.0].

The paragraph will be output as:

<p class="foo">Para</p>

If the input is a simpara, the output will be the same. Now suppose that you generate the HTML for a browser-based XML editor (that actually edits HTML and transforms it back to DocBook) that offers different context-dependent formatting controls for para and simpara. You want to add a token with the element’s local name to the class list in order to invoke the appropriate controls. You look at the template that happens to render both elements:

<xsl:template match="db:para|db:simpara">
  <xsl:param name="runin" select="()" tunnel="yes"/>
  <xsl:param name="class" select="()" tunnel="yes"/>
  <!-- irrelevant parts left out -->
  <p>
    <xsl:sequence select="f:html-attributes(., @xml:id, $class)"/>
    <xsl:copy-of select="$runin"/>
    <xsl:apply-templates/>
  </p>
</xsl:template>

The function f:html-attributes() accepts a string $class that will be added to the class attribute tokens, which is good. So in the importing stylesheet, one can use this template:

<xsl:template match="db:para|db:simpara">
  <xsl:next-match>
    <xsl:with-param name="class" as="xs:string" select="local-name()" tunnel="yes"/>
  </xsl:next-match>
</xsl:template>

For structural divisions such as chapter, appendix, or section, nothing has to be adapted. The resulting HTML element has both the source element’s local name and the tokens of any @role attribute in its class list. The same holds for admonitions such as caution.

<caution role="bar">
  <para>Caution</para>
</caution>

will become

<div class="caution bar admonition">
  <h3>Caution</h3>
  <div class="admonition-body">
    <p class="para">Caution</p>
  </div>
</div>

Suppose that you want to add the content of the @condition attribute to the resulting class tokens. You will notice that

<caution role="bar" condition="foo">
  <para>Caution</para>
</caution>

yields a result that is identical to the one we saw above. How can we make the token(s) in @condition also appear in the class list of the resulting div?

The class attribute that is generated for admonition-like elements is populated by these two nested function calls:

<xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
                        f:html-extra-class-values(., 'admonition'))"/>

If we want to add the value of condition to the class list, we could override the 15-line-long admonition template:

<xsl:template match="db:note|db:important|db:warning|db:caution|db:tip|db:danger">
  <xsl:choose>
    <xsl:when test="$admonition.graphics">
      <xsl:apply-templates select="." mode="m:graphical-admonition"/>
    </xsl:when>
    <xsl:otherwise>
      <div>
        <xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
                                f:html-extra-class-values(., 'admonition'))"/>
        <xsl:call-template name="t:titlepage"/>
        <div class="admonition-body">
          <xsl:apply-templates/>
        </div>
      </div>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

with this one:

<xsl:template match="db:note|db:important|db:warning|db:caution|db:tip|db:danger">
  <xsl:choose>
    <xsl:when test="$admonition.graphics">
      <xsl:apply-templates select="." mode="m:graphical-admonition"/>
    </xsl:when>
    <xsl:otherwise>
      <div>
        <xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
                                f:html-extra-class-values(., ('admonition', @condition)))"/>
        <xsl:call-template name="t:titlepage"/>
        <div class="admonition-body">
          <xsl:apply-templates/>
        </div>
      </div>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

We are displaying the full templates here in order to give an impression of the redundancy incurred. In the author’s experience, copy&paste of lengthy functions or templates poses a significant maintenance burden. New features or bugfixes in the overridden code will not make it to the adapted code unless someone takes the time and compares/transfers the changes.

An alternative approach within the boundaries of the existing stylesheet design is to override the f:html-extra-class-values() or f:html-attributes() functions. Since the context element is the first argument for both, one could build in a switch that checks whether the context is an admonition and then add @condition to the resulting @class. Or one could do this change for all contexts, but that might introduce undesired class attribute tokens in other places.
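To make the cost of this alternative concrete, the following is a minimal sketch of such an override in an importing stylesheet. It assumes the two-argument signature of f:html-extra-class-values() that is quoted later in this paper. The import path is hypothetical, and the namespace URI bound to the f prefix is an assumption that should be checked against the stylesheet distribution in use.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:db="http://docbook.org/ns/docbook"
  xmlns:f="http://docbook.org/xslt/ns/extension"
  exclude-result-prefixes="xs db f">

  <!-- Hypothetical location of the off-the-shelf stylesheet -->
  <xsl:import href="docbook-xslt2/xslt/base/html/docbook.xsl"/>

  <!-- Override of the generic function. Its whole body has to be repeated
       just to bolt on one context switch for admonitions. -->
  <xsl:function name="f:html-extra-class-values" as="xs:string?">
    <xsl:param name="node" as="element()"/>
    <xsl:param name="extra" as="xs:string*"/>

    <xsl:variable name="classes" as="xs:string*">
      <!-- repeated from the original function body -->
      <xsl:sequence select="tokenize($node/@role, '\s+')[. ne '']"/>
      <xsl:sequence select="$node/@revision/concat('rf-', .)"/>
      <xsl:sequence select="$extra"/>
      <!-- the only new part: @condition tokens, for admonitions only -->
      <xsl:if test="$node/(self::db:note | self::db:important | self::db:warning
                           | self::db:caution | self::db:tip | self::db:danger)">
        <xsl:sequence select="tokenize($node/@condition, '\s+')[. ne '']"/>
      </xsl:if>
    </xsl:variable>

    <xsl:if test="exists($classes)">
      <xsl:sequence select="string-join(distinct-values($classes), ' ')"/>
    </xsl:if>
  </xsl:function>

</xsl:stylesheet>
```

Only the xsl:if block is new; everything else duplicates code that then has to be kept in sync with the imported stylesheet by hand.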

Overriding a generic function or a named template in order to effect specific changes for certain contexts is not advisable, for the reason given above (maintainability). If the task at hand is to process documents in a context-dependent fashion, matching templates, rather than named templates or functions, naturally suggest themselves. And there is a straightforward way that allows keeping the current function call mechanism without the need to bloat the functions’ bodies with context-dependent conditionals. We’ll look at that in section “Making the DocBook to HTML Conversion More Extensible”.

TEI to HTML

As hinted at in section “Introduction”, TEI differentiates between types (including subtypes) and rendering information. When it comes to generating @class and @style attributes, the TEI XSL Stylesheets [TEI XSL] primarily look at rendering information that may be declared at various locations in the document, and they add the element’s local name to the class tokens by default. The strategy is described in the code documentation for the named template makeRendition as follows:

Work out rendition. In order of precedence, we first look at @rend; if that does not exist, @rendition and @style values are merged together; if neither of those exist, we look at default renditions in tagUsage; if default is set to false, we do nothing; if default has a value, use that for @class; otherwise, use the element name as a value for @class.

The template makeRendition will call a function, tei:processRend(), for processing what’s in @rend. In the absence of @rend, it will look at @rendition and @style and use different functions to compute the HTML attributes @class and @style.

Although the @rend attribute may contain arbitrary text, tei:processRend() has very specific expectations about the @rend tokens that it will map to @class tokens. If none of the mappings catches on, the original token will be forwarded to the class attribute. However, it is not possible to add computed class tokens unless one is willing to override tei:processRend(), makeRendition, or the templates that invoke makeRendition. They are sometimes as compact as

<xsl:template match="tei:gloss">
  <span>
    <xsl:call-template name="makeRendition"/>
    <xsl:apply-templates/>
  </span>
</xsl:template>

but there might be many of them that need customizing.

Given the gloss example input from the TEI P5 Guidelines [TEI gloss]:

We may define <term xml:id="tdpv" rend="sc">discoursal point of view</term> as <gloss target="#tdpv">the relationship, expressed through discourse structure, between the implied author or some other addresser, and the fiction.</gloss>

The output contains <span class="gloss">….

If the input has a rend="foo" attribute on gloss, the output will be <span class="foo">….

In order to retain the token gloss in the span’s class list, one needs to override the template in an importing stylesheet:

<xsl:template match="tei:gloss">
  <span>
    <xsl:call-template name="makeRendition">
      <xsl:with-param name="auto" select="local-name()"/>
    </xsl:call-template>
    <xsl:apply-templates/>
  </span>
</xsl:template>

For simple templates this overriding is acceptable. However, makeRendition is called 70 times, and some of the calling templates comprise more than 50 lines of code. So if overriding makeRendition globally is not an option, potentially much redundancy will be created because the stylesheets lack granularity or customization hooks.

So far we have looked at the rendering attributes, but what about @type and @subtype? If they are used on gloss, they don’t show up in the result. We can add them to the class list in the same customized template that we used for local-name().

For other typed elements, such as TEI’s div, the @type attribute will be converted to a @class attribute. However, the @subtype value is missing in the resulting HTML div’s class list.
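A possible shape of that extended template is sketched below. It assumes that the auto parameter of makeRendition accepts a space-separated string, which the excerpts quoted here do not confirm. The import path is hypothetical, and so is the assumption that the imported stylesheet creates its result elements in the XHTML namespace.

```xml
<!-- Assumption: the imported stylesheet creates its HTML elements in the
     XHTML namespace; drop the default namespace declaration if it does not. -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="tei">

  <!-- Hypothetical location of the TEI to HTML stylesheet -->
  <xsl:import href="tei-stylesheets/html/html.xsl"/>

  <xsl:template match="tei:gloss">
    <span>
      <xsl:call-template name="makeRendition">
        <!-- element name first, then @type and @subtype if present;
             absent attributes simply drop out of the sequence -->
        <xsl:with-param name="auto"
          select="string-join((local-name(), @type, @subtype), ' ')"/>
      </xsl:call-template>
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

The template body still has to be repeated in full, so the redundancy described above remains.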

The classes of div-like elements will be created by a template named divClassAttribute. It will call the known template makeRendition with the (non-tunneling) default parameter set to the value of the div’s @type attribute. One needs to redefine divClassAttribute (24 lines of code) in order to add @subtype. If one wanted to add it only for selected contexts, one would need to introduce conditionals in this otherwise generic template, blowing up the generic template even more—unless the template provided a hook that allows context-aware creation of the class token list.

At least these generic named templates make sure, for selected elements, that there is a hook for overriding default class attribute generation. The situation would be worse if class generation relied on matching and transforming existing @type or @rend attributes, when in their absence there wouldn’t be a class attribute at all. But similar to the generic functions in section “DocBook to HTML”, if one wants to supply non-default computed class tokens, one needs to either redefine possibly many matching templates from which the generic templates/functions are called or add conditional logic to the generics.

JATS to HTML

The JATS Preview Stylesheets [JATS XSL] are less elaborate than the DocBook or TEI stylesheets. They accept fewer parameters on invocation, they don’t support chunking, etc. As Tony Graham describes in his paper about customizing the XSL-FO rendering, this is so on purpose. “Deliberately not supporting every possible style permutation hasn’t precluded the JATS Preview stylesheets from supporting other people customizing the stylesheets nor does it stop you from using the stylesheets as a base for customized output.” [Graham 2014]

This is exactly the customization by xsl:import and overriding templates that the current paper is about. So how well do the JATS Preview Stylesheets fare?

There are two templates that process elements with @content-type:

<xsl:template match="p | license-p">
  <p>
    <xsl:if test="not(preceding-sibling::*)">
      <xsl:attribute name="class">first</xsl:attribute>
    </xsl:if>
    <xsl:call-template name="assign-id"/>
    <xsl:apply-templates select="@content-type"/>
    <xsl:apply-templates/>
  </p>
</xsl:template>

and

<xsl:template match="named-content">
  <span>
    <xsl:for-each select="@content-type">
      <xsl:attribute name="class">
        <xsl:value-of select="translate(.,' ','-')"/>
      </xsl:attribute>
    </xsl:for-each>
    <xsl:apply-templates/>
  </span>
</xsl:template>

The class tokens for named-content cannot be extended (for example, with a token 'named-content') without rewriting the complete template. It’s not difficult at all but it implies redundant replication of (more or less) complex functionality nonetheless.

In the case of the template that matches JATS’s p element, there is an <xsl:apply-templates select="@content-type"/>, which does nothing by default, so this serves as a hook for custom processing in the importing stylesheets. If one added

<xsl:template match="p/@content-type">
  <xsl:attribute name="class" select="."/>
</xsl:template>

in the customization, one would lose the 'first' token that is created by default for paragraphs without predecessor. So this is extra functionality that one needs to copy from the original template. That is not much redundancy because the code snippet is small, but it is still redundancy that will impede maintenance.

Apart from the different type attributes that JATS elements may have, almost every element may carry a @specific-use attribute. It is somewhat similar to DocBook’s @condition attribute. It may hold information that, among other purposes, the content is for a limited audience, for a specific output format, etc. This kind of information might be used by a rendering process to filter out content, for instance. But in the author’s experience, more often than not, what is in @specific-use is useful in the HTML @class attribute, too. For example, it might be required that content marked specific-use="optional" not be removed during rendering but rather be collapsed by CSS or JavaScript, based on the class token “optional”.

However, there is no hook in the preview stylesheets that would allow customizations to include more tokens in the @class attribute. Other attributes, such as @list-content or milestone/@rationale, are candidates for inclusion in class token lists, too.
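The copied functionality could look like the following sketch, in which the attribute template re-creates the 'first' condition itself. The import path is hypothetical. The sketch relies on the XSLT rule that a later attribute of the same name replaces an earlier one on the same result element.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical location of the JATS Preview HTML stylesheet -->
  <xsl:import href="jats-preview/xslt/main/jats-html.xsl"/>

  <!-- This class attribute replaces the one that the imported template may
       already have written, so the 'first' test is duplicated here. -->
  <xsl:template match="p/@content-type | license-p/@content-type">
    <xsl:attribute name="class" separator=" "
      select="(if (../preceding-sibling::*) then () else 'first'),
              tokenize(., '\s+')[. ne '']"/>
  </xsl:template>

</xsl:stylesheet>
```

If the condition in the imported template ever changes, this copy will silently diverge from it.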

We didn’t see the amount of monolithic templates, underequipped with hooks, in the JATS Preview Stylesheets as we saw in the TEI stylesheets, but there is room for improvement in the JATS matching templates, too.

Problem Summary

The problems with the HTML converters presented in this section fall into one or more of these classes:

  • Where functions or named templates are responsible for creating class attributes:

    • They are sometimes complex.

    • Overriding them creates redundancy.

    • Hooks for context-aware processing are rarely provided.

  • Where matching templates are responsible for transforming source elements:

    • They are sometimes complex.

    • Overriding them creates redundancy.

    • A requirement to create class attributes differently for many elements entails changing matching templates for many elements.

  • Where matching templates are responsible for transforming source attributes:

    • If the respective source attribute is missing, no class attribute will be generated.

    • That makes it hard to generally add, for example, the source element’s name to the resulting class list.

Addressing the Problems (and some more)

Transform Elements in a Dedicated Class Attribute Mode

Yes, that’s proposed here as a fix for almost all issues listed in the previous section.

Let’s refactor the first template in section “JATS to HTML”:

<xsl:template match="p | license-p">
  <p>
    <xsl:apply-templates select="." mode="class-att"/>
    <xsl:call-template name="assign-id"/>
    <xsl:apply-templates/>
  </p>
</xsl:template>

The conditional first token creation has disappeared. Instead, the element itself is transformed in a newly introduced mode called class-att.

The default template for all elements in this mode is:

<xsl:template match="*" mode="class-att" as="attribute(class)?">
  <xsl:call-template name="make-class">
    <xsl:with-param name="tokens" as="xs:string*">
      <xsl:apply-templates select="@content-type, @list-type, @list-content, 
                                   @rationale, @sec-type, @specific-use" mode="#current"/>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>

By default, it transforms all kinds of …-type attributes in the same class-att mode. (This is not an exhaustive list of all possible attributes that may end up as class tokens, just the frequently occurring ones and some marginally important examples.)

By default, each of these attributes will become a string with the same value as the attribute:

<xsl:template match="@*" mode="class-att" as="xs:string">
  <xsl:sequence select="string(.)"/>
</xsl:template>

These strings, if there are any, will be joined into a space-separated list of tokens in a newly created class attribute:

<xsl:template name="make-class" as="attribute(class)?">
  <xsl:param name="tokens" as="xs:string*"/>
  <xsl:if test="exists($tokens[normalize-space()])">
    <xsl:attribute name="class" separator=" "
      select="distinct-values($tokens[normalize-space()])"/>
  </xsl:if>
</xsl:template>

Within this framework, it is now possible to selectively add the 'first' token to paragraphs that lack predecessors, without the need to insert a conditional statement into the template that processes them in default mode:

<xsl:template match="  p[not(preceding-sibling::*)]
                     | license-p[not(preceding-sibling::*)]"
              mode="class-att" as="attribute(class)?">
  <xsl:attribute name="class" separator=" ">
    <xsl:sequence select="'first'"/>
    <xsl:next-match/>
  </xsl:attribute>
</xsl:template>

The xsl:next-match instruction will look for the next matching template in the same mode, be it imported (lower import precedence) or in the same stylesheet with lower priority. In this case, it is the template that matches all elements in class-att mode. This next-matching template will produce a @class attribute, which will be atomized to a string when it is used as content of the new attribute. The current template will prepend the string 'first' and then turn this sequence of strings into a space-separated value for a newly generated @class attribute.

Using the powerful and elegant xsl:next-match instruction, the resulting class list can be selectively extended. Suppose you want to include the name of license-p elements in the @class attribute, in order to be able to style it specially (this is a contrived example because former license-ps will end up in what can be selected in CSS by div.metadata-chunk p, so there is no need to style p.license-p by its own class, but the fundamental utility of this lightweight ex-post class token decoration should be evident at this point):

<xsl:template match="license-p" mode="class-att" as="attribute(class)?">
  <xsl:attribute name="class" separator=" ">
    <xsl:sequence select="name()"/>
    <xsl:next-match/>
  </xsl:attribute>
</xsl:template>

The computed priority of this template, according to the XSLT specification [XSLT 3 priority], is 0. The template that matches * has the computed priority −0.5, while the template that adds 'first' has a predicate in the matching pattern and therefore its computed priority is +0.5. One would expect that for a license-p, the first token will be 'first', followed by 'license-p', followed by the attribute values of @content-type, @specific-use, etc. Transforming this input:

<license>
  <license-p specific-use="bar baz" content-type="foo">© 2020 Jane Smith</license-p>
</license>

will indeed yield this output:

<div class="metadata-area">
  <p class="metadata-entry"><span class="generated">License: </span></p>
  <div class="metadata-chunk">
    <p class="first license-p foo bar baz" id="d2e13">© 2020 Jane Smith</p>
  </div>
</div>

(The value of @content-type precedes the value of @specific-use because of the order-preserving sequence concatenation operator (comma) in "@content-type, @list-type, …, @specific-use".)

It is quite easy to filter out certain tokens of, for example, @specific-use:

<xsl:template match="license-p/@specific-use" mode="class-att" as="xs:string*">
  <xsl:variable name="orig" as="xs:string?">
    <xsl:next-match/>
  </xsl:variable>
  <!-- more than one token may survive the filter, hence as="xs:string*" -->
  <xsl:sequence select="tokenize($orig)[not(. = 'baz')]"/>
</xsl:template>

Or if the tokens 'foo' and 'bar' shouldn’t appear in the class attribute no matter where they came from:

<xsl:template match="license-p" mode="class-att" priority="1">
  <xsl:variable name="orig" as="attribute(class)?">
    <xsl:next-match/>
  </xsl:variable>
  <xsl:call-template name="make-class">
    <xsl:with-param name="tokens" select="tokenize($orig)[not(. = ('foo', 'bar'))]"/>
  </xsl:call-template>
</xsl:template>

Please note that if you are going to use this in XSLT 2, you might need to replace tokenize($orig) with tokenize($orig, '\s+'). Recent versions of Saxon [Saxon], however, will accept XPath 3.1 functions even if the stylesheet’s XSLT version is 2.0.

Also note, and this is an important thing to remember for XSLT novices, that if you use this priority 1 template in an importing stylesheet, you may safely omit the priority attribute (unless there are other priority clashes you need to address). This is because templates that match the same items always have precedence when they occur in importing stylesheets. They have a higher import precedence [XSLT 3 precedence], which always trumps priority.

Although this solution uses a named template, too, this named template is not monolithic at all. It merely creates class attributes from tokens. Also the matching templates in this solution are less “ambitious” and more fine-grained than the original templates.
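The same mechanism also covers tokens that are computed from nodes relative to the context. The following sketch, meant for an importing stylesheet, adds a token to paragraphs that contain formulas. The token name has-math and the file name of the refactored stylesheet are hypothetical.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name: the JATS Preview HTML stylesheet after the
       class-att refactoring described in this section -->
  <xsl:import href="jats-html-class-att.xsl"/>

  <!-- No priority attribute is needed: import precedence puts this template
       ahead of every class-att template in the imported stylesheet. -->
  <xsl:template match="p[.//disp-formula | .//inline-formula]"
                mode="class-att" as="attribute(class)?">
    <xsl:attribute name="class" separator=" ">
      <!-- tokens produced by the imported templates, if there are any -->
      <xsl:next-match/>
      <!-- computed token, appended after the existing ones -->
      <xsl:sequence select="'has-math'"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

The template that renders p in default mode remains untouched.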

Making the DocBook to HTML Conversion More Extensible

Let’s modify the admonition template of section “DocBook to HTML”, using a newly introduced m:extra-class-values mode:

<xsl:template match="db:note|db:important|db:warning|db:caution|db:tip|db:danger">
  <xsl:choose>
    <xsl:when test="$admonition.graphics">
      <xsl:apply-templates select="." mode="m:graphical-admonition"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="extra-class-values" as="xs:string*">
        <xsl:apply-templates select="." mode="m:extra-class-values"/>
      </xsl:variable>
      <div>
        <xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
                                f:html-extra-class-values(., $extra-class-values))"/>
        <xsl:call-template name="t:titlepage"/>
        <div class="admonition-body">
          <xsl:apply-templates/>
        </div>
      </div>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

But that’s not the final state of optimization for greater maintainability. Let’s take this change back and call an f:html-extra-class-values() function that takes the context node as its single argument:

        <xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
                                f:html-extra-class-values(.))"/>

The function was previously declared as follows:

<xsl:function name="f:html-extra-class-values" as="xs:string?">
  <xsl:param name="node" as="element()"/>
  <xsl:sequence select="f:html-extra-class-values($node, ())"/>
</xsl:function>

<xsl:function name="f:html-extra-class-values" as="xs:string?">
  <xsl:param name="node" as="element()"/>
  <xsl:param name="extra" as="xs:string*"/>

  <xsl:variable name="classes" as="xs:string*">
    <xsl:if test="$node/@role">
      <xsl:sequence select="tokenize($node/@role, '\s+')"/>
    </xsl:if>
    <xsl:if test="$node/@revision">
      <xsl:sequence select="concat('rf-', $node/@revision)"/>
    </xsl:if>
    <xsl:sequence select="$extra"/>
  </xsl:variable>

  <xsl:if test="exists($classes)">
    <xsl:sequence select="string-join(distinct-values($classes), ' ')"/>
  </xsl:if>
</xsl:function>

The class attributes can only be overridden in a context-dependent way if a different $extra argument is passed in each context. If we want to use the @condition tokens for the @class attribute in caution but not in the other admonition elements, we need to insert a conditional switch in the admonition template or clone this template and modify the clone only for caution.

The following refactoring will replace the single-argument invocation and also implement the current behavior of the two-argument invocation for admonitions:

<xsl:template match="*" mode="m:extra-class-values">
  <xsl:apply-templates select="@*" mode="#current"/>
</xsl:template>

<xsl:template match="db:note | db:important | db:warning | db:caution | db:tip | db:danger"
  mode="m:extra-class-values" as="xs:string*">
  <xsl:sequence select="'admonition'"/>
  <xsl:next-match/>
</xsl:template>

<xsl:template match="@*" mode="m:extra-class-values"/>

<!-- as="xs:string*" because an empty or whitespace-only @role yields no tokens -->
<xsl:template match="@role" mode="m:extra-class-values" as="xs:string*">
  <xsl:sequence select="tokenize(.)"/>
</xsl:template>

<xsl:template match="@revision" mode="m:extra-class-values" as="xs:string">
  <xsl:sequence select="concat('rf-', .)"/>
</xsl:template>

<xsl:function name="f:html-extra-class-values" as="xs:string*">
  <xsl:param name="node" as="element()"/>
  <xsl:variable name="tokens" as="xs:string*">
    <xsl:apply-templates select="$node" mode="m:extra-class-values"/>
  </xsl:variable>
  <xsl:sequence select="distinct-values($tokens[normalize-space()])"/>
</xsl:function>

This is more code than it was initially, but if additional class tokens are needed in certain contexts, the additional templates in the importing stylesheet are much simpler:

<xsl:template match="db:caution/@condition" mode="m:extra-class-values">
  <xsl:sequence select="tokenize(.)"/>
</xsl:template>

The important change that makes previously inflexible functions or named templates versatile is to let a template match in a dedicated mode from within the function or named template body.
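Suppressing a token works in the same way. The sketch below removes a hypothetical role token 'internal' from the class list of caution elements only. It assumes that the refactored templates shown above reside in the imported stylesheet (the file name is hypothetical) and that the m prefix is bound to the namespace that the DocBook stylesheets use for modes.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:db="http://docbook.org/ns/docbook"
  xmlns:m="http://docbook.org/xslt/ns/mode"
  exclude-result-prefixes="xs db m">

  <!-- Hypothetical file name of the refactored DocBook to HTML stylesheet -->
  <xsl:import href="docbook-html-refactored.xsl"/>

  <xsl:template match="db:caution/@role" mode="m:extra-class-values"
                as="xs:string*">
    <!-- tokens as computed by the imported @role template -->
    <xsl:variable name="orig" as="xs:string*">
      <xsl:next-match/>
    </xsl:variable>
    <!-- drop the unwanted token, keep the others in their original order -->
    <xsl:sequence select="$orig[not(. = 'internal')]"/>
  </xsl:template>

</xsl:stylesheet>
```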

Improving the TEI to HTML Conversion

The monolithic template makeRendition in section “TEI to HTML” can be refactored in a similar way. Inside the template, or inside other monolithic functions called from there, such as tei:processRendition(), the context element will be transformed in a dedicated mode. In order to indicate that the refactored function (or the named template) corresponds to the dedicated mode, the mode’s name may be identical to the function name. This is no technical requirement, but the author recommends that you follow this convention.

Functions that do other things than generating class attributes may also be refactored in this way. In particular tei:isInline(), a function that accepts an element as its argument and decides whether this element is inline or block-level, can replace its 128 xsl:when branches with matching templates in mode="tei:isInline" while still keeping the same function signature.

This way, elements whose @rend attribute has values other than 'display' or 'block' can be declared block elements in a customization, without redefining this 140-line function. This xsl:when clause:

<xsl:when test="tei:match(@rend,'display') or tei:match(@rend,'block')">false</xsl:when>

will become:

<xsl:template match="*[tei:match(@rend,'display') or tei:match(@rend,'block')]" mode="tei:isInline">
  <xsl:sequence select="false()"/>
</xsl:template>

and can be extended with:

<xsl:template match="*[tei:match(@rend,'list-item')]" mode="tei:isInline">
  <xsl:sequence select="false()"/>
</xsl:template>

in an importing stylesheet.
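
With such templates in place, the body of tei:isInline() could shrink to a single mode hook. The following is only a sketch: the parameter name, the xs:boolean return type, and the catch-all default of true() are assumptions made for illustration, not the actual TEI stylesheet code.

<xsl:function name="tei:isInline" as="xs:boolean">
  <xsl:param name="element" as="element()"/>
  <!-- mode hook: the decision is delegated to templates in mode="tei:isInline" -->
  <xsl:apply-templates select="$element" mode="tei:isInline"/>
</xsl:function>

<!-- assumed default (formerly xsl:otherwise); the real default may differ -->
<xsl:template match="*" mode="tei:isInline" as="xs:boolean">
  <xsl:sequence select="true()"/>
</xsl:template>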

Note

In order to mimic the previous xsl:choose/xsl:when behavior, it might be necessary to add explicit priorities to some of the matching templates in tei:isInline mode.
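
As a sketch of what this could look like: in xsl:choose, the first xsl:when whose test succeeds wins, whereas among matching templates the highest priority wins, and patterns of the form *[…] all share the default priority 0.5. If an element could match both of the following rules, an explicit priority restores the original order. The 'inline' value and the priority number are hypothetical; the second rule is the one shown above, repeated for comparison.

<!-- stands for an xsl:when that came first in the original xsl:choose -->
<xsl:template match="*[tei:match(@rend,'inline')]" mode="tei:isInline" priority="2">
  <xsl:sequence select="true()"/>
</xsl:template>

<!-- stands for a later xsl:when; keeps the default priority of 0.5 -->
<xsl:template match="*[tei:match(@rend,'display') or tei:match(@rend,'block')]" mode="tei:isInline">
  <xsl:sequence select="false()"/>
</xsl:template>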

Mapping Element Names

This is an example of a LaTeXML to TEI conversion. Most source elements can be mapped to target elements in a linear fashion. The mapping may be either coded into a function with many case switches, or it can be done by matching templates:

<xsl:template match="*" mode="latexml2tei">
  <xsl:variable name="new-name" as="xs:string">
    <xsl:apply-templates select="." mode="latexml2tei-new-name"/>
  </xsl:variable>
  <xsl:element name="{$new-name}">
    <xsl:apply-templates select="." mode="latexml2tei-style"/>
    <xsl:apply-templates select="@*" mode="#current"/>
    <xsl:if test="self::p">
      <xsl:apply-templates select="../@xml:id" mode="#current"/>
    </xsl:if>
    <xsl:apply-templates mode="#current"/>
  </xsl:element>
</xsl:template>

Note

Creating a function somenamespace:latexml2tei-new-name() that transforms the element argument in mode="latexml2tei-new-name" would make storing the template output in a variable dispensable. We didn’t think about this when we wrote the stylesheet in 2018.
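
Such a function might look like the following sketch. The prefix my is a placeholder for whatever namespace the stylesheet binds for its own functions; it is not part of the original stylesheet.

<xsl:function name="my:latexml2tei-new-name" as="xs:string">
  <xsl:param name="elt" as="element(*)"/>
  <!-- mode hook: the name is computed by templates in latexml2tei-new-name mode -->
  <xsl:apply-templates select="$elt" mode="latexml2tei-new-name"/>
</xsl:function>

The main template could then create the element directly, without the intermediate variable:

<xsl:element name="{my:latexml2tei-new-name(.)}">
  <!-- attributes and content as before -->
</xsl:element>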

Some sample templates in these modes:

<xsl:template match="enumerate" mode="latexml2tei-new-name">
  <xsl:sequence select="'list'"/>
</xsl:template>

<xsl:template match="enumerate" mode="latexml2tei-style">
  <xsl:attribute name="rend" select="'numbered'"/>
</xsl:template>

<xsl:template match="enumerate/item" mode="latexml2tei">
  <xsl:apply-templates mode="#current"/>
</xsl:template>
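
Elements without a dedicated rule in these modes would otherwise fall through to the built-in template rules, which descend into the children instead of returning a name or attributes. Catch-all rules such as the following could serve as defaults. They are an assumption for illustration and are not taken from the original stylesheet.

<!-- assumed default: keep the source element's local name -->
<xsl:template match="*" mode="latexml2tei-new-name" as="xs:string">
  <xsl:sequence select="local-name()"/>
</xsl:template>

<!-- assumed default: no additional style attributes -->
<xsl:template match="*" mode="latexml2tei-style"/>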

Refactoring Monolithic Functions in a Hub to BITS Conversion

Another example of refactoring a monolithic function into something finer-grained is taken from a Hub-to-BITS conversion library [hub2bits]. (Hub XML is le-tex’s DocBook-derived intermediate XML format. For this example, it can be assumed to be equivalent to DocBook.)

One of several element name mapping functions, jats:part-submatter(), returns the target BITS element name for a given DocBook context element. (The namespace prefix is jats although the target format is BITS and although JATS doesn’t have a namespace anyway; this is because this library is also used for DocBook-to-JATS conversions, and we simply use xmlns:jats="http://jats.nlm.nih.gov" as a prefix for functions, keys, and modes related to any of the JATS family vocabularies. We do this because XSLT function names need to be namespaced and we love namespaces. There, we said it.) Before refactoring, the function was defined as follows:

<xsl:function name="jats:part-submatter" as="xs:string">
  <xsl:param name="elt" as="element(*)"/>
  <xsl:choose>
    <xsl:when test="name($elt) = ('title', 'info', 'subtitle', 'titleabbrev')">
      <xsl:sequence select="'book-part-meta'"/>
    </xsl:when>
    <xsl:when test="name($elt) = ('toc')">
      <xsl:sequence select="'front-matter'"/>
    </xsl:when>
    <xsl:when test="name($elt) = ('bibliography', 'glossary', 'appendix', 'index')">
      <xsl:sequence select="'back'"/>
    </xsl:when>
    <xsl:when test="name($elt) = 'section' and $elt[matches(dbk:title/@role, $jats:additional-backmatter-parts-title-role-regex)]">
      <xsl:sequence select="'back'"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="jats:book-part-body($elt/..)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

After refactoring, it’s as simple as:

<xsl:function name="jats:part-submatter" as="xs:string">
  <xsl:param name="elt" as="element(*)"/>
  <xsl:apply-templates select="$elt" mode="jats:part-submatter"/>
</xsl:function>

The templates in mode="jats:part-submatter" have been created in order to provide the previous functionality, only in a much more extensible way:

<!-- additional advantage over xsl:choose in the function body with test="name($elt) = ('title', …)": 
     all the flexibility of matching patterns -->
<xsl:template match="dbk:title | dbk:info | dbk:subtitle | dbk:titleabbrev" mode="jats:part-submatter" as="xs:string">
  <xsl:sequence select="'book-part-meta'"/>
</xsl:template>

<!-- this way, we can handle front matter appendices quite elegantly: --> 
<xsl:template match="dbk:toc | dbk:appendix[following-sibling::dbk:chapter | following-sibling::dbk:part]" 
  mode="jats:part-submatter" as="xs:string">
  <xsl:sequence select="'front-matter'"/>
</xsl:template>

<xsl:template match="dbk:bibliography | dbk:glossary | dbk:appendix |
                     dbk:section[matches(dbk:title/@role, $jats:additional-backmatter-parts-title-role-regex)]" 
              mode="jats:part-submatter" as="xs:string">
  <xsl:sequence select="'back'"/>
</xsl:template>

<!-- previously xsl:otherwise: -->
<xsl:template match="*" mode="jats:part-submatter" as="xs:string">
  <xsl:sequence select="jats:book-part-body(..)"/>
</xsl:template>
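
An importing stylesheet can now add or override individual mappings without touching the function. As a hypothetical customization (not part of the library), colophons could be sent to the back matter like this:

<!-- hypothetical rule in an importing stylesheet -->
<xsl:template match="dbk:colophon" mode="jats:part-submatter" as="xs:string">
  <xsl:sequence select="'back'"/>
</xsl:template>

Because of import precedence, this rule wins over the catch-all match="*" rule of the imported stylesheet.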

The css:content Template

We at le-tex have been using the approach presented in this paper for some years now in our JATS/BITS→HTML, Hub/DocBook→HTML, TEI→HTML, and Hub/DocBook→JATS/BITS conversions.

In the beginning we wrote templates to convert attributes in the css namespace (@css:font-weight, @css:background-color, etc. [CSSa]) into

  • potentially a class attribute, and

  • potentially a style attribute.

We then bundled them with other mappings and wrappings related to CSS attributes into a template called css:content [css:content]. It is called in order to transform attributes and nodes for almost any element that is not metadata. (The css prefix with xmlns:css="http://www.w3.org/1996/css" is used for historic reasons, because we were primarily dealing with the @css:* attributes.)

The only thing that this template doesn’t do is to map the source element name to a target name and to create the target element, or to unwrap the source element (and to ignore the generated attributes in case of unwrap).

This source element is transformed in a class-att mode in order to compute the class attribute. The @css:* attributes are transformed in a mode hub2htm:css-style-overrides; all CSS attributes that are not discarded by this mode will be put together, semicolon-separated, into an HTML @style attribute. (For the DocBook→JATS conversion, these remaining attributes will be either discarded, transformed in another way, or copied verbatim if allowed by a tweaked target schema.)

The additional steps that css:content performs are

  • create remaining attributes (copied verbatim or transformed)

  • create wrapper elements (b, i, sub, sup, …)

  • create other elements from attributes (generate a[@id] from def-item/@id when going from JATS to HTML’s unwrapped dt, dd sequences, for example)

  • transform the nodes in the #current mode

  • make sure that the attributes are written to the result before the other nodes.

Whether a given attribute should lead to a wrapper element in the output is determined by transforming the attributes in the special mode css:map-att-to-elt:

<xsl:template match="@css:font-style[. = ('italic', 'oblique')]" mode="css:map-att-to-elt" as="xs:string?">
  <xsl:sequence select="$css:italic-elt-name"/>
</xsl:template>

The global variable $css:italic-elt-name is defined as 'i' in the HTML-generating stylesheet and it is overridden to 'italic' in the customization that is used to create JATS/BITS from DocBook/Hub. If transformation of the attributes generates a sequence ('i', 'b'), then the transformed content will be wrapped like this: <b><i>content</i></b> and the wrapping-inducing CSS attributes will be removed from the attributes to be transformed.
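
The nesting step can be pictured with a small recursive named template. This is a minimal sketch under stated assumptions, not the code of css:content: the template name my:wrap, its parameters, and the my prefix are hypothetical. It makes the last name in the sequence the outermost wrapper, so that ('i', 'b') yields <b><i>content</i></b> as described above.

<xsl:template name="my:wrap">
  <xsl:param name="elt-names" as="xs:string*"/>
  <xsl:param name="content" as="node()*"/>
  <xsl:choose>
    <!-- no wrapper names left: emit the already transformed content -->
    <xsl:when test="empty($elt-names)">
      <xsl:sequence select="$content"/>
    </xsl:when>
    <!-- create the outermost wrapper and recurse with the remaining names -->
    <xsl:otherwise>
      <xsl:element name="{$elt-names[last()]}">
        <xsl:call-template name="my:wrap">
          <xsl:with-param name="elt-names" select="$elt-names[position() lt last()]"/>
          <xsl:with-param name="content" select="$content"/>
        </xsl:call-template>
      </xsl:element>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>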

This template makes heavy use of these dedicated token-generating modes for different purposes. The versatility of this approach is underlined by the fact that it could be adapted to transformations between different vocabularies with minimal customization, while giving the stylesheet customizer full control over context-dependent class/style attribute creation and wrapper generation (not to speak of the rest of the transformation that happens in whatever mode is #current, over which the stylesheet customizer retains almost full control).

Final Thoughts

Functions or Named Templates?

Functions and named templates have been treated interchangeably in this paper so far. It should be noted though that functions should only be used when they help avoid redundancy in XPath expressions (and only if repeated evaluation of them in matching patterns won’t slow down template matching).

Many of the functions used in the DocBook (section “DocBook to HTML”) and TEI (section “TEI to HTML”) rendering stylesheets can be replaced with named templates, or, in the spirit of this paper, with matching templates. If they need not be called in XPath expressions, functions should rather be written as named templates in the first place.

The reason is tunneling. We didn’t see much tunneling in the examples in this paper.

On a non-public project, the author recently needed to filter out columns of tables in BITS. The cells should not be discarded, rather, a class token 'discarded' should be added to them so that users can toggle the display of these discarded cells. The column numbers were calculated according to some criteria taken from thead/th and passed to the transformation of the whole table as tunneled integer parameters. A fact that not every XSLT developer knows: Even when switching modes, from normal document transformation to class-att mode, tunneled parameters will be passed on. This made creation of the 'discarded' tokens a very lightweight endeavor. This wouldn’t have been possible if the class attributes had been created using functions, unless the cell matching templates caught the tunneled parameter and passed it to the function. This would have necessitated that the function accept such a parameter, which is unlikely for generic functions that create class attributes like the ones we have seen.
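
The project code is not public, so the following is only a sketch of how such a setup might look. The parameter name discarded-cols, the th[@content-type = 'internal'] criterion, and the assumption that the imported stylesheet already provides a lower-priority rule for td in class-att mode are all hypothetical. Column spans are ignored for brevity.

<!-- compute the column numbers once per table and tunnel them down -->
<xsl:template match="table">
  <xsl:next-match>
    <xsl:with-param name="discarded-cols" as="xs:integer*" tunnel="yes"
      select="for $th in thead/tr[1]/th[@content-type = 'internal']
              return count($th/preceding-sibling::th) + 1"/>
  </xsl:next-match>
</xsl:template>

<!-- the tunneled parameter is still available after the switch to class-att mode -->
<xsl:template match="td" mode="class-att" as="xs:string*">
  <xsl:param name="discarded-cols" as="xs:integer*" tunnel="yes"/>
  <!-- assumes a lower-priority rule for td exists in this mode -->
  <xsl:next-match/>
  <xsl:if test="(count(preceding-sibling::td) + 1) = $discarded-cols">
    <xsl:sequence select="'discarded'"/>
  </xsl:if>
</xsl:template>

None of the templates between the table rule and the class-att rule needs to declare or forward the parameter.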

Naming the Approach

Although it has been shown that there are more use cases for this design pattern than creating class attributes, one could call it “the class-att approach.” Other candidates are “auxiliary modes approach”, “micromode approach”, or “breakout mode approach.”

On the other hand, isn’t what this approach does just common sense? Writing monolithic functions or templates that lack context-dependent customization hooks might qualify as an antipattern, but will doing the opposite merit being called a pattern?

Maybe one can call the xsl:apply-templates hooks that calculate something small, like a string, a token list, or an attribute, in a dedicated mode from within a formerly monolithic function or template “mode hooks”, and the dedicated modes such as class-att may be called “hook modes”. Then one XSLT developer can tell another: “You should refactor this function so that it only has a mode hook inside, and then do the lifting in distinct matching templates in the hook mode.”

Caveats

Sometimes there are several levels of customization, and different XSLT developers might be responsible for maintaining these levels.

If the “hook mode” approach is chosen for a customizable stylesheet, then the people who adapt (import) this stylesheet need to take care not to mix other approaches with the hook mode rules.

They should avoid something like this:

<xsl:template match="license-p[@content-type = 'foo']">
  <p class="license-foo">
    <xsl:apply-templates/>
  </p>
</xsl:template>

If you import their stylesheet and try to modify the resulting @class in mode="class-att", nothing will happen. XSLT developers might get frustrated if they cannot rely on this hook mode mechanism because intermediate imports spoiled it. Use of this approach will then erode more and more with each customization level and with each new customization they create on top of “mode hook” methodology stylesheets. Therefore these auxiliary modes and the hooks for creating class attributes, element names, etc., should be documented in the basic stylesheets.
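
A hook-friendly counterpart to the template above leaves element creation to the basic stylesheet and only contributes the token. This is a sketch that assumes class-att mode returns class tokens as strings:

<xsl:template match="license-p[@content-type = 'foo']" mode="class-att" as="xs:string*">
  <xsl:sequence select="'license-foo'"/>
</xsl:template>

A stylesheet that imports this customization can still match the same element in mode="class-att" and add, replace, or suppress tokens.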

Conclusion

The mode hook/hook mode approach, if used in basic stylesheets, can help the people who import these stylesheets avoid redundancy.

Many existing stylesheets can be refactored without too many changes to the function or template signatures.

It remains to be seen whether people consider the methodology described here as a known pattern that only lacked a name. If this paper helped name this pattern, it is a success. If it makes people adapt their basic stylesheets accordingly, it will be beneficial for people who need to customize off-the-shelf stylesheets.

References

[Lumley Kay 2015] Lumley, John, and Kay, Michael. Improving Pattern Matching Performance in XSLT. XML London 2015. https://www.saxonica.com/papers/xmllondon-2015jl.pdf. doi:https://doi.org/10.14337/XMLLondon15.Lumley01.

[Piez 2010] Piez, Wendell. Fitting the Journal Publishing 3.0 Preview Stylesheets to Your Needs: Capabilities and Customizations. In: Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010. Bethesda (MD): National Center for Biotechnology Information (US); 2010. https://www.ncbi.nlm.nih.gov/books/NBK47104/ [accessed 2020-07-02].

[Graham 2014] Graham, Tony. Formatting JATS: as easy as 1-2-3. In: Journal Article Tag Suite Conference (JATS-Con) Proceedings 2013/2014. Bethesda (MD): National Center for Biotechnology Information (US); 2014. https://www.ncbi.nlm.nih.gov/books/NBK189779/ [accessed 2020-07-02].

[Kay XSLT 2.0] Kay, Michael. XSLT 2.0 and XPath 2.0 Programmer’s Reference, 4th edition. John Wiley & Sons, 2008.

[DocBook XSLT 2.0] Tovey-Walsh, Norman, Kosek, Jiří, et al. DocBook XSLT 2.0 Stylesheets. https://github.com/docbook/xslt20-stylesheets [accessed 2020-07-02].

[TEI XSL] Rahtz, Sebastian, et al. TEI XSL Stylesheets. https://github.com/TEIC/Stylesheets [accessed 2020-07-02].

[TEI gloss] TEI Consortium. Reference page for <gloss>. In P5: Guidelines for Electronic Text Encoding and Interchange. https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-gloss.html [accessed 2020-07-02].

[JATS XSL] Various contributors. JATS Preview Stylesheets. https://github.com/ncbi/JATSPreviewStylesheets [accessed 2020-07-02].

[XSLT 3 priority] Default Priority for Template Rules. In: Kay, Michael (ed.). XSL Transformations (XSLT) Version 3.0. W3C Recommendation 8 June 2017. https://www.w3.org/TR/xslt-30/#dt-default-priority [accessed 2020-07-02].

[Saxon] Saxonica. Saxon XSLT Processor. http://www.saxonica.com/products/products.xml [accessed 2020-07-02].

[XSLT 3 precedence] Stylesheet Import. In: Kay, Michael (ed.). XSL Transformations (XSLT) Version 3.0. W3C Recommendation 8 June 2017. https://www.w3.org/TR/xslt-30/#dt-import-precedence [accessed 2020-07-02].

[hub2bits] Imsieke, Gerrit, Pufe, Maren, et al. hub2bits XSLT/XProc library. https://github.com/transpect/hub2bits/commit/7c45174 [accessed 2020-07-02].

[CSSa] Imsieke, Gerrit. Conveying Layout Information with CSSa. In: XML Prague Proceedings 2013. https://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf#page=73 [accessed 2020-07-02].

[css:content] Imsieke, Gerrit, et al. hub2html XSLT/XProc library. https://github.com/transpect/hub2html/blob/master/xsl/css-atts2wrap.xsl [accessed 2020-07-02].

Gerrit Imsieke

Gerrit is managing director at le-tex publishing services, a mid-size preprint services, data conversion, and content management software company in Leipzig, Germany. A physicist by training, he entered the field of scientific publishing during his graduate studies. He is responsible for XML technologies at le-tex. He is a member of the NISO STS Standing Committee and of the XProc 3.0 working group.