Balisage logo

Proceedings

Approximate CSS Styling in XSLT

John Lumley

jωL Research

Saxonica

Balisage: The Markup Conference 2016
August 2 - 5, 2016

Copyright © 2016 jωL Research Ltd. All rights reserved.

How to cite this paper

Lumley, John. “Approximate CSS Styling in XSLT.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). DOI: 10.4242/BalisageVol17.Lumley01.

Abstract

This paper discusses transforming a CSS stylesheet into an XSLT transform that projects an approximation of the styling from the CSS onto a target XML document. It was developed during several XSLT-based projects involving multi-dialect XML documents, where there was a need either to evaluate CSS properties for another external tool, such as in an HTML → XSL-FO → PDF pipeline, or where a document styling needed to be fixed for embedding in another document, such as examples in professional papers. The paper presents examples, explains the general architecture of the generated XSLT transform, discusses how that transform is itself constructed from the CSS stylesheet and outlines the strengths and weaknesses and some of the directions in which the tool could be developed. It is approximate in that it only supports some of the core CSS features, assumes the user is skilled in the art and is working with CSS stylesheets that are understood and visible, and that the execution speed of the CSS projection is not an issue. Nevertheless, in the author's experience the ability to mix CSS styling into the XSLT researcher's toolbox has proved to be of some utility.

Table of Contents

Why?
A Simple Example
Similar work
The execution model
Simple (and approximate) XSLT model
@style property
Generated content
Counters
Using accumulator as counters
Drawbacks, shortcomings and workrounds
Relative properties
Inheritance
Namespaces
Style provenance and importance
Restriction to applicable elements
Constructing the XSLT transform
Additional features
A larger example
Possible developments and Conclusion
Robust CSS parsing
Generic program translation
Conclusion
Acknowledgements

Why?

In the current world of the web, much of the textural styling of XML-based documents is defined by declarations within Cascading Style Sheets (CSS) CSS. These encourage a model where XML elements are defined as members of classes, and the CSS stylesheets declare patterns to match elements of given type, class, position, ancestry, siblings and progeny, for which a set of values for properties (e.g. font, colour …), pre- and post-ambles (numbering, text), layout type (block, none, inline.…) and possible geometry (size, positioning...) are provided. Applicable patterns are chosen through a relatively simple specificity and supersessional model. For many simple presentational views of XML documents an agent who implements CSS can produce acceptable results. Many XML dialects (e.g SVG, XHTML) describe properties for many of their elements that are manipulable through CSS styling, as an alternative to or overriding direct attributive properties.

Different views of the same document can be projected by binding different CSS stylessheets. For example a 'table of contents' summary could be generated by suppressing display (display : none) of any element that is not a header or a descendant of a header. CSS works because for most cases it can provide a comparatively simple, and relatively easily understood, model that mere mortals can work with. But generally CSS has one major drawback for providing views of complex documents – it cannot, save for deletion or minor addition of textual/image content, alter the topology of the tree result, such as moving a caption element from the start to the end of the adjacent table.

Note

It is possible in CSS to indicate geometric requirements, through directives such as float:right or position:absolute, which can alter the presentational order of components, but these are intimately tied up with the CSS layout model, often supported in incompatible ways and usually require both craft and measures to support multi-browser robustness.

Another feature is that CSS styling is through inheritance – it is not universally possible to 'block' application within a subtree; only by introducing subtree-scope overriding rules (which will still define properties) can a section of a document be protected from other CSS rules. (Mechanisms to revert to a parent-inherited value or the default initial value are being defined in later CSS versions (3, 4), but browser support is inconsistent.)

In more complex document creation from XML sources, XSLT3.0 provides vastly greater flexibility, by being effectively a full-blown (and higher-order capable) programming language with an XML tree as one of its principal datatypes. It has an arbitrary capability of manipulating a document and styling the result (e.g. through properties described at element attributes.)

Necessity is the Mother of Invention – over recent years the author has had several occasions where document styling was available in CSS, but an XML document being processed in a full-blown XSLT setting needed some of the CSS styling information applied within an XSLT execution. One option would be to manually construct equivalent XSLT libraries (templates, attributes etc.) which corresponded to the CSS intention and apply these to the source. This would of course be subject to coherence drift having to keep two styling masters in synchronisation. Another option was to see whether some equivalent XSLT could be generated from the given CSS which would detemine the same properties for the document in question, leaving the results principally as attributes on the result tree.

There are a number of cases where the effect of CSS styling might need to be evaluated before some final projection or display. Examples are:

  • When the effect of a particular CSS needs to be fixed on a subsection of a document, such that conflict with a wider scope CSS stylesheet is avoided. For example, an embedded SVG component within an XHTML page could be styled by its own CSS stylesheet. This could be in conflict with that used (or externally imposed) for the HTML page. In this case for document consistency we wish to fix the properties for the SVG. Similarly for example in the preparation of a paper for a conference that is styled by CSS, which needs to present embedded examples that are styled by another CSS[1] it can be helpful to project the localised effects completely, and keep the conference organisers from fiddling with the author's intent.

    Note

    Recent changes in the Balisage publication process, moving away from a final XHTML publication to one in PDF, has reduced some of the self-referential amusement this paper might offer the reader. However, if this had been presented at Balisage 2015, humour would have been restored.

  • The eventual processor for the document doesn't support CSS styling directly. For example XSL-FO doesn't support references to CSS resources, though it does use many attributive properties taken from the CSS2 model.

The goal of this work is to attempt to project as much of the effect of a CSS stylesheet onto a an XML document as possible, leaving the effects as either grounded property values as attributes on relevant elements, or modest sections of added content.

Note

The author is well aware that CSS is evolving (descriptions of parts of CSS3 date back to 2001, the CSS2.2 spec is being continually re-drafted), handled to differing degrees of compliance by a variety of tools, and that extensions are numerous. This work is not indented to build a definitive CSS → XSLT converter (hence the title Approximate), but show how a basic converter can be constructed and permit limited, but common, CSS styling to be projected in an XSLT environment. In the process, if I play somewhat fast and loose with the CSS standard(s), it is partly because in practice that is how CSS appears to be.

In this paper I will first present a very small SVG example styled by a CSS and show what an equivalent XSLT transform could be. Possible XSLT mechanisms that should support a variety of CSS rules are then discussed, followed by how the transform itself is generated. A larger example of its use (styling an ACM paper) is then presented. Finally I consider possible developments of the technique and draw some conclusions.

A Simple Example

Suppose we have a very simple SVG document:

Figure 1: An unstyled SVG document

<svg xmlns="http://www.w3.org/2000/svg" width="250" height="150">
  <g alignment-baseline="baseline">
    <rect x="20" y="20" width="200" height="100"/>
    <rect class="a" x="100" y="5" height="140" width="40"/>
  </g>
</svg>
SVG image ../../../vol17/graphics/Lumley01/Lumley01-001.svg

Now we want to improve the attractiveness of the offering, by styling it with an externally defined CSS stylesheet:

Figure 2: CSS styling unapplied in an embedded document

* { /* rule 1 */
    fill: red;
    stroke: black;   
}
rect.a { /* rule 2 */
    fill: blue;
    fill-opacity: 0.5;
}
SVG image ../../../vol17/graphics/Lumley01/Lumley01-002.svg

(If you are viewing this directly in an HTML browser, where the SVG is referenced by an img element, as opposed to the mediaobject in the original DocBook source, then you won't see the pretty colours – this is one of the issues this work is addressing!)

SVG extends the CSS property vocabulary for its own specific presentational properties, such as fill. In line with the don't change topology philosophy of CSS, these do not cover alterations to geometry, i.e. we can't change the size of one of the rectangles through a CSS stylesheet. If we now reference this SVG document from within another, such as an HTML rendering of this paper, the CSS styling probably won't come through[2]. A much more robust approach would be to fix the styling on the SVG and embed it directly:

Figure 3: CSS styling grounded onto an embedded example

<svg xmlns="http://www.w3.org/2000/svg" width="250" height="150"
  fill="red" stroke="black">
  <g alignment-baseline="baseline" fill="red" stroke="black">
    <rect x="20" y="20" width="200" height="100" 
      fill="red" stroke="black"/>
    <rect class="a" x="100" y="5" height="140" width="40"  
      fill="blue" stroke="black" fill-opacity="0.5"/>
  </g>
</svg>
SVG image ../../../vol17/graphics/Lumley01/Lumley01-003.svg

This paper is about how, and to what extent, this projection of a CSS stylesheet onto an XML document can be performed in an entirely XSLT execution environment.

Similar work

The problem of incorporating CSS into the XSL-FO mix has been addressed by a specific product – CSSToXSLFO. http://www.re.be/css2xslfo/index.xhtml which is specifically targetted to generated styled XSL-FO structures corresponding to the box-model of layout implicit in CSS. Internally it uses the Flute Java-based CSS Parser to generate an internal (Java) rule structure. This structure is then applied across the (arbitrary) input XML document to create a DOM tree with additional namespaced attributes or elements (e.g. @css:text-align, css:before) which define the appropriate property as determined by the CSS. This intermediate document is then converted to suitable XSL-FO by a generic XSLT stylesheet matching these determined CSS directives attached to elements.

Externally my approach is perhaps similar, in that the CSS sheet is parsed into an action that determines CSS-defined styling properties and additional content, though these are used to overwrite any existing properties of the same name, and no further conversion is attempted. Internally it is quite different, developing a complete parse tree for the CSS (using regular expressions in XSLT), which is used to generate a suitable XSLT program.

The execution model

A single simple CSS rule declaration has the following generic form:

selector-pattern { property : value; property : value; … } 

where for elements matching the selector pattern, values for the given style properties are declared. Such property values might be superseded or modified by properties defined in other rules with more specific matches to the given element. In this way, stylesheets can cascade their effects. For example a rule h1 {font-size: 130%;} in a generic stylesheet merely declares that h1 elements should be rendered in a 30% larger font than whatever is the default. A more specific CSS sheet can define what that default font size should be or even force a specific font size on the h1 element.

The selector patterns, whilst not as rich as XSLT's generic XPath patterns, do have enough expressiveness to cover ancestry, class membership and identity reference, sibling position, descendant contents, attribute values and so forth. Pseudo-element selectors permit styling portions of the tree that are not direct elements, such as adding preamble via the :before selector.

Most properties are simple stylistic scalar values (e.g. color:), but some relate to box-model display control (display:) and others declare additional content or counter variables, which are used to support numbering. Later versions have started to incorporate declarations about column placement, flow targets etc.

The precedence model between these CSS declarations is comparatively simple – a specificity model for a given property examines patterns which would effect it and score them in a hierarchy on the numbers of identity attribute, other attribute and element name references in the pattern. Specific @style attributes win out and the universe of declarations is also split according to source (user agent, user and author) and an importance declaration (!important).

Simple (and approximate) XSLT model

There are a number of possible architectures that could be used. In this section I discuss one where there is some direct correspondence between the CSS rules and the XSLT templates in the equivalent stylesheet – the approach the author took originally. Later on I'll discuss some other possibilities for the overall design.

I'll start by examining a very simple case, using the example CSS stylesheet above. According to the CSS specificity rules, when rule 2 is matched, then the @fill property would be set to 'blue' as it is more specific than the (unclassed) rule 1. Technically then we should compute the fill property for each element independently of the other properties, with reference to these rules.

However, if for the moment we make some assumptions about the sort of CSS stylesheets being employed, we can build a fairly simple XSLT transform with the following templates that would have the same effect:

Figure 4

<xsl:template match="*">
  <xsl:copy>
    <xsl:sequence select="@*"/>
    <xsl:apply-templates select="." mode="css"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
<xsl:template match="*" priority="1.5" mode="css"/>
<xsl:template priority="1.501" mode="css" match="*">
  <xsl:next-match/>
  <xsl:attribute name="fill">red</xsl:attribute>
  <xsl:attribute name="stroke">black</xsl:attribute>
</xsl:template>
<xsl:template priority="1.502" mode="css" match="rect[tokenize(@class,'\s+')='a']">
  <xsl:next-match/>
  <xsl:attribute name="fill">blue</xsl:attribute>
  <xsl:attribute name="fill-opacity">0.5</xsl:attribute>
</xsl:template>

Simple cascading templates

(using a default XPath namespace for SVG, see section “Namespaces” ) which generates

<svg xmlns="http://www.w3.org/2000/svg" width="250" height="150" fill="red" stroke="black">
     <g alignment-baseline="baseline" fill="red" stroke="black">
        <rect x="20" y="20" width="200" height="100" 
              fill="red" stroke="black"/>
        <rect class="a" x="100" y="5" height="140" width="40"  
              fill="blue" stroke="black" fill-opacity="0.5"/>
    </g>
</svg>

which is the desired outcome.

We use XSLT templates to describe the entirety of the property applications within each CSS rule, rather than using the set of rules to look-up a required property in turn. This works by using the feature that XSLT can overwrite the values of element attributes while still effectively constructing the element node itself rather this its children – the last written with a given name to the element head tag becomes the value in the result. This we arrange by ensuring that the lowest precedence rules fire first followed by the successively higher and higher precedence (specificity) rules in turn, giving every one of our template rules defined and different priorities and using the xsl:next-match instruction to chain through to the lower priorities for earlier evaluation.

This simple general technique makes the following assumptions which seem to be approximately good enough for many common uses of CSS[3]:

  • Evaluating matching rules for all properties en masse for a given element produces an equivalent end result to lookup of each relevant property through the rule structure in turn. It can be dependent on the fact that many final user agents will (should?) silently ignore additional extraneous (and usually attributive) information. For example the transform above, because of the * matching, would also add @fill and @stroke attributes to any desc or title elements (or their descendants) in an SVG document, even though such properties aren't defined for those meta-documental components within SVG[4]. How this problem might be alleviated will be described in a later section.

  • No use is made of relative properties, i.e. where a property changes dependent upon that of some parent. In this case each rule can be considered in isolation. Later I will discuss how this issue might be solved.

To generate these rules first we need to determine equivalent XSLT patterns that will match elements according to the CSS selector grammar. Some examples are shown in the following tables:

Table I

Some basic CSS pattern equivalents

CSSXSLT pattern
**
elementelement
.class*[tokenize(@class,'\s+')='class']
#id*[@id='id']
E > FE/F
E FE//F
E - FF[preceding-sibling::E]
E + FF[(preceding-sibling::*)[1]/self::E]

Table II

Some CSS predication pattern (selector) equivalents

CSSXSLT patternNotes
pattern[attr]XSLT pattern[exists(@attr)]
pattern[attr=value]XSLT pattern[@attr=value]A suitable typed binding of value to an XPath literal (e.g. 1, 2, 'three' ...) will be needed.
pattern:nth-child(even)XSLT pattern[position() mod 2 = 0]
pattern:nth-child(odd)XSLT pattern[position() mod 2 = 1]
pattern:nth-child(digit)XSLT pattern[position() = digit]

@style property

As well as supporting grounded attributive properties (@fill, @stroke-width) many XML dialects permit a conglomerate @style attribute, whose value is a list of property-value pairs. Some dialects (e.g. XHTML) only permit direct document-borne styling though this means. To make this compound style attribute list, we can use substantially the same mechanism with the following modifications to the code:

Figure 5

<xsl:template match="*">
  <xsl:copy>
    <xsl:sequence select="@*"/> 
    <xsl:variable name="style-att" as="attribute()*">
      <xsl:apply-templates select="." mode="css"/>
    </xsl:variable>
    <xsl:sequence select="$style-att"/>
    <!-- By writing onto an element, we ensure that same-name attributes overwrite -->
    <xsl:variable name="h" as="element()">
      <H>
        <xsl:sequence select="$style-att"/>
      </H>
    </xsl:variable>
    <xsl:if test="exists($style-att)">
      <xsl:attribute name="style" select="string-join(($h/@* ! (name() || ':' || .)), ';')"/>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
<xsl:template match="*" priority="1.5" mode="css"/>
<xsl:template priority="1.501" mode="css" match="*">
  <xsl:next-match/>
  <xsl:attribute name="fill">red</xsl:attribute>
  <xsl:attribute name="stroke">black</xsl:attribute>
</xsl:template>
<xsl:template priority="1.502" mode="css" match="rect[tokenize(@class,'\s+')='a']">
  <xsl:next-match/>
  <xsl:attribute name="fill">blue</xsl:attribute>
  <xsl:attribute name="fill-opacity">0.5</xsl:attribute>
</xsl:template>

Using style attributes

We write the found attributes onto a temporary element H so that the last values win, then we sweep up all the attributes into the property/value list. (If we only want the @style property then the xsl:sequence select="$style-att" instruction can be suppressed. This could be controlled on a source-language property of the overall process.)

Generated content

CSS selectors can also include qualifiers (pseudo-elements) that add (textual) content to a given element. This is often used to support list labelling or numbering of headers or running entities such as figures or tables. (Sadly this isn't a feature of SVG, so instead of illustrating with pictures, I'll have to use dull XHTML instead.) As a very simple example:

Figure 6: Some additional content on XHTML elements

*{ font-family: arial, helvetica;}
h1, h2 { color: red; }
h1:before {content: "HEADING: " open-quote; }
h1.vital:before {content: "VITAL: " open-quote; }
h1:after {content:  close-quote; }
<body>
  <h1>First h1 heading</h1>
  <p>Paragraph</p>
  <h2>First h2 heading</h2>
  <p>Paragraph</p>
  <h1 class="vital">Second h1 heading</h1>
  <p>Paragraph</p>
  <h2>Second h2 heading</h2>
  <p>Paragraph</p>
</body>
image ../../../vol17/graphics/Lumley01/Lumley01-004.png

The :before and :after selectors target effectively the beginning and end respectively of the contained content of the given element, using the content property (which can either be text, source resource URI, smart quotes, counters (see later) or the value of an attribute on the source element) as the content to insert. [Technically in CSS3 pseudo-elements such as these must use a double-colon :: prefix to distinguish from pseudo-classes, but :before and :after are permitted in backwards compatibilty for CSS1 and CSS2. See Selectors Level 3].

To process these we have a parallel set of templates that process the overall body of the element, collecting before and after content sections in turn being passed as tunnelled variables of xs:string? type. (Content is as far as I'm aware not additive, so more specific content overrules less specific.) The match patterns are the same as those used for any normal styling for that element.

Table III

<xsl:template match="*" mode="body">
  <xsl:param name="before" tunnel="yes"/>
  <xsl:param name="after" tunnel="yes"/>
  <xsl:copy>
    <xsl:apply-templates select="@*" mode="#current"/>                  
    <xsl:apply-templates select="." mode="css"/>                 
    <xsl:value-of select="$before"/>
    <xsl:apply-templates mode="#current"/>
    <xsl:value-of select="$after"/>
  </xsl:copy>
</xsl:template>

<xsl:template priority="1.501" mode="body" match="h1">
  <xsl:param name="before" as="item()?" select="()" tunnel="yes"/>
  <xsl:variable name="content" as="item()*">
    <xsl:text>HEADING: </xsl:text>
    <xsl:text>‟</xsl:text>
  </xsl:variable>
  <xsl:next-match>
    <xsl:with-param name="before"
      select="($before,string-join($content))[1]" tunnel="yes"/>
  </xsl:next-match>
</xsl:template>
<xsl:template priority="1.502" mode="body"
  match="h1[tokenize(@class,'\s+')='vital']">
  <xsl:param name="before" as="item()?" select="()" tunnel="yes"/>
  <xsl:variable name="content" as="item()*">
    <xsl:text>VITAL: </xsl:text>
    <xsl:text>‟</xsl:text>
  </xsl:variable>
  <xsl:next-match>
    <xsl:with-param name="before"
      select="($before,string-join($content))[1]" tunnel="yes"/>
  </xsl:next-match>
</xsl:template>

<xsl:template priority="1.503" mode="body" match="h1">
  <xsl:param name="after" as="item()?" select="()" tunnel="yes"/>
  <xsl:variable name="content" as="item()*">
    <xsl:text>”</xsl:text>
  </xsl:variable>
  <xsl:next-match>
    <xsl:with-param name="after"
        select="($after,string-join($content))[1]" tunnel="yes"/>
  </xsl:next-match>
</xsl:template>

Counters

CSS uses the same mechanism to declare, modify and interpolate the values of counters, which become important in styling sectioned documents.

Figure 7: XSLT equivalents for numbered counters

...
body {counter-reset: h1;}
h1:before{
  counter-increment: h1;
  content: counter(h1) ".  " open-quote;
  counter-reset: h2;
}
h2:before{
  counter-increment: h2;
  content: counter(h1) "." counter(h2)".  " open-quote;
}
h1:after, h2:after{
  content: close-quote;
}

<body>
  <h1>First h1 heading</h1>
  <p>Paragraph</p>
  <h2>First h2 heading</h2>
  <p>Paragraph</p>
  <h1 class="vital">Second h1 heading</h1>
  <p>Paragraph</p>
  <h2>Second h2 heading</h2>
  <p>Paragraph</p>
</body>
image ../../../vol17/graphics/Lumley01/Lumley01-005.png

When the counter names are the same as the elements they are counting, and same-name elements don't nest, an equivalent structure for the h2:before rule can be built in the XSLT by altering the content to:

<xsl:variable name="content" as="item()*">
  <xsl:variable name="last.reset" select="()[last()]"/>
  <xsl:value-of select="count(if(exists($last.reset)) then preceding::h1[. &gt;&gt; $last.reset] else preceding::h1) + 0"/>
  <xsl:text>.</xsl:text>
  <xsl:variable name="last.reset" select="((preceding::h1))[last()]"/>
  <xsl:value-of select="count(if(exists($last.reset)) then preceding::h2[. &gt;&gt; $last.reset] else preceding::h2) + 1"/>
  <xsl:text>. </xsl:text>
  <xsl:text>“</xsl:text>
</xsl:variable>

which counts back through the relavent elements to the last element which reset the given counter[5]. Obviously more complex code can be developed to handle arbitrary counter names, and within an XSLT3.0 environment accumulators may offer a more elegant solution.

Using accumulator as counters

XSLT3.0 introduced accumulators to support the processing or retaining data from the source tree for later use within streamed processing. They have parallels with xsl:key, but rather more flexibility, and can be considered as a map from source tree nodes to an arbitrary value, which is computed by a walk through the tree computing a pre-descent and a post-descent value for each node. In the absence of any applicable rule that changes the value of the accumulator, the value remains constant. Thus we could compute the values of the h1 and h2 counters above (which increment by 1) with the following:

<xsl:accumulator name="h1" initial-value="0">
 <xsl:accumulator-rule match="h1" phase="start" select="$value + 1"/>
</xsl:accumulator>
<xsl:accumulator name="h2" initial-value="0">
 <xsl:accumulator-rule match="h1" phase="start" select="0"/>
 <xsl:accumulator-rule match="h2" phase="start" select="$value + 1"/>
</xsl:accumulator>

and we retrieve the value of the counters by accumulator-before('h1') and accumulator-before('h2') respectively. Note that the h2 rule gets its value reset to 0 by any h1 element, as declared by the h1 rule. CSS defines that numbering will nest for descendants using the same counter. This can be accomodated for a nested example like ol/li.. ol/li by using the accumulator to hold a stack of values and popping the stack on exit from the list:

<xsl:accumulator name="list" initial-value="()">
 <xsl:accumulator-rule match="ol" phase="start" select="0, $value"/>
 <xsl:accumulator-rule match="ol" phase="end"
   select="if(ancestor::ol) then tail($value) else $value"/>
 <xsl:accumulator-rule match="li" phase="start" select="head($value) + 1,tail($value)"/>
</xsl:accumulator>

This uses the post-descent phase of processing an ol to return to the previous level if it is within an outer ol and the value is now accessed with head(accumulator-before('list')).

Drawbacks, shortcomings and workrounds

The currrent technique of course falls well short of handling all CSS constructs, but there are some drawbacks and shortcomings that might be susceptible to different approaches:

Relative properties

Some numeric properties within a rule in CSS, especially font-size, can be defined to be relative to the default or already-determined value for that property, such as h2 {font-size: larger;} or .superscript {font-size: 50%;} The current technique would merely write this as the attribute's value, which would probably be an (invalid) property value. In order to support this it will be necessary to determine any computed value of that property for the parent.

In XSLT of course only the source tree can be examined directly by XPath – result trees need to be bound to local variables to examine those, and doing that from a child into the parent is tricky and fraught. This suggests that computed properties for the parent need to be passed down to child processing, probably in the form of tunnelled variables. A generic possibility is to pass down a childless, but attributed, instance of the altered element as a parameter for processing the children. In XSLT3.0 a map() would be a much more suitable mechanism, using something like:

<xsl:template match="span[tokenize(@class, '\s+') = 'a']" 
  priority="1.5" mode="css:properties" as="map(*)">
 <xsl:param name="css:properties" as="map(*)" tunnel="yes"/>
 <xsl:variable name="lower-props" as="map(*)">
  <xsl:next-match/>
 </xsl:variable>
 <xsl:variable name="new-props" as="map(*)">
  <xsl:map>
   <xsl:map-entry key="'color'" select="'blue'"/>
   <xsl:map-entry key="'font-size'" select="$css:properties('font-size') * 0.70"/>
  </xsl:map>
 </xsl:variable>
 <xsl:sequence select="map:merge(($lower-props, $new-props))"/>
</xsl:template>

<xsl:template match="span[tokenize(@class, '\s+') = 'a']" 
   priority="1.5" mode="css:apply" as="attribute()*">
 <xsl:variable name="props" as="map(*)">
  <xsl:apply-templates select="." mode="css:properties"/>
 </xsl:variable>
 <xsl:attribute name="color" select="$props('color')"/>
 <xsl:attribute name="font-size" select="$props('font-size')"/>
</xsl:template>

where the mode css:properties collects a map of the properties (relying on map entries overwriting), which can involve properties from inheritance or less specific patterns, and css:apply looks up the requested appropriate values and returns as attributes. Alternatively in XSLT3.0, accumulators, which compute across the tree, could use multiple rules to track the values. For example with the following accumulator:

<xsl:accumulator name="font-size" initial-value="12">
 <xsl:accumulator-rule match="div[tokenize(@class, '\s+') = 'small']" select="10"/>
 <xsl:accumulator-rule match="span[tokenize(@class, '\s+') = 'a']" select="$value * 0.7"/>
</xsl:accumulator>

accumulator-before('font-size') executed on a div.small span.a would yield the value 10.

Inheritance

The property value keyword inherit can be used to declare that the value for the given property is the same as that for its parent. Thus * { color: inherit } means that all descendants of a node have the colour property of the closest ancestor that defines a colour[6]. Any solution to support relative properties would also be able to support such inheritance. Again using a tunneled variable holding such inheritance values may be be suitable.

Namespaces

Most CSS stylesheets are used in a namespace-ignorant manner: patterns match local names of elements. Of course XSLT is far from such ignorance and patterns should be suitably namespace-applicable. This can either be done by using the local name with wildcarded namespace (*:element-name) or when the document is in a single namespace, declaring the default XPath namespace, which is used in the examples above. In CSS3 there are mechanisms to define default and prefixed namespaces (@namespace svg "http://www.w3.org/2000/svg";, svg|circle { ... }) so these could be detected and converted to XSLT trivially.

Style provenance and importance

Styling rules can come from a number of sources: the rendering agent, a user-imposed stylesheet, an author-declared stylesheet and also attached to @style properties on a document element itself. In addition an important indicator can preclude further overriding of a given property. These would all have influence on the precedence order between rules and associated templates, though !important applies only to a property, not to a rule. This would require properties to be processed individually rather than as at present, en masse.

Restriction to applicable elements

As suggested in section “Simple (and approximate) XSLT model”, the generated XSLT patterns are somewhat catholic, or even perhaps cavalier, in their applicability. Rather than an element of, say, type foo only requiring properties a and b, and in CSS looking through the ruleset to determine if any definitions for these two properties exist, the XSLT approach means that the rules impose themselves on the elements. Thus a CSS rule * {a:1; b:2; c:3;} would impose the (attributive) property c onto foo elements, even if that property were irrelevant to the foo type. In our SVG example above SVG meta-document elements such as desc or title could be so styled with irrelevant colouring or fonting. When tools behave as good XML citizens this shouldn't be a problem, though conceivably a property could be imposed (or overwritten) as an attribute that was not stylable through CSS according to the domain semantics.

If this is a problem, then a possible solution would be to exploit restrictions from any schema available for the target documents. The SVG schema for example restricts the permitted attributes on desc to the following: @id, @xml:base,@xml:lang, @xml:space, @class, @content and @style. This it would be possible to guard the application of other properties within a * { … } pattern with xsl:if test="not(self::desc)". A better and more coherent option would be to use the schema actively either in the run-time or the compile time. This is especially so given the structured nature of many schemas – in the case of the SVG schema large attributeGroups partition the sets of applicable properties into useful chunks.

Note

Sadly SVG 1.1 only has a DTD – the schema is from an earlier 2002 version.

Another drawback is that our CSS rules are applied regardless of whether the property involved is CSS-manipulable within the target language. For example we could write a rule:

rect.large {
  width: 200;
}

which would be translated into the simple template:

<xsl:template priority="..." mode="css" match="rect[tokenize(@class,'\s+')='large']">
  <xsl:next-match/>
  <xsl:attribute name="width">200</xsl:attribute>
</xsl:template>

the consequence of which would be to set the width attribute of large rect elements. However in SVG, unlike for example in XHTML, no direct geometry properties are susceptible to CSS styling (see SVG's styling properties), and normally the directive would be ignored, so in our (XSLT-styled) case we'd get an incorrect, not just approximate, result. This could be solved by a target-language-specific filtering of the CSS rules. Note that using the SVG schema or DTD alone to determine applicability would not be sufficient – @width is a rather crucial component of svg:rect!

Constructing the XSLT transform

In this section I'll discuss how the XSLT transform to perform the CSS styling is constructed. As this tool was built up incrementally as needed, some parts are certainly not as complete as they eventually should be.

The first step of this process is to parse the CSS stylesheet(s) to produce an XML equivalent parse tree, from which code equivalents can be generated through normal XSLT push-processing methods. It should be possible of course to use some form of full-blown parser, perhaps one generated by Gunther Rademacher's excellent parser generator REx REx[7]. Another option would be to build on the example of Invisible XML described by Pemberton Pemberton2013 which uses parsing CSS as an example. I'll return to these possibilities in the conclusions.

However, in the somewhat ad hoc environment within which this tool was originally used, and given the assumption that the complexity of the CSS stylesheets being processed is understood well and that content is friendly (e.g. strings don't contain { or }), compound use of regular expressions has sufficed so far.

CSS stylesheets can import other stylesheets through @import placed at the directives at the head of a file – the semantics of the collection are flat, so a recursive inclusion sweep through the implied URL network concatentating text will yield a collection of statements that are in the correct document order.

After removing comments, the text of the CSS is split into separate rules by tokenization against a \}\s* pattern. Further regular expression tokenization gives us for each rule the selector text and a sequence of property/value pairs. The selector text is then parsed using a set of regular expressions (operating within an xsl:analyze-string instruction) to produce a modified @match attribute containing a suitable XPath equivalent. For extensibility these regular expressions are bound to variables – the following section of the converter shows how they're used[8]:

<xsl:variable name="elementIdP">\w+#[\w\-]+</xsl:variable>
<xsl:variable name="idP">#[\w\-]+</xsl:variable>
<xsl:variable name="elementClassP">\w+\.[\w\-_\.]+</xsl:variable>
<xsl:variable name="classP">\.[\w\-_\.]+</xsl:variable>
 
<xsl:variable name="regex"
            select="string-join(($elementIdP,$idP,$elementClassP $classP),'|')"/>
            
<pattern match="{.}">
  <xsl:attribute name="match">
    <xsl:analyze-string select="." regex="{$regex}">
      <xsl:matching-substring>
        <xsl:choose>
          <xsl:when test="matches(.,$elementIdP)">
            <xsl:value-of select="substring-before(.,'#')||'[@id='||$apos||substring-after(.,'#')||$apos||']'"/>
          </xsl:when>
          <xsl:when test="matches(., $idP)">
            <xsl:value-of select="'*[@id='||$apos||substring-after(.,'#')||$apos||']'"/>
          </xsl:when>
          <xsl:when test="matches(., $elementClassP)">
            <xsl:value-of select="substring-before(.,'.')||jwl:classPredicates(.)"/>
          </xsl:when>
          <xsl:when test="matches(., $classP)">
             <xsl:value-of select="'*'||jwl:classPredicates(.)"/>
          </xsl:when>
          ...
        </xsl:choose>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:attribute>
  <xsl:sequence select="$properties"/>
</pattern>

These regular expressions are used twice – firstly within an overall match (when they have been joined as a union) and then individually in specific tests for particular components – the order of these choices can be used to match more specific cases before more general, such as a named element with class before a generic class. These pattern structures not only can contain the selector pattern and any implied properties, but also any other qualifiers, such as being a :before pseuo-element. As an example:

<pattern match="*">
  <xsl:attribute name="font-family">arial, helvetica, sans-serif</xsl:attribute>
</pattern>
<pattern match="body">
  <xsl:attribute name="counter-reset">h1</xsl:attribute>
</pattern>
<pattern match="h1">
  <xsl:attribute name="color">red</xsl:attribute>
</pattern>
<pattern match="h2">
  <xsl:attribute name="color">red</xsl:attribute>
</pattern>
<pattern match="h1" action="before">
  <xsl:attribute name="counter-increment">h1</xsl:attribute>
  <xsl:attribute name="content">counter(h1) ". " open-quote</xsl:attribute>
  <xsl:attribute name="counter-reset">h2</xsl:attribute>
</pattern>
<pattern match="h2" action="before">
  <xsl:attribute name="counter-increment">h2</xsl:attribute>
  <xsl:attribute name="content">counter(h1) "." counter(h2)". " open-quote</xsl:attribute>
</pattern>
<pattern match="h1" action="after">
  <xsl:attribute name="content">close-quote</xsl:attribute>
</pattern>
<pattern match="h2" action="after">
  <xsl:attribute name="content">close-quote</xsl:attribute>
</pattern>

(Note that these pattern elements contain xsl:attribute instructions, which will then be copied into the final templates of the transform.) We now have a sequence of all the rules within the CSS. They should then be sorted according to their CSS specificity rank (a four-level vector), which can be calculated while the pattern elements are being constructed, e.g.

p.footnote → 
  <pattern match="p[tokenize(@class,'\s+')='footnote']" 
     specificity="0,0,1,1">....
section#bibliography > p.header → 
  <pattern match="section[@id='bibliograph']/p[tokenize(@class,'\s+')='header']" 
     specificity="0,1,1,2">....

This sorted set of rule descriptions is then used to generate an XSLT template (as shown in Figure 4) in mode css for each one, with a successively increasing priority and a generic form of executing lower priority matches first through an xsl:next-match chain followed by writing the relevant attribute properties.

With the equivalent stylesheet formed, it can either be emitted as a result, to be used externally, or within an XSLT3.0 environment, it can be used as the source stylesheet for a transform() function invocation to provide the styling to some candidate XML within the encompassing stylesheet itself.

Additional features

Apart from selectors, property values and counters, CSS also defines a number of other styling features for which equivalent XSLT can be generated comparatively easily. These include:

text-transform

CSS supports models for transforming the case of text content of an element with the property text-transform that supports four options (lowercase | uppercase | capitalize | none). This is supported by passing any text-transform property that has been determined when processing an element's body as a parameter to the match for text() nodes, and altering (some) characters of the text string value if required.

content: open-quote | close-quote

The content property can contain reserved tokens for inserting quotation marks, which can nest, and whose non-default values are taken from a property quotes which contains a list of open and close pairs for successive nesting levels. (Currently these are merely mapped to “ and ” respectively, with no nesting considered.)

A larger example

A more realistic example is the authoring/production process for a professional paper to be published eventually in PDF format. The publisher, in this case the ACM, provides a style guide, and templates in Word or LaTeX. In this case the paper source, much of which might be autogenerated (including self-laying-out SVG sections) was in neither of those formats, but in an extended and well-formed XHTML. The publication route, which had been developed over several years, involved a mapping from XHMTL to an extended SVG/FO format (which included layout directives), then to a fully grounded SVG and thence to PDF with a SVG→PDF converter[9].

Originally the paper sources were created in a standard XML editor and the ACM styling was imposed through XSLT templates and attribute sets that were used to form up the equivalent document content and layout directives. This was reasonable, but the editing front-end was all angle brackets. With the advent of styled authoring packages, such as that in Oxygen, where a document is viewed and edited through a CSS stylesheet, it became possible to edit the document with most elements styled reasonably – e.g. paragraphs have correct fonting, headings look similar to those that will be in the final document, including section numbering, lists are seen as lists etc[10] and angle brackets are rarely seen!

It took little time to write the ACM-equivalent CSS for the main components of an XHTML report, so the paper authoring became much easier. Then we can use the techniques described above to ground the style of the XHTML according to the CSS, and remove the attribute-set styling from the XHTML→SVG/FO step. In the process of doing this of course, we increase the generality of the solution. Textural styling (as opposed to layout/geometric) can now be completely defined from the CSS.

This is a part of the original source, where there are some additional element declarations (embed supports adding multiple images, declaring their relative positioning and sizing), and attribute properties (@lay:float declares that this element, in this case this entire div, can always float forward (next page, next column..) if it won't fit in remaining available space.)

<div lay:float="always" lay:id="d48e871">
  <figure id="sE2" height="free">
    <embed src="simpleExtend-2.svg?type=tree simpleExtend-3.svg" xpath="(.//svg:g[@id='main']/*,./*)[1]"
       stack-offset="50" TYPE="image/svg-xml" gap-y="5" height="20" columns="2" box="no" lay:same.scale="true"/>
  </figure>
  <p class="caption">Figure 6. A continual variation of Figure 5</p>
</div>
<p>The retained instructions (<code>xsl:for-each</code>...) are considered to be a null element
   as far as layout is concerned, will not be displayed in an SVG viewer, have no influence on
   geometry and will be copied in place in the resulting resolved flow<sup class="fnIndex">8</sup>.</p>
<p class="footnote" lay:footnote="footnote">
  <sup class="fnIndex">8</sup>This is not so for topologically-modifying layouts, 
  such as sort-by-size but any  eventual (graphical) results from these instructions
  would be placed correctly on resolving the parent's layout.</p>
<p>There are a small number of possible types of propagation 
   that may be declared in an <code>@R:retain</code> attribute:</p>
<ul>
  <li>
    <code>true</code> - execute the instruction and always retain.</li>
  <li>
    <code>while[<em>(expression)</em>], until[<em>(expression)</em>] </code>- execute the instruction;
    retain while/until the given XPath expression is true.</li>
</ul>

These are some relavent sections from the ACM style CSS:

*        { font-family:Times-Roman; font-size:9pt; }
p        { padding-bottom:3pt; text-align:justify; hyphenate:true; }
dl,ul,ol { margin-left:12pt; }
li       { text-align:justify; hyphenate:true; padding-bottom:3pt; }
code     { font-family:Courier; font-size:8pt; font-weight:bold; }
sup      { vertical-align:4pt; font-size:6pt; }
footnote { font-size:8pt; font-style:italic; }

And this is what it looks like when viewing the combination in Oxygen's Author mode.

Figure 8: ACM styling in a document authoring mode

image ../../../vol17/graphics/Lumley01/Lumley01-006.png

When we transform the CSS into XSLT and apply it, we now get:

<div lay:float="always" lay:id="d48e871" style="font-family:Times-Roman;font-size:9pt">
  <figure id="sE2" height="free" style="font-family:Times-Roman;font-size:9pt">
    <embed src="simpleExtend-2.svg?type=tree simpleExtend-3.svg"
       xpath="(.//svg:g[@id='main']/*,./*)[1]" stack-offset="50" TYPE="image/svg-xml"
       gap-y="5" height="20" columns="2" box="no" lay:same.scale="true"
       style="font-family:Times-Roman;font-size:9pt"/>
  </figure>
  <p class="caption" style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt; text-align:center;
     hyphenate:true;margin-top:5pt;font-weight:bold">Figure 6. A continual variation of Figure 5</p>
</div>
<p font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;
   padding-bottom:3pt;text-align:justify;hyphenate:true">The retained instructions (
   <code style="font-family:Courier; font-size:8pt;font-weight:bold">xsl:for-each</code>
   ...) are considered to be a null element as far as layout is concerned, will not be displayed
   in an SVG viewer, have no influence on geometry and will be copied in place in the resulting resolved flow
   <sup class="fnIndex" style="font-family:Times-Roman;font-size:6pt;vertical-align:4pt">8</sup>.</p>
<p class="footnote" lay:footnote="footnote"
   style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt;text-align:justify;hyphenate:true">
   <sup class="fnIndex" style="font-family:Times-Roman;font-size:6pt;vertical-align:4pt">8</sup>
   This is not so for topologically-modifying layouts, such as sort-by-size 
   but any eventual (graphical) results from these instructions would be placed correctly
   on resolving the parent's layout.</p>
<p font-family="Times-Roman"
   style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt;text-align:justify;hyphenate:true">
   There are a small number of possible types of propagation that may be declared in an
   <code font-family="Courier" style="font-family:Courier;font-size:8pt;font-weight:bold">@R:retain</code>
   attribute:</p>
<ul font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;margin-left:12pt">
  <li font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;text-align:justify;
      hyphenate:true;padding-bottom:3pt">
    <code font-family="Courier" style="font-family:Courier;font-size:8pt;font-weight:bold">true</code> 
    - execute the instruction and always retain.</li>
                          ...
</ul>

which then gets converted to SVG and ultimately a PDF looking like this:

Figure 9: The final publication to ACM standards in PDF

image ../../../vol17/graphics/Lumley01/Lumley01-007.png

Note

The observant will notice that the footnote styling alters. footnote was an artificial element used for authoring, whence the italics aided viewing in the styling editor. During a phase where such additional constructs are expanded to suitable XHTML equivalents, the footnote was converted into a sup index, placed at the same location and a p @class="footnote", paragraph, containing the relevant index reference, immediately following the enclosing paragraph. Such footnote bodies are extracted during pagination to build from the bottom of a column.

Not only has this technique been used on a variety of professional papers, it has also been employed to generate PDF product data sheets for a well-known XSLT engine supplier, and indeed the entire styling of the author's PhD thesis.

Possible developments and Conclusion

This paper is reporting on work in progress and as such there are a number of areas which would need development for black box use, some of which have been introduced earlier. The most pressing, providing robust parsing of the CSS, is reported in the next subsection. Others that await development are:

  • Supporting important (!important) declarations – these need to be separated from the humdrum, which means that rules may need to be split into two declaration sections and those that are important ranked in precedence (and thus eventual XSLT template priority) above all that are not.

  • Providing a route for inheritance and relative property declarations. As implied earlier, a mechanism using tunneled variables (xsl:param name="font-size" tunnel="yes") could be used to pass through computed values for parent elements in the recursive processing of the tree, from whence the relative value for the child can be computed. This will be complicated further by the need to process in multi-unit arithmetic (150% * 14pt), in a discrete universe (larger x-small) or establishing what are the base values in the absence of any CSS stylesheet setting (e.g HTML font size 3).

Robust CSS parsing

Using a robust CSS parser (operating entirely in XSLT perhaps, or as a CSS/SAC-based XSLT extension function) producing an XML tree would be advantageous and certainly needed for any full-scale automated production use. As an example, using an approximate CSS declaration grammar along with a grammar for the selectors, written in EBNF, we can use REx to generate an appropriate parser and obtain a raw parse tree for a CSS stylesheet, where element names are those of the appropriate productions.

These trees tend to be large, as noted by Pemberton and reduced forms are much more convenient. As in Lumley2014 it is a simple XSLT task to refine these trees to the essentials, by discarding most one-child elements in favour of their children, and migrating many of the operators to attributes. So for example we can parse the following into the accompanying trees[11].

Table IV

A reduced CSS parse tree

div.fred > pre.code {
    font-size: 70%;
}
.alf, #bert {
    stroke-width: 33;
}
div#fred svg|p.code.emphasis {
    font-size: 25pt;
}
<ruleset>
 <selectors_group>
  <selector>
   <element name="div">
    <class name="fred"/>
   </element>
   <combinator type="&gt;"/>
   <element name="pre">
     <class name="code"/>
   </element>
  </selector>
 </selectors_group>
 <proplist>
  <declaration property="font-size">
   <percentage value="70"/>
  </declaration>
 </proplist>
</ruleset>
<ruleset>
 <selectors_group>
  <selector>
   <class name="alf"/>
  </selector>
  <selector>
   <id id="bert"/>
  </selector>
 </selectors_group>
 <proplist>
 <declaration property="stroke-width">
  <number value="33"/>
 </declaration>
 </proplist>
</ruleset>
<ruleset>
 <selectors_group>
  <selector>
   <element name="div">
    <id id="fred"/>
   </element>
   <combinator type="space"/>
   <element name="p" prefix="svg">
    <class name="code"/>
    <class name="emphasis"/>
   </element>
  </selector>
 </selectors_group>
 <proplist>
  <declaration property="font-size">
   <dimension value="25" unit="pt"/>
  </declaration>
 </proplist>
</ruleset>

Note

The ideas of Pemberton on defining a grammar which specifically marks what is to be written to the final parse tree, and how, is quite attractive as it would automate the refinement phase, which is currently written in (simple) XSLT. It might even be tractable to use an Invisible XML grammar definition and project it into two forms – an EBNF to use for a parser generator, and a set of instructions (XSLT templates?) for parse tree refinement.

From these structures it is comparatively easy to either form the same pattern structures as shown earlier, or produce a new template-generation strategy. Here are an intermediate form and the resultant XSLT structures:

Table V

Intermediate form and final XSLT sections

<ruleset pattern="div[tokenize(@class,'\s+')='fred']
   /pre[tokenize(@class,'\s+')='code']"
   specificity="0,0,2,2">
 <selector pattern="div[tokenize(@class,'\s+')='fred']
   /pre[tokenize(@class,'\s+')='code']" specificity="0,0,2,2">
  <element name="div" pattern="div[tokenize(@class,'\s+')='fred']"
      specificity="0,0,1,1">
    <class name="fred" pattern="[tokenize(@class,'\s+')='fred']"
        specificity="0,0,1,0"/>
  </element>
  <combinator type="&gt;" pattern="/"/>
  <element name="pre" pattern="pre[tokenize(@class,'\s+')='code']"
      specificity="0,0,1,1">
   <class name="code" pattern="[tokenize(@class,'\s+')='code']"
        specificity="0,0,1,0"/>
  </element>
 </selector>
 <proplist>
   <declaration property="font-size" xpath="'70%'"/>
 </proplist>
</ruleset>

<ruleset pattern="*[tokenize(@class,'\s+')='alf']"
   specificity="0,0,1,0">
 <proplist>
   <declaration property="stroke-width" xpath="33"/>
 </proplist>
</ruleset>

<ruleset pattern="*[@id='bert']" 
   specificity="0,1,0,0">
 <proplist>
  <declaration property="stroke-width" xpath="33"/>
 </proplist>
</ruleset>

<ruleset pattern="div[@id='fred']
   //svg:p[tokenize(@class,'\s+')='code']
          [tokenize(@class,'\s+')='emphasis']"
   specificity="0,0,2,2">
 <proplist>
  <declaration property="font-size" xpath="'25pt'"/>
 </proplist>
</ruleset>
<xsl:template match="*[tokenize(@class,'\s+')='alf']"
    mode="body.css" as="attribute()*" priority="1.506">
 <xsl:next-match/>
 <xsl:attribute name="stroke-width" select="33"/>
</xsl:template>

<xsl:template match="div[tokenize(@class,'\s+')='fred']
   /pre[tokenize(@class,'\s+')='code']"
    mode="body.css" as="attribute()*" priority="1.507">
 <xsl:next-match/>
 <xsl:attribute name="font-size" select="'70%'"/>
</xsl:template>

<xsl:template match="*[@id='bert']"
    mode="body.css" as="attribute()*" priority="1.508">
 <xsl:next-match/>
 <xsl:attribute name="stroke-width" select="33"/>
</xsl:template>

<xsl:template match="div[@id='fred']
    //svg:p[tokenize(@class,'\s+')='code']
           [tokenize(@class,'\s+')='emphasis']"
    mode="body.css" as="attribute()*" priority="1.509">
 <xsl:next-match/>
 <xsl:attribute name="font-size" select="'25pt'"/>
</xsl:template>

For the first rule the appropriate properties for the children (suitable pattern, CSS specificity) are shown – these are then combined to make the generic pattern and compound specificity for the rule. These rules can then be sorted according to their specificity to produce a declared priority order for the resulting XSLT templates. The second rule has been split into two copies as their patterns have differing specificity and hence must be supported by XSLT rules operating at differing priorities.

Generic program translation

In a more general sense, this work is an example of a generic technique whereby portions of a program P, written in some language Y are used within some other program written in a language Z , by equivalence conversion of PPz. The success of this would be subject to a number of factors:

  • The degree of similarity in executional semantics of Y and Z. In this case CSS and XSLT push mode both work in a pattern-directed manner, with a defineable conflict resolution model[12]with a similar source tree → result tree execution and a common target data type, so most of the main semantics of CSS can be simulated closely by XSLT equivalent models.

    When the languages are not in such close agreement, the ease by which an interpreter for sections of Y can be constructed in Z becomes important. For example a program defined as simple non-cyclic spreadsheet could be interpreted in XSLT by casting as some XML datastructure coupled with an iterative tree-walker that uses a calculator function.

  • The extent (and accuracy) of the semantics of Y required within P. When these are large, significant problems may result. For example, if P requires accurate error handling, then it may well prove intractable to provide the same in the Z environment. In our case, CSS's execution error handling is basically ignore, so this doesn't cause a problem, and we're not pushing the edges of the CSS envelope. Equally we are prepared to push the CSS property onto the target, rather than pull the property from the CSS, giving the possibility of extraneous styling, on the basis that final user agents will ignore such minor infractions, permitting a much simpler and more generic model.

  • The need for execution performance. In general such equivalent-code simulations will always be slower than native implementations, sometimes very much slower.

  • The skill and craft of the program developer, especially where some feature requires a workaround to be supported.

  • The ease with which language Y can be parsed from code in Z. This can range from hand-crafted code (such as using regular expressions as described in section “Constructing the XSLT transform”) to using a specifically targeted parser.

The real advantage of using XSLT for such conversion is that once a parse tree has been established in XML, it is a uniform push-mode process to build other necessary data structures and ancilliary models which can then either be used to generate directly equivalent XSLT or be used by suitable interpreters to execute the required semantics of P These intermediate data structures can be examined easily (they serialize).

Conclusion

This paper has described using a program written in another language (CSS) as a component within an XSLT execution environment, by cross-conversion. The method was originally rather rough-and-ready and used for a single document purpose but has progressively been refined until it might be suitable for (some) semi-production situations, especially in the hands of craftsmen. Though the similarity of the major semantics of CSS and some parts of XSLT (pattern invocation, property projection) meant the necessary XSLT execution model was reasonably simple, there is still potential in the technique for other situations. The use of an XML-centric processing model starting from the source language parse tree and progressing on to intermediate data structures, coupled with XSLT3.0's rich toolbox, makes this field not only productive, but fun. I encourage others to try it.

Acknowledgements

I must thank the several organisations (ACM, University of Nottingham, Saxonica) who've posed document generation challenges that motiviated and exercised this need to employ CSS where it wasn't supported. I'm also grateful for the insightful and supportive comments from the Balisage reviewers who've encouraged me to increase the robustness of the method, both in terms of using a proper CSS parse and in addressing more complex issues especially in style inheritance.

References

[CSS] Bos, Bert et al, Editors. Cascading Style Sheets Level 2 Revision 2 (CSS 2.2) Specification. World Wide Web Consortium, Editors' Draft, 29 March 2016. [online] http://dev.w3.org/csswg/css2/

[CSS3] Çelik, Tantek et al, Editors. Selectors Level 3. World Wide Web Consortium, 29 September 2011. [online] http://www.w3.org/TR/css3-selector/

[Lumley2013] Lumley, John. Functional, Extensible, SVG-based variable documents Proceedings of the 2013 ACM symposium on Document engineering, pp 131-140. doi:10.1145/2494266.2494274. [online] http://dl.acm.org/citation.cfm?doid=2494266.2494274

[Lumley2014] Lumley, John. Analysing XSLT Streamability. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). doi:10.4242/BalisageVol13.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html

[LumleyPhD] Lumley, John. Documents as Functions. University of Nottingham, PhD Thesis. June 2012 [online] http://eprints.nottingham.ac.uk/12631/

[Pemberton2013] Pemberton, Steven. Invisible XML. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:10.4242/BalisageVol10.Pemberton01. [online] http://www.balisage.net/Proceedings/vol10/html/Pemberton01/BalisageVol10-Pemberton01.html

[REx] Rademacher, Gunther. REx Parser Generator. [online] http://www.bottlecaps.de/rex/

[XPath3.0] Robie, Jonathan, Chamberlin, Don, Dyck, Michael, and Snelson, John, Editors. XML Path Language (XPath) 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-30/

[XPath.FO] Kay, Michael, Editor. XQuery and XPath Functions and Operators 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-functions-30/

[XSLT2.0] Kay, Michael, Editor. XSL Transformations (XSLT) Version 2.0 (Second Edition). World Wide Web Consortium, 23 January 2007. [online] http://www.w3.org/TR/xslt20/

[XSLT3.0] Kay, Michael, Editor. XSL Transformations (XSLT) Version 3.0. World Wide Web Consortium, 2 October 2014. [online] http://www.w3.org/TR/xslt-30/



[1] Lector, si exemplum requiris, circumspice

[2] Or if it does, it requires various forms of black art or eldritch wizardry.

[3] At least the half-dozen or so the author has needed, in production of professional papers (ACM, Balisage, XMLLondon ...), product marketing documents and his own PhD.

[4] The author argues in Lumley2013 that such good XML citizen behaviour by processing tools is important for the production of compound document processing architectures. Note that if the content of an SVG desc element contained markup elements (such as in the XHTML namespace) this styling might well effect those components.

[5] Reusing the same XSLT variable name in sibling positions is legal – the scope for each is along a following-sibling::*/descendant-or-self::* compound axis, until overridden.

[6] For some XML dialects this might be true by default. SVG has default interitance for textural styling, even after CSS style application.

[7] Indeed one of the samples for the generator is for CSS selectors.

[8] In XSLT3.0 a set of map() bindings for these regular expressions might increase the coherence.

[9] These papers were all reporting on a highly extensible and highly-adaptable variable document architecture. Not only was it more coherent and robust to use the architecture itself to form the papers, for example supporting in-document evaluation of examples, it was also behoven on the authors to demonstrate that the architecture was capable of producing documents to a high professional standard. Close visual inspection of the resulting PDF can tell it came from neither the Word or LaTeX templates, but the variation is less than between those two forms themselves.

[10] Much as this paper is being authored using the Balisage CSS.

[11] The author is one of those who prefers keeping properties in attributes when they are singular rather than as text values of elements.

[12] Other target languages that might have similar tendencies include Prolog, Mathematica, the Haskell/ML family of functional languages and SNOBOL descendants.

John Lumley

jωL Research

Saxonica

A Cambridge engineer by background, John Lumley created the AI group at Cambridge Consultants in the early 1980s and then joined HPLabs Bristol as one of its founding members. He worked there for 25 years, managing and contributing in a variety of software/systems fields, latterly specialising in XSLT-based document engineering, in which he subsequently gained a PhD in early retirement. He is currently helping develop the Saxon XSLT processor for Saxonica and consulting on various aspects of using XSLT.