How to cite this paper

Kalvesmaki, Joel. “Schematron as XSLT Interface, and Advanced Validation Models.” Presented at Balisage: The Markup Conference 2025, Washington, DC, August 4 - 8, 2025. In Proceedings of Balisage: The Markup Conference 2025. Balisage Series on Markup Technologies, vol. 30 (2025). https://doi.org/10.4242/BalisageVol30.Kalvesmaki01.

Balisage: The Markup Conference 2025
August 4 - 8, 2025

Balisage Paper: Schematron as XSLT Interface, and Advanced Validation Models

Joel Kalvesmaki

Joel Kalvesmaki is a software developer for the United States Government Publishing Office, the creator of the Text Alignment Network, and a scholar specializing in early Christianity.

© Joel Kalvesmaki, released under a Creative Commons Attribution 4.0 International License

Abstract

Schematron (ISO/IEC 19757-3:2008, 2016, 2020) is widely enjoyed as perhaps the most expressive and powerful of all XML schema languages. In this article, I argue that Schematron is profitably seen as a kind of API (application programming interface) for a constrained flavor of XSLT. This outlook has immense benefits, because when we act on that insight, Schematron’s horizons expand to those of XSLT. Anything an XSLT can create can be reported in Schematron, and together they can conquer the world. I present four models of validation that closely partner XSLT and Schematron, and offer suggestions on how that partnership can be strengthened in the future.

Table of Contents

Introduction
Schematron and XSLT
Schematron Implementations
Models of XSLT-Driven Schematron
Model A: Only XSLT Variables and Functions
Model B: Validandum Transformed
Model C: Context-Agnostic Schematron
Model D: Push-Me Pull-You
The Schematron-XSLT Partnership

Introduction

Schematron (ISO/IEC 19757-3:2008, 2016, 2020) is widely enjoyed as perhaps the most expressive and powerful of XML schema languages. Although the claim would be difficult to prove formally, I am confident in asserting that any structural rules expressed in an existing schema (DTDs, RELAX-NG, XSD) can be expressed in a Schematron file.[1] But the reverse is not the case, because Schematron’s query binding language allows for an enormously expressive repetoire of validation checks, e.g., those that engage resources outside the context of the file being validated. (The file to be validated is what the Schematron specifications call an instance to be validated, but for convenience in this article I will call it the validandum.) For example, a Schematron routine can look for other files and use that material to determine whether the content of an attribute or element in the validandum is accurate.

The breathtaking expressiveness of Schematron is seen in the rather self-effacing graphic on the home page of schematron.com:

Graphic from schematron.com, depicting the expressiveness of Schematron
                    relative to three other schema languages.

In reality, the Schematron ellipse should subsume all others, except DTDs, which is capable of doing things impossible for other schema languages.[2]

In this paper, I argue that when Schematron has XSLT as its query language binding, the Schematron file becomes, in effect, a kind of API (application programming interface) for a constrained flavor of XSLT. By adopting and embracing this paradigm, Schematron programmers can go well beyond the merely apparent limits of Schematron and develop sophisticated validation routines. Let XSLT perform the complex analysis, and let Schematron report the results.

After surveying the formal relationship between Schematron and XSLT, I explore the field of Schematron implementations/processors. I then present four different advanced models that pair XSLT with Schematron to do validation that would be inconvenient, difficult, or impossible, were we compelled to stay within the constraints of a Schematron file.

This article is written with the assumption that readers have used Schematron before, and understand the basics, particularly Schematron’s pattern, rule, assert, and report elements. If that is not the case, I invite you to visit https://schematron.com/, learn about the technology, and begin to use and enjoy it.

Schematron and XSLT

Schematron is a standard of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Versions published in 2008 and 2016 were open source and freely accessible, but the 2020 version, which only slightly modified the 2016 version, was withdrawn from free public access, to sharp public criticism (Jeliffe_2020). At the time of this writing, even the 2016 version is no longer accessible at ISO’s Publicly Available Standards page. A new version of the Schematron standard is expected in September 2025. My comments below take into account the 2016 version of Schematron.

Within the specifications, XSLT is but one of many possible query language bindings (QLB). A QLB is required, to declare the language that should be used to provide general queries, to set the context scope for rules, and to express true/false tests for reports and assertions (Schematron_2016 6.4, Siegel 2022a). Among QLBs, XSLT is given a privileged place insofar as XSLT version 1 is the default QLB, if one is not specified.

If XSLT is the QLB, certain XSLT elements are permitted within the Schematron file. For XSLT version 1, <xsl:key> may be used before <sch:pattern> elements.[3] <xsl:copy-of> is also supported, but only without child nodes and only within <sch:property>.

If XSLT version 2 or 3 is chosen, version 1 constructs are allowed, as well as <xsl:function>, again, before any <sch:pattern>.

To an XSLT programmer, the choices made by the authors of the specifications might seem unusual. <xsl:copy-of> is already defined by the XSLT specifications as an empty element, so it is unclear why the Schematron specifications tell us not to give it children nodes. And if the authors of the specifications thought it important to restrict what is or is not allowed in that particular XSLT element, one wonders if there are similar tacit restrictions to <xsl:function> and <xsl:key>. Some decisions seem arbitrary, such as the required position of <xsl:function> and <xsl:key> or why these particular elements and no others were chosen. In particular, as will be argued below, the lack of support for <xsl:import> or <xsl:include> represent a lost opportunity (but the 2025 version will finally seize on that opportunity). Even without such explicit support, there is a workaround, albeit somewhat inconvenient, namely the use of fn:transform(), which is accessible in a Schematron without any XSLT constructs. That is, by invoking fn:transform() (obviously, with an xslt3 or xpath3 QLB) you can access a complete XSLT library without restriction, as this example of a global Schematron variable demonstrates:

<sch:let name="validandum-transformed" value="
        transform(
        map {
            'stylesheet-location': 'validator.xsl',
            'source-node': root()
        }
        )?output"/>

Schematron Implementations

A Schematron implementation is not well defined in the specifications, but the implications are that it has a wide swathe of responsibility.[4] An implementation is required to take a Schematron file, a validandum, and a binding language, and produce a validation report in the form of a Schematron Validation Report Language (SVRL) XML file.

An active list of implementations of Schematron is maintained at https://github.com/Schematron/awesome-schematron. I summarize them here:

  1. Skeleton, developed by Rick Jelliffe, an implementation based on XSLT. An XSLT processor (such as Saxon) takes a Schematron file as primary input and transforms it using the Skeleton XSLT library, resulting in a new XSLT file as output. This new XSLT file is designed to create SVRL reports. The XSLT processor then takes the validandum as primary input and transforms it using the new SVRL-creating XSLT file, resulting in an SVRL report as primary output. How this SVRL report is to be used is determined downstream from the implementation. Skeleton appears to be the basis for Schematron implementations in both XMetaL and Oxygen. When an XML file is validated in Oxygen, messages are prefaced [ISO Schematron], an alias for Skeleton. The Skeleton project was archived and rendered read-only in October 2020: https://github.com/Schematron/schematron.

    Skeleton enjoys numerous wrappers or handlers. Version 1.5 (from 2002) formed the core of the Topologi Schematron Validator, a Windows application. Norm Tovey-Walsh has developed support for Skeleton directly in MarkLogic at ML-Schematron: https://github.com/ndw/ML-Schematron. A Java wrapper for Skeleton called ph-schematron has been written by Philip Helger, https://github.com/phax/ph-schematron. At https://github.com/Schematron there are XQuery modules to use Skeleton in BaseX (https://basex.org/) and eXist databases (http://exist-db.org).

  2. SchXslt, developed by David Maus, an implementation based on XSLT. The process is similar to Skeleton, in that the XSLT processor passes the Schematron file as primary input through the SchXslt XSLT library, resulting in a new SVRL-creating XSLT file as output. The XSLT processor then passes the validandum as primary input through the new SVRL-creating XSLT file, and the result is an SVRL report as primary output, to be processed or used downstream from the implementation. Source code is available at Codeberg: https://codeberg.org/SchXslt/schxslt2. Helger’s ph-schematron (see above) makes SchXslt available in Java.

  3. XQuery for Schematron (XQS), developed by Andrew Sales, restricts itself to Schematron files that use XQuery as the QLB. An XQuery processor applies an XQS XQuery to a Schematron file, resulting in a compiled XQuery script. The XQuery processor then applies this XQuery script to the validandum to generate the SVRL report. Because this is an XQuery-centered implementation, it does not support <xsl:function> or <xsl:key> constructs. Source code is available at Github: https://github.com/AndrewSales/XQS.

  4. PySchematron, developed by Robbert Harms, represents a pure implementation of Schematron within Python. Because of this approach, only XPath expressions are supported, and not XSLT functions. The source code is available at Github: https://github.com/robbert-harms/pyschematron.

  5. Node-schematron, developed by Wybe Minnebo, is a limited Javascript implementation of Schematron. XSLT functions are not supported. Because it relies on fontoxpath (https://www.npmjs.com/package/fontoxpath), XPath 3.1 is supported. Source code is available on Github: https://github.com/wvbe/node-schematron.

The last two implementations, PySchematron and node-schematron, exemplify the classic implementation model. A Schematron file and the validandum go straight into the processor, and out comes an SVRL report, to be handled downstream as the user or implementation wants.

Note, however, the challenges in these two implementations, in that they do not offer full XSLT support.

The first three implementations—Skeleton, SchXslt, and XQS—follow a different model, and have much in common. All of them rely upon an XSLT or XQuery processor to do the actual processing. In none of the implementations is the Schematron file applied directly to a validandum. Rather, the XSLT or XQuery processor uses the implementation to convert the Schematron to an intermediary script (XSLT or XQuery), and then that same processor uses that intermediary script to convert the validandum to an SVRL report. All three implementations use a two-step process, where XSLT or XQuery code works hand-in-hand with an XSLT or XQuery processor. The source implementation code is at the heart of the first step, and its resultant SVRL-creating code is at the heart of the second step. When we think of an XQuery, XSLT, or XProc processor, we think of a single, encapsulated piece of software. But in these three implementations, a Schematron processor is not a single thing. Rather, it is a collaboration between three tools: the implementation code base, an XQuery or XSLT processor, and the compiled code.

We are fortunate to have implementations such as Skeleton and SchXslt, where the underlying XSLT process is open to investigation. A look inside is inspiring, and suggestive of a special, closely mapped relationship between Schematron and XSLT.

Consider the first pass. The XSLT processor takes the Schematron file as primary input and passes it through the implementation code’s template modes, for example in this snippet from SchXslt’s compiler:

<xsl:template match="sch:rule" mode="schxslt:compile">
  <xsl:param name="mode" as="xs:string" required="yes"/>
  <xsl:param name="typed-variables" as="xs:boolean" required="yes"/>

  <xsl:call-template name="schxslt:check-multiply-defined">
    <xsl:with-param name="bindings" select="sch:let" as="element(sch:let)*"/>
  </xsl:call-template>

  <template match="{@context}" priority="{count(following::sch:rule)}" mode="{$mode}">
    <xsl:sequence select="(@xml:base, ../@xml:base)[1]"/>
In this code (at the time of writing, lines 233-242 of compile-2.0.xsl, seven folders deep in the source code), a <sch:rule> is intercepted by the schxslt:compile mode (line 233). Schematron variables are checked for duplication (lines 237-239). Then the process converts the <sch:rule> into an <xsl:template>, by binding the Schematron rule’s @context to the XSLT @match, and assigning a priority that reflects the Schematron constraint that the first <sch:rule> in a <sch:pattern> wins (the opposite of XSLT’s rule that among competing template modes of the same priority level, the last one wins). The value of the XSLT @mode is provided by a generated id for the Schematron <sch:pattern> (line 241). By copying any local @xml:base in the Schematron file (line 242), the compiled/intermediate XSLT can also take the appropriate base.

The close relationship between the input Schematron code and output SVRL-creating XSLT code is best seen in an example. Suppose we have this simple Schematron rule:

<sch:rule context="foo" xml:base="..">
    <sch:report test="true()">Hello world</sch:report>
</sch:rule>

The SchXslt compiler produces the following snippet of SVRL-creating XSLT (abbreviated for clarity):

<xsl:template match="foo" priority="0" mode="d7e3" xml:base="..">
   . . . . . . .
       <xsl:if test="true()">
          <svrl:successful-report xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                                  location="{path(.)}"
                                  xml:lang="en">
             <xsl:attribute name="test">true()</xsl:attribute>
             <svrl:text>Hello world</svrl:text>
          </svrl:successful-report>
       </xsl:if>
   . . . . . . .
      <xsl:next-match>
         <xsl:with-param name="schxslt:patterns-matched"
                         as="xs:string*"
                         select="($schxslt:patterns-matched, 'd7e3')"/>
      </xsl:next-match>
   . . . . . . .
</xsl:template>

Skeleton’s output for this same example is similar:

<xsl:template match="foo" priority="1000" mode="M0">
   <svrl:fired-rule xmlns:svrl="http://purl.oclc.org/dsdl/svrl" context="foo"/>
   <xsl:if test="true()">
      <svrl:successful-report xmlns:svrl="http://purl.oclc.org/dsdl/svrl" test="true()">
         <xsl:attribute name="location">
            <xsl:apply-templates select="." mode="schematron-select-full-path"/>
         </xsl:attribute>
         <svrl:text>Hello world</svrl:text>
      </svrl:successful-report>
   </xsl:if>
   <xsl:apply-templates select="*|comment()|processing-instruction()" mode="M0"/>
</xsl:template>

We see in these examples very close relationships between the Schematron nodes and the output XSLT that will eventually create the SVRL report. <sch:rule> corresponds to <xsl:template>, @context to @match, @xml:base to @xml:base, <sch:report> to <xsl:if>, @test to @test, and Hello world to <svrl:text>.

When studying the SVRL-making XSLT that Skeleton and SchXslt create, the nearly one-to-one correlation between Schematron construct and XSLT construct tempts me to think, If I wanted, I could dispense with Skeleton or SchXslt, and I can even dispense with the Schematron file itself, and simply write and modify an XSLT file that makes an SVRL report. I would not act on that impluse, but its motivation is clear: the original Schematron file is losslessly represented by the SVRL-making XSLT. Just as the former was converted into the latter, you could probably develop an XSLT stylesheet that transforms the latter into the former. That is to say, the Schematron and the SVRL-creating XSLT are interchangeable. When you are writing one, you are writing the other.

This leads to my observation: when we write Schematron, it is possible to think of ourselves writing a kind of constrained XSLT. This is true regardless of whether the implementation we are using is reliant on XSLT or not. All Schematron implementations must produce the same results. Regardless of what is happening in the black box of a given implementation, we can picture ourselves writing an SVRL-creating XSLT at the same time we are writing Schematron.

Models of XSLT-Driven Schematron

As I mentioned earlier, the XSLT constructs <xsl:import> and <xsl:include> are not allowed by the current specifications. But in reality, the two implementations that fully support all Schematron features (SchXslt and Skeleton) are lenient on this point, and support the constructs. Which means that it is possible to use Oxygen or any other application that uses those implementations to let our Schematron files tap into the full power of XSLT. There are a number of benefits to doing so:

  • Unlike <sch:let>, <xsl:variable> allows typing (@as), to better control the type of value that is being processed.[5] That means that any Schematron global variable can be pushed to the XSLT tier.

  • One can use <xsl:template>, <xsl:accumulator>, <xsl:mode>, and a host of other top-level declaration constructions that would not be appropriate, or allowed, to put in a Schematron file.

  • With one simple <xsl:import> declaration, one can dispense completely with any other top-level XSLT elements in the Schematron file (such as <xsl:key> or <xsl:function>). Doing so provides a clean division of responsibilities. The XSLT becomes responsible for functionality and processing, and the Schematron reports the results.

  • Once functional code is moved out of Schematron into the XSLT library, it can be repurposed, perhaps to drive other Schematron files, or to be used in other XSLT projects.

There are other benefits we will encounter, as we proceed through several XSLT-Schematron models I have used in the past. What they all have in common, however, is an initial setup.
  1. Make XSLT the QLB.

  2. Remove any top-level XSLT elements from a Schematron file (except maybe <xsl:param>), and instead provide a single <xsl:import> or <xsl:include> that points to an XSLT library. Avoid putting more than one of these in your Schematron file. Instead, coordinate any other include or import statements within the main XSLT file. This allows better control and diagnostics over the import process.

  3. Find any namespaces invoked in the XSLT code for functions and global variables, and set them up in the Schematron file, as <sch:ns> elements.

The setup described above is conformant with Schematron rules, with the sole exception of the <xsl:import>/<xsl:include> statement. Because that exception to the specifications is supported by Skeleton and SchXslt, we are on solid ground.

Examples for the four models rely upon evaluation of a simple XML file with data about widgets, which will be checked for consistency and other business concerns:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="widgets.sch" type="application/xml" 
         schematypens="http://purl.oclc.org/dsdl/schematron"?>
<widgets>
    <widget id="a">
        <name>Sparklore</name>
        <name>Sparklore</name>
        <released>2025-01-01</released>
        <unit-cost currency="USD">5.00</unit-cost>
        <count>2</count>
    </widget>
    <widget id="a">
        <name>Flumberbund</name>
        <released>2024-11-15</released>
        <unit-cost currency="USD">15.00</unit-cost>
        <count>-3</count>
    </widget>
</widgets>

Sample validandum widgets.xml

Model A: Only XSLT Variables and Functions

Sometimes we just need some basic XSLT functionality. A project is just starting out, and the needs are minimal. But we want to take advantage of the benefits that come with doing our primary processing in XSLT. In this model, we rely upon the XSLT file for our global variables and functions.

Here is a simple example of a paired Schematron file with supporting XSLT file:

Model A Schematron file widgets.sch

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">
    <xsl:import href="widgets.xsl"/>
    <sch:ns prefix="f" uri="my-functions"/>
    <sch:pattern>
        <sch:rule context="widget">
            <sch:report test="@id = $duplicate-widget-ids">Duplicate id.</sch:report>
        </sch:rule>
        <sch:rule context="widget/name">
            <sch:report test=". = f:duplicate-strings(../name)">Names should not be
                duplicated.</sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>

Model A XSLT file widgets.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:f="my-functions"
    exclude-result-prefixes="#all" version="3.0">

    <xsl:variable name="duplicate-widget-ids" as="xs:string*"
        select="f:duplicate-strings(//widget/@id)"/>

    <xsl:function name="f:duplicate-strings" as="xs:string*">
        <xsl:param name="strings" as="xs:string*"/>
        <xsl:for-each-group select="$strings" group-by=".">
            <xsl:if test="count(current-group()) gt 1">
                <xsl:sequence select="current-grouping-key()"/>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:function>
</xsl:stylesheet>

In the example above, the XSLT file defines the function f:duplicate-strings, which also drives the binding of the global variable $duplicate-widget-ids. Both the function and the variable are invoked by the Schematron file, which checks every <widget> for duplicate ids, and uses f:duplicate-strings to make sure that no <widget> has a duplicate <name> .

Obviously, both the global variable and the function could have been expressed in the Schematron file. But one can easily imagine functions and variables that need much more complex operations. When such expansion is required, we simply expand our XSLT suite, without complicating our Schematron file.

Model A depends upon a strong asymmetric contract between the XSLT and Schematron files. In writing the Schematron file, one needs to know what the XSLT is creating, and but those who write the XSLT file might not need to know about the Schematron file. This allows the XSLT component to develop into a full-fledged library that may have purposes much larger than the catalyzing Schematron file. In fact, that XSLT library may be driving multiple Schematron processes.

Model A is already in use. It is the approach taken by Matthieu Ricaud-Dussarget in his XSLT Quality project, https://github.com/mricaud/xslt-quality.

Model B: Validandum Transformed

There are more complex situations, where it is wholly more efficient to do some full transformations of the validandum, then evaluate as needed. Here is an example of how that might be handled in a Schematron file and its companion XSLT file. Here we treat the widget XML file as a running series of orders, where each <widget> represents change in inventory, and we wish to flag places where the total inventory level dips below a certain point. We set up the desired reports in the Schematron file:

Model B Schematron file widgets.sch

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">
    <xsl:import href="widgets.xsl"/>
    <sch:ns prefix="f" uri="my-functions"/>
    <sch:pattern>
        <sch:rule context="widget">
            <sch:let name="self-evaluated" value="key('get-by-path', path(.), $input-evaluated)"/>
            <sch:let name="count" value="xs:integer($self-evaluated/running-widget-count)"/>
            
            <sch:report test="$count lt 10 and $count gt 0" role="warn">We are low on widgets! 
                <sch:value-of select="$self-evaluated/running-widget-count"/></sch:report>
            <sch:report test="$count le 0">We are out of widgets! <sch:value-of
                    select="$self-evaluated/running-widget-count"/></sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>

Now, in the companion XSLT file, we use an accumulator to keep track of the running inventory.

Model B XSLT file widgets.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:f="my-functions"
    expand-text="yes"
    exclude-result-prefixes="#all" version="3.0">
    
    <xsl:key name="get-by-path" match="*" use="@_path"/>
    
    <xsl:variable name="input-prepped" as="document-node()">
        <xsl:apply-templates select="/" mode="prep"/>
    </xsl:variable>
    <xsl:variable name="input-evaluated" as="document-node()">
        <xsl:apply-templates select="$input-prepped" mode="eval"/>
    </xsl:variable>
    
    <xsl:mode name="prep" on-no-match="shallow-copy"/>
    <xsl:template match="*" priority="1" mode="prep">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="_path" select="path(.)"/>
            <xsl:apply-templates mode="#current"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:accumulator name="running-widget-count" as="xs:integer" initial-value="0">
        <xsl:accumulator-rule match="widget/count" select="$value + xs:integer(.)"/>
    </xsl:accumulator>
    <xsl:mode name="eval" on-no-match="shallow-copy" use-accumulators="#all"/>
    
    <xsl:template match="widget" mode="eval">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <running-widget-count>{accumulator-after('running-widget-count')}</running-widget-count>
            <xsl:apply-templates mode="#current"/>
        </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>

Despite the trim Schematron file, the underlying process is busy. The bulk of the evaluation takes place in the XSLT file, through a two-pass transformation. A copy of the validandum is made, with a temporary @_path attribute added to each element, to create a dereferencable unique id, and the results are bound to $input-prepped. In the second pass, bound to global variable $input-evaluated, an accumulator is consulted, and a running inventory is added to each <widget>.

Back in the Schematron file, every <widget> fires a rule. The local variable $self-evaluated is bound to its counterpart from the global variable $input-evaluated, i.e., the context node transformed by the main XSLT file, which has been looking for changes in inventory. The rule then evaluates the local variable, in this case checking for a running count of widgets, warning the user when it drops below 10 and raising an error when negative.

Model B requires a strong contract between the XSLT and Schematron files. In writing the Schematron file, one needs to know what the XSLT is creating, and in writing the XSLT file, one needs to know what patterns and analysis are of interest.

A benefit to this model is that it is quite flexible and extensible: you can create as many global variables as you like, with the validandum transformed in different ways. Such processing is straightforward in XSLT, and the results are immediately accessible to any <sch:report> or <sch:assert>.

I am uncertain if any public projects use Model B, but I have used it for a number of projects for the Federal Government. One such project is the validation of XML files for the Federal Register. The Federal Register has complex requirements about where footnote references and the footnotes themselves must be placed. Determining whether footnote references and footnotes have been placed accurately, and making sure that the footnote numerals are sequential and pairwise correct requires complex analysis of the entire document. It is definitely not a job for Schematron. But XSLT can do this analysis efficiently, and provide the results to the Schematron file to report anything of concern or interest.

Model C: Context-Agnostic Schematron

It does not take long for a Model B XSLT-Schematron project to grow to the point where we want the XSLT file to simply do all the evaluation, and use Schematron simply to report the results. A common motivation for taking this extra step is the discovery that the logic being put into a traditional Schematron file needs to be leveraged and repurposed outside the Schematron file. For example, the Schematron reports and asserts may be giving error codes, but we get to the point where we need to maintain an inventory of error codes separately, to integrate with documentation or a regression test suite. Another reason might be the need to take advantage of XSLT’s template-based approach to shadowing, which is more sophisticated than Schematron’s. And yet another reason might be that our Schematron Quick Fix library is getting overly large, and needs to be simplified and refactored into fix typologies.

With these high-level needs, we abandon the thought of making a <sch:rule> for every context. We start by making a very simple and small Schematron file with one rule, and one context, firing for every element. The XSLT’s job is to reduce the validandum to an arbitrarily constructed error and warning report, integrated with possible fixes. The Schematron file simply checks an XSLT-created global variable to see if that particular element has any warnings, errors, or quick fixes.

Here is an example of Model C, beginning with the Schematron file:

Model C Schematron file widgets.sch

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">
    <xsl:import href="widgets.xsl"/>
    <sch:ns prefix="f" uri="my-functions"/>
    <sch:pattern>
        <sch:rule context="*">
            <sch:let name="errors" value="key('get-by-path', path(.), $all-errors)"/>
            <sch:report test="exists($errors[1])" sqf:fix="replace-element">
                <sch:value-of select="$errors/message"/></sch:report>
            
            <sqf:fix id="replace-element" use-for-each="$errors/fix[@type = 'replace-element']">
                <sqf:description>
                    <sqf:title><sch:value-of select="$sqf:current/@message"/></sqf:title>
                </sqf:description>
                <sqf:replace select="$sqf:current/*"/>
            </sqf:fix>
        </sch:rule>
    </sch:pattern>
</sch:schema>

Before presenting the XSLT file, which is longer than previous examples, it is worth noting that the Schematron file is now context-agnostic. The Schematron file is not looking or specifically named elements. It simply fires at every element, and checks the XSLT for a list of relevant errors, and reports them. If the error includes any fixes, they are passed to the appropriate SQF construct. For the sake of simplicity, I have used only one SQF fix type here, but the types of SQFs can and do grow. The example also anticipates only the first error for a given element. If there are multiple errors, there are creative ways these can be administered in the Schematron file, depending upon what is desired.

The XSLT that drives the whole process is as follows:

Model C XSLT file widgets.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:f="my-functions"
    expand-text="yes"
    exclude-result-prefixes="#all" version="3.0">
    
    <xsl:key name="get-by-path" match="*" use="@_path"/>
    
    <xsl:variable name="input-prepped" as="document-node()">
        <xsl:apply-templates select="/" mode="prep"/>
    </xsl:variable>
    <xsl:variable name="all-errors" as="document-node()">
        <xsl:document>
            <errors>
                <xsl:apply-templates select="$input-prepped" mode="errors"/>
            </errors>
        </xsl:document>
    </xsl:variable>
    
    <xsl:mode name="prep" on-no-match="shallow-copy"/>
    <xsl:template match="*" priority="1" mode="prep">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="_path" select="path(.)"/>
            <xsl:apply-templates mode="#current"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:accumulator name="running-widget-count" as="xs:integer" initial-value="0">
        <xsl:accumulator-rule match="widget/count" select="$value + xs:integer(.)"/>
    </xsl:accumulator>
    <xsl:mode name="errors" on-no-match="shallow-skip" use-accumulators="#all"/>
    
    <xsl:template match="widget[accumulator-after('running-widget-count') lt 0]/count" mode="errors">
        <error n="1">
            <xsl:copy-of select="@_path"/>
            <message>We are out of widgets! {accumulator-after('running-widget-count')}</message>
            <fix type="replace-element" message="Increase count so running count is zero.">
                <xsl:copy>
                    <xsl:copy-of select="@* except @_path"/>
                    <xsl:value-of select="xs:integer(.) - accumulator-after('running-widget-count')"/>
                </xsl:copy>
            </fix>
            <fix type="replace-element" message="Add 1,000 to count.">
                <xsl:copy>
                    <xsl:copy-of select="@* except @_path"/>
                    <xsl:value-of select="xs:integer(.) + 1000"/>
                </xsl:copy>
            </fix>
        </error>
        
        <xsl:next-match/>
    </xsl:template>
    
</xsl:stylesheet>

As with Model B, we take the preliminary step of adding the temporary attribute @_path to each element. The next pass, however, does its best to reduce the original input to a simple set of elements. The errors template mode is set to shallow-skip (not shallow-copy, as in Model B), to reduce the validandum to a (hopefully small) list of errors. For a long validandum with few errors, the query time in each <sch:rule> is minimal. The bulk of work (and therefore performance concerns) are restricted to the XSLT tier.

In Model C, we customize our own error report, along with potential quick fixes, using whatever structure makes sense to us. In the example above, there is an <error> element that takes an error code, a message, and zero or more <fix>es. It has been my experience that some knotty SQFs benefit from sophisticated XSLT adjustments. Because the SQF process has been moved into the XSLT tier, we can provide some creative fixes.

Model C depends upon a symmetric but loose contract between the XSLT and Schematron files. In writing the Schematron file, one does not need to much about what the XSLT is creating, aside from the structure of the errors and fixes. Those who write the XSLT file might need to know how those errors will be reported and those fixes will be applied, but that is all, because the Schematron file is generic. The XSLT tier essentially subsumes the traditional responsibility of Schematron.

The error or warning report created by the XSLT is of an arbitrary structure. How that report is designed is a private affair, exclusively between the XSLT file and the Schematron handler.

This particular approach is eminently expansible. A project can set up error codes and messages in a stand-off file, and create a suite of private functions to more concisely build errors, warnings, and fixes.

I have adopted Model C widely, the best public example being the validation system behind the Text Alignment Network, https://textalign.net/. In a project I am developing for the Government Publishing Office, pertaining to marking up legislation that comes from the U.S. Congress, I have used Model C to do some very nuanced validation, such as checking to see whether conjunctions (and / or) have been properly placed in the complex hierarchy. Each error or warning is assigned a code. If a user wishes to ignore a particular error, they can squelch specific warnings or errors through the Oxygen interface, which writes the preferences to a stand-off configuration file for the user. The Schematron file does not need to worry about these matters. The XSLT tier checks external resources and makes the decisions about whether a warning or error should be fired.

Model D: Push-Me Pull-You

Models A, B, and C cover common situations. Model D handles a rarer, but significant, use case. It should appeal to those who appreciate XSLT recursion.

Let us suppose we have adopted model A or B, and it is dependent upon an underlying XSLT library. The two become tightly coupled. The XSLT library matures and begins to serve a variety of needs, not just the Schematron process. The library grows to the point where it needs to validate a file and incorporate the report into other processes. That is, the XSLT library needs to get validation information on an XML file and do something with it.

We have this feature already, in the case of other schemas. For example, one can use Charles Foster’s Saxon Jing project (https://www.cfoster.net/saxon-jing/) within XSLT code to validate any XML file against a given RELAX-NG schema. Or in the case of XML Schema one can use saxon:evaluate() (https://www.saxonica.com/documentation12/index.html#!functions/saxon/evaluate) or <xsl:evaluate> to validate any XML file. Once bound to an XSLT variable, such error reports can be used very creatively. And of course Skeleton or SchXslt could be leveraged to convert a Schematron into SVRL-making XSLT.

But what if the XSLT library that needs the SVRL report is the very same library upon which the Schematron file depends? We run up against the constraint of mutual dependence. Two XSLT files may not include each other (although that may change in XSLT 4.0). We want to be able to use the Schematron in its own right. But we also want to compile it to an SVRL-making XSLT file that can be grafted within the main XSLT project.

This model, although the most sophisticated of the four we have so far examined, has the most in common with Model A. That is, the Schematron file expresses all our rules, and draws from the XSLT tier only for variables and functions. The Schematron file might be quite lengthy, with many patterns and rules.

There are solutions to the problem of mutual inclusion, and the one I have opted for here requires introducing to the Schematron file a static XSLT parameter. <xsl:param> is supported by the Skeleton implementation, but not by SchXslt (at this writing). But SchXslt’s MIT license allows us to make the necessary modifications.

We also make sure that we use <xsl:import> (not include), because we are going to declare the same static parameter in the XSLT library, which will import the XSLT that is built by Skeleton or SchXslt.

The abstract idea is made clear with some example code, with key sections set in boldface.

Model D Schematron file widgets.sch

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="my-functions" queryBinding="xslt3">
    
    <xsl:param name="schematron-is-main-point-of-entry" as="xs:boolean" select="true()" static="yes"/>
    <xsl:import href="widgets.xsl" use-when="$schematron-is-main-point-of-entry"/>
    
    <sch:ns prefix="f" uri="my-functions"/>

    <sch:pattern>
        <sch:rule context="widget">
            <sch:report test="@id = $duplicate-widget-ids">Duplicate id.</sch:report>
        </sch:rule>
        <sch:rule context="widget/name">
            <sch:report test=". = f:duplicate-strings(../name)">Names should not be
                duplicated.</sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>

In the above code sample, note the use of the static parameter and the import statement, which allows our parameter to shadow the one in the XSLT file. Note also @use-when, which will determine whether this XSLT element will even be used.

Now the XSLT library:

Model D XSLT file widgets.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:f="my-functions"
    exclude-result-prefixes="#all" version="3.0" default-mode="widgets">
    
    <xsl:param name="schematron-is-main-point-of-entry" as="xs:boolean" select="false()" static="yes"/>
    
    <xsl:import href="widgets-svrl-maker_thank-you-skeleton.xsl" 
        use-when="not($schematron-is-main-point-of-entry)"/>
    <xsl:import href="widgets-svrl-maker_thank-you-schxslt.xsl" 
        use-when="not($schematron-is-main-point-of-entry)"/>
    
    <xsl:variable name="duplicate-widget-ids" as="xs:string*"
        select="f:duplicate-strings(//widget/@id)"/>
    
    <xsl:function name="f:duplicate-strings" as="xs:string*">
        <xsl:param name="strings" as="xs:string*"/>
        <xsl:for-each-group select="$strings" group-by=".">
            <xsl:if test="count(current-group()) gt 1">
                <xsl:sequence select="current-grouping-key()"/>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:function>
    
    <xsl:variable name="skeleton-svrl-report" as="item()*">
        <xsl:apply-templates select="/" mode="#unnamed"/>
    </xsl:variable>
    
    <xsl:variable name="schxslt-svrl-report" as="item()*">
        <xsl:apply-templates select="/" mode="d8e9"/>
    </xsl:variable>
    
    <xsl:template match="/" mode="widgets">
        <report>
            <skeleton>
                <xsl:copy-of select="$skeleton-svrl-report"/>
            </skeleton>
            <schxslt>
                <xsl:copy-of select="$schxslt-svrl-report"/>
            </schxslt>
        </report>
    </xsl:template>
    
</xsl:stylesheet>

The <xsl:import> statements presume access to the SVRL-creating XSLT, courtesy Skeleton or SchXslt. The names of the files referenced by @href attempt to make clear this subtle point. That is, we need some system (manual or automated) that can convert the Schematron, on a whim’s notice, to the appropriate XSLT package.

Just like the Schematron file, the XSLT library invokes the same static parameter (albeit with the opposite value), and it points with <xsl:import> declarations to two possible XSLT files that create SVRL reports, one generated by Skeleton and the other generated by SchXslt.

The @use-when attribute is key. The <xsl:import> declarations are retained if the file is the main point of entry, and dropped otherwise. The static parameters ensure that each set of XSLT files can invoke the other – the push-me pull-you aspect of the model – without triggering a fatal XSLT error.

The widgets.xsl file presented above does not do very much of interest. It simply returns the two different SVRL reports, but the global variables $skeleton-svrl-report and $schxslt-svrl-report could be used in creative ways in a more developed form of the project.

Model D depends upon a strong symmetric contract between the XSLT and Schematron files. In writing the Schematron file, one needs to know what the XSLT is creating, and those who write the XSLT need to know about the Schematron file, or, rather, they need to know about its SVRL-making XSLT representation. Skeleton takes over the unnamed template mode as its primary mode, but SchXslt uses a hexadecimal name for its primary mode. The XSLT tier needs to accommodate the decisions made by either implementation about how to build the template modes.

I am unaware of any public project that uses Model D. In private development for the Federal Government, I have used the push-me pull-you method in the GPO XML Toolkit, where validation of the Federal Register needs to be performed and presented in different contexts. It needs to be fired directly in an XML editor on the command line, but it is also necessary to synthesize the validation report into the validandum to create a visually appealing HTML file. I do not expect Model D to be used commonly, because it is tailored for projects that have tightly coupled their Schematron files with an advanced XSLT function library, and this does not happen often.

The Schematron-XSLT Partnership

I hope it is clear by now that Schematron shares a special relationship with XSLT. When that relationship is leveraged, Schematron gets an upgrade, from the most expressive XML schema language to the most expressive and powerful XML schema language. I am not suggesting that the models I propose here should supplant any others. Quite the contrary. Schematron can be used and repurposed in many ways, with different results. I hope that I have shown that when Schematron is imagined as an interface for a constrained form of XSLT, and when we embrace that vision, some rather sophisticated models of Schematron emerge.

It is worth concluding with thoughts about how new versions of Schematron might leverage and enhance that relationship. Here are some ideas:

  1. Explicitly allow <xsl:param> and <xsl:import> as top-level elements (before <sch:pattern> is fine). Perhaps deprecate (but not forbid) the use of <xsl:key> and <xsl:function>, to encourage a stronger separation of concerns.

  2. Create stronger bridges between Schematron phases and XSLT parameters. This needs a bit of explanation. When a user chooses a phase name, they are effectively doing the same thing they do when they change the value of a global parameter in XSLT. Would it not be beneficial to bind phase selection to a parameter, so that a Schematron phase can affect how the companion XSLT library behaves? We could postulate that all phase names are tacitly passed to an assumed XSLT static parameter <xsl:param name="sch:phases" static="true" as="xs:string*"/>, shadowing any XSLT static parameters of the same name in any imported XSLT. This mechanism would allow XSLT developers to adjust their code according to the phases requested by the user.

Regardless of how Schematron develops, the benefits are already clear. I hope that these four models of advanced Schematron validation stir your imagination.

References

[Jelliffe_2020] Jelliffe, Rick. ISO Schematron 2020 Released. Schematron.com, n.d., https://schematron.com/standards/iso_schematron_2020_released.html.

[Kalvesmaki_2022] Kalvesmaki, Joel. String Comparison in XSLT with tan:diff(). Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). doi:https://doi.org/10.4242/BalisageVol26.Kalvesmaki01.

[Siegel_2022a] Siegel, Erik. Schematron Query Language Binding and XSLT. XML.com, Oct. 17, 2022, https://www.xml.com/articles/2022/10/17/schematron-qlb-xslt/.

[Siegel_2022b] Siegel, Erik. Schematron: A Language for Validating XML. Denver: XML Press, 2022.

[Schematron_2016] ISO/IEC 19757-3:2016 Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation using Schematron.



[1] I restrict myself to structural rules to avoid the question of content. DTDs can define entities and notations, which are not possible in the other schema languages.

[2] See previous note.

[3] In this article, prefixes xsl and sch are bound to http://www.w3.org/1999/XSL/Transform and http://purl.oclc.org/dsdl/schematron, respectively.

[4] At Schematron_2016, 12 section 3.12 we are given a circular definition for implementation: implementation : implementation of a Schematron validator. The relationship between Schematron validator and implementation is at best murky, because the former term is not defined, and used rarely (only four times). The Schematron specifications treat the implementation as a singular agent, but below we will look at three products where the implementation proper is shared by multiple tools.

[5] Typed variables will be available in the 2025 version of Schematron.

×

Jelliffe, Rick. ISO Schematron 2020 Released. Schematron.com, n.d., https://schematron.com/standards/iso_schematron_2020_released.html.

×

Kalvesmaki, Joel. String Comparison in XSLT with tan:diff(). Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). doi:https://doi.org/10.4242/BalisageVol26.Kalvesmaki01.

×

Siegel, Erik. Schematron Query Language Binding and XSLT. XML.com, Oct. 17, 2022, https://www.xml.com/articles/2022/10/17/schematron-qlb-xslt/.

×

Siegel, Erik. Schematron: A Language for Validating XML. Denver: XML Press, 2022.

×

ISO/IEC 19757-3:2016 Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation using Schematron.