Testing Schematron using XSpec
Copyright © 2017 by Vincent M. Lizzi.
Schematron (ISO/IEC 19757-3:2006) is a rule-based language that uses XPath expressions to test assertions about the content of XML documents. Schematron is capable of expressing rules that other XML validation languages such as DTD and XML Schema are unable to implement (Usdin, Lapeyre, and Glass 2015). Schematron allows business rules written for a human audience to be included with XPath that instructs machines on how to enforce the business rules. This versatility and literate design make Schematron an attractive tool for implementing business rules in XML (Lubell 2009). Schematron is often deployed as an important part of quality assurance processes (Blair 2012; Kraetke and Bühring 2016). It is important to verify that a Schematron schema functions as intended, because a problem in the schema may allow errors in XML that should be caught to slip through quality control. The use of an automated testing tool can greatly assist with this verification.
A few tools have been created for testing Schematron schemas. The Schematron Testing Framework (STF) developed by Tony Graham is one open source tool for testing Schematron. Various homegrown solutions for testing Schematron, based on tools such as JUnit, have also been created.
XSpec has been enhanced recently to support testing Schematron (XSpec). XSpec is an open source unit test and behavior-driven development framework for XSLT and XQuery. XSpec was originally developed by Jeni Tennison and is currently maintained by Sandro Cirulli and community contributors. XSpec provides a structure in which sample XML and processing expectations can be organized and executed. Several features make XSpec an ideal tool for testing Schematron:
Tests are described using a simple and flexible XML format.
XSpec can run multiple test scenarios to encompass both passing and failing conditions.
Sample XML for input to tests can be provided either as fragments of XML or as XML files.
Execution can be focused on specific tests during development.
XSpec can automatically execute tests and produce a report.
XSpec can run in a variety of environments.
XSLT custom functions can be tested.
XSpec and Schematron have much in common: XSpec and Schematron (the standard Schematron Skeleton implementation) are both XSLT applications; both have a literate programming design that allows natural language rules to be expressed alongside machine executable code that implements the rules; both provide domain specific languages that allow tests to be described easily; and both can execute a large number of detailed tests efficiently.
When XSpec is deployed as an automated testing tool for Schematron, it can provide a number of benefits which include: decreasing the amount of time needed to write Schematron rules; helping to identify problems early; freeing time that would otherwise be spent on repetitive testing, which allows human effort to be directed to activities requiring knowledge and skill; and reducing the cost of developing and maintaining Schematron schemas. The following sections of this paper describe how to use XSpec to test Schematron. Additional information about how to use XSpec is available in the XSpec wiki (https://github.com/xspec/xspec/wiki).
Although much has already been written about Schematron, a brief review of concepts may be helpful. Schematron is used to test XML instance documents using specified assertions. Assertions are written in a Schematron schema as <assert> and <report> elements. <assert> and <report> elements contain a natural-language description, and they have an attribute @test which contains an XPath expression that will be evaluated on an input XML instance. <assert> and <report> also have several optional attributes including @id to hold an identifier and @role for describing function (e.g. "error" or "warn"). When a Schematron is run, an <assert> or <report> can be thrown if the input XML instance matches the criteria specified by the XPath expression in the @test attribute. An <assert> will be thrown if the XPath test evaluates to false. A <report> will be thrown if the XPath test evaluates to true. The <rule> element contains <assert> and <report> elements and provides context for evaluation of the XPath tests. The <pattern> element provides a container to organize <rule> elements. The <phase> element allows a set of patterns to be named so that Schematron can run a particular set of patterns (instead of all patterns). When Schematron is run, a report is produced that contains assertions thrown during evaluation of the input XML instance document. The report can be output as XML in the Schematron Validation Report Language (SVRL) format.
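The concepts reviewed above can be seen together in a small schema. The following is a minimal sketch only: the element names (article, title, abstract), the identifiers, and the phase name are hypothetical and are not taken from any schema discussed in this paper.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch: element names, ids, and the phase name are hypothetical. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- A phase names a set of patterns so that only those patterns are run. -->
  <sch:phase id="metadata-checks">
    <sch:active pattern="title-rules"/>
  </sch:phase>
  <!-- A pattern is a container that organizes rules. -->
  <sch:pattern id="title-rules">
    <!-- The rule provides the context node for the XPath tests inside it. -->
    <sch:rule context="article">
      <!-- An assert is thrown when its test evaluates to false (no title present). -->
      <sch:assert id="t1" role="error" test="title"
        >An article should have a title.</sch:assert>
      <!-- A report is thrown when its test evaluates to true (abstract is empty). -->
      <sch:report id="t2" role="warn" test="abstract[not(normalize-space())]"
        >The abstract should not be empty.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Run against an <article> that has an empty <abstract> and no <title>, a schema like this would be expected to throw both the assert t1 and the report t2.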
Developing a Schematron typically involves testing the Schematron using a set of sample XML. The sample set should include both valid XML, which the Schematron should pass, and XML that contains errors, which the Schematron should catch. This sample set must be maintained in addition to the Schematron itself. When changes are made to the Schematron, new sample XML should be added to test the changes, and the entire sample set should be used again to test the Schematron to verify that no regression defects have been introduced. The curation of a set of sample XML for a Schematron has been well described by Schwarzman (2017). XSpec can assist with developing and maintaining a Schematron by providing a structure for organizing the sample XML associated with a Schematron and automatically executing the Schematron on the set of sample XML.
The first step in creating an XSpec test for a Schematron schema is to create an XML file that adheres to the XSpec RelaxNG schema. The root element <x:description> should have an attribute @schematron that specifies the file path to the Schematron schema. For example,
<x:description xmlns:x="http://www.jenitennison.com/xslt/xspec" schematron="../src/demo.sch">
Next, <x:scenario> elements are added for each test case. The <x:scenario> element is used both to organize tests and to describe individual tests. A scenario is required to have a label, which is placed in an attribute @label, to describe what the scenario is testing. An <x:scenario> element can contain nested <x:scenario> elements, a convenience which provides a way to organize tests, as in this case:
<x:scenario label="main scenario">
  <x:scenario label="nested scenario 1">
  </x:scenario>
  <x:scenario label="nested scenario 2">
    <x:scenario label="nested scenario 3">
    </x:scenario>
  </x:scenario>
</x:scenario>
An <x:scenario> that describes an individual test contains sample XML and declares one or more expectations about the desired result of running the Schematron on the sample XML. An <x:context> element is used to hold the sample XML. The sample XML can be placed directly in the <x:context> element:
<x:context>
  <article>
    <front>
      <article-meta/>
    </front>
  </article>
</x:context>
Alternatively, sample XML in a separate file can be referenced by the <x:context> element using an @href attribute. For example,
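A minimal sketch of such a reference is given here. It assumes a sample file at samples/book-review.xml relative to the XSpec file (the file name used in the demonstration later in this paper); the scenario label is illustrative only.

```xml
<x:scenario label="sample file should pass validation"
            xmlns:x="http://www.jenitennison.com/xslt/xspec">
  <!-- The sample XML is read from a separate file instead of being written inline. -->
  <x:context href="samples/book-review.xml"/>
  <x:expect-valid/>
</x:scenario>
```

Keeping larger samples in separate files in this way leaves the XSpec file short, while inline fragments remain convenient for small, focused cases.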
The results that are desired when the Schematron is run on the sample XML can be declared using expect elements. The expect elements are as follows:
<x:expect-assert> verifies that an <assert> is thrown.
<x:expect-report> verifies that a <report> is thrown.
<x:expect-not-assert> verifies that an <assert> is not thrown.
<x:expect-not-report> verifies that a <report> is not thrown.
<x:expect-valid> verifies that the Schematron is executed and that it passes validation for the sample XML (i.e. no <assert> or <report> is thrown). If the Schematron throws any warning or informational messages (i.e. <assert> or <report> with a @role attribute specifying “warn” or “info”) these are allowed for a passing validation.
<x:expect> can be used to specify custom expectations to directly test the SVRL XML that is generated when a Schematron schema is run on a sample XML.
The <x:expect-assert>, <x:expect-report>, <x:expect-not-assert>, and <x:expect-not-report> elements will normally match any <assert> or <report> that is in the Schematron. These expect elements have optional attributes, @id, @role, and @location, which can be used in any combination to make an expectation more specific.
id identifies a specific <assert>, <report>, or <rule> in the Schematron with a matching @id attribute value.
role matches the @role attribute value of an <assert>, <report>, or <rule> in the Schematron. The @role attribute is often used to specify outcomes such as “error”, “fatal”, “warn”, or “info”.
location identifies a specific location, using an XPath pointer, in the context XML that the Schematron <assert> or <report> is expected to find. Namespace prefixes that are defined in Schematron using <ns> elements can be used in the XPath.
For instance, if you expect that when a Schematron schema is run on a particular sample XML an <assert> with id “a1” and role “error” will be thrown at XPath location /article/front/article-meta/fpage, this expectation could be written in a scenario as:
<x:scenario label="example">
  <x:context>
    <article>
      <front>
        <article-meta>
          <fpage/>
        </article-meta>
      </front>
    </article>
  </x:context>
  <x:expect-assert id="a1" role="error" location="/article/front/article-meta/fpage"/>
</x:scenario>
Custom XSLT functions that are embedded in a Schematron schema can be tested in an <x:scenario> by using an <x:call> element to call the function and an <x:expect> element to describe the expected result of the function. Parameter values that should be used in the test can be specified in the <x:call> element using <x:param> elements, as the following illustrates:
<x:scenario label="XSLT function test">
  <x:call function="e:add" xmlns:e="example">
    <x:param name="a" select="5" as="xs:integer"/>
    <x:param name="b" select="2" as="xs:integer"/>
  </x:call>
  <x:expect label="add 5 + 2" select="xs:integer(7)"/>
</x:scenario>
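For orientation, the schema side of such a test might look like the following sketch. Only the function name e:add and its namespace "example" come from the scenario above; the function body, the <page-count> element with its attributes, and the identifier pc-1 are hypothetical. The sketch also assumes that the Schematron implementation in use carries an embedded <xsl:function> through to the compiled XSLT when the xslt2 query binding is selected.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: page-count, its attributes, and the id "pc-1" are hypothetical. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            queryBinding="xslt2">
  <!-- Make the e: prefix available to XPath expressions in the schema. -->
  <sch:ns prefix="e" uri="example"/>
  <!-- Custom XSLT function embedded in the schema: returns the sum of two integers. -->
  <xsl:function name="e:add" as="xs:integer" xmlns:e="example">
    <xsl:param name="a" as="xs:integer"/>
    <xsl:param name="b" as="xs:integer"/>
    <xsl:sequence select="$a + $b"/>
  </xsl:function>
  <sch:pattern>
    <!-- The predicate keeps the rule from firing when an attribute is missing. -->
    <sch:rule context="page-count[@front and @body and @total]">
      <sch:assert id="pc-1" role="error" test="e:add(@front, @body) = @total"
        >The total page count should equal the front pages plus the body pages.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Testing the function directly with <x:call> checks its arithmetic in isolation, while scenarios that use <x:context> check the assertion that depends on it.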
If a Schematron has multiple phases, a separate XSpec file is needed to test each phase. The phase that is to be tested can be specified in the <x:description> element by adding an <x:param name="phase"> element containing the name of the phase. (<x:param> can also be used to provide parameters to the Schematron compilation.) For example:
<x:description xmlns:x="http://www.jenitennison.com/xslt/xspec" schematron="../src/demo.sch">
  <x:param name="phase">thephase</x:param>
It is often helpful to organize test scenarios into separate files to make the maintenance of large test suites easier or to enable reuse of test scenarios. Test scenarios that are in a separate file can be imported using the <x:import> element, which can be placed in the <x:description> element. An attribute @href is required to specify the path to the XSpec file that is to be imported. For instance,
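As a hedged sketch, an import might be written as follows. The file name common-scenarios.xspec and the scenario shown are hypothetical; the Schematron path is the one used in the other examples in this paper.

```xml
<x:description xmlns:x="http://www.jenitennison.com/xslt/xspec"
               schematron="../src/demo.sch">
  <!-- Hypothetical file: its scenarios are run together with the ones below. -->
  <x:import href="common-scenarios.xspec"/>
  <x:scenario label="scenario defined in this file">
    <x:context>
      <article-meta/>
    </x:context>
    <x:expect-assert/>
  </x:scenario>
</x:description>
```

Splitting scenarios this way lets a group of shared scenarios be maintained once and reused by several XSpec files.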
During development it can be helpful to run a single test scenario in isolation instead of running an entire test suite. The @focus attribute can be added to any <x:scenario> to instruct XSpec to run only that scenario. After finishing work on a focused scenario, it is a good idea to remove the @focus attribute and run XSpec to check that no problems have been introduced into other scenarios. For example,
<x:scenario focus="working on new test" label="article title should not be in all caps">
It can sometimes be necessary to prevent certain test scenarios from executing. For example, if a test scenario is known to fail for a particular reason that cannot be easily resolved, it might not be desirable to have this scenario executed. Any <x:scenario> can be marked as pending, which will prevent the <x:scenario> from being executed. There are two ways to mark an <x:scenario> as pending: the <x:scenario> can be wrapped in an <x:pending> element, or an attribute @pending can be added to the <x:scenario> element. For example,
<x:pending label="not yet implemented">
  <x:scenario label="language code should use ISO 639">
    <x:context>
      <article xml:lang="xx"/>
    </x:context>
    <x:expect-assert/>
  </x:scenario>
</x:pending>

<x:scenario pending="not yet implemented" label="language code should use ISO 639">
  <x:context>
    <article xml:lang="xx"/>
  </x:context>
  <x:expect-assert/>
</x:scenario>
XSpec incorporates the standard XSLT implementation of ISO Schematron. When XSpec executes a Schematron test, before the test is actually run, the Schematron is first compiled by a series of three XSLT transforms. The process for compiling Schematron into XSLT is described in documentation for Schematron (Jelliffe). XSpec can be configured to use custom XSLTs for compiling Schematron by providing the file path to the custom XSLTs in environment variables SCHEMATRON_XSLT_INCLUDE, SCHEMATRON_XSLT_EXPAND, and SCHEMATRON_XSLT_COMPILE.
XSpec might not be able to test every Schematron schema. There are limits to what XSpec is able to test in XSLT which may also apply to Schematron. For example, XPath that begins at document root (i.e., begins with “/”) may not work as intended when executed within XSpec. The limitations of using XSpec to test Schematron are not yet known because this is a new feature. Users may report problems that they encounter to the issue log on the XSpec GitHub project (https://github.com/xspec/xspec/issues).
The following is an example of an XSpec test for a simple Schematron schema. This Schematron schema implements two rules based on JATS (Z39.96-2015), an XML format for journal articles:
| am-0001 | An article should have one DOI tagged in <article-id> with pub-id-type="doi" | creates an error which should cause a stop |
| am-0002 | A book review article should have details of the book(s) being reviewed tagged in <product> element(s) | creates a warning message |
The Schematron schema that implements these rules is as follows (filename
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <sch:rule context="article-meta">
      <sch:assert id="am-0001" role="error"
        test="count(article-id[@pub-id-type='doi']) = 1"
        >An article should have one DOI tagged in &lt;article-id&gt; with pub-id-type="doi"</sch:assert>
      <sch:report id="am-0002" role="warn"
        test="ancestor::article[@article-type='book-review'] and not(product)"
        >A book review article should have details of the book(s) being reviewed tagged in &lt;product&gt; element(s)</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
Sample XML is created to test each business rule. The rule am-0001 states that an article should have one DOI tagged. Three sample XMLs are needed to test this rule: a sample with one DOI tagged, which should pass the requirement; a sample with no DOI tagged, which should fail by having fewer than one DOI; and a sample with two DOIs tagged, which should fail by having more than one DOI. These sample XMLs are added to the XSpec test. First, a scenario element is added for rule am-0001. Inside this scenario element, three scenario elements are added for the three sample XMLs. The sample XML is placed in context elements, and expect elements are added to specify the result that is desired when the Schematron is run on the sample XML.
Rule am-0002 states that a book review article should have details of the reviewed book(s) tagged in product element(s). Two sample XMLs are used to test this rule: a sample with a book review article that has the product element, which should pass the requirement, and a sample with a book review that does not have the product element, which should generate the warning message. First, a scenario element is added for rule am-0002. Inside this scenario element, two scenario elements are added for the two sample XMLs. Again, the sample XML is placed in context elements, and expect elements are added to specify the desired result of the Schematron running on the sample XML.
In addition, a sample is created as an example of XML that correctly follows all of these rules. This
sample is an XML file and it is referenced using a context element (filename
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/archiving/1.1/JATS-archivearticle1.dtd">
<article article-type="book-review">
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example</article-id>
      <product>example</product>
    </article-meta>
  </front>
</article>
The XSpec test is written as just described (filename
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="../xspec/src/schemas/xspec.rnc" type="application/relax-ng-compact-syntax"?>
<x:description xmlns:x="http://www.jenitennison.com/xslt/xspec" schematron="../src/demo.sch">
  <x:scenario label="am-0001">
    <x:scenario label="correct">
      <x:context>
        <article-meta>
          <article-id pub-id-type="doi">10.0000/example</article-id>
        </article-meta>
      </x:context>
      <x:expect-not-assert id="am-0001"/>
    </x:scenario>
    <x:scenario label="incorrect DOI not present">
      <x:context>
        <article-meta/>
      </x:context>
      <x:expect-assert id="am-0001"/>
    </x:scenario>
    <x:scenario label="incorrect multiple DOIs present">
      <x:context>
        <article-meta>
          <article-id pub-id-type="doi">10.0000/example1</article-id>
          <article-id pub-id-type="doi">10.0000/example2</article-id>
        </article-meta>
      </x:context>
      <x:expect-assert id="am-0001"/>
    </x:scenario>
  </x:scenario>
  <x:scenario label="am-0002">
    <x:scenario label="correct">
      <x:context>
        <article article-type="book-review">
          <front>
            <article-meta>
              <product>example</product>
            </article-meta>
          </front>
        </article>
      </x:context>
      <x:expect-not-report id="am-0002"/>
    </x:scenario>
    <x:scenario label="incorrect">
      <x:context>
        <article article-type="book-review">
          <front>
            <article-meta/>
          </front>
        </article>
      </x:context>
      <x:expect-report id="am-0002"/>
    </x:scenario>
  </x:scenario>
  <x:scenario label="valid documents">
    <x:scenario label="book review">
      <x:context href="samples/book-review.xml"/>
      <x:expect-valid/>
    </x:scenario>
  </x:scenario>
</x:description>
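As a further sketch, a scenario along the following lines could be added inside the <x:description> of the test above to confirm that a warning does not cause validation to fail. The scenario and its label are illustrative additions, not part of the original test. The sample has exactly one DOI, so am-0001 should not be thrown, and it has no <product>, so the report am-0002 (role "warn") should be thrown.

```xml
<x:scenario label="book review without product gives a warning only"
            xmlns:x="http://www.jenitennison.com/xslt/xspec">
  <x:context>
    <article article-type="book-review">
      <front>
        <article-meta>
          <article-id pub-id-type="doi">10.0000/example</article-id>
        </article-meta>
      </front>
    </article>
  </x:context>
  <!-- The warning is expected to be reported... -->
  <x:expect-report id="am-0002" role="warn"/>
  <!-- ...the error-level assert is not expected... -->
  <x:expect-not-assert id="am-0001"/>
  <!-- ...and, because only a "warn" message is thrown, validation still passes. -->
  <x:expect-valid/>
</x:scenario>
```

Combining several expect elements in one scenario in this way documents both what the Schematron should flag and what it should leave alone for the same sample.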
The test is executed by running XSpec with the -s option, which indicates a Schematron test:
xspec\bin\xspec.bat -s test\demo.xspec
XSpec runs the tests and produces a report that shows the result of each test. The report can be output in HTML format presented in a display similar to that of the following example:
Behavior-Driven Development (BDD), which grew out of Test-Driven Development (TDD), is a methodology for software development that involves writing tests to ensure that code works correctly before, or at the same time as, the code itself is written. The use of specialized tools to automate software testing helps to enable a BDD workflow (Fox 2016). BDD can be introduced at any point in a project. If a BDD workflow is used from the start, the project may have a complete corpus of automated tests that helps to ensure reliability of the software. If a BDD workflow is introduced to an existing project, new development work can begin to use BDD, and over time tasks can be performed to create automated tests for older code. Whether to use a BDD workflow is not an all-or-nothing decision; it is possible to use a BDD workflow only in those parts of a project where it is feasible.
The following is an outline of a BDD workflow for Schematron development using XSpec. The first three steps (writing the business rule, collecting sample XML, and assigning an identifier) can be done with the participation of stakeholders. The subsequent three steps require knowledge of Schematron and XSpec.
Write the business rule.
Collect samples of XML that are valid according to the business rule and samples of XML that should cause a validation failure or message according to the business rule. The sample set should be selected to include examples that test boundary limits and edge cases.
Assign the business rule an identifier (a valid xs:NCName).
Create an XSpec <x:scenario> for the business rule.
    Inside the <x:scenario> for the business rule, create an <x:scenario> for each sample XML.
    Provide the sample XML using <x:context>.
    Describe the expectation using expect elements.
    Include the identifier in the @id attribute of the expect elements.
Write the Schematron assertion for the business rule. Include the identifier in the @id attribute of the <assert> or <report>.
Run the XSpec test to verify that the Schematron works as expected.
When writing a Schematron schema, a developer may choose to organize assertions into rules and patterns for reasons of efficiency, and the chosen organization may differ from the way the business rules are organized or the way the XSpec tests are organized. Assigning an identifier to each <assert> and <report> (using the @id attribute) and using these identifiers to relate the corresponding business rule and XSpec tests can help with maintenance of the Schematron. If a complex business rule is implemented using more than one <assert> or <report> the identifier can be extended by adding a unique suffix in the @id attribute of each <assert> or <report>.
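The suffix convention can be sketched as follows. The business rule am-0003 ("an article should have both a first page and a last page tagged") is hypothetical and is used here only to show one rule implemented by two assertions; <lpage> is assumed alongside the <fpage> element used in an earlier example.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="article-meta">
    <!-- Hypothetical business rule am-0003, first part: first page present. -->
    <sch:assert id="am-0003a" role="error" test="fpage"
      >An article should have a first page tagged in &lt;fpage&gt;</sch:assert>
    <!-- Hypothetical business rule am-0003, second part: last page present. -->
    <sch:assert id="am-0003b" role="error" test="lpage"
      >An article should have a last page tagged in &lt;lpage&gt;</sch:assert>
  </sch:rule>
</sch:pattern>
```

The corresponding XSpec scenarios could then be grouped under one <x:scenario label="am-0003"> and refer to each part separately, for example with <x:expect-assert id="am-0003a"/> and <x:expect-assert id="am-0003b"/>.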
Use of a continuous integration server can further improve a Schematron development workflow. Continuous integration offers a variety of options for task automation, such as executing tests when changes are pushed to a code repository, sending email alerts when tests succeed or fail, and triggering downstream actions after successful tests. XSpec is able to produce its reports in the JUnit XML format, which is understood by Jenkins, a popular continuous integration server (Jenkins).
The following is an example of how an XSpec Schematron test can be configured to run in a continuous integration server. It is worth noting that while this example illustrates using Jenkins as a continuous integration server, using GitLab to host a code repository, using a git submodule to import XSpec, and using a Windows server environment, other tools and methods can be used to achieve the same goal. This example also makes use of the File Operations Plugin for Jenkins. In addition, the server on which Jenkins is running has Java and git installed.
Begin by creating a git repository. Add a git submodule to import XSpec. Create a Schematron schema and XSpec test. Commit these changes to git. Then, push the repository to a project that has been created on GitLab. These tasks can be accomplished through the following git commands:
mkdir demo
cd demo
git init
git submodule add https://github.com/xspec/xspec.git xspec
git commit -m "Import XSpec as a submodule"
git add src\demo.sch
git add test\demo.xspec
git commit -m "Create Schematron schema with XSpec test"
git add test\samples\book-review.xml
git commit -m "Add sample XML"
git remote add origin email@example.com:vincentml/demo.git
git push origin master
Next, configure a job to run the XSpec test by following these steps in Jenkins:
Create a new item (i.e., a job or project).
Select Freestyle project. Enter a name for the item and avoid using spaces in the name.
In the “Source Code Management” section, enter the URL and credentials for the Git repository. Then, under “Additional Behaviours” select “Advanced sub-module behaviours” and enable the option “Recursively update submodules”.
In the “Build” section, add an action “File Operations” and select “File Download”. Enter the URL
and enter the target file name “
Next, in the “Build” section, add an action “Execute Windows Batch Command”. Enter this script, which sets the SAXON_CP environment variable and then executes XSpec:
set SAXON_CP="%WORKSPACE%\saxon.jar"
xspec\bin\xspec.bat -s -j test\demo.xspec
In the “Post-build Actions” section, add an action “Publish JUnit test result report”. Enter the Test report XMLs location as “
Save the configuration.
Click “Build Now” to have Jenkins run the XSpec test. A progress bar will appear to indicate that the test is running. After the test run is complete, click “Latest Test Result” to see the report.
Configuring XSpec to run automatically in a continuous integration server may provide the highest level of convenience for testing Schematron. There are a great many options available when considering how to incorporate XSpec in a project, and decisions can be driven by the unique needs of a project.
In this paper, XSpec was introduced as a testing automation tool to assist with developing and maintaining Schematron schemas. A tutorial on how to use XSpec for testing Schematron was provided, followed by an example of an XSpec test for a Schematron schema. A possible workflow for incorporating XSpec into a Schematron development project was suggested using the Behavior-Driven Development methodology. An example was used to illustrate how XSpec tests for a Schematron schema can be configured to run automatically in a continuous integration server.
The use of automated testing tools has become popular due to the many benefits that automated testing can provide. XSpec now offers Schematron users a new tool for automated testing of Schematron. Support for testing Schematron is a new feature in XSpec, and users may report feedback through the XSpec project on GitHub.
[Blair 2012] Blair, Julie. “Developing a Schematron–Owning Your Content Markup: A Case Study.” Presented at Journal Article Tag Suite Conference (JATS-Con) 2012, Bethesda, MD, October 16 - 17, 2012. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2012. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK100373/
[Fox 2016] Fox, Steve. 2016. “All You Need to Know About Behaviour-Driven Software.” Behaviour-Driven.org. November 12. http://behaviour-driven.org/need-know-behaviour-driven-software.html
[ISO/IEC 19757-3:2006] Information Technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-Based Validation, Schematron. International Standard ISO/IEC 19757-3, Geneva, Switzerland: ISO. http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
[Jelliffe] Jelliffe, Rick. “The Schematron ‘Skeleton’ Implementation.” Schematron.com. http://schematron.com/front-page/the-schematron-skeleton-implementation/
[Kraetke and Bühring 2016] Kraetke, Martin, and Franziska Bühring. “A Quality Assurance Tool for JATS/BITS with Schematron and HTML Reporting.” Presented at Journal Article Tag Suite Conference (JATS-Con) 2016, Bethesda, MD, April 12 - 13, 2016. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2016. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK350149/
[Lubell 2009] Lubell, Joshua. “Documenting and Implementing Guidelines with Schematron.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:10.4242/BalisageVol3.Lubell01
[Schwarzman 2017] Schwarzman, Alexander B. “JATS Subset and Schematron: Achieving the Right Balance.” Presented at Journal Article Tag Suite Conference (JATS-Con) 2017, Bethesda, MD, April 25 - 26, 2017. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2017. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK425543/
[Usdin, Lapeyre, and Glass 2015] Usdin, Tommie, Deborah Aleyne Lapeyre, and Carter M. Glass. “Superimposing Business Rules on JATS.” Presented at Journal Article Tag Suite Conference (JATS-Con) 2015, Bethesda, MD, April 21 - 22, 2015. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK279902/