Introduction

Testing of XML material – either XML-native business documents or various XML-formatted inputs – often involves more than single-document validation. The notion of validation may depend on a context involving other documents; validation against a schema is just one example of this. Behind XML document testing, it is often a processor generating or editing the document that is actually under test. This is the case with XBRL test suites (for XBRL processors), with WS-I test suites (testing Web service instances) and also with the ODF test suite (targeting document processors). As a consequence, the notion of validity depends on a context made of diverse other documents that represent various inputs to these processors, as well as traces that capture their behavior and relate all documents associated with the same test case. Roughly, two categories of documents can make up such a context: (a) metadata documents, and (b) scenario documents.

  • Metadata documents may involve various business rules, reference documents and templates, contractual documents, and configuration artifacts. XML schemas are just one example of these, and they may be involved in quite diverse validation patterns [1] beyond conventional schema validation of instances.

  • Scenario documents reflect operations over a system under test - such as an XML transcript of an electronic exchange with the system, a log converted into XML, or a script of the sequence of operations to be performed by a test driver.

In such cases, testing is as much about verifying that each of these XML artifacts is individually correct as it is about verifying that some combinations of them are consistent (e.g. a Web service message must conform to its definition in WSDL, or the output of an ODF processor must be consistent with the operation performed and the previous state of the document). In some cases it is not even a main document that is under test relative to some context, but rather a sequence of documents, e.g. a message choreography for a business transaction combining business payloads, message protocols and service interfaces that is tested for conformance [2]. The dependency between a document and its transactional context (exchange protocol) has also been analyzed in [3].

Such testing requirements are in fact closer to conventional system or software testing requirements than to document testing in a narrow sense - while also requiring the same XML testing capabilities as known today for single documents. Because each type of artifact may have its own validation rules and test suites, tests must be grouped into modules, the execution of which is conditioned by the results of other test modules. Chaining of test cases becomes an important feature, across modules or within modules.

The diversity of these test requirements poses a challenge to a test environment:

  • Rule and constraint languages (Schematron, OWL reasoning, RuleML) are often limited in one of two ways: (a) their expressive power is traded for ease of processing, or (b) their decision model (e.g. predicate logic) enforces a Boolean outcome, missing the nuances expected in a test report.

  • Test suites - and test engines - often exceed the scope of dedicated tools such as Schematron (e.g. the XBRL test suite). As a result, they are architected and developed in an ad-hoc manner, regardless of how well they leverage XML technologies.

This paper describes a more integrated XML testing paradigm which supports flexible composition of test cases (chaining, parameterization) and test suites (modules, reuse). The resulting implementation makes the most of XPath2.0 and XSLT2.0 to provide, on one hand, a test script model able to express predicates crossing over such diverse inputs and to handle a richer spectrum of outcomes, and on the other hand a test engine able to compose and chain test assertions in a way that was usually considered to require specialized rule engines written in conventional programming languages or in dedicated AI languages such as Prolog or LISP.

XML Test Assertions for XML

In this section the authors argue that the best approach for testing XML material – given the integrated aspect of such testing - is one that builds on conventional test methodologies, augmented with a proper integration of XML-processing techniques (here XPath2.0, XSLT2.0).

Test Assertion Model

The test assertion (TA) is a familiar concept for QA engineers and test developers. A test assertion is a testable or measurable statement for evaluating the adherence of part of an implementation to a normative statement in a specification. Test assertions provide a link between the narrative of a specification (i.e. rules, schema, requirements, system definition) and the test suites that assess conformance of implementations. Test assertions have mostly been used in the domain of software engineering, and less often in more specialized domains such as XML artifacts, where ad-hoc solutions - and also very specialized tools - have flourished instead. Test assertions are usually declarative (logical) statements that are written as a blueprint for test cases, the latter being the actual executable tests.

A major benefit of writing test assertions is that they represent a "conformance contract" understandable by all parties involved - domain experts, test writers, end-users. An additional interest in the XML space - where all material under test is in XML format - is that test assertions can be directly scripted using dialects such as XPath or XQuery, thus becoming themselves executable test cases. A general-purpose model for test assertions has recently been developed by the OASIS Test Assertions Guidelines (TAG) committee [8]. In this model, a test assertion (TA) is a well-structured object defined as follows:

  • TA Id: the identifier of the Test Assertion.

  • Source: the normative conformance requirement that this test assertion is addressing.

  • Target: a Test Assertion always targets instances of a specific artifact type, for example, a line item fragment in a purchase order document, a SOAP Envelope, a WSDL port binding, etc. The Target element identifies this artifact type.

  • Prerequisite: a pre-condition that must be satisfied over the Target instance in order for this instance to qualify for evaluation under this TA. The Prerequisite may refer to other test assertions. If the Prerequisite evaluates to “false”, then the outcome of the TA for this Target will be “notQualified” in the test report.

  • Predicate: a logical expression over the Target. The Predicate is only evaluated if the Target instance is qualified, i.e. if the Prerequisite – if any – has already evaluated to “true”. If the Predicate result is “true”, the Target instance fulfills the related conformance requirement; otherwise it violates it.

  • Prescription Level: a keyword reflecting how imperative it is to fulfill the (Source) requirement: mandatory / preferred / permitted.

The authors have profiled and extended this model so that test assertions become directly executable over XML artifacts, thus becoming "test cases" grouped in test suites.

The profiling consists of the following (the overall shape of a profiled test assertion is sketched after the list):

  1. Use XPath expressions to define Target, Prerequisite and Predicate.

  2. Define how the instances of a particular target type are identified. This is done by another XPath expression that returns a unique ID, possibly resulting from aggregation of several fields relevant to this target type. This ID will show in the test report, but is also used when chaining test assertions over the same target instance during test execution.

  3. Add references (XPath) to a combination of artifacts that represent contextual documents, over which Prerequisite and Predicate may operate.

  4. Add a new Reporting element that determines the outcome of the test assertion over a target instance.

  5. Add secondary output mostly for human readers: error messages, diagnostic data.
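For orientation, the overall shape of such a profiled test assertion is sketched below. The element names used for items 4 and 5 (reporting, diagnostic) are indicative only, as their exact mark-up is not detailed here:

	<testAssertion id="..." lg="xpath20">
	  <var name="...">...</var>                  <!-- (3) contextual material -->
	  <target idscheme="...">...</target>        <!-- (1), (2) XPath target and ID scheme -->
	  <prerequisite>...</prerequisite>           <!-- (1) XPath pre-condition -->
	  <predicate>...</predicate>                 <!-- (1) XPath test expression -->
	  <reporting>...</reporting>                 <!-- (4) outcome control -->
	  <diagnostic>...</diagnostic>               <!-- (5) messages for human readers -->
	</testAssertion>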

The Test Target and its Context

A test assertion will always focus on a “primary” target instance, but may need to access contextual material in order to test this target. This "side" material is identified in "variables" added to the TA. A simple example of this is schema-validation of a document. In the target scripting below, the test assertion will refer to the contextual document (a schema) while targeting a purchase order line item:

<testAssertion id="1234" lg=”xpath20” >
	  <var name=”poschema” type="string">http://www.mysupplychain_xyz.com/2009/04/12/po.xsd</var>
	  <target>//xyz:purchaseOrder/xyz:lineItem</target>
	  <predicate>$target instance of schema-element($poschema, xyz:lineItem) </predicate>
	  ...
	  </testAssertion>
The predicate validating a lineItem element will refer to this contextual document using the conventional variable notation ($). The predicate expressions will be pre-processed into executable XPath. The above test assertion applies to every line item of any purchase order. Variable expressions (<var>) may refer to any contextual material - either inside the same document or external. The XPath variable notation ($) may then be used either to parameterize the location of a document, or to refer to the current value of the target expression:
<testAssertion id="2345" lg=”xpath20”>
	  <var name=”herbooks” >document($allbooks)/book[@author = $target/name]</var>
	  <var name=”herpublishers” >document($allpublishers)
	  //directory/publisher[fn:index-of( fn:distinct-values(
	  ‘for $bk in $herbooks return $bk/@publisher’), @name) gt 0 ]</var>

	  <target>//whoswho/arts[@section='literature']/biographies/author</target>
	  ...
	  </testAssertion>
In the above, "$allbooks" and "$allpublishers" are references to a documents that have been defined outside the test assertion. The variable "$herbooks" denotes the subset of books from this author. The variable "$herpublishers" is the subset of publishers this author has been dealing with. The target expression is matched against a third document, the main input. A predicate for the above target may express a condition over the author (the target), her related list of books ($herbooks) and of publishers ($herpublishers).

Reporting Test Outcomes

The additional Reporting element added to the test assertion structure may override the default outcome, which is:

  • notQualified (if Prerequisite = “false”)

  • pass (if Prerequisite = “true” and Predicate = “true”)

  • fail (if Prerequisite = “true” and Predicate = “false”)

In addition, other possible outcomes are:

  • missingInput (a contextual document or XML fragment is missing in order to pursue the evaluation)

  • warning (the Predicate may not be indicative enough of either violation or fulfillment, but has detected a situation calling for further attention.)

  • undetermined (e.g. the Predicate is only designed to detect some kinds of violation, when “false”, and has no particular conformance meaning when “true”)

These outcomes are not only intended for a final test report. They can also be tested and influence the test suite execution if the test assertion that produces them is referred to in predicates and prerequisites of subsequent test assertions.
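As an illustration, a test assertion may use the Reporting element to downgrade a failed Predicate to a warning. The exact mark-up of the Reporting element is not detailed in this paper, so the syntax below is indicative only:

	<testAssertion id="3456" lg="xpath20">
	  <target>//xyz:purchaseOrder/xyz:lineItem</target>
	  <predicate>xs:date(xyz:requestedDeliveryDate) ge fn:current-date()</predicate>
	  <!-- indicative syntax only: report a warning instead of a failure -->
	  <reporting whenFalse="warning">Line item has a delivery date in the past.</reporting>
	</testAssertion>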

Inheritance and Composition

An important benefit of clearly identifying target categories is the ability to leverage inheritance and composition relationships between targets. Targets often belong to a class system, in the object-oriented sense (a target may be part of another target, may be a subclass of another target, etc.). This leads to an enhanced test execution model that is able to leverage such relationships. In particular:

  • Inheritance: The test engine is able to determine that a test assertion will apply not only to all instances of its Target class, but also to all instances of its Target sub-classes. In other words, a target inherits the test assertions of its super-classes.

  • Composition: The test engine is able to handle cases where the prerequisite expression of TA t1 includes references to TA t2, and yet t1 and t2 do not have the same Target class - not even in a sub-class relationship - but have a component relationship. For example, the target of t1 - say a "binding definition" - is part of the target of t2 - say a WSDL file. Before verifying that the binding satisfies some rules (TA t1), one may want to verify that the embedding WSDL file is schema-valid (TA t2). In such cases, the component relationship can be defined once as an access expression from t1 to t2 on the ancestor axis (XPath), reusable by any TA.

The authors are in favor of supporting two modes of representation for such a model: (1) "inline" relationship information embedded in each test assertion as needed; (2) a separate mark-up, distinct from the test assertions, holding the target model information. While (2) is a more rational and scalable approach (it avoids redundant information from one test assertion to another, etc.), (1) is a convenient approach well suited for the test assertion development phase. Examples of inline model information are shown below; a sketch of the separate mark-up of (2) is given at the end of this section. The target element uses a qualification notation to indicate the super-class (message) of the SOAPmessage target class.

	<target type="message:SOAPmessage" > ... </target>
      

The composition link between the main target of a test assertion - here a WSDL binding - and the related embedding target of a prerequisite reference - here the WSDL file itself - is indicated using an XPath expression relative to the selected target node ($target) as "argument" of the test assertion reference (tag:BP2703):

	<testAssertion id="BP2403">
	<target type="binding" idscheme="..." >//wsdl:definitions/wsdl:binding</target>
	<prerequisite> tag:BP2703($target/..) = 'pass' </prerequisite>
	...
	</testAssertion>
      

Another major attribute of a Target class is the ID scheme, itself an XPath expression that returns a unique ID string for each target instance, to appear in the test report.
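The separate mark-up of option (2) is not shown in this paper; a hypothetical sketch of what it could look like, gathering super-class, component and ID-scheme information in one place, is:

	<!-- hypothetical mark-up: element and attribute names are indicative only -->
	<targetModel>
	  <targetType name="message" idscheme="..."/>
	  <targetType name="SOAPmessage" superType="message"/>
	  <targetType name="WSDLfile" idscheme="..."/>
	  <targetType name="binding" partOf="WSDLfile"
	              accessExpression="ancestor::wsdl:definitions"/>
	</targetModel>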

Chained Test Assertions

A powerful aspect of the otherwise simple TAG model is that a test assertion (TA) may refer to other test assertions. It may do so in two ways:

  (a) by using TA references in the Prerequisite element. Such references are just parts of the logical expression, e.g. (in a simplified notation):

    Prerequisite of TA3: (TA1 = “pass”) and (TA2 = “pass”)
    means that it is expected that the target passed TA1 and TA2 before even being tested for TA3. TA-referencing in a Prerequisite is commonly used when the test expression (Predicate) in a TA can be greatly simplified by assuming that the target already passed other test assertions, or simply when the test itself is irrelevant in case some other (prerequisite) tests have failed.

  (b) by using TA references in the Predicate element. This allows for writing meta-level test assertions that evaluate a composition of the results of other TAs. This is often needed when defining various "conformance profiles" related to the same type of document (e.g. a category of insurance claims, a purchase order of class 'urgent'). For example: “to comply with conformance profile P, a document must “pass” the set of test assertions {TA1, TA2, TA3} and must at least NOT “fail” the set of test assertions {TA4, TA5}.” In such a case, a single TA will summarize the composition test to be made over the results of all TAs involved in assessing the conformance profile P. The predicate will be:

    Predicate of TA6: (TA1 = “pass”) and (TA2 = “pass”) and (TA3 = “pass”)
    		and  not(TA4 = “fail”) and not(TA5 = “fail”)
    This summary TA (TA6) may in turn be referred to from the Prerequisite of another TA. This is an essential feature when dealing with contextual documents: in most cases, one must first ensure that the contextual document is itself “conforming” before using it in a test case over the “main” document. These expressions are pre-processed by the test engine into equivalent XPath boolean expressions (the scripted form is sketched right after this list).
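Using the tag: function notation that appears in the scripted examples below - and assuming, for illustration, that TA1 to TA5 share TA6's target - the predicate of such a summary TA might be scripted as:

	<predicate>(tag:TA1($target) eq 'pass') and (tag:TA2($target) eq 'pass')
	and (tag:TA3($target) eq 'pass') and not(tag:TA4($target) eq 'fail')
	and not(tag:TA5($target) eq 'fail')</predicate>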

There are two main reasons for chaining test assertions as described in (a) above:

(a1) When a set of tests should logically be done in a particular order, meaning that every single test should only be executed if the target instance passed the previous tests. For example, the following sequence of tests is expected to be done in this order, regarding a Web service definition:

  1. Normative statement: "The wsdl:definitions MUST be a well-formed XML 1.0 document. The wsdl:definitions namespace MUST have value: http://schemas.xmlsoap.org/wsdl/." This is verified by the test assertion BP2703 below.

    		<testAssertion id="BP2703" lg=”xpath20” >
    		<target>//wsil:descriptionFile[fn:prefix-from-QName
    		(fn:node-name(*:definitions)) eq 'wsdl' or wsdl:definitions]</target>
    		<predicate>$target instance of schema-element($wsdlschema) </predicate>
    		...
    		</testAssertion>

  2. Normative statement: "The wsdl:binding element MUST have a wsoap12:binding child element." This is verified by the test assertion BP2402 below.

    		<testAssertion id="BP2402" lg=”xpath20” >
    		<target>//wsil:descriptionFile/*:definitions/wsdl:binding</target>
    		<prerequisite>tag:BP2703($target/../..) eq 'pass'</prerequisite>
    		<predicate>child::wsoap12:binding</predicate>
    		...
    		</testAssertion>

  3. Normative statement: "The contained soap binding element MUST have a 'transport' attribute." This is verified by the test assertion BP2403 below.

    		<testAssertion id="BP2403" lg=”xpath20” >
    		<target>//wsil:descriptionFile/*:definitions/wsdl:binding</target>
    		<prerequisite>tag:BP2402($target) eq 'pass'</prerequisite>
    		<predicate>not(wsoap12:binding[not(@transport)])</predicate>
    		...
    		</testAssertion>

  4. Normative statement: "The 'transport' attribute - if any - of the soap binding element MUST have value: http://schemas.xmlsoap.org/soap/http." This is verified by the test assertion BP2404 below.

    		<testAssertion id="BP2404" lg=”xpath20” >
    		<target>//wsil:descriptionFile/*:definitions/wsdl:binding[wsoap12:binding]
    		</target>
    		<prerequisite>tag:BP2403($target) eq 'pass'</prerequisite>
    		<predicate>not(wsoap12:binding[@transport ne
    		'http://schemas.xmlsoap.org/soap/http'])</predicate>
    		...
    		</testAssertion>

These four test assertions are chained via their prerequisite elements. This chaining means that if one fails, the subsequent tests will not be performed: whatever their outcome, it would either be meaningless or at best an unnecessary distraction in the test report.

(a2) In order to "reuse" (both at scripting time and at run-time) the outcome of a complex expression that has already been handled by another TA. For example, in the Web services basic profile, a wsdl:binding must be either an rpc-literal binding or a document-literal binding. The test ensuring this is not a simple one:

	    <testAssertion id="BP2017" lg=”xpath20” >
	    <target>//wsil:descriptionFile/wsdl:definitions/wsdl:binding
	    [wsoap12:binding]</target>
	    <prerequisite>tag:BP2404($target) eq 'pass'</prerequisite>
	    <predicate>not(.//wsoap12:body/@use != 'literal')
	    and (count(.//wsoap12:body) = count(.//wsoap12:body/@use)) and
	    ((not(.//wsoap12:*/@style != 'rpc') and
	    not(.//wsoap12:operation[not(@style) and not(../../wsoap12:binding/@style)]))
	    or (not(.//wsoap12:*/@style != 'document')))</predicate>
	    ...
	    </testAssertion>
Several test assertions relate only to the document-literal type. Once it is known that a binding is of either of the above types, the test to distinguish an rpc-literal from a document-literal binding is fairly simple. In the following test assertion, which applies only to the document-literal type, this simple test - used here in the target expression - guarantees that only document-literal bindings will be selected:
	    <testAssertion id="BP2111" lg=”xpath20” >
	    <target>//wsil:descriptionFile/*:definitions/wsdl:binding
	    [not(.//wsoap12:*[@style = 'rpc'])]</target>
	    <prerequisite>tag:BP2017($target) eq 'pass'</prerequisite>
	    <predicate>not(.//wsoap12:body[@parts and contains(@parts," ")])</predicate>
	    ...
	    </testAssertion>

The other form of chaining (b) is done in the Predicate expression. This allows for defining "meta-level" test assertions that wrap entire groups of test assertions by summarizing their expected outcome in a single logical expression. Such a meta-level test assertion can then be referred to by other test assertions in their prerequisite condition, when this entire group of tests must be passed. For example, consider BP1214 listed in the next section. This test assertion targets a SOAP message, but needs to access a contextual document: the interface binding definition that governs the content of this message. Before executing BP1214, the binding definition must clearly have been verified. A meta-level test assertion can "summarize" all the tests that ensure this correctness for rpc-literal bindings:

	  <testAssertion id="BP-rpc-bindings" lg=”xpath20” >
	  <target>//wsil:descriptionFile/*:definitions/wsdl:binding
	  [.//wsoap12:*[@style = 'rpc']]</target>
	  <prerequisite>tag:BP2017($target) eq 'pass'</prerequisite>
	  <predicate>(tag:BP2404($target) eq 'pass') and
	  (tag:BP2406($target) eq 'pass') and (tag:BP2020($target) eq 'pass') and
	  (tag:BP2120b($target) eq 'pass') and (tag:BP2117($target) eq 'pass') and
	  (tag:BP2118($target) eq 'pass')
	  </predicate>
	  ...
	  </testAssertion>
The above test assertion may then be used as prerequisite for BP1214, over the binding definition related to its message target, i.e. the "binding" element parent of the "operation" element selected by the XPath expression in the variable $myOpBinding:
	  <testAssertion id="BP1214">
	  <var name="myOpBinding"> ...</var>
	  <prerequisite>tag:BP-rpc-bindings($myOpBinding/..) eq 'pass'</prerequisite>
	  <target type="message:SOAPmessage" >/wsil:testLog/wsil:messageLog/wsil:message[...]
	  </target>
	  ...
	  </testAssertion>

Execution Semantics

The general mode of execution for the above TAs is that of a conventional forward-chaining rule engine. This is necessary due to the chaining of test assertions, and departs radically from how other XPath-based rule systems are processed (e.g. Schematron or CAM do not handle chaining). The set of test assertions that have no references to other TAs is executed first over all candidate targets, and this first set of test results for all possible target instances is recorded. In the next iteration of the engine, only those TAs that have all their references resolved against the first set of results are executed in turn; their results are added to the initial result set. Subsequent iterations keep adding to the result set until an iteration cannot augment it any further, at which point the result set is considered stable. All possible validations have then been made over the material under test, and the results are ready for (HTML) test report generation.
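A much simplified sketch of this iteration is given below. It is not the authors' actual engine: it assumes that the stylesheet is run over a document containing the test assertions, that each test assertion carries a hypothetical @refs attribute listing the ids of the TAs it refers to, and that a stub ta:evaluate() function stands in for the real evaluation of one test assertion over its target instances.

	<xsl:stylesheet version="2.0"
	    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	    xmlns:ta="urn:example:ta-engine">

	  <!-- Stub: a real engine selects the target instances of $a and evaluates
	       its prerequisite and predicate, producing one result per instance. -->
	  <xsl:function name="ta:evaluate" as="element(result)">
	    <xsl:param name="a" as="element(testAssertion)"/>
	    <result ta="{$a/@id}">pass</result>
	  </xsl:function>

	  <xsl:template name="iterate">
	    <xsl:param name="assertions" as="element(testAssertion)*"/>
	    <xsl:param name="results" as="element(result)*"/>
	    <!-- TAs not yet evaluated, all of whose referenced TAs have results -->
	    <xsl:variable name="ready" as="element(testAssertion)*"
	        select="$assertions[not(@id = $results/@ta)]
	                [every $r in tokenize(@refs, '\s+')[.] satisfies $r = $results/@ta]"/>
	    <xsl:choose>
	      <!-- stable: no further test assertion can be evaluated -->
	      <xsl:when test="empty($ready)">
	        <xsl:sequence select="$results"/>
	      </xsl:when>
	      <xsl:otherwise>
	        <xsl:call-template name="iterate">
	          <xsl:with-param name="assertions" select="$assertions"/>
	          <xsl:with-param name="results"
	              select="$results, for $a in $ready return ta:evaluate($a)"/>
	        </xsl:call-template>
	      </xsl:otherwise>
	    </xsl:choose>
	  </xsl:template>

	  <xsl:template match="/">
	    <testReport>
	      <xsl:call-template name="iterate">
	        <xsl:with-param name="assertions" select="//testAssertion"/>
	        <xsl:with-param name="results" select="()"/>
	      </xsl:call-template>
	    </testReport>
	  </xsl:template>
	</xsl:stylesheet>

In this sketch, each recursive call corresponds to one iteration of the engine; test assertions whose references can never be resolved are simply left without a result.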

In the XPath-extended XML markup of the TAG model, test assertions can be conditionally chained as rules to create dynamic test suites, the results of which can also be manipulated by higher-level test assertions. This approach addresses the need to integrate the validation of various XML artifacts with the validation of combinations of such artifacts (consistency across documents), which requires composing and orchestrating test cases and test suites in a modular way.

Implementation Considerations

The automation of the TAG methodology leverages both XSLT2.0 and XPath2.0. However, end-users (TA designers) only need to know XPath to write test assertions; XSLT is only used for the execution engine. The authors have implemented a two-phase processing of the above test assertions, as is often done when XSLT is the target execution language.

Phase 1:

  • input = set of test assertions (type: xml + XPath2.0)

  • processor = TAG engine generator (type: XSLT2.0)

  • output = test assertions engine for this set (type: XSLT2.0)

Phase 2:

  • input = documents under test and context (xml)

  • processor = test assertions engine in output of Phase 1 (type: XSLT2.0)

  • output = final test report (type: xml)
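A rough sketch of the Phase 1 pattern is given below. It is not the authors' actual generator - which also handles target identification, chaining of TA references and reporting - and it assumes that the TA expressions are plain XPath with all required namespaces declared; the point is only to show the classic XSLT-writes-XSLT technique (using xsl:namespace-alias) on which such a generator can rest:

	<xsl:stylesheet version="2.0"
	    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">

	  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

	  <xsl:template match="/">
	    <axsl:stylesheet version="2.0">
	      <axsl:template match="/">
	        <testReport>
	          <!-- one generated block per test assertion -->
	          <xsl:apply-templates select="//testAssertion"/>
	        </testReport>
	      </axsl:template>
	    </axsl:stylesheet>
	  </xsl:template>

	  <!-- the generated block selects the target instances at Phase 2 and
	       evaluates prerequisite and predicate for each of them -->
	  <xsl:template match="testAssertion">
	    <axsl:for-each select="{target}">
	      <result ta="{@id}">
	        <axsl:choose>
	          <axsl:when test="{if (prerequisite)
	              then concat('not(', prerequisite, ')') else 'false()'}">notQualified</axsl:when>
	          <axsl:when test="{predicate}">pass</axsl:when>
	          <axsl:otherwise>fail</axsl:otherwise>
	        </axsl:choose>
	      </result>
	    </axsl:for-each>
	  </xsl:template>
	</xsl:stylesheet>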

The first phase amounts to generating an XSLT-hardcoded test suite for a particular set of test assertions. The second phase amounts to executing this test suite. In addition to increased performance, the two-phase approach allows for advanced parameterization features in test assertions at different levels, with the use of variables:

  • “Generation" (or "Phase 1") variables: these are given a value during Phase 1. Such value assignments are hardcoded in the output of Phase 1. Examples are those identifying contextual documents in a previous example (e.g. $allbooks, $allpublishers).

  • "Run-time" (or “Phase 2”) variables: these may have a different value at each execution of the test assertion. Such variables are used to break down complex expressions for Target, Prerequisite and Predicate, or to point at contextual documents that may vary from one target to the other. In the example below, the Target is a SOAP message, and the variable "myOpBinding" identifies the definition of the Web service operation (here the WSDL file is in the same log as the message trace) associated with this target instance (referred to using the pseudo variable "$target"):

	  <testAssertion id="BP1214">
	  <var name="myOpBinding">//wsil:descriptionFile/wsdl:definitions/wsdl:binding
	  [.//wsoap12:*[@style = 'rpc']]/wsdl:operation
	  [@name =  fn:local-name-from-QName(node-name($target/soap12:Body/*[1]))]</var>
	  <target type="message:SOAPmessage" >/wsil:testLog/wsil:messageLog/wsil:message[...]
	  </target>
	  ...
	  </testAssertion>
This test assertion verifies that the message conforms to some aspect of its binding definition.

Two of the authors have developed test suites for Web Services profiles defined by the WS-Interoperability consortium (http://www.ws-i.org), using the XPath2.0-extended TAG model. These test suites include test cases that involve a combination of documents (WSDL, schemas) and sequences of messages. About 250 test assertions were developed for three WS profiles. The entire test suite execution process (Phase 1 + Phase 2 + HTML rendering of the test report) is handled by stylesheets.

Prior to this, one author had led a similar test tool development effort for WS-I based on conventional programming languages (Java, C#) [4]. The advantages of the recent approach using XPath and XSLT are:

  • Reliance on specialized XML dialects that have been implemented on various platforms and tested for consistency across them removes the platform dependency of test tools (e.g. .NET, Java).

  • Visibility of the TA logic (test assertion definitions are currently embedded in the WS-I Profile document and readable by end-users and developers who need to comply with these profiles). In the previous approach, the logic of tests was buried in the binaries of the test tools.

  • A modest but real gain in test suite design and overall development effort and the related QA cycles.

Originally, the authors attempted to use XPath1.0 as the expression language for TAs. It was not sufficient to handle the complex correlations of XML fragments (either intra-document or cross-document) required by WS-I test suites. XPath2.0 provided new features that significantly enhanced the expressive power of target / prerequisite / predicate expressions, such as quantified expressions, iterations and an extensive library of functions. Advanced correlation patterns inside a Predicate could be expressed in a declarative way as a set of nested quantified expressions, making it possible to assign and reuse variables at each level.
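For illustration, a correlation predicate of this kind might look like the following (a hypothetical expression, assuming a variable $myMsgDef bound to the wsdl:message definition governing the target message): every declared part must be matched by an accessor element in the SOAP Body of the target.

	<predicate>every $part in $myMsgDef/wsdl:part/@name
	satisfies (some $acc in $target/soap12:Body/*/*
	           satisfies fn:local-name($acc) eq string($part))</predicate>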

XSLT2.0 proved a suitable scripting language for implementing a forward-chaining test assertion engine, thanks to features such as the xsl:next-match instruction, whereas the authors had not been successful in doing this with XSLT1.0.

Along the same line of managing complexity by leveraging the composability of test assertions and of their execution (chaining of test assertions, parameterization, meta-level assertions), a future enhancement will allow a test assertion to define “byproducts”, i.e. XML fragments produced by its execution (in addition to the main outcome “pass”, “fail”, etc.) that can be reused in the Prerequisite or Predicate of referring test assertions.

Other Works in Semantic Validation of Documents

Schematron

Schematron is a simple pattern and rule language well focused on document testing. It leverages XPath functions and expressions, and can be implemented using XSLT. Rules in Schematron can be seen as serving a purpose similar to test assertions. However, the authors, after initially attempting to develop WS-I test suites with Schematron 1.5, had to give up, mostly due to its restricted rule execution semantics: there is no support for "logical" rule chaining, and in a pattern only one rule - the first that matches the context - will execute. This form of "if-then-else" chaining applies to the context matching and not to the result of the rule itself, unlike what is expected in conventional rule-based systems.
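A minimal ISO Schematron sketch (not taken from the WS-I suites) illustrates the point: for a binding that does have a wsoap12:binding child, only the first rule below fires, and the second rule - which would check the transport attribute - is never evaluated for that node.

	<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
	  <sch:ns prefix="wsdl" uri="http://schemas.xmlsoap.org/wsdl/"/>
	  <sch:ns prefix="wsoap12" uri="http://schemas.xmlsoap.org/wsdl/soap12/"/>
	  <sch:pattern>
	    <!-- only the first rule whose context matches a given node fires -->
	    <sch:rule context="wsdl:binding">
	      <sch:assert test="wsoap12:binding">A binding must have a wsoap12:binding child.</sch:assert>
	    </sch:rule>
	    <sch:rule context="wsdl:binding[wsoap12:binding]">
	      <sch:assert test="wsoap12:binding/@transport">The soap binding must have a transport attribute.</sch:assert>
	    </sch:rule>
	  </sch:pattern>
	</sch:schema>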

Schematron has been designed around the idea that the entity under test is the document, while in our test assertion engine it is above all an XML fragment, a subset of some document. Each fragment (target) is systematically identified according to a well-defined scheme for its target type. This identity - defined by an XPath expression - is not only used for detailed diagnostic information in the test report, but is central to the rule chaining mechanism of the test engine, i.e. for deciding the order of the tests on this target. Schematron allows for detailed and dynamic diagnostic information that has the ability to fully identify a subset of a document, but this does not play any role in the rule processing mechanism. A valuable convenience feature had to be added to test assertions in the form of variables, for handling complex Predicate expressions or for parameterizing test assertions. Such variables are also supported in Schematron 1.6. On the modeling side, Schematron introduces a hierarchy of constructs (assertion, rule, pattern, phase). In contrast, the presented approach is based on a flat model relying on a single construct - the test assertion - composable at different levels, but subject to the same execution semantics.

Although Schematron can be written against any XML document, it is primarily intended for XML instances. Namespace handling becomes difficult when writing rules against an XML schema, or when there is a namespace prefix in the value of an attribute or element. Schemas often need to be tested against naming and design rules [5]. Rules are label-specific, i.e. there is no inheritance: if there is a type hierarchy, separate rules have to be written for every type even when the intention is the same. In Schematron 1.5, rules cannot be reused or combined programmatically, although ISO Schematron has more provisions for reusability, such as abstract patterns and the include statement.

In conclusion, although Schematron is well positioned for document validation throughout its lifecycle and is sufficient for many test cases, it has been designed more in the spirit of validating content in a type-checking mode (an extended schema). Its rules are intended for detecting patterns in documents, not for being executed under a test suite processing model requiring tight control over which tests are executed, and when, based on previous test results. We believe the concept of a test suite is appropriate when considering a combination of diverse XML artifacts, including XML-formatted data of non-XML origin.

OWL

OWL by itself is not a rule or test language; rather, it allows for declarative semantic models. The handling and mapping of ontologies is an important aspect of validation [6]. An OWL reasoner can determine whether there is a conflict for a particular object instance - in our context, a test target - based on a semantic definition of class membership. OWL's “open world assumption” makes it difficult to use for validating documents: the open world assumption states that even if an object instance has not declared one of its properties, the reasoner cannot assume that the property does not exist. Typically, a programming routine needs to be written to “close the world” by explicitly asserting that this object instance has no such property.

Because OWL is designed for acceptable computational time, its expressivity has been limited. Even in OWL Full (the most expressive level of the OWL language), the semantic expressivity is limited to constraints around cardinalities and a few relationships between classes and properties such as transitivity, inverse, and uniqueness. Expressing arithmetic relationships between properties is virtually impossible. There are also few reasoners that can perform OWL Full reasoning and, more importantly, datatype reasoning (particularly over user-defined datatypes, which is necessary to validate against a range of values or a code list). OWL reasoners typically process data in triple representation, which is memory-greedy. Validating a few megabytes of documents on a typical desktop with one to two gigabytes of memory becomes almost impossible (an industrial-strength reasoning engine and memory or storage management - such as the Oracle RDF database - would be needed).

XBRL

The XBRL conformance test suite is worth considering more as a typical use case than as a reusable tool. It is an example of validating a (set of) complex document(s) with advanced semantics that must comply with various rules in addition to schema compliance. It also encompasses the testing of processors supposed to produce such documents, illustrating how document-testing and processor-testing are intertwined. The “minimal” conformance suite focuses on document validation, while “full” conformance targets XBRL processors. Full conformance involves documents other than the main document being processed, and relies on output documents (Post-Taxonomy Validation infosets) that reflect the processing semantics. Minimal conformance generally contains at least one test for each occurrence of ‘MUST’ in the specification that is not already enforced by XML Schema validation. The structure of the test suite is based on the OASIS XSL Conformance Suite. The structure of each individual test is simple:

  • .xsd or .xml input material to the XBRL processor (schema, linkbase, instance)

  • A test case file (.xml)

  • An expected output .xml

Each test case is described by a "meta-level" XML file that refers to the associated test material. Some XML files describe the expected outputs for each test case. The overall test engine is an ad-hoc stylesheet that runs the tests. The pass / fail decision for each case is based on the comparison of canonical forms of the actual output and the expected output. A first assessment suggests that each test case could be scripted as a test assertion. When specialized operations are needed (such as file canonicalization using infoset.xsl, or file comparison), these could be wrapped as xsl functions used in the test assertion expressions. Assuming a two-phase testing process (running the XBRL processor, then validating the results), the test assertion engine described here could handle the validation phase, which relies on XML documents.
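As a rough, hypothetical sketch of this point (the attribute names and the my:canonical wrapper function are invented for illustration; actual XBRL test case metadata is richer), a variation comparing an actual infoset with the expected one might be scripted as:

	<testAssertion id="xbrl-V-01" lg="xpath20">
	  <var name="expected">document($target/@expectedInfoset)</var>
	  <var name="actual">document($target/@actualInfoset)</var>
	  <target>//testcase/variation</target>
	  <!-- my:canonical would wrap the canonicalization performed by infoset.xsl -->
	  <predicate>fn:deep-equal(my:canonical($actual), my:canonical($expected))</predicate>
	  ...
	</testAssertion>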

Conclusion

A general-purpose test methodology based on a formal notion of test assertion (originally not intended exclusively for XML input) has proved adequate for the testing of XML artifacts where contextual material of various kinds needs to be taken into account. When extended and implemented with XML dialects such as XPath2.0 and XSLT2.0, this method has proved more powerful for such XML inputs than dedicated test tools. The resulting test model does not introduce a hierarchy of constructs, but uses a flexible notion of test assertion as the main construct for expressing atomic test results as well as for chaining and composing test units.

Another benefit of the proposed approach is to keep XSLT “under the hood” and not make it part of the definition language of test cases. There is also no need to develop an XSLT test program specific to a test suite. This contrasts with ad-hoc test suites such as XBRL’s. With a robust test assertion model, only XPath needs to be mastered by test developers.

Future plans include standardization of the TAG mark-up and its XPath extension, along with an open-source - style availability of the XSLT-based engine technology that supports it.

References

[1] Holman, K., Green, S., Bosak, J., McGrath, T., Schlegel, S. ; Use of XPath to apply constraints to an XML Schema to produce a subset conformance profile ; UBL 1.0 Small Business Subset; 2006 http://docs.oasis-open.org/ubl/cs-UBL-1.0-SBS-1.0/

[2] Durand, J., Kulvatunyou, S., Woo, J., and Martin, M. ; Testing and Monitoring E-Business using the Event-driven Test Scripting Language ; proceedings of I-ESA (Interoperability of Enterprise Systems and Applications), April 2007

[3] Glushko, R., and McGrath, T. ; Analyzing and Designing Documents for Business Informatics and Web Services ; MIT Press, March 2008

[4] Durand, J. ; "Will Your SOA Systems Work in the Real World?” ; STAR-East, Software Testing Analysis and Review Conference, May 2007

[5] Lubell, J., Kulvatunyou, B., Morris, K.C., Harvey, B. ; A Tool Kit for Implementing XML Schema Naming and Design Rules ; Extreme Markup Languages Conference, August 2006, Montreal, Canada.

[6] Anicic, N. , Marjanovic, Z. , Ivezic, N. , Jones, A. W. ; Semantic Enterprise Application Integration Standards ; International Journal of Manufacturing Technology and Management (IJMTM) , April, 2006

[7] Green, S., Holman, K.; The Universal Business Language and the Needs of Small Business; iTSC Synthesis Journal 2004.

[8] Test Assertions Guidelines; OASIS TAG Technical Committee, February 2009. http://www.oasis-open.org/committees/document.php?document_id=31076

Author's keywords for this paper:
Document Testing; Test Assertion

Jacques Durand

Senior architect, R and D dir.

Fujitsu America, Inc.

Jacques Durand is a software architect at Fujitsu America, Inc. with a long-time involvement in XML standards organizations, a member of the OASIS Technical Advisory Board, and a contributor to XML user consortiums such as RosettaNet and OAGI. He has extensive experience in XML-related testing, and is chair of the Test Assertions Guidelines (TAG) OASIS committee and of the Testing and Monitoring of Internet Exchanges (TaMIE) committee. He has been leading testing activities for years in the WS-Interoperability consortium and in the ebXML technical committee. He earned a Ph.D. in rule-based systems and logic programming from Nancy University, France.

Stephen Green

Associate Director

Document Engineering Services

Stephen Green is an Associate Director of Document Engineering Services, an international consortium of experts supporting universal business interoperability through the use of open standards. His expertise is in finance, business documents and software development for business and financial applications. He has specialized in legacy systems and modern electronic business trends and their impact on small and medium sized enterprises. Stephen has been active in the Organization for the Advancement of Structured Information Standards (OASIS) for seven years, serving on as many technical committees. He is currently editing the Test Assertions Guidelines of the OASIS technical committee of that name. He previously led the first efforts to provide a small business subset conformance profile for the OASIS Universal Business Language, version 1.0.

Serm Kulvatunyou

Standard and Product Architect

Oracle

Serm Kulvatunyou is currently a Standard and Product Architect at Oracle's Application Integration Architecture (AIA) division. Formerly, he was a guest researcher at the Manufacturing Systems Integration Division, National Institute of Standards and Technology (NIST), coming from the Oak Ridge National Laboratory. At NIST, he designed and implemented semantics testing and frameworks for the design of document models and instance validation in the context of an e-business testbed using XML and related technologies. He has been an active participant in several standards bodies such as UN/CEFACT and OASIS. His current interests are in architecture and best-practice methodologies for enterprise data models supporting reusable and interoperable Service-Oriented Architecture. He received his Ph.D. in Industrial Engineering from the Pennsylvania State University, University Park, in 2001.

Tom Rutt

Standards Manager

Fujitsu America, Inc.

Tom Rutt is Standards Manager at Fujitsu America, Inc. with a long-time involvement in XML standards organizations and participation in several Web Services standards committees. He has extensive experience in XML-related testing, and has been involved in the WS-Interoperability consortium for years, more recently designing and developing testing tools for profile conformance. He is also a member of the OMG Architecture Board.