<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2"><title>Optimized Cartesian product: A hybrid approach to derivation-chain checking in XSD 1.1</title><!--===========--><!-- META INFO --><!--===========--><info><confgroup><conftitle>Balisage: The Markup Conference 2008</conftitle><confdates>August 12 - 15, 2008</confdates></confgroup><abstract><para>Because XPath predicates are involved, XSD 1.1 conditional declarations make it difficult to verify statically whether a type is a legal restriction of its base. The current XSD 1.1 draft adopts a full dynamic approach to the problem. In this paper we propose a hybrid solution (neither completely static nor completely dynamic) to the same problem.</para></abstract><author><personname><firstname>Maurizio</firstname><surname>Casimirri</surname></personname><personblurb><para>Maurizio Casimirri is a graduate student in Computer Science at the University of Bologna.</para></personblurb><email>mcasimir@cs.unibo.it</email><affiliation><jobtitle>Graduate student</jobtitle><orgname>Department of Computer Science, University of Bologna</orgname></affiliation></author><author><personname><firstname>Paolo</firstname><surname>Marinelli</surname></personname><personblurb><para>Paolo Marinelli holds a Master's degree in Computer Science from the University of Bologna. He is a temporary research associate at the Department of Computer Science of the University of Bologna.</para></personblurb><affiliation><jobtitle>Temporary research associate</jobtitle><orgname>Department of Computer Science, University of Bologna</orgname></affiliation><email>pmarinel@cs.unibo.it</email></author><author><personname><firstname>Fabio</firstname><surname>Vitali</surname></personname><personblurb><para>Fabio Vitali is an associate professor at the Department of Computer Science at the University of Bologna. 
He holds a Laurea degree in Mathematics and a Ph.D. in Computer and Law, both from the University of Bologna. His research interests include markup languages; distributed, coordinated systems; and the World Wide Web. He is the author of several papers on hypertext functionalities, the World Wide Web, and XML.</para></personblurb><affiliation><jobtitle>Associate professor</jobtitle><orgname>Department of Computer Science, University of Bologna</orgname></affiliation><email>fabio@cs.unibo.it</email></author><legalnotice><para>Copyright © 2008 by the authors. Reproduced with permission.</para></legalnotice></info><!--==============--><!-- INTRODUCTION --><!--==============--><section xml:id="sectIntroduction" xreflabel="Introduction"><title>Introduction</title><para>In this paper we propose a <emphasis>hybrid</emphasis> solution to the problem of verifying whether a type is a legal restriction of its base in XSD 1.1.</para><para>Version 1.1 of XML Schema (XSD) is now in Working Draft [<xref linkend="entryXSD1.1-structures"/>, <xref linkend="entryXSD1.1-datatypes"/>]. One of the most important features introduced by XSD 1.1 is Conditional Type Assignment (CTA), i.e., the possibility for element declarations to assign a type based on XPath predicates. CTA is meant to address one of the most evident limitations of XSD 1.0: its lack of support for defining <emphasis>co-constraints</emphasis>.</para><para>The introduction of CTA in XSD raises an issue concerning derivation by restriction between types containing conditional declarations. Indeed, the type actually assigned by a conditional declaration is known only at run-time, and it may vary from element to element. Thus it is difficult to verify <emphasis>statically</emphasis> whether a type is a legal restriction of its base when conditional declarations are involved. 
It is possible to identify three alternative approaches to the problem:</para><orderedlist><listitem><para>Limiting the CTA usage in type restrictions.</para></listitem><listitem><para>Dynamic verification: at schema compile time no check is performed, but if at run-time a document proves that a type is not a legal restriction of its base, an error is raised.</para></listitem><listitem><para>Hybrid approach: similar to the previous approach, except that at schema compile time some analysis is performed, which provides the dynamic phase with information that may reduce the number of operations to be performed.</para></listitem></orderedlist><para><emphasis>Co-occurrence constraints</emphasis> (also known as <emphasis>co-constraints</emphasis>) are rules relating the presence and values of elements and attributes that may occur in distinct fragments of an XML document. For example, we have a co-constraint when an attribute value governs the content of an element, or when two attributes are in some logical or arithmetic relation. Co-constraints are present in several XML-based languages, including languages recommended by the W3C.</para><para>One of the widely recognized limitations of XSD 1.0 [<xref linkend="entryXSD1.0-structures"/>, <xref linkend="entryXSD1.0-datatypes"/>] is the inability to define co-constraints. This is a serious shortcoming for a schema language, and especially for XSD, given its widespread use. Indeed, when a schema cannot capture every validity constraint of a class of XML documents, complete validation requires specific modules that verify the constraints not covered by the schema. In such cases, not only does the validation process become more complex, but the interoperability between applications also decreases.</para><para>Version 1.1 of XSD introduces two mechanisms for defining co-constraints: <emphasis>assertions</emphasis> and <emphasis>Conditional Type Assignment</emphasis> (CTA). 
The former is inspired by Schematron [<xref linkend="entrySchematronISOspecification"/>]. It allows type definitions to be augmented with XPath predicates (called assertions), each specifying a further validity condition besides those enforced by the content model. Assertions are particularly useful for requiring some elements and attributes to be in logical or arithmetic relations.</para><para>CTA, inspired by SchemaPath [<xref linkend="entrySchemaPathWWW"/>], allows an element declaration to conditionally assign a type based on XPath predicates. We refer to such declarations as <emphasis>conditional declarations</emphasis>. CTA is particularly suitable for those situations where an attribute value governs the content of an element. For instance, in order to make the content of an <code>&lt;entry&gt;</code> element (representing a bibliographic entry) depend on the value of its <code>kind</code> attribute, the following conditional declaration might be used:</para><programlisting xml:space="preserve">
&lt;xs:element name="entry" type="Entry"&gt;
	&lt;xs:alternative test="@kind = 'proceedings'" type="ProceedingsEntry" /&gt;
	&lt;xs:alternative test="@kind = 'journal'"     type="JournalEntry" /&gt;
	&lt;xs:alternative test="@kind = 'book'"        type="BookEntry" /&gt;
&lt;/xs:element&gt;
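
&lt;!-- Illustration only, not part of the declaration above: the type each sample
     instance would be assigned, reading the alternatives from top to bottom.
     Element content is elided. --&gt;
&lt;entry kind="journal"&gt;...&lt;/entry&gt;  &lt;!-- second alternative matches: JournalEntry --&gt;
&lt;entry&gt;...&lt;/entry&gt;                 &lt;!-- no kind attribute, so no test holds: declared type Entry --&gt;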
</programlisting><para>The above declaration reads as "if <code>@kind</code> is <code>proceedings</code> then <code>&lt;entry&gt;</code> is of type <emphasis>ProceedingsEntry</emphasis>, otherwise if <code>@kind</code> is <code>journal</code> then <code>&lt;entry&gt;</code> is of type <emphasis>JournalEntry</emphasis>, otherwise if <code>@kind</code> is <code>book</code> then <code>&lt;entry&gt;</code> is of type <emphasis>BookEntry</emphasis>, otherwise (none of the above conditions holds) <code>&lt;entry&gt;</code> is of type <emphasis>Entry</emphasis>" (which is a type for generic bibliographic entries). Each <code>&lt;xs:alternative&gt;</code> element is called a <emphasis>type alternative</emphasis> in XSD 1.1.</para><para>Conditional declarations raise an issue in derivation by restriction. XSD (both 1.0 and 1.1) allows new types to be defined by deriving them from existing ones. One derivation method is restriction. The general principle behind derivation by restriction is that the derived type accepts a subset of what the base type accepts. Thus, given an element <emphasis>E</emphasis>, it is required that the type assigned to <emphasis>E</emphasis> in the context of the derived type be a restriction of the type assigned to <emphasis>E</emphasis> in the context of the base type. The presence of conditional declarations heavily complicates the static verification of this principle, as it requires analyzing logical relations among XPath predicates. However, there are at least three alternative approaches to the problem of verifying whether a type is a legal restriction of its base:</para><variablelist><varlistentry><term>CTA limitation</term><listitem><para>Ad hoc limitations are imposed on the CTA usage to ensure a simple static verification of the restriction. For instance, a radical limitation is that when a type contains a conditional declaration it is implicitly final w.r.t. 
the derivation by restriction.</para></listitem></varlistentry><varlistentry><term>Full dynamic approach</term><listitem><para>At schema compile time, it is never checked whether a type is a legal restriction of its base. However, at validation time, it is checked whether the instance document is evidence that a type is not a legal restriction of its base.</para></listitem></varlistentry><varlistentry><term>Hybrid approach</term><listitem><para>Very similar to the full dynamic approach, but at schema compile time conditional declarations are processed in order to precompute those cases in which the derivation by restriction is violated. Such precomputed information is then available (in some form) at run-time, and the validator can use it to reduce the number of operations required to conclude whether the current document is evidence that a type is not a legal restriction of its base.</para></listitem></varlistentry></variablelist><para>The XSD current draft adopts the full dynamic approach. Indeed, at schema compile time the derivation by restriction is checked by simply ignoring type alternatives. Given an element <emphasis>E</emphasis>, at run-time it is checked that the type conditionally assigned to <emphasis>E</emphasis> in the context of a derived type is a restriction of the type which would be assigned to <emphasis>E</emphasis> in the context of the base type. The same condition is then checked recursively for <emphasis>E</emphasis> and the base type, until the root of the type hierarchy is reached. That is, the condition is checked on the entire derivation chain of the initial derived type. We call this solution <emphasis>Run-Time Check</emphasis> (RTC). 
The XSD Working Group publicly <quote>solicit input from implementors and users of this specification as to whether the current run-time rule should be retained</quote> [<xref linkend="entryXSD1.1-structures"/>].</para><para><emphasis>Cartesian Product</emphasis> (CP) is another solution, which adopts a hybrid approach instead. At schema compile time it analyzes <emphasis>all</emphasis> possible cases that may occur at run-time. As the number of such cases is very high, it has serious shortcomings concerning the computational cost of the static phase.</para><para>In this paper we propose a hybrid solution to the problem of verifying derivation by restriction in the presence of conditional declarations. We call our solution <emphasis>Optimized Cartesian Product</emphasis> (OCP). OCP, CP, and RTC have the same extensional behavior. OCP can be seen as an optimization of RTC. Indeed, at run-time the number of XPath predicates evaluated by OCP is less than or equal to the number of XPath predicates evaluated by RTC. Moreover, the computational cost of the OCP static phase is acceptable. So OCP is also an optimization of CP (hence its name). The contribution of our paper is twofold:</para><orderedlist><listitem><para>it proposes an optimization of the solution adopted by the XSD current draft;</para></listitem><listitem><para>it answers the feedback request about RTC, discussing various possible approaches to the derivation problem, and thoroughly describing three solutions: RTC, CP and OCP.</para></listitem></orderedlist><para>Our paper is organized as follows. The next section describes XSD 1.1 in relation to the problem of defining co-constraints. In particular it describes CTA and assertions. Then we introduce some XSD 1.1 specific terminology in Section “<xref linkend="sectXSDTerminology"/>”. In Section “<xref linkend="sectRestrictionXSD1.1"/>” we describe the problem of derivation by restriction in the presence of conditional declarations. 
There we describe some possible approaches and solutions. In particular, we provide a detailed description of Run-Time Check, including a computational cost analysis. We also discuss Cartesian Product. In Section “<xref linkend="sectOurProposal"/>” we describe our proposal, providing a computational cost analysis for both the static phase and the dynamic phase. In Section “<xref linkend="sectComparison"/>” we compare RTC, CP and OCP, mainly focusing on the number of XPath predicates evaluated at validation time by the three techniques. We then describe a prototype implementation of OCP, which demonstrates the feasibility of our proposal. Before concluding, in Section “<xref linkend="sectRelatedWorks"/>” we discuss some related work.</para></section><!--=============================--><!-- XSD 1.1 AND CO-CONSTRAINTS --><!--=============================--><section xreflabel="XSD 1.1 and Co-Constraints" xml:id="sectXSD1.1CoConstraints"><title>XSD 1.1 and Co-Constraints</title><para>XSD (<emphasis>XML Schema Definition Language</emphasis>) is the schema language proposed by W3C. Its current version is 1.0, and it is described by two W3C recommendations [<xref linkend="entryXSD1.0-structures"/>, <xref linkend="entryXSD1.0-datatypes"/>]. Although there are many other schema languages (such as RELAX NG [<xref linkend="entryRELAXNGISOspecification"/>] and Schematron [<xref linkend="entrySchematronISOspecification"/>]), XSD is probably the best known and most widely supported. XSD 1.0 supports the definition of several kinds of constraints. For instance, by means of content models, it is possible to define the legal content of elements. XSD provides a rich set of built-in types for the definition of legal data values. It also provides derivation mechanisms, which allow new types to be constructed from existing ones. 
Moreover, XSD allows uniqueness and reference constraints to be defined on elements and attributes (collectively called identity-constraints).</para><para>However, XSD 1.0 is widely recognized as unable to express a particular kind of constraint: <emphasis>co-occurrence constraints</emphasis> (commonly referred to as <emphasis>co-constraints</emphasis>). According to the definition given by the ESW Wiki, co-constraints are <quote>rules which govern what kinds of markup (elements, attributes) can occur together (co-occur) in an XML document.</quote> [<xref linkend="entryCoOccurrenceConstraintsESWWiki"/>]. In other words, a co-constraint relates the existence or values of an element (or attribute) to the existence or values of other elements (or attributes).</para><para>Some categorizations of co-constraints exist. The ESW Wiki lists about 30 use cases, ranging from the mutual exclusion of attributes to the requirement that the values of two elements be in some arithmetic relation. In 2001, Norman Walsh and John Cowan identified seven kinds of co-constraints [<xref linkend="entryWalshCowan2001"/>].</para><para>XSD 1.0 is unable to express co-constraints. This inability is keenly felt in many user communities. For instance, validation of incoming data is critical for e-business infrastructures, which also require co-constraints. Adopting XSD 1.0 as the validation language requires them to implement application-specific modules in order to provide a complete validation process.</para><para>W3C is releasing a new version of XSD: XSD 1.1. At the time of writing, it is in Last Call Working Draft [<xref linkend="entryXSD1.1-structures"/>, <xref linkend="entryXSD1.1-datatypes"/>]. One of the major improvements over 1.0 is the support for defining co-constraints. 
For this purpose, XSD 1.1<footnote><para>From here on, we refer to XSD 1.1 just as XSD.</para></footnote> introduces two mechanisms: <emphasis>assertions</emphasis> and <emphasis>Conditional Type Assignment</emphasis>. The two mechanisms are described in the next two sections.</para><!--============--><!-- Assertions --><!--============--><section xml:id="sectAssertions" xreflabel="Assertions"><title>Assertions</title><para>In XSD 1.1, a complex type may define a sequence of assertions. Each assertion is essentially an XPath 2.0 predicate. In order to be considered valid, each element assigned to a type must satisfy all the assertions defined by that type. Syntactically, an assertion is represented by an <code>&lt;assert&gt;</code> element, whose <code>test</code> attribute specifies the XPath predicate.</para><!-- example --><figure xml:id="figAssertionsExample" floatstyle="1" xreflabel="Assertion Example"><title>Assertion Example</title><programlisting xml:space="preserve">
&lt;xs:element name="pages" type="PagesType" /&gt;
&lt;xs:complexType name="PagesType"&gt;
  &lt;xs:attribute name="from" type="xs:positiveInteger" /&gt;
  &lt;xs:attribute name="to" type="xs:positiveInteger" /&gt;
  <emphasis>&lt;xs:assert test="@from le @to" /&gt;</emphasis>
&lt;/xs:complexType&gt;
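
&lt;!-- Illustrative instances, not part of the original figure. They assume the
     assertion compares the two attributes as typed (xs:positiveInteger) values. --&gt;
&lt;pages from="10" to="20" /&gt;  &lt;!-- expected to satisfy the assertion: 10 le 20 holds --&gt;
&lt;pages from="20" to="10" /&gt;  &lt;!-- expected to violate the assertion: 20 le 10 does not hold --&gt;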
</programlisting><caption><para>An example of XSD assertions. This assertion requires the <code>from</code> attribute to be less than or equal to the <code>to</code> attribute.</para></caption></figure><para>For instance, suppose we want to define an XML language to represent bibliography entries. In order to specify the conference proceedings pages in which a paper appears, we might define a <code>&lt;pages&gt;</code> element with two attributes <code>from</code> and <code>to</code>. In order to require that <code>from</code> not be greater than <code>to</code>, the type definition shown in Figure “<xref linkend="figAssertionsExample"/>” might be used.</para><!-- brief comparison with Schematron --><para>Assertions are clearly inspired by Schematron. However, there are some points of distinction. XSD associates assertions with type definitions. Schematron, on the other hand, is not a typed language, and thus it associates assertions with elements, or, more precisely, with a set of elements identified by an expression. Moreover, Schematron allows assertions to involve elements and attributes placed anywhere in the document. In XSD, by contrast, an assertion may involve only elements and attributes of the subtree rooted at the element the assertion is checked on: nodes outside that subtree are not visible.<footnote><para>This limitation is not enforced by a syntactic limitation on the XPath expression, but rather by the way the XPath Data Model is constructed.</para></footnote></para></section><!--=============================--><!-- Conditional Type Assignment --><!--=============================--><section xml:id="sectCTA" xreflabel="Conditional Type Assignment"><title>Conditional Type Assignment</title><!-- description --><para>XSD supports co-constraint definitions by means of another mechanism known as <emphasis>Conditional Type Assignment</emphasis> (CTA). 
An element declaration may specify a sequence of alternative types, each associated with an XPath predicate. Within this paper we refer to such declarations as <emphasis>conditional</emphasis>. When an element of the instance document is validated against a conditional declaration, the XPath predicates are evaluated using the element as context node. The type assigned to the element is the one corresponding to the satisfied predicate. If the element satisfies more than one predicate, the one occurring first within the schema takes precedence. A conditional declaration always specifies a default type, which is assigned when no predicate is satisfied. Each alternative type derives from the declared type.</para><!-- example --><figure xml:id="figCTAExample" xreflabel="CTA Example" floatstyle="1"><title>CTA Example</title><programlisting xml:space="preserve">
&lt;xs:complexType name="Entry"&gt;
  &lt;xs:sequence&gt;
    <emphasis>content model for a generic entry</emphasis>
  &lt;/xs:sequence&gt;
  &lt;xs:attribute name="kind" type="EntryKindType" /&gt;
&lt;/xs:complexType&gt;

&lt;xs:complexType name="ProceedingsEntry"&gt;
  &lt;xs:complexContent&gt;
    &lt;xs:extension base="Entry"&gt;
      &lt;xs:sequence&gt;
        &lt;xs:element name="conference" type="Conference" /&gt;
        &lt;xs:element name="pages" type="Pages" /&gt;
      &lt;/xs:sequence&gt;
    &lt;/xs:extension&gt;
  &lt;/xs:complexContent&gt;
&lt;/xs:complexType&gt;

&lt;xs:element name="entry" type="Entry"&gt;
  <emphasis>&lt;xs:alternative test="@kind = 'proceedings'" type="ProceedingsEntry" /&gt;</emphasis>
&lt;/xs:element&gt;
</programlisting><caption><para>An example of CTA usage. The conditional declaration assigns type <emphasis>ProceedingsEntry</emphasis> if the entry is of kind <emphasis>proceedings</emphasis>, otherwise it assigns type <emphasis>Entry</emphasis>.</para></caption></figure><para>In order to show the usefulness of CTA for defining co-constraints, we consider again our language for bibliographic entries. Suppose we want to represent an entry through an <code>&lt;entry&gt;</code> element, whose <code>kind</code> attribute specifies the entry kind (e.g., conference proceedings, technical report, and so on). Suppose that we want to enforce the following co-constraint: if the entry is of kind proceedings, then the <code>&lt;conference&gt;</code> and <code>&lt;pages&gt;</code> elements must be present. Then we might define:</para><itemizedlist><listitem><para>an <emphasis>Entry</emphasis> type, constraining the content of a generic entry;</para></listitem><listitem><para>a <emphasis>ProceedingsEntry</emphasis> type, derived from <emphasis>Entry</emphasis> and requiring the presence of both <code>&lt;conference&gt;</code> and <code>&lt;pages&gt;</code>;</para></listitem><listitem><para>a conditional declaration for <code>&lt;entry&gt;</code> assigning type <emphasis>ProceedingsEntry</emphasis> if the <code>kind</code> attribute has value <code>"proceedings"</code>, and type <emphasis>Entry</emphasis> otherwise.</para></listitem></itemizedlist><para>That solution is shown in Figure “<xref linkend="figCTAExample"/>”.</para><para>Note that, by default, if an element does not satisfy any alternative, it is assigned the type specified through the <code>type</code> attribute (known as the <emphasis>declared type</emphasis>). 
In order to specify a default type other than the declared type, it is possible to explicitly define a <emphasis>default type alternative</emphasis>, i.e., a type alternative occurring in last position and without any XPath predicate.</para><para>XSD 1.1 introduces a new built-in simple type named <emphasis>error</emphasis>. No element or attribute is valid against this type. <emphasis>error</emphasis> is typically used in default type alternatives to state that it is an error if no type alternative is selected.</para><!-- brief comparison with SchemaPath --><para>The CTA mechanism is inspired by SchemaPath, an extension to XSD 1.0 introducing the concept of conditional type assignment [<xref linkend="entrySchemaPathWWW"/>]. However, there are some notable points of distinction (besides some syntactic aspects). In SchemaPath, a conditional declaration does not have a declared type. Moreover, while SchemaPath does not put any restriction on XPath predicates, XSD allows predicates to access attribute nodes only. 
Thus, it is not possible to put conditions on preceding, ancestor, or descendant nodes.</para></section></section><!--==================--><!-- XSD TERMINOLOGY --><!--==================--><section xml:id="sectXSDTerminology" xreflabel="XSD Terminology"><title>XSD Terminology</title><para>For the reader unfamiliar with XSD, this section introduces some CTA-related definitions taken from the XSD current draft [<xref linkend="entryXSD1.1-structures"/>].</para><variablelist><varlistentry><term><emphasis role="bold">Declared type</emphasis></term><listitem><para>Given an element declaration <emphasis>D</emphasis>, the declared type of <emphasis>D</emphasis> is either the type referred to by the attribute <code>type</code>, or the anonymous type definition within <emphasis>D</emphasis>.</para></listitem></varlistentry><varlistentry><term><emphasis role="bold">Context-determined type</emphasis></term><listitem><para>Given an element <emphasis>E</emphasis> and a type <emphasis>T</emphasis>, the context-determined type of <emphasis>E</emphasis> in <emphasis>T</emphasis> is the declared type of the declaration <emphasis>D</emphasis> assigned to <emphasis>E</emphasis> by the <emphasis>T</emphasis> content model.</para></listitem></varlistentry><varlistentry><term><emphasis role="bold">Type Table</emphasis></term><listitem><para>A Type Table is a property of conditional declarations, and it is a sequence of type alternatives (or simply, alternatives). 
Each alternative corresponds to an <code>xs:alternative</code> element, and it mainly consists of an XPath predicate and a type.</para></listitem></varlistentry><varlistentry><term><emphasis role="bold">Selected type</emphasis></term><listitem><para>Given an element <emphasis>E</emphasis> and a Type Table <emphasis>TT</emphasis>, the selected type of <emphasis>E</emphasis> is the type associated with the first <emphasis>TT</emphasis> alternative satisfied by <emphasis>E</emphasis>.</para></listitem></varlistentry><varlistentry><term><emphasis role="bold">Context-determined Type Table</emphasis></term><listitem><para>Given an element <emphasis>E</emphasis> and a type <emphasis>T</emphasis>, the context-determined Type Table of <emphasis>E</emphasis> in <emphasis>T</emphasis> is the Type Table of the declaration <emphasis>D</emphasis> assigned to <emphasis>E</emphasis> by the <emphasis>T</emphasis> content model. If <emphasis>D</emphasis> is non-conditional, the context-determined Type Table has the default alternative only, which assigns the declared type.</para></listitem></varlistentry></variablelist></section><!--=======================================--><!-- DERIVATION BY RESTRICTION IN XSD 1.1 --><!--=======================================--><section xml:id="sectRestrictionXSD1.1" xreflabel="Derivation by Restriction in XSD 1.1"><title>Derivation by Restriction in XSD 1.1</title><para>XSD allows new types to be defined from existing ones through two derivation mechanisms, <emphasis>extension</emphasis> and <emphasis>restriction</emphasis>. The latter is meant to define a type whose content model accepts a subset of what the base type content model accepts.</para><!-- problem description --><figure xml:id="figCTARestrictionExample" xreflabel="CTA Restriction Example" floatstyle="1"><title>CTA Restriction Example</title><programlisting xml:space="preserve">
&lt;xs:complexType name="B"&gt;
  &lt;xs:sequence&gt;
    &lt;xs:element name="e" type="xs:anyType"&gt;
      &lt;xs:alternative test="@a &gt;  @b"    type="T1" /&gt;
      &lt;xs:alternative test="@a &amp;lt;= @b" type="T2" /&gt;
    &lt;/xs:element&gt;
  &lt;/xs:sequence&gt;
&lt;/xs:complexType&gt;

&lt;xs:complexType name="R"&gt;
  &lt;xs:complexContent&gt;
    &lt;xs:restriction base="B"&gt;
      &lt;xs:sequence&gt;
        &lt;xs:element name="e" type="xs:anyType"&gt;
          &lt;xs:alternative test="@a &gt;  @b"    type="T2" /&gt;
          &lt;xs:alternative test="@a &amp;lt;= @b" type="T1" /&gt;
        &lt;/xs:element&gt;
      &lt;/xs:sequence&gt;
    &lt;/xs:restriction&gt;
  &lt;/xs:complexContent&gt;
&lt;/xs:complexType&gt;

&lt;xs:complexType name="T1"&gt;
  &lt;xs:sequence&gt;
    &lt;xs:element name="t1" /&gt;
  &lt;/xs:sequence&gt;
  &lt;xs:attributeGroup ref="ab" /&gt;
&lt;/xs:complexType&gt;

&lt;xs:complexType name="T2"&gt;
  &lt;xs:sequence&gt;
    &lt;xs:element name="t2" /&gt;
  &lt;/xs:sequence&gt;
  &lt;xs:attributeGroup ref="ab" /&gt;
&lt;/xs:complexType&gt;
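
&lt;!-- Assumed definition of the attribute group "ab" referenced by T1 and T2.
     The original listing does not show it; the attribute types are a guess. --&gt;
&lt;xs:attributeGroup name="ab"&gt;
  &lt;xs:attribute name="a" type="xs:integer" /&gt;
  &lt;xs:attribute name="b" type="xs:integer" /&gt;
&lt;/xs:attributeGroup&gt;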

</programlisting><caption><para>An example of derivation by restriction involving conditional declarations.</para></caption></figure><para>The presence of conditional declarations within content models immediately raises a question concerning derivation by restriction. In order to explain the issue, let us consider the schema snippet shown in Figure “<xref linkend="figCTARestrictionExample"/>”. We can observe that <emphasis>T1</emphasis> does not derive from <emphasis>T2</emphasis>, nor does <emphasis>T2</emphasis> derive from <emphasis>T1</emphasis>. It is also easy to observe that whenever the conditional declaration within <emphasis>B</emphasis> assigns <emphasis>T1</emphasis>, the conditional declaration within <emphasis>R</emphasis> assigns <emphasis>T2</emphasis>, and vice versa.</para><para>Now let us consider the following XML fragment:</para><programlisting xml:space="preserve">
&lt;p xsi:type="R"&gt;
 &lt;e a="5" b="2"&gt; <emphasis>&lt;!-- @a &gt; @b --&gt;</emphasis>
   &lt;t2 /&gt;
 &lt;/e&gt;
&lt;/p&gt;
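
&lt;!-- Counterpart for comparison (not in the original fragment): the same child
     placed under an element of the base type B. --&gt;
&lt;p xsi:type="B"&gt;
 &lt;e a="5" b="2"&gt; &lt;!-- @a &gt; @b, so the declaration within B selects T1 --&gt;
   &lt;t2 /&gt;       &lt;!-- T1 requires a t1 child, so this content would be rejected --&gt;
 &lt;/e&gt;
&lt;/p&gt;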
</programlisting><para>As <code>&lt;p&gt;</code> is assigned type <emphasis>R</emphasis>, its child <code>&lt;e&gt;</code> is assigned type <emphasis>T2</emphasis>. According to the schema, <code>&lt;e&gt;</code> is valid against <emphasis>T2</emphasis>. As previously observed, the type that would be assigned to <code>&lt;e&gt;</code> if <code>&lt;p&gt;</code> were of type <emphasis>B</emphasis> is <emphasis>T1</emphasis>. Clearly, <code>&lt;e&gt;</code> is not valid against <emphasis>T1</emphasis>. Thus <emphasis>B</emphasis> rejects something that <emphasis>R</emphasis> accepts. We can reasonably argue that this violates the principle behind derivation by restriction. Indeed, the XSD current draft imposes constraints meant to detect situations like the one above as illegal.</para><para>Before discussing in detail how the XSD current draft handles derivation by restriction in the presence of conditional declarations, we examine some general approaches to the problem.</para><!--======================--><!-- full static approach --><!--======================--><section xml:id="sectFullStaticApproach" xreflabel="Full Static Approach"><title>Full Static Approach</title><para>We can think of approaches that decide <emphasis>statically</emphasis> (i.e., at schema compile time) whether a conditional declaration within a restricted type is compatible with a conditional declaration within the base type. That is, such an approach should decide the following problem. 
Given two conditional declarations <emphasis>R</emphasis> and <emphasis>B</emphasis>, is there any XML document containing an element <emphasis>E</emphasis> such that if <emphasis>E</emphasis> is validated against <emphasis>R</emphasis> then it is assigned a type which is not a valid restriction of the type that would be assigned if <emphasis>E</emphasis> were validated against <emphasis>B</emphasis>?</para><para>This is not a simple problem, as it is necessary to verify logical relationships among XPath predicates. For instance, consider again the conditional declarations shown in Figure “<xref linkend="figCTARestrictionExample"/>”. Clearly, the answer to the above question is <emphasis>yes</emphasis>: whenever the conditional declaration within the restricted type assigns <emphasis>T1</emphasis>, the conditional declaration within the base type assigns <emphasis>T2</emphasis>. Thus, we can assert that the derived type is not a legal restriction of its base. In order to prove it, we have to consider the semantics of the relational operators <code>&gt;</code> and <code>&lt;=</code>. This is probably not a very difficult task here, as the XPath predicates involved in the example are quite simple. But in the general case the problem becomes much more difficult, as we also have to consider the other XPath functions and operators.</para></section><!--===================================--><!-- expressivity limitation approach  --><!--===================================--><section xml:id="sectExpressivityLimitationApproach" xreflabel="Expressivity Limitation Approach"><title>Expressivity Limitation Approach</title><para>It is possible to identify a class of approaches that address the problem by narrowing the CTA usage, in order to avoid analyzing XPath predicates. 
For instance, a simple and radical solution to the problem consists in implicitly setting as <emphasis>final</emphasis><footnote><para>In XSD, if a type is set as final, no other type can be derived from it.</para></footnote> every complex type containing a conditional declaration. This is the solution adopted by SchemaPath. Obviously, this solution may be perceived as overly restrictive.</para><para>Less restrictive solutions can be found, and the XSD Working Group discussed a number of them. One such solution, known as <emphasis>Prefix</emphasis>, forces the conditional declaration within the derived type to repeat all the alternatives of the conditional declaration within the base type, and allows new alternatives to be appended. Requiring the types of the new alternatives to be restrictions of the default type of the base declaration ensures that every type assigned in the context of the derived type is a restriction of the type that would be assigned in the context of the base type.</para><figure xml:id="figCTARestrictionUseCase" floatstyle="1" xreflabel="CTA Restriction Use Case"><title>CTA Restriction Use Case</title><programlisting xml:space="preserve">
&lt;xs:complexType name="B"&gt;
  &lt;xs:sequence&gt;
    &lt;xs:element name="message" type="messageType"&gt;
      &lt;xs:alternative test="@kind='string'" type="messageTypeString"/&gt;
      &lt;xs:alternative test="@kind='base64'" type="messageTypeBase64"/&gt;
      &lt;xs:alternative test="@kind='binary'" type="messageTypeBase64"/&gt;
      &lt;xs:alternative test="@kind='xml'"    type="messageTypeXML"/&gt;
      &lt;xs:alternative test="@kind='XML'"    type="messageTypeXML"/&gt;
    &lt;/xs:element&gt;
  &lt;/xs:sequence&gt;
&lt;/xs:complexType&gt;

&lt;xs:complexType name="R"&gt;
  &lt;xs:complexContent&gt;
    &lt;xs:restriction base="B"&gt;
      &lt;xs:sequence&gt;
        &lt;xs:element name="message" type="messageType"&gt;
          &lt;xs:alternative test="@kind='string'" type="messageTypeString"/&gt;
          &lt;xs:alternative                       type="xs:error"/&gt;
        &lt;/xs:element&gt;
      &lt;/xs:sequence&gt;
    &lt;/xs:restriction&gt;
  &lt;/xs:complexContent&gt;
&lt;/xs:complexType&gt;
</programlisting><caption><para>An example of restriction in the presence of CTA. The restricted type definition is meant to accept a subset of what the base type definition accepts.</para></caption></figure><para>Let us consider the schema shown in Figure “<xref linkend="figCTARestrictionUseCase"/>”, which is inspired by the example of CTA usage described in the XSD current draft. Within type <emphasis>B</emphasis>, the conditional declaration for <code>&lt;message&gt;</code> elements assigns a specific message type based on the <code>kind</code> attribute value. Type <emphasis>R</emphasis> is meant to accept string messages only. In this respect, we might argue that <emphasis>R</emphasis> is a legal restriction of <emphasis>B</emphasis>. However, Prefix rejects the above schema, as the alternatives of the base type are not all repeated in the alternative sequence of the restricted type.</para></section><!--=======================--><!-- Full dynamic approach --><!--=======================--><section xml:id="sectFullDynamicApproach" xreflabel="Full Dynamic Approach"><title>Full Dynamic Approach</title><para>Another approach to the problem is the following: do not perform any check at compile time and let schema authors write conditional declarations as they like, but if at run time (i.e., at validation time) there is evidence that a type is not a legal restriction of its base type, then report the error. Compared with the approaches described in the previous section, this one offers maximum expressivity in writing conditional declarations. 
Clearly, the drawback is that the same schema error might become evident only for certain instance documents, and not for others.</para><para>This is the approach adopted by the XSD current draft, and it is described in detail in Section “<xref linkend="sectRuntTimeCheck"/>”.</para></section><!--================--><!-- Run-Time Check --><!--================--><section xml:id="sectRuntTimeCheck" xreflabel="Run-Time Check"><title>Run-Time Check</title><para>As already mentioned in the previous sections, the XSD current draft adopts a dynamic approach to the verification of derivation by restriction in the presence of conditional declarations. Indeed, the general problem of verifying whether a type <emphasis>R</emphasis> is a legal restriction of a type <emphasis>B</emphasis> is divided into two phases. The first one is meant to be performed at schema compile time, and it considers the declared type of element declarations only, thus simply ignoring the presence of Type Tables. For the implementation of this phase, the XSD draft refers to the algorithms described in [<xref linkend="entryFSAThompson"/>], [<xref linkend="entryFuchs"/>], and [<xref linkend="entryBrzozowski"/>].</para><para>The second phase (which we call <emphasis>Run-Time Check</emphasis> or simply RTC) is meant to be performed at run-time, and it takes Type Tables into consideration. The XSD specs describe it as a rule to decide the validity of an element with respect to a type. In order to be valid against a type <emphasis>T</emphasis>, an element must satisfy a number of constraints. 
One such constraint states that each child <emphasis>E</emphasis> together with <emphasis>T</emphasis> must satisfy the <emphasis>Conditional Type Substitutable in Restriction</emphasis> constraint (CTSR).</para><para>Informally, given an element <emphasis>E</emphasis> and a type <emphasis>T</emphasis>, <emphasis>E</emphasis> and <emphasis>T</emphasis> satisfy CTSR if the type assigned to <emphasis>E</emphasis> in the context of <emphasis>T</emphasis> is a valid restriction of the type assigned to <emphasis>E</emphasis> in the context of <emphasis>T</emphasis>'s base type. Moreover, <emphasis>E</emphasis> and <emphasis>T</emphasis>'s base type must recursively satisfy CTSR.</para><para>For instance, consider again the schema in Figure “<xref linkend="figCTARestrictionExample"/>”, and the following XML document fragment:</para><programlisting xml:space="preserve">
&lt;p xsi:type="R"&gt;
 &lt;e a="5" b="2"&gt;
   &lt;t2 /&gt;
 &lt;/e&gt;
&lt;/p&gt;
</programlisting><para>The first phase checks whether type <emphasis>R</emphasis> is a legal restriction of <emphasis>B</emphasis>, taking into consideration declared types only. In particular, it checks whether the declared type of the element declaration within <emphasis>R</emphasis> (i.e., <emphasis>anyType</emphasis>) is a valid restriction of the declared type of the element declaration within <emphasis>B</emphasis> (i.e., <emphasis>anyType</emphasis>). As <emphasis>anyType</emphasis> is a valid restriction of itself, the first phase succeeds.</para><para>The second phase checks whether or not the input document provides evidence that <emphasis>R</emphasis> is <emphasis>not</emphasis> a legal restriction of <emphasis>B</emphasis>. In particular, RTC checks whether <code>&lt;e&gt;</code> and <emphasis>R</emphasis> satisfy CTSR. Thus, the Type Table determined by <emphasis>R</emphasis> is evaluated, obtaining the selected type <emphasis>T2</emphasis>. Then the Type Table determined by <emphasis>B</emphasis> is also evaluated, obtaining the selected type <emphasis>T1</emphasis>. As <emphasis>T2</emphasis> is not a valid restriction of <emphasis>T1</emphasis>, <code>&lt;e&gt;</code> and <emphasis>R</emphasis> do not satisfy CTSR. As a consequence, <code>&lt;p&gt;</code> is not valid against <emphasis>R</emphasis>.</para><section xml:id="sectRTCAlgorithm" xreflabel="An Algorithm for Run-Time Check"><title>An Algorithm for Run-Time Check</title><figure xml:id="figRTCAlgorithm" floatstyle="1" xreflabel="RTC Algorithm"><title>RTC Algorithm</title><!--<programlisting linenumbering="numbered">--><programlisting xml:space="preserve">
void process-element(Element e) {

  <emphasis>...</emphasis>

  Type T = get-current-type(); <emphasis>// e's parent type</emphasis>

  TypeTable TT<subscript>T</subscript> = get-context-determined-type-table(e, T);
  int i = evaluate-type-table(e, TT<subscript>T</subscript>);
  Type S<subscript>T</subscript> = TT<subscript>T</subscript>.get-alternative(i).getType();

  boolean error = false;

  <emphasis>// walk on the derivation chain</emphasis>
  while (T is not xs:anyType and !error) do {

    Type B = T.base;
    TypeTable TT<subscript>B</subscript> = get-context-determined-type-table(e, B);
    int j = evaluate-type-table(e, TT<subscript>B</subscript>);
    Type S<subscript>B</subscript> = TT<subscript>B</subscript>.get-alternative(j).getType();

    if (validly-substitutable-as-restriction(S<subscript>T</subscript>, S<subscript>B</subscript>)) {
      T = B;
      S<subscript>T</subscript> = S<subscript>B</subscript>;
    } else {
      <emphasis>// CTSR violation</emphasis>
      error = true;
      report-schema-error("vr-cta-substitutable");
    }
  }

  <emphasis>...</emphasis>

}
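
<emphasis>// Illustrative sketch, not taken from the XSD draft: one possible</emphasis>
<emphasis>// evaluate-type-table. It scans the alternatives in order and returns the</emphasis>
<emphasis>// index of the first one whose test holds. The helper evaluate-xpath is</emphasis>
<emphasis>// hypothetical and stands for the evaluation of an XPath test on e.</emphasis>
int evaluate-type-table(Element e, TypeTable TT) {
  for each int i s.t. 1 &lt;= i &lt; TT.size {
    if (evaluate-xpath(TT.get-alternative(i).getTest(), e)) {
      return i;
    }
  }
  <emphasis>// the last alternative carries the always true condition</emphasis>
  return TT.size;
}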

</programlisting><caption><para>An algorithm implementing RTC.</para></caption></figure><para>In Figure “<xref linkend="figRTCAlgorithm"/>” we present an algorithm implementing RTC, in Java-like pseudo-code. The <emphasis>process-element</emphasis> function is meant to be invoked for each element of the instance document. Its body is a simple iterative version of the CTSR constraint definition given in the XSD draft. It iterates over the derivation chain for <emphasis>T</emphasis>, and it stops when either a violation of CTSR occurs or the type hierarchy root (i.e., <emphasis>anyType</emphasis>) is reached.</para></section><section xml:id="sectRTCAlgorithmCostAnalysis" xreflabel="RTC Algorithm Cost Analysis"><title>RTC Algorithm Cost Analysis</title><para>To present our cost analysis for the algorithm shown in the previous section, we need to introduce some notation. A Type Table <emphasis>TT</emphasis> is an ordered sequence of <emphasis>n</emphasis> pairs &lt;<emphasis>c</emphasis><subscript>1</subscript>, <emphasis>T</emphasis><subscript>1</subscript>&gt;, ..., &lt;<emphasis>c<subscript>n</subscript></emphasis>, <emphasis>T<subscript>n</subscript></emphasis>&gt;, where <emphasis>c<subscript>n</subscript></emphasis> is the always true condition. We also say that <emphasis>TT</emphasis> has size <emphasis>n</emphasis>.</para><para>Now, let <emphasis>E</emphasis> and <emphasis>T</emphasis> be an element and a type definition respectively. Consider the derivation chain for <emphasis>T</emphasis>. 
We indicate it with <emphasis>T</emphasis><subscript>1</subscript>, ..., <emphasis>T<subscript>k</subscript></emphasis>, where <emphasis>T</emphasis><subscript>1</subscript> is <emphasis>anyType</emphasis>, <emphasis>T<subscript>k</subscript></emphasis> is <emphasis>T</emphasis>, and for each <emphasis>i</emphasis> &lt; <emphasis>k</emphasis>, <emphasis>T<subscript>i</subscript></emphasis> is <emphasis>T<subscript>i</subscript></emphasis><subscript>+1</subscript>'s base type. For each <emphasis>i</emphasis> between 1 and <emphasis>k</emphasis>, we denote the context-determined Type Table for <emphasis>E</emphasis> in <emphasis>T<subscript>i</subscript></emphasis> by <emphasis>TT<subscript>i</subscript></emphasis>. Moreover we denote the <emphasis>TT<subscript>i</subscript></emphasis> size by <emphasis>d<subscript>i</subscript></emphasis>.</para><para>Now, we can start the RTC algorithm cost analysis. If there is no CTSR violation, the RTC algorithm iterates over the entire derivation chain for <emphasis>T</emphasis>. For each <emphasis>i</emphasis> between 1 and <emphasis>k</emphasis>, the following operations are performed:</para><orderedlist><listitem><para>get the context-determined Type Table for <emphasis>E</emphasis> in <emphasis>T<subscript>i</subscript></emphasis>, i.e., <emphasis>TT<subscript>i</subscript></emphasis></para></listitem><listitem><para>calculate the selected type for <emphasis>E</emphasis> according to <emphasis>TT<subscript>i</subscript></emphasis></para></listitem><listitem><para>check whether the selected type is validly substitutable as restriction for the selected type calculated at step <emphasis>i</emphasis> - 1.</para></listitem></orderedlist><para>We assume the first operation has a negligible cost. 
Indeed, given an element <emphasis>E</emphasis> and a type <emphasis>T</emphasis>, the <emphasis>Element Declarations Consistent</emphasis> (EDC) constraint ensures that the context-determined Type Table for <emphasis>E</emphasis> in <emphasis>T</emphasis> depends only on the name of <emphasis>E</emphasis>. Thus it suffices to scan the content model of <emphasis>T</emphasis>, looking for an element declaration with the same name as <emphasis>E</emphasis>. As we are not interested in the content model size here, we assume <emphasis>TT<subscript>i</subscript></emphasis> can be found through a single memory access.</para><para>We assume the third operation has a negligible cost too. Indeed, we assume that, for each pair of types &lt;<emphasis>A</emphasis>, <emphasis>B</emphasis>&gt;, the schema compile phase has already decided whether <emphasis>A</emphasis> is validly substitutable as restriction for <emphasis>B</emphasis>, and that the result is available at run-time and can be read through a single memory access.</para><para>In our analysis, we do not neglect the cost of the second operation. 
Given an element <emphasis>E</emphasis> and a Type Table <emphasis>TT</emphasis>, in order to decide the selected type for <emphasis>E</emphasis> in <emphasis>TT</emphasis>, it might be necessary to evaluate all the XPath conditions in <emphasis>TT</emphasis>.<footnote><para>To be more precise, as the last predicate of a Type Table is the always true condition, it suffices to evaluate <emphasis>n</emphasis>-1 predicates, where <emphasis>n</emphasis> is the Type Table size.</para></footnote> Indeed, if <emphasis>TT</emphasis> has the alternatives &lt;<emphasis>c</emphasis><subscript>1</subscript>, <emphasis>T</emphasis><subscript>1</subscript>&gt;, ..., &lt;<emphasis>c<subscript>n</subscript></emphasis>, <emphasis>T<subscript>n</subscript></emphasis>&gt;, then <emphasis>T<subscript>i</subscript></emphasis> is chosen if and only if none of the conditions <emphasis>c</emphasis><subscript>1</subscript>, ..., <emphasis>c<subscript>i</subscript></emphasis><subscript>-1</subscript> holds, and <emphasis>c<subscript>i</subscript></emphasis> holds. Thus a correct algorithm for Type Table evaluation consists of evaluating each alternative in order, and stopping as soon as a condition is satisfied. Clearly, such an algorithm is linear in the Type Table size.</para><para>Coming back to our cost analysis, as <emphasis>TT<subscript>i</subscript></emphasis> has size <emphasis>d<subscript>i</subscript></emphasis>, the second operation requires the evaluation of at most <emphasis>d<subscript>i</subscript></emphasis> XPath conditions.</para><para>Thus in our analysis, the RTC algorithm cost is given by the number of XPath predicates evaluated. By the observations above, this cost is bounded above by the expression shown in Equation “<xref linkend="equationRTCAlgoUpperBound"/>”.</para><equation xml:id="equationRTCAlgoUpperBound" xreflabel="RTC algorithm upper-bound"><mathphrase><emphasis>d</emphasis><subscript>1</subscript> + ... 
+ <emphasis>d<subscript>k</subscript></emphasis></mathphrase></equation></section></section><!--====================================--><!-- Hybrid Approach: Cartesian Product --><!--====================================--><section xml:id="sectHybridApproachCartesianProduct" xreflabel="Hybrid Approach: Cartesian Product"><title>Hybrid Approach: Cartesian Product</title><para>There also exists a <emphasis>hybrid</emphasis> approach to the problem, i.e., an approach neither fully dynamic, nor fully static. A solution taking such an approach is named <emphasis>Cartesian Product</emphasis> (CP), and the XSD Working Group considered it for a period. Like RTC, CP does not impose any limitation on the use of CTA.</para><para>Adopting a hybrid approach, CP consists of two phases. The former is performed at schema compile time, the latter at validation time. During the static phase, the Type Tables of the input schema are rewritten. In particular, given a Type Table <emphasis>TT<subscript>R</subscript></emphasis> within a type <emphasis>R</emphasis> and the corresponding Type Table <emphasis>TT<subscript>B</subscript></emphasis> within <emphasis>R</emphasis>'s base type, the static phase replaces <emphasis>TT<subscript>R</subscript></emphasis> with the Type Table resulting from the <emphasis>Cartesian product</emphasis> between <emphasis>TT<subscript>R</subscript></emphasis> and <emphasis>TT<subscript>B</subscript></emphasis>.</para><para>The Cartesian product between <emphasis>TT<subscript>R</subscript></emphasis> and <emphasis>TT<subscript>B</subscript></emphasis> is a Type Table denoted by <emphasis>TT<subscript>R</subscript></emphasis> × <emphasis>TT<subscript>B</subscript></emphasis> whose size is the product of the sizes of <emphasis>TT<subscript>R</subscript></emphasis> and <emphasis>TT<subscript>B</subscript></emphasis>. 
For each pair of alternatives &lt;&lt;<emphasis>r<subscript>i</subscript></emphasis>, <emphasis>R<subscript>i</subscript></emphasis>&gt;, &lt;<emphasis>b<subscript>j</subscript></emphasis>, <emphasis>B<subscript>j</subscript></emphasis>&gt;&gt; (where the first item is the <emphasis>i</emphasis>-th alternative of <emphasis>TT<subscript>R</subscript></emphasis>, and the second item is the <emphasis>j</emphasis>-th alternative of <emphasis>TT<subscript>B</subscript></emphasis>), <emphasis>TT<subscript>R</subscript></emphasis> × <emphasis>TT<subscript>B</subscript></emphasis> has an alternative whose condition is the conjunction of <emphasis>r<subscript>i</subscript></emphasis> and <emphasis>b<subscript>j</subscript></emphasis>, and whose type is:</para><itemizedlist><listitem><para><emphasis>R<subscript>i</subscript></emphasis>, if <emphasis>R<subscript>i</subscript></emphasis> is a valid restriction of <emphasis>B<subscript>j</subscript></emphasis></para></listitem><listitem><para><emphasis>error</emphasis>, otherwise.</para></listitem></itemizedlist><para>At run-time, given a type <emphasis>T</emphasis> and an element <emphasis>E</emphasis>, in order to know whether <emphasis>E</emphasis> and <emphasis>T</emphasis> satisfy CTSR, it suffices to evaluate the (rewritten) context-determined Type Table of <emphasis>E</emphasis> in <emphasis>T</emphasis>. Thus there is no need to walk the derivation chain.</para><para>However, given a derivation chain <emphasis>T</emphasis><subscript>1</subscript>, ..., <emphasis>T<subscript>k</subscript></emphasis>, and an element <emphasis>E</emphasis>, let <emphasis>TT</emphasis><subscript>1</subscript>, ..., <emphasis>TT<subscript>k</subscript></emphasis> be the sequence of context-determined Type Tables before the static phase rewrites them. 
The static phase rewrites those Type Tables into <emphasis>TT'</emphasis><subscript>1</subscript>, ..., <emphasis>TT'<subscript>k</subscript></emphasis>, where:</para><itemizedlist><listitem><para><emphasis>TT'</emphasis><subscript>1</subscript> is <emphasis>TT</emphasis><subscript>1</subscript>, and</para></listitem><listitem><para><emphasis>TT'<subscript>i</subscript></emphasis> is <emphasis>TT<subscript>i</subscript></emphasis> × <emphasis>TT'<subscript>i</subscript></emphasis><subscript>-1</subscript>, for every 1 &lt; <emphasis>i</emphasis> &lt;= <emphasis>k</emphasis></para></listitem></itemizedlist><para>The condition of each alternative of <emphasis>TT'<subscript>k</subscript></emphasis> is the conjunction of <emphasis>k</emphasis> XPath predicates. Moreover, the size of <emphasis>TT'<subscript>k</subscript></emphasis> is <emphasis>d</emphasis><subscript>1</subscript> ⋅ ... ⋅ <emphasis>d<subscript>k</subscript></emphasis>, where <emphasis>d<subscript>i</subscript></emphasis> is the size of <emphasis>TT<subscript>i</subscript></emphasis>. Fixing each <emphasis>d<subscript>i</subscript></emphasis> to a constant <emphasis>d</emphasis>, that product is <emphasis>d<superscript>k</superscript></emphasis>, i.e., exponential in the length of the derivation chain. This observation makes it clear that the CP static phase might be too expensive.</para></section></section><!--=============================--><!-- OPTIMIZED CARTESIAN PRODUCT --><!--=============================--><section xml:id="sectOurProposal" xreflabel="Optimized Cartesian Product"><title>Optimized Cartesian Product</title><para>In this section we present our solution to the problem of derivation by restriction in the presence of conditional declarations. We call it <emphasis>Optimized Cartesian Product</emphasis> (OCP). From the expressivity point of view, our proposal is meant to be fully equivalent to Run-Time Check and Cartesian Product. 
While being inspired by CP (and hence its name), it can also be seen as an RTC optimization.</para><!--==============--><!-- general idea --><!--==============--><section xml:id="sectGeneralIdea" xreflabel="General Idea"><title>General idea</title><para>Our idea is to perform a static analysis on Type Tables, anticipating at compile time those cases in which a CTSR violation occurs. Such an analysis is meant to avoid at run-time the evaluation of those XPath predicates which do not affect the CTSR checking result.</para><para>For instance, consider the schema shown in Figure “<xref linkend="figCTARestrictionUseCase"/>”. As <emphasis>error</emphasis> is not a valid restriction of any type, it is clear that whenever at run-time a <code>&lt;message&gt;</code> element does not satisfy the <code>@kind='string'</code> predicate within the <emphasis>R</emphasis> context (and hence is assigned type <emphasis>error</emphasis>), a CTSR violation occurs <emphasis>regardless</emphasis> of the actual type that would be assigned within the <emphasis>B</emphasis> context.</para><para>So our approach is to perform a static analysis in order to tell the run-time phase something like: if within the context of type <emphasis>R</emphasis> the predicate <emphasis>a</emphasis> is satisfied, do not evaluate any of the predicates <emphasis>b</emphasis>, <emphasis>c</emphasis>, or <emphasis>d</emphasis> within <emphasis>R</emphasis>'s base type, because such an evaluation will not affect the CTSR checking result.</para><para>Note that our approach does not consider the actual XPath predicate semantics during the static analysis. As we already observed, it would be too difficult. 
Moreover, note that, as seen for Cartesian Product, our approach is neither completely static, nor completely dynamic, i.e., it is a <emphasis>hybrid</emphasis> approach.</para><para>Before discussing our technique in detail, it is worth considering two major problems our hybrid approach has to face:</para><variablelist><varlistentry><term><emphasis>The corresponding Type Table problem</emphasis></term><listitem><para>Given a type <emphasis>T</emphasis> and an element declaration <emphasis>D</emphasis>, our general idea is to perform a static analysis of the cases in which <emphasis>T</emphasis> and a generic element matching <emphasis>D</emphasis> violate CTSR. Thus, the context-determined Type Table of that generic element in the base type of <emphasis>T</emphasis> has to be statically decided. Put a bit more formally, let <emphasis>B</emphasis> be <emphasis>T</emphasis>'s base type. The static analysis has to answer the following question: which is the Type Table <emphasis>TT'</emphasis> determined by <emphasis>B</emphasis> for an element matching <emphasis>D</emphasis>?</para></listitem></varlistentry><varlistentry><term><emphasis>The potential case enumeration problem</emphasis></term><listitem><para>Assume for the moment the previous problem can be solved. Thus consider a derivation chain <emphasis>T</emphasis><subscript>1</subscript>, ..., <emphasis>T<subscript>k</subscript></emphasis>, and the sequence of Type Tables <emphasis>TT</emphasis><subscript>1</subscript>, ..., <emphasis>TT<subscript>k</subscript></emphasis> such that each <emphasis>TT<subscript>i</subscript></emphasis> is the Type Table determined by <emphasis>T<subscript>i</subscript></emphasis> for a given element. If each <emphasis>TT<subscript>i</subscript></emphasis> has size <emphasis>d<subscript>i</subscript></emphasis>, the number of cases that might potentially be verified at run-time is given by <emphasis>d</emphasis><subscript>1</subscript> ⋅ ... 
⋅ <emphasis>d<subscript>k</subscript></emphasis>. If the average size of the Type Tables is <emphasis>d</emphasis>, that product is roughly <emphasis>d<superscript>k</superscript></emphasis>. Enumerating all such potential cases might lead to an unacceptable static analysis cost. This is precisely the problem of the Cartesian Product technique.</para></listitem></varlistentry></variablelist><para>As for the first problem, both the <emphasis>Element Declarations Consistent</emphasis> (EDC) constraint and the context-determined Type Table definition provided by the XSD current draft help us. Indeed, by EDC it is possible to define for each type <emphasis>T</emphasis> a partial function <emphasis>tt-map<superscript>T</superscript></emphasis> : QName → Type Table, such that for each element name <emphasis>e</emphasis>, <emphasis>tt-map<superscript>T</superscript></emphasis>(<emphasis>e</emphasis>) returns, if any, the Type Table of an element declaration named <emphasis>e</emphasis> within <emphasis>T</emphasis>.</para><para>Given: </para><orderedlist><listitem><para><emphasis>T</emphasis>, a type;</para></listitem><listitem><para><emphasis>B</emphasis>, the base type of <emphasis>T</emphasis>;</para></listitem><listitem><para><emphasis>D</emphasis>, an element declaration within <emphasis>T</emphasis>;</para></listitem><listitem><para><emphasis>e</emphasis>, the name of <emphasis>D</emphasis></para></listitem></orderedlist><para>the context-determined Type Table within <emphasis>B</emphasis> for any element matching <emphasis>D</emphasis> cannot be directly calculated as <emphasis>tt-map<superscript>T</superscript></emphasis>(<emphasis>e</emphasis>), as we have to deal with wildcards (and some other minor details). 
However, wildcards do not pose any particular problem, as EDC states that if a type contains both an element declaration <emphasis>D</emphasis> and a wildcard <emphasis>W</emphasis>, then <emphasis>D</emphasis>'s Type Table and the Type Table of any top-level declaration matching <emphasis>W</emphasis> must be the same. Moreover, the context-determined Type Table definition states that if, within a type <emphasis>T</emphasis>, an element <emphasis>E</emphasis> does not match any declaration, but matches a wildcard <emphasis>W</emphasis>, then the context-determined Type Table within <emphasis>T</emphasis> for <emphasis>E</emphasis> is the Type Table of the top-level declaration matching <emphasis>W</emphasis>, if any. Here, the <emphasis>match</emphasis> predicate is always defined in terms of string matching, and never in terms of schema component semantics.</para><para>Thus, it is easy to extend our <emphasis>tt-map<superscript>T</superscript></emphasis> function definition so that for any name <emphasis>e</emphasis>, <emphasis>tt-map<superscript>T</superscript></emphasis>(<emphasis>e</emphasis>) returns the context-determined Type Table within <emphasis>T</emphasis> for any element <emphasis>E</emphasis> named <emphasis>e</emphasis>, exactly as defined by the XSD current draft.</para><para>Note also that although the set of qualified names is infinite, <emphasis>tt-map<superscript>T</superscript></emphasis> is defined only for those qualified names matching some element declaration of the schema. As the number of element declarations within a schema is finite, the domain of <emphasis>tt-map<superscript>T</superscript></emphasis> is finite as well. The possibility of statically defining the function <emphasis>tt-map<superscript>T</superscript></emphasis> solves the <emphasis>corresponding Type Table problem</emphasis>.</para><para>As for the <emphasis>potential case enumeration problem</emphasis>, our idea is the following. 
Given a sequence of Type Tables <emphasis>TT</emphasis><subscript>1</subscript>, ..., <emphasis>TT<subscript>k</subscript></emphasis> as previously described in the problem definition, we do not consider all the <emphasis>k</emphasis> Type Tables together, but rather we analyze each pair of Type Tables <emphasis>TT<subscript>i</subscript></emphasis> and <emphasis>TT<subscript>i-1</subscript></emphasis> separately. In particular, for each such pair of Type Tables, we identify the cases that would cause a CTSR violation for that single derivation step. The results of this analysis are made available at run-time as annotations on <emphasis>TT<subscript>i</subscript></emphasis>. As we will see, this guarantees an acceptable cost for the static phase. Obviously, for any element <emphasis>E</emphasis> whose context-determined Type Table is <emphasis>TT<subscript>i</subscript></emphasis>, we have information about CTSR for a single derivation step only, and not for the whole derivation chain. If that information is not sufficient to decide whether CTSR is satisfied or not, it is necessary to walk the derivation chain in order to access the annotations for <emphasis>TT<subscript>i-1</subscript></emphasis>.</para></section><!--==============--><!-- static phase --><!--==============--><section xml:id="sectOCPStaticPhase" xreflabel="OCP Static Phase"><title>OCP Static Phase</title><para>The static phase consists of two steps. The first step just builds the <emphasis>tt-map<superscript>T</superscript></emphasis> mapping for every type <emphasis>T</emphasis> of the schema type hierarchy. 
For every <emphasis>T</emphasis> of the schema type hierarchy, the second step annotates the Type Tables within the <emphasis>tt-map<superscript>T</superscript></emphasis> codomain with <emphasis>error conditions</emphasis>.</para><para>In particular, for each type <emphasis>T</emphasis> and for each name <emphasis>e</emphasis> within the <emphasis>tt-map<superscript>T</superscript></emphasis> domain, the alternatives of the Type Table <emphasis>tt-map<superscript>T</superscript></emphasis>(<emphasis>e</emphasis>) are annotated with an error condition. Such a condition specifies the cases in which CTSR is broken with respect to the context-determined Type Table within <emphasis>T</emphasis>'s base.<footnote><para>From here on, given a type <emphasis>T</emphasis>, an element <emphasis>E</emphasis> and the context-determined Type Table <emphasis>TT</emphasis> of <emphasis>E</emphasis> in <emphasis>T</emphasis>, by “<emphasis>TT</emphasis>'s base Type Table” we mean the context-determined Type Table of <emphasis>E</emphasis> in <emphasis>T</emphasis>'s base.</para></footnote> Each error condition is simply a boolean expression built on the base Type Table predicates.</para><figure xml:id="figStaticPhaseAlgo" floatstyle="1" xreflabel="OCP Static Phase Algorithm"><title>OCP Static Phase Algorithm</title><programlisting xml:space="preserve">
<emphasis>// visits a type of the schema type hierarchy</emphasis>
void visit-type(Type T) {

  <emphasis>// annotate each context-determined Type Table within the current type</emphasis>
  for each QName name in tt-map<superscript>T</superscript>.domain {
    annotate-type-table(tt-map<superscript>T</superscript>(name));
  }

  <emphasis>// recursive call</emphasis>
  for each Type D derived from T do {
    visit-type(D);
  }
}

<emphasis>// annotates a context-determined Type Table</emphasis>
void annotate-type-table(TypeTable ttr) {
  for each int i s.t. 1 &lt;= i &lt;= ttr.size {
    <emphasis>// build the error condition for the current alternative</emphasis>
    Expression expr = build-error-condition(ttr, i);
    <emphasis>// simplify the error condition</emphasis>
    expr = simplify(expr);
    <emphasis>// annotate the current alternative with the simplified condition</emphasis>
    ttr.get-alternative(i).error-condition = expr;
  }
}

<emphasis>// builds an error condition for a Type Table alternative</emphasis>
Expression build-error-condition(TypeTable ttr, int i) {

  <emphasis>// get the base Type Table</emphasis>
  TypeTable ttb = ttr.base;
  if (ttb is absent) {
    return Expression.FALSE; <emphasis>// In such a case no CTSR violation may occur</emphasis>
  } else {
    return build-error-condition_aux(ttr, ttb, i, Expression.FALSE, 1, STATE_OR);
  }
}

Expression build-error-condition_aux(TypeTable ttr, TypeTable ttb, int i, Expression left, int j, short state) {
  if (j &gt; ttb.size) {
    return left;
  } else {

    Expression a;
    if (j == ttb.size) {
      a = Expression.TRUE;
    } else {
      a = ttb.get-alternative(j).getTest();
    }

    Expression right;

    Type r = ttr.get-alternative(i).getType();
    Type b = ttb.get-alternative(j).getType();

    if (r validly restricts b) {
      NotExpression negatedLiteral = new NotExpression(a);
      right = build-error-condition_aux(ttr, ttb, i, negatedLiteral, j + 1, STATE_AND);
    } else {
      right = build-error-condition_aux(ttr, ttb, i, a, j + 1, STATE_OR);
    }

    if (state == STATE_OR) {
      return new OrExpression(left, right);
    } else { <emphasis>// state == STATE_AND</emphasis>
      return new AndExpression(left, right);
    }
  }
}
</programlisting><caption><para>Pseudo-code for the static analysis of OCP.</para></caption></figure><para>The procedure described above is shown in Java-like pseudo-code in Figure “<xref linkend="figStaticPhaseAlgo"/>”. The <code>simplify</code> function is not shown: its purpose is to rewrite the error condition in a simpler form. In particular, if by <emphasis>atom</emphasis> we mean a Type Table XPath predicate, <code>simplify</code> aims to minimize the number of atoms within the input expression. It can be implemented by visiting the structure of the expression produced by <code>build-error-condition</code> and applying the lazy boolean evaluation rules shown in the following table:<footnote><para>Symmetric rules for binary operators are not shown.</para></footnote></para><informaltable xml:id="tabRewritingRules" xreflabel="Rewriting Rules"><thead><tr><th>Input expression</th><th>Rewritten expression</th></tr></thead><!-- OR RULES --><tr><th colspan="2">or-rules</th></tr><tr><td><code>FALSE</code> or <emphasis>expr</emphasis></td><td><emphasis>expr</emphasis></td></tr><tr><td><code>TRUE</code> or <emphasis>expr</emphasis></td><td><code>TRUE</code></td></tr><!-- AND RULES --><tr><th colspan="2">and-rules</th></tr><tr><td><code>FALSE</code> and <emphasis>expr</emphasis></td><td><code>FALSE</code></td></tr><tr><td><code>TRUE</code> and <emphasis>expr</emphasis></td><td><emphasis>expr</emphasis></td></tr><!-- NOT RULES --><tr><th colspan="2">not-rules</th></tr><tr><td>not(<code>TRUE</code>)</td><td><code>FALSE</code></td></tr><tr><td>not(<code>FALSE</code>)</td><td><code>TRUE</code></td></tr></informaltable><para>Running the algorithm of Figure “<xref linkend="figStaticPhaseAlgo"/>” on the schema shown in Figure “<xref linkend="figCTARestrictionUseCase"/>” annotates the alternatives of the Type Table within <emphasis>R</emphasis> with the following error 
conditions:</para><itemizedlist><listitem><para><code>not(@kind='string') and (@kind='base64' or (@kind='binary' or (@kind='xml' or @kind='XML')))</code></para></listitem><listitem><para><code>TRUE</code></para></listitem></itemizedlist><para>The error condition associated with the first alternative states that an element <emphasis>E</emphasis> of the instance document and <emphasis>R</emphasis> violate CTSR whenever <emphasis>E</emphasis> is assigned one of the types <emphasis>messageTypeBase64</emphasis> and <emphasis>messageTypeXML</emphasis> in the context of <emphasis>B</emphasis> (we assume that <emphasis>messageTypeString</emphasis> is not a valid restriction of either of those two types). The error condition associated with the second alternative states that a CTSR violation occurs regardless of the type assigned in the context of <emphasis>B</emphasis>. This is because <emphasis>error</emphasis> is not a valid restriction of any of the types of the Type Table within <emphasis>B</emphasis>.</para><para>On the other hand, the algorithm annotates each alternative of the Type Table within <emphasis>B</emphasis> with the error condition <code>FALSE</code>. This means that for any element <emphasis>E</emphasis>, <emphasis>E</emphasis> and <emphasis>B</emphasis> do not violate CTSR. This is because <emphasis>B</emphasis>'s base is <emphasis>anyType</emphasis>, and all the types in the Type Table within <emphasis>B</emphasis> are valid restrictions of <emphasis>anyType</emphasis>.</para></section><!--================--><!-- run-time phase --><!--================--><section xml:id="sectOCPRunTimPhase" xreflabel="OCP Run-Time Phase"><title>OCP Run-Time Phase</title><para>At validation time, the annotations on context-determined Type Tables are read in order to check CTSR. 
In particular, let <emphasis>E</emphasis> be an element of the instance document, <emphasis>T</emphasis> be the type of <emphasis>E</emphasis>'s parent, and <emphasis>TT</emphasis> be the context-determined Type Table of <emphasis>E</emphasis> in <emphasis>T</emphasis>. First, <emphasis>TT</emphasis> is evaluated. Then the error condition associated with the satisfied alternative is evaluated. If the error condition evaluates to true, we can conclude that CTSR is not satisfied. Otherwise, the same procedure is executed recursively using <emphasis>T</emphasis>'s base type. The recursion stops either when a CTSR violation occurs or when <emphasis>anyType</emphasis> is reached.</para><figure xml:id="figRunTimePhaseAlgo" floatstyle="1" xreflabel="OCP Run-time Phase Algorithm"><title>OCP Run-time Phase Algorithm</title><programlisting xml:space="preserve">

void process-element(Element e) {

  <emphasis>...</emphasis>

  <emphasis>// e's parent type</emphasis>
  Type T = current-type();

  <emphasis>// get the context determined Type Table for e</emphasis>
  TypeTable tt = tt-map<superscript>T</superscript>(e);

  <emphasis>// evaluate the type table </emphasis>
  int i = evaluate-type-table(e, tt);

  if (!check-CTSR(e, tt, i)) {
    report-schema-error("vr-cta-substitutable");
  }

  <emphasis>...</emphasis>

}

boolean check-CTSR(Element e, TypeTable ttr, int i) {

  Alternative alt = ttr.get-alternative(i);
  Expression err-expr = alt.error-condition;

  if (evaluate-error-condition(e, err-expr)) {
    return false;
  } else {
    TypeTable ttb = ttr.base;
    if (ttb is absent) {
      return true; <emphasis>// implicitly handles the xs:anyType case</emphasis>
    } else {
      int j = evaluate-type-table(e, ttb);

      <emphasis>// recursive call</emphasis>
      return check-CTSR(e, ttb, j);
    }
  }
}
</programlisting><caption><para>Algorithm for the run-time phase of OCP.</para></caption></figure><para>The procedure described above is shown in Java-like pseudo-code in Figure “<xref linkend="figRunTimePhaseAlgo"/>”. To show how it works, let us consider the following document:</para><programlisting xml:space="preserve">
&lt;messages xsi:type="R"&gt;
  &lt;message kind="string"&gt;
    <emphasis>...</emphasis>
  &lt;/message&gt;
  &lt;message kind="binary"&gt;
    <emphasis>...</emphasis>
  &lt;/message&gt;
&lt;/messages&gt;
</programlisting><para>and suppose we have to validate it against the schema depicted in Figure “<xref linkend="figCTARestrictionUseCase"/>” (the error conditions built during the static phase are described in Section “<xref linkend="sectOCPStaticPhase"/>”). When the first <code>&lt;message&gt;</code> element is processed, its context-determined Type Table is evaluated. The element satisfies the first condition <code>@kind='string'</code>, so it is assigned the first alternative, and the error condition associated with that alternative is evaluated. That condition is <code>not(@kind='string') and (@kind='base64' or (@kind='binary' or (@kind='xml' or @kind='XML')))</code>. Clearly, the error condition is not satisfied (<code>not(@kind='string')</code> evaluates to false). Consequently, the Type Table within <emphasis>B</emphasis> has to be evaluated. Again, the first alternative is chosen, and thus its error condition is evaluated. That condition is <code>FALSE</code>, so we can conclude that the first <code>&lt;message&gt;</code> element and <emphasis>R</emphasis> satisfy CTSR.</para><para>As for the second <code>&lt;message&gt;</code> element, it does not satisfy the predicate of the first alternative, and so it is assigned the default alternative. The error condition associated with that alternative is <code>TRUE</code>. 
Hence the second <code>&lt;message&gt;</code> element and <emphasis>R</emphasis> do not satisfy CTSR.</para></section><!--===============--><!-- Cost Analysis --><!--===============--><section xml:id="sectOCPCostAnalysis" xreflabel="OCP Cost Analysis"><title>OCP Cost Analysis</title><para>In this subsection we provide a cost analysis for the static phase and a cost analysis for the run-time phase of OCP.</para><!-- static phase analysis --><section xml:id="sectOCPStaticPhaseAnalysis" xreflabel="OCP Static Phase Analysis"><title>OCP Static Phase Analysis</title><para>Here we are not interested in analyzing the cost of the static phase applied to the entire schema type hierarchy. Rather, we fix an element name and we consider a single path from the root to a generic leaf of the type hierarchy.</para><para>Thus, let <emphasis>T</emphasis><subscript>1</subscript>, ..., <emphasis>T<subscript>n</subscript></emphasis> be a derivation chain, and <emphasis>e</emphasis> be our element name. We can now consider the sequence of Type Tables <emphasis>TT</emphasis><subscript>1</subscript>, ..., <emphasis>TT<subscript>n</subscript></emphasis>, where <emphasis>TT<subscript>i</subscript></emphasis> is the context-determined Type Table for an element named <emphasis>e</emphasis> within <emphasis>T<subscript>i</subscript></emphasis>. The size of each <emphasis>TT<subscript>i</subscript></emphasis> is denoted by <emphasis>d<subscript>i</subscript></emphasis>.</para><para>Given an <emphasis>i</emphasis> such that 1 &lt; <emphasis>i</emphasis> &lt;= <emphasis>n</emphasis>, we now analyze the time needed to annotate <emphasis>TT<subscript>i</subscript></emphasis>.</para><para>The function <code>build-error-condition</code> iterates over the whole alternative sequence of <emphasis>TT<subscript>i</subscript></emphasis><subscript>-1</subscript>, and for each alternative it performs a number of operations whose cost is constant. 
Thus the cost of the function is linear in the size of <emphasis>TT<subscript>i</subscript></emphasis><subscript>-1</subscript>, i.e., <emphasis>d<subscript>i</subscript></emphasis><subscript>-1</subscript>.</para><para>The function <code>simplify</code> can be implemented by visiting the structure of the expression returned by <code>build-error-condition</code>. The number of nodes of such an expression is linear in <emphasis>d<subscript>i</subscript></emphasis><subscript>-1</subscript>. Thus, the computational cost of <code>simplify</code> is linear in <emphasis>d<subscript>i</subscript></emphasis><subscript>-1</subscript> too.</para><para>As both <code>simplify</code> and <code>build-error-condition</code> are called for each alternative of <emphasis>TT<subscript>i</subscript></emphasis>, the asymptotic computational cost for the function <code>annotate-type-table</code> is <emphasis>d<subscript>i</subscript></emphasis><subscript>-1</subscript>⋅<emphasis>d<subscript>i</subscript></emphasis>.</para><para>Thus, the asymptotic cost for building and simplifying the error conditions of the whole sequence of Type Tables is given by:</para><para><emphasis>d</emphasis><subscript>1</subscript> + <emphasis>d</emphasis><subscript>1</subscript>⋅<emphasis>d</emphasis><subscript>2</subscript> + ... + <emphasis>d<subscript>n</subscript></emphasis><subscript>-1</subscript>⋅<emphasis>d<subscript>n</subscript></emphasis></para><para>We believe such a cost is perfectly acceptable at schema compile time.</para></section><!-- run-time phase analysis --><section xml:id="sectOCPRunTimePhaseAnalysis" xreflabel="OCP Run-Time Phase Analysis"><title>OCP Run-Time Phase Analysis</title><para>Here we provide a computational cost analysis of the run-time phase of OCP. 
As we did for RTC, we are interested in determining the number of XPath predicates that have to be evaluated for a generic element of the instance document.</para><para>Let <emphasis>E</emphasis> be an element of the instance document, and <emphasis>T</emphasis> be the type assigned to <emphasis>E</emphasis>'s parent. Consider the derivation chain <emphasis>T</emphasis><subscript>1</subscript>, ..., <emphasis>T<subscript>k</subscript></emphasis>, where <emphasis>T</emphasis><subscript>1</subscript> is <emphasis>anyType</emphasis> and <emphasis>T<subscript>k</subscript></emphasis> is <emphasis>T</emphasis>. Also consider the usual Type Table sequence <emphasis>TT</emphasis><subscript>1</subscript>, ..., <emphasis>TT<subscript>k</subscript></emphasis>, where <emphasis>TT<subscript>i</subscript></emphasis> is the context-determined Type Table for <emphasis>E</emphasis> within <emphasis>T<subscript>i</subscript></emphasis>.</para><para>If <emphasis>E</emphasis> and <emphasis>T</emphasis> satisfy CTSR, the entire Type Table sequence is processed. For any 1 &lt; <emphasis>i</emphasis> &lt;= <emphasis>k</emphasis>, <emphasis>TT<subscript>i</subscript></emphasis> is evaluated to obtain the assigned alternative. The cost of such an operation is linear in <emphasis>d<subscript>i</subscript></emphasis>. Once the assigned alternative has been determined, the algorithm evaluates the corresponding error condition. As already discussed, such a condition is a boolean expression over the XPath predicates of <emphasis>TT<subscript>i</subscript></emphasis><subscript>-1</subscript>. In our analysis, the cost of evaluating an error condition with <emphasis>n</emphasis> predicates is linear in <emphasis>n</emphasis>. 
As by construction no XPath predicate appears more than once within the same error condition, the error condition associated with the assigned alternative contains at most <emphasis>d<subscript>i</subscript></emphasis><subscript>-1</subscript> predicates of <emphasis>TT<subscript>i</subscript></emphasis><subscript>-1</subscript>. So its evaluation cost is linear in <emphasis>d<subscript>i</subscript></emphasis><subscript>-1</subscript>. Thus, the number of predicates evaluated for <emphasis>TT<subscript>i</subscript></emphasis> is upper-bounded by <emphasis>d<subscript>i</subscript></emphasis> + <emphasis>d<subscript>i</subscript></emphasis><subscript>-1</subscript>.</para><para>Considering the whole Type Table sequence, the number of evaluated XPath predicates is given by the formula shown in Equation “<xref linkend="equationOCPRunTimePhaseUpperBound"/>”.</para><equation xml:id="equationOCPRunTimePhaseUpperBound" xreflabel="OCP run-time phase upper-bound"><mathphrase>2⋅<emphasis>d</emphasis><subscript>1</subscript> + ... + 2⋅<emphasis>d<subscript>k</subscript></emphasis><subscript>-1</subscript> + <emphasis>d<subscript>k</subscript></emphasis></mathphrase></equation></section></section></section><!--================================--><!-- OCP, RTC, AND CP AT COMPARISON --><!--================================--><section xml:id="sectComparison" xreflabel="Comparing CP, OCP, and RTC"><title>Comparing CP, OCP, and RTC</title><para>In this section we provide a comparison among the main techniques discussed so far: Optimized Cartesian Product, Run-Time Check, and Cartesian Product. The comparison focuses on the number of XPath predicates evaluated at run-time. Before starting, let us first fix some notation. 
Let:</para><itemizedlist><listitem><para><emphasis>E</emphasis> be an element of the instance document;</para></listitem><listitem><para><emphasis>T</emphasis> be the type assigned to <emphasis>E</emphasis>'s parent;</para></listitem><listitem><para><emphasis>T</emphasis><subscript>1</subscript>, ..., <emphasis>T<subscript>k</subscript></emphasis> be the derivation chain for <emphasis>T</emphasis>, where <emphasis>T</emphasis><subscript>1</subscript> is <emphasis>anyType</emphasis> and <emphasis>T<subscript>k</subscript></emphasis> is <emphasis>T</emphasis>;</para></listitem><listitem><para><emphasis>TT</emphasis><subscript>1</subscript>, ..., <emphasis>TT<subscript>k</subscript></emphasis> be the sequence of context-determined Type Tables of <emphasis>E</emphasis> along the derivation chain;</para></listitem><listitem><para><emphasis>d<subscript>i</subscript></emphasis> be the <emphasis>TT<subscript>i</subscript></emphasis> size, for every <emphasis>i</emphasis>;</para></listitem><listitem><para><emphasis>TT'</emphasis><subscript>1</subscript>, ..., <emphasis>TT'<subscript>k</subscript></emphasis> be the Type Tables generated by the Cartesian Product static phase.</para></listitem></itemizedlist><para>Both OCP and RTC evaluate <emphasis>TT<subscript>k</subscript></emphasis> in order to decide which type alternative <emphasis>E</emphasis> has to be assigned. Clearly, both techniques evaluate the same XPath predicates of <emphasis>TT<subscript>k</subscript></emphasis>. The number of evaluated XPath predicates ranges from 1 to <emphasis>d<subscript>k</subscript></emphasis>.</para><para>On the other hand, CP evaluates <emphasis>TT'<subscript>k</subscript></emphasis>. 
If, for every <emphasis>i</emphasis> between 1 and <emphasis>k</emphasis>, <emphasis>E</emphasis> satisfies the first alternative of <emphasis>TT<subscript>i</subscript></emphasis>, CP selects the first alternative of <emphasis>TT'<subscript>k</subscript></emphasis>, and thus only the condition of that alternative is evaluated. However, that condition is the conjunction of <emphasis>k</emphasis> XPath predicates. So in the best case, CP evaluates <emphasis>k</emphasis> XPath predicates. But if for every <emphasis>i</emphasis> between 1 and <emphasis>k</emphasis> <emphasis>E</emphasis> satisfies the last alternative of <emphasis>TT<subscript>i</subscript></emphasis>, then CP has to process every alternative of <emphasis>TT'<subscript>k</subscript></emphasis>. This means that it has to evaluate <emphasis>d</emphasis><subscript>1</subscript> ⋅ ... ⋅ <emphasis>d<subscript>k</subscript></emphasis> conditions, where each condition is the conjunction of <emphasis>k</emphasis> XPath predicates.</para><para>After the <emphasis>TT'<subscript>k</subscript></emphasis> evaluation, CP already knows whether <emphasis>E</emphasis> and <emphasis>T</emphasis> satisfy CTSR without the need to walk the derivation chain: if <emphasis>TT'<subscript>k</subscript></emphasis> selected type <emphasis>error</emphasis> then CTSR is violated, otherwise CTSR is satisfied. The problem is that the evaluation of <emphasis>TT'<subscript>k</subscript></emphasis> might be very expensive.</para><para>On the other hand, after the <emphasis>TT<subscript>k</subscript></emphasis> evaluation, both OCP and RTC execute further operations. OCP evaluates the error condition linked to the alternative returned by <emphasis>TT<subscript>k</subscript></emphasis>, while RTC evaluates <emphasis>TT<subscript>k</subscript></emphasis><subscript>-1</subscript>. 
Thus, for the purposes of our comparison, it is important to understand whether evaluating the error condition is more or less expensive than evaluating <emphasis>TT<subscript>k</subscript></emphasis><subscript>-1</subscript>. In order to work with a clearer notation, we temporarily rename some variables:</para><itemizedlist><listitem><para><emphasis>T<subscript>k</subscript></emphasis> becomes <emphasis>R</emphasis>;</para></listitem><listitem><para><emphasis>T<subscript>k</subscript></emphasis><subscript>-1</subscript> becomes <emphasis>B</emphasis>;</para></listitem><listitem><para><emphasis>TT<subscript>k</subscript></emphasis> becomes <emphasis>TT<subscript>R</subscript></emphasis>;</para></listitem><listitem><para><emphasis>TT<subscript>k</subscript></emphasis><subscript>-1</subscript> becomes <emphasis>TT<subscript>B</subscript></emphasis>;</para></listitem><listitem><para><emphasis>d<subscript>k</subscript></emphasis> becomes <emphasis>n</emphasis>;</para></listitem><listitem><para><emphasis>d<subscript>k</subscript></emphasis><subscript>-1</subscript> becomes <emphasis>m</emphasis>.</para></listitem></itemizedlist><para>We denote the <emphasis>TT<subscript>R</subscript></emphasis> alternatives by &lt;<emphasis>r</emphasis><subscript>1</subscript>, <emphasis>R</emphasis><subscript>1</subscript>&gt;, ..., &lt;<emphasis>r<subscript>n</subscript></emphasis>, <emphasis>R<subscript>n</subscript></emphasis>&gt;; and the <emphasis>TT<subscript>B</subscript></emphasis> alternatives by &lt;<emphasis>b</emphasis><subscript>1</subscript>, <emphasis>B</emphasis><subscript>1</subscript>&gt;, ..., &lt;<emphasis>b<subscript>m</subscript></emphasis>, <emphasis>B<subscript>m</subscript></emphasis>&gt;. Moreover, let <emphasis>i</emphasis> be the (index of the) alternative selected by <emphasis>TT<subscript>R</subscript></emphasis>. 
We denote the error condition associated to that alternative by <emphasis>err<subscript>i</subscript></emphasis>.</para><para>As already observed in Section “<xref linkend="sectOCPStaticPhase"/>”, <emphasis>err<subscript>i</subscript></emphasis> is a boolean expression over the XPath predicates (here called <emphasis>atoms</emphasis>) of <emphasis>TT<subscript>B</subscript></emphasis>. Assuming the simplification process did <emphasis>not</emphasis> rewrite it, <emphasis>err<subscript>i</subscript></emphasis> contains each of the <emphasis>m</emphasis> atoms of <emphasis>TT<subscript>B</subscript></emphasis>.</para><figure xml:id="figErrorConditionStructure" floatstyle="1" xreflabel="OCP Error Condition Example"><title>OCP Error Condition Example</title><mediaobject><imageobject><imagedata format="png" fileref="../../../vol1/graphics/Marinelli01/Marinelli01-001.png"/></imageobject></mediaobject><caption><para>Structure for the error condition <code>not(@kind='string') and (@kind='base64' or (@kind='binary' or (@kind='xml' or (@kind='XML' or FALSE))))</code>. XPath predicates have been abbreviated for conciseness reasons.</para></caption></figure><para>At this point it is important to study the structure of a generic error expression <emphasis>err<subscript>i</subscript></emphasis>. As also shown in Figure “<xref linkend="figErrorConditionStructure"/>”, an error condition has a fixed structure: for each <code>or</code> (<code>and</code>) operator, its left operand is always a (negated) atom, while its right operand is either another binary operator, or <code>FALSE</code> (<code>TRUE</code>). Moreover, we can observe that the atoms appear in the same order they appear in <emphasis>TT<subscript>B</subscript></emphasis>.</para><para>It is easy to implement an error condition evaluator as a <emphasis>lazy boolean evaluator</emphasis>: for any input binary operator it always evaluates the left operand first, and it evaluates the right operand only if necessary. 
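</para><para>As a minimal sketch of such a lazy evaluator, the Java fragment below models an error condition of the shape shown in Figure “<xref linkend="figErrorConditionStructure"/>” and records which atoms are actually evaluated. All names in it (<code>Expr</code>, <code>atom</code>, <code>evaluatedAtoms</code>, the atoms <code>b1</code>, <code>b2</code>, <code>b3</code>) are hypothetical and are not taken from the prototype; atoms are looked up in a set of satisfied predicate names instead of being evaluated as XPath. Lazy evaluation comes from Java's short-circuit operators.</para><programlisting xml:space="preserve"><![CDATA[
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LazyErrorConditionDemo {

    // Hypothetical expression model: eval() receives the names of the
    // predicates the element satisfies and logs every atom it looks at.
    interface Expr {
        boolean eval(Set<String> satisfied, List<String> trace);
    }

    static Expr atom(final String name) {
        return (satisfied, trace) -> {
            trace.add(name); // record that this atom was evaluated
            return satisfied.contains(name);
        };
    }

    static Expr constant(final boolean value) {
        return (satisfied, trace) -> value;
    }

    static Expr not(final Expr e) {
        return (satisfied, trace) -> !e.eval(satisfied, trace);
    }

    // || and && evaluate the right operand only if necessary.
    static Expr or(final Expr l, final Expr r) {
        return (satisfied, trace) -> l.eval(satisfied, trace) || r.eval(satisfied, trace);
    }

    static Expr and(final Expr l, final Expr r) {
        return (satisfied, trace) -> l.eval(satisfied, trace) && r.eval(satisfied, trace);
    }

    // Evaluates the sample condition not(b1) and (b2 or (b3 or FALSE))
    // and returns the atoms evaluated, in evaluation order.
    public static List<String> evaluatedAtoms(String... satisfied) {
        Expr err = and(not(atom("b1")),
                       or(atom("b2"), or(atom("b3"), constant(false))));
        List<String> trace = new ArrayList<>();
        err.eval(new HashSet<>(Arrays.asList(satisfied)), trace);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(evaluatedAtoms("b2")); // prints [b1, b2]
        System.out.println(evaluatedAtoms("b1")); // prints [b1]
    }
}
]]></programlisting><para>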
The atoms of <emphasis>err<subscript>i</subscript></emphasis> actually evaluated by such a boolean evaluator are exactly the same as those evaluated by RTC to decide the <emphasis>TT<subscript>B</subscript></emphasis> selected type.</para><para>For instance, suppose that for a given <emphasis>j</emphasis> our <emphasis>E</emphasis> element does not satisfy any of <emphasis>b</emphasis><subscript>1</subscript>, ..., <emphasis>b<subscript>j</subscript></emphasis><subscript>-1</subscript>, and it does satisfy <emphasis>b<subscript>j</subscript></emphasis>. RTC evaluates <emphasis>b</emphasis><subscript>1</subscript>, ..., <emphasis>b<subscript>j</subscript></emphasis>. Our technique also evaluates those predicates, and it does not evaluate further ones. Indeed within <emphasis>err<subscript>i</subscript></emphasis>, <emphasis>b<subscript>j</subscript></emphasis> appears either in negated form as left operand of an <code>and</code> operator, or directly as left operand of an <code>or</code> operator (it depends on whether or not <emphasis>R<subscript>i</subscript></emphasis> is validly substitutable as restriction for <emphasis>B<subscript>j</subscript></emphasis>). In either case, the <emphasis>err<subscript>i</subscript></emphasis> evaluation stops before processing the right operand.</para><para>Thus we can conclude that even if it is not possible to simplify <emphasis>err<subscript>i</subscript></emphasis>, OCP and RTC are equivalent in terms of evaluated atoms. But there are cases in which <emphasis>err<subscript>i</subscript></emphasis> is simplified by the rewriting rules described in Section “<xref linkend="sectOCPStaticPhase"/>”. 
Indeed, if there exists a <emphasis>j</emphasis> such that either</para><itemizedlist><listitem><para>for each <emphasis>j</emphasis> &lt; <emphasis>j'</emphasis> &lt;= <emphasis>m</emphasis>, <emphasis>R<subscript>i</subscript></emphasis> is not validly substitutable as restriction for <emphasis>B<subscript>j'</subscript></emphasis></para></listitem></itemizedlist><para>or</para><itemizedlist><listitem><para>for each <emphasis>j</emphasis> &lt; <emphasis>j'</emphasis> &lt;= <emphasis>m</emphasis>, <emphasis>R<subscript>i</subscript></emphasis> is validly substitutable as restriction for <emphasis>B<subscript>j'</subscript></emphasis>,</para></listitem></itemizedlist><para>then the simplification process removes from <emphasis>err<subscript>i</subscript></emphasis> the atoms <emphasis>b<subscript>j</subscript></emphasis><subscript>+1</subscript>, ..., <emphasis>b<subscript>m</subscript></emphasis>.</para><para>In such cases, if <emphasis>E</emphasis> does not satisfy any of the predicates <emphasis>b</emphasis><subscript>1</subscript>, ..., <emphasis>b<subscript>j</subscript></emphasis><subscript>+</subscript><emphasis><subscript>k</subscript></emphasis>, for some <emphasis>k</emphasis>, then OCP does not need to evaluate the <emphasis>k</emphasis> atoms <emphasis>b<subscript>j</subscript></emphasis><subscript>+1</subscript>, ..., <emphasis>b<subscript>j</subscript></emphasis><subscript>+</subscript><emphasis><subscript>k</subscript></emphasis> in order to decide whether CTSR is satisfied or not. 
On the other hand, RTC does evaluate those atoms, because it has to find the type actually selected by <emphasis>TT<subscript>B</subscript></emphasis>.</para><para>Thus, we can conclude that on a single step of a derivation chain, OCP evaluates a number of predicates less than or equal to the number of predicates evaluated by RTC.</para><para>However, as can be noted from the formulas shown in Equations “<xref linkend="equationRTCAlgoUpperBound"/>” and “<xref linkend="equationOCPRunTimePhaseUpperBound"/>”, OCP might evaluate the same atoms twice. Coming back to the notation introduced earlier in this section, if <emphasis>E</emphasis> does not satisfy the error condition of the alternative selected by <emphasis>TT<subscript>k</subscript></emphasis>, then OCP has to evaluate <emphasis>TT<subscript>k</subscript></emphasis><subscript>-1</subscript>. But as the error condition previously processed was built on the atoms of <emphasis>TT<subscript>k</subscript></emphasis><subscript>-1</subscript>, it is clear that some predicates of <emphasis>TT<subscript>k</subscript></emphasis><subscript>-1</subscript> might be processed twice.</para><para>However, this additional cost can be mitigated if, during the processing of an error condition, the result of each atom evaluation is stored in some data structure. In this way, an XPath predicate is actually evaluated only if it has not been evaluated yet.</para><para>So we conclude that for a given derivation chain, OCP evaluates a number of XPath predicates less than or equal to the number of XPath predicates RTC evaluates.</para></section><!--================--><!-- IMPLEMENTATION --><!--================--><section xml:id="sectImplementation" xreflabel="Implementation"><title>Implementation</title><para>We built a prototype implementation of Optimized Cartesian Product, thus demonstrating its feasibility. We implemented it in Java within Xerces [<xref linkend="entryXerces"/>]. 
Our prototype patches Xerces in three respects:</para><orderedlist><listitem><para>support for XSD 1.1 related components;</para></listitem><listitem><para>implementation of the OCP static phase;</para></listitem><listitem><para>implementation of the OCP run-time phase within the existing validation code.</para></listitem></orderedlist><para>As Xerces is an XML parser for XSD 1.0, it does not handle 1.1-specific constructs. Our prototype modifies the Xerces modules responsible for the construction of schema components (package <code>org.apache.xerces.impl.xs.traversers</code>). It also modifies the Xerces implementation of the XML Schema API [<xref linkend="entryXMLSchemaAPI"/>], in order to represent type alternative components, and to make element declarations aware of their Type Tables (packages <code>org.apache.xerces.xs</code> and <code>org.apache.xerces.impl.xs</code>).</para><para>The OCP static phase is implemented within a separate package <code>it.unibo.cs.cta</code>. The code for the error condition construction is within the class <code>it.unibo.cs.cta.preprocessor.impl.ErrorConditionBuilder</code>. This class processes an input XSD schema, associating each type with a map. That map is our implementation of <emphasis>tt-map<superscript>T</superscript></emphasis>: it associates element names with context-determined Type Tables. <code>ErrorConditionBuilder</code> also annotates each context-determined Type Table with its error conditions. Error conditions are built directly using the algorithm described in Section “<xref linkend="sectOCPStaticPhase"/>”. The classes handling error conditions are within the package <code>it.unibo.cs.cta.errorexpr</code>. In particular, the simplification of error conditions is implemented by <code>ErrorExpressionSimplifier</code>, while their evaluation is implemented by <code>ErrorExpressionEvaluator</code>.</para><para>The static phase is delegated to a pre-processor invoked when a schema document is loaded. 
In order to invoke it, the simple and compact code below is used:</para><programlisting xml:space="preserve">
<emphasis>// instantiation </emphasis>
PreprocessorFactory pf = PreprocessorFactory.getInstance();
fPreprocessor = pf.createPreprocessorSequence(
    new String[]{"ErrorConditionBuilder"}
    );
<emphasis>// invocation on an XS Model</emphasis>
fPreprocessor.processModel(model);
</programlisting><para>The static phase result (i.e., the association between types and maps) is read by calling the pre-processor method <code>getStateByName("type-table-map")</code>.</para><para>The OCP run-time phase is implemented within the class <code>org.apache.xerces.impl.xs.OptimizedCTAXMLSchemaValidator</code>, a patched version of the original XSD validator provided by Xerces. In particular, the code for the CTSR verification is within the method <code>handleStartElement</code>. XPath predicates are evaluated using the interfaces in <code>javax.xml.xpath</code>. Currently, our prototype does not check whether an XPath predicate has already been evaluated. Thus, as observed in Section “<xref linkend="sectComparison"/>”, an XPath predicate might be evaluated twice for the same element.</para><para>Our prototype is meant to prove the feasibility of OCP, and as such it does not aim to be XSD 1.1 conformant. In particular it has some limitations, the most important of which are:</para><orderedlist><listitem><para>only XPath 1.0 expressions are accepted;</para></listitem><listitem><para>all non-CTA-related syntax is ignored. E.g., <code>&lt;assert&gt;</code> elements are not considered legal within a schema;</para></listitem><listitem><para>derivations by restriction are checked using the original Xerces code, i.e., XSD 1.0 rules are applied.<footnote><para>XSD 1.0 defines the derivation by restriction in terms of ad hoc rules provided by the recommendation itself. XSD 1.1 allows processors to choose the algorithm they like to check whether a content model includes another content model.</para></footnote></para></listitem></orderedlist><para>We also developed a small test suite for OCP. It can be run through a simple graphical interface. 
Source code and jars are available from <link xlink:href="http://tesi.fabio.web.cs.unibo.it/Tesi/OptimizedCartesianProduct" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://tesi.fabio.web.cs.unibo.it/Tesi/OptimizedCartesianProduct</link>.</para></section><!--===============--><!-- RELATED WORKS --><!--===============--><section xreflabel="Related Works" xml:id="sectRelatedWorks"><title>Related Works</title><para>Among the best-known validation languages (DTD [<xref linkend="entryXML11"/>], RELAX NG [<xref linkend="entryRELAXNGISOspecification"/>], Schematron [<xref linkend="entrySchematronISOspecification"/>], DSD [<xref linkend="entryDSD20"/>], etc.), the problem of verifying the subtype relation in the presence of conditional declarations is very specific to XSD 1.1. Indeed, although there exists at least one language, DSD, that permits the definition of conditional content models, that language is not type-based, and consequently it has no concept of type derivation. We are not aware of any work on restriction checking in the presence of conditional declarations.</para><para>However, there exist works on the problem of verifying whether an XSD 1.0 type is a legal restriction of another type [<xref linkend="entryFSAThompson"/>], [<xref linkend="entryFuchs"/>], [<xref linkend="entryBrzozowski"/>]. Those works propose techniques to statically verify whether a type accepts a subset of what the base type accepts. Along the same lines, Martens et al. present theoretical results about some basic decision problems concerning schemas, among which is the problem of testing for inclusion of schemas [<xref linkend="entryNeven1PassPreorder"/>].</para></section><!--=============--><!-- CONCLUSIONS --><!--=============--><section xml:id="sectConclusions" xreflabel="Conclusions"><title>Conclusions</title><para>In XSD 1.1, the presence of conditional declarations increases the difficulty of verifying whether a type is a legal restriction of its base. 
We discussed three main approaches to the problem: CTA usage limitation, run-time verification, and hybrid verification. Solutions of the first kind ensure that it is possible to <emphasis>statically</emphasis> verify whether a type is a legal restriction of its base, but at the cost of limiting the expressivity of CTA. Solutions of the second and third kinds allow the highest degree of expressivity, but they may also accept as a legal restriction a type that accepts something its base rejects. They throw an error only for those instance documents that actually prove that a type is not a legal restriction of its base. Hybrid solutions are meant to precompute, during the static phase, some information that might decrease the work to be done at run-time.</para><para>In particular, we described the solution adopted by the current XSD draft, which follows a run-time approach described in the specification by the Conditional Type Substitutable in Restriction (CTSR) constraint. We discussed an algorithm verifying CTSR, which we called Run-Time Check (RTC). Then we proposed an alternative solution to RTC, named Optimized Cartesian Product (OCP). OCP is a hybrid solution. Its idea is to analyze conditional declarations in order to statically decide which XPath predicates can be ignored at run-time. We showed that, unlike Cartesian Product (CP), another hybrid solution of which OCP can be seen as an optimization, the static analysis cost of OCP is perfectly acceptable.</para><para>We then compared the RTC, OCP and CP techniques, focusing on the number of XPath predicates evaluated at run-time. We showed that CP is the worst technique, as it inherits from the static phase a high volume of information that might heavily slow down the run-time phase. 
We also showed that, although OCP might process the same alternatives twice, if the XPath predicate evaluation results are stored, OCP evaluates a number of predicates less than or equal to the number of predicates RTC evaluates.</para><para>An interesting direction for future work is the experimental comparison of RTC, OCP and CP on a corpus of real schema documents. Moreover, it would be worthwhile to improve our error condition simplification process. For instance, our simplification rules are not able to rewrite expressions like <code>not(@a = 'v1') and (@a = 'v2')</code> into <code>(@a = 'v2')</code>. There are also error conditions that are clearly unsatisfiable when associated with a particular alternative. For instance, if the alternative predicate is <code>(@a = 'v1')</code> and the error condition is <code>(@a = 'v2')</code>, it is clear that the error condition will never be satisfied. Improving the simplification rule set should increase the number of situations in which OCP is preferable to RTC.</para></section><!--==================--><!-- ACKNOWLEDGEMENTS --><!--==================--><section xml:id="sectAcknowledgements" xreflabel="Acknowledgements"><title>Acknowledgements</title><para>We would like to thank Stefano Zacchiroli for the technical discussions we had during the design of the Optimized Cartesian Product technique, the anonymous reviewers for their comments, and the XML Schema Working Group for the many inspiring discussions on the topics covered by this paper.</para></section><!--==============--><!-- BIBLIOGRAPHY --><!--==============--><bibliography><title>References</title><bibliomixed xml:id="entryCoOccurrenceConstraintsESWWiki" xreflabel="Co-occurrence constraints ESW Wiki">Co-occurrence constraints ESW Wiki. 
<link xlink:href="http://esw.w3.org/topic/Co-occurrence_constraints" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://esw.w3.org/topic/Co-occurrence_constraints</link></bibliomixed><bibliomixed xml:id="entryDSD20" xreflabel="DSD 2.0">Møller, A. 2002. Document Structure Description 2.0. BRICS, Department of Computer Science, University of Aarhus, Aarhus, Denmark. <link xlink:href="http://www.brics.dk/DSD/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.brics.dk/DSD/</link>.</bibliomixed><bibliomixed xml:id="entryFuchs" xreflabel="M. Fuchs, and A. Brown, 2003">M. Fuchs, and A. Brown. Supporting UPA and restriction on an extension of XML Schema. In <emphasis>Proceedings of Extreme Markup Languages</emphasis>. August, 2003. Montréal, Québec. <link xlink:href="http://www.idealliance.org/papers/extreme03/html/2003/Fuchs01/EML2003Fuchs01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.idealliance.org/papers/extreme03/html/2003/Fuchs01/EML2003Fuchs01.html</link>.</bibliomixed><bibliomixed xml:id="entrySchemaPathWWW" xreflabel="P. Marinelli, C. Sacerdoti Coen, and F. Vitali, 2004">P. Marinelli, C. Sacerdoti Coen, and F. Vitali. SchemaPath, a Minimal Extension to XML Schema for Conditional Constraints. In <emphasis>Proceedings of the Thirteenth International World Wide Web Conference</emphasis>. New York, NY, USA. May, 2004. Pages 164-174. ACM Press. doi:<biblioid class="doi">10.1145/988672.988695</biblioid>.</bibliomixed><bibliomixed xml:id="entryNeven1PassPreorder" xreflabel="W. Martens, F. Neven, and T. Schwentick, 2005">W. Martens, F. Neven, and T. Schwentick. Which XML Schemas Admit 1-Pass Preorder Typing? In <emphasis>Proceedings of the 10<superscript>th</superscript> International Conference on Database Theory</emphasis>. Edinburgh, UK, January 5-7, 2005. LNCS. Volume 3363. Pages 68-82. 
doi:<biblioid class="doi">10.1007/978-3-540-30570-5_5</biblioid>.</bibliomixed><bibliomixed xml:id="entryRELAXNGISOspecification" xreflabel="RELAX NG ISO specification">Information technology -- Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG. ISO/IEC 19757-2:2003, JTC1/SC34 Committee. Publicly available at <link xlink:href="http://standards.iso.org/ittf/PubliclyAvailableStandards/c037605_ISO_IEC_19757-2_2003(E).zip" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://standards.iso.org/ittf/PubliclyAvailableStandards/c037605_ISO_IEC_19757-2_2003(E).zip</link></bibliomixed><bibliomixed xml:id="entrySchematronISOspecification" xreflabel="Schematron ISO specification">Information technology -- Document Schema Definition Language (DSDL) -- Part 3: Rule-based validation -- Schematron. ISO/IEC 19757-3:2006, JTC1/SC34 Committee. Publicly available at <link xlink:href="http://standards.iso.org/ittf/PubliclyAvailableStandards/c040833_ISO_IEC_19757-3_2006(E).zip" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://standards.iso.org/ittf/PubliclyAvailableStandards/c040833_ISO_IEC_19757-3_2006(E).zip</link>.</bibliomixed><bibliomixed xml:id="entryBrzozowski" xreflabel="C. M. Sperberg-McQueen, 2005">C. M. Sperberg-McQueen. Applications of Brzozowski derivatives to XML Schema processing. In <emphasis>Proceedings of Extreme Markup Languages</emphasis>. August, 2005. Montréal, Québec. <link xlink:href="http://www.mulberrytech.com/Extreme/Proceedings/html/2005/SperbergMcQueen01/EML2005SperbergMcQueen01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.mulberrytech.com/Extreme/Proceedings/html/2005/SperbergMcQueen01/EML2005SperbergMcQueen01.html</link>.</bibliomixed><bibliomixed xml:id="entryFSAThompson" xreflabel="H. S. Thompson, and R. Tobin, 2003">H. S. Thompson, and R. Tobin. 
Using Finite State Automata to Implement W3C XML Schema Content Model Validation and Restriction Checking. In <emphasis>Proceedings of XML Europe</emphasis>. London, England. May, 2003. <link xlink:href="http://www.idealliance.org/papers/dx_xmle03/papers/02-02-05/02-02-05.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.idealliance.org/papers/dx_xmle03/papers/02-02-05/02-02-05.html</link>.</bibliomixed><bibliomixed xml:id="entryWalshCowan2001" xreflabel="N. Walsh and J. Cowan, 2001">N. Walsh, and J. Cowan. Schema Language Comparison. December, 2001. <link xlink:href="http://nwalsh.com/xml2001/schematownhall/slides/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://nwalsh.com/xml2001/schematownhall/slides/</link>.</bibliomixed><bibliomixed xml:id="entryXerces" xreflabel="Xerces">The Apache Software Foundation. Apache Xerces. <link xlink:href="http://xml.apache.org" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://xml.apache.org</link>.</bibliomixed><bibliomixed xml:id="entryXMLSchemaAPI" xreflabel="XML Schema API">Elena Litani. XML Schema API. W3C Member Submission. 22 January 2004. <link xlink:href="http://www.w3.org/Submission/2004/SUBM-xmlschema-api-20040122/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/Submission/2004/SUBM-xmlschema-api-20040122/</link>.</bibliomixed><bibliomixed xml:id="entryXML11" xreflabel="XML 1.1">Extensible Markup Language (XML) 1.1 (Second Edition). W3C Recommendation. 16 August 2006. <link xlink:href="http://www.w3.org/TR/xml11/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/xml11/</link>.</bibliomixed><bibliomixed xml:id="entryXSD1.0-structures" xreflabel="XSD 1.0: Structures">XML Schema Part 1: Structures Second Edition. W3C Recommendation. 28 October 2004. 
<link xlink:href="http://www.w3.org/TR/xmlschema-1/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/xmlschema-1/</link> </bibliomixed><bibliomixed xml:id="entryXSD1.0-datatypes" xreflabel="XSD 1.0: Datatypes">XML Schema Part 2: Datatypes Second Edition. W3C Recommendation. 28 October 2004. <link xlink:href="http://www.w3.org/TR/xmlschema-2/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/xmlschema-2/</link></bibliomixed><bibliomixed xml:id="entryXSD1.1-structures" xreflabel="XSD 1.1: Structures">W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures. W3C Working Draft. 20 June 2008. <link xlink:href="http://www.w3.org/TR/2008/WD-xmlschema11-1-20080620/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/2008/WD-xmlschema11-1-20080620/</link></bibliomixed><bibliomixed xml:id="entryXSD1.1-datatypes" xreflabel="XSD 1.1: Datatypes">W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. W3C Working Draft. 20 June 2008. <link xlink:href="http://www.w3.org/TR/2008/WD-xmlschema11-2-20080620/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/2008/WD-xmlschema11-2-20080620/</link></bibliomixed></bibliography></article>
