<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2" xml:id="Bal2008Baum1020"><title>Freedom to Constrain</title><subtitle>where does attribute constraint come from, mommy?</subtitle><info><confgroup><conftitle>Balisage: The Markup Conference 2008</conftitle><confdates>August 12 - 15, 2008</confdates></confgroup><abstract><para>Where should attribute constraints live? In an external schema? In the document’s own
        metadata? In a separate file? Several possibilities are examined, raising lots of questions
        and offering a few answers.</para></abstract><author><personname><firstname>Syd</firstname><surname>Bauman</surname></personname><personblurb><para>Syd Bauman is the technical person at the Brown University Women Writers Project,
          where he has worked since 1990, designing and maintaining a significantly extended
          TEI-conformant schema for encoding early printed books. He has served as the North
          American Editor of the Text Encoding Initiative Guidelines, has an AB from Brown
          University in political science, and has worked as an Emergency Medical Technician since
          1983.</para></personblurb><affiliation><jobtitle>Senior Programmer/Analyst</jobtitle><orgname>Brown University Women Writers Project</orgname></affiliation><email>Syd_Bauman@Brown.edu</email><link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.stg.brown.edu/staff/syd.html</link></author><legalnotice><para>Copyright © 2008 Syd Bauman. Some rights reserved.</para></legalnotice><keywordset role="author"><keyword>XML</keyword><keyword>attribute</keyword><keyword>TEI</keyword><keyword>ODD</keyword><keyword>constraint</keyword></keywordset></info><para>It is clear that constraining document structure is a very
  important part of document production. We test whether or not our
  XML documents are properly constrained through the process of
  validation. <quote>The … purpose of validation is to subject a
  document … to a test, to determine whether it conforms to a given
  set of external criteria. … Our need to test is simply explained and
  understood (so much so that it rarely needs to be explicated): if
  there exists a point in a process where it is less expensive to
  discover and correct problems than it is to save the work of testing
  and fix at later points, it is profitable to introduce a
  test.</quote><footnote><para>Piez, Wendell, “Beyond the ‘descriptive
  vs. procedural’ distinction”, presented at Extreme Markup Languages
  2001, Montréal, Canada. <link xlink:href="http://www.idealliance.org/papers/extreme/proceedings/html/2001/Piez01/EML2001Piez01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.idealliance.org/papers/extreme/proceedings/html/2001/Piez01/EML2001Piez01.html</link>.
  </para></footnote>
  </para><para>Michael Sperberg-McQueen may have summed this importance up
  best when he advised <quote>constrain your data early and
  often</quote>, which he often did.<footnote><para>Sperberg-McQueen,
  C. Michael. Oral conversation, and multiple oral presentations
  throughout the 1990s. See, e.g., <link xlink:href="http://www.w3.org/People/cmsmcq/2001/darmstadt.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/People/cmsmcq/2001/darmstadt.html</link>.</para></footnote>
  (It helped that he lived in Chicago at the time.)</para><para>So it is obvious that constraints need to be expressed in a
  formal language of some sort. Many such general-purpose formal
  languages are available, including closed schema languages like DTDs
  and RELAX NG, and open schema languages like Schematron and CLiX.
  Furthermore at least one literate encoding language exists in which
  such constraints along with documentation about them can be
  expressed. This language is called ODD (for “one document does it
  all”) — constraints expressed in other languages (DTDs, RELAX NG, or
  XML Schema; in theory others as well) can be derived from a set of
  constraints expressed in ODD.<footnote><para>Burnard, Lou and Syd Bauman, eds. “4.3.2 Floating Texts.”
  <emphasis>TEI P5: Guidelines for Electronic Text Encoding and
  Interchange</emphasis>. Version 1.1.0. 2008-07-04. TEI Consortium.
  <link xlink:href="http://www.tei-c.org/release/doc/tei-p5-doc/html/DS.html#DSFLT" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.tei-c.org/release/doc/tei-p5-doc/html/DS.html#DSFLT</link>
  2008-08-30</para></footnote><footnote><para>Burnard, Lou and Syd Bauman, eds. “23.4 Implementation of an ODD System.”
  <emphasis>TEI P5: Guidelines for Electronic Text Encoding and
  Interchange</emphasis>. Version 1.1.0. 2008-07-04. TEI Consortium.
  <link xlink:href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/USE.html#IM" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/USE.html#IM</link>
  2008-08-30</para></footnote><footnote><para>Sperberg-McQueen, C. Michael and Lou Burnard. “The Design of
  the TEI Encoding Scheme.” <emphasis>Computers and the
  Humanities</emphasis> 1995. 29 (1) p. 17–39. doi:10.1007/BF01830314</para></footnote><footnote><para>Burnard, Lou, Sebastian Rahtz. “RelaxNG
  with Son of ODD”, presented at Extreme Markup Languages 2004,
  Montréal, Canada. <link xlink:href="http://www.mulberrytech.com/Extreme/Proceedings/html/2004/Burnard01/EML2004Burnard01.pdf" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.mulberrytech.com/Extreme/Proceedings/html/2004/Burnard01/EML2004Burnard01.pdf</link>.
  </para></footnote>
  Furthermore there are systems of constraint based on special-purpose
  languages, rather than general-purpose languages. The feature system
  declaration created by the Text Encoding Initiative (TEI) and now
  being incorporated into ISO 24610-2 is an example — a set of XML
  elements (the feature system declaration) that can be used to
  constrain the expression of another set of XML elements (the feature
  structure itself).<footnote><para>Burnard, Lou and Syd Bauman, eds. “18 Feature Structures”
  <emphasis>TEI P5: Guidelines for Electronic Text Encoding and
  Interchange</emphasis>. Version 1.1.0. 2008-07-04. TEI Consortium.
  <link xlink:href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.tei-c.org/release/doc/tei-p5-doc/en/html/FS.html</link>
  2008-08-30</para></footnote></para><para>So the choice of <emphasis>how</emphasis> to express a
  particular constraint is not always obvious. But a related question
  is perhaps just as important: <emphasis>where</emphasis> should
  these constraints be expressed? What are the consequences of
  expressing them in different places?</para><para>This paper will attempt to shed light on these general
  questions by taking an in-depth look at the possible locations for
  the expression of one particular kind of constraint, and the
  consequences of those different locations. The constraint discussed
  will be that of limiting the value an attribute may take to one of
  an enumerated list of possible values. For simplicity the presumed
  setting for this constraint will be in a TEI document, but the
  principles should be equally applicable to any other encoding
  language that separates the document from its metadata, including
  DocBook or XHTML. The locations considered will be
  <itemizedlist><listitem><para>the “normal” way, in the formal closed schema (RELAX NG
    will be used as the example)</para></listitem><listitem><para>in a formal open schema (ISO Schematron will be used as the example)</para></listitem><listitem><para>in the metadata element (i.e.
    <code>&lt;teiHeader&gt;</code>)</para></listitem><listitem><para>in a separate metadata file</para></listitem><listitem><para>in the metaschema file (i.e. the ODD file)</para></listitem><listitem><para>no formal constraint</para></listitem></itemizedlist>
  Each of the latter methods will be compared to and contrasted with
  the first.</para><section xml:id="uc"><title>Use Case</title><para>There are lots of reasons to wish to constrain markup
    constructs, in particular attribute values. One case worth
    considering is the markup project which has tens or hundreds of
    occurrences of a particular attribute in each of tens or hundreds
    of files, where the list of possible values for the attribute is
    different for each file.</para><para>Imagine, e.g., an epigraphy project transcribing thousands
    of inscriptions on various objects. Imagine further that the
    inscriptions are divided among 27 separate files, organized by
    some criteria other than the kind of object that bears the
    inscription (e.g. date the object was discovered, current museum in
    which it is held, whatever). The material of which the text-bearing
    object is made is recorded in a TEI manuscript description on the
    <code>material=</code> attribute of the
    <code>&lt;supportDesc&gt;</code> element. Possible values might
    include <code>"bronze"</code>, <code>"marble"</code>,
    <code>"limestone"</code>, <code>"plaster"</code>,
    <code>"wood"</code>, etc.</para><para>Such a typical humanities computing project is likely to have:
    <itemizedlist><listitem><para>a subject matter expert</para></listitem><listitem><para>an XML expert</para></listitem><listitem><para>encoders — getting the extant text into
      XML-encoded digital form may be accomplished in a variety of
      ways:
      <itemizedlist><listitem><para>typed from source</para></listitem><listitem><para>post-OCR editing</para></listitem><listitem><para>via an external vendor</para></listitem></itemizedlist></para></listitem><listitem><para>proofreaders, managers, web designers, research assistants, etc.</para></listitem></itemizedlist>
    </para></section><section><title>Background</title><section><title>Open vs Closed vs Extensible Schemas</title><para>Formal schema languages can generally be categorized into
      one of two types: open or closed. A closed schema language like
      RELAX NG specifies a complete document grammar. Only those
      documents that meet all of the constraints of the grammar are
      considered valid; all others are rejected as invalid.</para><para>An open schema language, like Schematron, specifies
      particular rules. Documents that violate the specified rules are
      rejected as invalid; all others are accepted as valid.</para><para>One can think of closed schema languages as a white list
      spam filter, and open schema languages as a black list spam
      filter. Using a white list (closed schema language), only e-mail
      from the specified addresses gets through; all other e-mail is
      rejected as spam. Using a black list (open schema language), any
      e-mail from an address on the list of problematic addresses is
      rejected as spam; all other e-mail is allowed through.</para><para>Of course the situation is not as simple as that. One can
      specify some open constructs in many closed schema languages,
      and one can write sufficiently tight rules in most open
      languages that they behave like a closed language.</para><para>For example, validation against the following complete
      RELAX NG grammar will permit any XML document as long as it has
      a <code>&lt;foo&gt;</code> element with a <code>bar=</code>
      attribute as the first child of the root element.
      <programlisting xml:space="preserve">start = element * { any_attribute*, foo, any_element* }
any_attribute = attribute * { text }
any_element = element * { any* }
any = ( any_attribute | any_element | text )
any_sans_bar = ( attribute * - ( bar ) { text } | any_element | text )
foo = element foo { attribute bar { text }, any_sans_bar* }</programlisting>
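      As a minimal illustration (every name here other than
      <code>&lt;foo&gt;</code> and <code>bar=</code> is invented for the
      example), an instance such as the following should be accepted,
      because the first child of the root element is a
      <code>&lt;foo&gt;</code> that carries <code>bar=</code>:
      <programlisting xml:space="preserve">&lt;anything id="x1"&gt;
  &lt;foo bar="baz" other="ok"&gt;text and &lt;b&gt;elements&lt;/b&gt; are fine here&lt;/foo&gt;
  &lt;whatever&gt;&lt;nested/&gt;&lt;/whatever&gt;
&lt;/anything&gt;</programlisting>
      Moving <code>&lt;whatever&gt;</code> in front of
      <code>&lt;foo&gt;</code>, or dropping <code>bar=</code>, should
      make the same instance invalid.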
      </para><para>Conversely, validation against the following Schematron
      rule will permit only those documents that have one
      <code>&lt;platypus&gt;</code> element with a <code>bill=</code>
      attribute that has the value <code>"duck"</code> as the only
      child of the root <code>&lt;enigma&gt;</code> element.
      <programlisting xml:space="preserve">  &lt;pattern&gt;
    &lt;rule context="/*"&gt;
      &lt;assert test="name(.)='enigma'"&gt;Root element must be "enigma"&lt;/assert&gt;
      &lt;report test="@*"&gt;Root "enigma" element can not have attributes&lt;/report&gt;
      &lt;assert test="count(child::*)=1"&gt;"enigma" can only have one child 
      ("platypus")&lt;/assert&gt;
      &lt;assert test="count(child::platypus)=1"&gt;"enigma" can only have one 
      "platypus" child&lt;/assert&gt;
      &lt;report test="child::text()[not(normalize-space(.)='')]"&gt;"enigma" is 
      not allowed to have text, just "platypus"&lt;/report&gt;
    &lt;/rule&gt;
    &lt;rule context="/enigma/platypus"&gt;
      &lt;assert test="@*[name(.)='bill']"&gt;"platypus" must have a bill= 
      attribute&lt;/assert&gt;
      &lt;report test="@*[not(name(.)='bill')]"&gt;"platypus" must not have any 
      attributes other than bill=&lt;/report&gt;
      &lt;report test="child::*"&gt;"platypus" must be empty (i.e., can not have 
      child elements)&lt;/report&gt;
      &lt;assert test="string-length( normalize-space(.) ) = 0"&gt;"platypus" 
      must be empty (i.e., can not contain text)&lt;/assert&gt;
    &lt;/rule&gt;
    &lt;rule context="/enigma/platypus/@bill"&gt;
      &lt;assert test="normalize-space(.)='duck'"&gt;The value of bill= of 
      "platypus" must be 'duck'&lt;/assert&gt;
    &lt;/rule&gt;
  &lt;/pattern&gt;</programlisting>
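      For instance, the first of the following two instances should
      satisfy every assertion above, while the second should be
      reported because <code>&lt;enigma&gt;</code> has a second child
      (the <code>&lt;echidna&gt;</code> element is invented for the
      example):
      <programlisting xml:space="preserve">&lt;!-- passes: one empty platypus whose bill= is "duck" --&gt;
&lt;enigma&gt;&lt;platypus bill="duck"/&gt;&lt;/enigma&gt;

&lt;!-- fails: count(child::*) is 2, not 1 --&gt;
&lt;enigma&gt;&lt;platypus bill="duck"/&gt;&lt;echidna/&gt;&lt;/enigma&gt;</programlisting>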
      </para><para>These reverse uses of open and closed schema languages may
      be thought of as analogous to black-list or white-list spam
      filters that permit wildcards.</para><para>Neither of the above examples is a particularly good way
      of performing the desired validation, but they serve as
      proofs-of-concept that when we refer to a schema language as
      “open” or “closed”, we may be referring to its default, and not
      its only, behavior.</para><para>There is one further twist worth mentioning. Some modular
      XML document systems, including DocBook and TEI, permit a user
      of the system to generate (closed) schemas that contain not only
      the element and attribute declarations native to the system, but
      also additional declarations for constructs added by the
      user.</para></section><section><title>Literate Encoding</title><para>Literate programming is a style of programming intended to
      make computer documentation better by, among other things,
      placing the documentation and source code in the same computer
      file. The TEI has applied this concept to the schemas used to
      validate documents to help ascertain whether or not they conform
      to the TEI Guidelines. The source code from which the schemas
      are generated and the prose documentation that make up the bulk
      of the TEI Guidelines are stored in one computer
      document.</para><para>In order to facilitate this, and in order to help make it
      easy to extract formal schemas in any of a variety of popular
      languages, the formal constraints are (for the most part)
      expressed in the TEI language, rather than any particular schema
      language.</para><para>Thus the TEI Guidelines proper (some 32 chapters of prose
      documentation), formal schemas expressed in RELAX NG, the XML
      DTD language, or the W3C Schema language, and reference
      documentation for those schemas, are all extracted from the same
      single document. We say that this “one document does it all”,
      and thus it is referred to as an ODD document.</para></section></section><section><title>In the Closed Schema (RELAX NG file)</title><section><title>how</title><para>Many are probably quite familiar with the mechanism for
      constraining an enumerated attribute in a formal closed schema
      language. E.g., in RELAX NG (compact syntax), the possible
      values of the <code>type=</code> attribute (in this case, of the
      <code>&lt;name&gt;</code> element) could be constrained with a
      construct like
      <programlisting xml:space="preserve">attribute type { "person" | "place" | "ship" | "sword" }</programlisting>
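      In context, and assuming a much simplified, hypothetical
      declaration of <code>&lt;name&gt;</code> (the actual TEI
      declaration is considerably richer), the constraint might sit as
      follows; the comments show one value that would be accepted and
      one that would be rejected:
      <programlisting xml:space="preserve"># Simplified, hypothetical declaration: not the actual TEI content model
element name {
  attribute type { "person" | "place" | "ship" | "sword" }?,
  text
}
# &lt;name type="ship"&gt;Pequod&lt;/name&gt;        accepted
# &lt;name type="horse"&gt;Bucephalus&lt;/name&gt;  rejected: "horse" is not in the list</programlisting>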
      A variety of readily available off-the-shelf software will test
      whether or not a document is valid with respect to a RELAX NG
      schema.</para></section><section><title>advantages</title><para>This method is extremely common for a reason: it makes a
      lot of sense. In many, many cases XML document structure is
      already governed by an external closed schema. These external
      schemas, at least when written in one of the three major
      languages (DTD, RELAX NG, W3C XML Schema) are generally easy to
      read and process. They describe the constraint in a standard
      formal language that has wide software support, including open
      source validators.</para><para>These languages typically provide the capability to
      specify a variety of structural and content constraints on XML
      documents. In particular, they provide the capability needed
      here: to constrain the set of possible values of the
      <code>type=</code> attribute to one of a list of possibilities.
      <footnote><para>DTDs impose greater restrictions on what the
      members of that list can be than the others: each possible value
      must be an XML Name.</para></footnote></para></section><section><title>disadvantages</title><para>In many cases, the person or persons who write and
      maintain the external schema are not the same as the person or
      persons who create the XML instances (or the programs that write
      the XML instances) that conform to it. In these cases, those who
      create the instances often do not have either the necessary
      knowledge (e.g., knowing the schema language) or capability
      (e.g., having read-write access to the schema) to make changes
      to it.</para><para>Furthermore in many cases (whether the instance creator is
      the same as the schema maintainer or not), a single external
      schema governs the validity of dozens or even tens of thousands
      of XML instances. But the desired constraints on a particular
      attribute may be different in different instances. Typically in
      these cases the schema limits the attribute to one of a set
      that is the union of all possible values in all governed
      documents. Here adding the additional constraint of <quote>only
      these values in <emphasis>this</emphasis> document</quote>
      requires making a separate schema that is like the original in
      all respects except for the declaration of the
      <code>type=</code> attribute of <code>&lt;name&gt;</code>.</para></section></section><section><title>In the Open Schema (ISO Schematron)</title><para>Many are probably quite familiar with the mechanism for
      constraining an enumerated attribute in a formal open schema
      language. E.g., in Schematron (DSDL part 4), the possible values
      of the <code>type=</code> attribute of the TEI
      <code>&lt;name&gt;</code> element could be constrained with a
      construct like
      <programlisting xml:space="preserve">&lt;pattern&gt;
  &lt;rule context="tei:name/@type"&gt;
    &lt;assert test="normalize-space(.)='person'
               or normalize-space(.)='place'
               or normalize-space(.)='ship'
               or normalize-space(.)='sword'"&gt;
      Names can only be of people, places, ships, or swords
    &lt;/assert&gt;
  &lt;/rule&gt;
&lt;/pattern&gt;</programlisting>
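      The fragment above presumes that the <code>tei</code> prefix has
      been bound. A sketch of a complete schema, assuming the TEI P5
      namespace and substituting a more compact XPath 1.0 membership
      test for the chain of <code>or</code> clauses (a stylistic
      alternative, not a requirement of Schematron), might look like
      this:
      <programlisting xml:space="preserve">&lt;schema xmlns="http://purl.oclc.org/dsdl/schematron"&gt;
  &lt;!-- bind the prefix used in the rule context (TEI P5 namespace assumed) --&gt;
  &lt;ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/&gt;
  &lt;pattern&gt;
    &lt;rule context="tei:name/@type"&gt;
      &lt;!-- The value, padded with spaces, must occur in the padded list;
           the first clause rejects multi-token values like "ship sword". --&gt;
      &lt;assert test="not(contains(normalize-space(.), ' '))
                    and contains(' person place ship sword ',
                                 concat(' ', normalize-space(.), ' '))"&gt;
        Names can only be of people, places, ships, or swords
      &lt;/assert&gt;
    &lt;/rule&gt;
  &lt;/pattern&gt;
&lt;/schema&gt;</programlisting>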
      </para><para>While the choice between open and closed schemas has many
    advantages and disadvantages for the schema designer, with respect
    to this particular question the advantages and disadvantages are
    largely the same: while the constraint can be expressed in a
    formal, widely supported language, and can be tested with readily
    available tools, it is still in a separate file that may support
    many documents, that may not be accessible, and that uses a
    language that may be foreign to those who would like to change it.</para><para>There is one additional disadvantage of Schematron in
    particular with respect to RELAX NG: it is harder to annotate the
    Schematron schema than the RELAX NG schema. RELAX NG deliberately
    permits elements from other namespaces to be mixed in with the
    RELAX NG specifications, and defines where annotations relating to
    particular structures should go. Furthermore, because the four
    tokens against which we are trying to validate are expressed as
    four separate elements (in the XML syntax), there is a place to
    annotate each separately (the <code>&lt;a:documentation&gt;</code>
    element follows the <code>&lt;rng:value&gt;</code> element to which it
    refers). Schematron also has a built-in documentation feature (a
    <code>&lt;p&gt;</code> element), but because all four tokens are
    tucked into a single XPath expression, it is a bit harder to
    discuss them individually. This difficulty is compounded because
    <code>&lt;p&gt;</code> is not permitted in <code>&lt;rule&gt;</code>,
    <code>&lt;assert&gt;</code>, or <code>&lt;report&gt;</code>, making it
    difficult to put the documentation close to the code. This is
    partially alleviated because elements from foreign namespaces are
    permitted in those spaces, and inside <code>&lt;p&gt;</code>. Thus
    something like the following construct could be used to provide
    documentation of such a constraint.
<programlisting xml:space="preserve">&lt;pattern&gt;
  &lt;p class="annotation"&gt;The various values for &lt;tei:att&gt;type&lt;/tei:att&gt; of 
    &lt;tei:gi&gt;name&lt;/tei:gi&gt; came about as follows: &lt;tei:list type="gloss"&gt;
      &lt;tei:label&gt;
        &lt;tei:val&gt;person&lt;/tei:val&gt;
      &lt;/tei:label&gt;
      &lt;tei:item&gt;Added 2007-04-17 when we removed &lt;tei:gi&gt;persName&lt;/tei:gi&gt;&lt;/tei:item&gt;
      &lt;tei:label&gt;
        &lt;tei:val&gt;place&lt;/tei:val&gt;
      &lt;/tei:label&gt;
      &lt;tei:item&gt;Added 2007-04-17 when we removed &lt;tei:gi&gt;placeName&lt;/tei:gi&gt;&lt;/tei:item&gt;
      &lt;tei:label&gt;
        &lt;tei:val&gt;ship&lt;/tei:val&gt;
      &lt;/tei:label&gt;
      &lt;tei:item&gt;Added 2007-04-17 in order to accommodate the various ship names&lt;/tei:item&gt;
      &lt;tei:label&gt;
        &lt;tei:val&gt;sword&lt;/tei:val&gt;
      &lt;/tei:label&gt;
      &lt;tei:item&gt;Added 2007-10-02 when we found a reference to "Excalibur" that the
        professor needed to annotate&lt;/tei:item&gt;
    &lt;/tei:list&gt;
  &lt;/p&gt;
  &lt;rule context="tei:name/@type"&gt;
    &lt;tei:note&gt;&lt;tei:att&gt;type&lt;/tei:att&gt; of &lt;tei:gi&gt;rs&lt;/tei:gi&gt; is matched elsewhere.&lt;/tei:note&gt;
    &lt;assert test=".='person' or .='place' or .='ship' or .='sword'"&gt; Names may only be 
      of people, places, ships, or swords &lt;/assert&gt;
  &lt;/rule&gt;
&lt;/pattern&gt;</programlisting>
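      For comparison, the per-value annotation that RELAX NG permits
      (as described above) might be sketched as follows in the XML
      syntax. The <code>rng</code> and <code>a</code> prefixes are
      bound here to the RELAX NG structure and compatibility
      annotation namespaces, and the notes simply repeat those used in
      the Schematron example:
      <programlisting xml:space="preserve">&lt;rng:attribute name="type"
    xmlns:rng="http://relaxng.org/ns/structure/1.0"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"&gt;
  &lt;rng:choice&gt;
    &lt;!-- each a:documentation annotates the rng:value it follows --&gt;
    &lt;rng:value&gt;person&lt;/rng:value&gt;
    &lt;a:documentation&gt;Added 2007-04-17 when we removed persName&lt;/a:documentation&gt;
    &lt;rng:value&gt;place&lt;/rng:value&gt;
    &lt;a:documentation&gt;Added 2007-04-17 when we removed placeName&lt;/a:documentation&gt;
    &lt;rng:value&gt;ship&lt;/rng:value&gt;
    &lt;a:documentation&gt;Added 2007-04-17 in order to accommodate the various
      ship names&lt;/a:documentation&gt;
    &lt;rng:value&gt;sword&lt;/rng:value&gt;
    &lt;a:documentation&gt;Added 2007-10-02 when we found a reference to "Excalibur"
      that the professor needed to annotate&lt;/a:documentation&gt;
  &lt;/rng:choice&gt;
&lt;/rng:attribute&gt;</programlisting>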
</para></section><section><title>In the Metaschema (ODD file)</title><section><title>how</title><para>The same constraint might be expressed, at a slightly
      higher level of abstraction and combined with some
      documentation, using the ODD literate encoding language:
      <programlisting xml:space="preserve">
&lt;attDef ident="<emphasis role="bold">type</emphasis>"&gt;
  &lt;valList type="closed"&gt;
    &lt;valItem ident="<emphasis role="bold">person</emphasis>"&gt;
      &lt;desc&gt;The name refers to a person&lt;/desc&gt;
    &lt;/valItem&gt;
    &lt;valItem ident="<emphasis role="bold">place</emphasis>"&gt;
      &lt;desc&gt;The name refers to a political or man-made region, for example
        a city, country, hamlet, town, or neighborhood. For geographical
        places such as rivers or valleys, use &lt;gi&gt;geogName&lt;/gi&gt;&lt;/desc&gt;
    &lt;/valItem&gt;
    &lt;valItem ident="<emphasis role="bold">ship</emphasis>"&gt;
      &lt;desc&gt;The name refers to a ship, whether sea-worthy, interplanetary,
        or interstellar&lt;/desc&gt;
    &lt;/valItem&gt;
    &lt;valItem ident="<emphasis role="bold">sword</emphasis>"&gt;
      &lt;desc&gt;The name refers to a sword&lt;/desc&gt;
    &lt;/valItem&gt;
  &lt;/valList&gt;
&lt;/attDef&gt;</programlisting>
      There exists software that will <quote>tangle</quote> ODD
      specifications like the above into formal declarations in one of
      several schema languages, including RELAX NG. Then any of the
      same variety of readily available off-the-shelf software could
      be used to test validity.</para><para>Furthermore, there exists software that will
      <quote>weave</quote> the same specification above into easily
      readable hyperlinked documentation.</para></section><section><title>advantages</title><para>The advantages of literate programming are well
      understood, and include more easily readable and understandable
      source code, and that documentation (because it is right next to
      the source code) is more likely to match the program and be
      updated when the source code changes.<footnote><para>Knuth,
      Donald. <emphasis>Literate Programming</emphasis>, ISBN
      0-9370-7380-6.</para></footnote> These advantages apply here as well.
      In addition, at least for those familiar with TEI, there is the
      advantage that the language used to describe the constraints is
      a TEI language, so schema designers are likely to be familiar
      with at least the documentation paradigm for the specialized
      schema-description elements, if not the elements themselves; in
      addition, they are likely familiar with the generic TEI elements
      (like <code>&lt;desc&gt;</code>, above) that are used in addition
      to the specialized elements.</para></section><section><title>disadvantages</title><para>The disadvantages of the external schema (whether open or
      closed) are present here as well. Furthermore, an extra
      processing step is required to generate (i.e.
      <quote>tangle</quote>) a schema that itself can be used to
      validate instances using off-the-shelf software. In addition, at
      least for those who are not intimately familiar with TEI, there
      is the disadvantage that the language used to describe the
      constraints is primarily a TEI language, so schema designers may
      not be familiar with the specialized schema-description
      elements.</para></section></section><section><title>In the Metadata (<code>&lt;teiHeader&gt;</code>)</title><section xml:id="pointing"><title>how — pointing</title><para>It should be quite feasible to develop a mechanism for
      expressing the list of possible values of an attribute in the
      same document in a rather abstract way. For
      example:<programlisting xml:space="preserve">&lt;codeGrp elementTypes="name rs" attributes="type"&gt;
  &lt;codeDef xml:id="person"&gt;The name or string refers to a
    person&lt;/codeDef&gt;
  &lt;codeDef xml:id="place"&gt;The name or string refers to a
    political or man-made region, for example a city, country,
    hamlet, town, or neighborhood. For geographical places such as
    rivers or valleys, use &lt;gi&gt;geogName&lt;/gi&gt;&lt;/codeDef&gt;
  &lt;codeDef xml:id="ship"&gt;The name or string refers to a ship,
    whether sea-worthy, interplanetary, or
    interstellar&lt;/codeDef&gt;
  &lt;codeDef xml:id="sword"&gt;The name or string refers to a
    sword, &lt;foreign xml:lang="fr"&gt;main-gauche&lt;/foreign&gt;, switchblade,
    or other edged weapon&lt;/codeDef&gt;
&lt;/codeGrp&gt;</programlisting>
      Given this encoding in the <code>&lt;teiHeader&gt;</code>, the
      <code>&lt;name&gt;</code> element could have <code>type=</code>
      values of <code>"#person"</code>, <code>"#place"</code>, etc.
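      For instance, a name in the body of the document might be
      encoded as in the first listing below. The second listing is an
      untested sketch of how the check described in the remainder of
      this paragraph might be written in ISO Schematron. It assumes the
      XSLT query binding, that the hypothetical
      <code>&lt;codeGrp&gt;</code> element lives in the TEI namespace,
      and that only bare name fragment identifiers are used.
      <programlisting xml:space="preserve">&lt;name type="#ship"&gt;Pequod&lt;/name&gt;</programlisting>
      <programlisting xml:space="preserve">&lt;schema xmlns="http://purl.oclc.org/dsdl/schematron"&gt;
  &lt;ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/&gt;
  &lt;pattern&gt;
    &lt;rule context="tei:name/@type"&gt;
      &lt;!-- the identifier pointed at: the value minus its leading "#" --&gt;
      &lt;let name="target" value="substring-after(normalize-space(.), '#')"/&gt;
      &lt;!-- Some child of a codeGrp that lists both "name" and "type" must
           carry that identifier; the child's element type is not checked. --&gt;
      &lt;assert test="starts-with(normalize-space(.), '#')
                    and //tei:codeGrp
                          [contains(concat(' ', normalize-space(@elementTypes), ' '), ' name ')]
                          [contains(concat(' ', normalize-space(@attributes), ' '), ' type ')]
                          /*[@xml:id = $target]"&gt;
        The type= of name must point to a value defined in a codeGrp
        that applies to type= of name
      &lt;/assert&gt;
    &lt;/rule&gt;
  &lt;/pattern&gt;
&lt;/schema&gt;</programlisting>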
      Software could be developed to validate that the value of
      <code>type=</code> of <code>&lt;name&gt;</code> is a URI that
      points to an element whose parent <code>&lt;codeGrp&gt;</code> has
      <code>"name"</code> in its <code>elementTypes=</code> list and
      <code>"type"</code> in its <code>attributes=</code> list. (I
      believe that Schematron code could probably be used for this
      test, but have not yet demonstrated this.) Note that the check
      does not specify the element type of the child of
      <code>&lt;codeGrp&gt;</code>. This gives the flexibility to have
      special-purpose <code>&lt;codeDef&gt;</code>-like elements that
      might provide structured information about the value. E.g., one
      can well imagine the TEI’s <code>&lt;handNote&gt;</code> element being
      used in this way.</para></section><section><title>advantages</title><para>This mechanism has significant potential advantages,
      particularly in cases where one schema is used for many files
      which may have different attribute constraint requirements. For
      most users it is much easier to change something in the same
      file they are working on, rather than needing to make changes to
      an external schema, particularly an external schema that may be
      in a language the user does not know or in a file to which the
      user does not have write access, and particularly changes that
      might inadvertently invalidate other existing instances. Thus
      the encoder, as opposed to the schema-designer, can add, remove,
      or change a value quite easily.</para><para>Another advantage is that the information about which
      values the attribute is constrained to, and what those values mean,
      is an integral part of the document. This means that this
      information will survive in the situation where a document
      instance is sent along without its schema or documentation.
      Furthermore the list of values in different files at a given
      project could be slightly different.</para><para>Moreover, the particular system shown here has the
      advantage that it uses a mechanism most users are already
      familiar with: <code>xml:id=</code> and relative URIs (i.e.,
      bare name fragment identifiers). It is worth noting, though,
      that there is no requirement that the URIs be bare name
      fragment identifiers, which permits this system to quickly and
      easily be changed to that which is discussed in <xref linkend="separate"/>.</para></section><section><title>disadvantages</title><para>This system has obvious inefficiencies when many,
      perhaps thousands of, document instances share the same
      constraints — the same information is repeated in each
      file.</para><para>Another significant disadvantage of this method is that we
      are using a non-standard language for constraint and
      documentation. The question, then, is whether or not this system
      is demonstrably significantly better than what can be obtained
      using standard languages.<footnote><para>What some call
      <emphasis>Syd’s rule</emphasis>, and I have begun to call my
      <emphasis>wheel re-invention prevention convention</emphasis>:
      <quote>unless your method is significantly and demonstrably
      superior to the standard, you should be using the
      standard.</quote></para></footnote></para><para>Lastly, the fact that this system uses the URI pointing
      mechanism produces several disadvantages, one of which is
      severely problematic:
      <itemizedlist><listitem><para>of minor annoyance is that the user needs to
  encode a hash-mark (<quote><code>#</code></quote>, U+0023) at
  the beginning of each value;</para></listitem><listitem><para>the fact that values are restricted to XML
  Names could be a problem in some situations;</para></listitem><listitem><para>but far more problematic, because
  <code>xml:id=</code> needs to be unique within the document,
  any given possible attribute value can only occur on one
  attribute (although that attribute could be on multiple
  elements)<!-- <footnote>
  <para>Sort of an 11.3.3:12
  problem on steroids.<footnote>
  <para>See <citation>Goldfarb, Charles F., The SGML Handbook,
  Clarendon Press, Oxford, 1990. p. 424.</citation> for the
  clause in question. As for why it is a problem, those who had
  been used to grappling with SGML will probably remember; for
  those who have not done so or have forgotten, the discovery is
  left as an exercise.</para>
      </footnote>
  </para>
      </footnote>  --> — furthermore, no other element elsewhere in
      the document can use the same string as one of these attribute
      values as its identifier.</para></listitem></itemizedlist>
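      A minimal illustration of that last problem, assuming the
      <code>&lt;codeDef xml:id="…"&gt;</code> form of the pointing
      system (the second <code>&lt;codeGrp&gt;</code>, its element and
      attribute names, and its prose are hypothetical, invented only
      to show the clash):
      <programlisting xml:space="preserve">&lt;codeGrp elementTypes="name rs" attributes="type"&gt;
  &lt;codeDef xml:id="place"&gt;The name or string refers to a
    political or man-made region&lt;/codeDef&gt;
&lt;/codeGrp&gt;
&lt;!-- hypothetical second list, for a different attribute --&gt;
&lt;codeGrp elementTypes="term" attributes="subtype"&gt;
  &lt;!-- not permitted: the ID "place" is already taken above --&gt;
  &lt;codeDef xml:id="place"&gt;The term names a place&lt;/codeDef&gt;
&lt;/codeGrp&gt;</programlisting>
      The same clash would arise if any element in the body of the
      document, say a hypothetical
      <code>&lt;div xml:id="place"&gt;</code>, wanted that identifier.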
      </para></section><section><title>how — co-reference</title><para>Those last disadvantages that are the result of using
      <code>xml:id=</code> and URIs could be circumvented by matching
      the attribute values, rather than using a true pointer (e.g.
      ID/IDREF or URI). In the <code>&lt;teiHeader&gt;</code> the enumeration
      of the possible attribute values would look almost the same, but
      would use a different attribute for storing the actual
      value.</para><programlisting xml:space="preserve">&lt;codeGrp elementTypes="name rs" attributes="type"&gt;
  &lt;codeDef attrVal="person"&gt;The name or string refers to a
    person&lt;/codeDef&gt;
  &lt;codeDef attrVal="place"&gt;The name or string refers to a
    political or man-made region, for example a city, country,
    hamlet, town, or neighborhood. For geographical places such as
    rivers or valleys, use &lt;gi&gt;geogName&lt;/gi&gt;&lt;/codeDef&gt;
  &lt;codeDef attrVal="ship"&gt;The name or string refers to a ship,
    whether sea-worthy, interplanetary, or
    interstellar&lt;/codeDef&gt;
  &lt;codeDef attrVal="sword"&gt;The name or string refers to a
    sword, &lt;foreign xml:lang="fr"&gt;main-gauche&lt;/foreign&gt;, switchblade,
    or other edged weapon&lt;/codeDef&gt;
&lt;/codeGrp&gt;</programlisting><para>Software could be developed to validate that the value of
      <code>type=</code> of <code>&lt;name&gt;</code> is a string that
      matches the <code>attrVal=</code> attribute of an element whose
      parent <code>&lt;codeGrp&gt;</code> has <code>"name"</code> in its
      <code>elementTypes=</code> list and <code>"type"</code> in its
      <code>attributes=</code> list. (I believe that Schematron code
      could probably be used for this test, but have not yet
      demonstrated this. Certainly XSLT 1.0 can transform this into
      simple Schematron; this I have demonstrated, see <xref linkend="codeGrp2Schematron"/>.) Note that the check does not
      specify the element type of the child of
      <code>&lt;codeGrp&gt;</code>. This gives the flexibility to have
      special-purpose <code>&lt;codeDef&gt;</code>-like elements that
      might provide structured information about the value. E.g., one
      can well imagine the TEI’s <code>&lt;handNote&gt;</code> element
      being used in this way.</para><para>This system avoids the disadvantages of using
      <code>xml:id=</code>, and yet has several advantages over
      external schema files. E.g., encoders can quickly and easily add
      values to closed lists, in a manner that does not run the
      risk that they might break the rest of the schema. I find the
      case of the encoder who wishes to quickly and easily express
      stricter constraints on her attribute values in a given file
      than those that come with the generic external schema very
      compelling.</para></section></section><section xml:id="separate"><title>In the Metadata (separate file)</title><para>In the method described in <xref linkend="pointing"/>
    the values of the <code>type=</code> attribute of
    <code>&lt;name&gt;</code> are URIs. Because of this, it would be
    feasible to store the <code>&lt;codeGrp&gt;</code> element with
    <code>xml:id=</code> attributes in a project-wide
    “attribute_definitions.xml” file. While this has the advantage
    of flexibility and reusability, it presents the sizable
    disadvantage that the attribute values would now depend on
    details of system features external to the document. E.g., the
    ability to validate <code>&lt;name
    type="../attribute_definitions.xml#sword"&gt;</code> breaks if the
    current file is moved to a sub-directory.</para><para>Furthermore, if the <code>&lt;codeGrp&gt;</code> is stored in a
    separate file, the maintenance issues are almost the same as those
    for a separate closed schema (e.g., a RELAX NG grammar), open
    schema (e.g., a Schematron schema), or metaschema (e.g., a TEI
    ODD): those who have reason to change the constraints expressed
    may not have the write-permissions necessary to do so, and if they
    do, they risk invalidating files other than the one being
    worked on.</para><para>So in some cases (in particular, the scenario sketched out
    in <xref linkend="uc"/>) it makes lots of sense to leave the
    formal constraints for some aspects of a document in the metadata
    section of that document itself, e.g. in the
    <code>&lt;teiHeader&gt;</code>. But having convinced ourselves there
    is a need to be able to express constraints in a different
    <emphasis>place</emphasis> than is usual, why require a separate
    formal construct to express the constraint? Why not include RELAX
    NG, Schematron, or ODD markup constructs in the
    <code>&lt;teiHeader&gt;</code> directly?<footnote><para>Indeed, James
    Cummings and I have suggested this on more than one occasion. See,
    e.g., <link xlink:href="http://lists.village.virginia.edu/pipermail/tei-council/2005/005627.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://lists.village.virginia.edu/pipermail/tei-council/2005/005627.html</link>.</para></footnote>
    This is worthy of consideration, but is outside the scope of the
    current paper.</para></section><appendix xml:id="codeGrp2Schematron"><title>&lt;codeGrp&gt; to Schematron</title><para>The following XSLT 1.0 stylesheet is a proof-of-concept
  demonstration for transforming the <code>&lt;codeGrp&gt;</code>
  elements discussed above into Schematron that could be used to
  validate that an XML instance uses only the listed possible
  values of the attributes specified.</para><programlisting xml:space="preserve">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- Transform my mythical &lt;codeGrp&gt; elements into a Schematron schema --&gt;
&lt;!-- Copyleft 2008 Syd Bauman --&gt;
&lt;!-- Last updated: 2008-08-31 --&gt;
&lt;xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:sch="http://purl.oclc.org/dsdl/schematron"&gt;

  &lt;xsl:template match="/"&gt;
    &lt;!-- only mess with &lt;codeGrp&gt; elements; if there are none, we do nothing --&gt;
    &lt;!-- Note that we presume each &lt;codeGrp&gt; has both elementTypes= and  --&gt;
    &lt;!-- attributes= specified and that their values are lists of one or more --&gt;
    &lt;!-- XML Names. No error-checking for this here, schema validation should --&gt;
    &lt;!-- have already flagged any that don't have both required attributes or --&gt;
    &lt;!-- have inappropriate values. --&gt;
    &lt;xsl:if test="//codeGrp"&gt;
      &lt;!-- if there is one (or more) we write out a Schematron schema --&gt;
      &lt;sch:schema&gt;
        &lt;sch:ns uri="http://www.tei-c.org/ns/1.0" prefix="tei"/&gt;
        &lt;!-- and process each &lt;codeGrp&gt; into it --&gt;
        &lt;xsl:apply-templates select="//codeGrp"/&gt;
      &lt;/sch:schema&gt;
    &lt;/xsl:if&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Each &lt;codeGrp&gt; becomes a Schematron &lt;pattern&gt; --&gt;
  &lt;xsl:template match="codeGrp"&gt;
    &lt;sch:pattern&gt;
      &lt;!-- append a blank to the GI list for easier parsing later --&gt;
      &lt;xsl:variable name="elementTypes" select="concat(normalize-space(@elementTypes),' ')"/&gt;
      &lt;!-- append a blank to the attribute name list for easier parsing later --&gt;
      &lt;xsl:variable name="attributes" select="concat(normalize-space(@attributes),' ')"/&gt;
      &lt;!-- Each GI/attribute pair becomes a Schematron &lt;rule&gt; --&gt;
      &lt;!-- A little more detail: each paired combination of --&gt;
      &lt;!-- 1. a GI listed on my elementTypes= attribute, and --&gt;
      &lt;!-- 2. an attribute name listed on my attributes= attribute --&gt;
      &lt;!-- becomes a &lt;rule&gt;. We do this by processing each GI in  --&gt;
      &lt;!-- a recursive template, which in turn calls another recursive --&gt;
      &lt;!-- template for the list of attributes. --&gt;
      &lt;xsl:call-template name="elementTypes"&gt;
        &lt;xsl:with-param name="gis" select="$elementTypes"/&gt;
        &lt;xsl:with-param name="attrs" select="$attributes"/&gt;
      &lt;/xsl:call-template&gt;
    &lt;/sch:pattern&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Each GI listed on the elementTypes= attribute gets processed separately --&gt;
  &lt;xsl:template name="elementTypes"&gt;
    &lt;xsl:param name="gis"/&gt;
    &lt;xsl:param name="attrs"/&gt;
    &lt;!-- Taking advantage of that ending blank, parse off the 1st GI --&gt;
    &lt;xsl:variable name="this_gi" select="substring-before($gis,' ')"/&gt;
    &lt;xsl:variable name="rest" select="substring-after($gis,' ')"/&gt;
    &lt;!-- call attributes template to do the work for this particular GI --&gt;
    &lt;xsl:call-template name="attributes"&gt;
      &lt;xsl:with-param name="gi" select="$this_gi"/&gt;
      &lt;xsl:with-param name="attrs" select="$attrs"/&gt;
    &lt;/xsl:call-template&gt;
    &lt;!-- and do the same thing (via recursion) for the rest of the GIs, if any --&gt;
    &lt;xsl:if test="string-length($rest) &gt; 1"&gt;
      &lt;xsl:call-template name="elementTypes"&gt;
        &lt;xsl:with-param name="gis" select="$rest"/&gt;
        &lt;xsl:with-param name="attrs" select="$attrs"/&gt;
      &lt;/xsl:call-template&gt;
    &lt;/xsl:if&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Each attribute name on the attributes= attribute gets processed in combination --&gt;
  &lt;!-- with the current GI --&gt;
  &lt;xsl:template name="attributes"&gt;
    &lt;xsl:param name="gi"/&gt;
    &lt;xsl:param name="attrs"/&gt;
    &lt;!-- Taking advantage of that ending blank, parse off the 1st attribute --&gt;
    &lt;xsl:variable name="this_attr" select="substring-before($attrs,' ')"/&gt;
    &lt;xsl:variable name="rest" select="substring-after($attrs,' ')"/&gt;
    &lt;!-- make a rule out of it --&gt;
    &lt;xsl:element name="sch:rule"&gt;
      &lt;xsl:attribute name="context"&gt;
        &lt;!-- There must be a better way to do this ... --&gt;
        &lt;xsl:text&gt;tei:&lt;/xsl:text&gt;
        &lt;xsl:value-of select="$gi"/&gt;
        &lt;xsl:text&gt;/@&lt;/xsl:text&gt;
        &lt;xsl:value-of select="$this_attr"/&gt;
      &lt;/xsl:attribute&gt;
      &lt;xsl:variable name="numVals" select="count(child::*/@attrVal)"/&gt;
      &lt;!-- if I have no children with attrVal= specified, then don't --&gt;
      &lt;!-- generate any assertions (luckily an empty &lt;rule&gt; is valid --&gt;
      &lt;!-- in Schematron). --&gt;
      &lt;xsl:if test="$numVals &gt; 0"&gt;
        &lt;xsl:element name="sch:assert"&gt;
          &lt;!-- Probably would be better to generate this test (i.e., the expression --&gt;
          &lt;!-- that is the value of this output test= attribute) only once per attrVal=, --&gt;
          &lt;!-- rather than once for each attrVal= for each GI/attr combination. --&gt;
          &lt;xsl:attribute name="test"&gt;
            &lt;xsl:for-each select="child::*/@attrVal"&gt;
              &lt;xsl:text&gt;.='&lt;/xsl:text&gt;
              &lt;xsl:value-of select="."/&gt;
              &lt;xsl:text&gt;'&lt;/xsl:text&gt;
              &lt;xsl:if test="$numVals &gt; 1  and  position() != last()"&gt;
                &lt;xsl:text&gt; or &lt;/xsl:text&gt;
              &lt;/xsl:if&gt;
            &lt;/xsl:for-each&gt;
          &lt;/xsl:attribute&gt;
        &lt;/xsl:element&gt;
      &lt;/xsl:if&gt;
    &lt;/xsl:element&gt;
    &lt;!-- and do the same thing (via recursion) for the rest of the attributes, if any --&gt;
    &lt;xsl:if test="string-length($rest) &gt; 1"&gt;
      &lt;xsl:call-template name="attributes"&gt;
        &lt;xsl:with-param name="gi" select="$gi"/&gt;
        &lt;xsl:with-param name="attrs" select="$rest"/&gt;
      &lt;/xsl:call-template&gt;
    &lt;/xsl:if&gt;
  &lt;/xsl:template&gt;
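
  &lt;!-- Illustration only; not part of the transformation itself. Assuming
       the input is the &lt;codeGrp elementTypes="name rs" attributes="type"&gt;
       with attrVal= values person, place, ship, and sword shown in the
       paper (and assuming it is in no namespace, as the match patterns
       above require), the output should be equivalent to the following,
       modulo whitespace:

       &lt;sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"&gt;
         &lt;sch:ns uri="http://www.tei-c.org/ns/1.0" prefix="tei"/&gt;
         &lt;sch:pattern&gt;
           &lt;sch:rule context="tei:name/@type"&gt;
             &lt;sch:assert test=".='person' or .='place' or .='ship' or .='sword'"/&gt;
           &lt;/sch:rule&gt;
           &lt;sch:rule context="tei:rs/@type"&gt;
             &lt;sch:assert test=".='person' or .='place' or .='ship' or .='sword'"/&gt;
           &lt;/sch:rule&gt;
         &lt;/sch:pattern&gt;
       &lt;/sch:schema&gt;

       One rule is generated per GI/attribute pair, each carrying the same
       assertion. Some Schematron implementations may not visit attribute
       nodes as rule contexts by default; that would need checking before
       relying on this output. An attrVal= containing an apostrophe would
       also yield a malformed test expression. --&gt;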

&lt;/xsl:stylesheet&gt;</programlisting></appendix></article>
