<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2"><title>TEI Feature Structures as a Representation Format for Multiple Annotation and Generic XML
  Documents</title><info><confgroup><conftitle>Balisage: The Markup Conference 2009</conftitle><confdates>August 11 - 14, 2009</confdates></confgroup><abstract><para> Feature structures are mathematical entities (rooted labeled directed acyclic graphs) that
    can be represented as graph displays, attribute value matrices or as XML adhering to the
    constraints of a specialized TEI tag set. We demonstrate that this latter ISO-standardized
    format can be used as an integrative storage and exchange format for sets of multiple annotation
    XML documents. This specific domain of application is rooted in the approach of multiple
    annotations, which marks a possible solution for XML-compliant markup in scenarios with
    conflicting annotation hierarchies. A more extreme proposal consists in its possible use as a
    meta-representation format for generic XML documents. For both scenarios, our strategy concerning
    pertinent feature structure representations is grounded in the XDM (XQuery 1.0 and XPath 2.0
    Data Model). The ubiquitous hierarchical and sequential relationships within XML documents are
    represented by specific features that take ordered list values. The mapping to the TEI feature
    structure format has been implemented in the form of an XSLT 2.0 stylesheet. It can be
    characterized as exploiting aspects of both the push and the pull processing paradigms as
    appropriate. An indexing mechanism is provided with regard to the multiple annotation documents
    scenario. Hence, implicit links concerning identical primary data are made explicit in the
    result format. In comparison to alternative representations, the TEI-based format does well in
    many respects, since it is both integrative and well-formed XML. However, the result documents
    tend to grow very large depending on the size of the input documents and their respective markup
    structure. This may also be considered as a downside regarding the proposed use for generic XML
    documents. On the positive side, it may be possible to achieve a hookup to methods and
    applications that have been developed for feature structure representations in the fields of
    (computational) linguistics and knowledge representation. </para></abstract><author><personname><firstname>Jens</firstname><surname>Stegmann</surname></personname><personblurb><para>Jens Stegmann studied linguistics, psychology and computer science at Bielefeld
     University. Parts of this paper deal with aspects of his Master's thesis.</para></personblurb><affiliation><orgname>Bielefeld University</orgname></affiliation><email>jens.stegmann@googlemail.com</email></author><author><personname><firstname>Andreas</firstname><surname>Witt</surname></personname><personblurb><para>Witt received his Ph.D. in Computational Linguistics and Text Technology from the
     Bielefeld University in 2002 (dissertation title: Multiple Informationsstrukturierung mit
     Auszeichnungssprachen. XML-basierte Methoden und deren Nutzen für die Sprachtechnologie). </para><para>After graduating in 1996, he started as a researcher and instructor in Computational
     Linguistics and Text Technology. He was heavily involved in the establishment of the minor
     subject Text Technology in Bielefeld University's Magister and B.A. programs in 1999 and 2002
     respectively. After his Ph.D. in 2002 he became an assistant lecturer, still at the Text
     Technology group in Bielefeld. In 2006 he moved to Tübingen University, where he was involved
     in a project on "Sustainability of Linguistic Resources" and in projects on the
     interoperability of language data. Since 2009 he has been a senior researcher at the "Institut für Deutsche
     Sprache" (Institute for the German Language) in Mannheim. </para><para>Witt is and was a member of several research organizations, amongst them the TEI Special
     Interest Group on overlapping markup, for which he was involved in the writing of the latest
     version of the chapter "Multiple Hierarchies", which is included in TEI-Guidelines P5.</para><para>Witt's main research interests deal with questions on the use and limitations of markup
     languages for the linguistic description of language data.</para></personblurb><affiliation><orgname>Institute for the German Language (IDS), Mannheim</orgname></affiliation><email>witt@ids-mannheim.de</email></author><legalnotice><para>Copyright © 2009 by the authors.  Used with
    permission.</para></legalnotice><keywordset role="author"><keyword>Overlapping Structures</keyword><keyword>Multiple Hierarchies</keyword><keyword>Multiple Annotation</keyword><keyword>TEI</keyword><keyword>Text Encoding Initiative</keyword><keyword>Feature Structures</keyword></keywordset></info><section xml:id="s1" xreflabel="“Introduction”"><title>Introduction</title><para> As the title suggests, this contribution describes aspects of the use of a certain
   representation format ("TEI Feature Structures") with regard to a specific domain of application
   ("Multiple Annotation") and also concerning a second, much more general kind of scenario
   ("Generic XML Documents"). </para><para> TEI P5 <xref linkend="p5"/> compliant encodings of feature structures, which we refer to as
    <emphasis>TEI feature structures</emphasis> in this article, will receive much of our attention.
    <xref linkend="f1"/> shows a simple example: the encoding of a certain feature structure
     <emphasis>F<subscript>1</subscript></emphasis>. <emphasis>F<subscript>1</subscript></emphasis>
   serves to characterize a specific class of linguistic entities here, namely nominal phrases of
   the third person singular kind.</para><figure xml:id="f1" xreflabel="Figure 1"><title>TEI Encoding of a Feature Structure <emphasis>F<subscript>1</subscript></emphasis></title><programlisting xml:space="preserve">
&lt;fs&gt;
 &lt;f name="CAT"&gt;
  &lt;symbol value="np" /&gt;
 &lt;/f&gt;
 &lt;f name="AGR"&gt;
  &lt;fs&gt;
   &lt;f name="NUM"&gt;
    &lt;symbol value="sing" /&gt;
   &lt;/f&gt;
   &lt;f name="PER"&gt;
    &lt;symbol value="third" /&gt;
   &lt;/f&gt;
  &lt;/fs&gt;
 &lt;/f&gt;
&lt;/fs&gt;
            </programlisting></figure><para> There are two features on the top-level of <emphasis>F<subscript>1</subscript></emphasis>:
    <code>CAT</code> with its value <code>np</code> and <code>AGR</code> with an associated complex
   value, which is a feature structure itself. This latter embedded structure consists of the
   feature-value pairs <code>NUM</code> with value <code>sing</code> and <code>PER</code> with value
    <code>third</code>. We will return to the theme of encoding
    <emphasis>F<subscript>1</subscript></emphasis> below (<xref linkend="s2.1"/>). Since we will use
   the same example there, it will be possible to compare different syntaxes for the display of
   feature structures in a straightforward way. We do not delve into details connected with the XML
   syntax exemplified in <xref linkend="f1"/> here, since this will be the topic of another part of
   this article (<xref linkend="s2.2"/>). In the rest of this introductory section, we shall try to
   shed some light upon the two application domains that have been mentioned above. </para><para> The more specific scenario consists in the integrative representation of annotation
   documents along the approach of <emphasis>multiple annotations</emphasis>
   <xref linkend="witt2004"/>. The multiple annotations approach marks a possible solution with
   regard to the markup of overlapping structures. Linguists, for example, often encounter
   XML-related problems when they try to annotate a common core of linguistic data according to
   different levels of linguistic analysis (phonology, morphology, syntax, semantics, and
   pragmatics). The most straightforward way of marking things up might involve crossing edges.
   Crossing edges, however, are ruled out by XML's underlying tree structure. It can be argued
   that such configurations of data with conflicting hierarchies require a different kind of data
   structure, i.e., a <emphasis>multi-rooted tree</emphasis> (<xref linkend="carletta2003"/>, <xref linkend="woerner2006"/> and <xref linkend="witt2007"/>). A multi-rooted tree consists of several
   trees that span over the same data leaves. The multiple annotations approach now proposes to mark
   up each description level / tree as a document instance in its own right. As a result, each
   document consists of well-formed XML, alternative annotations can be modeled, the
   levels can be viewed separately, and new levels can be added at any time <xref linkend="witt2004"/>. However, such documents may seem to be somewhat unrelated and independent of each other. Witt
   therefore proposes to regard the primary textual data, which have to be identical across all such
   annotation documents, as the defining implicit link between them. Of course, it would be
   desirable to bring such implicit linkages forward as explicit ones. This can be done, e.g.,
   during the course of a transformation to an adequate representation format. We intend to show
   that the ISO-standardized TEI tag set for the representation of feature structures can be such a
   representation format. Pros, cons and alternative strategies with respect to
    <emphasis>overlapping structures</emphasis> are discussed in the pertinent literature, compare
    <xref linkend="derose2004"/>, <xref linkend="sperberg-mcqueen2007"/> and <xref linkend="carletta2007"/> for an overview.</para><para> Besides the different stages of the TEI recommendations (<xref linkend="p3"/>, <xref linkend="p4"/> and <xref linkend="p5"/>), at least one alternative proposal concerning the
    <emphasis>encoding of feature structures as SGML/XML markup</emphasis> has been brought forward
   in the literature <xref linkend="sailer2001"/>. However, to the best of our knowledge, no one has
   yet discussed the question of what a representation in the opposite direction could look like,
   i.e., <emphasis>encoding SGML/XML markup documents as feature structures</emphasis>. We offer
   an original answer to this question in the course of pursuing the more specific goal of
   finding a way to represent sets of multiple annotation documents as TEI feature structures.
   Feature structures can be regarded as a general type of data structure and there may be specific
   advantages associated with their use as a meta-representation format. We will speculate about
   related aspects in the last section of this paper. </para><para> The rest of this article is structured as follows. In the next section (<xref linkend="s2"/>), we characterize feature structures as mathematical entities and introduce three
   syntaxes for their visualization and encoding: graph displays, attribute value matrices and
   the pertinent TEI tag set. Ways to represent XML documents as TEI feature structures and aspects
   of the XSLT-implemented transformation from multiple annotation and generic XML documents to the
   integrative TEI feature structure format are discussed in the section after that (<xref linkend="s3"/>). Finally, we summarize our findings, take up some loose ends from the previous sections and
   discuss the relative advantages and disadvantages of representations in terms of TEI feature
   structures in the last section (<xref linkend="s4"/>) of this contribution. </para></section><section xml:id="s2" xreflabel="“Feature Structures”"><title>Feature Structures</title><section xml:id="s2.1" xreflabel="“Feature Structures in a Nutshell”"><title>Feature Structures in a Nutshell</title><para><emphasis>Feature structures</emphasis> are a common means of representation in formal
    linguistic theory.<footnote><para>There are equivalent structures in other environments, too, as one of our anonymous
      reviewers remarked. Compare the National Library of Medicine's <emphasis>custom-meta
       structures</emphasis>
      <xref linkend="nlm"/>, for example. </para></footnote> Their use is most prominent in certain variants of generative grammar <xref linkend="shieber1986"/>
    <footnote><para>Namely unification-based grammars, whose name derives from the most important operation
       on feature structures, i.e., unification.</para></footnote>, but it is not restricted to the syntactic level of analysis; there are linguistic
    applications in phonology, morphology, semantics and pragmatics, too. Furthermore, feature
    structures can be characterized as a general purpose data structure <xref linkend="iso24610"/>
    with possible applications in the vast field of knowledge representation. Hence, their
    usefulness is by no means constrained to linguistic investigations alone.</para><para> From a mathematical stance, there are at least two perspectives on feature structures
     <xref linkend="shieber1986"/>. On the one hand, a feature structure can be construed as a
    partial function from a set of <emphasis>features</emphasis> to a set of
     <emphasis>values</emphasis>. The value associated with a certain feature can be either
      <emphasis>atomic</emphasis>, e.g., a specific symbolic value such as <code>element</code> or a
    binary value like <code>true</code>, or it may be <emphasis>complex</emphasis>. The latter means
    that it can be a full-blown feature structure itself or it may be of a <emphasis>collection
     value</emphasis> type like a set or a list of, again, possibly complex values. We will come
    upon numerous examples below. Due to the availability of complex values, feature structures can
    embed other feature structures in value position and, hence, provide a considerable degree of
     representational articulateness. Note that the order of features that are located on the
     same hierarchical level within a feature structure carries no significance.</para><para>Another mathematical perspective derives from graph theory and leads to the
    characterization of feature structures as rooted labeled directed (acyclic)<footnote><para>Some formalizations of feature structures allow cycles and it can also be argued that
       cyclic structures may be needed for the representation of certain phenomena such as the liar's
      paradox ("This statement is false.").</para></footnote> graphs. Graphs <xref linkend="diestel2005"/> are mathematical entities that consist
    of sets of nodes and edges. We can think of the edges of a graph as connecting its nodes. Graphs
    can be depicted in an intuitively appealing way as diagram displays. The labeled edges represent
    the features, the leaf nodes represent the atomic values, and the inner nodes, if any, represent
    the complex values of a feature structure. <xref linkend="f2"/> is an example <emphasis>graph
     display</emphasis> of the feature structure <emphasis>F<subscript>1</subscript></emphasis>,
    compare <xref linkend="f1"/> in the preceding section for the TEI counterpart. </para><figure xml:id="f2" xreflabel="Figure 2"><title>Graph Display of the Feature Structure
     <emphasis>F<subscript>1</subscript></emphasis></title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-001.jpg"/></imageobject></mediaobject></figure><para> There is an alternative to the visualization of feature structures as graph displays. It
    consists in the use of attribute value matrices.<footnote><para>Some linguistic theories use different notations for (total) models vs. (partial)
      descriptions. For example, HPSG <xref linkend="pollard1994"/> uses graph displays for models
       and AVMs for descriptions.</para></footnote> In <emphasis>attribute value matrix</emphasis> notation, the features are written to the left of their associated values and there are brackets that indicate the
     scope of the (sub-)feature structure(s) involved.<footnote><para>Feature names are usually capitalized by notational convention. </para></footnote> <xref linkend="f3"/> shows
       <emphasis>F<subscript>1</subscript></emphasis> in attribute value matrix notation, compare
      <xref linkend="f1"/> and <xref linkend="f2"/> above for the TEI and graph display
    counterparts. Concerning the forthcoming examples in this article, we will only use the TEI
    format and the attribute value matrix notation.</para><figure xml:id="f3" xreflabel="Figure 3"><title>Attribute Value Matrix Notation of the Feature Structure
       <emphasis>F<subscript>1</subscript></emphasis></title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-002.jpg"/></imageobject></mediaobject></figure><para> Feature structures list correct information and only correct information, but they do not
    necessarily contain all the correct information with regard to a specific object, i.e., they may
    be of a <emphasis>partial</emphasis> nature.<footnote><para>HPSG theoreticians <xref linkend="pollard1994"/> draw a distinction between
       <emphasis>feature structures</emphasis>, which can be characterized as total objects in the
      sense of containing all the relevant specifications with respect to the objects they are a
      model of, and <emphasis>feature structure descriptions</emphasis>, which are partial
      descriptions of feature structures. From this perspective, feature structures and feature
      structure descriptions belong to different theoretical realms (model vs. formalism). We will
      not delve deeper into this discussion here and continue with our usage of the term feature
      structure for partial objects also. </para></footnote> Partiality can be a good thing, since it allows for feature structures to capture
    generalizations via the underspecification of certain properties.</para><para>When features have identical values, there are two scenarios to consider: the values can be
    either type- or token-identical. If the values are merely type-identical, we can characterize
    them as being independent of one another. A hypothetical change to one of the values would have
    no effect on the other values involved. However, in case of token-identity the features are
    associated with one and the same value token and, hence, are dependent on it. A change to the
    token would affect all the features that reference it. This latter scenario of token-identity is
    also called coreference, <emphasis>structure sharing</emphasis> or reentrancy. In attribute
     value matrix notation, it is indicated by means of co-indexed boxes. Such a box either acts as
     a referring placeholder in value position or is written before a certain value token, so that
     all occurrences of the index within the feature structure are bound to that value. We will
    come upon an example in the next subsection of this article, cf. <xref linkend="f5"/> below. At
    the graph display level, we would use edges that lead into one and the same node in order to
    indicate structure sharing. </para><para> An important operation upon feature structures is unification <xref linkend="shieber1986"/>. The foundational idea is fairly simple and can be sketched as follows: the result of the
    unification of compatible feature structures is the most general feature structure that contains
    all the information of the unified feature structures. Technically, unification is defined via
    the auxiliary concept of subsumption. <emphasis>Subsumption</emphasis> implements an intuitive
    concept of specificity and wealth of information among feature structures. We define that a
    feature structure F' subsumes a feature structure F'' if F' contains a subset of the information
    in F'' <xref linkend="shieber1986"/>. Alternatively, we may say that F' carries less information
     than F'' or that F' is more general than F''. Subsumption is only a partial order on the set of
     feature structures, since two feature structures need not be comparable at all, e.g., when they
     are incompatible with each other. Now, we can
    define the <emphasis>unification</emphasis> of two feature structures F and G, if any, to be the
    most general feature structure H, such that F subsumes H and G subsumes H. If the feature
    structures to be unified are incompatible, we say that the unification fails. A related
    operation that works in the opposite direction is generalization. This operation is the dual of
    unification. We can define the <emphasis>generalization</emphasis> of two feature structures F
    and G to be the most specific feature structure E, such that E subsumes F and E subsumes G.
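     As a rough illustration of these definitions, the following Python sketch models feature
     structures without structure sharing as nested dictionaries whose atomic values are strings.
     This representation and the function names <code>subsumes</code>, <code>unify</code> and
     <code>generalize</code> are our own assumptions for the purpose of the example; they are
     neither part of the TEI tag set nor of the transformation described in this article.
     <programlisting xml:space="preserve">
# Illustrative sketch: untyped feature structures without structure sharing.
# Complex values are dicts (feature name -> value); atomic values are strings.


def subsumes(general, specific):
    """Return True if all information in `general` is also in `specific`."""
    if isinstance(general, dict) and isinstance(specific, dict):
        return all(
            feature in specific and subsumes(value, specific[feature])
            for feature, value in general.items()
        )
    return general == specific


def unify(f, g):
    """Return the most general structure subsumed by f and g, or None on a clash."""
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)
        for feature, value in g.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values: unification fails
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return f if f == g else None


def generalize(f, g):
    """Return the most specific structure that subsumes both f and g."""
    if isinstance(f, dict) and isinstance(g, dict):
        result = {}
        for feature, value in f.items():
            if feature in g:
                common = generalize(value, g[feature])
                if common is not None:
                    result[feature] = common
        return result
    return f if f == g else None  # None: no shared information at this point


np_3sg = {"CAT": "np", "AGR": {"NUM": "sing", "PER": "third"}}
agr_sing = {"AGR": {"NUM": "sing"}}
agr_plur = {"AGR": {"NUM": "plur"}}

print(subsumes(agr_sing, np_3sg))    # agr_sing is more general than np_3sg
print(unify(agr_sing, {"CAT": "np"}))
print(unify(np_3sg, agr_plur))       # clash on NUM
print(generalize(np_3sg, agr_plur))  # only the shared, compatible part remains
</programlisting>
     In this sketch, a failed unification is signalled by the return value <code>None</code>.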
    Unlike unification, generalization cannot fail. In the worst case, the result will be the empty
    feature structure [ ] that subsumes every feature structure.</para><para> It should be noted that feature structures can be <emphasis>typed</emphasis>
    <xref linkend="carpenter1992"/>. However, neither the present state of the representations nor
     the implemented transformation that we describe in this paper makes use of typed feature
    structures, so we won't go into details regarding that topic here. </para></section><section xml:id="s2.2" xreflabel="“The TEI Tag Set for Feature Structures”"><title>The TEI Tag Set for Feature Structures</title><para>The TEI tag set for the representation of feature structures has been a part of the TEI
    Guidelines since version <emphasis>P3</emphasis>
    <xref linkend="p3"/>. Building on the <emphasis>P4</emphasis> version <xref linkend="p4"/>, an
     <emphasis>ISO</emphasis> standard <xref linkend="iso24610"/> was adopted by ISO TC37 SC4 and
    also implemented in the current <emphasis>P5</emphasis> version <xref linkend="p5"/> that we
    use here.</para><para> The foundational XML elements that are needed in order to encode feature structures are
     <code>fs</code> for feature structures and <code>f</code> for features. The content of an
     <code>fs</code> element consists of a sequence of feature-value specifications. A feature-value
    specification is encoded using an element of type <code>f</code> for the feature and the element
    content of <code>f</code> for the associated value. The details look as follows. Every
     <code>f</code> element has an attribute <code>name</code> for its feature name. The
    representation of the associated value of a feature depends on the exact type of the value
    involved. Atomic values of type <code>binary</code>, <code>symbol</code> or <code>numeric</code>
    are realized via a <code>value</code> attribute on a respective child element of <code>f</code>
    that corresponds to the actual value type. For example, <code>f</code> may have a child element
     <code>binary</code> which has a <code>value</code> attribute that provides the desired
    parameter. If the value is of the <code>string</code> type, however, the value is encoded in a
    slightly different form, i.e., as the literal element content of a respective
     <code>string</code> child element of <code>f</code>.</para><para> Complex values of the feature structure kind are encoded by means of <code>fs</code>
    elements, of course. However, there is also another class of complex values: these are the
    collection values of the <code>list</code>, <code>set</code> and <code>bag</code> type. Such
    collections of values are indicated via <code>vColl</code> elements that have an
     <code>org</code> attribute whose value specifies the respective collection type, i.e., whether
    it is a <code>bag</code>, a <code>set</code> or a <code>list</code>. The content of a
     <code>vColl</code> element consists of a succession of values of any kind.</para><figure xml:id="f4" xreflabel="Figure 4"><title>TEI Feature Structure <emphasis>F<subscript>2</subscript></emphasis>: Structure Sharing
     and Collection Values</title><programlisting xml:space="preserve">
&lt;fs&gt;              
 &lt;f name="F"&gt;
  &lt;vColl org="list"&gt;
   &lt;vLabel name="a"&gt;
    &lt;fs&gt;
     &lt;f name="I"&gt;
      &lt;symbol value="a"/&gt;
     &lt;/f&gt;
     &lt;f name="J"&gt;
      &lt;symbol value="b"/&gt;
     &lt;/f&gt;
    &lt;/fs&gt;
   &lt;/vLabel&gt;
   &lt;vLabel name="b"&gt;
    &lt;fs&gt;
     &lt;f name="K"&gt;
      &lt;symbol value="c"/&gt;
     &lt;/f&gt;
     &lt;f name="L"&gt;
      &lt;symbol value="d"/&gt;
     &lt;/f&gt;
    &lt;/fs&gt;
   &lt;/vLabel&gt;
  &lt;/vColl&gt;
 &lt;/f&gt;
 &lt;f name="G"&gt;
  &lt;vLabel name="a"/&gt;
 &lt;/f&gt;
 &lt;f name="H"&gt;
  &lt;vColl org="set"&gt;
   &lt;vLabel name="b"/&gt;
   &lt;fs&gt;
    &lt;f name="M"&gt;
     &lt;symbol value="e"/&gt;
    &lt;/f&gt;
    &lt;f name="N"&gt;
     &lt;symbol value="f"/&gt;
    &lt;/f&gt;
   &lt;/fs&gt;
  &lt;/vColl&gt;
 &lt;/f&gt;
&lt;/fs&gt;                       
                </programlisting></figure><para> There is a special element in order to indicate cases of structure sharing: the
     <code>vLabel</code> element. It either contains a value token as its element content or it
     occurs as a placeholder that refers to a value token specified elsewhere. Each
     <code>vLabel</code> element has an associated <code>name</code> attribute. The value of the
     <code>name</code> attribute corresponds to the index of a tagged box in attribute value matrix
    notation, see below. This mechanism allows for various structure sharing configurations within a
    single feature structure. <xref linkend="f4"/> (TEI-based representation) and <xref linkend="f5"/> (attribute value matrix notation) display the same abstract example feature structure
      <emphasis>F<subscript>2</subscript></emphasis> in different notation formats and exemplify the
    themes of structure sharing and collection values. </para><figure xml:id="f5" xreflabel="Figure 5"><title>Attribute Value Matrix for <emphasis>F<subscript>2</subscript></emphasis>: Structure
     Sharing and Collection Values</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-003.jpg"/></imageobject></mediaobject></figure><para> There are three top-level features in <emphasis>F<subscript>2</subscript></emphasis>:
     <code>F</code>, <code>G</code>, and <code>H</code>. All of them are associated with complex
    values. <code>F</code> has a list collection value, which is encoded using angle brackets at the
    attribute value matrix level, <code>G</code> has a feature structure as its value, and
      <code>H</code> has a set collection value that is indicated using curly brackets in <xref linkend="f5"/>.
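     The role of the <code>vLabel</code> elements can be mimicked in a general-purpose language.
     The following Python sketch is only an illustration under our own assumptions: the function
     names <code>encode</code> and <code>encode_plain</code> and the mapping of dictionaries,
     lists and strings to <code>fs</code>, <code>vColl</code> and <code>symbol</code> elements
     are hypothetical and not part of the stylesheet described in this article. The sketch
     serializes an abridged variant of <emphasis>F<subscript>2</subscript></emphasis>, restricted
     to the features <code>F</code> and <code>G</code>, and treats object identity as token
     identity.
     <programlisting xml:space="preserve">
import xml.etree.ElementTree as ET

# Illustrative sketch: dicts become fs elements, lists become vColl lists,
# strings become symbol values. Shared values are recognised by object identity.


def encode(value, labels, emitted):
    """Encode `value`; wrap it in a vLabel if `labels` assigns it a name."""
    name = labels.get(id(value))
    if name is None:
        return encode_plain(value, labels, emitted)
    label = ET.Element("vLabel", {"name": name})
    if name not in emitted:
        # First occurrence: the vLabel contains the value token itself.
        emitted.add(name)
        label.append(encode_plain(value, labels, emitted))
    # Any later occurrence remains an empty placeholder.
    return label


def encode_plain(value, labels, emitted):
    """Encode a dict, list or string without considering its own label."""
    if isinstance(value, dict):
        fs = ET.Element("fs")
        for feature, sub_value in value.items():
            f = ET.SubElement(fs, "f", {"name": feature})
            f.append(encode(sub_value, labels, emitted))
        return fs
    if isinstance(value, list):
        coll = ET.Element("vColl", {"org": "list"})
        for item in value:
            coll.append(encode(item, labels, emitted))
        return coll
    return ET.Element("symbol", {"value": value})


# Abridged variant of F2: the first list member of F is token-identical to G.
shared = {"I": "a", "J": "b"}
f2_part = {"F": [shared, {"K": "c", "L": "d"}], "G": shared}

root = encode(f2_part, {id(shared): "a"}, set())
print(ET.tostring(root, encoding="unicode"))
</programlisting>
     The first occurrence of the shared value is emitted inside a <code>vLabel</code> element,
     whereas its later occurrence under <code>G</code> is emitted as an empty
     <code>vLabel</code> element with the same name.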
     The first list value of <code>F</code> and the complex value of <code>G</code> are co-indexed.
     The same holds for the second list value of <code>F</code> and the set member listed first for
     <code>H</code>.</para></section></section><section xml:id="s3" xreflabel="“Representation and Transformation”"><title>Representation and Transformation</title><section xml:id="s3.1" xreflabel="“Representation of XML Documents via TEI Feature Structures”"><title>Representation of XML Documents via TEI Feature Structures</title><para>Both feature structures and XML documents can be regarded from the perspective of graph
     theory <xref linkend="diestel2005"/>. XML documents are specimens of ordered trees, while feature
    structures are merely unordered directed acyclic graphs. This holds because of the possibility
    of structure sharing within feature structures and because there is no order imposed among
    features of the same level within feature structures. So, the task of representing XML documents
    as feature structures seems to involve a transformation from a more rigidly structured
    representation format to a less rigidly structured one. Specifically, we have to find a way to
    represent the ordered <emphasis>sequential relations</emphasis> that hold among parts of XML
     documents both at the text and at the markup level in terms of feature-value pairs. Furthermore,
     the <emphasis>hierarchical relationships</emphasis> also have to be expressed in terms of
    feature-value specifications. A possible solution consists in the use of specific features for
    hierarchical aspects whose values will be structured themselves and which have to be interpreted
     as reflecting sequential relationships. </para><para>In the following, we shall consider a simple annotation data example that will help to
    illustrate our points. It is shown as <xref linkend="f6"/> and <xref linkend="f7"/> below, which
    contain morphological and phonological annotation layers of the German verb
    "geben" (engl.: to give).</para><figure xml:id="f6" xreflabel="Figure 6"><title>Simple Annotation Data: Example 1</title><programlisting xml:space="preserve">
&lt;w&gt;
 &lt;m type="lexical"&gt;geb&lt;/m&gt;
 &lt;m type="flexive"&gt;en&lt;/m&gt;
&lt;/w&gt;
                </programlisting></figure><figure xml:id="f7" xreflabel="Figure 7"><title>Simple Annotation Data: Example 2</title><programlisting xml:space="preserve">     
&lt;w&gt;
 &lt;syll n="s1"&gt;ge&lt;/syll&gt;
 &lt;syll n="s2"&gt;ben&lt;/syll&gt;
&lt;/w&gt;
                </programlisting></figure><para> In the rest of this section we will follow a historical route and discuss two
     <emphasis>representation alternatives</emphasis> that we came up with. Both of the sketched
    solutions will be sufficiently general and can hence be applied to generic XML documents and
    sets of multiple annotation documents alike. Our discussion will be framed more towards multiple
    annotation here.</para><para> Our first and historically older <emphasis>representation alternative I</emphasis> makes
    use of a list notation variant that is defined in a recursive way using <code>FIRST</code> and
     <code>REST</code> features <xref linkend="witt2009"/>. The basic idea is to have the very first
    element of a given sequence, e.g., the first character of a text sequence, as the value of the
     <code>FIRST</code> feature and the result for the rest of the sequence as the value of the
     <code>REST</code> feature. So, the latter value will usually be a complex value, again, that is
    structured according to the very same scheme, i.e., with the first item of the (rest-)sequence
    detached and so on.<footnote><para>Unless there is no rest sequence and we have reached the end of the sequence
      already.</para></footnote> We go over the sequence in this way until we reach its end where the recursion
     bottoms out with <code>*null*</code> as the value of the most deeply embedded <code>REST</code>
    feature within the list structure. It functions as a placeholder for the empty list.</para><figure xml:id="f8" xreflabel="Figure 8"><title>Representation Alternative I: TEI-based</title><programlisting xml:space="preserve">       
&lt;fs&gt;
 &lt;f name="DATA"&gt;
  &lt;fs&gt;
   &lt;f name="FIRST"&gt;
    &lt;vLabel name="1"&gt;
     &lt;symbol value="g"/&gt;
    &lt;/vLabel&gt;
   &lt;/f&gt;
   &lt;f name="REST"&gt;
    &lt;fs&gt;
     &lt;f name="FIRST"&gt;
      &lt;vLabel name="2"&gt;
       &lt;symbol value="e"/&gt;
      &lt;/vLabel&gt;
     &lt;/f&gt;
     &lt;f name="REST"&gt;
      &lt;fs&gt;
       &lt;f name="FIRST"&gt;
        &lt;vLabel name="3"&gt;
         &lt;symbol value="b"/&gt;
        &lt;/vLabel&gt;
       &lt;/f&gt;
       &lt;f name="REST"&gt;
        &lt;fs&gt;
         &lt;f name="FIRST"&gt;
          &lt;vLabel name="4"&gt;
           &lt;symbol value="e"/&gt;
          &lt;/vLabel&gt;
         &lt;/f&gt;
         &lt;f name="REST"&gt;
          &lt;fs&gt;
           &lt;f name="FIRST"&gt;
            &lt;vLabel name="5"&gt;
             &lt;symbol value="n"/&gt;
            &lt;/vLabel&gt;
           &lt;/f&gt;
           &lt;f name="REST"&gt;
            &lt;symbol value="*null*"/&gt;
           &lt;/f&gt;
          &lt;/fs&gt;
         &lt;/f&gt;
        &lt;/fs&gt;
       &lt;/f&gt;
      &lt;/fs&gt;
     &lt;/f&gt;
    &lt;/fs&gt;
   &lt;/f&gt;
  &lt;/fs&gt;
 &lt;/f&gt;
 &lt;f name="TIER1"&gt;
   ...
 &lt;/f&gt;
 &lt;f name="TIER2"&gt;
   ...
 &lt;/f&gt;
&lt;/fs&gt;
        </programlisting></figure><para>This way of representation is displayed in <xref linkend="f8"/> in an abridged TEI feature
    structure format that shows the top-level feature geometry of the structure:<footnote><para>It would be nice to have something like a specialized document grammar regarding the
      finer details of the representations that we propose in this article. One of our anonymous
      reviewers encouraged us to give <emphasis>Feature System Declarations</emphasis>
      <xref linkend="p5"/> for the TEI feature structures. However, it seems that TEI FSDs are
      reserved for typed feature structures and the present state of our work here makes use of
      untyped feature structures. Since we may well choose to make the switch to typed
      representations in the future (in a way, the new representation scheme below has been designed
      to make the switch easier), it will be a good idea to take up that proposal in a future
      update. For the moment, we can at least validate our documents against TEI feature structure
      schemas generated via the TEI ROMA tool (http://www.tei-c.org/Roma/).</para></footnote>
    <code>DATA</code> contains a representation of only the textual characters of the document
    adhering to the <code>FIRST/REST</code> scheme discussed above. Furthermore, each character is
    associated with its own index in order to allow for structure sharing references to it from
    other parts of the feature structure. We provide an index for every character in order to allow
    for arbitrarily specific levels of annotation with respect to the common textual data. The
    numbered <code>TIER</code> features contain the specific information of the annotation levels.
    Each one represents the information of one of the multiple annotation documents involved. The
    implicit link between the different levels is made explicit by means of structure sharing.
    Therefore, there will be plenty of references to the common data characters from within the
    different <code>TIER</code> features of the document. However, this is not an explicit part of
    the example display in <xref linkend="f8"/> due to space considerations.<footnote><para>Note that a complete representation of the above annotation data examples in TEI format,
      but according to the newer representation standard that will be discussed below in this
      subsection, can be found in <xref linkend="a1"/>.</para></footnote> The binding of indexes to certain values is shown within the <code>DATA</code>
    feature, but the reference to such values is hidden within the abridged <code>TIER</code> levels
    of the document. However, those parts and the connections provided by the structure sharing
    mechanism can be inspected in <xref linkend="f9"/>, which shows the attribute value matrix
    notation. Unlike its TEI counterpart, this display is complete and probably a bit easier to
    follow. It also displays the mechanics of the representation of the hierarchical relationships.
    They find expression via <code>CONTENT</code> features, whose values contain the representation
    of the subordinated document parts, e.g., the content of an element. The mechanisms for the
    representation of the hierarchical and the sequential relationships have to be combined as
    appropriate. This means that <code>CONTENT</code> will have a list value and the respective
    position within that list will reflect the sequential order among the dominated document
    parts.</para><figure xml:id="f9" xreflabel="Figure 9"><title>Representation Alternative I: Attribute Value Matrix</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-004.jpg"/></imageobject></mediaobject></figure><para>We move on to the discussion of our historically newer <emphasis>representation alternative
     II</emphasis>, which forms the basis of our current work on the topic. The most important
    changes have been made regarding the representation of sequential relationships and concerning
    the general feature geometry makeup. Consider <xref linkend="f10"/>, which shows the changed
    top-level geometry. Like its predecessor in <xref linkend="f8"/>, this display is
    incomplete and printed in an abridged format here. However, interested readers can find the
    complete version of this representation of the example data in <xref linkend="a1"/>.</para><figure xml:id="f10" xreflabel="Figure 10"><title>Representation Alternative II: TEI-based</title><programlisting xml:space="preserve">            
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;fs&gt;
 &lt;f name="DATA"&gt;
  &lt;vColl org="list"&gt;
   &lt;vLabel name="1"&gt;
    &lt;string&gt;g&lt;/string&gt;
   &lt;/vLabel&gt;
   &lt;vLabel name="2"&gt;
    &lt;string&gt;e&lt;/string&gt;
   &lt;/vLabel&gt;
   &lt;vLabel name="3"&gt;
    &lt;string&gt;b&lt;/string&gt;
   &lt;/vLabel&gt;
   &lt;vLabel name="4"&gt;
    &lt;string&gt;e&lt;/string&gt;
   &lt;/vLabel&gt;
   &lt;vLabel name="5"&gt;
    &lt;string&gt;n&lt;/string&gt;
   &lt;/vLabel&gt;
  &lt;/vColl&gt;
 &lt;/f&gt;
 &lt;f name="DOCUMENTS"&gt;
  &lt;vColl org="list"&gt;
   &lt;fs&gt;
     ...
   &lt;/fs&gt;  
   &lt;fs&gt;
     ...  
   &lt;/fs&gt;
  &lt;/vColl&gt;
 &lt;/f&gt;
&lt;/fs&gt;
        </programlisting></figure><para> On the top level, this representation consists of a <code>DATA</code> feature and a
     <code>DOCUMENTS</code> feature. Both features take complex values of a collection type, i.e.,
    lists of values. Concerning <code>DATA</code>, we now have a flat list representation with
    little internal structure. This format is easier to build than the more
    structured variant. The move to this format is possible because the TEI Guidelines provide this
    kind of notational sugar for values of the collection kind.<footnote><para>In terms of features and values alone, respective structures still have to be realized by
      FIRST/REST-like structured representations as introduced above. The format provided by the TEI
      is a shorthand for that.</para></footnote> The <code>DOCUMENTS</code> feature also makes use of this kind of list notation and
    embeds the representation of the different annotation documents as a flat list of respective
    feature structures. Note also that there is just one such top-level feature now; compare the
    different numbered <code>TIER</code> features in representation alternative I. If there is only
    one annotation document to process, the list will contain only one corresponding feature
    structure, of course. <xref linkend="f11"/> is a complete display of the attribute value matrix
    for our example data.</para><!--<figure xml:id="f11" xreflabel="Figure 11" pgwide="1">--><figure xml:id="f11" xreflabel="Figure 11"><title>Representation Alternative II: Attribute Value Matrix</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-005.jpg"/></imageobject></mediaobject></figure><para> This way of representation takes a stance that is based on the <emphasis>XQuery 1.0 and
     XPath 2.0 Data Model</emphasis> (XDM) and, hence, the representations are well suited to
    processing in an XSLT 2.0 context. We use the properties that the XDM provides in order
    to represent the different node kinds within an XML document, starting from the very root. The
    node kinds that are distinguished are: document, element, attribute, namespace, comment,
    processing instruction and text nodes. Every occurrence of a node is represented as a feature
    structure with features as appropriate for the node kind involved. The type of a node is
    indicated via the <code>TYPE</code> feature for nodes of all kinds. Hierarchical relations are
    represented via the <code>CHILDREN</code> feature for document and element nodes. Order among
    the child nodes is encoded by the position within a sequence, since <code>CHILDREN</code>
    takes a collection value of the list kind. Element nodes and attribute nodes have
     <code>NAME</code> features, attribute and text nodes have <code>VALUE</code> features.
    Furthermore, each element node has an <code>ATTRIBUTES</code> feature that takes a set value,
    since attributes are unordered. The semantics associated with the different feature-value pairs
    should be straightforward. All in all, this approach allows for a very systematic representation
    regime across the different parts of an arbitrary XML document instance. Unlike the older
    approach, every feature structure which is embedded below the <code>DOCUMENTS</code> top-level
    feature now represents a certain node at the XML tree model level. It also has to be
    noted, however, that our feature structure representations of XML documents tend to grow very fast with
    the size of the input document. This seems to be true for all approaches based on TEI
    feature structures, owing to the modeling as feature structures and the retranslation to XML involved.<footnote><para>As one of our anonymous reviewers remarked, it would be interesting to investigate the
      prospects and the performance of bare feature structures for our purposes and see whether and
      how much better they can perform as compared to the TEI-serialized feature structures that are
      the focus of the present paper.</para></footnote>
   </para><para>If the input to the transformation program does not consist of multiple annotation
    documents, but rather of one or several arbitrary XML documents, which do not share identical
    primary data, an integrative representation of such documents will still be built in a similar
    way. However, there will be no indexing mechanism incorporated and so no implicit links will be
    made explicit.</para></section><section xml:id="s3.2" xreflabel="“Aspects of the XSLT Implementation of the Transformation”"><title>Aspects of the XSLT Implementation of the Transformation</title><para> The program <code>xmls2avm.xsl</code> that implements the transformation to the TEI
    feature structure format was written with multiple annotation documents in mind. Nevertheless,
    it is robust enough to provide a result document if the input documents to the transformation
    fail the test of primary data identity or if there is only one document to be transformed. This
    kind of robustness is a necessary condition for the program to be useful within the generic
    XML realm. </para><para> The program was written in <emphasis>XSLT 2.0</emphasis> and uses certain features of the
    new XSLT version. For example, data typing is used for at least some of the parameters and
    variables involved and, most importantly, we exploit the extended functionalities and constructs
    that are grounded on the XDM tree model. XSLT 2.0 comes with support for multiple output
    documents, but the multiple input documents that are needed here still have to be provided via a
    sort of workaround: a call of the <code>document()</code> function on a post-processed
    representation of a stylesheet parameter. The latter contains a list of secondary input
    documents that has to be assigned by the user when invoking the transformation program from the
    command line. Several further stylesheet parameters are provided in order to parameterize
    certain aspects of the transformation process and to determine peculiarities of the desired
    representation format. Most of this is optional, however, since there are defaults for the
    relevant parameters. An example stylesheet parameter is <code>$firstrestRepr</code>: it
    influences the way lists are represented. If it is set to <code>true</code>, then lists will
    be represented in the recursively structured way that has been introduced as our historically
    older representation alternative I in the previous section. If it is set to <code>false</code>,
    however, then lists will be represented according to the newer flat representation alternative
    II that exploits the notational sugar provided by the TEI Guidelines. The parameter is set to
     <code>false</code> as a default.</para><para>Although we decided that we would not include detailed comments on the whole stylesheet<footnote><para>This decision was made on grounds of space considerations, since this is a rather long
      paper already. Some anonymous reviewers would have liked to see the whole stylesheet included.
      Others shared our perspective that examples suffice here.</para></footnote>, we do provide three illustrative template examples below. These will be the
    templates for document nodes (in default mode), attributes and text. Besides these, the full
    stylesheet also contains templates for document nodes (in secondary mode), elements, processing
    instructions and comments. Furthermore, there are named templates for the processing of nested
    sequences and for the processing of nested sequences with regard to namespaces, as well as many
    additional parameters and variables defined.</para><para>We begin our discussion with the template for <emphasis>document nodes in default
     mode</emphasis>, i.e., the mode that is used at the start of the transformation without further
    ado. The template shown in <xref linkend="f12"/> will be applied to the document node of the
    primary input document at the start of the transformation. </para><figure xml:id="f12" xreflabel="Figure 12"><title>Template for Document Nodes in Default Mode</title><programlisting xml:space="preserve">
&lt;xsl:template match="document-node()" mode="#default"&gt;
 &lt;xsl:variable name="children" select="node()"/&gt;
 &lt;xsl:variable name="textnodes" select="//text()"/&gt;
 &lt;fs&gt;
  &lt;xsl:if test="$dataIdentity and $dataRepr"&gt;
   &lt;f name="DATA"&gt;
    &lt;vColl org="list"&gt;
     &lt;xsl:for-each select="str:characters($primaryString)"&gt;
      &lt;vLabel name="{position()}"&gt;
       &lt;string&gt;
        &lt;xsl:value-of select="."/&gt;
       &lt;/string&gt;
      &lt;/vLabel&gt;
     &lt;/xsl:for-each&gt;
    &lt;/vColl&gt;
   &lt;/f&gt;
  &lt;/xsl:if&gt;
  &lt;f name="DOCUMENTS"&gt;
   &lt;vColl org="list"&gt;
    &lt;fs&gt;
     &lt;f name="TYPE"&gt;
      &lt;symbol value="document"/&gt;
     &lt;/f&gt;
     &lt;f name="CHILDREN"&gt;
      &lt;xsl:choose&gt;
       &lt;xsl:when test="$firstrestRepr"&gt;
        &lt;xsl:choose&gt;
         &lt;xsl:when test="$children"&gt;
          &lt;xsl:call-template name="SequenceProcessing"&gt;
           &lt;xsl:with-param name="seq" select="$children"/&gt;
          &lt;/xsl:call-template&gt;
         &lt;/xsl:when&gt;
         &lt;xsl:otherwise&gt;
          &lt;symbol value="*null*"/&gt;
         &lt;/xsl:otherwise&gt;
        &lt;/xsl:choose&gt;
       &lt;/xsl:when&gt;
       &lt;xsl:otherwise&gt;
        &lt;vColl org="list"&gt;
         &lt;xsl:apply-templates select="$children"/&gt;
        &lt;/vColl&gt;
       &lt;/xsl:otherwise&gt;
      &lt;/xsl:choose&gt;
     &lt;/f&gt;
    &lt;/fs&gt;
    &lt;xsl:apply-templates select="$docRoots" mode="secondary"/&gt;
   &lt;/vColl&gt;
  &lt;/f&gt;
 &lt;/fs&gt;
&lt;/xsl:template&gt;   
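
&lt;!-- Illustrative sketch only; not taken from the original stylesheet.
     The called template SequenceProcessing is named above, but its body
     is not shown in this excerpt. Assuming that it follows the FIRST/REST
     scheme of representation alternative I, it might look roughly like
     this. The parameter type and the use of mode="#current" are
     assumptions made for the sake of the illustration. --&gt;
&lt;xsl:template name="SequenceProcessing"&gt;
 &lt;xsl:param name="seq" as="node()*"/&gt;
 &lt;fs&gt;
  &lt;f name="FIRST"&gt;
   &lt;!-- the head of the sequence is handled by the node kind templates --&gt;
   &lt;xsl:apply-templates select="$seq[1]" mode="#current"/&gt;
  &lt;/f&gt;
  &lt;f name="REST"&gt;
   &lt;xsl:choose&gt;
    &lt;xsl:when test="exists($seq[2])"&gt;
     &lt;!-- recurse on the tail of the sequence --&gt;
     &lt;xsl:call-template name="SequenceProcessing"&gt;
      &lt;xsl:with-param name="seq" select="subsequence($seq, 2)"/&gt;
     &lt;/xsl:call-template&gt;
    &lt;/xsl:when&gt;
    &lt;xsl:otherwise&gt;
     &lt;!-- end of the sequence: placeholder for the empty list --&gt;
     &lt;symbol value="*null*"/&gt;
    &lt;/xsl:otherwise&gt;
   &lt;/xsl:choose&gt;
  &lt;/f&gt;
 &lt;/fs&gt;
&lt;/xsl:template&gt;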
            </programlisting></figure><para>The template starts with the definition of variables that can be referenced within the
    scope of the template. Most of the other templates in the stylesheet use such template
    variables, too. Then the first <code>fs</code> element of the target representation is inserted.
    This will be the outer frame for all the result markup that is created during the
    transformation. The two usual top-level features for a feature structure representation of XML
    documents are <code>DATA</code> and <code>DOCUMENTS</code>; compare our discussion of
    representation alternative II in the previous section. It is even possible to drop the
     <code>DATA</code> feature and go with the <code>DOCUMENTS</code> feature alone on the top level of
    the feature structure. This possibility has been parameterized using
     <code>$dataRepr</code>, i.e., users may decide whether they want a <code>DATA</code> feature
    at the top level or not. Furthermore, a variable named <code>$dataIdentity</code> has been
    defined on the global stylesheet level. This variable implements a test for the identity of the
    primary textual data of all the input documents involved. Now, if <code>DATA</code> is to be
    present and the test result concerning textual data identity is positive, then the feature will
    be inserted into the result and be given a list value. The content of that list is
    constructed as follows: we iterate over all textual characters of our primary input document. For
    each character, we insert index markup (<code>vLabel</code>) with a numerical index attribute
    according to the position value of the respective character. Furthermore, the index is
    bound to the character value, which is wrapped in a <code>string</code> element to
    indicate its value type. Next is the obligatory <code>DOCUMENTS</code> feature. It will take a
    list of feature structures, i.e., a list of <code>fs</code> elements, each of which represents one of the
    input documents. </para><para>In what follows in this template, we build the representation for the primary input
    document. The corresponding job for the other input documents, if any, will have to be done by
    the template for document node kinds in secondary mode. The two features appropriate for
    document nodes are <code>TYPE</code> and <code>CHILDREN</code>. Concerning <code>TYPE</code>,
    its value will obviously be <code>&lt;symbol value="document"/&gt;</code>. The value of
     <code>CHILDREN</code>, however, is more complicated and has to be determined via a series of
    conditional constructs. Firstly, it depends on whether the list representation has been set to
    the older recursively structured kind (<code>$firstrestRepr</code>) or not. If list
    representations follow that approach, then it depends in turn on whether the document node has
    child nodes or not (<code>$children</code> is bound to <code>node()</code>, i.e., the children of
    the document node). If it has none, we insert a value for the empty list (<code>*null*</code>).
    However, if the document node has child nodes, the further calculation of the list
    representation is taken over by a recursive named template,
     <code>SequenceProcessing</code>. This template is called with the sequence of the current
    document node's child nodes as a parameter and builds a recursively
    structured list representation as appropriate. If, on the other hand, the parameter
     <code>$firstrestRepr</code> is set such that we get the flat kind of list representation,
    which is the default, then markup for a collection of the list kind is inserted.
    The content of that list is determined by the result of applying templates to the
    child nodes of the current document node. Thus, the content of the <code>fs</code> element
    for the current primary input document is complete and can be closed with the respective end
    tags. What remains to be computed is the markup for the secondary input documents.
    Therefore, templates are applied to the members of <code>$docRoots</code>, which holds the
    document nodes of the secondary input documents as a sequence. Note that a mode
     (<code>secondary</code>) is used in the respective <code>apply-templates</code> instruction, so
    the present template will not match and we avoid a repeated insertion of the initial framing
    markup for the outermost level of the feature structure representation, which is only included
    in the processing of the primary input document here. </para><para> The complete stylesheet can be characterized as exploiting aspects of both <emphasis>the
     push and the pull processing paradigm</emphasis>
    <xref linkend="tennison2005"/>, as most stylesheets of considerable size and complexity do,
    with the focus shifting between different parts of the stylesheet. In a similar vein, it can be
    classified as implementing different <emphasis>stylesheet design patterns</emphasis>
    <xref linkend="kay2008"/>. For example, the buildup of the initial target feature structure
    tends to be of the pull type or rather navigational, to use Kay's concept. This, however, shifts
    towards a more push- or rule-oriented approach, which helps to fill up the missing parts of the
    initial structure by applying templates to the descendants of the current node. Appropriate
    templates are provided for each specific node kind against the background of the XDM. Certain
    aspects, e.g., the buildup of the older <code>FIRST</code>/<code>REST</code> list structures,
    have been realized computationally, via recursive calls to named templates with
    parameters as their arguments. We shall look at a recipient template of the push- or
    rule-oriented style of processing next in <xref linkend="f13"/>. It is the template for the
    processing of <emphasis>attribute nodes</emphasis>, whose application will be initiated from
    within the template for the processing of element nodes.</para><figure xml:id="f13" xreflabel="Figure 13"><title>Template for Attribute Nodes</title><programlisting xml:space="preserve">
&lt;xsl:template match="attribute()" mode="#all"&gt;
 &lt;fs&gt;
  &lt;f name="TYPE"&gt;
   &lt;symbol value="attribute"/&gt;
  &lt;/f&gt;
  &lt;f name="NAME"&gt;
   &lt;string&gt;
    &lt;xsl:value-of select="node-name(.)"/&gt;
   &lt;/string&gt;
  &lt;/f&gt;
  &lt;f name="VALUE"&gt;
   &lt;string&gt;
    &lt;xsl:value-of select="."/&gt;
   &lt;/string&gt;
  &lt;/f&gt;
 &lt;/fs&gt;
&lt;/xsl:template&gt;
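
&lt;!-- Illustration with a hypothetical input; not from the original paper.
     For an attribute node such as pos="verb", the template above would
     yield a feature structure of the following shape (indentation added
     for readability; the actual serialization may differ):

     &lt;fs&gt;
      &lt;f name="TYPE"&gt;&lt;symbol value="attribute"/&gt;&lt;/f&gt;
      &lt;f name="NAME"&gt;&lt;string&gt;pos&lt;/string&gt;&lt;/f&gt;
      &lt;f name="VALUE"&gt;&lt;string&gt;verb&lt;/string&gt;&lt;/f&gt;
     &lt;/fs&gt;
--&gt;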
            </programlisting></figure><para> In comparison to the previous template for document nodes, this one is very
    straightforward. There are three features appropriate for feature structures that represent
    attribute nodes: these are <code>TYPE</code>, <code>NAME</code> and <code>VALUE</code>. The
    respective values are very easily determined. Readers who managed to follow through on our
    description of the previous template should have no problems with this one.</para><para> At the heart of the transformation of multiply annotated documents is the indexing of the
    single characters and the reference mechanism that exploits these indexes. It is dependent on
    the relative position of characters with respect to the other characters of the document. Those
    values can be used as numerical indexes since they are bound to be constant across all the
    documents that pass a test of primary data identity. However, it has to be stressed that the
    computational cost of implementing this functionality can be considerable for large input
    documents. <xref linkend="f14"/> shows the code which does the job: it is the template for
     <emphasis>text nodes</emphasis>.</para><figure xml:id="f14" xreflabel="Figure 14"><title>Template for Text Nodes</title><programlisting xml:space="preserve">
&lt;xsl:template match="text()" mode="#all"&gt;
 &lt;xsl:variable name="currentRoot" select="/"/&gt;
 &lt;fs&gt;
  &lt;f name="TYPE"&gt;
   &lt;symbol value="text"/&gt;
  &lt;/f&gt;
  &lt;f name="VALUE"&gt;
   &lt;xsl:choose&gt;
    &lt;xsl:when test="$dataIdentity"&gt;
     &lt;xsl:variable name="numberOfCharactersSoFar" as="xs:integer"
                   select="sum(for $textnode in preceding::text() return string-length($textnode))"/&gt;
     &lt;vColl org="list"&gt;
      &lt;xsl:for-each select="str:characters(string(.))"&gt;
       &lt;vLabel name="{position() + $numberOfCharactersSoFar}"&gt;
        &lt;xsl:if test="not($dataRepr) and $primary is $currentRoot"&gt;
         &lt;string&gt;
          &lt;xsl:value-of select="."/&gt;
         &lt;/string&gt;
        &lt;/xsl:if&gt;
       &lt;/vLabel&gt;
      &lt;/xsl:for-each&gt;
     &lt;/vColl&gt;
    &lt;/xsl:when&gt;
    &lt;xsl:otherwise&gt;
     &lt;string&gt;
      &lt;xsl:value-of select="."/&gt;
     &lt;/string&gt;
    &lt;/xsl:otherwise&gt;
   &lt;/xsl:choose&gt;
  &lt;/f&gt;
 &lt;/fs&gt;
&lt;/xsl:template&gt;
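
&lt;!-- Illustrative sketch only; not taken from the original stylesheet.
     The text describes str:characters as an external function. Where no
     such extension is available, a pure XSLT 2.0 stand-in could be
     declared as below. The parameter name and the binding of the str
     prefix to some non-reserved namespace are assumptions. --&gt;
&lt;xsl:function name="str:characters" as="xs:string*"&gt;
 &lt;xsl:param name="input" as="xs:string"/&gt;
 &lt;!-- one single-character string per Unicode code point --&gt;
 &lt;xsl:sequence select="for $cp in string-to-codepoints($input)
                       return codepoints-to-string($cp)"/&gt;
&lt;/xsl:function&gt;

&lt;!-- Worked example for a hypothetical input
     &lt;w&gt;&lt;s&gt;ge&lt;/s&gt;&lt;s&gt;ben&lt;/s&gt;&lt;/w&gt; (no whitespace text nodes), with
     $dataIdentity and $dataRepr both true: for the text node "ben",
     the only preceding text node is "ge", so $numberOfCharactersSoFar
     is 2. The characters b, e and n therefore receive the indexes
     1 + 2 = 3, 2 + 2 = 4 and 3 + 2 = 5, emitted as empty vLabel
     elements named "3", "4" and "5". --&gt;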
            </programlisting></figure><para> There are two appropriate features for text nodes: <code>TYPE</code> and
     <code>VALUE</code>. The <code>TYPE</code> feature is set to the symbolic value
     <code>text</code>, of course. The procedure for determining the value of the feature
     <code>VALUE</code>, however, is much more complicated. This holds at least for multiple
    annotation documents, where the identity of the primary data is given
     (<code>$dataIdentity</code>). If this is not the case, we can just insert the value of the
    textual node as a whole. With regard to the data-identity scenario, however, we will proceed on
    a character by character basis with the help of an appropriately defined external function
     (<code>str:characters</code>) and calculate the appropriate index for each character. The
    interesting part of the calculation is done in the binding of the variable
     <code>$numberOfCharactersSoFar</code>, which sums the string lengths of all preceding text nodes. The relative position
    of each character within the string value of the current text node is then added to that sum. If the user chose
    to go without the <code>DATA</code> feature on the top feature geometry level
     (<code>not($dataRepr)</code>) and if we are processing the primary input document
     (<code>$primary is $currentRoot</code>), not only the calculated indexes will be included in
    the list-valued result, but also the character values. Since there is no specialized
     <code>DATA</code> feature, the indexes will be bound to their respective value tokens
    here.</para></section></section><section xml:id="s4" xreflabel="“Summary and Outlook”"><title>Summary and Outlook</title><para>In the context of this article, we started by providing an informal introduction to feature
   structures and their encoding as proposed in the TEI P5 Guidelines. We continued to discuss
   aspects of the representation of multiple annotation documents as XML-encoded feature structures.
   Most of our pertinent remarks also hold for the representation of generic XML
   documents. Only the indexing mechanism is lost in that more general domain.
   Furthermore, we characterized the implemented XSLT stylesheet that was written in order to bring
   about the transformation from multiply annotated or generic XML documents to TEI-based feature
   structure representations. In the remainder of this article, we will take up some loose ends and
   speculate about possible advantages and disadvantages that may be connected with the format. </para><para>In comparison to alternative proposals like <emphasis>XCONCUR</emphasis> (<xref linkend="hilbert2005"/>, <xref linkend="schonefeld2006"/>) and the <emphasis>NITE XML</emphasis>
   format <xref linkend="carletta2003"/>, the following advantages and disadvantages can be stated.
   Like NITE XML, but unlike XCONCUR documents, the TEI-based feature structure format is an XML
   format, which should count as a definitive plus in most contexts. Furthermore, like XCONCUR, but
   unlike the NITE XML representations, the proposed TEI feature structures are integrative in a
   strict sense of the word. What we mean is that all the distributed annotation information is made
   available within the context of a single document instance in which the implicit links have been
   made explicit. So, with regard to these two aspects, TEI feature structures seem to do quite well
   in comparison with the mentioned alternative formats, each of which falls short in one respect or the other.
   However, there is also a big downside connected to them. The TEI feature structure
   representations grow very fast with the size of the input documents and their relative markup
   complexity, much faster than those of both rival formats.<footnote><para>XCONCUR seems to be the leanest in this respect.</para></footnote> So serious doubts remain as to whether this format can prevail in practical
   day-to-day work if it is used for collections of large resource documents.</para><para>But are there any striking advantages that may be connected with the representation of XML
   documents in a feature structure format? Feature structures are a common data structure in
   linguistic theory and they play an important role in many implementations in computational
   linguistics. If the preferred representation format of computational linguists can be used, it
   may be possible to find a way to apply the processing tools that have been developed in that
   field and bridge the gap between the information given by annotations and the information
   contained in textual content. One may also speculate whether general operations on feature
   structures like <emphasis>unification</emphasis> and <emphasis>generalization</emphasis> (compare
   the section <xref linkend="s2.1"/> above) may be applicable to appropriately represented XML
   documents or linguistic corpora.</para><!--<figure xml:id="f15" pgwide="1" xreflabel="Figure 15">--><figure xml:id="f15" xreflabel="Figure 15"><title>Attribute Value Matrix Notation of the Annotation Example 1</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-006.jpg"/></imageobject></mediaobject></figure><para>Compare <xref linkend="f15"/> and <xref linkend="f16"/>. These are possible TEI feature
   structures for the simple linguistic annotation examples that we have used before. Unlike <xref linkend="f11"/>, which is an integrative representation of both example documents, each figure
   here displays the representation of just one annotation document. These examples will help us to
   explore some of the issues involved.</para><!--<figure xml:id="f16" pgwide="1" xreflabel="Figure 16">--><figure xml:id="f16" xreflabel="Figure 16"><title>Attribute Value Matrix Notation of the Annotation Example 2</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-007.jpg"/></imageobject></mediaobject></figure><para> As before, we have to consider two broad scenarios: operations among multiply annotated
   documents and operations among generic XML documents. The main difference between the two has to do
   with the values of the <code>DATA</code> feature.<footnote><para>For the sake of the argument, we will presume that there will be a <code>DATA</code>
     feature on the top-level of all TEI feature structures. The stylesheet does not force this,
     though.</para></footnote> For multiple annotation, the values of <code>DATA</code> will be identical and the
   respective features can, hence, be unified. However, for generic XML documents the
    <code>DATA</code> values will almost always be different. Hence, they usually will not unify. And
   even multiply annotated documents will run into problems when it comes to the value of the
    <code>DOCUMENTS</code> feature slot; compare <xref linkend="f15"/> and <xref linkend="f16"/>. So
   the bare unification of complete representations does not seem to work out for either class of
   documents.</para><!--<figure xml:id="f17" pgwide="1" xreflabel="Figure 17">--><figure xml:id="f17" xreflabel="Figure 17"><title>Rule that uses Unification for Multiple Annotation Data</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-008.jpg"/></imageobject></mediaobject></figure><para>There is, however, a different way in which unification can be put to use with these
   representations. It works analogously to the way in which
   unification is used in <emphasis>linguistic rules</emphasis> in unification-based grammars.
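</para><para>The contrast described above (identical <code>DATA</code> values unify, complete representations do not) can be illustrated with a minimal Python sketch. It is not part of our stylesheet and not a TEI implementation. It assumes untyped feature structures without reentrancy, models feature structures as dictionaries, list values as Python lists and atomic values as strings, and assumes that lists unify position by position. The names <code>unify</code>, <code>UnificationFailure</code> and <code>document</code> are hypothetical, and the two documents are heavily abbreviated stand-ins for the annotation examples in <xref linkend="a1"/>.</para><programlisting xml:space="preserve">
# Minimal sketch of untyped feature structure unification (no reentrancy).
# dicts model feature structures, lists model ordered list values,
# strings model atomic values. All names are illustrative assumptions.


class UnificationFailure(Exception):
    """Raised when two values carry incompatible information."""


def unify(a, b):
    """Return the unification of a and b, or raise UnificationFailure."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                # Shared features must themselves unify.
                result[feature] = unify(result[feature], value)
            else:
                # Features present on one side only are simply carried over.
                result[feature] = value
        return result
    if isinstance(a, list) and isinstance(b, list):
        # Assumption: lists unify position by position and must agree in length.
        if len(a) != len(b):
            raise UnificationFailure("lists differ in length")
        return [unify(x, y) for x, y in zip(a, b)]
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} does not unify with {b!r}")


def document(child_name):
    """Build a heavily abbreviated document structure (hypothetical helper)."""
    child = {"TYPE": "element", "NAME": child_name}
    root = {"TYPE": "element", "NAME": "w", "CHILDREN": [child]}
    return {"TYPE": "document", "CHILDREN": [root]}


morphology = {"DATA": list("geben"), "DOCUMENTS": [document("m")]}
syllables = {"DATA": list("geben"), "DOCUMENTS": [document("syll")]}

# The DATA values are identical, so they unify.
shared_data = unify(morphology["DATA"], syllables["DATA"])

# The complete representations clash inside DOCUMENTS ("m" vs. "syll").
try:
    unify(morphology, syllables)
    whole_unifies = True
except UnificationFailure:
    whole_unifies = False
</programlisting><para>In this sketch the clash arises only inside <code>DOCUMENTS</code>, where the element names differ, while the shared primary data unifies without problems.</para><para>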
   We do not unify the whole representations, but only parts of them in accordance with a rule that
   specifies how to build a bigger structure from smaller structures (or vice versa; this is a
   question of procedural interpretation). Structures that are coindexed within a rule have to be
   unified when the rule is applied. Accordingly, our annotation examples (on the right-hand
   side of the rule) can, for instance, be projected to a bigger structure (on the left-hand side of the
   rule) as displayed in <xref linkend="f17"/>. For generic XML documents, a rule like <xref linkend="f18"/> might work.</para><!--<figure xml:id="f18" pgwide="1" xreflabel="Figure 18">--><figure xml:id="f18" xreflabel="Figure 18"><title>Rule that uses Unification for Generic XML Documents</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-009.jpg"/></imageobject></mediaobject></figure><para>For another perspective on the unification of XML documents compare <xref linkend="witt2005"/>.</para><para>There is also a second general operation on feature structures: generalization. Unlike
   unification, generalization cannot fail. Indeed, generalization can be applied to
   our examples here. The result captures what is common to both representations and is shown in
    <xref linkend="f19"/>.</para><!--<figure xml:id="f19" pgwide="1" xreflabel="Figure 19">--><figure xml:id="f19" xreflabel="Figure 19"><title>Generalization of the Annotation Data Examples 1 and 2</title><mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Stegmann01/Stegmann01-010.jpg"/></imageobject></mediaobject></figure><para> One of the anonymous reviewers of this paper stated that (s)he thinks that its strength is
   "as a sort of thought experiment that has not provided quite the breakthrough that was hoped for
   it; yet interesting things have been learned and observed." This is not too far off from our own
   perspective. Although we were able to show that the representations and operations discussed here are feasible, at least in
   principle, we do not think it likely, as things stand, that TEI feature structures will
   turn out to be the silver bullet for the representation of linguistic annotations or generic XML
   documents. Our representations grow too fast, and it is not yet clear whether good and
   sensible use can be made of the general operations on feature structures open to us now, i.e.,
   whether the potential advantages can outweigh the disadvantages. Still,
   some open questions remain to be investigated. For example, perhaps
   we could come up with a different way of representing XML documents in terms of TEI feature
   structures than our current representation practice and see if that helps in any way.
   Going with typed feature structures might be worth trying. However, we think that
   the prospects are not too good, since the foundational issue of complex modeling and
   retranslating to XML would basically stay the same, and this seems to be quite an overhead to
   cope with. Finally, therefore, we at least mention a different direction that was
   encouraged by the same reviewer. (S)he advised stepping away from the
   TEI-ness of the present approach in order to investigate the prospects of bare feature
   structures, e.g., in the sense of an implemented library, with respect to the issues at
   hand.</para></section><appendix xml:id="a1" xreflabel="Appendix A"><title>Appendix: Result Document for the Annotation Data Examples</title><programlisting xml:space="preserve">
            &lt;?xml version="1.0" encoding="UTF-8"?&gt;
            &lt;fs&gt;
             &lt;f name="DATA"&gt;
              &lt;vColl org="list"&gt;
               &lt;vLabel name="1"&gt;
                &lt;string&gt;g&lt;/string&gt;
               &lt;/vLabel&gt;
               &lt;vLabel name="2"&gt;
                &lt;string&gt;e&lt;/string&gt;
               &lt;/vLabel&gt;
               &lt;vLabel name="3"&gt;
                &lt;string&gt;b&lt;/string&gt;
               &lt;/vLabel&gt;
               &lt;vLabel name="4"&gt;
                &lt;string&gt;e&lt;/string&gt;
               &lt;/vLabel&gt;
               &lt;vLabel name="5"&gt;
                &lt;string&gt;n&lt;/string&gt;
               &lt;/vLabel&gt;
              &lt;/vColl&gt;
             &lt;/f&gt;
             &lt;f name="DOCUMENTS"&gt;
              &lt;vColl org="list"&gt;
               &lt;fs&gt;
                &lt;f name="TYPE"&gt;
                 &lt;symbol value="document"/&gt;
                &lt;/f&gt;
                &lt;f name="CHILDREN"&gt;
                 &lt;vColl org="list"&gt;
                  &lt;fs&gt;
                   &lt;f name="TYPE"&gt;
                    &lt;symbol value="element"/&gt;
                   &lt;/f&gt;
                   &lt;f name="NAME"&gt;
                    &lt;string&gt;w&lt;/string&gt;
                   &lt;/f&gt;
                   &lt;f name="ATTRIBUTES"&gt;
                    &lt;vColl org="set"/&gt;
                   &lt;/f&gt;
                   &lt;f name="CHILDREN"&gt;
                    &lt;vColl org="list"&gt;
                     &lt;fs&gt;
                      &lt;f name="TYPE"&gt;
                       &lt;symbol value="element"/&gt;
                      &lt;/f&gt;
                      &lt;f name="NAME"&gt;
                       &lt;string&gt;m&lt;/string&gt;
                      &lt;/f&gt;
                      &lt;f name="ATTRIBUTES"&gt;
                       &lt;vColl org="set"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="attribute"/&gt;
                         &lt;/f&gt;
                         &lt;f name="NAME"&gt;
                          &lt;string&gt;type&lt;/string&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;string&gt;lexical&lt;/string&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                      &lt;f name="CHILDREN"&gt;
                       &lt;vColl org="list"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="text"/&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;vColl org="list"&gt;
                           &lt;vLabel name="1"/&gt;
                           &lt;vLabel name="2"/&gt;
                           &lt;vLabel name="3"/&gt;
                          &lt;/vColl&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                     &lt;/fs&gt;
                     &lt;fs&gt;
                      &lt;f name="TYPE"&gt;
                       &lt;symbol value="element"/&gt;
                      &lt;/f&gt;
                      &lt;f name="NAME"&gt;
                       &lt;string&gt;m&lt;/string&gt;
                      &lt;/f&gt;
                      &lt;f name="ATTRIBUTES"&gt;
                       &lt;vColl org="set"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="attribute"/&gt;
                         &lt;/f&gt;
                         &lt;f name="NAME"&gt;
                          &lt;string&gt;type&lt;/string&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;string&gt;flexive&lt;/string&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                      &lt;f name="CHILDREN"&gt;
                       &lt;vColl org="list"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="text"/&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;vColl org="list"&gt;
                           &lt;vLabel name="4"/&gt;
                           &lt;vLabel name="5"/&gt;
                          &lt;/vColl&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                     &lt;/fs&gt;
                    &lt;/vColl&gt;
                   &lt;/f&gt;
                  &lt;/fs&gt;
                 &lt;/vColl&gt;
                &lt;/f&gt;
               &lt;/fs&gt;
               &lt;fs&gt;
                &lt;f name="TYPE"&gt;
                 &lt;symbol value="document"/&gt;
                &lt;/f&gt;
                &lt;f name="CHILDREN"&gt;
                 &lt;vColl org="list"&gt;
                  &lt;fs&gt;
                   &lt;f name="TYPE"&gt;
                    &lt;symbol value="element"/&gt;
                   &lt;/f&gt;
                   &lt;f name="NAME"&gt;
                    &lt;string&gt;w&lt;/string&gt;
                   &lt;/f&gt;
                   &lt;f name="ATTRIBUTES"&gt;
                    &lt;vColl org="set"/&gt;
                   &lt;/f&gt;
                   &lt;f name="CHILDREN"&gt;
                    &lt;vColl org="list"&gt;
                     &lt;fs&gt;
                      &lt;f name="TYPE"&gt;
                       &lt;symbol value="element"/&gt;
                      &lt;/f&gt;
                      &lt;f name="NAME"&gt;
                       &lt;string&gt;syll&lt;/string&gt;
                      &lt;/f&gt;
                      &lt;f name="ATTRIBUTES"&gt;
                       &lt;vColl org="set"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="attribute"/&gt;
                         &lt;/f&gt;
                         &lt;f name="NAME"&gt;
                          &lt;string&gt;n&lt;/string&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;string&gt;s1&lt;/string&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                      &lt;f name="CHILDREN"&gt;
                       &lt;vColl org="list"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="text"/&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;vColl org="list"&gt;
                           &lt;vLabel name="1"/&gt;
                           &lt;vLabel name="2"/&gt;
                          &lt;/vColl&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                     &lt;/fs&gt;
                     &lt;fs&gt;
                      &lt;f name="TYPE"&gt;
                       &lt;symbol value="element"/&gt;
                      &lt;/f&gt;
                      &lt;f name="NAME"&gt;
                       &lt;string&gt;syll&lt;/string&gt;
                      &lt;/f&gt;
                      &lt;f name="ATTRIBUTES"&gt;
                       &lt;vColl org="set"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="attribute"/&gt;
                         &lt;/f&gt;
                         &lt;f name="NAME"&gt;
                          &lt;string&gt;n&lt;/string&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;string&gt;s2&lt;/string&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                      &lt;f name="CHILDREN"&gt;
                       &lt;vColl org="list"&gt;
                        &lt;fs&gt;
                         &lt;f name="TYPE"&gt;
                          &lt;symbol value="text"/&gt;
                         &lt;/f&gt;
                         &lt;f name="VALUE"&gt;
                          &lt;vColl org="list"&gt;
                           &lt;vLabel name="3"/&gt;
                           &lt;vLabel name="4"/&gt;
                           &lt;vLabel name="5"/&gt;
                          &lt;/vColl&gt;
                         &lt;/f&gt;
                        &lt;/fs&gt;
                       &lt;/vColl&gt;
                      &lt;/f&gt;
                     &lt;/fs&gt;
                    &lt;/vColl&gt;
                   &lt;/f&gt;
                  &lt;/fs&gt;
                 &lt;/vColl&gt;
                &lt;/f&gt;
               &lt;/fs&gt;
              &lt;/vColl&gt;
             &lt;/f&gt;
            &lt;/fs&gt;
        </programlisting></appendix><bibliography><title>Bibliography</title><bibliomixed xml:id="p5" xreflabel="(Burnard and Bauman, 2007)">Burnard, L. and Bauman, S.
    <emphasis>TEI P5: Guidelines for Electronic Text Encoding and Interchange.</emphasis> Text
   Encoding Initiative, 2007</bibliomixed><bibliomixed xml:id="carletta2003" xreflabel="(Carletta et al., 2003)">Carletta, J.; Kilgour, J.;
   O'Donnell, T.; Evert, S. and Voormann, H. <emphasis>The NITE Object Model Library for Handling
    Structured Linguistic Annotation on Multimodal Data Sets.</emphasis> In: Proceedings of the EACL
   Workshop on Language Technology and the Semantic Web (3rd Workshop on NLP and XML, NLPXML-2003),
   2003</bibliomixed><bibliomixed xml:id="carletta2007" xreflabel="(Carletta et al.,2007)">Carletta, J.; DeRose, S.;
   Durusau, P.; Piez, W.; Sperberg-McQueen, C. M.; Tennison, J. and Witt, A. <emphasis>International
    Workshop on Markup of Overlapping Structures.</emphasis> In: Usdin, B. T. (ed.) Proceedings of
   Extreme Markup Languages 2007, 2007</bibliomixed><bibliomixed xml:id="carpenter1992" xreflabel="(Carpenter, 1992)">Carpenter, B. <emphasis>The
    Logic of Typed Feature Structures: With Applications to Unification Grammars, Logic Programs and
    Constraint Resolution.</emphasis> Cambridge University Press, 1992</bibliomixed><bibliomixed xml:id="derose2004" xreflabel="(DeRose, 2004)">DeRose, S. <emphasis>Markup Overlap: A
    Review and a Horse.</emphasis> In: Usdin, B. T. (ed.) Proceedings of Extreme Markup Languages
   2004, 2004</bibliomixed><bibliomixed xml:id="diestel2005" xreflabel="(Diestel, 2005)">Diestel, R. <emphasis>Graph
    Theory</emphasis>. Springer, 2005</bibliomixed><bibliomixed xml:id="hilbert2005" xreflabel="(Hilbert et al.,2005)">Hilbert, M.; Schonefeld, O.
   and Witt, A. <emphasis>Making CONCUR work.</emphasis> In: Usdin, B. T. (ed.) Proceedings of
   Extreme Markup Languages 2005, 2005</bibliomixed><bibliomixed xml:id="iso24610" xreflabel="(ISO24610, 2006)">ISO 24610-1:2006. <emphasis>Language
    Resource Management -- Feature Structures -- Part 1: Feature Structure
    Representation.</emphasis> International Organization for Standardization, 2006</bibliomixed><bibliomixed xml:id="kay2008" xreflabel="(Kay, 2008)">Kay, M. <emphasis>XSLT 2.0 and XPath 2.0
    Programmer's Reference.</emphasis> Wrox Press Ltd., 2008</bibliomixed><bibliomixed xml:id="nlm" xreflabel="(NLM,2008)"><emphasis>Custom Metadata Group</emphasis>. In:
   Journal Archiving and Interchange Tag Set Tag Library version 3.0, Version of November
   2008.</bibliomixed><bibliomixed xml:id="pollard1994" xreflabel="(Pollard and Sag, 1994)">Pollard, C. and Sag, I.
    <emphasis>Head-Driven Phrase Structure Grammar.</emphasis> The University of Chicago Press,
   1994</bibliomixed><bibliomixed xml:id="sailer2001" xreflabel="(Sailer and Richter, 2001)">Sailer, M. and Richter, F.
    <emphasis>Eine XML-Kodierung für AVM-Beschreibungen.</emphasis> In: Lobin, H. (ed.) Sprach- und
   Texttechnologie in digitalen Medien: Proceedings der GLDV-Frühjahrstagung 2001. BOD - Books on
   Demand, 2001, 161-168</bibliomixed><bibliomixed xml:id="schonefeld2006" xreflabel="(Schonefeld and Witt, 2006)">Schonefeld, O. and
   Witt, A. <emphasis>Towards validation of concurrent markup.</emphasis> In: Usdin, B. T. (ed.)
   Proceedings of Extreme Markup Languages 2006, 2006</bibliomixed><bibliomixed xml:id="shieber1986" xreflabel="(Shieber, 1986)">Shieber, S. M. <emphasis>An
    Introduction to Unification-based Approaches to Grammar.</emphasis> CSLI Publications,
   1986</bibliomixed><bibliomixed xml:id="p3" xreflabel="(Sperberg-McQueen and Burnard, 1994)">Sperberg-McQueen, C. M.
   and Burnard, L. <emphasis>TEI Guidelines for Electronic Text Encoding and Interchange (TEI
    P3).</emphasis> Text Encoding Initiative, 1994</bibliomixed><bibliomixed xml:id="p4" xreflabel="(Sperberg-McQueen and Burnard, 2001)">Sperberg-McQueen, C. M.
   and Burnard, L. <emphasis>Guidelines for Electronic Text Encoding and Interchange (TEI
    P4).</emphasis> Text Encoding Initiative, 2001</bibliomixed><bibliomixed xml:id="sperberg-mcqueen2007" xreflabel="(Sperberg-McQueen, 2007)">Sperberg-McQueen,
   C. M. <emphasis>Representation of overlapping structures.</emphasis> In: Usdin, B. T. (ed.) Proceedings of
   Extreme Markup Languages 2007, 2007</bibliomixed><bibliomixed xml:id="tennison2005" xreflabel="(Tennison, 2005)">Tennison, J. <emphasis>Beginning
    XSLT 2.0: From Novice to Professional.</emphasis> Apress, 2005</bibliomixed><bibliomixed xml:id="witt2004" xreflabel="(Witt, 2004)">Witt, A. <emphasis>Multiple Hierarchies:
    New Aspects of an Old Solution.</emphasis> In: Usdin, B. T. (ed.) Proceedings of Extreme Markup
   Languages 2004, 2004</bibliomixed><bibliomixed xml:id="witt2005" xreflabel="(Witt et al., 2005)">Witt, A.; Goecke, D.; Sasaki, F.
   and Lüngen, H. <emphasis>Unification of XML Documents with Concurrent Markup.</emphasis> Literary
   and Linguistic Computing, 2005, 20, 103-116, doi: <biblioid class="doi">10.1093/llc/fqh046</biblioid></bibliomixed><bibliomixed xml:id="witt2007" xreflabel="(Witt et al., 2007)">Witt, A.; Schonefeld, O.; Rehm, G.;
   Khoo, J. and Evang, K. <emphasis>On the Lossless Transformation of Single-File Multi-Layer
    Annotations into Multi-Rooted Trees.</emphasis> In: Usdin, B. T. (ed.) Proceedings of Extreme
   Markup Languages 2007, 2007</bibliomixed><bibliomixed xml:id="witt2009" xreflabel="(Witt et al., 2009)">Witt, A.; Rehm, G.; Hinrichs, E.;
   Lehmberg, T. and Stegmann, J. <emphasis>SusTEInability of Linguistic Resources through Feature
    Structures.</emphasis> Literary and Linguistic Computing, 2009, 24, 363-372, doi: <biblioid class="doi">10.1093/llc/fqp024</biblioid></bibliomixed><bibliomixed xml:id="woerner2006" xreflabel="(Wörner et al., 2006)">Wörner, K.; Witt, A.; Rehm, G.
   and Dipper, S. <emphasis>Modelling Linguistic Data Structures.</emphasis> In: Usdin, B. T. (ed.)
   Proceedings of Extreme Markup Languages 2006, 2006</bibliomixed></bibliography></article>
