<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2"><title>A toolkit for multi-dimensional markup</title><subtitle>The development of SGF to XStandoff</subtitle><info><confgroup><conftitle>Balisage: The Markup Conference 2009</conftitle><confdates>August 11 - 14, 2009</confdates></confgroup><abstract><para>In this paper we describe the extended standoff approach defined by XStandoff (the
        successor of the Sekimo Generic Format, SGF), together with the accompanying collection of
        XSLT stylesheets. SGF has undergone further development since its first presentation (cf.
          <xref linkend="Stührenberg2008"/>), which resulted in the new development version called
        XStandoff, containing several changes addressed in this paper. In addition, refinements
        have been made to the already available transformation scripts that help generate SGF and
        XStandoff instances, and newly developed stylesheets have been added for the deletion of
        single XStandoff annotations and the conversion into inline representations. </para></abstract><author><personname><firstname>Maik</firstname><surname>Stührenberg</surname></personname><personblurb><para>Maik Stührenberg studied Computational Linguistics at Bielefeld University. He worked
          for four years as a research assistant at Giessen University in different text-technological
          projects together with Henning Lobin and Georg Rehm. He now works as a research assistant
          at Bielefeld University together with Andreas Witt, Dieter Metzing and Daniela Goecke in
          the <emphasis role="ital">Sekimo</emphasis> project of the Research Group <emphasis role="ital">Text-technological modelling of information</emphasis> funded by the German
          Research Foundation. His main research interests include specifications for structuring
          multiply annotated data, query languages and query processing. </para></personblurb></author><author><personname><firstname>Daniel</firstname><surname>Jettka</surname></personname><personblurb><para>Daniel Jettka is working on his Master's degree in linguistics after acquiring a BA in text
          technology. During his studies he worked together with Andreas Witt, Dieter Metzing,
          Daniela Goecke and Maik Stührenberg in the <emphasis role="ital">Sekimo</emphasis> project
          of the Research Group 437 <emphasis role="ital">Text-technological modelling of
            information</emphasis> funded by the German Research Foundation on different XSLT
          stylesheets for the handling and transformation of overlapping markup.</para></personblurb></author><legalnotice><para>Copyright © 2009 by the authors.  Used with
        permission.</para></legalnotice><keywordset role="author"><keyword>Concurrent Markup</keyword><keyword>Overlapping Markup</keyword><keyword>SGF</keyword><keyword>XStandoff</keyword></keywordset></info><note><para>The work presented in this paper is part of the project A2 (<emphasis role="ital">Sekimo</emphasis>) of the Research Group 437 <emphasis role="ital">Text-technological
        modelling of information</emphasis> funded by the German Research Foundation<footnote><para>More information about the project can be obtained at <link xlink:href="http://www.text-technology.de/Sekimo" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.text-technology.de/Sekimo</link>. </para></footnote>.</para></note><section xml:id="sec.introduction"><title>Introduction</title><para>Multi-dimensionally annotated linguistic corpora have been established as a means for
      thorough linguistic analysis in recent years. An overview of architectures for complex
      or concurrent markup including non-XML based approaches such as the <emphasis role="ital">Layered Markup and Annotation Language</emphasis> (LMNL, cf. <xref linkend="Tennison2002"/>, <xref linkend="Cowan2006"/>) in conjunction with Trojan milestones following the HORSE
        (<emphasis role="ital">Hierarchy-Obfuscating Really Spiffy Encoding</emphasis>) or CLIX
      model can be found in <xref linkend="DeRose2004"/>. <xref linkend="Sperberg-McQueen2007"/> and
        <xref linkend="Marinelli2008"/> also discuss and compare state-of-the-art approaches to
      overlapping markup such as colored XML (cf. <xref linkend="Jagadish2004"/>) and the tabling
      approach described by <xref linkend="Durusau2004"/>; further approaches can be found in <xref linkend="Stührenberg2008"/>. However, since standardization efforts with respect to a
      sustainable (i.e. preferably XML-based) annotation format and mechanism (e.g. the <emphasis role="ital">Graph-based Format for Linguistic Annotations</emphasis>, GrAF, cf. <xref linkend="Ide2007"/>) have not yet been finished, other XML-based solutions are available,
      such as using multiple or <emphasis role="ital">twin documents</emphasis> (as <xref linkend="Marinelli2008"/> call them if they share some annotation, the so-called <emphasis role="ital">sacred markup</emphasis>). The <emphasis role="ital">Text Encoding
        Initiative</emphasis> (TEI, <xref linkend="Burnard2008"/>) proposes additional solutions for
      dealing with multi-dimensional markup: apart from standoff markup (cf. <xref linkend="Thompson1997"/> and TEI's chapters 16.9 and 20.4), milestone elements (chapter
      20.2) or fragmentation and joins (chapter 20.3) can be used. <xref linkend="Witt2009"/>
      describe a system that adopts TEI's feature structures (chapter 18) as a meta-format for
      representing heterogeneous complex markup.</para><para>While most of the aforementioned approaches target the representation of
      multi-dimensional annotation, their usage in validating and analyzing multiple annotation
      layers is restricted: non-XML-based formats usually lack mechanisms for validating overlapping
      markup, since most proposed document grammar formalisms remain in a proposal state only, such
      as Rabbit/Duck grammars for GODDAG (general ordered-descendant directed acyclic graph)
      structures/TexMECS (cf. <xref linkend="Sperberg-McQueen2006"/>), XCONCUR-CL (cf. <xref linkend="Schonefeld2007"/>) or Creole (<emphasis role="ital">Composable Regular Expressions
        for Overlapping Languages etc.</emphasis>, cf. <xref linkend="Tennison2007"/>), a powerful
      extension to RELAX NG (<xref linkend="RELAX2003"/>) developed in the LMNL community. The same
      holds for the support for XML's companion specifications such as XPath, XSLT or XQuery
      (although at least alternative query languages have been proposed by <xref linkend="Jagadish2004"/>, <xref linkend="Iacob2005"/>, <xref linkend="Iacob2005a"/>, <xref linkend="Alink2006"/>, <xref linkend="Alink2006a"/> and especially <xref linkend="Bird2006"/> for linguistic analysis). So if validating complex markup is an issue, it is easier to
      stick with XML-based approaches, such as NITE (cf. <xref linkend="Carletta2003"/>, <xref linkend="Carletta2005"/>), PAULA (<emphasis role="ital">Potsdam Austauschformat für
        Linguistische Annotationen</emphasis>, <emphasis role="ital">Potsdam Interchange Format for
        Linguistic Annotation</emphasis>, cf. <xref linkend="Dipper2005"/>, <xref linkend="Dipper2007"/>) or the <emphasis role="ital">Sekimo Generic Format</emphasis> (SGF,
      cf. <xref linkend="Stührenberg2008"/> and the following section).</para></section><section xml:id="sec.sgf"><title>SGF – the story so far</title><para>The development of the Sekimo Generic Format (SGF) began in 2006 and was based
      on the Prolog fact base format that was introduced by <xref linkend="Witt2002"/> and <xref linkend="Witt2004"/> after first proposals by <xref linkend="Sperberg-McQueen2000"/> and
        <xref linkend="Sperberg-McQueen2002"/>. SGF was developed for storing multiply annotated
      linguistic corpus data and examining relationships between elements derived from different
      annotation layers in an XML-conformant way. In technical terms, SGF follows the
      formal model of Annotation Graphs (cf. <xref linkend="Bird1999"/>, <xref linkend="Bird2001"/>)
      and modifies the classic standoff approach in that multiple annotation layers are
      stored together in a single instance while XML namespaces are used to differentiate between
      SGF's base layer, metadata and annotation layers. This allows for the application of SGF for a
      large variety of linguistic annotations, including diachronic corpora and multimodal
      annotation, amongst others.</para><para>The basic principle of the Sekimo Generic Format (and its successor, XStandoff, cf. <xref linkend="sec.xstandoff"/>) is the use of positions in the character stream (or in time) for
      referring to annotations<footnote><para>Other segmentation units such as bytes or frames are possible as well but not
          supported by the XStandoff toolkit.</para></footnote>.</para><figure xml:id="numbering"><title>Addressing character positions</title><programlisting xml:space="preserve">
  T  h  e     s  u  n     s  h  i  n  e  s     b  r  i  g  h  t  e  r  .
00|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24</programlisting></figure><para> This character stream is used to link between annotations and the primary data. A classic
      example of concurrent annotations where overlaps may occur is the combination of morpheme and
      syllable annotation. We start with the morpheme annotation shown in <xref linkend="lst.morph"/>. </para><figure xml:id="lst.morph"><title>Morpheme annotation for a simple sentence</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;morphemes xmlns="http://www.text-technology.de/sekimo/morphemes"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.text-technology.de/sekimo/morphemes ../xsd/morphemes.xsd"&gt;
  &lt;m&gt;The&lt;/m&gt;
  &lt;m&gt;sun&lt;/m&gt;
  &lt;m&gt;shine&lt;/m&gt;
  &lt;m&gt;s&lt;/m&gt;
  &lt;m&gt;bright&lt;/m&gt;
  &lt;m&gt;er&lt;/m&gt;.
&lt;/morphemes&gt;</programlisting></figure><para>The graphic representation of this annotation layer can be seen in <xref linkend="fig.layer_morph"/>.</para><figure xml:id="fig.layer_morph"><title>Graphic representation of the morpheme annotation layer</title><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Stuhrenberg01/Stuhrenberg01-001.png" width="90%"/></imageobject></mediaobject></figure><para>We then add a second layer containing syllable annotation, similar to the one shown in
        <xref linkend="lst.syll"/>. </para><figure xml:id="lst.syll"><title>Syllable annotation layer for a simple sentence</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;syllables xmlns="http://www.text-technology.de/sekimo/syllables"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.text-technology.de/sekimo/syllables ../xsd/syllables.xsd"&gt;
  &lt;s&gt;The&lt;/s&gt;
  &lt;s&gt;sun&lt;/s&gt;
  &lt;s&gt;shines&lt;/s&gt;
  &lt;s&gt;brigh&lt;/s&gt;
  &lt;s&gt;ter&lt;/s&gt;.
&lt;/syllables&gt;</programlisting></figure><para>The graphic representation of this annotation can be seen in <xref linkend="fig.layer_syll"/>.</para><figure xml:id="fig.layer_syll"><title>Graphic representation of the syllable annotation layer</title><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Stuhrenberg01/Stuhrenberg01-002.png" width="90%"/></imageobject></mediaobject></figure><para>When one tries to combine both annotation levels an overlap occurs at the position of the
      't' in the word 'brighter', which can easily be observed in <xref linkend="fig.layer_both"/>.</para><figure xml:id="fig.layer_both"><title>Combined graphic representation of both annotation layers (labels removed for
        readability reasons)</title><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Stuhrenberg01/Stuhrenberg01-003.png" width="90%"/></imageobject></mediaobject></figure><para>Classic XML-based inline annotation formats fail to model these overlapping structures due
      to their formal model of a single-rooted tree while classic standoff annotation (i.e. using
      markup separated from the primary data and storing different annotation layers in separate
      files) lacks mechanisms for analyzing relations between several annotation layers. The Sekimo
      Generic Format was developed for storing multiple annotated linguistic corpus data and
      examining relationships between elements derived from different annotation layers <emphasis>in
        a single file</emphasis> and therefore tries to retain the benefits of standoff markup without
      inheriting the aforementioned problems.</para><para>A second design goal during the development of SGF was the possibility to reuse the
      structure and features of existing annotation formats. SGF consists only of a base layer which
      serves as meta-markup language or as a container for standoff representations of the original
      inline annotations. In addition, the base layer supports storing of the primary data (i.e.,
      the data that is annotated), its segmentation, metadata regarding both the primary data and
      its annotation (either internally inside the SGF instance or via a reference to external
      metadata resources), and provides SGF's log functionality (i.e. the possibility to store an
      instance's edit history). Segmentation of the primary data is application driven, i.e. for
      textual primary data usually a character based segmentation is established by importing a
      single inline annotation and computing the start and end positions of each annotation element
      (cf. <xref linkend="sec.inline2XStandoff"/>). XML's inherent ID/IDREF mechanism is used for
      linking between annotation elements and the corresponding segments of the primary data which
      are defined by SGF's <code>segment</code> element. Overlapping segments may occur and a single
      character range (a segment) may be part of multiple annotation levels, which reduces the overall
      number of segments by eliminating duplicates. SGF is defined by a number of XML schema files
      containing embedded Schematron (<xref linkend="Schematron"/>) assertions for additional
      validation constraints and is available under the LGPL 3 license<footnote><para>All SGF XML Schema files can be found at <link xlink:href="http://www.xstandoff.net" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.xstandoff.net</link>.</para></footnote>. <xref linkend="fig.sgf_overview"/> shows a graphical overview of a prototypical SGF
      instance's structure. Note that both <code>corpus</code> and <code>corpusData</code> are valid
      root elements of an SGF instance allowing for storing single corpus entries or whole corpora
      in a single instance. Following <xref linkend="Goecke2009"/> we differentiate between the
      concept that serves as the background for the annotation (the level) and its XML serialization
      (the layer). Elements drawn with dashed borders are optional (not every possible occurrence
      is shown due to space restrictions); elements colored red do not belong to the SGF namespace and
      are shown for demonstration purposes only.</para><figure xml:id="fig.sgf_overview"><title>The structure of an SGF instance</title><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Stuhrenberg01/Stuhrenberg01-004.png" width="98%"/></imageobject></mediaobject></figure><para><xref linkend="lst.sgf_combined"/> shows the respective SGF instance containing both
      annotation layers.</para><figure xml:id="lst.sgf_combined"><title>The SGF instance containing both syllable and morpheme annotation</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;sgf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.text-technology.de/sekimo ../xsd/sgf.xsd" 
   xmlns="http://www.text-technology.de/sekimo"
   xmlns:sgf="http://www.text-technology.de/sekimo" 
   xml:id="c1" sgfVersion="1.0"&gt;
   &lt;sgf:primaryData start="0" end="24" xml:lang="en"&gt;
      &lt;textualContent&gt;The sun shines brighter.&lt;/textualContent&gt;
   &lt;/sgf:primaryData&gt;
   &lt;sgf:segments&gt;
      &lt;sgf:segment xml:id="seg1" type="char" start="0" end="24"/&gt;
      &lt;sgf:segment xml:id="seg2" type="char" start="0" end="3"/&gt;
      &lt;sgf:segment xml:id="seg3" type="char" start="4" end="7"/&gt;
      &lt;sgf:segment xml:id="seg4" type="char" start="8" end="14"/&gt;
      &lt;sgf:segment xml:id="seg5" type="char" start="8" end="13"/&gt;
      &lt;sgf:segment xml:id="seg6" type="char" start="13" end="14"/&gt;
      &lt;sgf:segment xml:id="seg7" type="char" start="15" end="21"/&gt;
      &lt;sgf:segment xml:id="seg8" type="char" start="15" end="20"/&gt;
      &lt;sgf:segment xml:id="seg9" type="char" start="20" end="23"/&gt;
      &lt;sgf:segment xml:id="seg10" type="char" start="21" end="23"/&gt;
   &lt;/sgf:segments&gt;
   &lt;sgf:annotation xml:id="a_morph"&gt;
      &lt;sgf:level xml:id="l_morph" priority="1"&gt;
         &lt;sgf:layer xmlns:morph="http://www.text-technology.de/sekimo/morphemes"
            xsi:schemaLocation="http://www.text-technology.de/sekimo/morphemes ../xsd/morphemes.xsd"&gt;
            &lt;morph:morphemes sgf:segment="seg1"&gt;
               &lt;morph:m sgf:segment="seg2"/&gt;
               &lt;morph:m sgf:segment="seg3"/&gt;
               &lt;morph:m sgf:segment="seg5"/&gt;
               &lt;morph:m sgf:segment="seg6"/&gt;
               &lt;morph:m sgf:segment="seg7"/&gt;
               &lt;morph:m sgf:segment="seg10"/&gt;
            &lt;/morph:morphemes&gt;
         &lt;/sgf:layer&gt;
      &lt;/sgf:level&gt;
   &lt;/sgf:annotation&gt;
   &lt;sgf:annotation xml:id="a_syll"&gt;
      &lt;sgf:level xml:id="l_syll" priority="0"&gt;
         &lt;sgf:layer xmlns:syll="http://www.text-technology.de/sekimo/syllables"
            xsi:schemaLocation="http://www.text-technology.de/sekimo/syllables ../xsd/syllables.xsd"&gt;
            &lt;syll:syllables sgf:segment="seg1"&gt;
               &lt;syll:s sgf:segment="seg2"/&gt;
               &lt;syll:s sgf:segment="seg3"/&gt;
               &lt;syll:s sgf:segment="seg4"/&gt;
               &lt;syll:s sgf:segment="seg8"/&gt;
               &lt;syll:s sgf:segment="seg9"/&gt;
            &lt;/syll:syllables&gt;
         &lt;/sgf:layer&gt;
      &lt;/sgf:level&gt;
   &lt;/sgf:annotation&gt;
&lt;/sgf:corpusData&gt;</programlisting></figure><para>SGF's main benefit compared with non-XML-based developments such as TexMECS (cf.
        <xref linkend="Huitfeldt2001"/>) or the above-mentioned approaches is the usage of standard,
      non-extended XML together with its accompanying technologies, such as XPath, XSLT and XQuery.
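      As a minimal illustration of what this enables, the following XSLT 1.0 sketch (not part of
      the SGF distribution) lists every pair of character segments that properly overlap, i.e.
      that share characters while neither contains the other. Applied to the instance in <xref linkend="lst.sgf_combined"/>, it should report only the pair <emphasis role="ital">seg7</emphasis> and <emphasis role="ital">seg9</emphasis>, which corresponds to the overlap at the 't' in 'brighter'.
      <programlisting xml:space="preserve">&lt;xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:sgf="http://www.text-technology.de/sekimo"&gt;
  &lt;xsl:output method="text"/&gt;
  &lt;!-- Report pairs (a, b) of character segments where b starts inside a
       and ends after a, so that neither segment contains the other. --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;xsl:for-each select="//sgf:segment[@type='char']"&gt;
      &lt;xsl:variable name="a" select="."/&gt;
      &lt;xsl:for-each select="../sgf:segment[@type='char']
          [number(@start) &gt; number($a/@start)]
          [number($a/@end) &gt; number(@start)]
          [number(@end) &gt; number($a/@end)]"&gt;
        &lt;xsl:value-of select="concat($a/@xml:id, ' overlaps ', @xml:id)"/&gt;
        &lt;xsl:text&gt;&amp;#10;&lt;/xsl:text&gt;
      &lt;/xsl:for-each&gt;
    &lt;/xsl:for-each&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</programlisting>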
      We believe that this may be an argument when it comes to the sustainability aspect of larger
      corpora. In our project, we have developed a corpus consisting of 14 texts (both German
      scientific and newspaper articles, 3,084 sentences, 56,203 tokens, 11,740 discourse entities,
      4,323 anaphoric relations in total), densely annotated on four different levels (logical
      document structure, syntactic annotation, discourse entities and anaphoric relations). A
      partner project adopted SGF as export format for lexical chaining (SGF-LC, cf. <xref linkend="Waltinger2008"/>). In addition, it is used as import and export format of the
      web-based annotation tool <emphasis role="ital">Serengeti</emphasis> (cf. <xref linkend="Stührenberg2007"/>) and was chosen as one of the possible pivot formats for the
        <emphasis role="ital">Anaphoric Bank</emphasis> (cf. <link xlink:href="http://www.anaphoricbank.org" xlink:title="AnaphoricBank Homepage" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.anaphoricbank.org</link> and <xref linkend="Poesio2009"/>). Cf.
        <xref linkend="Stührenberg2008"/> and <xref linkend="Witt2009a"/> for a detailed discussion
      of SGF and its use in analyzing the aforementioned linguistic corpus.</para></section><section xml:id="sec.xstandoff"><title>The development of SGF to XStandoff</title><para>The version of SGF discussed in <xref linkend="Stührenberg2008"/> is considered the stable
      version 1.0. However, work has begun on a forthcoming version addressing some minor issues
      – this developer version is called XStandoff (XStandoff version 1.1 internally,
      although the final version number may change during the ongoing development process). These
      newer developments which are described in this section are accompanied by the creation of
      different tools which allow the broader use of XStandoff for the analysis of multi-dimensional
      markup (discussed in <xref linkend="sec.sgf_toolkit"/>).</para><para>The changes made to SGF's base layer can be divided into structural changes that
      affect the compatibility between SGF 1.0 and XStandoff 1.1 instances (some of which break
      compatibility, which is why a new name and namespace were chosen) and changes
      made in the underlying XML schemas that define the XStandoff meta format. XStandoff 1.1
      supports external metadata via the newly introduced <code>metaRef</code> element which can be
      used as child element of the <code>corpus</code>, <code>corpusData</code>,
        <code>resource</code>, <code>annotation</code>, <code>level</code>, <code>layer</code> and
        <code>log</code> elements as an alternative to the already established <code>meta</code>
      element. Analogously, SGF's <code>location</code> element, which was used for providing a
      reference to a file containing the primary data, was dropped in favor of the new
        <code>primaryDataRef</code> element. Both <code>metaRef</code> and
        <code>primaryDataRef</code> share the same globally defined <code>RefType</code> type
      (together with the <code>corpusDataRef</code> and <code>resourceRef</code> elements) which
      stores the attributes <code>uri</code>, <code>encoding</code> and <code>mime-type</code>,
      improving the readability of the underlying XStandoff schema. The <code>type</code> attribute
      which was located at the <code>corpusData</code> element in SGF and was used to differentiate
      between textual and multimodal corpus entries was removed, since <code>corpusData</code>
      elements containing multiple <code>primaryDataRef</code> children depicting primary data files
      of different mime-types (e.g. video, audio and text files) should not have a singular type.
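      The shared <code>RefType</code> described above might be declared roughly as follows. This
      is a sketch only: the datatypes, the required <code>uri</code> attribute and the use of
      global element declarations are assumptions and not excerpts from the XStandoff schema files.
      <programlisting xml:space="preserve">&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
  targetNamespace="http://www.xstandoff.net/2009/xstandoff/1.1"
  elementFormDefault="qualified"&gt;
  &lt;!-- One globally defined type carries the three reference attributes
       (datatypes and use="required" are assumptions) --&gt;
  &lt;xs:complexType name="RefType"&gt;
    &lt;xs:attribute name="uri" type="xs:anyURI" use="required"/&gt;
    &lt;xs:attribute name="encoding" type="xs:string"/&gt;
    &lt;xs:attribute name="mime-type" type="xs:string"/&gt;
  &lt;/xs:complexType&gt;
  &lt;!-- All four reference elements reuse the same type --&gt;
  &lt;xs:element name="metaRef" type="xsf:RefType"/&gt;
  &lt;xs:element name="primaryDataRef" type="xsf:RefType"/&gt;
  &lt;xs:element name="corpusDataRef" type="xsf:RefType"/&gt;
  &lt;xs:element name="resourceRef" type="xsf:RefType"/&gt;
&lt;/xs:schema&gt;</programlisting>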
      The <code>priority</code> attribute was moved from the <code>level</code> element to the
        <code>layer</code> element. The attribute has been used to prioritize annotation layers
      (i.e. the XML serialization) in case of overlapping markup, therefore the re-arrangement
      should clarify its application.</para><para>Other elements have been renamed or restricted: <code>segments</code> is now called
        <code>segmentation</code>, and the <code>annotation</code> element may appear at most
      once underneath a <code>corpusData</code> entry. SGF allowed several <code>annotation</code>
      elements but since XStandoff supports multiple levels and layers, the <code>annotation</code>
      element serves only as a wrapper similar to the <code>segmentation</code> element.</para><para>The XML schema for XStandoff's log functionality is now incorporated into XStandoff's base
      schema and can be considered as an integrated component of XStandoff's core functionality
      (although it is still application-driven). <xref linkend="fig.xsf_overview"/> shows the
      resulting new structure. Again, keep in mind that the elements colored red do not belong to
      the XStandoff meta language.</para><figure xml:id="fig.xsf_overview"><title>The structure of an XStandoff instance</title><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Stuhrenberg01/Stuhrenberg01-005.png" width="98%"/></imageobject></mediaobject></figure><para>In general, valid SGF instances can be updated with little or no effort at all (depending
      on the use of formerly optional attributes and elements) to fully comply with XStandoff.</para><section xml:id="sec.containment"><title>Disjoint and continuous segments, containment and dominance</title><para>Annotating discontinuous or disjoint structures such as non-contiguous multi-word expressions or
        separable verbs (e.g. in German) is a challenging task in XML (cf. <xref linkend="Pianta2004"/>). Although it would be possible to use separate layers in XStandoff
        to describe these structures as a workaround, disjoint and continuous segments are
        supported natively. As a concrete example, take the well-known beginning of 'Alice in
        Wonderland' (cf. <xref linkend="Sperberg-McQueen2008"/> for a general discussion of the
        problems raised by discontinuity).</para><blockquote><para>Alice was beginning to get very tired of sitting by her sister on the bank and of
          having nothing to do: once or twice she had peeped into the book her sister was reading,
          but it had no pictures or conversations in it, "and what is the use of a book," thought
          Alice, "without pictures or conversations?"</para></blockquote><para>The XStandoff instance of this example is shown in <xref linkend="lst.alice.sgf"/>.
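        The offsets in that listing follow the convention of <xref linkend="numbering"/> and appear
        to refer to the primary data after whitespace normalization (the line breaks in the listing
        serve display purposes only). As an illustrative check, the following XSLT 1.0 sketch (not
        part of the XStandoff toolkit) prints, for each disjoint segment, the text of the character
        segments it refers to; reading the <code>segments</code> attribute as a space-separated
        list of identifiers is inferred from the listing.
        <programlisting xml:space="preserve">&lt;xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"&gt;
  &lt;xsl:output method="text"/&gt;
  &lt;!-- Assumption: offsets refer to the whitespace-normalized primary data --&gt;
  &lt;xsl:variable name="text" select="normalize-space(//xsf:textualContent)"/&gt;

  &lt;xsl:template match="/"&gt;
    &lt;xsl:apply-templates select="//xsf:segment[@mode='disjoint']"/&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="xsf:segment"&gt;
    &lt;!-- Pad the reference list so that "seg2" cannot match "seg20" --&gt;
    &lt;xsl:variable name="refs" select="concat(' ', @segments, ' ')"/&gt;
    &lt;xsl:value-of select="@xml:id"/&gt;
    &lt;xsl:text&gt;:&lt;/xsl:text&gt;
    &lt;!-- Referenced segments are visited in document order --&gt;
    &lt;xsl:for-each select="../xsf:segment[contains($refs, concat(' ', @xml:id, ' '))]"&gt;
      &lt;xsl:text&gt; [&lt;/xsl:text&gt;
      &lt;!-- substring() is 1-based, the segment offsets are 0-based --&gt;
      &lt;xsl:value-of select="substring($text, @start + 1, @end - @start)"/&gt;
      &lt;xsl:text&gt;]&lt;/xsl:text&gt;
    &lt;/xsl:for-each&gt;
    &lt;xsl:text&gt;&amp;#10;&lt;/xsl:text&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</programlisting>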
        Since both the <code>p</code> and the <code>q</code> element belong to the logical document
        structure level, only one <code>level</code> element is included. The quotation is annotated
        as a <code>q</code> element constructed by the disjoint <code>segment</code> with the
        identifier <emphasis role="ital">seg4</emphasis> which itself uses the segments <emphasis role="ital">seg2</emphasis> and <emphasis role="ital">seg3</emphasis>.</para><figure xml:id="lst.alice.sgf"><title>Disjoint segments in XStandoff</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;xsf:corpusData xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
 xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1 ../xsd/xsf.xsd"
 xsfVersion="1.1" xml:id="alice"&gt;
  &lt;xsf:primaryData start="0" end="302"&gt;
    &lt;textualContent&gt;Alice was beginning to get very tired of sitting by her
     sister on the bank and of having nothing to do: once or twice she had
     peeped into the book her sister was reading, but it had no pictures or
     conversations in it, "and what is the use of a book," thought Alice,
     "without pictures or conversations?"&lt;/textualContent&gt;
  &lt;/xsf:primaryData&gt;
  &lt;xsf:segmentation&gt;
    &lt;xsf:segment xml:id="seg1" type="char" start="0" end="302"/&gt;
    &lt;xsf:segment xml:id="seg2" type="char" start="218" end="250"/&gt;
    &lt;xsf:segment xml:id="seg3" type="char" start="266" end="302"/&gt;
    &lt;xsf:segment xml:id="seg4" type="seg" segments="seg2 seg3" mode="disjoint"/&gt;
  &lt;/xsf:segmentation&gt;
  &lt;xsf:annotation&gt;
    &lt;xsf:level xml:id="alice-log"&gt;
      &lt;xsf:layer xmlns:log="http://www.xstandoff.net/alice/log"
       xsi:schemaLocation="http://www.xstandoff.net/alice/log ../xsd/alice-log.xsd"
       priority="0"&gt;
        &lt;log:text xsf:segment="seg1"&gt;
          &lt;log:p xsf:segment="seg1"&gt;
            &lt;log:q xsf:segment="seg4"/&gt;
          &lt;/log:p&gt;
        &lt;/log:text&gt;
      &lt;/xsf:layer&gt;
    &lt;/xsf:level&gt;
  &lt;/xsf:annotation&gt;
&lt;/xsf:corpusData&gt;</programlisting></figure><para>Due to the differentiation between the annotation concept (the level) and its XML
        representation (the layer), it is possible to group different XML serializations underneath
        the very same level element. XStandoff's <code>layer</code> element serves as a wrapper for
        the slightly converted representation of the former inline annotation – the only
        changes that are made concern the deletion of text nodes and the addition of the
          <code>segment</code> attribute that links to the corresponding <code>segment</code>
        element. The conversion applies to both the annotation instance and its underlying XML
        schema description (into which XStandoff's base layer is imported to add the
          <code>segment</code> attribute to all element nodes as optional attribute) – the
        latter can be used both for validating the original inline annotation and the converted
        representation as part of the XStandoff instance. In contrast to other approaches, both the
        hierarchical structure of the imported annotation layers and the attributes they contain
        remain unchanged.</para><para> In <xref linkend="Sperberg-McQueen2008"/> the authors make an argument for
        distinguishing more broadly between dominance and containment. The former should be regarded
        as <citation linkend="Sperberg-McQueen2008">the transitive closure of the parent/child
          relation</citation> while the latter is regarded as <citation linkend="Sperberg-McQueen2008">superset/subset relation on the leaf nodes reachable from a
          node by following parent-child arcs</citation>. To make the distinction between the two
        concepts explicit, different XStandoff representations are possible (cf. <xref linkend="lst.alice.sgf.alt"/> and <xref linkend="lst.alice.sgf.alt2"/>). In these
        alternative representations, dominance is encoded as a hierarchical relationship between the
        two nodes <code>p</code> and <code>q</code> while containment is encoded via the
        corresponding segments' start and end positions.</para><figure xml:id="lst.alice.sgf.alt"><title>Disjoint segments in XStandoff – Excerpt of an alternative representation
          in which <code>p</code> dominates the <code>q</code> fragments</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;!-- ... --&gt;
&lt;xsf:layer xmlns:log="http://www.xstandoff.net/alice/log"
 xsi:schemaLocation="http://www.xstandoff.net/alice/log ../xsd/alice-log.xsd"
 priority="0"&gt;
  &lt;log:text xsf:segment="seg1"&gt;
    &lt;log:p xsf:segment="seg1"&gt;
      &lt;log:q xsf:segment="seg2"/&gt;
      &lt;log:q xsf:segment="seg3"/&gt;
    &lt;/log:p&gt;
    &lt;log:q xsf:segment="seg4"/&gt;
  &lt;/log:text&gt;
&lt;/xsf:layer&gt;
&lt;!-- ... --&gt;</programlisting></figure><figure xml:id="lst.alice.sgf.alt2"><title>Disjoint segments in XStandoff – Excerpt of an alternative representation
          in which <code>p</code> does not dominate <code>q</code></title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;!-- ... --&gt;
&lt;xsf:layer xmlns:log="http://www.xstandoff.net/alice/log"
 xsi:schemaLocation="http://www.xstandoff.net/alice/log ../xsd/alice-log.xsd"
 priority="0"&gt;
  &lt;log:text xsf:segment="seg1"&gt;
    &lt;log:p xsf:segment="seg1"/&gt;
    &lt;log:q xsf:segment="seg4"/&gt;
  &lt;/log:text&gt;
&lt;/xsf:layer&gt;
&lt;!-- ... --&gt;</programlisting></figure><para>Other possible XStandoff instances include the separation of the <code>q</code> element
        from the <code>p</code> layer altogether (i.e. using separate <code>layer</code> elements, which
        is permitted by XStandoff). However, as already stated above, when dealing with disjoint
        units, the native mechanisms should be preferred.</para></section><section xml:id="sec.inline-xstandoff"><title>Inline XStandoff</title><para>In contrast to SGF, XStandoff supports a newly introduced inline representation (cf.
        Section <xref linkend="sec.XStandoff2inline"/>), containing the virtual root element
          <code>inline</code> and the generic <code>milestone</code> element (modelled according to
        the respective TEI element, cf. <xref linkend="Burnard2008"/>). During the Sekimo project
        Schiller's 'Die Bürgschaft' was annotated on the following annotation levels: words,
        morphemes, syllables, verse, prose and phrase<footnote><para>All single annotation instances are available at <link xlink:href="http://coli.lili.uni-bielefeld.de/Texttechnologie/Forschergruppe/Phase1/sekimo/internet-praesentation/buergschaft.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://coli.lili.uni-bielefeld.de/Texttechnologie/Forschergruppe/Phase1/sekimo/internet-praesentation/buergschaft.html</link>.</para></footnote>. We've converted these six levels into a single XStandoff instance<footnote><para>The converted XStandoff instance is available at <link xlink:href="http://www.xstandoff.net/examples/buergschaft.html" xlink:title="Example XStandoff instance" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.xstandoff.net/examples/buergschaft.html</link>.</para></footnote>. The inline representation shown in <xref linkend="lst.buergschaft"/> was
        generated afterwards (cf. <xref linkend="sec.demo"/>).</para><figure xml:id="lst.buergschaft"><title>Part of the inline XStandoff instance of Schiller's 'Die Bürgschaft' annotated on six
          levels.</title><!--<programlisting linenumbering="numbered" startinglinenumber="1" xml:space="preserve">--><programlisting xml:space="preserve">&lt;xsf:inline xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
            xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:silbe="http://www.xstandoff.net/buergschaft/silbe"
            xmlns:wort="http://www.xstandoff.net/buergschaft/wort"
            xmlns:vers="http://www.xstandoff.net/buergschaft/vers"
            xmlns:morpheme="http://www.xstandoff.net/buergschaft/morpheme"
            xmlns:prosa="http://www.xstandoff.net/buergschaft/prosa"
            xmlns:phrase="http://www.xstandoff.net/buergschaft/phrase"&gt;
   &lt;silbe:text xsf:segment="seg1"&gt;
      &lt;silbe:body xsf:segment="seg1"&gt;
         &lt;wort:text xsf:segment="seg1"&gt;
            &lt;wort:body xsf:segment="seg1"&gt;
               &lt;vers:text a="b" xsf:segment="seg1"&gt;
                  &lt;vers:body xsf:segment="seg1"&gt;
                     &lt;morpheme:text xsf:segment="seg1"&gt;
                        &lt;morpheme:body xsf:segment="seg1"&gt;
                           &lt;prosa:text xsf:segment="seg1"&gt;
                              &lt;prosa:body xsf:segment="seg1"&gt;
                                 &lt;phrase:text xsf:segment="seg1"&gt;
                                    &lt;phrase:body xsf:segment="seg1"&gt;
                                       &lt;silbe:head xsf:segment="seg2"&gt;
                                          &lt;wort:head xsf:segment="seg2"&gt;
                                             &lt;vers:head xsf:segment="seg2"&gt;
                                                &lt;morpheme:head xsf:segment="seg2"&gt;
                                                   &lt;prosa:head xsf:segment="seg2"&gt;
                                                      &lt;phrase:head xsf:segment="seg2"&gt;Die Bürgschaft&lt;/phrase:head&gt;
                                                   &lt;/prosa:head&gt;
                                                &lt;/morpheme:head&gt;
                                             &lt;/vers:head&gt;
                                             &lt;!-- ... --&gt;
&lt;/xsf:inline&gt;</programlisting></figure><para>Since each single annotation layer contains a <code>text</code> root element, a
          <code>body</code> and a <code>head</code> element, these elements occur six times in the
        resulting XStandoff instance. Note that in this case the inline XStandoff
        instance is more than two-thirds larger than the 'classic' (i.e. standoff) XStandoff
        instance (763.9 KB vs. 452.5 KB) and is far from being easily readable.</para></section><section xml:id="sec.xsf.all"><title>Introducing the 'all' namespace</title><para>For both 'classic' XStandoff and especially for inline XStandoff instances, a new
        namespace – <code>http://www.xstandoff.net/2009/all</code> – was
        introduced. Elements belonging to this namespace share not only the same range of
        characters but also the generic identifier, and are present in all annotation layers (e.g. the
        above-mentioned <code>text</code> root element). <xref linkend="lst.buergschaft_all"/> shows
        the same part of the inline XStandoff instance using the newly introduced namespace. Note
        that the <code>a</code> attribute of the <code>text</code> element of the verse layer
        remains intact. Cf. <xref linkend="lst.inline_xsf"/> in <xref linkend="sec.demo"/> for an
        extended excerpt.</para><figure xml:id="lst.buergschaft_all"><title>Excerpt of the inline XStandoff instance of Schiller's 'Die Bürgschaft' using the
            <emphasis role="ital">all</emphasis> namespace.</title><!--<programlisting linenumbering="numbered" startinglinenumber="1" xml:space="preserve">--><programlisting xml:space="preserve">&lt;xsf:inline xmlns:sgf="http://www.text-technology.de/sekimo"
            xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns="http://www.xstandoff.net/2009/xstandoff/1.1"
            xmlns:all="http://www.xstandoff.net/2009/all"
            xmlns:silbe="http://www.xstandoff.net/buergschaft/silbe"
            xmlns:wort="http://www.xstandoff.net/buergschaft/wort"
            xmlns:vers="http://www.xstandoff.net/buergschaft/vers"
            xmlns:morpheme="http://www.xstandoff.net/buergschaft/morphem"
            xmlns:prosa="http://www.xstandoff.net/buergschaft/prosa"
            xmlns:phrase="http://www.xstandoff.net/buergschaft/phrase"&gt;
   &lt;all:text xsf:segment="seg1" vers:a="b"&gt;
      &lt;all:body xsf:segment="seg1"&gt;
         &lt;all:head xsf:segment="seg2"&gt;Die Bürgschaft&lt;/all:head&gt;
            &lt;!-- ... --&gt;
&lt;/xsf:inline&gt;</programlisting></figure><para>This mechanism improves the readability of a multi-dimensionally annotated inline
        XStandoff instance. However, the <emphasis role="ital">all</emphasis> namespace should only be used when the respective elements bear not only
        the same generic identifier and character range but also the same semantic value.</para><para>Note that the inline XStandoff instance lacks the support for validation of
        multi-dimensional annotations that is present in 'classic' XStandoff instances and should
        therefore be considered for demonstration purposes only.</para></section></section><section xml:id="sec.sgf_toolkit"><title>The XStandoff toolkit</title><para> In the context of XStandoff's development several XSLT 2.0 stylesheets (cf. <xref linkend="Kay2007"/>, <xref linkend="Kay2008"/>) have been created which allow for the
      convenient generation and editing of XStandoff files<footnote><para>The stylesheets are available at <link xlink:href="http://www.xstandoff.net/files/xsf_stylesheets.zip" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.xstandoff.net/files/xsf_stylesheets.zip</link>. An additional Java
          implementation with restricted functionality can be obtained as well.</para></footnote>. Since the stylesheets use no product-specific extensions, it should be possible to
      run the transformations with any XSLT processor supporting XSLT 2.0. During the test
      phase, the transformations were performed with the Saxon XSLT processor, which is available
      both as Open Source version Saxon-B and as optimized, schema-aware, commercial version Saxon-SA<footnote><para>Download the Open Source version at <link xlink:href="http://saxon.sourceforge.net" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://saxon.sourceforge.net</link>, for the
          commercial Saxon-SA visit <link xlink:href="http://www.saxonica.com" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.saxonica.com</link>.</para></footnote>.</para><para> Currently the XStandoff toolkit consists of four stylesheets which perform basic
      processing of XStandoff instances. They are responsible for the conversion of a single inline
      annotation to XStandoff, the merging of XStandoff annotation layers corresponding to the same
      primary data, the removal of single layers from an XStandoff file, and the conversion of an
      XStandoff instance to an XStandoff inline annotation (cf. <xref linkend="sec.xstandoff"/>).
      These tasks might seem simple but, as will be shown, it would hardly be possible to perform them
      manually. </para><section xml:id="sec.inline2XStandoff"><title>Building XStandoff instances: <emphasis role="ital">inline2XSF.xsl</emphasis></title><para> At the moment the easiest way of building XStandoff files is to transform an inline
        annotation into XStandoff with the stylesheet <emphasis role="ital">inline2XSF.xsl</emphasis>.
        At first sight this excludes overlapping markup, since a single inline
        annotation cannot contain such structures. However, more than one
        inline annotation can be included in a single XStandoff file by merging several XStandoff annotations (cf.
          <xref linkend="sec.mergeXStandoff"/>). In this way overlapping structures can be
        covered.</para><para>The transformation into XStandoff requires an input XML file ideally containing elements
        bound to XML namespaces. Each namespace results in one XStandoff layer
        which contains the elements of that namespace. The default (or empty) namespace is
        treated like the named ones, so inline annotations without explicit namespace declarations
        can also be processed (in this case a namespace will be generated). </para><para>The process of converting an inline annotation to XStandoff is divided into two steps:
        First, segments are built on the basis of the occurring elements. There are two possible
        ways of mapping the element boundaries to the textual content in the form of character
        positions. The preferred way of reaching such a mapping is the use of a primary data file
        which contains the bare text without annotations. The name of this file can be provided
        during the transformation call by specifying the stylesheet parameter <emphasis role="ital">primary-data</emphasis>. Providing the location of a primary data file leads to a
        comparison of the content of the primary data file and the textual content of the input file
        of the transformation. This guarantees primary data identity. If no primary data location is
        provided, the textual content of the input file is used to build up the primary data. This
        has certain disadvantages such as the lack of line breaks since these cannot easily be
        inferred from the textual content of an XML file. Furthermore, the automatic conversion of the
        textual content of the input file to be used as primary data relies on heuristics which have
        to detect white space characters. Because of the complexity of this task it is possible to
        get undesirable results.</para><para>After the building of segments, the second step of the transformation is to return
        layers on the basis of namespaces. For every namespace the corresponding elements are
        extracted from the initial inline annotation and copied into the layer, maintaining their
        embedding relations. At the same time the elements in the XStandoff layer are linked to the
        corresponding segments by ID/IDREF binding. In this manner one segment can serve as a reference
        for elements from different layers.</para><para>There are several additional optional parameters which can control certain aspects of
        the transformation from inline to XStandoff. The most important of them will be briefly
        outlined below.</para><itemizedlist><listitem><para><emphasis role="ital">virtual-root</emphasis> (data type: xs:string; default value:
            '')</para><para>Specifies a unique element (if no error occurs) which serves as the starting point
            of the conversion of the input file. By default the whole file will be converted
            starting at the document root. </para></listitem><listitem><para><emphasis role="ital">meta-root</emphasis> (data type: xs:string; default value:
            'header')</para><para>This parameter determines the location of metadata in the input XML document (if
            any). By this means it can be copied into the XStandoff instance. Analogous to the
            parameter virtual-root an element name is expected as a value of meta-root.</para></listitem><listitem><para><emphasis role="ital">local-xsd</emphasis> (data type: xs:boolean; default value:
            0)</para><para>Determines whether the corresponding XStandoff XML Schema files are stored locally
            or shall be taken from the WWW (i.e. from
              <code>http://www.xstandoff.net/2009/xstandoff/1.1</code>). By default the global XSDs
            are used.</para></listitem><listitem><para><emphasis role="ital">include-ws-segments</emphasis> (data type: xs:boolean; default
            value: 0)</para><para>XStandoff uses segments for referencing units of the primary data annotated in one
            or more annotation layers. Additional segments for non-character data (i.e. white space
            characters, such as blanks, line breaks, etc.) can be computed and returned by the
            stylesheet as well if the parameter include-ws-segments is set to '1'.</para></listitem><listitem><para><emphasis role="ital">all-layer</emphasis> (data type: xs:boolean; default value:
            0)</para><para>The output of an 'all-layer' (cf. <xref linkend="sec.inline-xstandoff"/>) containing
            elements present in all annotation layers depends on the specification of this
            parameter. By default its value is set to '0' which avoids the transfer of the
            respective elements into the 'all-layer' (cf. <xref linkend="sec.xsf.all"/>). </para></listitem></itemizedlist></section><section xml:id="sec.mergeXStandoff"><title>Merging XStandoff instances: <emphasis role="ital">mergeXSF.xsl</emphasis></title><para>Due to the frequent use of the ID/IDREF mechanism in XStandoff for establishing
        connections between <code>segment</code> elements (i.e. the limits of the respective text
        span in the primary data's character stream) and the corresponding annotation, manually
        merging XStandoff files seems quite unpromising. The XSLT stylesheet
          <emphasis role="ital">mergeXSF.xsl</emphasis> transforms two XStandoff instances into a single one containing
        the annotation levels (or layers) from both input files. The first XStandoff file is
        provided as the input file of the transformation, the second file's name has to be included
        via the stylesheet parameter <emphasis role="ital">merge-with</emphasis>. </para><para> The main problem is to adapt the segments from the involved XStandoff files to each
        other. On the one hand there are segments in the different files spanning over the same
        string of the primary data, but having distinct IDs. In this case the two segments have to
        be replaced by one. On the other hand there will be segments with the same ID, but spanning
        over different character positions. These have to get new unique IDs. The merging of the
        XStandoff files in general leads to a complete reorganization of the segment list making it
        necessary to update the segment references of the elements in the XStandoff layers. Once
        these references have been updated, the XStandoff layers are included in the new XStandoff file. </para><para> The reorganization of the segment list can be disabled by configuring the stylesheet
        parameter <emphasis role="ital">keep-segments</emphasis>. Specifying the value '1' preserves
        the segments of the input XStandoff file. However, the segments of the
        file provided by the <emphasis role="ital">merge-with</emphasis> parameter are always
        subject to a reorganization.</para><para>In addition, the stylesheet handles the optional output of the 'all-layer'. Setting the
        value of the <emphasis role="ital">all-layer</emphasis> parameter to '1' triggers the
        inclusion of this special layer containing the elements that are present in every single
        annotation layer, regardless of whether an 'all-layer' was present in the input files.
        However, no such layer is returned if there are no elements
        in the several layers which share the required features. </para><para> Currently the stylesheet only supports the merging of two single XStandoff files.
        Naturally this allows for a successive merging of more than two files. However, it would be
        more straightforward to have the possibility of merging more than two XStandoff files during
        a single transformation. A future version of the stylesheet supporting multi-file-merge is
        in preparation. </para></section><section><title>Deleting parts of XStandoff instances: <emphasis role="ital">removeXSFcontent.xsl</emphasis></title><para>For deletion of parts of an XStandoff instance the stylesheet <emphasis role="ital">removeXSFcontent.xsl</emphasis> can be applied. During the transformation call one has to
        supply the ID of the element to be deleted (either <code>level</code> or <code>layer</code>)
        via the value of the <emphasis role="ital">remove-ID</emphasis> parameter. The matching
        element is removed from the XStandoff file and the list of segments is updated, i.e. the
        segments which were referenced solely by descendant elements of the removed element are
        excluded. This, admittedly, again leads to a reorganization of the segments since these by
        default get a continuous numbering. </para><para>Similar to the usage of the <emphasis role="ital">mergeXSF.xsl</emphasis> stylesheet the
        reorganization of the segments can be disabled by the stylesheet parameter <emphasis role="ital">keep-segments</emphasis>. The value '1' lets the stylesheet keep the old
        segments, but without those only referenced by descendant elements of the deleted
        element.</para><para>Furthermore, the removed XStandoff content can be returned in a separate file.
        Specifying the value '1' for the stylesheet parameter <emphasis role="ital">return-removed</emphasis> causes the stylesheet to write the removed content
        to a new XStandoff instance. Accordingly, the new file will contain the segments
        referenced by elements of the removed content and the removed content itself. </para></section><section xml:id="sec.XStandoff2inline"><title>Building inline XStandoff annotations: <emphasis role="ital">XSF2inline.xsl</emphasis></title><para>In addition to the <emphasis role="ital">inline2XSF.xsl</emphasis> stylesheet there is a
        counterpart. <emphasis role="ital">XSF2inline.xsl</emphasis> creates an inline annotation on
        the basis of an XStandoff instance. The approach covers the handling of overlapping markup
        insofar as these structures are represented by milestone elements in the resulting inline
        annotation (cf. <xref linkend="lst.buergschaft"/> in <xref linkend="sec.xstandoff"/>).
        Concerning this matter, the first task of the stylesheet is to detect segments whose start
        and end position information constitute an overlap. These segments are split up into
        segments representing milestones so that they can be used as an adequate basis to build an
        inline annotation. The linear list of segments is processed recursively by taking the
        currently outermost segments (those which are not included in other segments' spans, which
        in turn are determined by their start and end positions in the character range of the
        primary data). The elements of the XStandoff layers which are referenced by the outermost
        segments are copied into the inline annotation. The segments which are embedded in the
        outermost segments are processed recursively. </para><para>However, copying the elements from the XStandoff layers has to be controlled by a
        mechanism regarding the possibility of elements from different layers referencing the same
        segment. These elements share the same positions for start and end tags and therefore a
        decision has to be made in which order they should be nested into one another. The optional
        stylesheet parameter <emphasis role="ital">sort-by</emphasis> refers to this circumstance.
        Its default value is 'measure' which means that a statistical analysis is performed by the
        stylesheet that diagnoses the embedding relations of all occurring element types by
        frequency. An expected result would be that elements representing sentence boundaries are
        embedded into those for paragraphs because this embedding relation is more frequent than
        vice versa. This approach is inadequate only for
        elements of different layers for which no definite statistical result for embedding can be
        achieved. For instance, there could be elements whose boundaries always share the same
        character positions, but which can clearly semantically be assigned to a certain embedding.
        This case cannot be covered by the statistical method. </para><para> In addition, the embedding can be based on the <code>priority</code> attribute of the
          <code>level</code> (in SGF version 1.0) or the <code>layer</code> (in XStandoff version
        1.1, cf. <xref linkend="sec.xstandoff"/>) element. This strategy can be accessed by
        specifying the value 'priority' for the parameter <emphasis role="ital">sort-by</emphasis>.
        Elements belonging to a level or layer with a low <code>priority</code> value are nested
        deeper in the inline annotation than those with higher values. In this way the user can specify the
        embedding manually, but the values of the attribute have to be set correctly to
        get the desired result. This method rests on the assumption that there can be a
        semantically grounded, definite decision for the embedding. The most promising concept appears to be a
        mixture of both approaches, which remains to be realized in future work. </para><para>There is an additional optional stylesheet parameter <emphasis role="ital">return-segID</emphasis> which can be very helpful to retain a connection between the
        XStandoff file and the resulting inline annotation. By default the value of this parameter
        is set to '1' which means that the <code>segment</code> attribute is retained throughout the
        conversion process. The parameter has been added mainly for control issues. </para></section><section xml:id="sec.demo"><title>Demonstration: Creating an inline XStandoff annotation</title><para> In this section the conversion of six original annotations into one inline XStandoff
        annotation will be outlined. This will demonstrate the function and execution of the several
        XSLT stylesheets which are involved in this process. An extract of the final result of the
        conversion process can be seen in <xref linkend="sec.inline-xstandoff"/> where the inline
        XStandoff annotation was introduced. </para><para> There are three major steps in order to create one inline XStandoff annotation out of
        the six input annotations: </para><orderedlist><listitem><para>Applying the stylesheet <emphasis role="ital">inline2XSF.xsl</emphasis> to each input annotation</para></listitem><listitem><para>Successively merging the resulting standoff XStandoff files into one with
            <emphasis role="ital">mergeXSF.xsl</emphasis></para></listitem><listitem><para>Creating an inline XStandoff instance via the stylesheet <emphasis role="ital">XSF2inline.xsl</emphasis></para></listitem></orderedlist><section><title>Applying inline2XSF.xsl</title><para> The six original annotation files are given by different annotations of Schiller's
          'Die Bürgschaft' which cover several linguistic fields: words, morphemes, syllables,
          verse, prose and phrase.<footnote><para>As already stated in <xref linkend="sec.inline-xstandoff"/> the single-layered
              annotation instances are available at <link xlink:href="http://coli.lili.uni-bielefeld.de/Texttechnologie/Forschergruppe/Phase1/sekimo/internet-praesentation/buergschaft.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://coli.lili.uni-bielefeld.de/Texttechnologie/Forschergruppe/Phase1/sekimo/internet-praesentation/buergschaft.html</link>.</para></footnote>
        </para><para>
          <xref linkend="lst.wort_anno"/> shows an extract of one of the original annotations. In
          case of missing namespaces these are created automatically by the stylesheet. For the sake
          of readability the namespace prefix 'wort' was added to the elements derived from this
          namespace.</para><figure xml:id="lst.wort_anno"><title>The original word level annotation of 'Die Bürgschaft'
            (<code>buergschaft-wort.xml</code>)</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;wort:text xmlns:wort="http://www.xstandoff.net/buergschaft/wort"&gt;
  &lt;wort:body&gt;
    &lt;wort:head&gt;Die Bürgschaft&lt;/wort:head&gt;
    &lt;wort:ws&gt; &lt;/wort:ws&gt;
    &lt;wort:bibl&gt;&lt;wort:author&gt;Friedrich Schiller&lt;/wort:author&gt;&lt;/wort:bibl&gt;
    &lt;wort:ws&gt; &lt;/wort:ws&gt;
    &lt;wort:p&gt;
      &lt;wort:s&gt;
        &lt;wort:w&gt;Zu&lt;/wort:w&gt;
        &lt;wort:w&gt;Dionys&lt;/wort:w&gt;
        &lt;wort:w&gt;dem&lt;/wort:w&gt;
        &lt;wort:w&gt;Tirannen&lt;/wort:w&gt;
        &lt;wort:w&gt;schlich&lt;/wort:w&gt;
        &lt;wort:w&gt;Damon&lt;/wort:w&gt;,
        &lt;!--...--&gt;
&lt;/wort:text&gt;</programlisting></figure><para>The corresponding primary data file <code>buergschaft-pd.txt</code> has the following plain text
          content:</para><figure xml:id="lst.buergschaft_pd"><title>Primary data of 'Die Bürgschaft' (<code>buergschaft-pd.txt</code>)</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">Die Bürgschaft Friedrich Schiller 
Zu Dionys dem Tirannen schlich Damon, den Dolch im Gewande, Ihn schlugen (...)</programlisting></figure><para>The result of the invoked transformation (<code>saxon buergschaft-wort.xml
            inline2XSF.xsl primary-data=buergschaft-pd.txt</code>) is an XStandoff instance
          containing the converted annotation including references to the respective spans in the
          primary data's character stream defined by the <code>segment</code> elements (cf. <xref linkend="lst.wort_xsf"/>). </para><figure xml:id="lst.wort_xsf"><title>The converted word level annotation of 'Die Bürgschaft'
            (<code>buergschaft-wort-xsf.xml</code>)</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
                xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1 ../xsd/xsf.xsd"
                xsfVersion="1.1"
                xml:id="wort-ns"&gt;
   &lt;xsf:primaryData start="0" end="5188"&gt;
      &lt;primaryDataRef xmlns="http://www.xstandoff.net/2009/xstandoff/1.1" uri="buergschaft-pd.txt"/&gt;
   &lt;/xsf:primaryData&gt;
   &lt;xsf:segmentation&gt;
      &lt;xsf:segment xml:id="seg1" type="char" start="0" end="5186"/&gt;
      &lt;xsf:segment xml:id="seg2" type="char" start="0" end="14"/&gt;
      &lt;xsf:segment xml:id="seg3" type="char" start="14" end="15"/&gt;
      &lt;xsf:segment xml:id="seg4" type="char" start="15" end="33"/&gt;
      &lt;xsf:segment xml:id="seg5" type="char" start="33" end="34"/&gt;
      &lt;!--...--&gt;
   &lt;/xsf:segmentation&gt;
   &lt;xsf:annotation&gt;
      &lt;xsf:level xml:id="wort-ns-level1"&gt;
         &lt;xsf:layer xmlns:wort="http://www.xstandoff.net/buergschaft/wort" priority="0"
                    xsi:schemaLocation="http://www.xstandoff.net/buergschaft/wort ../xsd/buergschaft_wort.xsd"&gt;
            &lt;wort:text xsf:segment="seg1"&gt;
               &lt;wort:body xsf:segment="seg1"&gt;
                  &lt;wort:head xsf:segment="seg2"/&gt;
                  &lt;wort:ws xsf:segment="seg3"/&gt;
                  &lt;wort:bibl xsf:segment="seg4"&gt;
                     &lt;wort:author xsf:segment="seg4"/&gt;
                  &lt;/wort:bibl&gt;
                  &lt;wort:ws xsf:segment="seg5"/&gt;
                  &lt;wort:p xsf:segment="seg6"&gt;
                     &lt;wort:s xsf:segment="seg7"&gt;
                        &lt;wort:w xsf:segment="seg8"/&gt;
                        &lt;wort:w xsf:segment="seg9"/&gt;
                        &lt;wort:w xsf:segment="seg10"/&gt;
                        &lt;!--...--&gt;
&lt;/xsf:corpusData&gt;</programlisting></figure><para>The other five input annotations are transformed analogously, serving as
          the basis for the next processing step: the application of the mergeXSF stylesheet.
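        </para><para>The remaining conversions can be sketched as follows. This is an illustration under assumptions rather than a record of the calls actually used: the input and output file names for the other five levels are hypothetical and merely follow the naming pattern of the word level files, and the <code>-o</code> option for naming the result file is borrowed from the merge call shown in the next section.</para><figure xml:id="lst.inline2xsf_calls"><title>Sketch of the remaining five <emphasis role="ital">inline2XSF.xsl</emphasis> calls (file names assumed)</title><programlisting xml:space="preserve"># Assumed file names, following the pattern buergschaft-wort.xml / buergschaft-wort-xsf.xml
saxon -o buergschaft-silbe-xsf.xml   buergschaft-silbe.xml   inline2XSF.xsl primary-data=buergschaft-pd.txt
saxon -o buergschaft-vers-xsf.xml    buergschaft-vers.xml    inline2XSF.xsl primary-data=buergschaft-pd.txt
saxon -o buergschaft-morphem-xsf.xml buergschaft-morphem.xml inline2XSF.xsl primary-data=buergschaft-pd.txt
saxon -o buergschaft-prosa-xsf.xml   buergschaft-prosa.xml   inline2XSF.xsl primary-data=buergschaft-pd.txt
saxon -o buergschaft-phrase-xsf.xml  buergschaft-phrase.xml  inline2XSF.xsl primary-data=buergschaft-pd.txt</programlisting></figure><para>Passing the same primary data file to every call lets the stylesheet compare each input annotation against identical primary data, which the subsequent merging step presupposes.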
        </para></section><section><title>Merging XStandoff instances</title><para>As mentioned above, several different XStandoff instances relying on the same primary
          data can be combined into a single XStandoff file. However, it is not yet possible to
          integrate the respective annotation levels and layers within a single transformation call.
          In order to get a single result instance, one has to use five calls in total. The first
          call merges two of the instances (e.g. <code>saxon -o &lt;RESULT&gt;.xml
            buergschaft-wort-xsf.xml mergeXSF.xsl merge-with=buergschaft-silbe-xsf.xml
            all-layer=1</code>). </para><para> The next four transformation calls merge the result
            (<code>&lt;RESULT&gt;.xml</code> from the previous call) with the single
          remaining XStandoff instances. Due to the stylesheet parameter <emphasis role="ital">all-layer</emphasis> which is set to <code>1</code> in the call, the result will
          contain a layer which stores those elements present in all layers (cf. <xref linkend="sec.mergeXStandoff"/>). <xref linkend="lst.all_xsf"/> shows an extract of the
          result of the five mergeXSF transformations. </para><figure xml:id="lst.all_xsf"><title>The merged XStandoff annotation of 'Die Bürgschaft'
            (<code>buergschaft-wort-prosa-silbe-vers-phrase-morphem-xsf.xml</code>)</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
                xsfVersion="1.1"
                xml:id="wort-ns-prosa-ns-silbe-ns-vers-ns-phrase-ns-morphem-ns"
                xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1 ../xsd/xsf.xsd"&gt;
   &lt;!-- ... --&gt;
   &lt;xsf:annotation&gt;
      &lt;xsf:level xml:id="all-level"&gt;
         &lt;xsf:layer xmlns:all="http://www.xstandoff.net/2009/all" priority="0"
                    xsi:schemaLocation="http://www.xstandoff.net/2009/all ../xsd/xsf.xsd"&gt;
            &lt;all:text xmlns:vers="http://www.xstandoff.net/buergschaft/vers"
                      xsf:segment="seg1"
                      vers:a="b"&gt;
               &lt;all:body xsf:segment="seg1"&gt;
                  &lt;all:head xsf:segment="seg2"/&gt;
                  &lt;!--...--&gt;
               &lt;/all:body&gt;
            &lt;/all:text&gt;
         &lt;/xsf:layer&gt;
      &lt;/xsf:level&gt;
      &lt;xsf:level xml:id="wort-ns-level1"&gt;
         &lt;xsf:layer xmlns:wort="http://www.xstandoff.net/buergschaft/wort" priority="0"
                    xsi:schemaLocation="http://www.xstandoff.net/buergschaft/wort ../xsd/buergschaft_wort.xsd"&gt;
            &lt;wort:p xsf:segment="seg6"&gt;
            &lt;!--...--&gt;
            &lt;/wort:p&gt;
         &lt;/xsf:layer&gt;
      &lt;/xsf:level&gt;
      &lt;xsf:level xml:id="prosa-ns-level1"&gt;
         &lt;xsf:layer xmlns:prosa="http://www.xstandoff.net/buergschaft/prosa" priority="0"
                    xsi:schemaLocation="http://www.xstandoff.net/buergschaft/prosa ../xsd/buergschaft_prosa.xsd"&gt;
            &lt;prosa:p xsf:segment="seg6"&gt;
            &lt;!--...--&gt;
&lt;/xsf:corpusData&gt;</programlisting></figure></section><section><title>Converting standoff to inline</title><para> The last step needed to obtain an inline XStandoff annotation
          from the six original input annotations is the transformation performed by the
          stylesheet <emphasis role="ital">XSF2inline.xsl</emphasis> (<code>saxon -o &lt;RESULT&gt;.xml
            buergschaft-wort-prosa-silbe-vers-phrase-morphem-xsf.xml XSF2inline.xsl</code>).
          Overlaps that may occur are handled by inserting the newly introduced
            <code>milestone</code> element. </para><figure xml:id="lst.inline_xsf"><title>The inline XStandoff annotation of 'Die Bürgschaft'
            (<code>buergschaft-wort-prosa-silbe-vers-phrase-morphem-inline.xml</code>)</title><!--<programlisting xml:space="preserve" linenumbering="numbered">--><programlisting xml:space="preserve">&lt;xsf:inline xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
   xmlns:silbe="http://www.xstandoff.net/buergschaft/silbe"
   xmlns:wort="http://www.xstandoff.net/buergschaft/wort"
   xmlns:vers="http://www.xstandoff.net/buergschaft/vers"
   xmlns:morphem="http://www.xstandoff.net/buergschaft/morphem"
   xmlns:prosa="http://www.xstandoff.net/buergschaft/prosa"
   xmlns:phrase="http://www.xstandoff.net/buergschaft/phrase" 
   xmlns:all="http://www.xstandoff.net/2009/all"&gt;
   &lt;all:text xsf:segment="seg1" vers:a="b"&gt;
      &lt;all:body xsf:segment="seg1"&gt;
         &lt;all:head xsf:segment="seg2"&gt;Die Bürgschaft&lt;/all:head&gt;
            &lt;all:bibl xsf:segment="seg3"&gt;
               &lt;all:author xsf:segment="seg3"&gt;Friedrich Schiller&lt;/all:author&gt;
            &lt;/all:bibl&gt;
            &lt;silbe:p xsf:segment="seg4"&gt;
               &lt;wort:p xsf:segment="seg4"&gt;
                  &lt;vers:lg xsf:segment="seg4"&gt;
                     &lt;morphem:p xsf:segment="seg4"&gt;
                        &lt;prosa:p xsf:segment="seg4"&gt;
                           &lt;phrase:p xsf:segment="seg4"&gt;
                              &lt;!--...--&gt;
                                &lt;xsf:milestone xsf:segment="seg16~1" xsf:unit="morphem:m" xsf:n="1" xsf:charpos="49"
                                               xsf:type="start"/&gt;
                                &lt;wort:w xsf:segment="seg15"&gt;
                                   &lt;silbe:syll xsf:segment="seg17"&gt;Ti&lt;/silbe:syll&gt;
                                   &lt;silbe:syll xsf:segment="seg18"&gt;ran&lt;/silbe:syll&gt;
                                   &lt;silbe:syll xsf:segment="seg19"&gt;n&lt;xsf:milestone xsf:segment="seg16~2" 
                                               xsf:unit="morphem:m" xsf:n="2" xsf:charpos="55" xsf:type="end"/&gt;
                                      &lt;morphem:m xsf:segment="seg20"&gt;en&lt;/morphem:m&gt;
                                   &lt;/silbe:syll&gt;
                                &lt;/wort:w&gt;
                              &lt;!--...--&gt;
&lt;/xsf:inline&gt;</programlisting></figure><para> Note that the above listing contains the representation of an overlap. Since the morpheme
          and syllable levels contain overlapping annotations
            (<code>&lt;morphem:m&gt;Tiran&lt;silbe:syll&gt;n&lt;/morphem:m&gt;en&lt;/silbe:syll&gt;</code>),
          <code>milestone</code> elements have been added to mark the original boundaries of the <code>morphem:m</code>
          element.</para></section></section></section><section xml:id="sec.xquery"><title>Analyzing cross-level relations with XStandoff and XQuery</title><para>Since XStandoff is a meta language based on standard XML, a variety of XML processing
      tools may be used. In addition to the XStandoff toolkit, an XQuery script is available as a
      starting point for analyzing relations between elements derived from different
      annotation levels (and layers, respectively). The query takes a valid XStandoff instance as
      input and requires the user to specify a target element on one of the included layers, which serves as the starting
      point for finding relations to other elements derived from different annotation layers.
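</para><para>The kind of comparison such a query performs can be illustrated by a minimal sketch. The function <code>local:relation</code> below is hypothetical and not part of the XStandoff toolkit: it assumes that the start and end character offsets of the target element and of a candidate element are already known, and it distinguishes only the three relation names that occur in the excerpt in <xref linkend="lst.xquery_output"/>. All other configurations are lumped together as <code>other</code>, although the actual query may well distinguish further relations.</para><figure xml:id="lst.relation_sketch"><title>A hypothetical XQuery sketch of the relation classification (not part of the toolkit)</title><programlisting xml:space="preserve">xquery version "1.0";

(: Hypothetical helper: name, signature and the "other" fallback are
   assumptions made for illustration only. :)
declare function local:relation(
  $tStart as xs:integer, $tEnd as xs:integer,  (: span of the target element :)
  $cStart as xs:integer, $cEnd as xs:integer   (: span of the candidate element :)
) as xs:string {
  if ($tStart eq $cStart and $tEnd eq $cEnd) then
    "identical"              (: both spans coincide :)
  else if ($tStart eq $cStart) then
    "startPointIdentical"    (: same start offset, different end offset :)
  else if ($cStart lt $tStart and $tEnd lt $cEnd) then
    "inclusion"              (: target lies strictly inside the candidate :)
  else
    "other"                  (: not covered by this sketch :)
};

(: Offsets taken from the excerpt of the query output :)
(
  local:relation(0, 1, 0, 1),     (: eHD:w "A" vs. tree:token :)
  local:relation(0, 1, 0, 1091),  (: eHD:w "A" vs. eHD:text :)
  local:relation(2, 6, 0, 127)    (: eHD:w "king" vs. eHD:s :)
)</programlisting></figure><para>For the three offset pairs, which are copied from the excerpt, the sketch returns <code>identical</code>, <code>startPointIdentical</code> and <code>inclusion</code>, matching the relations reported there.</para><para>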
      For this example we have chosen the English translation of Brothers Grimm's 'The three lazy
      ones' which was annotated with two different linguistic parsers: the freely available TreeTagger<footnote><para>Download available at <link xlink:href="http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/</link>.</para></footnote> and the POS tagger that is part of the eHumanities Desktop<footnote><para>The eHumanities Desktop can be found at <link xlink:href="http://hudesktop.hucompute.org/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://hudesktop.hucompute.org/</link>.</para></footnote> which can be used at no charge as an online resource (cf. <xref linkend="Gleim2009"/>). The resulting XStandoff instance can be downloaded at <link xlink:href="http://www.xstandoff.net/examples/grimm.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.xstandoff.net/examples/grimm.html</link>. The output of the query contains the
      target element and the corresponding elements on other layers, starting with elements that share
      the same segment (and therefore the identical text span in the primary data). The
      relations identified by the XQuery are based on the work done by <xref linkend="Durusau2002"/>. An excerpt of the output can be seen in <xref linkend="lst.xquery_output"/>.</para><figure xml:id="lst.xquery_output"><title>Excerpt of the output of the XQuery applied to 'The three lazy ones'</title><!--<programlisting linenumbering="numbered" startinglinenumber="1" xml:space="preserve">--><programlisting xml:space="preserve">&lt;results&gt;
  &lt;relations docID="grimm_the_three_lazy_ones"&gt;
    &lt;targetElement xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" start="0" end="1"
      name="eHD:w" xml:id="w1_wo1" type="XY" lemma="A" xsf:segment="seg3" part="N"&gt;
      &lt;identical start="0" end="1" name="tree:token" word="a" pos="DT" xsf:segment="seg3"/&gt;
      &lt;startPointIdentical start="0" end="1091" name="eHD:text" xml:id="t0" xsf:segment="seg1"/&gt;
      &lt;startPointIdentical start="0" end="1091" name="eHD:body" xml:id="b0" xsf:segment="seg1"/&gt;
      &lt;startPointIdentical start="0" end="1091" name="eHD:div" xml:id="w1_di1" xsf:segment="seg1"
        part="N" org="uniform" sample="complete"/&gt;
      &lt;startPointIdentical start="0" end="1091" name="eHD:p" xml:id="w1_pa1" xsf:segment="seg1"/&gt;
      &lt;startPointIdentical start="0" end="127" name="eHD:s" xml:id="w1_se1" xsf:segment="seg2"
        part="N"/&gt;
      &lt;startPointIdentical start="0" end="1091" name="tree:corpus" xsf:segment="seg1"/&gt;
    &lt;/targetElement&gt;
    &lt;targetElement xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" start="2" end="6"
      name="eHD:w" xml:id="w1_wo2" type="NN" lemma="king" xsf:segment="seg4" part="N"&gt;
      &lt;identical start="2" end="6" name="tree:token" word="king" pos="NN" xsf:segment="seg4"/&gt;
      &lt;inclusion start="0" end="1091" name="eHD:text" xml:id="t0" xsf:segment="seg1"/&gt;
      &lt;inclusion start="0" end="1091" name="eHD:body" xml:id="b0" xsf:segment="seg1"/&gt;
      &lt;inclusion start="0" end="1091" name="eHD:div" xml:id="w1_di1" xsf:segment="seg1" part="N"
        org="uniform" sample="complete"/&gt;
      &lt;inclusion start="0" end="1091" name="eHD:p" xml:id="w1_pa1" xsf:segment="seg1"/&gt;
      &lt;inclusion start="0" end="127" name="eHD:s" xml:id="w1_se1" xsf:segment="seg2" part="N"/&gt;
      &lt;inclusion start="0" end="1091" name="tree:corpus" xsf:segment="seg1"/&gt;
    &lt;/targetElement&gt;
    &lt;!-- ... --&gt;
  &lt;/relations&gt;
&lt;/results&gt;</programlisting></figure><para>In this example the <code>eHD:w</code> element (derived from the eHumanities Desktop's
      tagger) is used as target element. An element that shares the same segment of the primary data
      is the <code>tree:token</code> element (marked as <code>identical</code>) derived from
      the TreeTagger parse. Since the first target element starts at the very beginning of the primary data, other elements share its
      starting point, such as <code>eHD:text</code>, <code>eHD:body</code>,
        <code>eHD:div</code>, <code>eHD:p</code> and <code>eHD:s</code> (on the same layer) and
        <code>tree:corpus</code> (derived from the TreeTagger output); these are marked as <code>startPointIdentical</code>. The second target element
      (the second <code>eHD:w</code> element) is again identical to a <code>tree:token</code> element and is
      included (in the sense of containment, cf. <xref linkend="sec.containment"/>) in a variety of
      other elements, which are marked as <code>inclusion</code>.</para><para>In this scenario XStandoff can be used for different purposes: for comparing different
      linguistic resources' output in terms of both quality and quantity of the annotation, for
      refining and boosting overall annotation performance by combining several resources, and of
      course for analyzing virtual parenthoods between elements derived from different annotation
      layers (cf. <xref linkend="Stührenberg2008"/> for a real-world example).</para></section><section><title>Conclusion and Future Work</title><para>We have demonstrated both the further developments that have been made to the Sekimo
      Generic Format, resulting in XStandoff, and the respective XSLT stylesheets that can be used
      to generate and modify XStandoff instances. We plan to release XStandoff's current development
      version as a stable version in step with the finalization of XML Schema 1.1, which has not yet reached
      Recommendation status at the time of writing (<xref linkend="XMLSchema2009"/>) and which would
      allow us to drop the embedded Schematron assertions. Regarding the XSLT part of the
      XStandoff toolkit, future enhancements are conceivable in the support of the
      containment/dominance feature. Since the creation of XStandoff instances is a multi-step
      process (cf. <xref linkend="sec.demo"/>), providing an XProc pipeline (cf. <xref linkend="XProc2009"/>) is another option. Further perspectives include the building
      of a corpus of multi-dimensionally annotated (and possibly overlapping) markup for further
      developments regarding both XStandoff's specification and the respective XSLT stylesheets (a
      starting point has been made at <link xlink:href="http://www.xstandoff.net/examples/index.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.xstandoff.net/examples/index.html</link>). This corpus should be gathered
      together with other interested parties, e.g. the Markup Languages for Complex Documents (MLCD)
      project, the XCONCUR developers and the LMNL community. Alternative XQuery realizations of the
      discussed XSLT stylesheets should give clues about further optimizations regarding the
      processing times (cf. <xref linkend="Stührenberg2008"/> for a small test example). In
      addition, it would be promising to share our developments with the framework for format
      translations described by <xref linkend="Marinelli2008"/>.</para><para>Although XStandoff's formal model is considered a multi-rooted tree (at least
      if one refrains from using discontinuous segments), further work
      has to be done regarding its expressiveness compared to other approaches, including GODDAG
      structures (cf. <xref linkend="Sperberg-McQueen2004"/>, <xref linkend="Sperberg-McQueen2008a"/>, <xref linkend="Marcoux2008"/> and <xref linkend="Marcoux2008a"/>) and LMNL. This holds
      especially for an examination of the containment and dominance relations described in <xref linkend="Sperberg-McQueen2008"/>, but also for other formal aspects of concurrent
      markup.</para></section><bibliography><title>References</title><bibliomixed xml:id="Alink2006" xreflabel="Alink et al., 2006">Alink, W., Bhoedjang, R., de
      Vries, A. P., and Boncz, P. A. <emphasis role="ital">Efficient XQuery Support for Stand-Off
        Annotation</emphasis>. In: Proceedings of the 3rd International Workshop on XQuery
      Implementation, Experience and Perspectives, in cooperation with ACM SIGMOD, Chicago, USA,
      2006</bibliomixed><bibliomixed xml:id="Alink2006a" xreflabel="Alink et al., 2006a">Alink, W., Jijkoun, V., Ahn,
      D., and de Rijke, M. <emphasis role="ital">Representing and Querying Multi-dimensional Markup
        for Question Answering</emphasis>. In: Proceedings of the 5th EACL Workshop on NLP and XML
      (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing, Trento,
      2006</bibliomixed><bibliomixed xml:id="Bird1999" xreflabel="Bird and Liberman, 1999">Bird, S. and Liberman, M.
        <emphasis role="ital">Annotation graphs as a framework for multidimensional linguistic data
        analysis</emphasis>. In: Proceedings of the Workshop "Towards Standards and Tools for
      Discourse Tagging", pages 1–10. Association for Computational Linguistics, 1999</bibliomixed><bibliomixed xml:id="Bird2001" xreflabel="Bird and Liberman, 2001">Bird, S. and Liberman, M.
        <emphasis role="ital">A formal framework for linguistic annotation</emphasis>. Speech
      Communication, 33(1–2): pages 23–60, 2001. Doi: <biblioid class="doi">10.1016/S0167-6393(00)00068-6</biblioid></bibliomixed><bibliomixed xml:id="Bird2006" xreflabel="Bird et al., 2006">Bird, S., Chen, Y., Davidson, S.,
      Lee, H. and Zheng, Y. <emphasis role="ital">Designing and Evaluating an XPath Dialect for
        Linguistic Queries</emphasis>. In: Proceedings of the 22nd International Conference on Data
      Engineering (ICDE), Atlanta, USA, 2006. Doi: <biblioid class="doi">10.1109/ICDE.2006.48</biblioid> </bibliomixed><bibliomixed xml:id="Burnard2008" xreflabel="Burnard and Bauman, 2008">Burnard, L., and Bauman,
      S. (eds.). <emphasis role="ital">TEI P5: Guidelines for Electronic Text Encoding and
        Interchange</emphasis>. Published for the TEI Consortium by Humanities Computing Unit,
      University of Oxford, Oxford, Providence, Charlottesville, Nancy, 2008</bibliomixed><bibliomixed xml:id="Carletta2003" xreflabel="Carletta et al., 2003">Carletta, J., Kilgour, J.,
      O’Donnell, T. J., Evert, S. and Voormann, H. <emphasis role="ital">The NITE Object Model
        Library for Handling Structured Linguistic Annotation on Multimodal Data Sets</emphasis>.
      In: Proceedings of the EACL Workshop on Language Technology and the Semantic Web (3rd Workshop
      on NLP and XML (NLPXML-2003)), Budapest, Hungary, 2003</bibliomixed><bibliomixed xml:id="Carletta2005" xreflabel="Carletta et al., 2005">Carletta, J., Evert, S.,
      Heid, U. and Kilgour, J. <emphasis role="ital">The NITE XML Toolkit: data model and query
        language</emphasis>. In: Language Resources and Evaluation, Springer, Dordrecht, 2005, 39,
      pages 313-334. Doi: <biblioid class="doi">10.1007/s10579-006-9001-9</biblioid></bibliomixed><bibliomixed xml:id="Cowan2006" xreflabel="Cowan et al., 2006">Cowan, J., Tennison J., and Piez,
      W. <emphasis role="ital">LMNL update</emphasis>. In: Proceedings of Extreme Markup Languages,
      Montréal, Québec, 2006</bibliomixed><bibliomixed xml:id="DeRose2004" xreflabel="DeRose, 2004">DeRose, S. J. <emphasis role="ital">Markup Overlap: A Review and a Horse</emphasis>. In: Proceedings of Extreme Markup
      Languages, Montréal, Québec, 2004</bibliomixed><bibliomixed xml:id="Dipper2005" xreflabel="Dipper, 2005">Dipper, S. <emphasis role="ital">XML-based stand-off representation and exploitation of multi-level linguistic
        annotation</emphasis>. In: Proceedings of Berliner XML Tage 2005 (BXML 2005), pages 39–50,
      Berlin, Germany, 2005</bibliomixed><bibliomixed xml:id="Dipper2007" xreflabel="Dipper et al., 2007">Dipper, S., Götze, M., Küssner,
      U. and Stede, M. <emphasis role="ital">Representing and Querying Standoff XML</emphasis>. In:
      Rehm, G., Witt, A. and Lemnitzer, L. (eds.), Datenstrukturen für linguistische Ressourcen und
      ihre Anwendungen. Data Structures for Linguistic Resources and Applications. Proceedings of
      the Biennial GLDV Conference 2007, pages 337–346, Tübingen, 2007. Gunter Narr
      Verlag</bibliomixed><bibliomixed xml:id="Durusau2002" xreflabel="Durusau and O'Donnell, 2002">Durusau, P. and
      O'Donnell, M. B. <emphasis role="ital">Concurrent Markup for XML Documents</emphasis>. In:
      Proceedings of the XML Europe conference 2002</bibliomixed><bibliomixed xml:id="Durusau2004" xreflabel="Durusau and O'Donnell, 2004">Durusau, P. and
      O'Donnell, M. B. <emphasis role="ital">Tabling the Overlap Discussion</emphasis>. In:
      Proceedings of Extreme Markup Languages, Montréal, Québec, 2004</bibliomixed><bibliomixed xml:id="Goecke2009" xreflabel="Goecke et al., 2009">Goecke, D., Lüngen, H.,
      Metzing, D., Stührenberg, M. and Witt, A. <emphasis role="ital">Different Views on Markup.
        Distinguishing levels and layers</emphasis>. In: Linguistic modeling of information and
      Markup Languages. Contributions to language technology. Springer, 2009. To
      appear</bibliomixed><bibliomixed xml:id="Gleim2009" xreflabel="Gleim et al., 2009">Gleim, R., Waltinger, U., Ernst,
      A., Mehler, A., Esch, D., and Feith, T. <emphasis role="ital">The eHumanities Desktop
        – An Online System for Corpus Management and Analysis in Support of Computing in
        the Humanities</emphasis>. In: Proceedings of the Demonstrations Session of the 12th
      Conference of the European Chapter of the Association for Computational Linguistics EACL 2009,
      30 March – 3 April, Athens, 2009</bibliomixed><bibliomixed xml:id="Huitfeldt2001" xreflabel="Huitfeldt and Sperberg-McQueen, 2001">Huitfeldt,
      C. and Sperberg-McQueen, C. M. <emphasis role="ital">TexMECS: An experimental markup
        meta-language for complex documents</emphasis>. Markup Languages and Complex Documents
      (MLCD) Project, February 2001</bibliomixed><bibliomixed xml:id="Iacob2005" xreflabel="Iacob and Dekhtyar, 2005">Iacob, I. E. and Dekhtyar,
      A. <emphasis role="ital">Processing XML documents with overlapping hierarchies</emphasis>. In:
      JCDL '05: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, ACM Press,
      2005, pages 409-409. Doi: <biblioid class="doi">10.1145/1065385.1065513</biblioid></bibliomixed><bibliomixed xml:id="Iacob2005a" xreflabel="Iacob and Dekhtyar, 2005a">Iacob, I. E. and
      Dekhtyar, A. <emphasis role="ital">Towards a Query Language for Multihierarchical XML:
        Revisiting XPath</emphasis>. In: Proceedings of the 8th International Workshop on the Web
      &amp; Databases (WebDB 2005), 2005, pages 49-54</bibliomixed><bibliomixed xml:id="Ide2007a" xreflabel="Ide and Romary, 2007">Ide, N. and Romary, L. <emphasis role="ital">Towards International Standards for Language Resources</emphasis>. In: Dybkjaer,
      L., Hemsen, H., and Minker, W., (eds.), Evaluation of Text and Speech Systems, pages 263-284.
      Springer, 2007</bibliomixed><bibliomixed xml:id="Ide2007" xreflabel="Ide and Suderman, 2007">Ide, N. and Suderman, K.
        <emphasis role="ital">GrAF: A Graph-based Format for Linguistic Annotations</emphasis>. In:
      Proceedings of the Linguistic Annotation Workshop, pages 1-8, Prague, Czech Republic.
      Association for Computational Linguistics, 2007</bibliomixed><bibliomixed xml:id="RELAX2003" xreflabel="ISO/IEC 19757-2:2003">ISO/IEC 19757-2:2003. <emphasis role="ital">Information technology – Document Schema Definition Language (DSDL) –
        Part 2: Regular-grammar-based validation – RELAX NG (ISO/IEC 19757-2)</emphasis>.
      International Standard, International Organization for Standardization, Geneva,
      2003</bibliomixed><bibliomixed xml:id="Schematron" xreflabel="ISO/IEC 19757-3:2006">ISO/IEC 19757-3:2006.
        <emphasis role="ital">Information technology – Document Schema Definition Language (DSDL)
        – Part 3: Rule-based validation – Schematron</emphasis>. International standard,
      International Organization for Standardization, Geneva, 2006</bibliomixed><bibliomixed xml:id="Jagadish2004" xreflabel="Jagadish et al., 2004">Jagadish, H. V.,
      Lakshmanan, L. V. S., Scannapieco, M., Srivastava, D. and Wiwatwattana, N. <emphasis role="ital">Colorful XML: One hierarchy isn’t enough</emphasis>. In: Proceedings of ACM
      SIGMOD International Conference on Management of Data (SIGMOD 2004), pages 251–262, Paris,
      June 13-18 2004. ACM Press New York, NY, USA. Doi: <biblioid class="doi">10.1145/1007568.1007598</biblioid></bibliomixed><bibliomixed xml:id="Kay2007" xreflabel="Kay, 2007">Kay, M. <emphasis role="ital">XSL
        Transformations (XSLT) Version 2.0</emphasis>. World Wide Web Consortium. 2007. –
      W3C Recommendation</bibliomixed><bibliomixed xml:id="Kay2008" xreflabel="Kay 2008">Kay, M. <emphasis role="ital">XSLT 2.0 and
        XPath 2.0 Programmer’s Reference</emphasis>. Wiley Publishing, Indianapolis, 4th edition,
      2008</bibliomixed><bibliomixed xml:id="Marinelli2008" xreflabel="Marinelli et al., 2008">Marinelli, P., Vitali,
      F., and Zacchiroli, S. <emphasis role="ital">Towards the unification of formats for
        overlapping markup</emphasis>. In: New Review of Hypermedia and Multimedia, 14(1): pages
      57-94, 2008. Doi: <biblioid class="doi">10.1080/13614560802316145</biblioid></bibliomixed><bibliomixed xml:id="Marcoux2008" xreflabel="Marcoux 2008">Marcoux, Y. <emphasis role="ital">Graph characterization of overlap-only TexMECS and other overlapping markup
        formalisms</emphasis>. In: Proceedings of Balisage: The Markup Conference 2008, Montréal, Canada,
      2008</bibliomixed><bibliomixed xml:id="Marcoux2008a" xreflabel="Marcoux 2008a">Marcoux, Y. <emphasis role="ital">Variants of GODDAGs and suitable first-layer semantics</emphasis>. Presentation given at the
      GODDAG workshop, Amsterdam, 1-5 December 2008</bibliomixed><bibliomixed xml:id="Pianta2004" xreflabel="Pianta and Bentivogli, 2004">Pianta, E. and
      Bentivogli, L. <emphasis role="ital">Annotating Discontinuous Structures in XML: the
        Multiword Case</emphasis>. In: Proceedings of LREC 2004 Workshop on "XML-based richly
      annotated corpora", pages 30–37, Lisbon, Portugal, 2004</bibliomixed><bibliomixed xml:id="Poesio2009" xreflabel="Poesio et al., 2009">Poesio, M., Diewald, N.,
      Stührenberg, M., Chamberlain, J., Jettka, D., Goecke, D. and Kruschwitz, U. <emphasis role="ital">Markup Infrastructure for the Anaphoric Bank, Part I: Supporting Web
        Collaboration</emphasis>. In: Mehler, A., Kühnberger, K.-U., Lobin, H., Lüngen, H., Storrer,
      A. and Witt, A. (eds.), Modelling, Learning and Processing of Text Technological Data
      Structures, Dordrecht: Springer, Berlin, New York. To appear</bibliomixed><bibliomixed xml:id="Schonefeld2007" xreflabel="Schonefeld, 2007">Schonefeld, O. <emphasis role="ital">XCONCUR and XCONCUR-CL: A constraint-based approach for the validation of
        concurrent markup</emphasis>. In: Rehm, G., Witt, A., Lemnitzer, L. (eds.), Datenstrukturen
      für linguistische Ressourcen und ihre Anwendungen. Data Structures for Linguistic Resources
      and Applications. Proceedings of the Biennial GLDV Conference 2007, Tübingen, Germany, 2007.
      Gunter Narr Verlag</bibliomixed><bibliomixed xml:id="Sperberg-McQueen2000" xreflabel="Sperberg-McQueen et al., 2000">Sperberg-McQueen, C. M., Huitfeldt, C. and Renear, A. <emphasis role="ital">Meaning and
        Interpretation of markup</emphasis>. Markup Languages – Theory &amp; Practice,
      2, pages 215-234, 2000. Doi: <biblioid class="doi">10.1162/109966200750363599</biblioid></bibliomixed><bibliomixed xml:id="Sperberg-McQueen2002" xreflabel="Sperberg-McQueen et al., 2002">Sperberg-McQueen, C. M., Dubin, D., Huitfeldt, C. and Renear, A. <emphasis role="ital">Drawing inferences on the basis of markup</emphasis>. In: Proceedings of Extreme Markup
      Languages, 2002</bibliomixed><bibliomixed xml:id="Sperberg-McQueen2004" xreflabel="Sperberg-McQueen and Huitfeldt, 2004">Sperberg-McQueen, C. M. and Huitfeldt, C. <emphasis role="ital">GODDAG: A Data Structure for
        Overlapping Hierarchies</emphasis>. In: King, P. and Munson, E. V. (eds.), Proceedings of
      the 5th International Workshop on the Principles of Digital Document Processing (PODDP 2000),
      volume 2023 of Lecture Notes in Computer Science, pages 139–160. Springer, 2004</bibliomixed><bibliomixed xml:id="Sperberg-McQueen2006" xreflabel="Sperberg-McQueen, 2006">Sperberg-McQueen,
      C. M. <emphasis role="ital">Rabbit/Duck grammars: a validation method for overlapping
        structures</emphasis>. In: Proceedings of Extreme Markup Languages, Montréal, Québec,
      2006</bibliomixed><bibliomixed xml:id="Sperberg-McQueen2007" xreflabel="Sperberg-McQueen, 2007">Sperberg-McQueen,
      C. M. <emphasis role="ital">Representation of overlapping structures</emphasis>. In:
      Proceedings of Extreme Markup Languages, Montréal, Québec, 2007</bibliomixed><bibliomixed xml:id="Sperberg-McQueen2008" xreflabel="Sperberg-McQueen and Huitfeldt, 2008">Sperberg-McQueen, C. M. and Huitfeldt, C. <emphasis role="ital">Markup Discontinued:
        Discontinuity in TexMecs, Goddag structures, and rabbit/duck grammars</emphasis>. Presented at Balisage: The Markup Conference 2008, Montréal, Canada, August 12 - 15, 2008. In: Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). Doi: <biblioid class="doi">10.4242/BalisageVol1.Sperberg-McQueen01</biblioid></bibliomixed><bibliomixed xml:id="Sperberg-McQueen2008a" xreflabel="Sperberg-McQueen and Huitfeldt, 2008a">Sperberg-McQueen, C. M. and Huitfeldt, C. <emphasis role="ital">GODDAG</emphasis>. Presented at the Goddag workshop, Amsterdam, 1-5 December 2008</bibliomixed><bibliomixed xml:id="Stührenberg2007" xreflabel="Stührenberg et al., 2007">Stührenberg, M.,
      Goecke, D., Diewald, N., Cramer, I. and Mehler, A. <emphasis role="ital">Web-based annotation
        of anaphoric relations and lexical chains</emphasis>. In: Proceedings of the Linguistic
      Annotation Workshop (LAW), pages 140–147, Prague. Association for Computational Linguistics,
      2007</bibliomixed><bibliomixed xml:id="Stührenberg2008" xreflabel="Stührenberg and Goecke, 2008">Stührenberg, M.
      and Goecke, D. <emphasis role="ital">SGF – An integrated model for multiple
        annotations and its application in a linguistic domain</emphasis>. Presented at Balisage: The Markup Conference 2008, Montréal, Canada, August 12 - 15, 2008. In: Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). Doi: <biblioid class="doi">10.4242/BalisageVol1.Stuehrenberg01</biblioid></bibliomixed><bibliomixed xml:id="Tennison2002" xreflabel="Tennison, 2002">Tennison, J. <emphasis role="ital">Layered Markup and Annotation Language (LMNL)</emphasis>. In: Proceedings of Extreme Markup
      Languages, Montréal, Québec, 2002</bibliomixed><bibliomixed xml:id="Tennison2007" xreflabel="Tennison, 2007">Tennison, J. <emphasis role="ital">Creole: Validating Overlapping Markup</emphasis>. In: Proceedings of XTech 2007: The
      Ubiquitous Web Conference, 2007</bibliomixed><bibliomixed xml:id="Thompson1997" xreflabel="Thompson and McKelvie, 1997">Thompson, H. S. and
      McKelvie, D. <emphasis role="ital">Hyperlink semantics for standoff markup of read-only
        documents</emphasis>. In: Proceedings of SGML Europe ’97: The next decade – Pushing the
      Envelope, pages 227–229, Barcelona, 1997</bibliomixed><bibliomixed xml:id="Waltinger2008" xreflabel="Waltinger et al., 2008">Waltinger, U.,
      Mehler, A., and Stührenberg, M. <emphasis role="ital">An Integrated Model of Lexical Chaining:
        Application, Resources and its Format</emphasis>. In: Proceedings of the 9th Conference on
      Natural Language Processing (KONVENS 2008), 2008</bibliomixed><bibliomixed xml:id="XProc2009" xreflabel="XProc 2009">Walsh, N., Milowski, A., and Thompson, H.
      S. <emphasis role="ital">XProc: An XML Pipeline Language</emphasis>. W3C Candidate Recommendation 28 May 2009, World
      Wide Web Consortium, 2009</bibliomixed><bibliomixed xml:id="Witt2002" xreflabel="Witt, 2002">Witt, A. <emphasis role="ital">Meaning
        and interpretation of concurrent markup</emphasis>. In: Proceedings of ALLC-ACH2002, Joint
      Conference of the ALLC and ACH, 2002</bibliomixed><bibliomixed xml:id="Witt2004" xreflabel="Witt, 2004">Witt, A. <emphasis role="ital">Multiple
        hierarchies: New Aspects of an Old Solution</emphasis>. In: Proceedings of Extreme Markup
      Languages, 2004</bibliomixed><bibliomixed xml:id="Witt2009" xreflabel="Witt et al., 2009">Witt, A., Rehm, G., Hinrichs, E.,
      Lehmberg, T. and Stegmann, J. <emphasis role="ital">SusTEInability of Linguistic Resources through Feature
        Structures</emphasis>. In: Literary and Linguistic Computing, 24(3): pages 363-372, 2009. Doi: <biblioid class="doi">10.1093/llc/fqp024</biblioid></bibliomixed><bibliomixed xml:id="Witt2009a" xreflabel="Witt et al., 2009a">Witt, A., Stührenberg, M.,
      Goecke, D. and Metzing, D. <emphasis role="ital">Integrated Linguistic Annotation Models and
        their Application in the Domain of Antecedent Detection</emphasis>. In: Mehler, A.,
      Kühnberger, K.-U., Lobin, H., Lüngen, H., Storrer, A. and Witt, A. (eds.), Modelling, Learning
      and Processing of Text Technological Data Structures, Dordrecht: Springer, Berlin, New York.
      To appear</bibliomixed><bibliomixed xml:id="XMLSchema2009" xreflabel="XML Schema 1.1 Part 1, 2009">W3C XML Schema
      Definition Language (XSD) 1.1 Part 1: Structures. W3C Candidate Recommendation 30 April 2009, World Wide Web
      Consortium. Online: <link xlink:href="http://www.w3.org/TR/2009/CR-xmlschema11-1-20090430/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/2009/CR-xmlschema11-1-20090430/</link></bibliomixed></bibliography></article>
