<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2" xml:id="Bal2009pero0115"><title>Towards markup support for full GODDAGs and beyond: the EARMARK approach</title><info><confgroup><conftitle>Balisage: The Markup Conference 2009</conftitle><confdates>August 11 - 14, 2009</confdates></confgroup><abstract><para>One of the most evident tenets of the literature on overlapping markup is that the
                philosophy of documents as trees (as dictated by meta-markup languages such as SGML
                and XML) is a simplification that sometimes fails and requires corrections. These
                corrections have been proposed at the markup level (e.g., milestones, segmentation),
                at the meta-markup level (e.g., LMNL, TexMecs, XCONCUR, etc.) or at the level of the
                abstract model (e.g., GODDAG). Unfortunately full GODDAGs do not allow
                linearizations in general, and as such a restricted version of GODDAG, r-GODDAG, has
                been proposed that is guaranteed to be linearizable (in TexMecs) and still allows
                many nice features beyond trees.</para><para>In this paper we argue that the problem of linearizing more-than-hierarchical
                structures lies basically in the embedding of markup within content and that no such
                problem arises with an appropriate standoff approach, which is able to represent full
                GODDAGs without restrictions. This gives ample opportunities to deal with
                interesting markup features that are describable with GODDAGs but not with
                r-GODDAGs, such as non-contiguous elements and virtual elements.</para><para>Besides, we discuss whether a specific constraint of full GODDAGs is really
                necessary once all residual hopes of embeddability are given up, and we further
                propose a minimal extension to GODDAG, genially called "extended GODDAG" (e-GODDAG)
                that, by removing the requirement for names in non-terminal nodes, adds support for
                additional interesting markup features such as content repetitions. In truth,
                e-GODDAGs are even less embeddable than full GODDAGs, but they are just as easily
                dealt with by using stand-off markup.</para><para>We further propose a meta-syntax for non-embedded markup, called EARMARK, that can
                be used for stand-off annotations of textual content, and that naturally represents
                e-GODDAGs with fully W3C-compliant technologies. EARMARK is based on an
                ontologically precise definition of markup that instantiates the markup of a text
                document as an OWL document, and through appropriate OWL and SWRL characterizations
                it can define structures such as trees, r-GODDAGs, full GODDAGs and e-GODDAGs, and
                can be used to generate validity constraints (including co-constraints), and to
                verify adherence to content model patterns.</para><para>As mentioned, in general the embedding of a full EARMARK document is not straightforward,
                but approaches can be taken in that direction: just like segmentation and
                fragmentation are strategies to embed in a strictly-hierarchical language an
                r-GODDAG-specific feature such as overlapping elements, similarly a number of
                strategies exist to provide embedding of GODDAG and e-GODDAG features in less
                expressive syntaxes. In the final part of the paper we discuss our wish to provide
                at the metalanguage level a series of embedding strategies of the non-hierarchical
                features of EARMARK, i.e. a number of language-independent mechanisms to express
                e-GODDAG structures in XML (as well as in TexMecs and in LMNL) and that can be
                recognized as such (i.e., as strategies, as tricks) by tools and readers alike,
                especially for further uses of such documents.</para></abstract><author><personname><firstname>Angelo</firstname><surname>Di Iorio</surname></personname><personblurb><para>Angelo Di Iorio holds a Ph.D. in Computer Science, from the University of
                    Bologna. His thesis lies in the areas of markup languages and document engineering,
                    focusing on design patterns for digital documents and automatic
                    processes of analysis and segmentation. During his PhD he has also worked on
                    collaborative authoring, document versioning, content formatting, and semantic
                    web technologies. His research interests have recently extended towards layout
                    languages and algorithms. He is a member of the W3C XSL-FO working group, and
                    author of several conference and journal papers on markup languages, digital
                    publishing and Web technologies.</para></personblurb><affiliation><orgname>Department of Computer Science, University of Bologna</orgname></affiliation><email>diiorio@cs.unibo.it</email></author><author><personname><firstname>Silvio</firstname><surname>Peroni</surname></personname><personblurb><para>Silvio Peroni holds a degree in Computer Science at the University of Bologna.
                    The main research interests in his current Ph.D. career include Semantic Web
                    technologies, markup languages for complex documents, design patterns for
                    digital documents and automatic processes of analysis and segmentation. He has
                    published 4 scientific papers about these subjects.</para></personblurb><affiliation><orgname>Department of Computer Science, University of Bologna</orgname></affiliation><email>speroni@cs.unibo.it</email></author><author><personname><firstname>Fabio</firstname><surname>Vitali</surname></personname><personblurb><para>Fabio Vitali is associate professor in Computer Science at the University of
                    Bologna, where he teaches Web Technologies and Human-Computer Interaction. His
                    interests lie in models and languages for document management and hypertext
                    support, and he has published more than 60 papers in national and international
                    venues. He is a member of the W3C Working Group on XML Schema, and a member of the
                    scientific committee of several conferences and journals in Web engineering and
                    technologies. He is the author of important standards in the legislative XML domain,
                    and works on issues related to digital publishing, Web technologies and Semantic
                    Web technologies.</para></personblurb><affiliation><orgname>Department of Computer Science, University of Bologna</orgname></affiliation><email>fabio@cs.unibo.it</email></author><legalnotice><para>Copyright © 2009 by the authors.  Used with
                permission.</para></legalnotice></info><section><title>Introduction</title><para>Not everybody working with markup languages needs support for overlaps and multiple
            hierarchies. But those who need it usually need it badly. So badly, in fact, that a
            robust slice of markup literature is devoted to it, specialized scientific events have
            taken place, and a number of extremely varied approaches have been proposed in recent
            years to address this issue. </para><para>Some of these approaches were proposed at the language level: many XML languages (TEI
                <xref linkend="bib-1"/> being the most evident) took the decision to add specific
            language-dependent markup structures devoted to supporting overlapping. Some of these
            solutions, such as milestones and fragmentation, are so general and widely applicable
            that they have been proposed even outside the scope of any single markup language
            (e.g., see <xref linkend="bib-20"/>), as architectural forms available in general in
            XML languages. Further approaches have been to create new meta-markup languages,
            inspired by but independent from XML, such as TexMecs, LMNL and XCONCUR, that provide support
            for more than a single hierarchy in the metalanguage itself. This makes it possible for
            any language defined within the syntax to make use of the features for overlaps, without
            the need for further special tools to make sense of the annotations. </para><para>At the conceptual level, what has shown its limits is the idea of forcing tree-like
            structures over documents. While some may be fully described by trees, some just are not,
            and we need more powerful abstract data structures to describe them. GODDAGs <xref linkend="bib-5"/> have been proposed exactly for this purpose: directed acyclic
            graphs with ordered children relax exactly the kind of constraints of trees that were
            in the way of sophisticated markup features such as overlap. Unfortunately, generalized
            GODDAGs do not allow an immediate linearization in the form of an XML-like syntax (even if
            extended in some way), but a variant, restricted GODDAGs (<xref linkend="bib-5"/> and
            then <xref linkend="bib-23"/>), does allow a linearization in TexMECS. In general,
            though, the linearization of full generalized GODDAGs does not preserve all the information
            expressed in the original graph. Many additional useful features of
            GODDAGs (e.g., virtual elements) can only be converted into XML structures by resorting to 
            procedural tricks. </para><para>In this paper we discuss whether the problem of generalized GODDAGs lies in the
            embedding nature of meta-markup languages such as XML, TexMecs and LMNL, and whether by
            getting rid of embeddability altogether we can exploit the full potentiality of GODDAGs.
            Furthermore, we propose a minimal extension to GODDAGs to provide full support of
            repeated content in GODDAGs (currently only allowed if appearing in different
            substructures), thereby generalizing the idea of repeatability of markup structures. </para><para>Additionally, in this paper we propose a meta language for GODDAGs and extended
            GODDAGs that relies 100% on well-known and widely available W3C technologies: EARMARK
                (<emphasis role="ital">Extreme Annotational RDF Markup</emphasis>) is a language for
            standoff annotations over documents that is based on an OWL ontology and uses RDF
            annotations as its linearization approach. EARMARK annotations are facts expressed about
            OWL classes such as ranges and markup elements whose properties are fully and explicitly
            described in the OWL ontology itself, and minimally depend on the syntactic constraints that
            are inherent in embedded languages. Thus all standard hierarchy assertions as usually
            expressed in XML are available in EARMARK, but the language also supports overlapping
            structures, virtual elements, anonymous elements and structured attributes, which are
            available in more sophisticated non-XML languages such as TexMECS or LMNL. Features
            that are unsupported there, such as repeated structures, content variants and partially
            overlapping multiple hierarchies<footnote xml:id="sdfootnote1sym"><para>Defined as the “set of partial or independent overlapping hierarchies in which
                    the textual content between the tags is visible in some hierarchies but not in
                    others” <xref linkend="bib-2"/>.</para></footnote>, are also trivially expressed in EARMARK, and contribute to making it a
            language that is suited to fully support generalized GODDAGs and extended GODDAGs. </para><para>EARMARK documents are therefore OWL documents that can be expressed as RDF assertions,
            and using plain and standard W3C technologies a number of Semantic Web tools can be used
            for generating, converting, querying and displaying EARMARK documents. Particularly
            relevant here is the process of embedding EARMARK documents in traditional embedded
            languages, such as XML or TexMecs. Of course, not all EARMARK assertions can be directly
            transformed into XML markup structures. The specific subset of the EARMARK document that
            can be expressed in the destination syntax (e.g., any of the possible tree substructures
            for XML, or of the r-GODDAG substructures for TexMecs) can be directly generated, and
            the remaining ones need to be either left out or forcibly embedded using any of a number
            of well-known or newly-introduced syntactic tricks (up to, of course, leaving part of
            the EARMARK markup directly as RDF fragments within the destination document). </para><para>Providing a recognizable and repeatable two-way process for generating EARMARK
            documents out of embedded documents and, vice versa, for generating embedded documents
            out of EARMARK documents with embedding tricks is also a chance to collect and
            generalize all such embedding tricks, and to provide an additional conversion model
            between different syntaxes. All in all, we propose EARMARK as the most natural
            syntactical rendering of GODDAGs (and e-GODDAGs, of course) and as the intermediate
            representation of any conversion path for documents, XML or otherwise, that use
            overlapping features of any form, i.e., as a generalization of the conversion algorithms
            for overlapping structures proposed in <xref linkend="bib-7"/>. </para></section><section><title>Embedding multiple hierarchies</title><para>There comes a time, in marking up documents, when different types of annotations need
            to be placed upon the same content, and different markup needs to be used. Sometimes
            these different annotations nest easily, and sometimes they do not. Trying to express
            these different annotations using a hierarchical metamarkup language such as XML is,
                <emphasis role="ital">per se</emphasis>, unfeasible: each structure needs to be
            described by its own hierarchy, and the overlapping situations pose a big problem,
            since, as we know, XML is not naturally equipped to deal with them. </para><para>Some approaches to deal with overlapping structures in markup languages have been proposed
            in past years. Each approach tries to mediate between the support for overlapping and the
            hierarchical organization of XML documents, as illustrated in <xref linkend="bib-20"/>,
            as well as in <xref linkend="bib-1"/>, <xref linkend="bib-8"/> and <xref linkend="bib-7"/>. The five main overlap-handling techniques described in
            the literature can be summarized as follows:</para><itemizedlist><listitem><para>
                    <emphasis role="ital">milestones</emphasis>, through which one hierarchy is
                    expressed using the standard hierarchical XML markup and the elements belonging
                    to the other ones are represented through a pair of empty elements representing the
                    start and the end tags, and connected to each other by special
                    attributes.</para></listitem><listitem><para>
                    <emphasis role="ital">flat milestones</emphasis>, which represent each of the
                    hierarchy elements as a milestone, i.e., an empty element placed where the start
                    or the end tag should be, all of them contained as children of the same root
                    element.</para></listitem><listitem><para>
                    <emphasis role="ital">fragmentation</emphasis>, in which one hierarchy (the
                    primary) is expressed through the standard hierarchical XML markup, and the
                    elements of the secondary hierarchies are fragmented within the primary elements
                    as needed to suit the primary hierarchy and are connected to each other by
                    special attributes.</para></listitem><listitem><para>
                    <emphasis role="ital">twin documents</emphasis>, in which each hierarchy is
                    represented by a different document, which contains the same textual
                    content but marks up the elements according to the individual hierarchy.</para></listitem><listitem><para>
                    <emphasis role="ital">stand-off markup</emphasis>, which puts all the textual
                    content in a single structure with the possible specification of the shared
                    hierarchy, and puts the remaining elements in other structures (e.g., files) with the
                    positional association of each starting and ending location to the main
                    structure, using, for instance, XPointer <xref linkend="bib-9"/> locations.
                </para></listitem></itemizedlist><para>A separate approach is to give up the XML requirements of a single hierarchy, and
            try new approaches where multiple hierarchies can be specified in the same text flow.
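</para><para>As a rough illustration of the difference, the sketch below marks up a point where a chord
            spans a line break, first within the single hierarchy of XML, using a pair of empty
            milestone elements, and then in a hypothetical same-flow notation in which tags are
            allowed to cross. The element and attribute names (“line”, “A”, “chordStart”,
            “chordEnd”, “id”, “ref”) are assumptions made for this example only and are not taken
            from any of the cited languages; the second fragment is deliberately not well-formed
            XML.</para><programlisting xml:space="preserve">&lt;!-- Single hierarchy (well-formed XML): the A chord is delimited by two
     empty milestone elements; names and attributes are hypothetical --&gt;
&lt;line&gt;Will never &lt;chordStart name="A" id="c1"/&gt;die&lt;/line&gt;
&lt;line&gt;And I&lt;chordEnd ref="c1"/&gt; love her&lt;/line&gt;

&lt;!-- Same text flow with crossing tags: NOT well-formed XML, shown only to
     illustrate what specifying two hierarchies at once would require --&gt;
&lt;line&gt;Will never &lt;A&gt;die&lt;/line&gt;
&lt;line&gt;And I&lt;/A&gt; love her&lt;/line&gt;</programlisting><para>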
            The data structure itself, of course, is not a tree anymore, and needs to become
            something more general. The <emphasis role="ital">General Ordered-Descendant Directed
                Acyclic Graph</emphasis>
            <xref linkend="bib-5"/>, or <emphasis role="ital">GODDAG</emphasis>, is the most
            relevant data structure that has been used to specify complex markup hierarchies, such
            as overlapping between elements and fragmentation.</para><para>GODDAG is not able to directly handle other features such as anonymous
            elements and structured attributes. A different non-XML approach for these and
            other well-known overlapping scenarios is given by the <emphasis role="ital">Layered Markup
                and Annotation Language</emphasis>
            <xref linkend="bib-6"/>, or <emphasis role="ital">LMNL</emphasis>. Unlike
            GODDAG, which expresses the many hierarchies with a graph, LMNL uses an XML-like syntax
            where named or anonymous elements can overlap with other ones in one or more element
                <emphasis role="ital">layers</emphasis>.</para><para>A similar approach is used by XConcur <xref linkend="bib-24"/>. An XConcur document
            is made of multiple layers coexisting in the same multi-root structure, written in an
            XML-like syntax: each layer represents an independent hierarchy that can be extracted as
            a single unit and validated against a DTD, XML-Schema or RelaxNG schema. Relationships
            and constraints between multiple hierarchies are ruled by a related constraint language
            called XConcur-CL. XConcur documents end up being very complex and few tools to
            manipulate them are available.</para></section><section><title>Could singing songs be such a big deal?</title><para>To illustrate some of the difficulties in handling complex structures, let us examine
            a fictitious karaoke application in which lyrics are displayed on a screen in sync with
            a recording of the instrumental parts of the corresponding song; in order to make the
            example even more interesting, let us consider the situation whereby, besides the
            screenfuls of lyrics, the application would also show the chords of the song for any
            additional instrument playing along, and a few fun facts popups here and there to keep
            the attention of the readers. </para><para>We will use as an example for our discussion the song "And I love her" by the Beatles,
            one of the most famous and most frequently sung songs in the history of modern music. The lyrics of the original
            version appear in <xref linkend="Tablyrics"/>.</para><table border="1" xml:id="Tablyrics" xreflabel="Table 1"><caption><para>Lyrics and structure of “And I love
                    her” by The Beatles</para></caption><tbody><tr><th>
                        <para>Title</para>
                    </th><td>
                        <para>And I love her</para>
                    </td></tr><tr><th>
                        <para>1</para>
                    </th><td>
                        <para>I give her all my love / That's all I do / And if you saw my love /
                            You'd love her too</para>
                    </td></tr><tr><th>
                        <para>Chorus</para>
                    </th><td>
                        <para>I love her</para>
                    </td></tr><tr><th>
                        <para>2</para>
                    </th><td>
                        <para>She gives me ev'rything / And tenderly / The kiss my lover brings /
                            She brings to me</para>
                    </td></tr><tr><th>
                        <para>Chorus</para>
                    </th><td>
                        <para>And I love her</para>
                    </td></tr><tr><th>
                        <para>3</para>
                    </th><td>
                        <para>A love like ours / Could never die / As long as I / Have you near
                            me</para>
                    </td></tr><tr><th>
                        <para>4</para>
                    </th><td>
                        <para>Bright are the stars that shine / Dark is the sky / I know this love
                            of mine / Will never die</para>
                    </td></tr><tr><th>
                        <para>Chorus</para>
                    </th><td>
                        <para>And I love her</para>
                    </td></tr><tr><th>
                        <para>4</para>
                    </th><td>
                        <para>Bright are the stars that shine / Dark is the sky / I know this love
                            of mine / Will never die</para>
                    </td></tr><tr><th>
                        <para>Chorus</para>
                    </th><td>
                        <para>And I love her</para>
                    </td></tr></tbody></table><para>The first difficulty for our karaoke application is to handle more than one structure
            at the same time. We may be interested in building multiple structures over the same
            content:</para><itemizedlist><listitem><para>the lyrics organized in stanzas and verses</para></listitem><listitem><para>the notation for the time-driven excerpts of lyrics as shown on screen during
                    the playback</para></listitem><listitem><para>additional time-driven visualization of the chords, with different time
                    intervals</para></listitem><listitem><para>the (either time-driven or content-driven) visualization of pop-ups with fun
                    fact sentences</para></listitem></itemizedlist><para>Furthermore, we may need to deal with small differences in the lyrics, depending on whether the
            loved one is female, as in the Beatles' original (“and I love her”), or male, as in several
            covers (“and I love him”).</para><section xml:id="xml"><title>Dark is the overlapping sky</title><para>Let us concentrate on a single stanza of the song, the fourth, and its
                refrain:</para><programlisting xml:space="preserve">Bright are the stars that shine / Dark is the sky / I know this love of mine / Will never die / And I love her</programlisting><para> The first hierarchy represents the lyrics. We may employ an XHTML vocabulary, using the
                class attribute for characterizing containers (e.g. “stanza” and “refrain”),
                obtaining a clear and straightforward structure.</para><programlisting xml:space="preserve">&lt;body&gt;
    &lt;div class="stanza" title="4"&gt;
        &lt;p&gt;Bright are the stars that shine&lt;/p&gt;
        &lt;p&gt;Dark is the sky&lt;/p&gt;
        &lt;p&gt;I know this love of mine&lt;/p&gt;
        &lt;p&gt;Will never die&lt;/p&gt;
    &lt;/div&gt;
    &lt;div class="refrain"&gt;
        &lt;p&gt;And I love her&lt;/p&gt;
    &lt;/div&gt;
&lt;/body&gt;</programlisting><para> The harmony of the song uses two chords, Em and Bm, for each of the first three
                lines, then moving to the bridge G for the last line and then the refrain in A and
                D. The A chord starts while the melody is still singing the second part of the last
                line of the stanza. A possible, trivial hierarchy for chords would then be: </para><programlisting xml:space="preserve">&lt;chords&gt;
    &lt;Em&gt;Bright are the&lt;/Em&gt;
    &lt;Bm&gt;stars that shine&lt;/Bm&gt;
    &lt;Em&gt;Dark is the&lt;/Em&gt;
    &lt;Bm&gt;sky&lt;/Bm&gt;
    &lt;Em&gt;I know this&lt;/Em&gt;
    &lt;Bm&gt;love of mine&lt;/Bm&gt;
    &lt;G&gt;Will never&lt;/G&gt;
    &lt;A&gt;die And I&lt;/A&gt;
    &lt;D&gt;love her&lt;/D&gt;
&lt;/chords&gt;</programlisting><para> We have a different issue with the timings for the lyrics. We want each line to
                appear exactly when the music calls for it to be sung, but at the same time we want the next
                line to be shown, too, so that the singer gets ready to sing it afterward. Thus each line
                has to appear in two consecutive screenfuls, first as the next line and then as the main one, as in the following XML fragment: </para><programlisting xml:space="preserve">&lt;timing&gt;
    &lt;screenful starts="68"&gt;
        &lt;main&gt;Bright are the stars that shine&lt;/main&gt;
        &lt;next&gt;Dark is the sky&lt;/next&gt;
    &lt;/screenful&gt;
    &lt;screenful starts="72"&gt;
        &lt;main&gt;Dark is the sky&lt;/main&gt;
        &lt;next&gt;I know this love of mine&lt;/next&gt;
    &lt;/screenful&gt;
    &lt;screenful starts="76"&gt;
        &lt;main&gt;I know this love of mine&lt;/main&gt;
        &lt;next&gt;Will never die&lt;/next&gt;
    &lt;/screenful&gt;
    &lt;screenful starts="80"&gt;
        &lt;main&gt;Will never die&lt;/main&gt;
        &lt;next&gt;And I love her&lt;/next&gt;
    &lt;/screenful&gt;
    &lt;screenful starts="84"&gt;
        &lt;main&gt;And I love her&lt;/main&gt;
    &lt;/screenful&gt;
&lt;/timing&gt;</programlisting><para> This is not ideal, however: each line is displayed twice, and therefore appears twice
                in the XML document. Forcing each line to appear only once in the XML structure would
                require either overlapping or forcing some structural semantics into procedural
                attributes, which would imply implementing ad hoc visualization tools, as in: </para><programlisting xml:space="preserve">       &lt;p main="68"&gt;Bright are the stars that shine&lt;/p&gt;
       &lt;p main="72" next="68"&gt;Dark is the sky&lt;/p&gt;
       &lt;p main="76" next="72"&gt;I know this love of mine&lt;/p&gt;
       &lt;p main="80" next="76"&gt;Will never die&lt;/p&gt;
       &lt;p main="84" next="80"&gt;And I love her&lt;/p&gt;</programlisting><para> We do not like this approach and will not consider it further. Yet the
                multiplicity of the lines is worrisome, as it creates a redundancy that has to be
                carefully considered. </para><para>As for the pop-ups, we want to show some additional text exactly at the right time
                – i.e. when the song gets to the precise point of the lyrics associated with these
                paragraphs. Nor is it a given that pop-ups are aligned with whole lines: indeed,
                it could very well happen that the association transcends line boundaries, as in the
                following: </para><programlisting xml:space="preserve">&lt;funfacts&gt;
 &lt;popup&gt;
  &lt;lyrics&gt;this love of mine Will never die&lt;/lyrics&gt;
  &lt;fact&gt;
   &lt;p&gt;Paul McCartney wrote this about his girlfriend, 
      an actress named Jane Asher.&lt;/p&gt;
  &lt;/fact&gt;
 &lt;/popup&gt;
&lt;/funfacts&gt;</programlisting><para> As we have seen, each of these structures, taken individually, is a single
                hierarchy and could be easily managed with a traditional XML document. There is a
                final issue related to text variants: depending on the preferences of the singer, we
                may want to decline the lyrics in the masculine or feminine gender. As such, we end
                up with two variants of the refrain, and no syntactically evident way to point out
                which variant to show and which to ignore in each given run of the application: </para><programlisting xml:space="preserve">    &lt;div class="refrain"&gt;
        &lt;p&gt;And I love &lt;span class="feminine"&gt;her&lt;/span&gt;
                      &lt;span class="masculine"&gt;him&lt;/span&gt;&lt;/p&gt;
    &lt;/div&gt;</programlisting><para> This is not declarative at all: it is the application's job to know that when the
                feminine version is chosen, elements of class feminine are present (i.e., shown) and
                elements of class masculine are absent (i.e., hidden), and vice versa: the class
                attribute suddenly affects not only the presentation of the lines, but their
                presence and content, too.</para><para>Finally, repetitions raise a further and subtler issue, which concerns
                the difference between the content of a document and its <emphasis role="ital">structured</emphasis> content. </para><para>In the Beatles' song there are some repetitions of quite large structures, such as
                the refrain and the fourth stanza. It could be considered an exercise in
                futility to decide whether the refrains of a song are to be considered as one
                instance of content to be repeated as needed after each stanza, or many different
                instances whose content happens to be identical. Yet, the practice in transcribing
                song lyrics is usually to qualify the refrain lyrics as such the first time they are
                sung, and then refer back to them all other times without actually repeating the
                content, and we will treat them as such in our example. In an XML document, however, the refrains are
                necessarily handled by repeating the entire structure,
                duplicating the markup code and the text. We could introduce it the first time
                    only<footnote xml:id="sdfootnote2sym"><para>We will ignore, for the time being, that in the lyrics that are actually
                        sung by the Beatles the first refrain is slightly different from the other
                        ones, since they sing “I love her” instead of “and I love her”.</para></footnote>, and refer to it in some way the other times – for example, through an
                “href” attribute – in the other parts of the lyrics. We could handle the
                repetition of the fourth stanza similarly, for example as follows:</para><programlisting xml:space="preserve">&lt;body&gt;
    &lt;h1&gt;And I love her&lt;/h1&gt;
    &lt;div id="first" class="stanza" title="1"&gt;[...]&lt;/div&gt;
    <emphasis role="bold">&lt;div id="refrain" class="refrain"&gt;</emphasis>
        <emphasis role="bold">&lt;p&gt;And I love her&lt;/p&gt;</emphasis>
    <emphasis role="bold">&lt;/div&gt;</emphasis>
    &lt;div id="second" class="stanza" title="2"&gt;[...]&lt;/div&gt;
    <emphasis role="bold">&lt;div href="#refrain" /&gt;</emphasis>
    &lt;div id="third" class="stanza" title="3"&gt;[...]&lt;/div&gt;
    &lt;div id="fourth" class="stanza" title="4"&gt;[...]&lt;/div&gt;
    <emphasis role="bold">&lt;div href="#refrain" /&gt;</emphasis>
    &lt;div href="#fourth" /&gt;
    <emphasis role="bold">&lt;div href="#refrain" /&gt;</emphasis>
&lt;/body&gt;</programlisting><para> Unfortunately, we believe again that this approach is not declarative enough: it
                is the application's job to understand that the div elements carrying an href attribute are not
                empty, but refer to previously declared elements and repeat their content.</para><para> XML entity references could be used to express repetitions too: content can be
                declared as an entity to be resolved when users view the document. From a purely
                presentational perspective, such an approach would be enough, as all the repeated
                content is retrieved and merged into the final XML file. On the other hand,
                processing entities in a more sophisticated way still requires convoluted and
                application-dependent operations. For instance, it is rather complex to add metadata
                about entities, to extract information about that content, to process those
                fragments via XSLT or to validate entity fragments.</para><para>Joining the different hierarchies mentioned above in a single document and dealing
                with the issues discussed so far raises problems that are not manageable
                with the plain XML armamentarium and that require special approaches: </para><itemizedlist><listitem><para>the timing of the A chord overlaps two lines of the lyrics;</para></listitem><listitem><para>the timed display of the lyrics requires each line to be shown multiple
                        times;</para></listitem><listitem><para>the popups introduce additional text content to the document, and do so
                        independently of the stanza and line structure of the lyrics;</para></listitem><listitem><para>text variants require elements that exist or do not exist depending on
                        context;</para></listitem><listitem><para>repetitions require that content is specified once, and referred to many
                        times. </para></listitem></itemizedlist><para>Some of these structural issues can be handled by standard overlapping approaches,
                and others can be dealt with by introducing ad hoc, non-declarative markup that is
                procedurally interpreted by specialized tools. But it is at the level of the data
                structure model that we prefer to study the problem. </para></section><section><title>Restricted GODDAG</title><para>Handling overlapping elements requires a more expressive data structure than
                trees, such as directed graphs. <emphasis role="ital">Restricted GODDAGs</emphasis>
                (<xref linkend="bib-5"/> and <xref linkend="bib-23"/>) are able to deal with the
                lyrics/chord overlap, as shown in <xref linkend="rGODDAG"/>.</para><figure xml:id="rGODDAG" floatstyle="1" xreflabel="Figure 1"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-001.png"/></imageobject><caption><para>The r-GODDAG structure handling the overlap between lyrics and chords. The red
                            dashed line represents, here and in the following figures, the document
                            order. </para></caption></mediaobject></figure><para>Restricted GODDAGs give strong support for overlapping structures and guarantee
                their full linearizability into TexMECS documents <xref linkend="bib-18"/>.</para><para>Restricted GODDAGs, on the other hand, will not help us with the management of
                popups. Restricted GODDAGs have some strong constraints that prevent this:</para><itemizedlist><listitem><para>each r-GODDAG node dominates a contiguous sequence of leaf nodes (i.e.,
                        nodes that contain text);</para></listitem><listitem><para>no two r-GODDAG nodes that are not connected by a dominance relation,
                        dominate the same subsequence of leaf nodes.</para></listitem></itemizedlist><para>For our popups, these constraints are too restrictive, particularly the first
                one. Basically, the requirement of contiguity prevents two hierarchies from overlapping on
                some leaf nodes whenever the content of other leaf nodes is different. </para><para>In our example, inserting in the same document both the lyrics
                structure and the one related to the popups implies breaking the contiguity of
                one of the two hierarchies, because there is always a node that violates the constraint.</para><para>Consider the situation in which a popup is associated with the string “this love of
                mine Will never die”. If the content of the fun fact is put at the end of the
                lyrics, as shown in <xref linkend="rGODDAGfactfail"/>, the elements “funfacts” and
                “fact” both dominate non-contiguous leaf nodes, as “this love of mine” is
                non-contiguous with “Paul McCartney...”, thus breaking the constraint.</para><figure xml:id="rGODDAGfactfail" floatstyle="1" xreflabel="Figure 2"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-002.png"/></imageobject><caption><para>The first attempt to build an r-GODDAG describing both lyrics and
                            fun facts.</para></caption></mediaobject></figure><para>If, on the other hand, the content of the fun fact is put before or after the
                lines it refers to (<xref linkend="rGODDAGbodyfail"/>), the element “body” will
                dominate non-contiguous leaf nodes (two of them will be interrupted by the “Paul
                McCartney...” node that does not belong to that hierarchy) and therefore violate the
                constraint.</para><figure xml:id="rGODDAGbodyfail" floatstyle="1" xreflabel="Figure 3"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-003.png"/></imageobject><caption><para>The second attempt to build an r-GODDAG describing both lyrics
                            and fun facts.</para></caption></mediaobject></figure><para>So, even if a restricted GODDAG is a more expressive data structure than a tree,
                it is still not sufficient to handle complex scenarios such as the ones described.
                The overall point of the contiguity constraint is to allow for embedding markup
                within text; thus r-GODDAG structures are indeed representable with milestones or
                fragmentation in XML, or with TexMECS documents, but more complex structures are
                still unavailable, such as those involving non-contiguous leaf nodes.</para></section><section><title>A more general data structure: the GODDAG</title><para>If we give up the feature of embedding, we already have a data structure for
                handling complex overlapping scenarios: the full GODDAG, which drops the
                two constraints mentioned above. It does not require each node to dominate a contiguous sequence of leaf nodes,
                and it does not require that any two different nodes dominate
                different sets of leaf nodes.</para><para>Without these restrictions, we can describe all four hierarchies in a GODDAG, as
                shown in <xref linkend="GODDAG"/>. The obvious disadvantage of this data structure
                is that embedding everything in a linear structure such as an XML
                document implies either losing some information or resorting to procedural tricks that
                would subject the structural meaning of the document to specific tools.</para><figure xml:id="GODDAG" floatstyle="1" xreflabel="Figure 4"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-004.png"/></imageobject><caption><para>A GODDAG for the four hierarchies defined for the song. Document order
                            is not shown since, where it is not obvious (e.g., in the lyrics lines),
                            it is completely arbitrary.</para></caption></mediaobject></figure><para>In <xref linkend="GODDAG"/> the full GODDAG structure of the four data hierarchies is
                shown: lyrics (in blue), the time in which the lyrics are shown (in yellow), the
                chords (in violet) and the fun facts popups (in green). Non-bordered nodes are
                content, bordered nodes represent markup: rectangles are XML elements, and
                rounded rectangles are XML attributes.</para><para>Moreover, through GODDAG we can handle cases of textual variants and some simple
                cases of repetitions (for instance, specifying the presence of the class attribute
                in multiple div elements, and even specifying that the refrain text appears in
                multiple places, but is really only defined once). </para><para>As for textual variants, <xref linkend="textualVariation"/> shows that, since the refrain uses “her” if the lyrics are
                feminine and “him” otherwise, we actually generate two different and almost
                identical lyrics hierarchies that point to the text in different manners depending
                on the chosen gender. </para><figure xml:id="textualVariation" floatstyle="1" xreflabel="Figure 5"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-005.png"/></imageobject><caption><para>A GODDAG with repetitions and textual variants depending on the gender
                            of the lyrics.</para></caption></mediaobject></figure><para>Clearly, the best thing we can do for linearizing all these kinds of structures
                presented in this section is to use stand-off markup or twin-document techniques in
                order to keep all the elements in a single, logically unified document. Expressing all
                information in a single XML tree requires some procedural tricks: elements
                with procedural values, for instance, are an acceptable
                trade-off between the structure and the relative document representation.</para></section><section><title>Beyond GODDAG: extensions for repeatability</title><para>Even if the GODDAG is able to handle perfectly all the above-mentioned scenarios,
                there are more things that are interesting to represent, such as a different type of
                repetition. The screenful of lines of the karaoke example is interesting in that
                sense. </para><figure xml:id="repetitions" floatstyle="1" xreflabel="Figure 6"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-006.png"/></imageobject><caption><para>This graph describes the structure of “And I Love Her” avoiding the
                            explicit repetitions (refrain and fourth stanza).</para></caption></mediaobject></figure><para>In <xref linkend="repetitions"/> we show a plausible graph for describing the
                entire structure of “And I Love Her” that avoids the explicit repetitions of the
                refrain and the fourth stanza. Note that the body element has many arcs going to the
                div of the refrain, and two going to the div of the fourth stanza, and that we had
                to specify the order of the arcs themselves.</para><para>Unfortunately, GODDAGs do not support this kind of repetition. In fact, for any
                non-terminal node <emphasis role="ital">n</emphasis>, the set of arcs from
                    <emphasis role="ital">n</emphasis> is ordered and, if two arcs <emphasis role="ital">n→a</emphasis> and <emphasis role="ital">n→b</emphasis> exist and if 
                <emphasis role="ital">a</emphasis>
                is equal to <emphasis role="ital">b</emphasis>, then <emphasis role="ital">n→a</emphasis> and <emphasis role="ital">n→b</emphasis> are the same arc. 
                This prevents us from creating
                multiple arcs from body to the refrain divs, which is exactly what we are trying to
                do.</para><para>In order to avoid this constraint, we need to extend the definition of GODDAGs. In
                particular, we believe that we can solve our problem by simply relaxing the rule
                that requires non-terminal nodes in GODDAGs to have a general identifier (a label)
                associated with them. We call <emphasis role="ital">anonymous</emphasis> all
                non-terminal nodes that do not have such a general identifier and we refer to this new
                GODDAG as <emphasis role="ital">extended GODDAG</emphasis> (or <emphasis role="ital">e-GODDAG</emphasis>). Anonymous e-GODDAG nodes allow the definition of
                anonymous elements <emphasis role="ital">a la</emphasis> LMNL <xref linkend="bib-6"/>, and at the same time provide the necessary infrastructure for our repeating
                refrains.</para><para>Through e-GODDAG, in fact, we are now able to allow the previous repetitions: we
                have to add one anonymous node for each repetition we need. Since
                anonymous nodes do not introduce markup or content, they can be used to disambiguate
                multiple arcs going from and to the same nodes: each repeated arc from <emphasis role="ital">body</emphasis> to <emphasis role="ital">div</emphasis> is therefore
                interrupted by a different intermediate anonymous node<footnote xml:id="sdfootnote4sym"><para>An issue to consider relates to another GODDAG constraint: no node can
                        dominate another node both directly and indirectly. That simply means that
                        we need to add an anonymous node for each repeated arc of a node, and not just for
                        the ones after the first one, so that we only have indirect dominance in all of
                        them.</para></footnote>. </para></section></section><section><title>EARMARKing cats and docs</title><para>The problems described in the previous section derive mostly from the very act of
            embedding annotations: multiple overlapping annotations, especially when referring to
            the same text multiple times and changing the document order, do not fit naturally in
            the linear structure of an XML document, and analogously there is no natural position for
            embedding annotations to the whole document.</para><para>The opposite approach – full externalization of annotations <emphasis role="ital">a
                la</emphasis> RDF – does not satisfy our requirements either, for different reasons.
            RDF annotations do not change the annotated resource in any way, but refer to it via
            URIs. The problem we face in this case is that there exists no URI referring to a
            fragment of text that is not wrapped within an XML or XHTML element provided with an ID.
            And since XHTML or XML elements need to follow a nice, hierarchical,
            document-order-compatible structure, we are back to the beginning with the problem of
            overlapping hierarchies that play with multiplicities and reshuffling of the document
            order.</para><para>An approach has been recently proposed in <xref linkend="bib-3"/> in order to try to
            offer a way to identify precise document locations, called <emphasis role="ital">pointers</emphasis>, through different means (character positions, string indexing,
            etc.) and languages (XPath <xref linkend="bib-12"/>, XPointer <xref linkend="bib-9"/>,
            etc.). Unfortunately some of the languages mentioned, such as the xpointer() scheme of XPointer, were never standardized
            by the W3C and there is no sign that they will be in the foreseeable future.
            Furthermore, from the RDF point of view all URIs are opaque strings referring to
            different resources, and as such it would be difficult to create ontologies and make
            inferences that differentiate assertions on text fragments from assertions on elements
            or other structures, which is the infrastructure required to verify overlapping or
            superimposition of assertions.</para><para>There is another (less important) consideration that comes down against a fully
            externalized approach: the fact that assertions are disjoint from the original document
            and require a more articulated process for storage and transfer (the
            so-called fragility of standoff markup). Consider the case of
            textual variations in our karaoke example: it would be useful to handle all variations
            (and any other overlapping hierarchy) within a unique document, easier to move and
            manipulate. The <emphasis role="ital">variant graph</emphasis> approach, introduced in
                <xref linkend="bib-2"/>, goes in that direction and allows users to express these
            differences and to extract multiple text linearizations, depending on the particular
            context.</para><para>Our approach takes inspiration from this work and from the GODDAG-related theories.
            The goal is to introduce a new syntactic approach for overlapping markup that combines
            advantages of embedded and external annotations into a unified framework. In this
            section we define an ontology-based model for expressing such complex overlapping
            structures, similar to but more general than existing research efforts such as <xref linkend="bib-4"/> and <xref linkend="bib-11"/>.
        </para><para>A central point of our proposal is its reliance on Semantic Web
            technologies. The reason is that we want to create tools that can exploit existing
            modules, that can be integrated with other applications and that can be extended by
            other researchers too.</para><para>As expected, RDF and OWL are the candidates for our proposal. Actually, we propose an
            intermediate language built on top of the RDF and OWL data models, which can be
            straightforwardly translated into these standards. We called this language <emphasis role="ital">EARMARK</emphasis> (<emphasis role="ital">Extreme Annotational RDF
                Markup</emphasis>). EARMARK allows us to build e-GODDAG-equivalent data structures
            that encode all the aforementioned scenarios. High-level data structures can then be
            instantiated into W3C standard documents, which are easy to integrate in legacy tools and
            environments.</para><para>Basically, EARMARK allows us to create assertions on text fragments by using an
            intermediate ontology that subsumes the XPointer schemes in a manageable way and builds
            from there the concepts of markup structures and generic identifiers useful for the
            specification of elements and attributes.</para><section><title>General model</title><para>This section describes the model behind EARMARK, <emphasis role="ital">Extreme
                    Annotational RDF Markup. </emphasis>The model itself is defined through an OWL
                document specifying classes and relationships. Through these classes we can produce
                EARMARK documents with assertions about individuals.</para><para>We introduce four concepts: <emphasis role="ital">docuverses</emphasis>, <emphasis role="ital">locations</emphasis>, <emphasis role="ital">ranges</emphasis> and
                    <emphasis role="ital">markup items</emphasis>. Each of them is represented in
                EARMARK with a different (and disjoint) OWL class. The following code snippets are
                written using Turtle <xref linkend="bib-13"/>
                <footnote xml:id="sdfootnote5sym"><para>In all code examples we will also be implying the following
                        prefixes:</para><para>
                        <programlisting xml:space="preserve">
    @prefix :        &lt;http://www.essepuntato.it/2008/12/earmark#&gt; .
    @prefix rdf:     &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
    @prefix rdfs:    &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
    @prefix owl:     &lt;http://www.w3.org/2002/07/owl#&gt; .
    @prefix xsd:     &lt;http://www.w3.org/2001/XMLSchema#&gt; .
    @prefix swrl:    &lt;http://www.w3.org/2003/11/swrl#&gt; .
    @prefix swrlb:   &lt;http://www.w3.org/2003/11/swrlb#&gt; .</programlisting>
                    </para></footnote>.</para><para>The textual content of an EARMARK document is conceptually separated from the
                annotations, and is referred to by means of assertions on the specific class called
                “Docuverse”. This class (and its name) is based on the concept introduced by Ted
                Nelson in his Xanadu Project <xref linkend="bib-14"/> to refer to the collection of
                text fragments that can be interconnected to each other and transcluded into new
                documents. </para><para>The individuals of this class represent the object of discourse, i.e. all the text
                containers related to a particular EARMARK document.</para><programlisting xml:space="preserve">:Docuverse
      a       owl:Class ;
      rdfs:subClassOf owl:Thing .

:has-text
      a       owl:FunctionalProperty , owl:DatatypeProperty ;
      rdfs:domain :Docuverse ;
      rdfs:range xsd:string .

:has-uri
      a       owl:FunctionalProperty , owl:DatatypeProperty ;
      rdfs:domain :Docuverse ;
      rdfs:range xsd:anyURI .</programlisting><para> Any individual of the <emphasis role="ital">Docuverse</emphasis> class – commonly
                called a <emphasis role="ital">docuverse</emphasis> (lowercase to distinguish it
                from the class) – might <emphasis role="ital">contain</emphasis> or <emphasis role="ital">refer to</emphasis> the text fragments representing the actual
                content of the document. That is expressed through two properties: <emphasis role="ital">has-uri</emphasis> if the content is stored at a particular URI and
                    <emphasis role="ital">has-text</emphasis> if the content lies in the document
                itself.</para><para>A <emphasis role="ital">location</emphasis> is the expression of a position in a
                particular docuverse. It is an instance of the class
                    “<emphasis role="ital">Location</emphasis>”. The property <emphasis role="ital">at</emphasis> defines a precise point in the docuverse, while the property
                    <emphasis role="ital">refers-to </emphasis>indicates the docuverse the location
                refers to.</para><programlisting xml:space="preserve">:Location
      a       owl:Class ;
      rdfs:subClassOf owl:Thing .

:refers-to
      a       owl:FunctionalProperty , owl:ObjectProperty ;
      rdfs:domain :Location ;
      rdfs:range :Docuverse .

:at   a       owl:FunctionalProperty , owl:DatatypeProperty ;
      rdfs:domain :Location ;
      rdfs:range xsd:string .</programlisting><para> The value for the property <emphasis role="ital">at </emphasis>is a string. The
                overall ontology is therefore independent of the actual addressing mechanism. In fact,
                we expect several syntaxes to be used there, including XPointers.</para><para>We then define the class “<emphasis role="ital">Range</emphasis>” for any text
                lying between two locations:</para><programlisting xml:space="preserve">:Range
      a       owl:Class ;
      rdfs:subClassOf owl:Thing .

:begins
      a       owl:FunctionalProperty , owl:ObjectProperty ;
      rdfs:domain :Range ;
      rdfs:range :Location .

:ends
      a       owl:FunctionalProperty , owl:ObjectProperty ;
      rdfs:domain :Range ;
      rdfs:range :Location .</programlisting><para> A range, i.e., an individual of the class Range, is defined by a starting and an
                ending location through the properties <emphasis role="ital">begins</emphasis> and
                    <emphasis role="ital">ends</emphasis> respectively. These locations must refer
                to the same docuverse. Since this restriction cannot be directly expressed in OWL,
                we add the following SWRL <xref linkend="bib-15"/> rules to enforce that
                    constraint<footnote xml:id="sdfootnote6sym"><para>Because of the functional property declarations of “begins” and “ends” and
                        the SWRL rules illustrated, an EARMARK document will be consistent if and
                        only if the constraint is satisfied. Otherwise, there will be a range with two
                        locations that refer to two different documents.</para></footnote>:</para><equation xml:id="swrl1" xreflabel="(1)"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-007.png"/></imageobject></mediaobject></equation><equation xml:id="swrl2" xreflabel="(2)"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-008.png"/></imageobject></mediaobject></equation><para>There is no restriction on locations used for the
                <emphasis role="ital">begins</emphasis> and <emphasis role="ital">ends</emphasis>
                properties. That is very useful: it allows us to define ranges that “follow”
                or “reverse” the text order of the docuverse they refer to. For instance, the string
                “desserts” can be read either in document order, with the <emphasis role="ital">begins</emphasis> location lower than the <emphasis role="ital">ends</emphasis>
                location, or in the opposite order, forming “stressed”<emphasis role="ital">
                    <footnote xml:id="sdfootnote7sym"><para>An interesting example of <emphasis role="ital">semordnilap,
                            </emphasis>http://en.wikipedia.org/wiki/Palindrome#Semordnilaps</para></footnote>
                </emphasis>. Thus, the properties “begins” and “ends” define the way a range must be
                read.</para><para>The class “<emphasis role="ital">MarkupItem</emphasis>” is the superclass defining
                artefacts to be interpreted as markup (such as elements and attributes).</para><programlisting xml:space="preserve">:MarkupItem
      a       owl:Class ;
      rdfs:subClassOf owl:Thing .

:has-general-identifier
      a       owl:FunctionalProperty , owl:DatatypeProperty ;
      rdfs:domain :MarkupItem ;
      rdfs:range xsd:string .</programlisting><para> A <emphasis role="ital">markupitem</emphasis> individual is a container (rdf:Bag
                or rdf:Seq) of individuals belonging to the classes MarkupItem and Range. It is then
                possible to define elements containing nested elements or text, or attributes
                containing values, as well as overlapping and complex structures.</para><para>A markupitem might have a name, specified in the property “<emphasis role="ital">has-general-identifier</emphasis>” (recalling the SGML term to refer to the
                name of elements <xref linkend="bib-16"/>). Note that we can classify markup items
                as <emphasis role="ital">anonymous</emphasis>, as is possible in LMNL <xref linkend="bib-6"/> and e-GODDAG, by simply not asserting a general identifier
                for the items. </para><para>All the concepts represented by an EARMARK document are expressed using these four
                disjoint classes and their relative properties:</para><programlisting xml:space="preserve">[]    a       owl:AllDisjointClasses ;
      owl:members (:Docuverse :Location :MarkupItem :Range) .</programlisting></section><section><title>Detailed model</title><para>The model discussed so far gives us a general picture of the EARMARK framework
                and, as expected, is not enough to describe all the scenarios we are interested in.
                We then need to refine our model. Such a refinement is actually a specialization of
                three classes (all except “Range”) into subclasses that apply specific
                restrictions.</para><para>First of all, the class Docuverse is specialized into “StringDocuverse” (the
                content is specified as the value of <emphasis role="ital">has-text</emphasis> and no
                value is associated with <emphasis role="ital">has-uri</emphasis>) and
                “URIDocuverse” (the actual content is located at the URI specified in <emphasis role="ital">has-uri</emphasis> and no value is given to <emphasis role="ital">has-text</emphasis>).</para><programlisting xml:space="preserve">:StringDocuverse
      a       owl:Class ;
      rdfs:subClassOf :Docuverse ;
      owl:equivalentClass
              [ a       owl:Class ;
                owl:intersectionOf (:Docuverse [ a       owl:Restriction ;
                            owl:cardinality "1"^^xsd:nonNegativeInteger ;
                            owl:onProperty :has-text
                          ] [ a       owl:Restriction ;
                            owl:cardinality "0"^^xsd:nonNegativeInteger ;
                            owl:onProperty :has-uri
                          ])
              ] .

:URIDocuverse
      a       owl:Class ;
      rdfs:subClassOf :Docuverse ;
      owl:equivalentClass
              [ a       owl:Class ;
                owl:intersectionOf (:Docuverse [ a       owl:Restriction ;
                            owl:cardinality "0"^^xsd:nonNegativeInteger ;
                            owl:onProperty :has-text
                          ] [ a       owl:Restriction ;
                            owl:cardinality "1"^^xsd:nonNegativeInteger ;
                            owl:onProperty :has-uri
                          ])
              ] .
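
# Illustrative sketch, not part of the ontology: two hypothetical docuverses.
# The individual names and the example.com URI are assumptions made here.
# :lyrics-inline is asserted to be a StringDocuverse and carries its content
# in has-text; :lyrics-remote is asserted to be a URIDocuverse and only points
# to its content through has-uri.
:lyrics-inline
      a       :StringDocuverse ;
      :has-text "And I love her"^^xsd:string .

:lyrics-remote
      a       :URIDocuverse ;
      :has-uri "http://www.example.com/and-i-love-her.txt"^^xsd:anyURI .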

[]    a       owl:AllDisjointClasses ;
      owl:members (:StringDocuverse :URIDocuverse) .</programlisting><para> Depending on particular scenarios or on the kind of docuverse we are dealing with
                – it could be plain-text, XML, LaTeX, a picture, etc. – we need to be able to use
                different kinds of locations. Therefore, the class “Location” has at least three
                different disjoint subclasses:</para><programlisting xml:space="preserve">:CharNumberLocation
      a       owl:Class ;
      rdfs:subClassOf :Location .

:XPathLocation
      a       owl:Class ;
      rdfs:subClassOf :Location .

:XPointerLocation
      a       owl:Class ;
      rdfs:subClassOf :Location .
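
# Illustrative sketch, not part of the ontology: hypothetical locations and
# ranges over the string "desserts". The individual names are assumptions, and
# so is the reading of CharNumberLocation values as positions between
# characters (0 before the first one, 8 after the last one).
:word
      a       :StringDocuverse ;
      :has-text "desserts"^^xsd:string .

:word-start
      a       :CharNumberLocation ;
      :refers-to :word ;
      :at     "0"^^xsd:string .

:word-end
      a       :CharNumberLocation ;
      :refers-to :word ;
      :at     "8"^^xsd:string .

# Read in document order, from :word-start to :word-end: "desserts".
:forward
      a       :Range ;
      :begins :word-start ;
      :ends   :word-end .

# Same two locations swapped, read against document order: "stressed".
:backward
      a       :Range ;
      :begins :word-end ;
      :ends   :word-start .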

[]    a       owl:AllDisjointClasses ;
      owl:members (:CharNumberLocation :XPathLocation :XPointerLocation) .</programlisting><itemizedlist><listitem><para>“CharNumberLocation” defines a location by counting characters. In that
                        case, the string value of the “at” property must be an integer<footnote xml:id="sdfootnote8sym"><para>It is a non-negative integer if we are counting from
                                the beginning of the document towards the end, and a negative integer when
                                counting in the opposite direction.</para></footnote> that identifies an unambiguous position in the character
                        stream;</para></listitem><listitem><para>“XPathLocation” defines a location as a node of an XML docuverse. In this
                        case, the property “at” will be an XPath expression <xref linkend="bib-12"/>;</para></listitem><listitem><para>“XPointerLocation” defines a precise point in a docuverse. In that case,
                        the expression “xpointer(point(.42))”, for instance, indicates the cursor
                        in-between the 42nd and the 43rd character; with “xpointer(point(/1/9.3))”
                        we mean the cursor between the 3rd and the 4th character of the ninth node
                        of the root, and so on.</para></listitem></itemizedlist><para>“MarkupItem” is specialized into three disjoint subclasses, “Element”, “Attribute”
                and “Comment”, which allow a more precise characterization of markup items.</para><programlisting xml:space="preserve">:Element
      a       owl:Class ;
      rdfs:subClassOf :MarkupItem .

:Attribute
      a       owl:Class ;
      rdfs:subClassOf :MarkupItem .

:Comment
      a       owl:Class ;
      rdfs:subClassOf :MarkupItem .

[]    a       owl:AllDisjointClasses ;
      owl:members (:Attribute :Comment :Element) .</programlisting><para> Through this classification, shown also in <xref linkend="earmarkontology"/>, we
                can describe all the concepts introduced by XML, LMNL or TexMecs, including virtual
                elements <xref linkend="bib-17"/>
                <xref linkend="bib-18"/>, structured attributes <xref linkend="bib-6"/> and so on. </para><figure xml:id="earmarkontology" floatstyle="1" xreflabel="Figure 7"><mediaobject><imageobject><imagedata format="png" fileref="../../../vol3/graphics/Peroni01/Peroni01-009.png"/></imageobject><caption><para>The class hierarchy of the EARMARK ontology.</para></caption></mediaobject></figure><para>In order to discuss such potentialities, the next section analyses in detail the
                EARMARK encoding of the aforementioned karaoke example.</para></section><section><title>“And I Love Her” in EARMARK</title><para>Marking up a complex document with EARMARK involves the
                following steps:</para><itemizedlist><listitem><para>creation of one or more docuverses depending on the number of data streams
                        we must handle;</para></listitem><listitem><para>identification of the ranges within the docuverses;</para></listitem><listitem><para>identification of the <emphasis role="ital">leaf</emphasis> markup items,
                        i.e. those containing attributes and ranges only;</para></listitem><listitem><para>identification of the <emphasis role="ital">internal</emphasis> markup
                        items, i.e. those containing markup items or a mixed content of markup items
                        and ranges.</para></listitem></itemizedlist><para>Let us take into consideration the fragment of the lyrics of “And I Love Her” by
                The Beatles, introduced in <xref linkend="xml"/>.</para><para>In EARMARK strings are placed in one or more docuverses. As mentioned, there are
                two different types of docuverses: <emphasis role="ital">autonomous
                    resources</emphasis> (i.e., independent files identified by a URI, appropriate
                for the actual lyrics of the song and the content of the fun fact popups) and
                    <emphasis role="ital">local strings</emphasis> (i.e., internal data values,
                appropriate for strings that do not exist as independent units, such as attribute
                values, metadata, and so on).</para><para>For the XML version of “And I Love Her”<footnote xml:id="sdfootnote9sym"><para>The complete Turtle example of “And I Love Her” is available at
                        “http://www.essepuntato.it/2009/01/andiloveher.ttl”.</para></footnote>, we will employ four docuverses: </para><itemizedlist><listitem><para>an independent text file with the lyrics;</para></listitem><listitem><para>a local string containing strings for all attribute values;</para></listitem><listitem><para>a local string containing the timings of the screenfuls of lyrics;</para></listitem><listitem><para>an independent file with a selection of fun facts. This could just as well
                        be an existing, independent HTML resource such as the one in
                        http://www.songfacts.com/detail.php?id=43.</para></listitem></itemizedlist><para>Note that we have immediately introduced the machinery for overlapping elements
                and shared text fragments. We can also add any additional annotation (such as
                spaces, separators, etc.) to each docuverse in order to make it more readable. We
                will be explicitly ignoring the non-relevant text within the docuverses. </para><para>The Turtle translation of the docuverses could be<footnote xml:id="sdfootnote10sym"><para>The prefix “e” refers to
                        “http://www.essepuntato.it/2009/01/andiloveher#”.</para></footnote>:</para><programlisting xml:space="preserve">
     e:lyrics
         a :URIDocuverse ; :has-uri "http://www.essepuntato.it/2009/01/andiloveher.txt"^^xsd:anyURI .
     
     e:funfacts    
         a :URIDocuverse ; :has-uri "http://www.songfacts.com/detail.php?id=43"^^xsd:anyURI .
     
     e:attribute_values    
         a :StringDocuverse ; :has-text "stanza - refrain - 4"^^xsd:string .
     
     e:time_values    
         a :StringDocuverse ; :has-text "68 - 72 - 76 - 80 - 84"^^xsd:string .</programlisting><para> All the strings defining the actual text content of an EARMARK document are
                identified by ranges. A range may refer to any of the docuverses, and ranges can overlap or
                run in reverse order. For example, the refrain and the last chord overlap
                on the same stretch of text.</para><para>We next define a range for each text node of the song, whether encoded as element or
                attribute, e.g.:</para><programlisting xml:space="preserve">
    e:r_refrain_1    
        a :Range ; :begins e:location0-lyrics ; :ends e:location6-lyrics .
    
    e:r_refrain_2    
        a :Range ; :begins e:location6-lyrics ; :ends e:location14-lyrics .
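    
    # Illustrative sketch (hypothetical names, not part of the original example): a range
    # may overlap other ranges. Assuming the lyrics docuverse starts with "And I love her",
    # the range below covers "I love" and overlaps both e:r_refrain_1 and e:r_refrain_2.
    e:r_overlap_example
        a :Range ; :begins e:location4-lyrics ; :ends e:location10-lyrics .
    
    e:location4-lyrics
        a :XPointerLocation ; :refers-to e:lyrics ; :at "xpointer(point(.4))"^^xsd:string .
    
    e:location10-lyrics
        a :XPointerLocation ; :refers-to e:lyrics ; :at "xpointer(point(.10))"^^xsd:string .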
    
    e:r_attribute_class_refrain 
        a :Range ; :begins e:location9-attribute_values ; :ends e:location16-attribute_values .
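    
    # Illustrative sketch (hypothetical name, not part of the original example): the
    # position marked by e:location6-lyrics could presumably also be given as a
    # CharNumberLocation, whose :at value is an integer character count. This assumes
    # that both kinds of location count offsets in the same way.
    e:location6-lyrics-char
        a :CharNumberLocation ; :refers-to e:lyrics ; :at "6"^^xsd:string .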
    
    e:location0-lyrics    
        a :XPointerLocation ; :refers-to e:lyrics ; :at "xpointer(point(.0))"^^xsd:string .
    
    e:location6-lyrics    
        a :XPointerLocation ; :refers-to e:lyrics ; :at "xpointer(point(.6))"^^xsd:string .
    
    e:location14-lyrics    
        a :XPointerLocation ; :refers-to e:lyrics ; :at "xpointer(point(.14))"^^xsd:string .
    
    e:location9-attribute_values    
        a :XPointerLocation ; :refers-to e:attribute_values ; :at "xpointer(point(.9))"^^xsd:string .
    
    e:location16-attribute_values    
        a :XPointerLocation ; :refers-to e:attribute_values ; :at "xpointer(point(.16))"^^xsd:string .</programlisting><para> Some ranges can be used more than once in the final EARMARK document. For
                instance, the “r_refrain_2” range is used both in the refrain of the song and in the
                last chord of the refrain.</para><para>Using these ranges we can now create the leaf markup items, i.e. all the
                attributes and all the <emphasis role="ital">first-level</emphasis> elements. The
                latter are all the elements that have a simple content, i.e., sequences of ranges
                and attributes only.</para><para>Given an e-GODDAG node <emphasis role="ital">N</emphasis>, an EARMARK markup item
                is made as follows:</para><itemizedlist><listitem><para>it has an identifier generated randomly;</para></listitem><listitem><para>the name of <emphasis role="ital">N</emphasis>, if it exists, is the
                        general identifier;</para></listitem><listitem><para>all non-terminal child nodes of <emphasis role="ital">N</emphasis> are
                        translated into individual markup items. They are recursively generated with
                        these same rules;</para></listitem><listitem><para>the ranges corresponding to the text content end up as the sequence of the
                        new markup item.</para></listitem></itemizedlist><para>In the next piece of code we take into consideration both the e-GODDAG structure
                and the implicitly given XML description for all the markup items, that defines the
                kind – <emphasis role="ital">element</emphasis> or <emphasis role="ital">attribute</emphasis> – of each of them. For instance, the Turtle translation of
                the attribute class and of the p element of the refrain, using the ranges previously
                defined, is:</para><programlisting xml:space="preserve">
    e:attr_refrain_class    
        a :Attribute , [ a rdf:Bag ; rdf:_1 e:r_attribute_class_refrain ] ;    
        :has-general-identifier "class"^^xsd:string .
    
    e:refrain_div    
        a :Element , [ a rdf:Seq ; rdf:_1 e:attr_refrain_class ; rdf:_2 e:refrain_p ] ;    
        :has-general-identifier "div"^^xsd:string .
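    
    # Illustrative sketch (hypothetical names, not part of the original example): a variant
    # of the div whose sequence places a second attribute after the p element, with both
    # attributes sharing the general identifier "class". The range below covers "stanza",
    # i.e. offsets 0 to 6 of the e:attribute_values docuverse.
    e:refrain_div_two_classes
        a :Element , [ a rdf:Seq ; rdf:_1 e:attr_refrain_class ; rdf:_2 e:refrain_p ; rdf:_3 e:attr_stanza_class ] ;
        :has-general-identifier "div"^^xsd:string .
    
    e:attr_stanza_class
        a :Attribute , [ a rdf:Bag ; rdf:_1 e:r_attribute_class_stanza ] ;
        :has-general-identifier "class"^^xsd:string .
    
    e:r_attribute_class_stanza
        a :Range ; :begins e:location0-attribute_values ; :ends e:location6-attribute_values .
    
    e:location0-attribute_values
        a :XPointerLocation ; :refers-to e:attribute_values ; :at "xpointer(point(.0))"^^xsd:string .
    
    e:location6-attribute_values
        a :XPointerLocation ; :refers-to e:attribute_values ; :at "xpointer(point(.6))"^^xsd:string .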
    
    e:refrain_p    
        a :Element , [ a rdf:Seq ; rdf:_1 e:r_refrain_1 ; rdf:_2 e:r_refrain_2 ] ;    
        :has-general-identifier "p"^^xsd:string .</programlisting><para> The difference between those leaf elements that are simply sequences of ranges
                and those that are sequences of attributes and ranges mirrors the difference between
                types in XML Schema <xref linkend="bib-19"/>, with the former resembling simple
                type elements with simple content, and the latter resembling complex type elements
                with simple content and attributes.</para><para>The expressiveness of e-GODDAGs clearly falls within that of EARMARK: through EARMARK we
                can express general digraphs with or without <emphasis role="ital">repeatable
                    edges</emphasis> depending on the particular context we are taking into
                consideration.</para><para>Through such digraphs we can handle particular scenarios that involve overlapping
                – i.e. different elements partially dominate the same content, such as with chords
                and lines – as well as virtual elements – i.e. non-contiguous ranges are contained
                by a markup item, such as with the fun fact pop-up.</para><para>Finally, it is interesting to note that EARMARK is actually more expressive than
                e-GODDAGs. Consider the case of unordered items. Although e-GODDAGs always consider
                the markup items and ranges within a container to be ordered, EARMARK allows us to specify
                whether the items are ordered or not, simply by using the “rdf:Seq” or the “rdf:Bag”
                container class. The ordering of inner elements becomes a matter of explicit
                choice rather than being implicitly given by the markup embedding.</para><para>Consequently, EARMARK even allows us to specify sequences of attributes, elements
                and ranges in any order. Unlike XML, LMNL and TexMecs, EARMARK
                makes possible sequences in which attributes, elements and ranges are freely mixed,
                including elements followed by attributes followed by other elements
                and so on. Moreover, the same general identifier can be specified for multiple
                attributes in the sequence (i.e., EARMARK allows multiple attributes with the same
                name for the same element). These situations are not directly expressible in any embedded
                markup model.</para></section></section><section xml:id="embedding"><title>Embedding EARMARK documents</title><para>The process of generating a linearized structure (such as an XML document) from a set
            of EARMARK annotations is not immediate, mostly because of the substantially greater
            expressive power of EARMARK annotations. Without loss of generality, we will be
            describing a conversion to XML, since converting to LMNL or TexMecs will constitute a
            much simpler exercise of stopping the linearization a few steps earlier. </para><para>Although the conversion of any EARMARK subset that already describes a tree is
            obviously immediate and fully automatic, several different options exist for any further
            EARMARK annotations that we wish to linearize. Since these additional annotations are at
            odds with a tree-like structure, we need to use a few embedding tricks to obtain a
            well-formed XML document, and of course the choice of tricks to use is wide and rich. In
            this section we will explore the task of linearizing a chosen tree-shaped subset of the
            EARMARK document, and in the following section we will describe a few options for the
            remaining assertions. </para><para>The construction of the tree we envision is bottom up:</para><itemizedlist><listitem><para>the first step is deciding which docuverses (or fragments thereof) will
                    constitute the content of the document, which the content of the attributes, and
                    which, if any, will be ignored;</para></listitem><listitem><para>then a subset of the first-level elements needs to be chosen, as well as the
                    ranges they contain. Of course, no overlapping or reverse order ranges can be
                    accepted as such;</para></listitem><listitem><para>there may well be several independent sets of
                    first-level elements, each of which is non-overlapping by itself but
                    would overlap when combined with the others. In this situation, of course, only one set can be
                    selected as the main hierarchy, and all others will need to employ an embedding
                    trick to be expressed in the final linearized document. One possible way to do
                    so is to build the independent sets of elements and the hierarchies over them, and
                    then choose the largest set as the principal hierarchy, leaving all others
                    as candidates for tricks;</para></listitem><listitem><para>mixed content elements are sequences of ranges and first-level elements, and
                    are generated once all contained elements are ready;</para></listitem><listitem><para>similarly, structure elements (only containing other elements) are available
                    for creation once their content is already generated;</para></listitem><listitem><para>finally, attributes and their ranges are selected as well and converted into
                    linearized form and associated to their elements;</para></listitem><listitem><para>the final result of this linearization is possibly a selection of separate and
                    disjoint trees, each linearizing a connected component of the EARMARK document.
                    It is then a linearization choice either to generate several independent XML
                    documents or to employ the <emphasis role="ital">universal root
                        </emphasis>pattern<footnote xml:id="sdfootnote11sym"><para> http://www.xmlpatterns.com/UniversalRootMain.shtml</para></footnote> and include these structures within a single <emphasis role="ital">new</emphasis> element that becomes their container.</para></listitem></itemizedlist><para>Whatever is left out of this linearization process needs to be approached using one or
            more of the methods described in the next section.</para></section><section><title>Handling the remaining EARMARK structures</title><para>Some kinds of EARMARK structures are not directly linearizable by embedding. In order
            to allow a full representation of the EARMARK document we therefore need to apply some
            stratagem to force the hierarchical structure to accept these <emphasis role="ital">remaining structures</emphasis>.</para><para>Reasonably, frequent unmanaged structures would include:</para><itemizedlist><listitem><para>overlapping leaf elements referring to contiguous ranges;</para></listitem><listitem><para>overlapping leaf elements referring to non-contiguous ranges;</para></listitem><listitem><para>shared ranges;</para></listitem><listitem><para>text variants;</para></listitem><listitem><para>overlapping structural elements;</para></listitem><listitem><para>structured attributes.</para></listitem></itemizedlist><para>In <xref linkend="embedding"/> we listed the EARMARK assertions that could not be
            directly translated into an XML document. Let us examine a few potential approaches
            (which we call <emphasis role="ital">embedding </emphasis>
            <emphasis role="ital">tricks</emphasis>) for forcing the conversion. A few such
            approaches, as well as algorithms for passing from one to the other, are described in
                <xref linkend="bib-7"/>.
        </para><section><title>Milestones</title><para>Plain overlapping leaf elements (i.e. elements that partially share the text
                content, but no lower structures) may be forced into an XML structure via <emphasis role="ital">milestones</emphasis> as proposed in CLIX <xref linkend="bib-20"/>.</para><para>The open and close tags of the unconverted elements are considered as individual
                empty elements placed in the positions where they should reside. The attribute role
                specifies whether the empty element corresponds to a start or end tag, and the
                    <emphasis role="ital">sID</emphasis> and <emphasis role="ital">eID</emphasis>
                attributes connect the two elements in a single conceptual one. </para><programlisting xml:space="preserve">&lt;body&gt;
    &lt;div class="stanza" title="4"&gt;
        &lt;p&gt;
            &lt;chord name="G"&gt;Will never&lt;/chord&gt;
            &lt;chord name="A" clix:role="start-range" clix:sID="A"/&gt;
            die
        &lt;/p&gt;
    &lt;/div&gt;
    &lt;div class="refrain"&gt;
        &lt;p&gt;
             And I
             &lt;chord name="A" clix:role="end-range" clix:eID="A"/&gt; 
             &lt;chord name="D"&gt;love her&lt;/chord&gt;
        &lt;/p&gt;
    &lt;/div&gt;
&lt;/body&gt;</programlisting><para> Although easy to implement and appreciate, milestones are nonetheless limited in
                that only frontier overlapping (i.e., overlapping on ranges) is expressible. </para></section><section><title>Fragmentation</title><para>Another approach is to use fragmentation as introduced by the TEI guidelines <xref linkend="bib-1"/>.</para><para>Overlapping elements are split into multiple fragments, each of which
                nests properly within its container. Individual fragments are then connected via
                attributes such as <emphasis role="ital">next</emphasis> or <emphasis role="ital">previous</emphasis>. </para><programlisting xml:space="preserve">&lt;body&gt;
    &lt;div class="stanza" title="4"&gt;
        &lt;p&gt;
            &lt;chord name="G"&gt;Will never&lt;/chord&gt;
            &lt;chord name="A" xml:id="a1" next="a2"&gt;die&lt;/chord&gt;
        &lt;/p&gt;
    &lt;/div&gt;
    &lt;div class="refrain"&gt;
        &lt;p&gt;
             &lt;chord name="A" xml:id="a2"&gt;And I&lt;/chord&gt; 
             &lt;chord name="D"&gt;love her&lt;/chord&gt;
        &lt;/p&gt;
    &lt;/div&gt;
&lt;/body&gt;</programlisting></section><section><title> Repetitions</title><para>The easiest embedding trick for dealing with shared ranges is simply to multiply
                the instances of the corresponding text, possibly annotating that all instances
                except the first are redundant.</para><programlisting xml:space="preserve">&lt;p&gt;
    &lt;span class="repeat" title="r_refrain_1"&gt;And I &lt;/span&gt;
    &lt;span class="repeat" title="r_refrain_2"&gt;love her&lt;/span&gt;
&lt;/p&gt;</programlisting></section><section><title> Hidden variants</title><para>When we have multiple variants of the same text, we may want to hide the alternative
                variants in substructures (such as attributes or subelements).</para><programlisting xml:space="preserve">&lt;p&gt;And I love &lt;span class="alternative" title="him"&gt;her&lt;/span&gt;&lt;/p&gt;</programlisting></section><section><title> RDFa</title><para>RDFa <xref linkend="bib-21"/> allows arbitrary assertions to be placed on
                existing elements. It is understood that if an assertion exists over a text fragment
                that is not wrapped within an existing element, a generic element (such as the HTML
                    <emphasis role="ital">span</emphasis>) is added to allow for RDFa assertions to
                attach to the corresponding content. </para><para>For instance, support for overlapping inner structures is difficult to provide with
                either fragmentation or milestones, but becomes possible with RDFa. Consider for
                instance the sequence which contains individual chord elements and overlaps with the
                    <emphasis role="ital">div</emphasis> element containing individual <emphasis role="ital">p</emphasis> elements. </para><para>RDFa thus supports the specification of a virtual instance of the class Chords,
                expressed as a sequence of three instances of the Chord class (in fact, one instance
                each of the subclasses GChord, AChord and DChord of the Chord class) as follows:</para><programlisting xml:space="preserve">&lt;body about="#Chs" typeof="#Chords"&gt;
    &lt;div typeof="rdf:Seq" property="rdf:_1" href="#G"&gt;
        &lt;p property="rdf:_2" href="#A"&gt;
            &lt;span about="#G" typeof="#GChord" property="#has"&gt;
                Will never
            &lt;/span&gt;
            &lt;span 
                about="#A" typeof="#AChord" property="#has-first-part"&gt;
                die
            &lt;/span&gt;
        &lt;/p&gt;
    &lt;/div&gt;
    &lt;div property="rdf:_3" href="#D"&gt;
        &lt;p&gt;
             &lt;span about="#A" property="#has-second-part"&gt;
                 And I
             &lt;/span&gt;
             &lt;span about="#D" typeof="#DChord" property="#has"&gt;
                 love her
             &lt;/span&gt;
        &lt;/p&gt;
    &lt;/div&gt;
&lt;/body&gt;</programlisting></section><section><title> Embedded RDF </title><para>When all else fails, the fallback approach is simply to place the remaining
                assertions as an RDF/XML block in the XML structure, either in a block properly
                thought out for external vocabularies, or converted into some local vocabulary, or
                even as a lump of XML elements placed in a random position within the document. </para><para>This is useful, for instance, for dealing with structured attributes <emphasis role="ital">a la</emphasis> LMNL <xref linkend="bib-6"/>. In the following
                example, an RDF block is inserted in the XML document to provide support for the
                attribute <emphasis role="ital">name</emphasis> of the <emphasis role="ital">chord</emphasis> element, which contains a structure of two different values
                wrapped by elements <emphasis role="ital">normal</emphasis> and <emphasis role="ital">jazzy</emphasis>. This allows the <emphasis role="ital">name</emphasis> of the <emphasis role="ital">chord</emphasis> to cater for both
                a pop and a jazz rendering of the tune, while at the same time remaining one
                attribute of one element. </para><programlisting xml:space="preserve">&lt;body&gt;
  &lt;rdf:RDF     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.essepuntato.it/2008/12/earmark#"&gt;
       &lt;!-- rdf:datatype takes a full URI; prefixed names are not expanded in attribute values --&gt;
       &lt;Attribute rdf:about="#attr_chord_structured"&gt;
          &lt;has-general-identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string"&gt;name&lt;/has-general-identifier&gt;
          &lt;rdf:type&gt;
            &lt;rdf:Seq&gt;
                &lt;rdf:li rdf:resource="#normal"/&gt;
                &lt;rdf:li rdf:resource="#jazzy"/&gt;
            &lt;/rdf:Seq&gt;
          &lt;/rdf:type&gt;
       &lt;/Attribute&gt;
       &lt;Element rdf:about="#normal"&gt;
          &lt;has-general-identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string"&gt;normal&lt;/has-general-identifier&gt;
          &lt;rdf:type&gt;
            &lt;rdf:Bag&gt;
                &lt;rdf:li rdf:resource="#r_chord_D"/&gt;
            &lt;/rdf:Bag&gt;
          &lt;/rdf:type&gt;
       &lt;/Element&gt;
       &lt;Element rdf:about="#jazzy"&gt;
          &lt;has-general-identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string"&gt;jazzy&lt;/has-general-identifier&gt;
          &lt;rdf:type&gt;
            &lt;rdf:Bag&gt;
              &lt;rdf:li rdf:resource="#r_chord_Dmaj7"/&gt;
            &lt;/rdf:Bag&gt;
          &lt;/rdf:type&gt;
       &lt;/Element&gt;
  &lt;/rdf:RDF&gt;
  ...
&lt;/body&gt;</programlisting></section></section><section><title> Conclusions</title><para>In this paper we presented the Extreme Annotational RDF Markup (EARMARK), a proposal
            for expressing GODDAG (and e-GODDAG) structures in a general metamarkup language that
            does not rely on embedding, and that integrates the advantages of standoff annotations
            and embedded markup into a single unifying framework. </para><para>Through EARMARK authors can express a large number of markup assertions and
            observations that would otherwise not be expressible, including overlapping elements,
            elements over non-contiguous ranges, repeated structures, text variants, overlapping
            hierarchies, structured attributes, etc.</para><para>In further work we plan both to explore the application of the structural patterns
            defined in <xref linkend="bib-22"/>, trying to give a formal ontological demonstration
            of whether an EARMARK document follows them, and to explore, both formally and pragmatically,
            the expressive power of the EARMARK language and its applications.</para></section><section><title>Acknowledgements</title><para>The authors wish to thank all those who have commented on this paper and on EARMARK in general. 
            The anonymous reviewers of this and other EARMARK papers were incredibly helpful in smoothing
            out both the basic concepts and the written explanation of the most obscure parts of the proposal. 
            The participants in the Goddag Workshop in Amsterdam (December 2008) have in no small part 
            provided thought fodder for what ended up becoming EARMARK. Finally, we wish to thank explicitly 
            Michael Sperberg-McQueen and Federico Meschini for their help, comments and suggestions.</para></section><bibliography><title>Bibliography</title><bibliomixed xml:id="bib-21" xreflabel="[1]">Adida, B., Birbeck, M., McCarron, S.,
            Pemberton, S. (2008). RDFa in XHTML: Syntax and processing. W3C Recommendation. World
            Wide Web Consortium. <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/rdfa-syntax/</link>.</bibliomixed><bibliomixed xml:id="bib-13" xreflabel="[2]">Beckett, D., Berners-Lee, T. (2008). Turtle -
            Terse RDF Triple Language. W3C Team Submission.
                <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TeamSubmission/turtle/</link>.</bibliomixed><bibliomixed xml:id="bib-12" xreflabel="[3]">Berglund, A., Boag, S., Chamberlin, D.,
            Fernández, M. F., Kay, M., Robie, J., Siméon, J. (2007). XML Path Language (XPath) 2.0.
            W3C Recommendation. <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/xpath20/</link>.</bibliomixed><bibliomixed xml:id="bib-22" xreflabel="[4]">Dattolo, A., Di Iorio, A., Duca, S., Feliziani,
            A.A., Vitali, F. (2007). Structural patterns for descriptive documents. In the
            Proceedings of the Seventh International Conference on Web Engineering 2007, Como,
            Italy, 2007.</bibliomixed><bibliomixed xml:id="bib-9" xreflabel="[5]">DeRose, S., Maler, E., Daniel, R. (2001). XML
            Pointer Language (XPointer) Version 1.0. W3C Candidate Recommendation.</bibliomixed><bibliomixed xml:id="bib-20" xreflabel="[6]">DeRose, S. (2004). Markup overlap: A review and
            a horse. In Extreme Markup Languages.</bibliomixed><bibliomixed xml:id="bib-16" xreflabel="[7]">Goldfarb, C. F. (1990). The SGML Handbook.
            Oxford University Press, USA.</bibliomixed><bibliomixed xml:id="bib-15" xreflabel="[8]">Horrocks, I., Patel-Schneider, P. F., Boley, H.,
            Tabet, S., Grosof, B., Dean, M. (2004). SWRL: A Semantic Web Rule Language Combining OWL
            and RuleML. W3C Member Submission.
            <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/Submission/SWRL/</link>.</bibliomixed><bibliomixed xml:id="bib-18" xreflabel="[9]">Huitfeldt, C., Sperberg-McQueen, C. M. (2001).
            TexMECS: An experimental markup meta-language for complex documents.</bibliomixed><bibliomixed xml:id="bib-3" xreflabel="[10]">Iglesias, C., Squillace, M. (2009). Pointer
            Methods in RDF. W3C Working Draft, available at
                <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/Pointer-in-RDF</link>.</bibliomixed><bibliomixed xml:id="bib-10" xreflabel="[11]">Manola, F., Miller, E. (2004). RDF Primer. W3C
            Recommendation. <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/rdf-primer/</link>.</bibliomixed><bibliomixed xml:id="bib-23" xreflabel="[12]">Marcoux, Y. (2008). Graph characterization of
            overlap-only TexMECS and other overlapping markup formalisms. Paper presented at the
            Balisage: The Markup Conference. doi: <biblioid class="doi">10.4242/BalisageVol1.Marcoux01</biblioid>.</bibliomixed><bibliomixed xml:id="bib-7" xreflabel="[13]">Marinelli, P., Vitali, F., Zacchiroli, S.
            (2008). Towards the unification of formats for overlapping markup. The New Review of
            Hypermedia and Multimedia.</bibliomixed><bibliomixed xml:id="bib-14" xreflabel="[14]">Nelson, T. (1980). Literary Machines: The
            report on, and of, Project Xanadu concerning word processing, electronic publishing,
            hypertext, thinkertoys, tomorrow's intellectual... including knowledge, education and
            freedom. Mindful Press, Sausalito, CA, USA.</bibliomixed><bibliomixed xml:id="bib-24" xreflabel="[15]">Schonefeld, O., Witt, A. (2006).
            Towards validation of concurrent markup. In Proceedings of Extreme Markup Languages 2006,
            Montréal, Canada.</bibliomixed><bibliomixed xml:id="bib-2" xreflabel="[16]">Schmidt, D., Colomb, R. (2009). A data
            structure for representing multi-version texts online. International Journal of
            Human-Computer Studies.</bibliomixed><bibliomixed xml:id="bib-1" xreflabel="[17]">Sperberg-McQueen, C. M., Burnard, L. (2005).
            TEI P5 Guidelines for Electronic Text Encoding and Interchange (revised). The
            Association for Computers and the Humanities.</bibliomixed><bibliomixed xml:id="bib-17" xreflabel="[18]">Sperberg-McQueen, C. M., Huitfeldt, C. (2008).
            Markup Discontinued: Discontinuity in TexMecs, Goddag structures, and rabbit/duck
            grammars. doi: <biblioid class="doi">10.4242/BalisageVol1.Sperberg-McQueen01</biblioid>.</bibliomixed><bibliomixed xml:id="bib-5" xreflabel="[19]">Sperberg-McQueen, C.M., Huitfeldt, C. (2004).
            GODDAG: A Data Structure for Overlapping Hierarchies. Lecture Notes In Computer Science.
            Springer.</bibliomixed><bibliomixed xml:id="bib-8" xreflabel="[20]">Tennison, J. (2008). Representing Overlap in
            XML. Article from “Jeni's Musings” blog, available at
                <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.jenitennison.com/blog/node/97</link>.</bibliomixed><bibliomixed xml:id="bib-6" xreflabel="[21]">Tennison, J., Piez, W. (2002). The Layered
            Markup and Annotation Language (LMNL). Late-breaking paper presented at Extreme
            Markup Languages, Montreal, Canada.</bibliomixed><bibliomixed xml:id="bib-19" xreflabel="[22]">Thompson, H. S., Beech, D., Maloney, M.,
            Mendelsohn, N. (2001). XML Schema Part 1: Structures. W3C Recommendation.
                <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/xmlschema-1/</link>.</bibliomixed><bibliomixed xml:id="bib-4" xreflabel="[23]">Tummarello, G., Morbidoni, C., Pierazzo, E.
            (2005). Toward textual encoding based on RDF. 9th ICCC Conference on Electronic
            Publishing (ELPUB 2005). Leuven, Belgium.</bibliomixed><bibliomixed xml:id="bib-11" xreflabel="[24]">W3C OWL Working Group (2009). OWL 2 Web
            Ontology Language Document Overview. W3C Working Draft.
                <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/TR/owl2-overview/</link>.</bibliomixed></bibliography></article>
