<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2" xml:id="HR-23632987-8973"><title>Discourse situations and markup interoperability</title><subtitle>An application of situation semantics to descriptive metadata</subtitle><info><confgroup><conftitle>Balisage: The Markup Conference 2010</conftitle><confdates>August 3 - 6, 2010</confdates></confgroup><abstract><para>Interoperability is often considered a key issue for information systems, but there are
        many different kinds of interoperability and only a few have been given precise definitions.
          <emphasis role="ital">Situation semantics</emphasis> and the notions of <emphasis role="ital">discourse situations</emphasis> and <emphasis role="ital">resource
          situations</emphasis> are promising tools for the conceptualization of markup within a
        general theory of communication. This paper uses these concepts to explore the role of
        context in determining the meaning of markup and to define a particular kind of
        interoperability for markup structures. A Dublin Core OAI-PMH record and associated schema
        are used to show how the use of contextual information supports (or fails to support) the
        interoperability of the record.</para></abstract><author><personname><firstname>Karen</firstname><surname>Wickett</surname></personname><personblurb><para/></personblurb><affiliation><jobtitle>Doctoral Student</jobtitle><orgname>Graduate School of Library and Information Science at the University of Illinois
          Urbana-Champaign</orgname></affiliation><email>wickett2@illinois.edu</email></author><legalnotice><para>Copyright © 2010 by the author.  Used by
permission.</para></legalnotice><keywordset role="author"><keyword>situation semantics</keyword><keyword>metadata</keyword><keyword>markup interpretation</keyword></keywordset></info><section><title>Introduction</title><para>Although context clearly influences how markup structures are deployed and how they are
      retrieved and processed, the ways that context determines the interpretation of markup are not
      well-understood or consistently conceptualized. The role of context has been acknowledged in a
      few narrow areas -- for example, in how it determines the grammatical mood of the assertions
      signaled by tags and the ontological status of XML documents (<xref linkend="renear03"/>).
      Context is an obvious part of markup systems, since processing any XML document requires the
      interplay of the document with structures and systems that are not part of the document. </para><para>A developed understanding of the role of context in markup can illuminate well-known
      problems for information systems, particularly those associated with interoperability. While
      there is a general sense that interoperability is a sort of freedom from local context,
      interoperability itself is not well-theorized in terms of markup structures and
      interpretation. Preservation and portability of markup structures may be aided by approaches
      that operate in reference to the logical meaning of those structures (<xref linkend="dubin03"/>), but the reliance on the semantic meaning of structures presents a sort of boot-strapping
      challenge since these interpretations must be determined at some point. Context (both internal
      and external to a document) will play an essential role in mapping from syntax into semantics,
      so we need a clear understanding of how context shapes the assignment of meaning to markup
      structures.</para><para>Wrightson has demonstrated how situation semantics (<xref linkend="bp1983"/>) can provide
      insights into the communication of information through markup structures (<xref linkend="wrightson01"/>) and has presented a corresponding toolkit for the analysis of XML
      documents as utterances (<xref linkend="wrightson05"/>). Applying situation semantics to
      markup is attractive since it is intended as a general account of communication and so holds
      the promise of integrating our understanding of markup meaning into a general theory of
      communication. </para><para>That much markup seems to function at least in some respects like a natural language,
      rather than a formal language, makes the connection with theories of natural language
      particularly promising. Situation semantics is used here to support an account of the
      assignment of meaning to markup structures; first through the definition of a particular kind
      of interoperability for markup structures and then in an examination of the role of schema in
      interoperability and interpretation.</para><para>Situation semantics supports a view of communication of information through markup
      structures that explicitly accounts for the context in which the record was created and
      in which it is interpreted. In what follows situation semantics is used to define
      interoperability and to demonstrate how specific parts of Dublin Core OAI-PMH records convey
      contextual information that supports the interpretation of those records.</para></section><section><title>Situation Semantics and XML Metadata</title><para>This section briefly introduces concepts from situation semantics and shows how they may
      be applied to XML metadata. </para><para>Barwise and Perry apply situation semantics to the interpretation of natural language
      utterances (<xref linkend="bp1983"/>) by modeling the meaning of an indicative sentence as a
      relation between a situation in which the sentence is uttered and a situation which is
      described by the utterance of the sentence. The position here is that since metadata records
      are sequences of indicative sentences, situation semantics can usefully explain the ways in
      which they carry meaning. An utterance situation is made up of a <emphasis role="ital">discourse situation</emphasis> and the speaker's <emphasis role="ital">connections</emphasis> (discussed in the following section). Any situation which uniquely
      anchors the roles of a speaker, an addressee, a discourse location, and an expression is
      considered a discourse situation. </para><para> The technical notion of a situation here is very close to our intuitive one: a situation
      occurs at a space-time location and involves individuals participating in certain roles and
      standing in relations. It also closely corresponds to the notion of a <emphasis role="ital">state of affairs</emphasis>, especially since situations are abstract objects that may or
      may not obtain. </para><para>In the case of descriptive metadata the role of the speaker can be anchored to the
      metadata creator at the time when the record is created. <footnote><para> This is a simplifying assumption, since metadata creation will generally involve a
          chain of communicating individuals and institutions. </para></footnote>In contrast, the role of the addressee is left open until the record is retrieved
      and viewed by some consumer, and it is only at this point that all of the roles are anchored
      to form a complete discourse situation. The entire discourse situation will be extended in
      time and typically mediated through several computational environments. This extension in time
      is a primary motivator for interoperability efforts that are concerned with supporting the
      preservation (especially in the long term) of metadata records.</para><para>Since descriptive metadata records are created without specification of the addressee, the
      speaker and the addressee may be operating in very distinct environments. This kind of
      disparity is common in written communication, where an author and a reader may be separated by
      centuries or differences of language. Regardless, the differences between the environment of a
      speaker and an addressee of XML documents require our attention in order to understand the
      contribution that elements of those environments make to the interpretation of documents. </para><para>The described situation for a metadata record is one in which a resource with each of the
      properties given in the record exists. The space-time location for this described situation
      will temporally precede and overlap the space-time location of the situation in which the
      record was created, since presumably a resource exists before it is described.<footnote><para>With the exception of edge cases like pre-publication metadata or description of
          fictional resources.</para></footnote> Hopefully the discourse situation that captures the communication of metadata will
      fall entirely within the described situation that arises from that metadata, and consumers are
      able to access the described resource. But one can easily imagine cases where the resource
      indicated by the metadata record no longer exists at the time of the retrieval of the record,
      or it no longer has the properties indicated in the record (e.g. it is no longer accessible
      through a given URL). It is also possible to imagine cases where the described situation does
      not obtain at all, since the record is simply erroneous. In any case, a record won't
      facilitate communication and access to resources unless an addressee is able to interpret it.
      To ensure the operation of records and to support access, we need to understand what goes into
      the interpretation of descriptive metadata records.</para><section><title>Connections and Resource Situations</title><para>Situation semantics explicitly models the <emphasis role="ital">compositionality</emphasis> of meaning by accounting for how the meanings of structures
        that are part of an utterance make systematic contributions to the meaning of the utterance
        as a whole. Expressions that occur at one point in a discourse situation can supply a
          <emphasis role="ital">setting</emphasis> that influences how expressions that occur later
        in the discourse situation are understood.</para><para>The speaker's <emphasis role="ital">connections</emphasis> in this model are an
        assignment of referents to the expressions within an utterance. Since the speaker and the
        addressee in a discourse situation corresponding to a metadata record are far away from each
        other (in terms of space, time, and computational environments) the connections on both
        sides of the utterance require direct attention. In general, the goal of interoperability is
        for the speaker's connections to match the connections of the addressee for each expression
        that makes up a record.</para><para>In XML records, it is relatively clear how the connections established by certain parts
        of a record determine how other parts of the record are to be processed or understood. For
        example, the connections given in the XML declaration provide a setting that constrains how
        the record is to be processed. We can further divide this expression and focus on the part that
        leads (the "?" that immediately follows the opening bracket), which gives a setting in which
        the rest of the tag can be recognized as containing processing information.</para><para>The hierarchical nature of XML documents gives rise to a hierarchy of settings. The
        primary setting is established by the XML declaration, which establishes that what follows
        is an XML document. Any further declaration of an associated schema or namespace will also
        establish a setting that influences how the expressions that occur within it are
        interpreted. These schematic settings are secondary to the setting provided by the XML
        declaration since the context in which it is possible to process and use them is given by
        the fact that they occur within an XML document.</para><para>The settings are internal to an utterance, but context is external. The settings of an
        utterance indicate the <emphasis role="ital">resource situations</emphasis> necessary for
        interpretation of a record. Resource situations are the situations that the actors
        participating in a discourse situation have access to and use to identify and assign
        connections for expressions. In this model, resource situations are naturally seen as
        supplying context for the creation or interpretation of markup. A resource situation
        functions in this model to account for the specific elements of the environment that a
        metadata creator uses to assign meaning to markup and a consumer uses to interpret that
        markup. </para></section></section><section><title>Defining Descriptive Interoperability</title><para>While interoperability is frequently called upon as a motivating factor for improvement in
      information systems, it is not often given a systematic characterization within a general
      framework for description and communication. A working notion of interoperability for
      descriptive metadata in XML (inspired by (<xref linkend="sperberg00"/>), (<xref linkend="renearetal02"/>) and in line with the OAIS Reference Model (<xref linkend="lavoie04"/>)) is that <emphasis role="ital">an interoperable description is one that supports licensing
            the same set of assertions in any environment</emphasis>.</para><para>In terms of situation semantics then, an interoperable description is one for which, given
      any addressee, the set of connections that link expressions to their referents remains
      fixed. Interoperability then can be viewed as a kind of independence from particular features
      of the discourse situation. The time or location associated with the utterance of an
      interoperable description can vary without a change in the meaning of the utterance. In order
      to achieve this kind of interoperability, metadata creators must either convey these
      connections explicitly, or supply enough information to allow a user to discover the intended
      connections. Unless connections are given in (or pointed to by) a record, someone trying to infer
      information from the document is left to determine connections on her own.</para><para>Of course, such a reader will not be entirely on her own. She can still use her background
      knowledge, and given a record that is human-readable, she might have a fair degree of success.
      But she would have to rely on a larger set of resource situations to establish connections for
      the expressions in the record than in a case where connections are explicitly given.</para><para>In fact, resource situations that supply the information necessary to process or interpret
      XML metadata will always be a requirement for determining the assertions licensed by a
      document. Grasping the intended connections for records requires a certain set of knowledge
      about the conventions in play, so it is unrealistic to expect a descriptive record to be fully
      interoperable in the sense given above. This suggests adjusting the definition of
      interoperability to characterize it as an effort to reduce the burden of gaining access to
      the resource situations required to fully interpret a record.</para></section><section><title>Example: Analyzing a Dublin Core OAI-PMH Record</title><para>
      <figure xml:id="example"><title>A partial OAI-PMH record</title><mediaobject><imageobject><imagedata fileref="../../../vol5/graphics/Wickett01/Wickett01-001.png" format="png"/></imageobject></mediaobject></figure><xref linkend="example"/> shows part of an XML record retrieved from an OAI server.
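</para><para>Since the figure is an image, a textual skeleton may help in following the analysis. The listing below is only an illustrative sketch of the general shape of an OAI-PMH response that carries a Dublin Core record, and it is not a transcription of the figure. The repository address, the identifier values, the dates, and the title are hypothetical placeholders. The namespace names and schema locations are the ones published by the Open Archives Initiative and the Dublin Core Metadata Initiative. Comments mark where each nested setting is established.</para><programlisting xml:space="preserve"><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!-- Primary setting: the XML declaration above marks what follows as XML. -->

<!-- Secondary setting: the root element names the OAI-PMH namespace and
     points to OAI-PMH.xsd through its schema location attribute. -->
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                             http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <!-- Hypothetical values throughout: example.org, dates, identifiers. -->
  <responseDate>2010-01-01T00:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:item-1"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <!-- An "identifier" whose connections come from OAI-PMH. -->
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2010-01-01</datestamp>
      </header>
      <metadata>
        <!-- Third setting: the oai_dc container declares its own
             namespaces and points to oai_dc.xsd. -->
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
                                       http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>A hypothetical resource</dc:title>
          <!-- An "identifier" whose connections come from Dublin Core. -->
          <dc:identifier>http://example.org/resource/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>]]></programlisting><para>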
      As discussed above, since this is an XML document, the primary setting for the record is given
      by the XML declaration. This expression indicates connections that establish how
      the document is to be processed or interpreted. A secondary setting can be identified by
      examining the attributes within the root "OAI-PMH" element for the record.</para><para>The schema location attribute within the "OAI-PMH" tag acts as a pointer to documentation
      that specifies a certain kind of connections for child elements that follow this element.
      These connections provide the vocabulary and structural constraints for tags and attributes,
      and they may be overridden by further schema declarations within a child element. In this
      case, while the OAI-PMH.xsd document does give instructions for the construction of valid
      OAI-PMH documents by giving structural and datatype restrictions, it does not give the full
      intended semantics for tags that conform to it. </para><para>A third setting is given in this example by the namespace and schema attributes in the
      "oai_dc" element. As above, the schema location attribute points to documentation that
      specifies syntactic connections for the child elements. The connections given here override
      those given in the OAI-PMH element. This means that the connections for "identifier"
      within the "oai_dc" element correspond to the Dublin Core Element Set, whereas the occurrences
      of "identifier" elements higher up in the record have the connections established by
      OAI-PMH.</para><para>The oai_dc.xsd documentation imports simpledc20021212.xsd, and therefore does provide
      access to the resource situation necessary to make the correct connections for element names
      in the core element set for Dublin Core. Each of the element names that appear within the
      third setting are defined in this documentation through marked-up annotations. In contrast,
      OAI-PMH.xsd only gives syntactic information and processing instructions. The intended
      semantics for OAI element names such as "identifier", "record", or "metadata" are not provided
      in the documentation itself, nor is there any reference (network locatable or otherwise) to a
      source for definitions of these terms. These terms are defined in the protocol documentation,
      but it is not clear how an addressee (reader/consumer) of the record could reach that
      documentation without additional information or direction.</para><section><title>The OAI Identifier</title><para>An "identifier" element within an OAI-PMH record is specified by the documentation given
        in OAI-PMH.xsd as occurring within a "header" element and as containing a URI. However,
        there is no direct indication of what the given identifier serves to identify. A user of
        this record can rely on the nesting of the element within the "header" of the "record"
        element to infer that the identifier serves to identify the record. Or they could use their
        knowledge that the record contains metadata about a resource to infer that the identifier
        picks out a resource that has all of the properties indicated within the "metadata"
        element.</para><para>In fact, some services that convert OAI-PMH records into RDF take the second approach and
        use the OAI identifier as a subject URI for statements about a resource that are derived
        from the record. However, the OAI Protocol for Metadata Harvesting states that "the
        identifier described here is <emphasis role="ital">not</emphasis> that of a <emphasis role="ital">resource</emphasis>. The nature of a resource identifier is outside the scope
        of the OAI-PMH" (<xref linkend="lagoze02"/>, emphasis original). Rather, this identifier
        picks out the OAI Item, which is a conceptual container for metadata.</para><para>This is a failure of interoperability. The RDF statements that result from this kind of
        transformation are intended to make statements about a resource, but they will actually be
        making erroneous statements about a metadata container. Thus the assertions licensed by
        markup in the environment of the addressee who has transformed the description will differ
        significantly from the assertions licensed in the environment of the speaker who issued the
        original record. The resource situation in use by the transformation was incomplete with
        respect to the connections for the OAI identifier, and an error resulted. The connections
        used in the generation of the OAI record did not match the connections used to transform the
        record into RDF. This kind of failure of interoperability is common and well known, and
        situation semantics provides a systematic framework in which to understand it.</para></section></section><section><title>Discussion</title><para>The meaning of markup structures can be modeled by representing the set of inferences that
      are licensed from a document (<xref linkend="marcoux09"/>), and the OAI markup vocabulary
      specifically has been given such an interpretation (<xref linkend="sperberg05"/>). However,
      such an approach assumes an unambiguous mapping from tags to the logical predicates associated
      with those tags. The formal tag set description approach does not directly address how meaning
      is constructed by a reader who encounters an XML document "in the wild", especially when that
      reader was not privy to the development of the vocabulary in use by the document.  An
      overarching theory of communicative meaning like situation semantics can help us grasp the
      role that a formal tag set description plays when one is available, and see what is missing
      when that information is not available.</para><para>The use of natural-language identifiers for element names allows readers to interpret
      markup by exploiting the everyday resource situations that constantly support language-based
      communication (<xref linkend="wrightson05"/>). But, as we demonstrated above with the case of
      the OAI identifier, the name of a tag alone does not always convey everything necessary to
      properly interpret the logical meaning of corresponding markup. The problems that arise from
      this kind of misinterpretation become obvious when the markup is used to derive RDF that
      makes incorrect logical statements.</para><para>The lack of access to documentation that explains the intended semantics for tags is an
      aspect of what has been called the <emphasis role="ital">documentation problem</emphasis>
      (<xref linkend="sperberg04"/>). The issue could be addressed by sufficient access
      to documentation (prose or computational) that describes the resource situation against which
      markup was created, to allow readers of the documents to establish connections to interpret
      the document correctly. Without such support, interoperability across time and systems is an
      unlikely prospect.</para><para>Acknowledgements: I would like to thank Allen Renear, Ingbert Floyd, and members of the
      Conceptual Foundation Group at GSLIS for comments and support in developing this paper, and
      Richard Urban for bringing the example of the OAI identifier to my attention.</para></section><bibliography><title>Bibliography</title><bibliomixed xml:id="bp1983" xreflabel="Barwise and Perry 1983">Barwise, J. and Perry,
        J. <emphasis role="ital">Situations and Attitudes</emphasis> (1983). MIT Press, Cambridge,
      MA.</bibliomixed><bibliomixed xml:id="dubin03" xreflabel="Dubin et al. 2003">Dubin, David, C. M.
      Sperberg-McQueen, Allen Renear, and Claus Huitfeldt (2003). “A Logic Programming Environment
      for Document Semantics and Inference,” <emphasis role="ital">Journal of Literary and
        Linguistic Computing</emphasis>, 18:1, 39-47. doi: <biblioid class="doi">10.1093/llc/18.1.39</biblioid>.</bibliomixed><bibliomixed xml:id="lagoze02" xreflabel="Lagoze, et al. 2002">Lagoze, Carl, Herbert Van de
      Sompel, Michael Nelson, and Simeon Warner, ed. 2002. The Open Archives Initiative Protocol for
      Metadata Harvesting. Protocol Version 2.0 of 2002-06-14. Document Version
      2008-12-07T20:42:00Z. Open Archives Initiative, 2002.
        <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm</link></bibliomixed><bibliomixed xml:id="lavoie04" xreflabel="Lavoie 2004">Lavoie, B. (2004) "The Open Archival
      Information System Reference Model: Introductory Guide" <emphasis role="ital">Microform and
        Imaging Review</emphasis>, 33:2, 68-81. doi: <biblioid class="doi">10.1515/MFIR.2004.68</biblioid>.</bibliomixed><bibliomixed xml:id="marcoux09" xreflabel="Marcoux et al. 2009">Marcoux, Yves, C.M.
      Sperberg-McQueen, and Claus Huitfeldt “Formal and informal meaning from documents through
      skeleton sentences” in Balisage: The Markup Conference 2009 Proceedings, Montreal, Canada.
      2009. doi: <biblioid class="doi">10.4242/BalisageVol3.Sperberg-McQueen01</biblioid>.</bibliomixed><bibliomixed xml:id="renearetal02" xreflabel="Renear, et al. 2002">“Towards a Semantics for XML
      Markup.” Renear, Allen H., David Dubin, C. M. Sperberg-McQueen, and Claus Huitfeldt. In R.
      Furuta, J. I. Maletic, and E. Munson, (Eds.), <emphasis role="ital">Proceedings of the 2002
        ACM Symposium on Document Engineering</emphasis>, (pp. 119-126), McLean, VA, November. New
      York: Association for Computing Machinery (2002). doi: <biblioid class="doi">10.1145/585058.585081</biblioid>.</bibliomixed><bibliomixed xml:id="renear03" xreflabel="Renear 2003">Renear, Allen H. “Text from Several
      Different Perspectives, the Role of Context in Markup Semantics.” Proceedings of the 2003
      Conference on Computers, Literature, and Philology. Florence: University of
      Florence.</bibliomixed><bibliomixed xml:id="sperberg00" xreflabel="Sperberg-McQueen, et al. 2000">Sperberg-McQueen,
      C.M., Claus Huitfeldt and Allen H. Renear (2000) "Meaning and interpretation of
        markup." <emphasis role="ital">Markup Languages: Theory and Practice</emphasis>,
      2:3. doi: <biblioid class="doi">10.1162/109966200750363599</biblioid>.</bibliomixed><bibliomixed xml:id="sperberg04" xreflabel="Sperberg-McQueen and Miller 2004">Sperberg-McQueen,
      C.M. and E. Miller "On mapping from colloquial XML to RDF using XSLT" <emphasis role="ital">Proceedings of Extreme Markup Languages 2004</emphasis> (Montréal, Canada, August 2004). <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://conferences.idealliance.org/extreme/html/2004/Sperberg-McQueen01/EML2004Sperberg-McQueen01.html</link></bibliomixed><bibliomixed xml:id="sperberg05" xreflabel="Sperberg-McQueen 2005">Sperberg-McQueen, C. M.
      (2005) "The meaning of OAI 2.0 Markup: An exercise in markup interpretation"
        <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/2004/04/em-msm/ioai.html</link></bibliomixed><bibliomixed xml:id="wrightson01" xreflabel="Wrightson 2001">Wrightson, A. "Some Semantics for
      Structured Documents, Topic Maps and Topic Map Queries." <emphasis role="ital">Proceedings of
        Extreme Markup Languages 2001</emphasis> (Montréal, Canada, August 2001). <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://conferences.idealliance.org/extreme/html/2001/Wrightson01/EML2001Wrightson01.html</link></bibliomixed><bibliomixed xml:id="wrightson05" xreflabel="Wrightson 2005">Wrightson, A. "Semantics of Well
      Formed XML as a Human and Machine Readable Language: Why is some XML so difficult to
        read?" <emphasis role="ital">Proceedings of Extreme Markup Languages 2005</emphasis>
      (Montréal, Canada, August 2005). <link xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://conferences.idealliance.org/extreme/html/2005/Wrightson01/EML2005Wrightson01.html</link></bibliomixed></bibliography></article>
