<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2"><title>Markup Meaning and Mereology</title><!--    <subtitle>Fumbling around with the Calculus of Individuals among tags</subtitle> --><info><confgroup><conftitle>Balisage: The Markup Conference 2009</conftitle><confdates>August 11 - 14, 2009</confdates></confgroup><abstract><para> When marking up a document we chop it up into elements. Elements are parts of the
        document, some of which contain further elements, i.e., have parts of their own. Thus, the
        part-whole relation is central to the way markup works.</para><para>Mereology is precisely the theory of part-whole relationships, but has not yet found
        much application in markup theory. In this paper we provide a sketch of how mereology, in
        the form more specifically of Nelson Goodman's Calculus of Individuals, might be applied to
        markup.</para><para>We discuss ways of identifying the individuals of marked-up documents and of referencing
        these individuals, and we sketch some ways of applying the calculus to the problem of
        propagation of properties in documents. </para></abstract><author><personname><firstname>Claus</firstname><surname>Huitfeldt</surname></personname><personblurb><para>Claus Huitfeldt is Associate Professor at the Department of Philosophy of the
          University of Bergen. His research interests are within philosophy of language, philosophy
          of technology, text theory, editorial philology and markup theory. He was founding
          Director (1990-2000) of the Wittgenstein Archives at the University of Bergen, for which
          he developed the text encoding system MECS as well as the editorial methods for the
          publication of Wittgenstein's Nachlass - The Bergen Electronic Edition (Oxford University
          Press, 2000). He has been active in the Text Encoding Initiative (TEI) since 1991, and was
          centrally involved in the foundation of the TEI Consortium. Huitfeldt was Research
          Director (2000-2002) of Aksis (Section for Culture, Language and Information Technology at
          the Bergen University Research Foundation). </para></personblurb><affiliation><jobtitle>Associate professor</jobtitle><orgname>University of Bergen, Norway</orgname></affiliation><email>claus.huitfeldt@fof.uib.no</email></author><author><personname><firstname>C. M.</firstname><surname>Sperberg-McQueen</surname></personname><personblurb><para>C. M. Sperberg-McQueen is an independent consultant for Black Mesa Technologies LLC.
          He currently serves as an editor of the W3C XML Schema Definition Language (XSD)
        1.1.</para></personblurb><affiliation><orgname>Black Mesa Technologies LLC</orgname></affiliation><email>cmsmcq@blackmesatech.com</email></author><author><personname><firstname>Yves</firstname><surname>Marcoux</surname></personname><personblurb><para>Yves Marcoux has been a faculty member at EBSI, University of Montréal, since 1991. He is
          mainly involved in teaching and research activities in the field of document informatics.
          Prior to his appointment at EBSI, he worked for 10 years in systems maintenance and
          development, in Canada, the U.S., and Europe. He obtained his Ph.D. in theoretical
          computer science from University of Montréal in 1991. His main research interests are
          document semantics, structured document implementation methodologies, and information
          retrieval in structured documents. Through GRDS, his research group at EBSI, he has been
          principal architect for the Governmental Framework for Integrated Document Management, a
          project funded by the National Archives of Québec and by the Québec Treasury Board.</para></personblurb><affiliation><jobtitle>Associate professor</jobtitle><orgname>Université de Montréal, Canada</orgname></affiliation><email>yves.marcoux@umontreal.ca</email></author><legalnotice><para>Copyright © 2009 by the authors.  Used with
  			permission.</para></legalnotice></info><section xml:id="intro"><title>Introduction</title><para>XML documents consist of marked elements, which may in turn contain sequences of marked
      elements, etc. This hierarchy of elements is conveniently represented as a tree in which each
      node stands for an element, in which each arc between elements stands for a parent-child
      relationship, and in which the children of each node are ordered sequentially in accordance
      with their document order.</para><para>While it is commonly the case that the generic identifier of an element is understood to
      ascribe a property to the element's content, that elements represented by nodes dominated by
      that element's node in the document tree are also understood to be contained by it, and that
      these nodes are understood to inherit the properties ascribed to their ancestor elements, none
      of this is always or necessarily the case. </para><para>As we have pointed out elsewhere [<xref linkend="bielefeld"/>], the parent-child
      relationship may be taken to indicate either a containment relationship, or a dominance
      relationship. Frequently these relationships coincide, and no harm is caused by not
      distinguishing them. When they do not coincide, however, the result may easily be confusing. </para><para>One view of the structure of XML documents emphasizing the part-whole relationship is
      this: A document contains elements, i.e., parts. Some of these parts contain further elements,
      i.e., have parts of their own. The generic identifiers of elements ascribe properties to their
      own content and/or to the content of elements related to them by part-whole relationships. </para><para>Mereology is precisely the theory of part-whole relationships. Even so, mereology does not
      seem to have found much application in markup theory until now. It may therefore be
      interesting to investigate whether the application of mereology may give insights relevant to
      the understanding of interpretation and processing of marked-up documents. </para><para>It is sometimes said that XML provides a formal syntax for document representation, but no
      formal semantics for the interpretation or processing of this syntax. If mereology can be
      brought to bear on the ascription and propagation of properties and relations between parts of
      marked-up documents, it may help in providing a general approach to markup semantics. For
      example, the work presented here may turn out to be of direct relevance for the work on formal
      tag set descriptions and intertextual semantics specifications presented in [<xref linkend="balisage2009"/>] and [<xref linkend="dh2009"/>].</para><para>Before we proceed, some words on the limitations of this paper are in order. First,
      although our focus is on XML, and although we mention other markup languages in passing, we
      believe that mereology deserves to be studied in relation to markup languages in general (such
      as XML, SGML, TexMECS, LMNL, and others) rather than XML only. We think so partly because
      application of mereology may be equally or more profitable when it comes to some non-XML
      markup systems, and partly because such broader studies might inspire modifications of
      — or alternatives to — any or all of these. We hope to come back to
      applications of mereology to markup more generally in future work. </para><para>Second, the concept <quote>XML document</quote> as used in this paper refers almost
      exclusively to XML in its serialized form. We do not explicitly attempt to apply mereology to
      XML documents considered as graphs of XPath nodes, Infoset items, or the like. </para><para>Finally, we limit ourselves to an attempt to apply the so-called Calculus of Individuals,
      a mereological system worked out by Nelson Goodman [<xref linkend="Goodman1977"/>] (initially
      in cooperation with Henry S. Leonard [<xref linkend="LeonardandGoodman1940"/>]). As a further
      simplification, and in order to ensure focus, we will ignore XML attributes, entities,
      declarations, comments, processing instructions, and marked sections; in short, we will regard
      XML documents as consisting of elements and their content only. </para></section><section xml:id="coi"><title>The Calculus of Individuals</title><para>The origins of mereology go back to ancient Greece, but it was taken up as a formal study
      and developed mathematically only early in the 20th century. Today, it is a well developed
      formal discipline, and there are a number of different mereological systems. The term
      mereology is sometimes used to refer to these formal calculi in particular, sometimes to
      formal as well as non-formalized theories of part-whole relationships in general [<xref linkend="Libardi1994"/>, pp. 13–15].</para><para>Early developments of formal mereology were largely motivated by scepticism towards set
      theory and the calculus of classes, and a desire to translate or <quote>reduce</quote> all
      talk of abstract classes and their members to talk of concrete individuals and their parts.
      Mereology therefore came to be associated with a particular ontological stance, nominalism,
      and to be shunned by most adherents of other ontological views.<footnote><para>Goodman, whose work we will take as our basis here, was a well-known nominalist,
          albeit of a peculiar kind. For Goodman, nominalism did not consist in the rejection of
          abstract entities, or even of universals, but in the refusal to admit anything but
          individuals as values of variables.</para><para>He strongly repudiated all talk of classes as <quote>incomprehensible</quote> [<xref linkend="Goodman1977"/>, pp. 25-26, <xref linkend="Goodman1972"/>,
          p. 156] and therefore philosophically suspect. He also worked hard to establish a
          foundation for mathematics replacing set theory with the calculus of individuals. But at
          the same time he had no qualms taking abstract objects such as <quote>qualia</quote> as
          basic constituents of his own ontology [<xref linkend="Goodman1977"/>, chapters IV
        ff].</para></footnote></para><para>Such ontological considerations may or may not motivate our attempt to apply mereology
      to markup languages, but they need not concern it in any way: later work in the field
      is generally taken to demonstrate that mereology and set theory may live merrily together,
      that in fact the one may be seen as an extension of the other, and that the adoption of
      mereology does not by itself commit one to any particular ontological stance.<footnote><para><quote>...there is no necessary internal link between mereology and the philosophical
            position of nominalism. We may simply think of the former as a theory concerned with the
            analysis of parthood relations among whatever entities are allowed into the domain of
            discourse (including sets and other abstract entities, if one will).</quote> [<xref linkend="CasatiVarzi1999"/>]</para></footnote>
    </para><!--   Note to coauthors: We may want to preserve this para, but I do not know exactly where to put it back
                right now: 
                 <para>Goodman and Leonard observe that in traditional logic, the <quote>relations
                of segments of the universe</quote> are treated in two different places: one set of
                rules for the identity, non-identity, and other relations of
                <emphasis>individuals</emphasis> (essentially, for our purposes, non-sets), and a
                separate set of rules for identity, non-identity, and other relations of sets. A more
                economical approach, they suggest, would be to avoid making the <emphasis>a
                priori</emphasis> distinction usually made between sets and their members, and to start
                instead from a notion of individuals and their parts. What conventional logic regards as
                a set and its members can, on this view, be regarded as a compound individual and its
                parts. </para> --><para>The part-whole relationships that mereology studies are relationships between entities
      that are, in Goodman's terminology, called <emphasis>individuals</emphasis>. Generally
      speaking an individual may be any <quote>thing</quote> in a very wide sense of the word
      — a concrete, an abstract, a universal or a particular — i.e., any object
      or entity of which something can be predicated. This is admittedly still pretty general, and
      more specific talk may be in order: As examples of individuals we may take stones, tables,
      chairs, animals and other medium-sized everyday objects; but if we like we may also populate
      our world with individuals such as molecules, atoms, electrons, quarks; or planets, stars and
      galaxies; or for that matter persons, visual after-images, mental images or sense data. If we
      believe in abstract objects we may include numbers, geometrical objects, concepts, etc., and
      according to some applications of mereology there may also be <quote>temporal</quote>
      individuals such as processes, events, and snippets of time. </para><para> Individuals need not be contiguous, either in space or in time. This is one of the
      principles of the Calculus of Individuals which has provoked some discussion. In its defence
      one may point to the fact that we actually do employ the notion of at least some such
      disconnected wholes in everyday language. Thus, to treat <quote>the land mass of Japan</quote>
      (or any geographic entity which includes two or more islands) as an individual may seem
      unobjectionable. However, according to another principle, the sum of any two individuals is
      always also an individual. This seems to force us to accept as individuals, i.e.,
        <quote>wholes</quote>, sums of randomly scattered parts such as <quote>Caesar's nose and the
        state of Utah</quote> [<xref linkend="Goodman1972"/>, p. 37].<footnote><para>For an entertaining collection of other candidate sum individuals, see [<xref linkend="Fitzgerald"/>].</para></footnote> Goodman bites that bullet, while much of the ensuing debate has been concerned
      with attempts to find ways of distinguishing such scattered and arbitrary sums from more
        <quote>cohesive</quote> or <quote>integral</quote> individuals as wholes consisting of parts
      in a more intuitively satisfactory sense. </para><para>A formal mereological theory takes conventional first-order predicate logic as its basis.
      We will use conventional modern logical notation for quantifiers, operators, predicates,
      variables and constants. More specifically, we will use (x) for universal and
      (∃x) for existential quantification over x; ¬ for negation, →
      for implication, ∨ for inclusive disjunction, ∧ for conjunction, ⇔ for
      equivalence, and = for identity. We use the small roman letters a, b, c... for constants, x,
      y, z... for variables, and upper roman letters A, B, C... for predicates. We will occasionally
      use the conventional abbreviation <quote>iff</quote> for <quote>if and only if</quote>.</para><para>The extension which mereology makes to this basis is very modest: In fact the extension
      consists in adding only one single primitive relation to the first-order system. This
      specifically <quote>mereological</quote>, primitive relation may be chosen from among the
      relations <quote>part of</quote>, <quote>proper part of</quote>, <quote>discrete from</quote>
      or <quote>overlapping with</quote>. As each of these relations may be defined in terms of any
      of the others, it does not matter much which one we choose as our undefined primitive.<footnote><para>Equivalent systems (or rather, systems with only minimal and trivial differences) may
          be built whichever we choose as the primitive relation.</para></footnote> With a hopefully obvious appeal to markup theorists, we will follow [<xref linkend="Goodman1977"/>] in choosing <quote>overlap</quote> for our primitive relation.<footnote><para> In [<xref linkend="LeonardandGoodman1940"/>], Leonard and Goodman chose
            <quote>discrete from</quote> as the primitive relation. A more common practice seems to
          be the choice of <quote>part</quote> or <quote>proper part.</quote></para></footnote>
    </para><para>Variables are taken to range over individuals only, and predicates are taken to express
      properties of, or relations between, individuals. </para><para>From a mereological point of view, two individuals <emphasis>overlap</emphasis> iff they
      have some content in common. One consequence of this definition may briefly confuse markup
      specialists: since in an XML document a child element and its parent element have some content
      in common (everything contained by the child is also contained by the parent), it follows that
      in the sense introduced here the child and the parent <emphasis>overlap</emphasis>. That is,
      the term <emphasis>overlap</emphasis>, as used in the calculus of individuals, includes proper
      nesting or normal part/whole relations. </para><para>Thus, if we think of XML elements as individuals consisting of stretches of consecutive
      character occurrences, and if we consider the following four cases (strictly speaking, the
      first line is not well-formed XML and is included only for purposes of illustration):
      <programlisting xml:space="preserve">            &lt;s&gt;  &lt;q&gt;   &lt;/s&gt; &lt;/q&gt;
            &lt;s&gt;  &lt;q&gt;   &lt;/q&gt; &lt;/s&gt;
            &lt;q&gt;  &lt;s&gt;   &lt;/s&gt; &lt;/q&gt;
            &lt;s&gt;  &lt;/s&gt;  &lt;q&gt;  &lt;/q&gt;</programlisting>
      the first three cases exhibit an overlap between elements <code>s</code> and <code>q</code>.
      Only in the last case do the two elements not overlap, i.e., they are discrete. In contrast,
      markup theorists would probably consider only the first case to be one of overlap.</para><!-- Note to coauthors: CMSMCQ said:
                <para>Although the calculus of individuals is by no means restricted to discussions of
                geographic regions, many readers find it easiest to grasp the concepts introduced from examples
                involving spatial or geographic regions. For example [further examples ..., one for each notion introduced 
                below? </para>
                1) I am not sure it is such a good idea to restrict examples to the geographical domain.
                2) Illustrative examples for every notion is a good idea, though...
                3) ... but it is a lot of work!!
                4) I don't have the time, but if you do, please feel free...
            --><para>The <emphasis>overlap</emphasis> operator is written <code>∘</code>. The following
      condition on <code>∘</code> captures the intuitive notion of “having some content in
      common,” and we thus take it as an axiom:<footnote><para>Numbers in the left margin give references to theorem and definition numbers in [<xref linkend="Goodman1977"/>]. Note that Goodman used a notation slightly different from
          ours, but that we have retained Goodman's use of implicit universal quantification.</para></footnote>
      <programlisting xml:space="preserve">2.41  x ∘ y  ⇔ (∃z)(w)((w ∘ z) → ((w ∘ x) ∧ (w ∘ y)))</programlisting>
      Any relation satisfying this condition is necessarily reflexive and symmetric (but not
      necessarily transitive).</para><para>We now state further relation and operator definitions, theorems and axioms. Note that not
      all of them belong to all variants of mereological systems; they do, however, belong to ours.</para><para>As already mentioned, the relations <quote>part of,</quote>
      <quote>proper part,</quote> and <quote>discrete</quote> may all be defined in terms of the
      overlap relation.</para><para>An individual x is a <emphasis>part</emphasis> of y iff everything that overlaps x also overlaps
      y:
      <programlisting xml:space="preserve">D2.042 x &lt; y =<subscript>df</subscript> (z)((z ∘ x) → (z ∘ y))</programlisting>
      The part relation is reflexive, anti-symmetric and transitive. </para><para> An individual x is a <emphasis>proper part</emphasis> of y iff x is a part of y but y is not a
      part of x:
      <programlisting xml:space="preserve">D2.043 x ≪ y =<subscript>df</subscript> (x &lt; y) ∧ ¬(y &lt; x)</programlisting>
      The proper part relation is irreflexive, anti-symmetric and transitive.</para><para> Two individuals x and y are <emphasis>discrete</emphasis> iff they have no part in common, i.e.,
      they do not overlap<footnote><para>Leonard and Goodman use for the <quote>discrete from</quote> relation a symbol we have
          not been able to locate in Unicode; we use here a fairly close approximation, the symbol
            “ <code>ʅ</code> ”, which usually means
          <quote>caution.</quote></para></footnote>:
      <programlisting xml:space="preserve">D2.041 x ʅ y =<subscript>df</subscript> ¬(x ∘ y) </programlisting> The
      discrete relation is irreflexive and symmetric (and thus, non-transitive).</para><para> It is worth noting that <emphasis>identity</emphasis> can be defined in terms of the
      primitive relation:
      <programlisting xml:space="preserve">D2.044 x = y =<subscript>df</subscript>  (z)((z ∘ x) ⇔ (z ∘ y))</programlisting>
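      As an illustrative sketch, the relations defined so far can be read off the four cases
      listed earlier, assuming that each element is the stretch of characters between its
      start-tag and its end-tag and that every stretch between two adjacent tags is non-empty.
      The case numbers are our own labels, following the order of the listing:
      <programlisting xml:space="preserve">case 1:  s ∘ q,  ¬(s &lt; q),  ¬(q &lt; s)
case 2:  s ∘ q,  q &lt; s,  q ≪ s
case 3:  s ∘ q,  s &lt; q,  s ≪ q
case 4:  s ʅ q</programlisting>
      In none of the four cases does s = q hold, since in each case some stretch of characters
      overlaps one of the two elements but not the other.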
    </para><para>The <emphasis>product</emphasis> of x and y is the individual which exactly contains their
      common part: <programlisting xml:space="preserve">D2.045 x · y =<subscript>df</subscript> (℩z)(w)((w &lt; z) ⇔ ((w &lt; x) ∧ (w &lt; y)))</programlisting>
      <!-- This part of the para not strictly necessary now:
                This means that x and y have a product iff they overlap:
            <programlisting>2.42 (&exist;z)(z = (x&prod;y)) &iff; (x &ol; y)</programlisting>
            Conversely, if two individuals do not overlap, they have no product. -->
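      To sketch what the definition yields, consider the four cases listed earlier, assuming
      that elements are stretches of characters and that every stretch between two adjacent
      tags is non-empty (the case numbers are our own labels, following the order of the
      listing):
      <programlisting xml:space="preserve">case 1:  s · q = the character content between &lt;q&gt; and &lt;/s&gt;
case 2:  s · q = q
case 3:  s · q = s
case 4:  s ʅ q, so no individual satisfies the description</programlisting>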
    </para><para>The <emphasis>sum</emphasis> of x and y is the individual which contains exactly and
      exhaustively both of them, or, in other words, the individual which overlaps all and only
      those individuals which overlap any of them: <programlisting xml:space="preserve">D2.047 x + y =<subscript>df</subscript> (℩z)(w)((w ∘ z) ⇔ ((w ∘ x) ∨ (w ∘ y)))</programlisting>
      <!-- This part of the para not strictly necessary now:
                Any two individuals have a sum:
            <programlisting>2.45 (&exist;z)(z = (x + y))</programlisting> -->
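      As a sketch of what the definition yields, consider the four cases listed earlier,
      assuming that elements are stretches of characters (the case numbers are our own labels,
      following the order of the listing):
      <programlisting xml:space="preserve">case 1:  s + q = the character content between &lt;s&gt; and &lt;/q&gt;
case 2:  s + q = s
case 3:  s + q = q
case 4:  s + q = the content of s together with the content of q</programlisting>
      In case 4 the sum excludes whatever character content lies between the two elements;
      if there is any such content, the sum is a non-contiguous individual.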
    </para><para> The <emphasis>negate</emphasis> of an individual includes everything which does not
      overlap with that individual (i.e., what is often called its <quote>complement</quote>, or
        <quote>the rest of the world</quote>): <programlisting xml:space="preserve">D2.046 –x =<subscript>df</subscript> (℩z)(y)((y ʅ x) ⇔ (y &lt; z))</programlisting>
      <!-- This part of the para not strictly necessary now:
                This means that every individual (except the individual consisting of the entire world)
            has a negate:
            <programlisting>2.43 (&exist;y)(y = &neg;x) &iff; (&exist;z)(z  &disc; x)</programlisting> -->
    </para><para>The <emphasis>difference</emphasis> between x and y is what remains of x after we
      eliminate the parts it has in common with y:
      <programlisting xml:space="preserve">x – y =<subscript>df</subscript> (x · –y)</programlisting>
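      A sketch of the differences that result in the four cases listed earlier, assuming that
      elements are stretches of characters and that every stretch between two adjacent tags is
      non-empty (the case numbers are our own labels, following the order of the listing):
      <programlisting xml:space="preserve">case 1:  s – q = the character content between &lt;s&gt; and &lt;q&gt;
         q – s = the character content between &lt;/s&gt; and &lt;/q&gt;
case 2:  s – q = the content of s before &lt;q&gt; together with the content of s after &lt;/q&gt;
case 3:  q – s = the content of q before &lt;s&gt; together with the content of q after &lt;/s&gt;
case 4:  s – q = s
         q – s = q</programlisting>
      In case 2 the individual q and the negate of s are discrete, so nothing answers to the
      description q – s; the same holds for s – q in case 3.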
    </para><para> There is considerable controversy in the literature over the <emphasis>nil</emphasis>
      individual. The <emphasis role="ital">nil</emphasis> individual is the mereological analogue
      of the empty class. If accepted, it is part of any individual. Most mereological systems
      reject its existence, and we will do the same in this paper.<footnote><para>This may be seen simply as a reflection of the fact that most mereologists have been
          nominalists (in Goodman's sense). But the topic also has other far-reaching repercussions
          — see [<xref linkend="Varzi"/>].</para></footnote></para><para>There is less controversy over the existence of the <emphasis>universal</emphasis>
      individual, i.e., the one individual of which every other is a part — the
        <quote>world</quote> or the <quote>universe</quote> as an individual. In our case, we are
      not applying the Calculus of Individuals as a <quote>Grand Theory of Everything,</quote> but
      limit its application to domains consisting of a single document, to collections (not to say
      sets or classes) of documents, or perhaps to documents and whatever else we may need to take
      into consideration to make sense of what these documents say. So we, too, will endorse the
      existence of a universal individual, customarily denoted by the letter <quote>W</quote>:
      <programlisting xml:space="preserve">W =<subscript>df</subscript> (℩x)(y)(y &lt; x)</programlisting>
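      As a minimal worked example, assume a hypothetical universe with exactly three atoms,
      a, b and c (the names are ours and serve illustration only). Its individuals are the
      seven sums of one or more atoms, and the definitions above give:
      <programlisting xml:space="preserve">W = a + b + c
–a = b + c
–(a + b) = c
(a + b) · (b + c) = b
(a + b) – (b + c) = a
a ʅ (b + c)</programlisting>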
    </para><para>Note that, because there is no <emphasis role="ital">nil</emphasis> individual:</para><itemizedlist><listitem><para>the product of <code>x</code> and <code>y</code> can possibly exist only if
          <code>x</code> and <code>y</code> overlap,</para></listitem><listitem><para>the difference between <code>x</code> and <code>y</code> can possibly exist only if
            <code>x</code> is not a part of <code>y</code>, and</para></listitem><listitem><para>W (the universe) does not have a negate.</para></listitem></itemizedlist><para>However, the following statements hold, either as axioms or theorems, depending on how one
      elaborates the system:</para><itemizedlist><listitem><para><code>(x)(y)(∃z)(z = x + y)</code>, i.e., the sum of any
          two individuals exists (that is, is an individual),</para></listitem><listitem><para><code>(x)(y)((x ∘ y) ⇔ (∃z)(z = x
            · y))</code>, i.e., the product of any two individuals exists iff they
        overlap,</para></listitem><listitem><para><code>(x)(¬(x = W) ⇔ (∃z)(z = –x))</code>,
          i.e., the negate of an individual exists iff the individual is not the universe,
        and</para></listitem><listitem><para><code>(x)(y)(¬(x &lt; y) ⇔ (∃z)(z = x
            – y))</code>, i.e., the difference between any individual <code>x</code> and any
          individual <code>y</code> exists iff <code>x</code> is not a part of
        <code>y</code>.</para></listitem></itemizedlist><!--<para>exists iff x is
		not a part of y: 
		<programlisting>2.44 (&exist;z)(z = (x &dif; y)) &iff; &not;(x &pof; y)</programlisting> </para>--><!-- No need for this much detail here any more:
        <para>Whereas overlap is a reflexive, symmetric and non-transitive relation,
            <programlisting>x &ol; x
x &ol; y &iff;  y &ol; x 
&not;(((x &ol; y) &and;  (y &ol; z)) &implies; (x &ol; z))</programlisting>
            the part relation is reflexive, anti-symmetric and transitive:
            <programlisting>x &pof; x
((x &pof; y) &and; (y &pof; x)) &iff; x = y
((x &pof; y) &and;  (y &pof; z)) &implies; (x &pof; z)</programlisting>
            Some commentators have found it counter-intuitive that these relations should be
            reflexive, i.e., that an individual overlaps or is part of itself, but little more than
            a technically motivated and innocent deviance from everyday language seems to be at
            stake. The deviance from everyday language posed by the transitivity of parthood may be
            less innocent: Even though his liver is a part of any soldier, and any soldier is part
            of an army, it is not necessarily obvious that his liver is a part of the army of which
            a soldier is a part. </para>
            --><para>Do all individuals have parts, or are there some individuals which are not further
      divisible into parts? Whether we take the one or the other position may have wide-reaching
      consequences for other properties of a mereological system, and the literature abounds with
      discussion on the subject. Given our domain of application, however, we believe that any
      system will have to be <emphasis>atomistic</emphasis> — on none of our analyses will
      documents have parts below character-level, or at least we foresee no need to talk about parts
      of characters.
      So we may simply add the axiom of atomicity to our system right away:
      <programlisting xml:space="preserve">(x)(∃y)((y &lt; x) ∧ ¬(∃z)(z ≪ y)) </programlisting>
        [<xref linkend="CasatiVarzi1999"/>, p. 61] </para><!-- Note to coauthors:
            I find it hard to say goodbye to this para, as it records some of my own confusion in getting clear about 
            the Calculus. But I realize that it is now more likely to create than to remove the same or some similar sort 
            of confusion from others:
            
        <para>A note on the relation between properties and individuals may be in place here. For
            every property there exists a sum of individuals having that property [<xref
                linkend="Goodman1972"/>, p.&nbsp;37]. But that does not mean there necessarily
            exists any individual every part of which has that property, or any individual which has
            that and no other property. An individual does not necessarily have as many parts as it
            has properties, and in atomistic systems, any or every atom may have more than one
            property. </para>
    --></section><section xml:id="coi-xml"><title>The Calculus applied to XML</title><para>What might it mean to apply the Calculus of Individuals to XML documents (or, for short,
        <quote>to XML</quote>) and what purpose might such an application of the calculus serve? A
      preliminary answer to the first question is that an application of the Calculus of Individuals
      to XML would require us to decide which entities to count as individuals, to decide which of
      these are to count as atomic individuals, as well as which properties they can have and which
      relations hold between them. Given the Calculus of Individuals' rules of composition,
      different decisions on these issues will bring us to recognize the existence of individuals
      which may or may not coincide with established ways of viewing the structure of XML documents.
      Identifying rules which replicate such conventional views is, if possible, in itself of
      interest. Identifying rules which provide alternative views of XML documents may be of even
      greater interest, at least if they also suggest alternate and useful ways of analysing the
      parts of a document, of addressing them, and of ascribing properties to, and relations
      between, parts of a document. </para><para>A preliminary answer to the second question has thus already been suggested: We suspect
      that an application of the Calculus of Individuals to XML might suggest ways of identifying
      and addressing parts of a document which in some cases, or for some purposes, would be more
      convenient or more powerful than existing methods such as SAX, DOM or XPath. We also suspect
      that some application of the Calculus of Individuals to XML might suggest ways of dealing with
      what is sometimes called the <quote>semantics</quote> of XML, i.e., how to understand XML
      documents in terms of the properties that the markup ascribes to their various parts and the
      relations that it indicates between them. </para><para> In what follows we have nothing but tentative answers to the general questions just
      posed. Trying to answer the first question, we will present different ways of applying the
      Calculus of Individuals to XML. We will also explore some of their implications for answers to
      the second question. The explorative nature of our work should be emphasized: We do not want
      to suggest that these are the only, or the best, ways of applying the Calculus of Individuals
      to XML, nor do we suggest that we have identified all or even the most important implications
      of the approaches that we consider. </para><para>Therefore, each of the following sections begins by suggesting a different answer to the
      question <quote>Which are the individuals of a marked-up document?</quote> First, we consider
      the possibility that the individuals simply are XML elements. Next, we go down one step in
      level of granularity and identify tags and character strings as individuals. Finally, we
      proceed to a still finer level of granularity in order to see what happens if we recognize
      individual characters as atomic individuals, and distinguish between different kinds of
      individuals built from these atoms. </para><section xml:id="coi1"><title>The element-as-individual approach</title><para>What to count as individuals is a matter of choice, a choice which must be made on the
        basis of such criteria as naturalness, convenience, expressiveness, simplicity, etc. We
        begin by simply assuming a one-to-one matching between the <emphasis>elements</emphasis> of
        an XML document and the individuals of our calculus. On this assumption, consider the
        following simple XML document:
        <programlisting xml:space="preserve">(1) &lt;para&gt;A &lt;quote&gt;rose&lt;/quote&gt; is &lt;emph&gt;a&lt;/emph&gt; rose.&lt;/para&gt;</programlisting>
      </para><para>If each element is an individual, then (1) itself, as well as the elements
        <programlisting xml:space="preserve">(2) &lt;quote&gt;rose&lt;/quote&gt;
(3) &lt;emph&gt;a&lt;/emph&gt;</programlisting>
        are individuals. Now, the sum of any two individuals must (by our mereological axioms) be an
        individual. Thus, the sum of (2) and (3) must be an individual and, by our hypothesis, an
        XML element. No matter what model we have in mind for XML elements and documents, it is hard
        to imagine a way in which the sum of (2) and (3) could be an XML element — it
        would be at best two!</para><para>In fact, the goal we have set ourselves here turns out to be self-defeating: It is not
        possible to identify XML elements with individuals, without accepting as individuals parts
        of the document which are not XML elements. In other words, if all XML elements are
        individuals, then some XML documents necessarily give rise to individuals which are not XML elements.<footnote><para>In practice, we may read <quote>nearly all</quote> for <quote>some</quote> here.
            Examples of exceptions would be documents consisting of only one element, or in which
            each element has at most one child element. Examples:
            <programlisting xml:space="preserve">&lt;s&gt;...&lt;/s&gt;
&lt;s&gt;&lt;t&gt;...&lt;/t&gt;&lt;/s&gt;
&lt;s&gt;&lt;t&gt;&lt;u&gt;...&lt;/u&gt;&lt;/t&gt;&lt;/s&gt;</programlisting>
            and so on. Only in such cases may there in fact be a one-to-one correlation between
            elements and individuals.</para></footnote>
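        To make the point concrete, the following minimal Python sketch assumes, purely for
        illustration, that each element of (1) is modelled by the set of character offsets it
        spans and that the mereological sum is set union. The names <code>ELEMENTS</code>,
        <code>is_element</code> and <code>label</code> are ours and belong to no XML API.
        <programlisting xml:space="preserve" language="python"># Assumption of this sketch: an element is the set of character offsets it spans.
TEXT = "A rose is a rose."

ELEMENTS = {
    "para": frozenset(range(0, 17)),    # (1) spans the whole text
    "quote": frozenset(range(2, 6)),    # (2) spans "rose"
    "emph": frozenset(range(10, 11)),   # (3) spans "a"
}

def is_element(individual):
    """True iff the individual coincides with one of the three elements."""
    return individual in ELEMENTS.values()

def label(individual):
    """Characters covered by the individual, in offset order."""
    return "".join(TEXT[i] for i in sorted(individual))

# Mereological sum modelled as set union.
sum_2_3 = ELEMENTS["quote"] | ELEMENTS["emph"]
sum_1_2 = ELEMENTS["para"] | ELEMENTS["quote"]

print(label(sum_2_3), is_element(sum_2_3))
print(label(sum_1_2), is_element(sum_1_2))</programlisting>
        Under these assumptions the sum of (2) and (3) coincides with none of the three elements,
        whereas the sum of (1) and (2) is simply (1) again.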
      </para><!-- Note to coauthors:
                This may be the place for mentioning a speculation about what it
                would take for a markup language to satisfy a requirement of one-to-one
                correlation between elements and individuals. CH's speculation 1: Darrell
                Raymond's analysis indicates that no embedded markup language can satisfy
                the requirement. (Possible exception: a document consisting of a sequence of
                elements without sub-elements.) CH's speculation 2: The requirement might
                perhaps be met with some sort of stand-off markup. --><para>An obvious fix would be to retain the decision that every element is an individual, but
        allow for composite individuals having more than one element as their parts. This would
        solve the problem of sums, but others would remain (e.g., what elements can the difference
        (1) – (2) be the sum of?). Even taking the closure of elements under sum and
        difference would still not solve a granularity issue in handling text content: Take, for
        example, the strings <quote>
          <code>A </code>
        </quote>, <quote>
          <code> is </code>
        </quote>, and <quote>
          <code> rose.</code>
        </quote>; any given individual would contain either all three or none. There would be no way
        to <quote>separate</quote> those strings.</para><para>Another issue is that the definition of parthood implies nothing about the ordering of
        parts, so that <emphasis role="ital">individuals are
        unordered</emphasis>. Thus, there is no way in our approach to say, for example, that (2)
          <emphasis role="ital">occurs before</emphasis> (3).</para><para>The Calculus of Individuals in itself offers no way of defining ordered pairs<footnote><para><xref linkend="Goodman1972"/>, p. 164. But see also <xref linkend="Pitkanen"/>, p. 268.</para></footnote> (and thus relations) as individuals. However, relations
        can be represented by <emphasis role="ital">predicates</emphasis> on individuals. Thus, we
        can order (either totally or partially) our individuals by defining an appropriate binary
        predicate corresponding to the desired relation.</para><para>If we think of individuals as corresponding to objects in an XML data model, and if that
        model allows serializations in which no two distinct elements or characters start at the
        same offset in a serialization<footnote><para>This is the case if we think of XML documents and elements as consisting of
            stretches of consecutive character occurrences (remember we exclude entity declarations
            and references from our discussion), and also with the XPath data model. It is <emphasis role="ital">not</emphasis> necessarily the case with the Infoset data model.</para></footnote> (we will need to deal with characters in later sections), then we can induce a
        total ordering of the individuals that correspond to elements and characters, based on the
        total order among the offsets of their XML counterparts in the serialization. We call that
        order relation <emphasis role="ital">document order</emphasis>.</para><!--Note from YM to co-authors: I believe we do not necessarily need such a
constrained total order as "document order". However, I am not sure exactly what
properties (that document order has) an adequate ordering would have to satisfy.
It would have to be something like: "if an elemental individual has n distinct molecules
as proper parts that exhaust it, then those molecules must be consecutive in
the ordering"; maybe others, I haven't had time to check it thorougly.
So, we're gonna use "document order", which does the job.--><para>Throughout this paper, we assume that document order exists and is well
        defined.<!--When we do not need or wish to totally order the individuals, we
		  may use a partial order that is a restriction of the document order.--></para><!--<para> BEGIN</para> 
		<para>two important issues: one
		  concerning the ordering of parts, and another concerning the granularity of the
		  analysis.</para> 
		<para>Concerning the question of ordering, let us first observe that
		  </para> 
		<para> However we can <emphasis>define</emphasis> an ordering relation
		  giving us the desired result in the following way: We may stipulate that
		  individuals which overlap are not ordered relative to each other,<footnote> 
			 <para> In other words, ordering is an <quote>external</quote>
				relation between individuals. According to this definition, (1) is not ordered
				relative to (2) and (3). We are aware that this goes against some common
				conventions, assumed for example by xPath, according to which (1) is ordered
				before (2). A <quote>non-external</quote> ordering (in which overlapping
				individuals may be ordered) could also be defined if desired. </para> 
		  </footnote> and that discrete individuals have an order derived from
		  their order in the serialized document.<footnote> 
			 <para>Thus, the individual consisting of (2) occurring before (3) is
				different from the individual consisting of (3) occurring before (2). This
				seems to go against one of Goodman's basic principles: Individuals are
				different if and only if they have different parts, i.e., it is not possible to
				compose two different individuals out of the same parts. Individuals consisting
				of the same parts in different ordering relations have been widely discussed in
				the literature as a possible counterexample to this principle. Goodman himself
				does not seem impressed [<xref linkend="Goodman1977"/>, pp.&nbsp;163-64], while
				others have concluded that extensional mereologies cannot account for ordering
				relations. As usual, we choose to be agnostic on this point. We do need
				ordering relations: if this makes our mereology non-nominalistic or even
				non-extensional, so be it.</para> 
		  </footnote> </para> 
		<para> We would, however, probably also want to be able to express an
		  ordering between, for example, the strings <quote>A&nbsp;</quote> and <quote>rose</quote>, or, to speak in
		  more native XML terms, between the PCDATA node labelled <quote>A&nbsp;</quote> and the child of
		  the element node labelled quote and identified as individual (2) above.
		  However, in order for two things to be ordered they must both be individuals.
		  (2) is an individual, but on the current account the string <quote>A&nbsp;</quote> (or the PCDATA
		  node labelled <quote>A&nbsp;</quote>) is not. Thus, in order to deal more
		  properly (or fully) with the question of ordering we are led directly to the
		  question of the granularity of our analysis: We seem to need individuals at a
		  finer level of granularity than the XML element.</para><para> END</para>--><para>So far we have assumed that XML elements containing no sub-elements have no parts, i.e.,
        that they are atoms in our system. A solution to the granularity problem may be to recognize a more generous
        set of individuals. But before we proceed to investigate this, we pause to make a couple of
        observations on other characteristics of the element-as-individual approach.</para><itemizedlist><listitem><para>The lack of a fine enough granularity prevents a satisfactory treatment of strings,
            let alone <emphasis role="ital">parts</emphasis> of strings. </para><para>However, we could regard a string as a property of an individual. Thus, although we
            cannot strictly speaking say that in (1) the string <quote>rose</quote> is a part of the
            string <quote>A rose is a rose.</quote>, we could say that an individual having the
            string <quote>rose</quote> as a property is part of an individual having the string
              <quote>A rose is a rose.</quote> as a property. Note that strings such as <quote>rose
            is</quote> or <quote>ose i</quote> would not be properties of any individual,
            and thus would not be <quote>parts</quote> of the document even in this extended sense. </para></listitem><listitem><para>Building a tree structure in which each node is an individual (i.e., an element), in
            which each arc represents a whole-part relationship, and in which the children of each
            node are ordered in document order, produces a tree which is almost identical to the XML
            tree for the same document, except for PCDATA leaf nodes of mixed content elements,
            which would be lost.<footnote><para><!--The approach sketched here would work very much better for
				  XML documents without mixed content. -->This
                might be considered, by some, an interesting observation, since some markup
                theorists have argued against the use of mixed content, either generally or for
                specific applications or uses of markup.</para></footnote> (However, empty-element leaf nodes would appear in the tree.)</para></listitem></itemizedlist></section><section xml:id="coi2"><title>The tags and PCDATA approach</title><para>Moving one step down in the level of granularity, we might take <emphasis>tags and PCDATA
          strings</emphasis> delimited by tags as atomic individuals. Thus (1) would contain the
        following 11 atomic individuals:
        <programlisting xml:space="preserve">&lt;para&gt;
A 
&lt;quote&gt;
rose
&lt;/quote&gt;
 is 
&lt;emph&gt;
a
&lt;/emph&gt;
 rose.
&lt;/para&gt;</programlisting>
        From these, we might compose composite individuals such as, for example:
        <programlisting xml:space="preserve">&lt;para&gt;
&lt;para&gt;A 
&lt;para&gt;A &lt;quote&gt;
&lt;para&gt;A &lt;quote&gt;rose 
A rose
A  rose.
rose a
&lt;para&gt;A &lt;quote&gt;
A &lt;quote&gt;rose &lt;/quote&gt; is &lt;emph&gt;
rose &lt;/quote&gt;  rose.&lt;/para&gt;</programlisting>
        As a matter of fact, (1) would give rise to no fewer than 2<superscript>11</superscript>-1 =
        2047 individuals on this account (-1 because there is no <emphasis role="ital">nil</emphasis> individual) — in the interest of the reader we do not list all of
        them here. Only a handful of these individuals would be well-balanced XML fragments, of
        course.</para><para>A total order relation on the atomic individuals based on document order could be
        defined, as in the preceding section. Note that in this case, the sequence of ordered atomic
        individuals is isomorphic to the sequence of events identified by a SAX-like XML
        tokenizer.<!--Thus, we could also identify XML
		  elements, and with well-known methods we could easily define their ordering and
		  parent-child relationships in such a way as to be able to build a DOM-like
		  document model.--></para><para> Observe that although many of the <!--entities recognized as atoms here--> individuals
        could be identified or referenced using XPath or similar XML-aware mechanisms, many of them
        could not. In particular, tag atoms could not (or, at least, it is unclear how and in what
        sense they could). However, the interest of being able to refer to tags individually is not
        obvious. Also, since strings are atoms, it is still impossible to handle parts of strings:
          <quote>ose i</quote> is still not an individual. Therefore, we do not pursue this avenue
        any
        further.<!--Although only at the cost of increased complexity, all these issues
		  could probably be resolved more or less easily, partly with methods outlined
		  below.-->
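        For the record, the count of individuals given above can be reproduced mechanically. The
        following minimal Python sketch assumes that the atoms of (1) are encoded as SAX-like event
        tuples (an encoding chosen here for illustration only) and enumerates every non-empty sum
        of atoms as a set of atom positions; the names <code>ATOMS</code> and
        <code>individuals</code> are hypothetical.
        <programlisting xml:space="preserve" language="python">from itertools import combinations

# The 11 atoms of (1), written as SAX-like events (an assumption of this sketch).
ATOMS = [
    ("start", "para"), ("text", "A "),
    ("start", "quote"), ("text", "rose"), ("end", "quote"),
    ("text", " is "),
    ("start", "emph"), ("text", "a"), ("end", "emph"),
    ("text", " rose."), ("end", "para"),
]

def individuals(atoms):
    """Yield every non-empty sum of atoms as a frozenset of atom positions."""
    positions = range(len(atoms))
    for size in range(1, len(atoms) + 1):
        for combo in combinations(positions, size):
            yield frozenset(combo)

count = sum(1 for _ in individuals(ATOMS))
print(len(ATOMS), count)</programlisting>
        The enumeration starts at size 1 because there is no nil individual, which is why the
        total is one less than the number of subsets of the eleven atoms.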
      </para></section><section xml:id="coi3"><title>The character-atom approach</title><section><title>The approach</title><para>Finally, and moving one further step down in the level of granularity, we take
            <emphasis>character occurrences</emphasis> as the atomic individuals in our application
          of the calculus. For the sake of conciseness, we will use <emphasis role="ital">character</emphasis> as a synonym for <emphasis role="ital">character
          occurrence</emphasis>, except where confusion might
          arise.<!--This passage removed by YM: This is a discussion about types, not tokens; and
our individuals are tokens, not types. I may try to relocate it to the place where types
are discussed.

We might speculate whether best to think of
		  these character-atoms as abstract objects or rather as the concrete marks
		  occurring as parts of physical documents. The character types "a", "b", "c",
		  etc. might be construed as the classes of all "a"-inscriptions,
		  "b"-inscriptions, "c"-inscriptions, etc, or as the composite individuals having
		  all and only inscriptions of the character in question as their parts. We do
		  not believe that a decision one way or the other on this point is of any
		  consequence to what follows, however, and will choose to be agnostic as to the
		  ontological status of characters.-->
        </para><para>The <emphasis role="ital">type</emphasis> of a character occurrence is represented in
          our system by a property of that character occurrence. So any atom (i.e., character
          occurrence) has the property of being an <quote>a</quote>, or a <quote>b</quote>, or a
            <quote>c</quote>, etc., thus populating our vocabulary with one predicate for each of
          the characters of the writing system at hand.<footnote><para>We might allow a character occurrence to have more than one such property. For
              example, it could have the property of being an <quote>a</quote>, as well as that of
              being of some other type. Exploiting this option might be interesting in trying to
              account for multiple readings or interpretations in transcription, such as in [<xref linkend="dh2009"/>]. For the time being, however, we will assume that the ascription
              of one such character-type-property to a particular character excludes the ascription
              of any other character-type-property to that character. </para></footnote>
        </para><para>We define a total order relation on atoms, based on document order, represented by the
          predicate <code>PA(x, y)</code>, true iff <code>x</code> precedes <code>y</code> in
          document order (“P” stands for “precedes” and “A” indicates it is a predicate on atoms).
          The transitive reduction of <code>PA</code> is represented by the predicate <code>NA(x,
          y)</code>, true iff <code>x</code> immediately precedes <code>y</code> in document order
          (“N” stands for “next” and “A” indicates it is a predicate on
          atoms).<!--Sorry guys, it would be too long to make it more consistent...--><!--<footnote> 
			 <para>The fact that the last character of a document has no
				<quote>next</quote> character may be accounted for in standard ways, &mdash; either
				by making an exception for that character, or by giving it the unique property
				of being the <quote>last</quote> character and letting it point to an arbitrary
				character, for example the first character of the document.</para>
Note from YM: you wouldn't want the last to point to the first is you're going to
later take the transitive closure !
		  </footnote>--></para><!--<para> The relation is irreflexive, non-symmetric, and
		  intransitive. For convenience, we also define a <emphasis>preceding
		  character</emphasis> relation, which is the transitive closure of the next
		  character relation. The preceding character relation is irreflexive,
		  non-symmetric and transitive.</para>--><para>Since characters are atomic individuals, all sums which can be composed from the
          characters of a document are also individuals, namely composite individuals.
          Composite individuals of special interest for our purposes are
          <emphasis>strings</emphasis>. We define strings as individuals which are either atoms, or
          the sum of atoms consecutive in <code>NA</code> order. A string that consists of only one
          character is (also) an atom. There is no such thing as an <quote>empty string</quote>
          (which would have to be the <emphasis role="ital">nil</emphasis> individual). Note that
          strings constitute a tiny fraction of all existing individuals.</para><para>Some strings are of particular interest to us. We define a <emphasis>molecular
          string</emphasis> (or <emphasis role="ital">molecule</emphasis>) as a string that is
          delimited on both sides (in the serialization underlying document order) by a tag, with no
          other tag intervening in between. A total ordering of molecular strings, represented by
          the predicate <code>P(x, y)</code>, is trivially derived from the ordering of atoms
          (itself based on document order). The transitive reduction of <code>P</code> is
          represented by the predicate <code>N(x, y)</code>. (“P” stands for “precedes” and “N” for
          “next”.)
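          The relation between a <quote>precedes</quote> predicate and its transitive reduction can
          be illustrated with a minimal Python sketch. It assumes that the five tag-delimited
          strings of (1) are identified by their labels (which happen to be distinct in this
          document) and that a predicate is represented as a set of pairs; the function names are
          hypothetical.
          <programlisting xml:space="preserve" language="python"># Molecular strings of (1) in document order, identified by label (sketch assumption).
MOLECULES = ["A ", "rose", " is ", "a", " rose."]

def precedes(sequence):
    """P as a set of pairs: x occurs before y in the given sequence."""
    n = len(sequence)
    return {(sequence[a], sequence[b]) for a in range(n) for b in range(a + 1, n)}

def transitive_reduction(order):
    """Keep (x, y) only if no z satisfies both (x, z) and (z, y)."""
    members = {x for pair in order for x in pair}
    return {
        (x, y) for (x, y) in order
        if not any((x, z) in order and (z, y) in order for z in members)
    }

P = precedes(MOLECULES)
N = transitive_reduction(P)

for x, y in zip(MOLECULES, MOLECULES[1:]):
    print(repr(x), "is immediately followed by", repr(y), (x, y) in N)</programlisting>
          The same construction applies unchanged to atoms, yielding <code>NA</code> from
          <code>PA</code>.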
          <!-- We define sums of molecular strings as
                <emphasis>extracts</emphasis>.--></para><para> We define an <emphasis>elemental string</emphasis> as a string delimited by the
          matching tags of an XML element (there may be intervening tags). We do not rely on any
          ordering of elemental strings.
          <!-- We define composites (sums or differences) of elemental strings as
                    <emphasis>constructs</emphasis>. It is also convenient to have a term for
                constructs which are not elemental strings: these we will term
                    <emphasis>derivates</emphasis>.--></para><para>For any given string <code>x</code>, we define (for convenience only) the
            <emphasis>label</emphasis> of <code>x</code> as the sequence of the types of the atoms
          composing <code>x</code>, in <code>NA</code> order. That is, for example, a string is
          labelled <quote>rose</quote> (or has the label <quote>rose</quote>) iff it is the sum of
          atoms of types <quote>r</quote>, <quote>o</quote>, <quote>s</quote>, and <quote>e</quote>,
          and those atoms are <code>NA</code>-ordered so that the one of type <quote>r</quote> comes
          first, the one of type <quote>o</quote> comes second, etc.</para><para>While it might have been plausible to treat tags as a special kind of string, and
          to build elements and nodes with their ordering and parent-child relationships in a way
          similar to that suggested in the tags and PCDATA approach above, we shall instead regard
          tags simply as delimiting certain string individuals, and as ascribing properties to (or
          relations between) those individuals.</para><!--<para>One general and one technical remark are in place here: This might
		  be seen as a way of pursuing the view that markup is not part of the document,
		  but carries information about the documents and its parts. Strictly speaking,
		  the move might also be said to mean that we are not really applying the
		  Calculus of Individuals directly to the marked up document, but rather using
		  the markup to build another object, which in turn consists of its own parts
		  with various properties, to which we then apply the Calculus. On the technical
		  side, it means that we must find some way of dealing with empty elements even
		  though with this move they are not <quote>parts</quote> of the document. We
		  will come back to these issues later.</para>--><!--<para>Since we have now gotten rid of the tags, we have fewer atomic
		  characters to take into account &mdash; in our example (1) there are 17 atomic
		  characters, and thus <quote>only</quote> 131,071 composite individuals, i.e.,
		  strings. Most of these are of little interest to our concerns,<footnote> 
			 <para>It is worth noting, however, that on the current account there
				<emphasis>are</emphasis> individuals labelled "rose is" and "ose i" in (1) &mdash; as
				opposed to the limitations of the previous approaches.</para> 
		  </footnote> so we need to single out certain subsets of them for
		  special consideration.</para>
		--><para> We can now read (1) as follows: <itemizedlist><listitem><para>There are 17 atomic individuals. Their ordered sequence of types is:
                <quote>A</quote>, <quote> </quote>, <quote>r</quote>, <quote>o</quote>,
                  <quote>s</quote>, <quote>e</quote>, <quote> </quote>, <quote>i</quote>,
                  <quote>s</quote>, <quote> </quote>, <quote>a</quote>,
                <quote> </quote>, <quote>r</quote>, <quote>o</quote>, <quote>s</quote>,
                  <quote>e</quote>, and <quote>.</quote>.
                <!--<itemizedlist> 
					 <listitem> 
						<para>For <quote>A</quote> read: "an atomic individual of type <quote>A</quote>, or:
						  <quote>a character having the property of being an A</quote>, etc.</para> 
					 </listitem> 
				  </itemizedlist>-->
              </para></listitem><listitem><para>There are five molecular string individuals. Their ordered sequence of labels
                is: <quote>A </quote>, <quote>rose</quote>,
                <quote> is </quote>, <quote>a</quote>, and
                <quote> rose.</quote>. <!--<itemizedlist> 
					 <listitem> 
						<para>For <quote>A&nbsp;</quote> read: "a molecular string individual
						  labelled <quote>A&nbsp;"</quote>, or, strictly speaking: "a molecular string consisting of a
						  character of type <quote>A</quote> followed by a character of type <quote>&nbsp;</quote> (blank)",
						  etc.</para> 
					 </listitem>
                </itemizedlist>-->
                <!-- <listitem>
                                    <para>Since there are five molecular strings, there must be 31
                                        extracts.</para>
                                </listitem> -->
              </para></listitem><listitem><para>There are three elemental string individuals, labelled <quote>A rose is a
                rose.</quote>, <quote>rose</quote> and
                <quote>a</quote>.<!-- <itemizedlist>
                                <listitem>
                                    <para>Since there are three elemental strings, there must be
                                        seven constructs, whereof four derivates.</para>
                                </listitem>
                            </itemizedlist> -->
              </para></listitem><listitem><para>The elemental string labelled <quote>A rose is a rose.</quote> has the property
                indicated by the generic identifier &lt;para&gt;. <itemizedlist><listitem><para>Note that this does not imply that any of its parts, such as the molecular
                      strings labelled <quote>A </quote>, <quote>rose</quote>, etc., has
                      this property.</para></listitem></itemizedlist>
              </para></listitem><listitem><para>The elemental string labelled <quote>rose</quote> has the property indicated by
                the generic identifier &lt;quote&gt;.</para><!--<itemizedlist> 
				  <listitem> 
					 <para>Note that this and the previous point does not imply that
						any of its parts, such as the characters <quote>r</quote>, <quote>o</quote>, etc. has any of the
						properties indicated by &lt;para&gt; or &lt;quote&gt;.</para> 
				  </listitem> 
				</itemizedlist>--></listitem><listitem><para>The elemental string labelled <quote>a</quote> has the property indicated by the
                generic identifier &lt;emph&gt;. <itemizedlist><listitem><para>Here we have an example of an atom which is also a molecule and an
                      elemental string.
                      <!--At first sight this
						  might seem puzzling &mdash; but it should be noted that the terms
						  <quote>molecular</quote> and <quote>elemental</quote> stand for
						  <emphasis>properties</emphasis> of individuals.--></para></listitem></itemizedlist>
              </para></listitem></itemizedlist>
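          The reading just given can be derived mechanically. The following minimal Python sketch
          assumes that (1) has already been tokenized into SAX-like event tuples (an encoding chosen
          for illustration) and that every element has non-empty content; the names
          <code>EVENTS</code>, <code>read_document</code> and <code>label</code> are hypothetical.
          <programlisting xml:space="preserve" language="python"># SAX-like events for document (1); the tuple encoding is an assumption of this sketch.
EVENTS = [
    ("start", "para"), ("text", "A "),
    ("start", "quote"), ("text", "rose"), ("end", "quote"),
    ("text", " is "),
    ("start", "emph"), ("text", "a"), ("end", "emph"),
    ("text", " rose."), ("end", "para"),
]

def read_document(events):
    """Return atoms, molecular strings and elemental strings of a tokenized document."""
    atoms = []          # character types of the atoms, in document order
    molecules = []      # (first, last) atom positions of each tag-delimited string
    elementals = []     # (generic identifier, first, last) for each element
    open_elements = []  # stack of (generic identifier, position of first atom)
    for kind, value in events:
        if kind == "text":
            first = len(atoms)
            atoms.extend(value)
            molecules.append((first, len(atoms) - 1))
        elif kind == "start":
            open_elements.append((value, len(atoms)))
        else:
            # End tag: close the innermost open element (empty elements not handled).
            name, first = open_elements.pop()
            elementals.append((name, first, len(atoms) - 1))
    return atoms, molecules, elementals

def label(atoms, first, last):
    """Sequence of character types of the atoms from first to last inclusive."""
    return "".join(atoms[first:last + 1])

atoms, molecules, elementals = read_document(EVENTS)
molecule_labels = [label(atoms, a, b) for a, b in molecules]
elemental_labels = {name: label(atoms, a, b) for name, a, b in elementals}
print(len(atoms), molecule_labels, elemental_labels)</programlisting>
          Note that the generic identifier is recorded here as a property of the elemental string
          only; nothing in the sketch propagates it to the parts of that string.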
        </para><para>We introduce the following predicates: 
<table><thead><tr><th>Predicate </th><th>Meaning</th><th>Range of x and y</th></tr></thead><tbody><tr><td/></tr><tr><td>NA(x,y) </td><td>next after x is y (or, x immediately precedes y)</td><td>atoms</td></tr><tr><td> PA(x,y) </td><td>x precedes y</td><td>atoms</td></tr><tr><td>N(x,y) </td><td>next after x is y (or, x immediately precedes y)</td><td>molecules</td></tr><tr><td> P(x,y) </td><td>x precedes y</td><td>molecules</td></tr><tr><td>A(x) </td><td>x is atomic</td><td>any</td></tr><tr><td>M(x) </td><td>x is molecular</td><td>any</td></tr><tr><td>E(x) </td><td>x is elemental</td><td>any</td></tr><!--
        <tr>
            <td>D(x) </td>
            <td>x is a derivate </td>
            <td>any</td>
        </tr> --><tr><td>ccc(x) </td><td>x has the property assigned by ccc (where ccc is an XML generic identifier) </td><td>any</td></tr><tr><td>T("c",x) </td><td>x is of type c (where c is a character type) </td><td>atoms</td></tr><tr><td>L("ccc",x)</td><td>x is labelled ccc (where ccc is a sequence of character types)</td><td>any</td></tr></tbody></table>
</para><para>The last two predicates (T and L) are to be regarded as notational conveniences.<footnote><para>In a <quote>real</quote> system, character type indications enclosed within quotes
              and occurring within two-place predicates, like T(<quote>A</quote>,i01) here, should
              be replaced with one-place predicates using for example Unicode names for character
              values, like T.x0041(i01). Character types are properties, not individuals, and so
              should not really appear as variables in the calculus. One unattractive consequence of
              the shorthand notation used here is that assignment of whitespace characters comes out
              as T(<quote> </quote>,i02), which is both imprecise and perhaps somewhat
              confusing.</para><para>As mentioned, saying that an individual is labelled with a string is merely a
              shorthand for saying that it consists of a sequence of atoms each with certain
              character types as their values. So expressions like
              L(<quote> is </quote>,i20) in the example below are really
              shorthands for more complex expressions referring to the atomic parts of the
              individual i20 and their next and type properties. Assuming that i20=i07+i08+i09+i10,
              what L(<quote> is </quote>,i20) says should be construed as
              something like NA(i07,i08) ∧ NA(i08,i09) ∧ NA(i09,i10) ∧
              T.x0020(i07) ∧ T.x0069(i08) ∧ T.x0073(i09) ∧
            T.x0020(i10).</para></footnote> We are ignoring potential problems of name conflicts in this presentation
          (which would arise e.g. in the case of a document containing XML generic identifiers
            <quote>A</quote>, <quote>M</quote> or <quote>E</quote>). </para></section><section><title>Examples</title><para>We assign the identifiers i01, i02, i03, etc. <footnote><para>In a working system one would probably use more <!--elaborate--> meaningful
              identifiers. The only requirement on identifiers is that they should identify
              individuals uniquely.</para></footnote> to individuals of (1) and state some facts about them as follows:
 <table><tr><td>T("A",i01)</td><td>A(i01)</td><td>NA(i01,i02)</td></tr><tr><td>T(" ",i02)</td><td>A(i02)</td><td>NA(i02,i03)</td></tr><tr><td/><td/><td/></tr><tr><td>T("r",i03)</td><td>A(i03)</td><td>NA(i03,i04)</td></tr><tr><td>T("o",i04)</td><td>A(i04)</td><td>NA(i04,i05)</td></tr><tr><td>T("s",i05)</td><td>A(i05)</td><td>NA(i05,i06)</td></tr><tr><td>T("e",i06)</td><td>A(i06)</td><td>NA(i06,i07)</td></tr><tr><td/><td/><td/></tr><tr><td>T(" ",i07)</td><td>A(i07)</td><td>NA(i07,i08)</td></tr><tr><td>T("i",i08)</td><td>A(i08)</td><td>NA(i08,i09)</td></tr><tr><td>T("s",i09)</td><td>A(i09)</td><td>NA(i09,i10)</td></tr><tr><td>T(" ",i10)</td><td>A(i10)</td><td>NA(i10,i11)</td></tr><tr><td/><td/><td/></tr><tr><td>T("a",i11)</td><td>A(i11)</td><td>NA(i11,i12)</td></tr><tr><td/><td/><td/></tr><tr><td>T(" ",i12)</td><td>A(i12)</td><td>NA(i12,i13)</td></tr><tr><td>T("r",i13)</td><td>A(i13)</td><td>NA(i13,i14)</td></tr><tr><td>T("o",i14)</td><td>A(i14)</td><td>NA(i14,i15)</td></tr><tr><td>T("s",i15)</td><td>A(i15)</td><td>NA(i15,i16)</td></tr><tr><td>T("e",i16)</td><td>A(i16)</td><td>NA(i16,i17)</td></tr><tr><td>T(".",i17)</td><td>A(i17)</td><td/></tr><tr><td/><td/><td/></tr><tr><td>i18=i01+i02</td><td>M(i18)</td><td>N(i18,i19)</td></tr><tr><td>i19=i03+i04+i05+i06</td><td>M(i19)</td><td>N(i19,i20)</td></tr><tr><td>i20=i07+i08+i09+i10</td><td>M(i20)</td><td>N(i20,i11)</td></tr><tr><td/><td>M(i11)</td><td>N(i11,i21)</td></tr><tr><td>i21=i12+i13+i14+i15+i16+i17</td><td>M(i21)</td><td/></tr><tr><td>i22=i18+i19+i20+i11+i21</td><td/><td/></tr><tr><td/><td/><td/></tr><tr><td>L("A ",i18)</td><td/><td/></tr><tr><td>L("rose",i19)</td><td>E(i19)</td><td>quote(i19)</td></tr><tr><td>L(" is ",i20)</td><td/><td/></tr><tr><td>T("a",i11)</td><td>E(i11)</td><td>emph(i11)</td></tr><tr><td>L(" rose.",i21)</td><td/><td/></tr><tr><td>L("A rose is a rose.",i22)</td><td>E(i22)</td><td>para(i22)</td></tr></table>                  
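          Facts of this kind lend themselves to mechanical checking. The following minimal Python
          sketch encodes the T and NA facts for the seventeen atoms and treats a statement of the
          form L(label, x) as shorthand for a conjunction of NA and T facts about the atomic parts
          of x. It assumes that the parts are supplied in <code>NA</code> order; the function name
          <code>labelled</code> is hypothetical.
          <programlisting xml:space="preserve" language="python">TEXT = "A rose is a rose."
IDS = ["i%02d" % n for n in range(1, len(TEXT) + 1)]   # i01 ... i17

T = dict(zip(IDS, TEXT))        # T[x] is the character type of atom x
NA = set(zip(IDS, IDS[1:]))     # (x, y) in NA: x immediately precedes y

def labelled(label, parts):
    """L(label, x), where x is the sum of the atoms listed in parts (in NA order)."""
    if len(label) != len(parts):
        return False
    consecutive = all((a, b) in NA for a, b in zip(parts, parts[1:]))
    typed = all(T[p] == c for p, c in zip(parts, label))
    return consecutive and typed

i20 = ["i07", "i08", "i09", "i10"]
print(labelled(" is ", i20))
print(labelled("ose i", ["i04", "i05", "i06", "i07", "i08"]))</programlisting>
          The second call illustrates that, with characters as atoms, a string such as
          <quote>ose i</quote> is an individual that can be identified and labelled, even though it
          is neither molecular nor elemental.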
  
        </para><para>The same information may be presented more clearly in the following table,
          listing for each individual its identifier, its type, its label, the kind of individual it
          is (A for atoms, M for molecular and E for elemental strings), its assigned properties
          (i.e., properties assigned by an XML generic identifier), its next atom or molecular
          string and its immediate proper parts. <footnote><para>At least as long as we are limiting ourselves to XML the notion <quote>immediate
                proper part</quote> can be given a straightforward and natural definition: <quote>x
                is an immediate proper part of y</quote> =<subscript>df</subscript> (x ≪ y) ∧
              ¬(∃z)((x ≪ z) ∧ (z ≪ y))</para></footnote> 
<table><tr><th>Id</th><th>Type</th><th>Label</th><th>Kind</th><th>Assigned property</th><th>Next atom</th><th>Next molecule</th><th>Immediate parts</th></tr><tr><td>i01</td><td>"A"</td><td/><td>A</td><td/><td>i02</td><td/><td/></tr><tr><td>i02</td><td>" "</td><td/><td>A</td><td/><td>i03</td><td/><td/></tr><tr><td>i03</td><td>"r"</td><td/><td>A</td><td/><td>i04</td><td/><td/></tr><tr><td>i04</td><td>"o"</td><td/><td>A</td><td/><td>i05</td><td/><td/></tr><tr><td>i05</td><td>"s"</td><td/><td>A</td><td/><td>i06</td><td/><td/></tr><tr><td>i06</td><td>"e"</td><td/><td>A</td><td/><td>i07</td><td/><td/></tr><tr><td>i07</td><td>" "</td><td/><td>A</td><td/><td>i08</td><td/><td/></tr><tr><td>i08</td><td>"i"</td><td/><td>A</td><td/><td>i09</td><td/><td/></tr><tr><td>i09</td><td>"s"</td><td/><td>A</td><td/><td>i10</td><td/><td/></tr><tr><td>i10</td><td>" "</td><td/><td>A</td><td/><td>i11</td><td/><td/></tr><tr><td>i11</td><td>"a"</td><td>"a"</td><td>A M E</td><td>emph</td><td>i12</td><td>i21</td><td/></tr><tr><td>i12</td><td>" "</td><td/><td>A</td><td/><td>i13</td><td/><td/></tr><tr><td>i13</td><td>"r"</td><td/><td>A</td><td/><td>i14</td><td/><td/></tr><tr><td>i14</td><td>"o"</td><td/><td>A</td><td/><td>i15</td><td/><td/></tr><tr><td>i15</td><td>"s"</td><td/><td>A</td><td/><td>i16</td><td/><td/></tr><tr><td>i16</td><td>"e"</td><td/><td>A</td><td/><td>i17</td><td/><td/></tr><tr><td>i17</td><td>"."</td><td/><td>A</td><td/><td/><td/><td/></tr><tr><td>i18</td><td/><td>"A "</td><td>    M</td><td/><td/><td>i19</td><td>i01, i02</td></tr><tr><td>i19</td><td/><td>"rose"</td><td>    M E</td><td>quote</td><td/><td>i20</td><td>i03, i04, i05, i06</td></tr><tr><td>i20</td><td/><td>" is "</td><td>    M</td><td/><td/><td>i11</td><td>i07, i08, i09, i10</td></tr><tr><td>i21</td><td/><td>"rose."</td><td>    M</td><td/><td/><td/><td>i12, i13, i14, i15, i16, i17</td></tr><tr><td>i22</td><td/><td>"A rose is a rose."</td><td>         E</td><td>para</td><td/><td/><td>i18, i19, i20, i11, i21</td></tr></table>
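        The molecular and elemental rows of the table can be mirrored in a few lines of Python.
        This is only a minimal sketch under stated assumptions: an individual is modelled as a
        set of atom positions in the character string, summation as set union, and proper part
        as strict set inclusion. The names span, named, label and immediate_parts are
        hypothetical, and the quantifier in the definition of immediate proper part is
        restricted to the individuals named in the table.
        <programlisting xml:space="preserve"><![CDATA[
# Assumed model (not from the paper): an individual is a frozenset of
# atom positions 1..17 in the PCDATA of the sample paragraph.
TEXT = "A rose is a rose."

def span(first, last):
    """Sum of the atoms at positions first..last (inclusive)."""
    return frozenset(range(first, last + 1))

named = {
    "i11": span(11, 11),   # "a", also the emph element
    "i18": span(1, 2),
    "i19": span(3, 6),     # the quote element
    "i20": span(7, 10),
    "i21": span(12, 17),
}
# i22 = i18 + i19 + i20 + i11 + i21 (mereological sum as set union)
named["i22"] = frozenset().union(*named.values())

def label(individual):
    """Concatenate the characters of the atoms that make up an individual."""
    return "".join(TEXT[pos - 1] for pos in sorted(individual))

def immediate_parts(whole):
    """Named x that are proper parts of `whole` with no named z strictly between."""
    y = named[whole]
    return sorted(
        name for name, x in named.items()
        if x < y and not any(x < z < y for z in named.values())
    )

print(label(named["i19"]))       # rose
print(label(named["i22"]))       # A rose is a rose.
print(immediate_parts("i22"))    # ['i11', 'i18', 'i19', 'i20', 'i21']
]]></programlisting>
        The restriction to named individuals matters in this sketch: if z ranged over all sums
        of atoms, the sum i18 + i19 would lie strictly between i18 and i22.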

        </para><para>The elemental strings i22, i19 and i11 correspond to the XML elements (1)-(3) in a
          fairly straightforward way, and can now be identified for example as follows:
          <programlisting xml:space="preserve">i22 = (℩x)(para(x) ∧ E(x))
i19 = (℩x)(quote(x) ∧ E(x))
i11 = (℩x)(emph(x) ∧ E(x))</programlisting></para><para>The non-elemental molecules i18, i20 and i21 can be identified for example as follows:
          <programlisting xml:space="preserve">i18 = (℩x)(∃y)(quote(y) ∧ N(x,y))
i20 = (℩x)(∃y)(emph(y) ∧ N(x,y))
i21 = (℩x)(M(x) ∧ ¬(∃y)N(x,y))</programlisting></para><para>Although in this particular case the denoting expressions identifying individuals are
          fairly simple, identifying individuals by means of denoting expressions may in general
          become rather tedious. For example, in any document with more than one individual assigned
          the property quote, the denoting expression identifying individual i19 above would return
          the sum of all those individuals.</para><!--  Note to coauthors: You are surely happy that I finally got rid of all the rubbish about derivates:
        
            <para>The derivates of (1) are composites of its elemental strings:
                <programlisting>i23 = (i22 &dif; i19)
i24 = (i22 &dif; i11)
i25 = (i22 &dif; (i19 + i11))
i26 = (i19 + i11)</programlisting></para>

            <para>In tabular form we can display their properties as follows: &table03b; </para>

            <para>Since the derivates do not directly correspond to any of the elements of (1), it
                is significant that they have no type, label, assigned properties and no next atom
                or molecule. It is also significant that although it is clear enough that the
                    <emphasis>individuals</emphasis> i23&ndash;i26 are parts of the document
                according to the definitions introduced here, the <emphasis>parts</emphasis> of
                these individuals are not completely connected by their
                <quote>next</quote>-properties. If they were (and conventions could of course be
                introduced to this effect), i23&ndash;i26 would correspond to the following XML
                fragments, respectively:
                <programlisting>&lt;para&gt;A rose&lt;emph&gt;a&lt;/emph&gt; rose.&lt;/para&gt;
&lt;para&gt;A &lt;quote&gt;rose&lt;/quote&gt; is  rose.&lt;/para&gt;
&lt;para&gt;A  is  rose.&lt;/para&gt;
&lt;quote&gt;rose&lt;/quote&gt;&lt;emph&gt;a&lt;/emph&gt;</programlisting>
            </para> --><para>So although we have shown that all atoms, molecular and elemental strings
          <!--  and derivates -->of (1) can be identified by our relatively straightforward
          application of the Calculus, some of the above examples draw on the simplicity of the
          example and are rather ad hoc. Therefore, before we proceed to discuss how the Calculus
          can be used to make statements and draw inferences about a document, we introduce a
          slightly more complicated (and also more realistic) example. </para><para>Consider the following XML document:
          <programlisting xml:space="preserve">&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;doc&gt; 
    A rule:
    &lt;list&gt;
        &lt;item&gt;First:&lt;/item&gt;
        &lt;item&gt;
            &lt;list&gt;
                &lt;item&gt;think,&lt;/item&gt;
                &lt;item&gt;decide.&lt;/item&gt;
            &lt;/list&gt;
        &lt;/item&gt;
        &lt;item&gt;Then:&lt;/item&gt;
        &lt;item&gt;
            &lt;list&gt;
                &lt;item&gt;act,&lt;/item&gt;
                &lt;item&gt;regret.&lt;/item&gt;
            &lt;/list&gt;
        &lt;/item&gt;
    &lt;/list&gt;
&lt;/doc&gt;</programlisting></para><para>Once again we provide identifiers for individuals of the document and present their
          properties and relations in tabular form, but this time we include only the molecular and
          elemental individuals: <footnote><para>We have made life even more comfortable for ourselves by leaving out the
              whitespace molecules which occur between the molecules listed in the
              table.</para></footnote>
<table><tr><th>Id</th><th>Label</th><th>Kind</th><th>Assigned property</th><th>Next molecule</th><th>Immediate parts</th></tr><tr><td>i01</td><td>A rule: </td><td>M</td><td/><td>i02</td><td/></tr><tr><td>i02</td><td>First:</td><td>M E</td><td>item</td><td>i03</td><td/></tr><tr><td>i03</td><td>think,</td><td>M E</td><td>item</td><td>i04</td><td/></tr><tr><td>i04</td><td>decide.</td><td>M E</td><td>item</td><td>i05</td><td/></tr><tr><td>i05</td><td>Then:</td><td>M E</td><td>item</td><td>i06</td><td/></tr><tr><td>i06</td><td>act,</td><td>M E</td><td>item</td><td>i07</td><td/></tr><tr><td>i07</td><td>regret.</td><td>M E</td><td>item</td><td/><td/></tr><tr><td>i08</td><td/><td>E</td><td>list, item</td><td/><td>i03, i04</td></tr><tr><td>i09</td><td/><td>E</td><td>list, item</td><td/><td>i06, i07</td></tr><tr><td>i10</td><td/><td>E</td><td>list</td><td/><td>i02, i08, i05, i09</td></tr><tr><td>i11</td><td/><td>E</td><td>doc</td><td/><td>i01, i10</td></tr></table>

        </para><para>Note that the individuals i08 and i09 are each represented as one individual with two
          assigned properties, rather than as two individuals each with one property. The difference
          between this representation and the conventional XML representation can be illustrated by
          juxtaposing a conventional XML tree of the document (to the left) and what we might call a
          mereological graph (to the right):<footnote><para>It should be noted that the mereological graph here has been constructed so as to
              highlight the differences from XML discussed in this particular example, and that
              other important differences do not come out with this kind of visualization.</para><para> For example, the nodes of the XML graph are commonly understood to represent XML
              elements, which in this case have been decorated with their generic identifiers. The
              nodes of the mereological graph, however, represent individuals and are decorated with
              what we have here called their assigned properties. Moreover, the nodes visible in the
              mereological graph represent only a tiny fraction of the individuals of the document. </para><para>The arcs of the XML graph are commonly understood to represent containment and/or
              dominance relations between elements. In the mereological graph, they represent
              exclusively part-whole relationships. Again, the part-whole relationships
              depicted in the graph represent only a fraction of the part-whole relationships
              between the individuals of the document.</para></footnote>
          <mediaobject><imageobject><imagedata format="jpg" fileref="../../../vol3/graphics/Huitfeldt01/Huitfeldt01-001.jpg"/></imageobject></mediaobject>
        </para><para>Because of our decision not to count tags as part of the document, all coextensive XML
          elements will be represented as one elemental individual. The nesting order of these
          elements in the XML document will not be preserved in this representation. <footnote><para>It might of course seem that the nesting order is preserved by the order in which
              the assigned properties are mentioned in the table. However the table represents an
              unordered set of statements, so the order is insignificant. More on nesting order of
              coextensive elements further below.</para></footnote>
        </para><para>As before, we can use denoting expressions to refer to any part of the document, for
          example:
          <programlisting xml:space="preserve">i01 = (℩x)¬(∃y)N(y,x)
i02 = (℩x)(item(x) ∧ ¬(∃y)(item(y) ∧ P(y,x)))
i03 = (℩x)(∃y)(∃z)(w)(v)
      ((x ≪ y) ∧ list(y) ∧ 
      (y ≪ z) ∧ list(z) ∧
      (N(w,x) → ¬(w ≪ y)) ∧ 
      (N(v,w) → ¬(v ≪ z))) 
i09 = (℩x)(∃y)(∃z)
      (list(x) ∧ (x ≪ y) ∧ list(y) ∧
       list(z) ∧ (z ≪ y) ∧ ¬(x = z) ∧ P(x,z))</programlisting>
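          How such a denoting expression picks out an individual can be sketched in Python.
          The sketch makes several assumptions that are not part of the text: individuals are
          modelled as sets of character positions, whitespace molecules are left out, P(x,y) is
          read as <quote>every atom of x comes before every atom of y</quote>, and quantifiers
          range only over the eleven individuals of the table. The helper names (has,
          proper_part, precedes, satisfying) are hypothetical, and the second expression is an
          illustration of ours rather than one of the expressions given above.
          <programlisting xml:space="preserve"><![CDATA[
# Assumed model (not from the paper): each molecule is a frozenset of
# consecutive atom positions; whitespace molecules are left out.
labels = ["A rule:", "First:", "think,", "decide.", "Then:", "act,", "regret."]

ind = {}
position = 0
for number, text in enumerate(labels, start=1):
    ind["i%02d" % number] = frozenset(range(position, position + len(text)))
    position += len(text)

# Elemental individuals as sums of their immediate parts (see the table).
ind["i08"] = ind["i03"] | ind["i04"]
ind["i09"] = ind["i06"] | ind["i07"]
ind["i10"] = ind["i02"] | ind["i08"] | ind["i05"] | ind["i09"]
ind["i11"] = ind["i01"] | ind["i10"]

assigned = {
    "item": {"i02", "i03", "i04", "i05", "i06", "i07", "i08", "i09"},
    "list": {"i08", "i09", "i10"},
    "doc": {"i11"},
}

def has(prop, x):
    return x in assigned[prop]

def proper_part(x, y):
    return ind[x] < ind[y]

def precedes(x, y):
    """Assumed reading of P(x,y): every atom of x comes before every atom of y."""
    return max(ind[x]) < min(ind[y])

def satisfying(pred):
    """All listed individuals satisfying pred; a denoting expression picks
    out an individual only if exactly one name is returned."""
    return sorted(x for x in ind if pred(x))

# The item that no other item precedes (the expression given for i02).
first_item = satisfying(
    lambda x: has("item", x)
    and not any(has("item", y) and precedes(y, x) for y in ind)
)

# Illustration only, not from the paper: the list that is not a proper
# part of any other list.
outer_list = satisfying(
    lambda x: has("list", x)
    and not any(has("list", y) and proper_part(x, y) for y in ind)
)

print(first_item)   # ['i02']
print(outer_list)   # ['i10']
]]></programlisting>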
        </para></section><section><title>Statements and inferences</title><para>We can also use the Calculus to make statements about the document —
          unquantified, such as (1)–(4), or quantified, such as (5)–(8):
          <programlisting xml:space="preserve">(1) list(i09)
(2) item(i09)
(3) i07 ≪ i09
(4) i09 ≪ i10
(5) (x)(y)((list(x) ∧ item(x) ∧ (y ≪ x)) → item(y))
(6) (x)(y)((list(x) ∧ item(x) ∧ (x ≪ y)) → (list(y) ∨ doc(y)))
(7) (x)(item(x) → (∃y)((x ≪ y)  ∧ list(y)))
(8) (x)(item(x) → (∃y)(∃z)
   (item(y) ∧ list(z)  ∧ (x ≪ z) ∧ (y ≪ z)  ∧ ¬(x = y)))</programlisting>
          In order to avoid unnecessary misunderstanding, it should be pointed out that
          (1)–(8) are descriptive statements about this particular document. (In other
          contexts, such as situations where we want to express general constraints on
          document structure, we might of course also want to state facts about document
            <emphasis>types</emphasis>, but that is not our issue here.) </para><para>From the statements we can make inferences, such as for example:
          <programlisting xml:space="preserve">
(9) item(i07)
     [From (1), (2), (3) and (5).]
(10) list(i10) ∨ doc(i10)
     [From (1), (2), (4) and (6).]
(11) (∃y)((i09 ≪ y) ∧ list(y))
     [From (2) and (7).]
(12) (∃y)(∃z)(item(y) ∧ list(z) ∧ (i07 ≪ z) ∧ (y ≪ z) ∧ ¬(i07 = y))
     [From (8) and (9).]</programlisting>
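          The first two inferences can be reproduced mechanically. The following Python sketch
          is not a general theorem prover: it assumes a hypothetical encoding of facts (1)–(4)
          as tuples and applies rules (5) and (6) once each, yielding (9) and (10). The
          function names and the label <quote>list or doc</quote> for the disjunction are our
          own.
          <programlisting xml:space="preserve"><![CDATA[
# Facts (1)-(4) as tuples; the encoding is illustrative only.
properties = {("list", "i09"), ("item", "i09")}     # (1), (2)
proper_part = {("i07", "i09"), ("i09", "i10")}      # (3), (4) as (part, whole)

def holds(prop, x):
    return (prop, x) in properties

def apply_rule_5():
    """(5): proper parts of an individual that is both list and item are items."""
    return {("item", y) for (y, x) in proper_part
            if holds("list", x) and holds("item", x)}

def apply_rule_6():
    """(6): wholes containing a list-and-item individual are lists or docs."""
    return {("list or doc", y) for (x, y) in proper_part
            if holds("list", x) and holds("item", x)}

print(apply_rule_5())   # {('item', 'i07')}
print(apply_rule_6())   # {('list or doc', 'i10')}
]]></programlisting>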
        </para></section><section><title>Conclusion</title><para>We have shown that strings composed of characters defined as atomic individuals can be
          identified and referenced by denoting expressions, and that the Calculus can be used to
          describe the part-whole relationships and ordering relations between parts of the document
          as well as the properties ascribed by generic identifiers. We have also shown that this
          application of the Calculus can be used for making statements about documents and for
          drawing inferences from these statements.</para><para>The approach chosen here has at least two obvious problems, or shortcomings; one
          concerns the representation of coextensive elements, one relates to the representation of
          empty elements. Before we discuss these problems, however, we would like to assess one of
          its possible merits. In the next section, we will therefore sketch how this application of
          the Calculus can be used for the formulation of rules for propagation of properties among
          the parts of a document.</para></section></section></section><section xml:id="propagation"><title>Property Propagation — a Sketch</title><para>We have assumed that the generic identifier of an element may be seen as assigning a
      property to the PCDATA content of that element, and not to any proper part of that PCDATA
      content. But sometimes, the meaning of the markup is such that that property is not assigned
      (or not only assigned) to the contents of the element itself, but also to all or some of its
      descendants, or to all or some of its ancestors, or to one or more of its siblings, or to only
      specific other elements. Furthermore, what is assigned to the element or elements in question
      may be not a monadic property, but a relation of them to other elements in the same document,
      or even to document elements or other entities outside that document. Thus, the propagation of
      properties ascribed by the generic identifier of an element may follow a large diversity of
      patterns.</para><para>Using examples from the TEI and HTML encoding schemes, we will show that some of these
      patterns can conveniently be described by means of our application of the Calculus. We will
      first address some of the general distribution patterns identified by Nelson Goodman, which
      seem to represent important aspects of the intended semantics of certain TEI or HTML element
      types. We will then proceed to more complicated examples.</para><section><title>Dissective and anti-dissective properties</title><para>As mentioned, in our application of the Calculus so far we have assumed that the
        property designated by the generic identifier of an XML element is assigned exclusively to
        the individual delimited by the start and end tags of the element, and not to its parts.
        This seems plausible enough for a number of element types, such as paragraphs, list items
        and titles. For example, a part of a paragraph, a list item or a title is not in general
        itself a paragraph, a list item or a title.</para><para>TEI element types such as &lt;hi&gt; (highlighting)<footnote><para>In the following we will often use the expression <quote>element</quote> or
              <quote>element type</quote> as short for <quote><emphasis>property</emphasis> ascribed
              to an element by its generic identifier</quote>.</para></footnote> or &lt;add&gt; (added), however, do not seem to follow this rule. Every
        part of a highlighted or added element is itself presumably highlighted or added. Other
        examples may be &lt;del&gt; (deleted) and &lt;foreign&gt;. The HTML element
        type &lt;i&gt; (italics) may provide an even clearer example here — every
        part of an italicized element is itself in italics. </para><para>According to Goodman, <quote>a ... predicate is ... <emphasis>dissective</emphasis> if
          it is satisfied by every part of every individual that satisfies it</quote> [<xref linkend="Goodman1972"/>, p. 38]. A dissective one-place predicate is defined as
        follows:
        <programlisting xml:space="preserve">F is dissective iff (x)(y)((F(x) ∧ (y &lt; x)) → F(y))</programlisting>
      </para><para>Consider the following document fragment:
        <programlisting xml:space="preserve">
&lt;s&gt;We
   &lt;add&gt;, as all 
      &lt;del&gt;purely &lt;hi&gt;human&lt;/hi&gt; and&lt;/del&gt; 
   finite beings,
   &lt;/add&gt; 
are all fallible.&lt;/s&gt;</programlisting>
        As earlier, we represent the properties of this fragment in tabular form. From now on,
        however, instead of indicating <quote>assigned properties</quote> for each individual we
        will list relevant statements (some of which may be inferences from statements about the
        properties of other individuals): 
<table><tr><th>Id</th><th>Label</th><th>Kind</th><th>Statements</th><th>Next</th><th>Parts</th></tr><tr><td>i01</td><td>We</td><td>M</td><td/><td>i02</td><td/></tr><tr><td>i02</td><td>, as all </td><td>M</td><td/><td>i03</td><td/></tr><tr><td>i03</td><td>purely </td><td>M</td><td/><td>i04</td><td/></tr><tr><td>i04</td><td>human</td><td>M E</td><td>hi(i04)</td><td>i05</td><td/></tr><tr><td>i05</td><td> and</td><td>M</td><td/><td>i06</td><td/></tr><tr><td>i06</td><td> finite beings,</td><td>M</td><td/><td>i07</td><td/></tr><tr><td>i07</td><td> are all fallible.</td><td>M</td><td/><td/><td/></tr><tr><td>i08</td><td/><td>E</td><td>del(i08)</td><td/><td>i03, i04, i05</td></tr><tr><td>i09</td><td/><td>E</td><td>add(i09)</td><td/><td>i02, i08, i06</td></tr><tr><td>i10</td><td/><td>E</td><td>s(i10)</td><td/><td>i01, i08, i09, i07</td></tr></table>

        However, if we add the following statements
        to the effect that the properties add, del and hi are dissective:
        <programlisting xml:space="preserve">(x)(y)((add(x) ∧ (y &lt; x)) → add(y))
(x)(y)((del(x) ∧ (y &lt; x)) → del(y))    
(x)(y)((hi(x) ∧ (y &lt; x)) → hi(y))</programlisting>
        — then, we can infer additional properties, with the following result:
<table><tr><th>Id</th><th>Label</th><th>Kind</th><th>Statements</th><th>Next</th><th>Parts</th></tr><tr><td>i01</td><td>We</td><td>M</td><td/><td>i02</td><td/></tr><tr><td>i02</td><td>, as all </td><td>M</td><td>add(i02)</td><td>i03</td><td/></tr><tr><td>i03</td><td>purely </td><td>M</td><td>del(i03), add(i03)</td><td>i04</td><td/></tr><tr><td>i04</td><td>human</td><td>M E</td><td>hi(i04), del(i04), add(i04)</td><td>i05</td><td/></tr><tr><td>i05</td><td> and</td><td>M</td><td>del(i05), add(i05)</td><td>i06</td><td/></tr><tr><td>i06</td><td> finite beings,</td><td>M</td><td>add(i06)</td><td>i07</td><td/></tr><tr><td>i07</td><td> are all fallible.</td><td>M</td><td/><td/><td/></tr><tr><td>i08</td><td/><td>E</td><td>del(i08), add(i08)</td><td/><td>i03, i04, i05</td></tr><tr><td>i09</td><td/><td>E</td><td>add(i09)</td><td/><td>i02, i08, i06</td></tr><tr><td>i10</td><td/><td>E</td><td>s(i10)</td><td/><td>i01, i08, i09, i07</td></tr></table>
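        The inference step can be sketched in Python. The sketch assumes a hypothetical
        encoding of the fragment: parts holds the Parts column of the table, assigned holds
        the properties stated before propagation, and only the ten listed individuals are
        considered (atoms and arbitrary sums are ignored). It computes, for each listed
        individual, the properties that follow from the three dissectiveness statements.
        <programlisting xml:space="preserve"><![CDATA[
# Parts column of the table (hypothetical encoding as a dictionary).
parts = {
    "i08": ["i03", "i04", "i05"],
    "i09": ["i02", "i08", "i06"],
    "i10": ["i01", "i08", "i09", "i07"],
}
# Properties stated before any propagation.
assigned = {"i04": {"hi"}, "i08": {"del"}, "i09": {"add"}, "i10": {"s"}}
dissective = {"add", "del", "hi"}

def all_parts(x):
    """Every listed individual that is a part of x, at any depth."""
    found = set()
    for child in parts.get(x, []):
        found.add(child)
        found |= all_parts(child)
    return found

def propagate():
    """Copy each dissective property from an individual to all of its parts."""
    result = {x: set(props) for x, props in assigned.items()}
    for x, props in assigned.items():
        for prop in props & dissective:
            for y in all_parts(x):
                result.setdefault(y, set()).add(prop)
    return result

for x, props in sorted(propagate().items()):
    print(x, sorted(props))
]]></programlisting>
        Because all_parts already follows the part relation to any depth, a single pass over
        the assigned properties is enough; the property s is not declared dissective and
        therefore stays on i10 alone.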

        (Note that this is the first example so far of non-elemental individuals
        carrying assigned properties.) </para><para>Goodman observes that <quote>In practice, we are usually concerned only with
          dissectiveness under some special or systematic limitations...</quote> [<xref linkend="Goodman1972"/>, p. 38]. This seems to be the case here, too: while the
        TEI elements &lt;hi&gt;, &lt;add&gt; and &lt;del&gt; and the HTML
        element &lt;i&gt; seem to apply all the way down to every atomic part of an
        individual, an element type like &lt;foreign&gt; hardly applies below word-level. </para><para>Furthermore, there seem to be exceptions even in the case of &lt;hi&gt;,
        &lt;add&gt; and &lt;del&gt;: In a transcription, a &lt;note&gt;
        (note) element is normally not intended to inherit the property in question. A more
        generally usable formula for dissectiveness may therefore be this:
        <programlisting xml:space="preserve">(x)(y)((F(x) ∧ (y &lt; x) ∧
   ¬(∃z)((z &lt; x) ∧ (y &lt; z) ∧ (G(z) ∨ H(z) ∨ ...)))
   → F(y))</programlisting>
        where G, H,... indicate exceptions. </para><para>Let us define an <emphasis>anti-dissective</emphasis> one-place predicate as follows: <footnote><para>The term <quote>anti-dissective</quote> (and its definition) is ours, not Goodman's.
            The same goes for the terms <quote>anti-expansive</quote> and
            <quote>anti-collective</quote> in the following paragraphs.</para></footnote>
        <programlisting xml:space="preserve">F is anti-dissective iff (x)(y)((F(x) ∧ (y ≪ x)) → ¬F(y))</programlisting>
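        Both the restricted form of dissectiveness and the anti-dissective pattern can be
        checked on a small model. The following Python sketch uses a hypothetical toy document
        that does not occur in the text: an added passage containing plain text and a note,
        and a TEI-style paragraph with a nested paragraph. Individuals are modelled as sets of
        atom positions, part as set inclusion, and the set exceptions plays the role of
        G, H, ... in the formula for restricted dissectiveness.
        <programlisting xml:space="preserve"><![CDATA[
# Hypothetical toy model, not taken from the paper: individuals are
# frozensets of atom positions.
ind = {
    "added": frozenset(range(0, 10)),     # an add element
    "text": frozenset(range(0, 4)),       # plain text inside the addition
    "note": frozenset(range(4, 10)),      # a note inside the addition
    "word": frozenset(range(5, 8)),       # a part of the note
    "outer_p": frozenset(range(20, 30)),  # a TEI-style paragraph ...
    "inner_p": frozenset(range(22, 25)),  # ... with a nested paragraph
}
assigned = {"add": {"added"}, "note": {"note"}, "p": {"outer_p", "inner_p"}}
exceptions = {"note"}   # plays the role of G, H, ... in the formula

def blocked(y, x):
    """True if some exception individual z has y as a part and is a part of x."""
    return any(
        ind[y] <= ind[z] <= ind[x]
        for prop in exceptions for z in assigned[prop]
    )

def propagate(prop):
    """Dissective propagation of prop that stops at exception individuals."""
    result = set(assigned[prop])
    for x in assigned[prop]:
        for y in ind:
            if ind[y] <= ind[x] and not blocked(y, x):
                result.add(y)
    return result

def is_anti_dissective(prop):
    """No proper part of an individual with prop has prop itself."""
    holders = assigned[prop]
    return not any(ind[y] < ind[x] for x in holders for y in holders)

print(sorted(propagate("add")))     # ['added', 'text']
print(is_anti_dissective("note"))   # True
print(is_anti_dissective("p"))      # False
]]></programlisting>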
      </para><para>The TEI element &lt;docDate&gt; (document date) and the TEI and HTML
        &lt;body&gt; may serve as examples of anti-dissective properties: no
        part of a &lt;docDate&gt; or a &lt;body&gt; element is itself a
        &lt;body&gt; or a &lt;docDate&gt;. The HTML &lt;p&gt; (paragraph)
        element is also clearly anti-dissective.</para><para> The TEI &lt;p&gt; element presents a complication. It would seem to be
        anti-dissective, but unlike HTML, TEI allows &lt;p&gt;s nested within
        &lt;p&gt;s. So
        <programlisting xml:space="preserve">(x)(y)((p(x) ∧ (y ≪ x)) → ¬p(y))</programlisting>
        is true in HTML, but not in TEI. The TEI &lt;p&gt; element can therefore not be said
        to be either dissective or anti-dissective.<footnote><para>A reflection upon this fact may also make us change our judgement of the HTML
            &lt;p&gt; element: Perhaps it is just a result of the content model of
            &lt;p&gt; in HTML that it seems anti-dissective. Anyhow, since nested
            &lt;p&gt;s simply do not occur in HTML, it does not matter much whether we
            classify the property as non-dissective or anti-dissective.
            <!-- Note to coauthors: 
       A remark about the relationship between syntax and semantics could be in place here.
       (In semantics, we are assuming syntacally valid documents, so for example in this case 
       there is no need to consider semanitc propagation rules for invalid HTML.
     -->
          </para></footnote>
      </para></section><section><title>Expansive and anti-expansive properties</title><para><quote>A one-place predicate is <emphasis>expansive</emphasis> if it is satisfied by
          everything that has a part satisfying it. </quote>[<xref linkend="Goodman1972"/>,
        p. 38]. An expansive one-place predicate can be defined as follows:
        <programlisting xml:space="preserve">F is expansive iff (x)(y)((F(x) ∧ (x &lt; y)) → F(y))</programlisting>
        In more conventional XML terms, while dissective predicates propagate <quote>down</quote>
        the document tree, expansive predicates propagate <quote>upwards</quote> in the tree, from
        children to their parents. This might be thought to be unusual, and actually it is difficult
        to find examples of such properties in the TEI and HTML encoding schemes. Element types such
        as &lt;docDate&gt; and &lt;docAuthor&gt; may, as we shall see later, be said
        to ascribe properties to individuals of which they are a part, but that does not make these
        individuals themselves &lt;docDate&gt;s or &lt;docAuthor&gt;s. (Even so, it is
        easy to think of expansive properties: for example, the property of
          <emphasis>containing the word <quote>Hamlet</quote></emphasis> would clearly be
        expansive.) </para><para>Let us define an anti-expansive property as follows:
        <programlisting xml:space="preserve">F is anti-expansive iff (x)(y)((F(x) ∧ (x ≪ y)) → ¬F(y))</programlisting>
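        Both definitions can be tested on a finite model. The Python sketch below assumes a
        hypothetical sample string and restricts individuals to contiguous spans of character
        positions, with part as set inclusion. It checks the property of containing the word
        <quote>Hamlet</quote>, and, for contrast, a property of our own choosing that holds
        only of the span that is exactly that word.
        <programlisting xml:space="preserve"><![CDATA[
from itertools import combinations

TEXT = "Hamlet is a play"   # hypothetical sample text
WORD = "Hamlet"

# Assumed model: individuals are contiguous spans of character positions.
spans = [frozenset(range(a, b))
         for a, b in combinations(range(len(TEXT) + 1), 2)]

def contains_word(x):
    """True if all atoms of some occurrence of WORD belong to x."""
    start = TEXT.find(WORD)
    while start != -1:
        if frozenset(range(start, start + len(WORD))) <= x:
            return True
        start = TEXT.find(WORD, start + 1)
    return False

def is_the_word(x):
    """True only for the span that is exactly the first six characters."""
    return x == frozenset(range(len(WORD)))

def is_expansive(pred, individuals):
    """F(x) and x part of y implies F(y), checked over a finite collection."""
    return all(pred(y) for x in individuals if pred(x)
               for y in individuals if x <= y)

def is_anti_expansive(pred, individuals):
    """F(x) and x proper part of y implies not F(y)."""
    return not any(pred(y) for x in individuals if pred(x)
                   for y in individuals if x < y)

print(is_expansive(contains_word, spans))        # True
print(is_anti_expansive(contains_word, spans))   # False
print(is_anti_expansive(is_the_word, spans))     # True
]]></programlisting>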
        The TEI element &lt;foreign&gt; may be an example of a property which is
        anti-expansive, at least up to a certain level, and at least insofar as it seems reasonable
        to assume that if something is marked as foreign, then it is marked off from something which
        is <emphasis>not</emphasis> in a foreign language.
        <!-- Note to coauthors:
          foregin is probably not a good example here, the TEI rules are at best compatible with but do not 
          imply what is said above. Can we find a better example? -->
      </para></section><!-- Note to coauthors:
      The following section on collective and anti-collective properties does very little work here, 
      especially since I haven't found any good of such properties in TEI or HTML (apart from 
      possibly anti-collective divs). 
      Can you think of any? If not, we should probably collectively kill the section. 
 --><section><title>Collective and anti-collective properties</title><para><quote>That a one-place predicate is <emphasis>collective</emphasis> means that it is
          satisfied by the sum of every two individuals (distinct or not) that satisfy it
        severally</quote> [<xref linkend="Goodman1972"/>, p. 39]. A collective one-place
        predicate can be defined as follows:
        <programlisting xml:space="preserve">F is collective iff (x)(y)((F(x) ∧ F(y)) → F(x + y))</programlisting>
        Dissective elements like the TEI elements &lt;hi&gt;, &lt;add&gt;,
        &lt;del&gt; and &lt;foreign&gt; and the HTML element &lt;i&gt; seem
        also to be collective: any sum of strings in italics would seem itself to be in italics,
        etc. There probably are examples of collective and non-dissective or anti-dissective
        properties in TEI or HTML, but so far we have not found any.</para><para>Let us define an anti-collective property as follows:
        <programlisting xml:space="preserve">F is anti-collective iff (x)(y)((F(x) ∧ F(y) ∧ (x ʅ y)) → ¬F(x + y))</programlisting>
        Both the TEI and the HTML &lt;div&gt; (division) element types seem to be
        anti-collective: no sum of &lt;div&gt;s is itself a &lt;div&gt;.</para></section><section><title>The HTML title element</title><para>So far, we have been concerned only with one-place predicates.<footnote><para>We have simply tried to find examples of the patterns Goodman terms
              <quote>dissective</quote>, <quote>expansive</quote> and <quote>collective</quote>, and
            added the corresponding patterns <quote>anti-dissective</quote> etc. Goodman also
            identifies patterns he terms <quote>nucleative</quote>, <quote>pervasive</quote>,
              <quote>cumulative</quote> and <quote>agglomerative</quote> [<xref linkend="Goodman1972"/>, p. 39–40]. We do not discuss these here,
            as we have not found any interesting application of them for the present purposes. In
            particular, a nucleative predicate is defined as follows:
            <programlisting xml:space="preserve">F is nucleative iff (F(x) ∧ F(y)) → F(x · y)</programlisting>
            Since XML has no elements which overlap without the one being a part of the other, the
            product of two element strings is always a part of one of them. Therefore the
            pattern does not have any interesting applications to XML, although it may have for
            markup systems such as xConcur, TexMecs, Goddag, LMNL and others which allow overlapping
            elements.</para></footnote> Many TEI and HTML elements ascribe properties according to more complicated
        patterns which can more conveniently be accounted for by representing them as relations, or
        predicates with two or more places. </para><para> We begin with a simple example of an element expressing a two-place predicate, the HTML
        title element. From: <programlisting xml:space="preserve">&lt;!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
    &lt;head&gt;
        &lt;title&gt;Simple HTML&lt;/title&gt;
    &lt;/head&gt;
    &lt;body&gt;
        &lt;p&gt;First para&lt;/p&gt;
        &lt;p&gt;Second para&lt;/p&gt;
    &lt;/body&gt;
    &lt;/html&gt;</programlisting> we get: 
<table><tr><th>Id</th><th>Label</th><th>Kind</th><th>Statements</th><th>Next</th><th>Parts</th></tr><tr><td>i01</td><td>Simple HTML</td><td>M E</td><td>head(i01), title(i01)</td><td>i02</td><td/></tr><tr><td>i02</td><td>First para</td><td>M E</td><td>p(i02)</td><td>i03</td><td/></tr><tr><td>i03</td><td>Second para</td><td>M E</td><td>p(i03)</td><td/><td/></tr><tr><td>i04</td><td/><td>E</td><td>body(i04)</td><td/><td>i02, i03</td></tr><tr><td>i05</td><td/><td>E</td><td>html(i05)</td><td/><td>i01, i04</td></tr></table>

      </para><para> We state the propagation rule that:
        <programlisting xml:space="preserve">(x)(y)((title(x) ∧ (x &lt; y) ∧ html(y)) → hasTitle(y,x))</programlisting>
        and get for the last line of the previous table: 
<table><tr><th>Id</th><th>Label</th><th>Kind</th><th>Statements</th><th>Next</th><th>Parts</th></tr><tr><td>i05</td><td/><td>E</td><td>html(i05), hasTitle(i05,i01)</td><td/><td>i01, i04</td></tr></table>
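        The application of this rule can be sketched in Python. The encoding is hypothetical:
        parts holds the Parts column of the table, assigned holds the stated properties, and
        the part relation is computed by following immediate parts, with every individual
        counted as a part of itself.
        <programlisting xml:space="preserve"><![CDATA[
# Individuals from the table; parts lists immediate parts only.
parts = {"i04": ["i02", "i03"], "i05": ["i01", "i04"]}
assigned = {
    "i01": {"head", "title"}, "i02": {"p"}, "i03": {"p"},
    "i04": {"body"}, "i05": {"html"},
}

def is_part(x, y):
    """x is y itself or is reachable from y through immediate parts."""
    return x == y or any(is_part(x, child) for child in parts.get(y, []))

def has_title():
    """All pairs (y, x) such that title(x), x is a part of y, and html(y)."""
    return {(y, x) for x in assigned for y in assigned
            if "title" in assigned[x] and "html" in assigned[y]
            and is_part(x, y)}

print(has_title())   # {('i05', 'i01')}
]]></programlisting>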
  
      </para><para>The fact that the propagation rule can be made so simple in this case is partly due to
        the fact that we are assuming that the document is valid, and that the relative structural
        positions of the elements are constant. For example, there is no need to state that the
        title element has to be the child of a head element which in turn is directly succeeded by a
        body element etc. </para></section><!-- Note to coauthors: 
       I had thought the propagation rules for the TEI head element were more complex than 
       they really are. So I have killed this example.
       <section>      <title>The TEI head element</title>
      <para>The propagation rule for the TEI head (heading) element is basically quite similar to
        the rule for the HTML title element. But the TEI head element may occur in a large number of
        different structural positions whithin a document, and therefore which element it assigns a <quote>header</quote> to depends
        on its context. Unvariably, however, the head element assigns a header to its direct ancestor. </para>
      
      <para>From:
        <programlisting>&lt;div&gt;
          &lt;head&gt;On Denoting&lt;/head&gt;
          &lt;p&gt;Denoting is fun.&lt;/p&gt;
          &lt;p&gt;It is useful, too.&lt;/p&gt;
          &lt;/div&gt;</programlisting>
        we get: 
      </para>
      <para> We state that <programlisting>...</programlisting> and get:
        <programlisting>da da da (table)</programlisting>
      </para>
      <para>bla bla about the above.</para>
    </section>
--><section><title>The TEI sp, speaker and stage elements</title><para>While it is quite legitimate to assume document validity when stating propagation rules,
        these rules tend to become more complex when more elements are involved, and/or the rules
        for the structural positions of the elements concerned are more complex. </para><para>The relation between the TEI elements &lt;sp&gt; (speech),
        &lt;speaker&gt; and &lt;stage&gt; (stage direction) is that a
        &lt;sp&gt; may contain a &lt;speaker&gt;, and if it does, the
        &lt;speaker&gt; element contains the name of the speaker of the rest of the
        &lt;sp&gt; element, except for any &lt;stage&gt;s (stage directions) it
        might contain. From: </para><para>
        <programlisting xml:space="preserve">&lt;sp&gt;
    &lt;speaker&gt;Peer&lt;/speaker&gt; 
    Why 
    &lt;stage&gt;(hesitating)&lt;/stage&gt; 
    swear?
&lt;/sp&gt;
</programlisting> we get: 
<table><tr><th>Id</th><th>Label</th><th>Kind</th><th>Statements</th><th>Next</th><th>Parts</th></tr><tr><td>i01</td><td>Peer</td><td>M E</td><td>speaker(i01)</td><td>i02</td><td/></tr><tr><td>i02</td><td>Why</td><td>M</td><td/><td>i03</td><td/></tr><tr><td>i03</td><td>(hesitating)</td><td>M E</td><td>stage(i03)</td><td>i04</td><td/></tr><tr><td>i04</td><td>swear?</td><td>M</td><td/><td/><td/></tr><tr><td>i05</td><td/><td>E</td><td>sp(i05)</td><td/><td>i01, i02, i03, i04</td></tr></table>

        We state the following propagation rule:
        <programlisting xml:space="preserve">(x)(y)((speaker(x) ∧ (x &lt; y) ∧ sp(y)) → 
          (z)(((z &lt; y) ∧ ¬(speaker(z) ∨ stage(z))) → saidBy(z,x)))</programlisting>
        and get:
<table><tr><th>Id</th><th>Label</th><th>Kind</th><th>Statements</th><th>Next</th><th>Parts</th></tr><tr><td>i01</td><td>Peer</td><td>M E</td><td>speaker(i01)</td><td>i02</td><td/></tr><tr><td>i02</td><td>Why</td><td>M</td><td>saidBy(i02,i01)</td><td>i03</td><td/></tr><tr><td>i03</td><td>(hesitating)</td><td>M E</td><td>stage(i03)</td><td>i04</td><td/></tr><tr><td>i04</td><td>swear?</td><td>M</td><td>saidBy(i04,i01)</td><td/><td/></tr><tr><td>i05</td><td/><td>E</td><td>sp(i05)</td><td/><td>i01, i02, i03, i04</td></tr></table>
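
        For illustration only, the effect of this rule on the table can be mimicked in a few lines of
        Python. The sketch is not part of the formalism: it assumes that the individuals are held in a
        dictionary keyed by the identifiers used in the table, and that the part-whole relation is
        restricted to the parts listed in the Parts column. The names individuals and
        propagate_said_by are hypothetical.
        <programlisting xml:space="preserve"># Hypothetical encoding of the table above: each individual has a set of
# one-place statements (its properties) and, for elements, a tuple of parts.
individuals = {
    "i01": {"label": "Peer", "statements": {"speaker"}, "parts": ()},
    "i02": {"label": "Why", "statements": set(), "parts": ()},
    "i03": {"label": "(hesitating)", "statements": {"stage"}, "parts": ()},
    "i04": {"label": "swear?", "statements": set(), "parts": ()},
    "i05": {"label": "", "statements": {"sp"},
            "parts": ("i01", "i02", "i03", "i04")},
}


def propagate_said_by(individuals):
    """Return the set of (z, x) pairs for which saidBy(z, x) is derived."""
    derived = set()
    for y, whole in individuals.items():
        # Only individuals carrying the sp property trigger the rule.
        if "sp" not in whole["statements"]:
            continue
        parts = whole["parts"]
        speakers = [x for x in parts
                    if "speaker" in individuals[x]["statements"]]
        for x in speakers:
            for z in parts:
                # Parts that are themselves speaker or stage are skipped.
                excluded = individuals[z]["statements"].intersection(
                    {"speaker", "stage"})
                if not excluded:
                    derived.add((z, x))
    return derived


said_by = propagate_said_by(individuals)
for z, x in sorted(said_by):
    print("saidBy(" + z + "," + x + ")")</programlisting>
        Under these assumptions the sketch derives saidBy only for i02 and i04, in agreement with
        the second table.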

      </para></section><section><title>The TEI docTitle, docDate and docAuthor elements</title><para> The TEI &lt;docTitle&gt; (document title) element may occur directly within
        &lt;titlePage&gt; or &lt;front&gt; (front matter); &lt;titlePage&gt;
        may occur directly within &lt;front&gt; or &lt;back&gt; (back matter), and
        &lt;front&gt; and &lt;back&gt; may occur directly within
        &lt;text&gt;. &lt;docTitle&gt; behaves very much like the HTML
        &lt;title&gt; element:
        <programlisting xml:space="preserve">(x)(y)((docTitle(x) ∧ (x &lt; y) ∧ text(y)) → hasTitle(y,x))</programlisting>
        &lt;docTitle&gt; assigns the property of <emphasis>being</emphasis> a document title
        to its own content, and the property of <emphasis>having</emphasis> that title to the
        individual which carries the property of being a text, and of which it is itself a part.
        Thus, while no other parts of the elemental text individual have any of these properties,
        all its parts have the property of being the <emphasis>part</emphasis> of an individual
        which carries the title in question. </para><!-- docTitle has become so similar to html title that html title seems to have become superfluous. --><para>The &lt;docDate&gt; (document date) element, in turn, behaves very much like the
        &lt;docTitle&gt; element. Although it may occur in a larger variety of positions, it
        assigns the property of <emphasis>being</emphasis> (or identifying) the date of the document
        to its own content, and the property of <emphasis>having</emphasis> that date to the
        individual which carries the property of being a text, and of which it is itself a part. </para><para>We may assume, however, that the document date carries over to most or all the parts of
        the text, i.e., that all the parts of the element have the property of having that date,
        too.
        <!-- Discuss: is the property collective? To some extent, but it doesn't necessarily apply to itself (the docDate may be added later), and it does not extend upwards beyond text level.
  Is it a property which is propagated up to a certain level, from which it is dissective? To some extent, but it does not apply to adds and notes. -->
        If we are dealing with a transcription of an authorial document which according to the
        &lt;docDate&gt; element dates from a particular year, it may be the case that we
        also know that all parts of the document marked by &lt;add&gt; contain corrections
        in that document made by another person several years later, and that all
        &lt;note&gt;s are editorial notes supplied even later than that, by the creator of
        the electronic version. A propagation rule to this effect may be expressed for example as
        follows:
        <programlisting xml:space="preserve">(x)(y)(z)((docDate(x) ∧ (x &lt; y) ∧ text(y)) →
   (((z &lt; y) ∧ ¬(∃w)((z &lt; w) ∧ (add(w) ∨ note(w)))) →
   (hasDate(y,x) ∧ hasDate(z,x))))</programlisting>
        Note, however, that in some situations the TEI &lt;docDate&gt; element gives the date of the 
        first edition of the text, while the text actually transcribed by the document comes from a later edition. In such situations 
        the semantics of the element is rather different, and the property of having the date given may possibly not propagate to elements below &lt;text&gt; level at all.
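        Leaving this complication aside, the rule excluding additions and notes can be sketched
        procedurally. The following Python fragment is an illustration under stated assumptions, not
        an implementation of the Calculus: the six individuals, their identifiers and the field names
        props and whole are invented for the example, and each individual is taken to count among
        its own wholes, so that an add or note element is itself excluded along with its parts.
        <programlisting xml:space="preserve"># Hypothetical individuals: each maps to its properties and to the
# individual it is directly a part of (None for the top-level text).
individuals = {
    "t": {"props": {"text"}, "whole": None},
    "d": {"props": {"docDate"}, "whole": "t"},
    "p": {"props": {"p"}, "whole": "t"},
    "a": {"props": {"add"}, "whole": "p"},
    "w": {"props": set(), "whole": "a"},
    "n": {"props": {"note"}, "whole": "t"},
}


def wholes(z):
    """Yield z itself and every individual of which z is a part."""
    while z is not None:
        yield z
        z = individuals[z]["whole"]


def propagate_date():
    """Return the (individual, docDate) pairs for which hasDate is derived."""
    derived = set()
    for x, info in individuals.items():
        if "docDate" not in info["props"]:
            continue
        # The text individual(s) of which the docDate is a part.
        texts = [y for y in wholes(x) if "text" in individuals[y]["props"]]
        for y in texts:
            derived.add((y, x))
            for z in individuals:
                chain = list(wholes(z))
                inside_y = z != y and y in chain
                # Blocked if z is, or lies within, an add or a note.
                blocked = any(
                    individuals[w]["props"].intersection({"add", "note"})
                    for w in chain)
                if inside_y and not blocked:
                    derived.add((z, x))
    return derived


has_date = propagate_date()</programlisting>
        Here hasDate is derived for the text t, for the paragraph p and, since the rule does not
        exclude it, for the docDate element d itself, but not for a, w or n.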
      </para><para>The &lt;docAuthor&gt; (document author) element, again, behaves much like the
        &lt;docDate&gt; element. It assigns the property of <emphasis>being</emphasis> the
        name of the author of the document to its own content, and the property of
        <emphasis>having</emphasis> the author of that name to the text of which it is a part. </para><para>In the example just discussed, we may again assume that the property, in this case the
        property of having the author in question, is not carried over to later additions and notes.
        Other element types, such as &lt;q&gt; (quote) and &lt;cit&gt; (citation), would
        for more or less obvious reasons also have to be considered for exclusion. However, there is
        a further complication: If a person is considered the author of a document, he is normally
        also considered the author of parts of that document, such as its chapters, sections and
        paragraphs. Perhaps authorship may also be attributed to sentences or phrases, but certainly
        not to individual words or letters. Again we are faced with a property which propagates down
        to a certain level, but where it is unclear exactly where that level ends. And as is so
        often the case with markup, it does not help us much to become clear about the level at
        which the propagation ends, be it subparagraphs, sentences or phrases, if it turns out that
        the elements at that level have not been marked up. </para><!-- Note to coauthors: 
        For polishing: End with a remark here about this as a reminder that:
      1) One needs to care about what one marks - if you care about this and want authorship attribution to stop at subsection or sentence or phrase level, you had better mark these levels. 
      2) Working out propagation rules helps us understand the semantics of our own markup.
      For example, in this case a reminder of the distinction between "authored by" and "written by" (authorship may seem to apply at type level and have something to do with originality, the first original formulation of a token of a particular type. Writing has to do with the creation of tokens. (Shakespeare is the author of the sentence "To be or not to be, that is the question", although that sentence has since been written by many. But also originality in another sense: Is there an author of "He looked at her."? Would the first utterer of such a sentence be counted as its author??)
      --></section><!-- We really need some examples of sideways propagation. 
    Find some in teiHeader, or perhaps in subelements of the TEI bibl element?--></section><section><title>Problems</title><para> We have mentioned that there are at least two serious problems with our application of
      the Calculus. One problem, which has already been identified, relates to the representation of
      coextensive elements. The other problem, which relates to the representation of empty
      elements, has only been mentioned in passing. We believe this is the less serious of the two,
      and we will therefore discuss it first.</para><section><title>Empty elements</title><para>For the purposes of this discussion, we may conveniently distinguish between milestone
        elements and other empty elements.</para><section><title>Milestone elements</title><para>Milestones are empty elements which ascribe properties to parts of a document that,
          for various reasons, are not marked up as ordinary elements with content. The reason why some textual
          phenomena are represented by milestones rather than ordinary elements is often a need to
          overcome the XML constraint that element structure must be hierarchical.</para><para>Typically, a milestone may be seen as assigning a property to the following parts of
          the document, up to the next milestone element of the same type, up to the occurrence of
          an element of some specific other type, or to the end of the document. We think we have
          already demonstrated that our application of the Calculus to XML documents can handle such
          property assignment.</para><para> We believe that many of the other mechanisms proposed to handle so-called overlapping
          hierarchies in XML (for example, <quote>Trojan Horse</quote> milestones [<xref linkend="DeRose"/>], and fragmented or virtual elements [<xref linkend="teip4"/>]) can be
          handled in similar ways, and therefore do not constitute a serious problem for our
          application of the Calculus. <!-- Quite on the contrary, using this application to propagate properties involved
          in the use of such mechanisms may help and support the processing of non-standard
          mechanisms, which is often cumbersome in XML applications. -->
          <!-- Phew, that was rough! --></para></section><section><title>Other empty elements</title><para>Empty elements which are not milestones typically stand for and/or ascribe properties
          to some part of the document which cannot straightforwardly be represented as a character
          or string of characters. These empty elements are more difficult to deal with, because
          according to our application of the Calculus something which cannot be said to consist of
          character atoms simply cannot be an individual. And if it is no individual there seems to
          be nothing to which properties can be ascribed; only individuals can have properties. </para><para>The TEI elements &lt;ptr&gt; (pointer), &lt;anchor&gt; (anchor point),
          &lt;index&gt; (index entry) and &lt;divGen&gt; (automatically generated
          text division) are some examples. Either they indicate a point in the document, i.e., they
          have no <quote>extension</quote> in the terms of our application of the Calculus and would
          seem to have to be located in a position between two atoms. Or they do not indicate any
          point or extension in the document, but rather an instruction to generate strings with
          certain properties at the position they are located. In some cases, the problems outlined
          here can be solved by replacing the empty element in question with a character string,
          taken for example from an attribute value of the element in question. In cases where the
          element occupies or points to a location between characters, we might find a practical
          workaround by letting it apply or point instead to the atom immediately before or after
          the relevant location in our model of the document. </para><para>A slightly different kind of problem is presented by the TEI &lt;graphic&gt;
          (inline graphic, illustration, or figure) and HTML &lt;img&gt; (image) elements. The
          basic meaning of these elements is easy enough to grasp: The occurrence of the element
          indicates that an illustration or a figure occurs at a specific location in the document.
          Therefore, a more appropriate solution to this as well as to the previously mentioned
          examples is probably to lift the requirement that all atoms should have a character type
          as a property. A graphics element, for example, might simply be represented in our model
          by a <quote>graphics</quote> atom. </para><para>More generally, this would be a model in which a document consists not of a sequence
          of character atoms, but of a sequence of some more generic kind of atoms. We might, for
          example, agree to call them atomic <quote>content objects</quote>, and concede that such
          atoms may or may not have a character property, an <quote>image</quote> property etc.
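          A minimal sketch of such a sequence, in Python and with hypothetical names (Atom, char,
          image) and an invented file name, might look as follows:
          <programlisting xml:space="preserve">from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Atom:
    """A hypothetical atomic content object: a character or an image."""
    ident: str
    char: Optional[str] = None    # set for character atoms
    image: Optional[str] = None   # set for graphics atoms


# "See" followed by a figure and a full stop; the file name is invented.
document = [
    Atom("a01", char="S"),
    Atom("a02", char="e"),
    Atom("a03", char="e"),
    Atom("a04", image="figure1.png"),
    Atom("a05", char="."),
]

# Ordering is given by position in the sequence, whatever the atom kind.
next_atom = {a.ident: b.ident for a, b in zip(document, document[1:])}

# A character string is obtained by skipping atoms without a character.
text_only = "".join(a.char for a in document if a.char is not None)</programlisting>
          In this sketch the ordering of atoms is unaffected by their kind; only the extraction of a
          character string has to skip atoms which lack a character property.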
          Although we have not investigated the matter, we believe that such a modification would
          not drastically change the application of the Calculus described above.</para></section><!-- Note to coauthors:
      Wanted to have a section here on TEI gap element, and a concluding section. 
      Given up because of time constraints.
      <section>
        <title>The TEI gap element</title>
        <para>The TEI gap element <quote>indicates a point where material has been omitted in a
            transcription, whether for editorial reasons described in the TEI header, as part of
            sampling practice, or because the material is illegible, invisible, or
          inaudible.</quote> [ref] ...</para>
        <para>...Here is the really sticky problem ... suggest that the gap element is seen as 
          signalling a break in the sequence assigned by ordering predicates between some of 
          the individuals of the document. (Not the next atom or next molecule predicate, of 
          course, but perhaps if there is some <quote>next in  reading order</quote> predicate... this is admittedly 
          pretty faint....</para>
      </section>
      <section>
        <title>Conclusion</title>
        <para>The only problem which really cannot in any way be solved or worked around, is the gap
          problem. Suggest it would be a problem to any approach along our lines, and therefore
          either decisive or irrelevant, as one chooses to see it.</para>
      </section>
      --></section><section><title>Coextensive elements</title><para>We have already exemplified and briefly discussed the problem with coextensive elements:
        If two or more nested elements have exactly the same content, i.e., share exactly the same
        leaf nodes in the XML tree, they will be represented in our application of the Calculus as
        one individual sharing all the properties ascribed by the nested XML elements. What kind of
        problem this is, and whether and how it can be solved, depends on the wider requirements and
        aims for our application of the Calculus to markup. Under certain requirements or
        perspectives, it may cease to be a problem.</para><para>If our aim is to establish a representation from which the serialized form of an XML
        document can be regenerated, we obviously have a problem: It is by no means obvious if or
        how this could be done. Likewise, if our aim is to establish a representation from which the
        XML DOM, the XDM or the XML Infoset representation can be generated, or which is isomorphic
        to and/or contains (all) the information given in any of those, then it is perhaps even more
        obvious that we have a problem.</para><para>We have two responses to this: On the one hand, the value of the approach presented here
        does not depend on such capabilities. The value of the approach to property propagation, for
        example, may be simply as an ancillary representation of some of the features of marked-up
        documents, a representation which is not intended to capture <quote>all</quote> the
        information present in XML documents but rather to assist in the processing of such
        documents. Therefore, the problem discussed here is a problem only to the extent that it
        impedes our work to realize this more modest aim. So far, we have not found any indication
        that it does.</para><para>On the other hand, we might want to use this representation in order to modify the XML
        documents so represented, and in that case we would clearly need to reserialize them to XML
        or generate an XML-conformant document model of them. For such purposes, we believe that
        information about the XML nesting order of coextensive elements could easily be stored in
        some ancillary data structure which would make reserialization etc. possible. It should also
        be mentioned that, although again we have not investigated the matter, it is not
        unreasonable to assume that a representation of documents in the way proposed for our
        application of the Calculus might be a convenient step in the process of converting XML
        documents to certain other markup systems, such as TexMecs or LMNL.</para><para>Finally, if our aim is to offer an alternative representation based on a different
        understanding of the structure and semantics of marked-up documents, then we have a problem
        only if it can convincingly be argued that our representation is in some respect inferior to
        these standard ways of modelling documents. We think such a discussion is premature unless
        and until the application sketched here is developed further, but at least two lines of
        argument seem to present themselves as possible responses to the challenge. </para><para>First, one might argue that the problem is with XML, and not with the approach discussed
        here. For example, if a TEI &lt;p&gt; (paragraph) and &lt;s&gt; (s-unit,
        sentence) element are coextensive, XML forces us to decide whether we are dealing with a
        paragraph containing a sentence, or a sentence containing a paragraph, and leaves us no
        other option. But we might just as well (or rather) want to say that we are dealing with one
        object which has two properties: that of being a paragraph and that of being a sentence. The
        part-whole relationship which seems forced upon us by XML is an artifact of the
        serialization, a result of one of the limitations of embedded markup.[<xref linkend="Raymond"/>] </para><para>Second, we might concede that the representation of coextensive elements as conceived in
        the present approach is a problem, and try to solve it by amending our mereological system.
        Part of the solution may be found in allowing a more generous set of atoms, as discussed above
        in connection with the problem of empty elements. Another part of the solution might be to
        replace the Calculus of Individuals with some other formal mereological system. For example,
        there seem to be mereological systems which allow for the idea that one individual may be
        part of another even in cases where we cannot identify any part which they do not share. For
        options along these lines, see the discussion of supplementation and closure principles in
          <xref linkend="CasatiVarzi1999"/> pp. 38 ff. </para></section></section><section><title>Conclusion and Future Work</title><para> We have considered some possible applications of the Calculus of Individuals to XML,
      whereof the so-called character-atom approach has seemed the most promising so far. Strings
      composed of characters defined as atomic individuals can be identified and referenced by
      denoting expressions. The part-whole relationships and ordering relations between parts of the
      document as well as the properties ascribed by generic identifiers can be described.
      Statements about the individuals of documents and their properties can be made, and inferences
      can be drawn from these statements. </para><para> We have shown, by means of examples from the TEI and HTML encoding schemes, how this
      application of the Calculus can be used for the formulation of rules describing the
      propagation of properties among the parts of a document. </para><para> We have identified problems or shortcomings concerning the representation of empty
      elements and coextensive elements, and suggested that these problems may be overcome partly by
      allowing a more generous set of atoms, and partly by replacing the Calculus of Individuals
      with some other formal mereological system. </para><para> In order to assess whether the application of formal mereology to markup semantics is
      worthwhile, we believe that continued work is required along several lines: The application
      to XML should be extended beyond the limitations of the approach presented here to include
      the full range of XML mechanisms, such as attributes, entities, declarations, comments,
      processing instructions, and marked sections. While the approach presented here is limited to
      the consideration of XML documents in serialized form, i.e. as character streams, attempts
      should be made to apply formal mereology to XML documents considered as graphs of XPath
      nodes, Infoset items, and the like. </para><para> Furthermore, and as already mentioned, mereological systems beyond the Calculus of
      Individuals should be considered in order to overcome some of the problems encountered in the
      approach presented here. Last, but not least: The application of formal mereological systems
      should be extended to other markup systems such as SGML, TexMecs, LMNL, Goddag and others.
    </para></section><bibliography><title>References</title><bibliomixed xml:id="CasatiVarzi1999" xreflabel="Casati and Varzi 1999">Casati, Roberto and
      Varzi, Achille C. <emphasis>Parts and Places. The Structures of Spatial
      Representation</emphasis>. MIT Press, 1999. </bibliomixed><bibliomixed xml:id="DeRose" xreflabel="DeRose 2004">DeRose, Steven J. 2004. <quote>Markup
        overlap: A review and a horse.</quote> In <emphasis>Proceedings of Extreme Markup Languages
        2004</emphasis>.</bibliomixed><bibliomixed xml:id="Fitzgerald" xreflabel="Fitzgerald 2003">Fitzgerald, Henry.
        <quote>Nominalist things</quote>. <emphasis>Analysis</emphasis> 63.2, OUP, April 2003, pp
      170-71. doi: <biblioid class="doi">10.1093/analys/63.2.170, 10.1111/1467-8284.00030</biblioid>.</bibliomixed><bibliomixed xml:id="Goodman1972" xreflabel="Goodman 1972">Goodman, Nelson. <emphasis>Problems
        and Projects</emphasis>. Hackett, Indianapolis 1972. </bibliomixed><bibliomixed xml:id="Goodman1977" xreflabel="Goodman 1977">Goodman, Nelson. <emphasis>The
        structure of appearance</emphasis>. Third edition. Boston: Reidel, 1977</bibliomixed><bibliomixed xml:id="LeonardandGoodman1940" xreflabel="Leonard and Goodman 1940">Leonard, Henry
      S. and Goodman, Nelson. <quote>The Calculus of Individuals and Its Uses</quote>, <emphasis>The
        Journal of Symbolic Logic</emphasis> Vol 5, No. 2, pp 45-55, June 1940. doi: <biblioid class="doi">10.2307/2266169</biblioid>.</bibliomixed><bibliomixed xml:id="Libardi1994" xreflabel="Libardi 1994">Libardi, Massimo. <quote>Applications
        and limits of mereology. From the theory of parts to the theory of wholes</quote>,
        <emphasis>Axiomathes</emphasis>, n.1, aprile 1994, pp. 13-54. </bibliomixed><bibliomixed xml:id="balisage2009" xreflabel="Marcoux et al. 2009"> Marcoux, Yves, Michael
      Sperberg-McQueen, and Claus Huitfeldt. <quote>Formal and informal meaning from documents
        through skeleton sentences: Complementing formal tag-set descriptions with intertextual
        semantics and vice-versa.</quote> Presented at Balisage: The Markup Conference 2009,
      Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference
      2009. <emphasis>Balisage Series on Markup Technologies</emphasis>, vol. 3 (2009).
      doi: <biblioid class="doi">10.4242/BalisageVol3.Sperberg-McQueen01</biblioid>.</bibliomixed><bibliomixed xml:id="Pitkanen" xreflabel="Pitkänen ">Risto Pitkänen.
        <quote>Content Identity</quote>. <emphasis>Mind</emphasis>, 1976;
      LXXXV: 262–268. doi: <biblioid class="doi">10.1093/mind/LXXXV.338.262</biblioid>.</bibliomixed><bibliomixed xml:id="bielefeld" xreflabel="Sperberg-McQueen and Huitfeldt 2008">Sperberg-McQueen, C. M., and Claus Huitfeldt. <quote>Containment and dominance in Goddag
        structures</quote>. Talk given at Conference on Text Technology, Bielefeld, March 2008.
      Forthcoming. <!-- ~/2008/talks/bielefeld-200803 -->
    </bibliomixed><bibliomixed xml:id="Raymond" xreflabel="Raymond et al. 1996">Raymond, Darrell, Frank Wm. Tompa
      and Derick Wood. <quote>From Data Representation to Data Model: Meta-Semantic Issues in the
        Evolution of SGML</quote>, <emphasis>Computer Standards and Interfaces</emphasis> 18 p.
      25-36 (1996). doi: <biblioid class="doi">10.1016/0920-5489(96)00033-5</biblioid>.</bibliomixed><bibliomixed xml:id="dh2009" xreflabel="Sperberg-McQueen et al. 2009a">Sperberg-McQueen, C. M.,
      Claus Huitfeldt and Yves Marcoux. <quote>What is transcription? (Part 2)</quote>. Talk given
      at <emphasis>Digital Humanities 2009</emphasis>, Maryland, June 2009. Forthcoming. </bibliomixed><bibliomixed xml:id="teip4" xreflabel="TEI P4">The TEI Consortium / The Association for
      Computers and the Humanities (ACH); The Association for Computational Linguistics (ACL); The
      Association for Literary and Linguistic Computing (ALLC). <emphasis role="ital">TEI P4:
        Guidelines for Electronic Text Encoding and Interchange XML-compatible edition</emphasis>.
      Ed. C. M. Sperberg-McQueen and Lou Burnard; XML conversion by Syd Bauman, Lou Burnard, Steven
      DeRose, and Sebastian Rahtz. Oxford, Providence, Charlottesville, Bergen: TEI Consortium,
      December 2001. <link xlink:href="http://www.tei-c.org/release/doc/tei-p4-doc/html/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.tei-c.org/release/doc/tei-p4-doc/html/</link>
    </bibliomixed><bibliomixed xml:id="Varzi" xreflabel="Varzi 2003">Varzi, Achille. <quote>Mereology</quote>.
        <emphasis>Stanford Encyclopedia of Philosophy</emphasis>.
      http://plato.stanford.edu/entries/mereology/ First published Tue May 13, 2003; substantive
      revision Thu May 14, 2009. </bibliomixed></bibliography><!--  
    Show that the ancestor-sibling axis of xPath can easily be replicated.
     Discuss non-XML: overlap, discontinuity, virtuality. --><!-- Haphazard list of  things we never got down to:
    
    We should discuss options for a similar strategy for applying the Calculus of
    Individuals to TexMecs, and possibly other markup languages, as well as the relationship
    between the current approach and Goddag. The extensions to XPath for handling
    overlapping elements proposed by Dekhtyar and Witt should also be discussed. 
    
    We should talk about modelling the Calculus of Individuals of markup in Alloy. Talk about the
    Calculus of Individuals modelling not the markup, but the data model (the DOM?) or the
    denotata of the inferences licensed (cf skeleton sentences). 
    
    We should review LMNL and compare it to the approach presented here.
    
    We should investigate the possibility of letting any
    individual have any number of successors, and of letting a separate
    successor relation hold between character strings as well. Any character or
    character string may then have any number of characters or strings as its
    successor.         
    
  --></article>
