Expressive power of the XML-based techniques for managing overlap with respect to the complex document
features described in the previous section [* true only if the vocabularies of the
structures in overlap are disjoint].
XML techniques / complex document features | Classic overlap | Self overlap | Out-of-order elements | Containment/dominance decoupling
Milestones | Yes | Yes | No | No
Fragmentation | Yes | Yes | Yes | No
Stand-off markup | Yes* | Yes | Yes | Yes
Twin documents | Yes* | Yes | Yes | Yes
In order to overcome the limitations of XML, many different solutions have been
proposed:

CONCUR [] is an SGML option that allows multiple DTDs for the same content. All these structures live in the same document, and it is up to the parser either to consider the structure of only one DTD, or to parse them simultaneously while keeping separate track of which elements are open in each. The main advantage of this technique is that documents are quite legible and maintainable. However, there are many drawbacks: for example, it is not possible to constrain relationships across DTDs, it is not possible to express self-overlap situations, and there is little software support for this technique;

JITT (Just In Time Trees): another syntax very close to XML has been proposed [] []. The basic idea is similar to CONCUR in that it requires the parser to filter and take into consideration only some tags. Multiple overlapping hierarchies may coexist in documents, but only those which the filter selects are returned to the application as real start or end tags. The main contribution of JITT is that a document need not be well-formed until the moment it is being processed, at the cost of a very small change to an XML parser. Unfortunately, JITT does not provide a way to correlate and validate across structures, and it is not possible to express cases of self-overlap;

MuLaX: another document syntax for XML similar to SGML CONCUR, called MuLaX, has been developed [] together with a constraint-based validation language [] []. Each overlapping hierarchy represents a layer identified by an ID prefixing each tag name, and multiple layers may coexist in one MuLaX document. External software can parse a MuLaX document and project each layer into a well-formed XML document. Standard XML tools can only be used on these separate XML projections. A drawback of this technique is that these documents can get very complex when dealing with a large number of annotation layers. For example, updates are difficult, since working on MuLaX documents requires frequent projections into XML. Moreover, the project is still at the stage of an experimental markup language, lacking the support of tools and technologies available for XML-based solutions;

Multi-colored trees: another extension of the XML model that is able to represent overlapping structures is that of Multi-colored trees []. The basic idea is to associate a color with each concurrent tree, and to allow each node to have multiple colors. Navigation inside the multicolored nodes is possible by using an XPath [] extension that implements a color selector, and an extension of XQuery [] has also been proposed for the creation of nodes.

Non-XML syntaxes for overlaps

An alternative approach to overcoming the limitations of tree-based meta-languages in
representing complex documents is to use alternative and more expressive data models, such
as graphs. The more general the model (acyclic vs. cyclic graphs, ordered vs. unordered
graphs, etc.), the more overlapping features the meta-language can conveniently manage,
at the cost of increased computational complexity.
Moreover, since this abstract model may be represented with different concrete
syntaxes (embedded markup languages, stand-off annotations, etc.), the chosen linearisation
format may place limits on expressiveness, on the support provided by standard
technologies and related tools, etc. A summary of the most prominent solutions is presented
below:

GODDAG and TexMECS: Sperberg-McQueen and Huitfeld proposed to manage overlapping hierarchies using a directed acyclic graph structure with no transitive arcs, named GODDAG (General Ordered Descendant Directed Acyclic Graph) []. Arcs denote containment relationships, and multi-parentage is allowed, which makes it possible to represent overlapping situations. Several kinds of GODDAG have been defined in order to explore their expressive power and their mutual relations: generalized, restricted and clean in [], normalized and colored in [], node-ordered (noDAG) in [], and child-arc-ordered in []. The authors of GODDAG also developed a markup meta-language named TexMECS [] as the natural linearisation format for the GODDAG structure. Like XML, TexMECS is an embedded meta-markup language where elements are delimited by start and end tags. However, it can also represent graph structures by allowing tags not to nest properly. TexMECS supports complex document features, such as self overlap (using a co-indexing scheme) and discontinuous, virtual and unordered elements (using special attributes and element delimiters). Since TexMECS documents are not isomorphic to XML documents, standard XML tools cannot be used. As far as we know, no query mechanisms have been developed.

LMNL: the Layered Markup and Annotation Language [] defines a specific syntax based on layered ranges which can overlap each other. A LMNL document is a set of layers, each containing either a sequence of Unicode characters (text layer) or a sequence of ranges. A layer can be based on a single other layer, but can also be the base of several other layers. LMNL is able to capture classic and self overlap cases and virtual elements (via a pointer mechanism). However, since a range spans a continuous sequence of characters, there is no way to represent discontinuous text fragments or elements with mixed content (i.e., characters and other ranges). Although the main contribution of LMNL is a data model, at least three syntaxes have been proposed: two XML-based ones (ECLIX [] and CLIX [], both based on the milestone technique), and a non-XML syntax known as the LMNL syntax. XSLT stylesheets have been developed to deal with the XML representation of a LMNL document.

EARMARK

The Extremely Annotational RDF Markup, or EARMARK [], is an OWL 2 DL
ontology (EARMARK Ontology: http://www.essepuntato.it/2008/12/earmark; the prefix earmark refers to entities defined in it, while the prefix co refers to entities used in the EARMARK Ontology and defined in the old version of the Collections Ontology []) that defines document meta-markup. It is an ontologically precise definition of markup that instantiates the markup of a text document as an independent OWL document, outside of the text strings it annotates. Through appropriate OWL and SWRL characterisations it can define structures such as trees or graphs (in particular, extended GODDAGs []). It can be used to generate validity constraints (including co-constraints) [], to make explicit the semantics of markup [], to annotate text or other markup documents [], and to keep track of changes in markup []. It can also serve as an interchange format to enable conversions between different kinds of XML vocabularies embedding overlap []. The whole ontological description of EARMARK is summarised in the Graffoo diagram [] (Graffoo is a graphical notation for OWL ontologies, available at http://www.essepuntato.it/graffoo) shown in .

The core classes of our model describe three disjoint base concepts: docuverses, ranges and markup items.

The textual content of an EARMARK document is conceptually separated from its annotations, and is referred to
through the earmark:Docuverse class. The individuals of this class represent the objects of discourse, i.e., all the containers of text in an EARMARK document. Any individual of the earmark:Docuverse class (commonly called a docuverse, lowercase to distinguish it from the class) specifies its actual content through the property earmark:hasContent. There are two kinds of docuverses. The first kind specifies all its content in the form of a string (defined through the class earmark:StringDocuverse). The second kind refers to a document containing the string to be marked up (defined through the class earmark:URIDocuverse).

We define the class earmark:Range for any text lying between two locations of
a docuverse. A range, i.e., an individual of the class earmark:Range, is defined by a starting and an ending location (any literal) within a specific docuverse. These are given through the functional properties earmark:begins, earmark:ends and earmark:refersTo respectively. There are two main types of ranges:

those (i.e., earmark:PointerRange) that refer to text lying between two non-negative integer locations that identify precise positions within a docuverse;
those (defined through the class earmark:XPathPointerRange) that refer to any text, obtained from a particular XPath context (specified through the property earmark:hasXPathContext) starting from the docuverse content, lying between two non-negative integer locations that identify precise positions.

The class earmark:MarkupItem is the superclass defining artefacts to be
interpreted as markup, such as elements (i.e., the class earmark:Element), attributes (i.e., the class earmark:Attribute) and comments (i.e., the class earmark:Comment). A markup item individual is a collection of individuals belonging to the classes earmark:MarkupItem and earmark:Range. The collection can be a co:Set, a co:Bag or a co:List, where co:List is a subclass of co:Bag and all of them are subclasses of co:Collection. (In the following descriptions the prefix co is used to indicate entities taken from version 1.2 of the Collections Ontology [], an imported ontology used for handling collections, available at http://swan.mindinformatics.org/ontologies/1.2/collections.owl.) Through these collections it is possible:

to define a markup item as a set of other markup items and ranges by using the property co:element;
to define a markup item as a bag of items (defined by individuals belonging to the class co:Item), each of them containing a markup item or a range, by using the properties co:item and co:itemContent respectively;
to define a markup item as a list of items (defined by individuals belonging to the class co:ListItem), each of them containing a markup item or a range, in which we can also specify a particular order among the items themselves by using the property co:nextItem.

A markup item might also have a name, specified in the functional property earmark:hasGeneralIdentifier. The general identifier actually refers to the SGML generic identifier, i.e., the SGML term for the local name of the markup item, e.g., “p” for the markup element “<p>...</p>”. A markup item might also have a namespace, specified using the functional property earmark:hasNamespace.

In order to understand how EARMARK is used to describe markup hierarchies, let us consider the markup structures shown in .

First of all, we define the whole textual content of the document, i.e., the first three lines of
Paradise Lost by John Milton, by creating an instance of the class earmark:StringDocuverse (this and all the following excerpts are defined in Turtle []):

@prefix : <http://www.essepuntato.it/2014/balisage/example/> .
# earmark namespace IRI assumed from the ontology URL; xsd is the standard XML Schema namespace
@prefix earmark: <http://www.essepuntato.it/2008/12/earmark#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
:doc a earmark:StringDocuverse ;
earmark:hasContent
"""Of Mans First Disobedience, and the Fruit
Of that Forbidden Tree, whose mortal tast
Brought Death into the World""" .

Then, we can define the six ranges (as individuals of earmark:PointerRange) introduced in the figure:

# The string 'Of Mans First Disobedience, and the Fruit'
:r1 a earmark:PointerRange ;
earmark:refersTo :doc ;
earmark:begins "0"^^xsd:nonNegativeInteger ;
earmark:ends "41"^^xsd:nonNegativeInteger .
# The string 'the Fruit Of that Forbidden Tree,'
:r2 a earmark:PointerRange ;
earmark:refersTo :doc ;
earmark:begins "32"^^xsd:nonNegativeInteger ;
earmark:ends "65"^^xsd:nonNegativeInteger .
# The string 'Of that Forbidden Tree,'
:r3 a earmark:PointerRange ;
earmark:refersTo :doc ;
earmark:begins "42"^^xsd:nonNegativeInteger ;
earmark:ends "65"^^xsd:nonNegativeInteger .
…

Finally, we can build upon these ranges the three markup hierarchies shown in , as in the following
excerpt:

:lg a earmark:MarkupItem , co:List ;
earmark:hasGeneralIdentifier "lg" ;
co:firstItem [
a co:ListItem ;
co:itemContent :l1 ;
co:nextItem [
a co:ListItem ;
co:itemContent :l2 ;
co:nextItem [
a co:ListItem ;
co:itemContent :l3 ] ] ] .
:q a earmark:MarkupItem , co:List ;
earmark:hasGeneralIdentifier "q" ;
co:firstItem [
a co:ListItem ;
co:itemContent :l1 ] .
:l1 a earmark:MarkupItem , co:List ;
earmark:hasGeneralIdentifier "l" ;
co:firstItem [
a co:ListItem ;
co:itemContent :r1 ] .
…

Characterizing overlaps by way of an ontology

Different types of overlap exist, according to the subset of EARMARK nodes involved (i.e., ranges or markup items), and different strategies are needed to detect them. In particular, there is a clear distinction between overlapping ranges and overlapping markup items. There is also a clear distinction in the ways these overlapping scenarios affect the dominance and containment relations between nodes, as shown in figure , which will be used to illustrate the different kinds of overlapping scenarios.

In this section, we introduce the EARMARK Overlapping Ontology (EOO, http://www.essepuntato.it/2011/05/overlapping; the prefix eoo refers to entities defined in it). EOO is an OWL 2 DL ontology [MotikOWL2] that extends the EARMARK Ontology by adding support for overlapping scenarios and for inferences relative to them. In particular, in the following subsections we describe how the ontology models all possible overlapping scenarios between nodes by means of description logic formulas and, if needed, SWRL rules []. A summary of the taxonomy of possible overlapping scenarios is provided in figure .

OWL 2 DL [MotikOWL2] is based on a particular description logic (DL), i.e., SROIQ []. In this paper, we decided to use DL notation for the sake of clarity, instead of adopting one of the possible linearisations of OWL made available by the W3C. We recommend reading [] for more information about DL notation. As an extension of common DL notation, we use ⊤ and ⊤op to indicate the top class and the top object property respectively.

Any OWL 2 DL ontology can be accompanied by SWRL rules so as to guarantee additional inferences that are not directly handled by current ontological definitions. All these rules are defined using an informal, human-readable syntax as introduced in []. In this syntax, each rule is represented in the form of “antecedent ⇒ consequent” statements, meaning that if the antecedent is true, then the consequent can be inferred. Both antecedent and consequent are lists of ontological assertions separated by “^”. Each assertion can be composed of an atomic entity (e.g., a class or a property) containing zero, one or two variables (each beginning with a “?”), depending on the kind of unary (i.e., class) or binary (i.e., property) entity used. Alternatively, it can be a (boolean, cardinality, etc.) restriction of multiple entities.

Properties of overlapping

The most important property in EOO is the generic property eoo:overlapsWith, which describes when an EARMARK node overlaps with another EARMARK node of the
same type. This means that markup items can overlap only with other markup items, and ranges can overlap only
with ranges. In addition, this property is symmetric (i.e., if A overlaps with B, then B overlaps with A) and
irreflexive (i.e., if A overlaps with B, then A is different from B). Note that OWL 2 DL does not support the unique name assumption typical of database systems. Among the various consequences of this choice, in this case it means that two different IRIs cannot be guaranteed to refer to two different resources. This property is defined formally as follows:

# Declaration as an object property
eoo:overlapsWith ⊑ ⊤op
# Domain
∃eoo:overlapsWith.⊤ ⊑
(earmark:Range ⊓ ∀eoo:overlapsWith.earmark:Range) ⊔
(earmark:MarkupItem ⊓ ∀eoo:overlapsWith.earmark:MarkupItem)
# Range
⊤ ⊑ ∀eoo:overlapsWith.(
(earmark:Range ⊓ ∀eoo:overlapsWith.earmark:Range) ⊔
(earmark:MarkupItem ⊓ ∀eoo:overlapsWith.earmark:MarkupItem))
# Symmetry
eoo:overlapsWith ≡ eoo:overlapsWith-
# Irreflexivity
⊤ ⊑ ¬∃eoo:overlapsWith.Self

All the properties presented in the following sections are sub-properties of the generic relation eoo:overlapsWith.

Overlapping of ranges

By definition, overlapping ranges (i.e., linked through the symmetric property eoo:overlapsWithRange) are two ranges of the same type that refer to the same docuverse, such that at least one of the end points of the first range is contained in the interval described by the locations of the second range (end points excluded). The property eoo:overlapsWithRange is defined as follows:

# Sub-property declaration
eoo:overlapsWithRange ⊑ eoo:overlapsWith
# Domain
∃eoo:overlapsWithRange.⊤ ⊑ earmark:Range
# Range
⊤ ⊑ ∀eoo:overlapsWithRange.earmark:Range
# Symmetry
eoo:overlapsWithRange ≡ eoo:overlapsWithRange-

Specifically, totally overlapping ranges (defined through the property eoo:overlapsTotallyWithRange) are those where the locations of the first range are completely contained in the interval of the second range, or vice versa, i.e., one range lies fully inside the other. For instance, in the example in , the range “the Fruit Of that Forbidden Tree” overlaps totally with the range “Of that Forbidden Tree”.

On the other hand, partially overlapping ranges (defined through the property eoo:overlapsPartiallyWithRange) have exactly one location inside the interval of the other range and the other location outside it. For instance, considering the example in , the range “Of Mans First Disobedience, and the Fruit” overlaps partially with “the Fruit Of that Forbidden Tree”. These two properties are disjoint, meaning that two ranges cannot overlap both totally and partially.
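The distinction between total and partial overlap can be illustrated with a minimal Python sketch. It assumes that ranges are given as non-empty (begin, end) pairs of integer offsets into the same docuverse, which holds for earmark:PointerRange only. The function classify_overlap and the names DOC and RANGES are hypothetical illustrations, not part of EOO. The function follows the informal SWRL rules given in this section: locations may appear in reversed order, and identical locations are classified as a partial overlap. The offsets are those of r1, r2 and r3 in the Turtle excerpt above.

```python
from __future__ import annotations

# Textual content of :doc, copied from the Turtle excerpt above.
DOC = ("Of Mans First Disobedience, and the Fruit\n"
       "Of that Forbidden Tree, whose mortal tast\n"
       "Brought Death into the World")

# (begin, end) offsets of the pointer ranges r1, r2 and r3.
RANGES = {"r1": (0, 41), "r2": (32, 65), "r3": (42, 65)}


def classify_overlap(x: tuple[int, int], y: tuple[int, int]) -> str | None:
    """Return "partial", "total" or None (no overlap) for two non-empty ranges.

    Begin and end may be given in reversed order, so both ranges are
    normalised to (low, high) before comparing them.
    """
    x_lo, x_hi = sorted(x)
    y_lo, y_hi = sorted(y)
    # Identical locations (even with reversed end points) count as partial.
    if (x_lo, x_hi) == (y_lo, y_hi):
        return "partial"
    # One interval inside the other, shared end points allowed: total.
    if (x_lo <= y_lo and y_hi <= x_hi) or (y_lo <= x_lo and x_hi <= y_hi):
        return "total"
    # One end point strictly inside the other interval, one outside: partial.
    if x_lo < y_lo < x_hi < y_hi or y_lo < x_lo < y_hi < x_hi:
        return "partial"
    return None  # disjoint or merely adjacent ranges


for a, b in [("r1", "r2"), ("r2", "r3"), ("r1", "r3")]:
    print(a, b, classify_overlap(RANGES[a], RANGES[b]))
```

Run as is, it prints partial for r1 and r2, total for r2 and r3, and None for r1 and r3. End points are excluded from the "inside" test, so ranges that merely touch (such as r1 ending at 41 and r3 starting at 42) are not reported as overlapping.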
The partial-overlap property also covers the situations in which the two ranges span exactly the same
locations. This includes the case in which the end points have reversed roles (i.e., the starting point of the first range is the ending point of the
second one, and vice versa). Both properties are formally defined as follows:

# Sub-property declarations
eoo:overlapsTotallyWithRange ⊑ eoo:overlapsWithRange
eoo:overlapsPartiallyWithRange ⊑ eoo:overlapsWithRange
# Disjointness
eoo:overlapsTotallyWithRange ⊓ eoo:overlapsPartiallyWithRange ⊑ ⊥
# Symmetry
eoo:overlapsTotallyWithRange ≡ eoo:overlapsTotallyWithRange-
eoo:overlapsPartiallyWithRange ≡ eoo:overlapsPartiallyWithRange-

The following SWRL rules allow us to capture the constraints of this kind of overlap. They infer the overlapping relation between the two different kinds of (concrete) ranges, i.e., earmark:PointerRange and earmark:XPathPointerRange.

Note that these generic SWRL rules for ranges fully work only with instances of the class earmark:PointerRange, which is one kind of range defined in EARMARK. In particular, if we consider individuals of the class earmark:XPathRange, the XPath context (defined through the property earmark:hasXPathContext) must be taken into account to identify when such ranges overlap. The SWRL rules for XPath ranges are not introduced in this paper for the sake of clarity. In EOO, the use of the property earmark:hasXPathContext in such rules has been approached in the simplest possible way: two XPath ranges have the same context when the XPath expressions specified are exactly the same. However, EOO currently does not handle cases of different XPath expressions that are either semantically equivalent (e.g., “//p” and “//element()[name() = 'p']”) or functionally equivalent (i.e., they return the same sequence of items). The rules are:

# Overlaps partially with range
RANGE_IDENTIFICATION ^
earmark:refersTo(?x,?d) ^ earmark:refersTo(?y,?d) ^
earmark:begins(?x,?b1) ^ earmark:begins(?y,?b2) ^
earmark:ends(?x,?e1) ^ earmark:ends(?y,?e2) ^
((?b1 < ?b2 < ?e1 < ?e2) or (?b1 < ?e2 < ?e1 < ?b2) or
(?e1 < ?b2 < ?b1 < ?e2) or (?e1 < ?e2 < ?b1 < ?b2) or
(?b1 = ?b2 and ?e1 = ?e2) or (?b1 = ?e2 and ?e1 = ?b2)) ^
?x != ?y
⇒ eoo:overlapsPartiallyWithRange(?x,?y)
# Overlaps totally with range
RANGE_IDENTIFICATION ^
earmark:refersTo(?x,?d) ^ earmark:refersTo(?y,?d) ^
earmark:begins(?x,?b1) ^ earmark:begins(?y,?b2) ^
earmark:ends(?x,?e1) ^ earmark:ends(?y,?e2) ^
((?b1 <= ?b2 < ?e2 < ?e1) or (?e1 <= ?b2 < ?e2 < ?b1) or
(?b1 < ?b2 < ?e2 <= ?e1) or (?e1 < ?b2 < ?e2 <= ?b1) or
(?b1 <= ?e2 < ?b2 < ?e1) or (?e1 <= ?e2 < ?b2 < ?b1) or
(?b1 < ?e2 < ?b2 <= ?e1) or (?e1 < ?e2 < ?b2 <= ?b1)) ^
?x != ?y
⇒ eoo:overlapsTotallyWithRange(?x,?y)

Here, “RANGE_IDENTIFICATION” is a placeholder for the different antecedents to use depending on whether we deal with pointer ranges or with XPath pointer ranges. In particular, for pointer ranges we have:

earmark:PointerRange(?x) ^ earmark:PointerRange(?y)

and for XPath pointer ranges we have:

earmark:XPathPointerRange(?x) ^ earmark:XPathPointerRange(?y) ^
earmark:hasXPathContext(?x,?c) ^ earmark:hasXPathContext(?y,?c)

Dominance vs. Containment in EARMARK

In this section we introduce how dominance and containment relations are implemented in EOO, given their intrinsic relation to the overlapping scenarios discussed in the following subsections.

The dominance relation is actually defined by two different but related concepts, which always have markup items as the subject of dominance assertions. In particular, we say that a markup item A dominates directly (i.e., eoo:dominatesDirectly) an EARMARK node B if A has B as a child.
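As a minimal Python sketch, assume that each markup item is modelled simply as the list of its children, whether they are realised through co:element or through co:item and co:itemContent. The sketch derives the direct dominance pairs for the markup items in the Turtle excerpt above (lg, q and l1). The content of l2 and l3 is elided there, so they are treated as leaves. The sketch then computes the transitive closure that corresponds to eoo:dominates, the generalisation introduced later in this section. The helper names direct_dominance and dominance are hypothetical and not part of EOO.

```python
from __future__ import annotations


def direct_dominance(items: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Pairs (A, B) such that markup item A has node B as a child."""
    return {(parent, child)
            for parent, children in items.items()
            for child in children}


def dominance(direct: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive closure of direct dominance (A dominates its descendants)."""
    closure = set(direct)
    while True:
        # Join (a, b) with (b, d) to derive (a, d).
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Children of the markup items shown in the Turtle excerpt.
items = {
    "lg": ["l1", "l2", "l3"],
    "q": ["l1"],
    "l1": ["r1"],
}

direct = direct_dominance(items)
dominates = dominance(direct)
print(sorted(dominates - direct))  # [('lg', 'r1'), ('q', 'r1')]
```

The only pairs added by the closure are lg and q dominating r1 through l1. This is the same reasoning that makes l1 a shared child of lg and q, the basis of the overlap by markup hierarchy discussed later.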
The property eoo:dominatesDirectly is formally defined as follows:

# Declaration as an object property
eoo:dominatesDirectly ⊑ ⊤op
# Domain
∃eoo:dominatesDirectly.⊤ ⊑ earmark:MarkupItem
# Range
⊤ ⊑ ∀eoo:dominatesDirectly.(earmark:Range ⊔ earmark:MarkupItem)

The relation between eoo:dominatesDirectly and the parent-child relation in EARMARK is defined by means of the following SWRL rule:

earmark:MarkupItem(?x) ^ co:element(?x,?y)
⇒ eoo:dominatesDirectly(?x,?y)

As anticipated in , note that in EARMARK any parent-child relationship between a markup item and a node is defined in one of two ways. It is defined through the property co:element when the markup item is defined as a set (i.e., co:Set) or a bag (i.e., co:Bag). It is defined by the property chain co:item o co:itemContent when the markup item is defined as a list (i.e., co:List). However, the new version of the Collections Ontology [], available at http://purl.org/co, defines this property chain as a sub-property of co:element. This means that if we have “A co:item I” and “I co:itemContent B”, then “A co:element B” holds as well. EARMARK still uses the old version of the Collections Ontology, which does not include this sub-property axiom. We have therefore added the axiom to EOO in order to map co:element assertions between markup items and nodes to parent-child relationships.

Generalising eoo:dominatesDirectly, we say that a
markup item A dominates (i.e., eoo:dominates) an EARMARK node B if B is a descendant of A. This property is
transitive and is also a super-property of eoo:dominatesDirectly (i.e., eoo:dominatesDirectly entails eoo:dominates). It is defined as follows:

# Declaration as an object property
eoo:dominates ⊑ ⊤op
# Sub-property declaration
eoo:dominatesDirectly ⊑ eoo:dominates
# Transitivity
eoo:dominates o eoo:dominates ⊑ eoo:dominates

Containment is a transitive relation (i.e., eoo:contains) that is defined on the basis of the dominance relation and applies to any EARMARK node (either markup item or range). In particular, we say that an EARMARK node A contains another EARMARK node B when one of the following conditions holds:

1. A dominates B;
2. if A and B are markup items, the leaf nodes dominated by A are a superset of the leaf nodes dominated by B;
3. if A and B are ranges, A overlaps totally with B (cf. ) and the interval defined by A completely contains the locations of B.

This relation is thus formally defined as follows:

# Declaration as an object property
eoo:contains ⊑ ⊤op
# Domain
∃eoo:contains.⊤ ⊑ earmark:Range ⊔ earmark:MarkupItem
# Range
⊤ ⊑ ∀eoo:contains.(earmark:Range ⊔ earmark:MarkupItem)
# Transitivity
eoo:contains o eoo:contains ⊑ eoo:contains

In addition, by means of rule 1, we can also state that the dominance relation is a sub-relation of the containment relation (meaning that if A eoo:dominates B, then A eoo:contains B holds as well), as follows:

# Sub-property declaration
eoo:dominates ⊑ eoo:contains

The constraint introduced in rule 2 cannot be expressed in any way (either in OWL or in SWRL). However, we can define a particular SWRL rule to handle the constraint introduced in rule 3:

eoo:overlapsTotallyWithRange(?x,?y) ^
earmark:begins(?x,?b1) ^ earmark:begins(?y,?b2) ^
earmark:ends(?x,?e1) ^ earmark:ends(?y,?e2) ^
((?b1 < ?b2 < ?e1) or (?b1 < ?e2 < ?e1) or
(?e1 < ?b2 < ?b1) or (?e1 < ?e2 < ?b1))
⇒ eoo:contains(?x,?y)

Overlapping of markup items

The case of overlapping markup items (i.e., linked through the symmetric property eoo:overlapsWithMarkupItem) is slightly more complicated than that of range overlaps. We define that two
markup items A and B overlap when at least one of the following scenarios holds:

a markup item A contains a range that overlaps with another range contained by a markup item B;
two markup items A and B contain at least one range in common;
two markup items A and B contain at least one markup item in common.

The property eoo:overlapsWithMarkupItem is defined as follows:

# Sub-property declaration
eoo:overlapsWithMarkupItem ⊑ eoo:overlapsWith
# Domain
∃eoo:overlapsWithMarkupItem.⊤ ⊑ earmark:MarkupItem
# Range
⊤ ⊑ ∀eoo:overlapsWithMarkupItem.earmark:MarkupItem
# Symmetry
eoo:overlapsWithMarkupItem ≡ eoo:overlapsWithMarkupItem-

The three aforementioned scenarios correspond to three different symmetric sub-properties of eoo:overlapsWithMarkupItem. The first scenario (i.e., A contains a range that overlaps with another range contained by B) refers to markup items overlapping by range. In the example in , the element l1 overlaps by range with the element unit1. This is captured by eoo:overlapsByRange, a sub-property of eoo:overlapsWithMarkupItem, which is formally described as follows:

# Sub-property declaration
eoo:overlapsByRange ⊑ eoo:overlapsWithMarkupItem
# Domain
∃eoo:overlapsByRange.⊤ ⊑
earmark:MarkupItem ⊓
∃eoo:dominatesDirectly.(∃eoo:overlapsWithRange.earmark:Range)
# Range
⊤ ⊑ ∀eoo:overlapsByRange.(
earmark:MarkupItem ⊓
∃eoo:dominatesDirectly.(∃eoo:overlapsWithRange.earmark:Range))
# Symmetry
eoo:overlapsByRange ≡ eoo:overlapsByRange-

The second scenario (i.e., A and B contain at least one shared range) refers to markup items overlapping by content hierarchy. In the example in , the element l2 overlaps by content hierarchy with the element unit2. The corresponding sub-property eoo:overlapsByContentHierarchy is formally described as follows:

# Sub-property declaration
eoo:overlapsByContentHierarchy ⊑ eoo:overlapsWithMarkupItem
# Domain
∃eoo:overlapsByContentHierarchy.⊤ ⊑
earmark:MarkupItem ⊓ ∃eoo:dominatesDirectly.earmark:Range
# Range
⊤ ⊑ ∀eoo:overlapsByContentHierarchy.(
earmark:MarkupItem ⊓ ∃eoo:dominatesDirectly.earmark:Range)
# Symmetry
eoo:overlapsByContentHierarchy ≡ eoo:overlapsByContentHierarchy-

The third scenario (i.e., A and B contain at least one markup item in common) refers to markup items overlapping by markup hierarchy. In the example in , the element lg overlaps by markup hierarchy with the element q. The related sub-property eoo:overlapsByMarkupHierarchy is formally described as follows:

# Sub-property declaration
eoo:overlapsByMarkupHierarchy ⊑ eoo:overlapsWithMarkupItem
# Domain
∃eoo:overlapsByMarkupHierarchy.⊤ ⊑
earmark:MarkupItem ⊓ ∃eoo:dominatesDirectly.earmark:MarkupItem
# Range
⊤ ⊑ ∀eoo:overlapsByMarkupHierarchy.(
earmark:MarkupItem ⊓ ∃eoo:dominatesDirectly.earmark:MarkupItem)
# Symmetry
eoo:overlapsByMarkupHierarchy ≡ eoo:overlapsByMarkupHierarchy-

The following SWRL rules allow us to capture the constraints of this kind of overlap by inferring the right overlapping relation according to the three aforementioned scenarios:

# overlaps by range
earmark:MarkupItem(?a) ^ earmark:MarkupItem(?b) ^
earmark:Range(?r1) ^ earmark:Range(?r2) ^
eoo:dominatesDirectly(?a,?r1) ^ eoo:dominatesDirectly(?b,?r2) ^
eoo:overlapsWithRange(?r1,?r2) ^
?a != ?b ^ ?r1 != ?r2
⇒ eoo:overlapsByRange(?a,?b)
# overlaps by content hierarchy
earmark:MarkupItem(?a) ^ earmark:MarkupItem(?b) ^ earmark:Range(?r) ^
eoo:dominatesDirectly(?a,?r) ^ eoo:dominatesDirectly(?b,?r) ^
?a != ?b
⇒ eoo:overlapsByContentHierarchy(?a,?b)
# overlaps by markup hierarchy
earmark:MarkupItem(?a) ^ earmark:MarkupItem(?b) ^ earmark:MarkupItem(?x) ^
eoo:dominatesDirectly(?a,?x) ^ eoo:dominatesDirectly(?b,?x) ^
?a != ?b ^ ?a != ?x ^ ?b != ?x
⇒ eoo:overlapsByMarkupHierarchy(?a,?b)

Approaching inferences through reasoners

The EARMARK Overlapping Ontology can be used by OWL reasoners such as Pellet [] (an OWL 2 reasoner for Java: http://clarkparsia.com/pellet/) in order to identify all the possible kinds of overlapping scenarios that occur within any EARMARK document. As an example, we ran such a reasoner with EOO on the EARMARK file describing the document in  (available online at http://www.essepuntato.it/2014/balisage/earmark-document.ttl). This yields a complete description of all the kinds of overlaps existing in that document. An OWL file containing all the assertions about overlaps inferred by the reasoner is available online at http://www.essepuntato.it/2014/balisage/earmark-overlapping.ttl. In particular, the reasoner identified:

all the dominance relations among elements that exist in the document, as well as all the related containment relations entailed by dominance;
that the range “Of Mans First Disobedience, and the Fruit” (r1 from now on) overlaps with the range “the Fruit Of that Forbidden Tree” (r2 from now on), and that r2 overlaps with the range “Of that Forbidden Tree” (r3 from now on). Specifically, r1 overlaps partially with r2, and r2 overlaps totally with r3;
about the last total range overlap, that r2 actually contains r3 and, consequently, that the markup items syntax and unit1 contain r3;
that the markup items in the pairs l1 - unit1, l2 - unit1, l2 - unit2, l3 - unit2, and lg - q overlap. Specifically, the markup items in the first two pairs overlap by range, those in the following two pairs overlap by content hierarchy, and those in the last pair overlap by markup hierarchy.

Of course, this inference process can be run on any EARMARK document. However, the bigger the document (in
terms of the number of OWL assertions that specify the markup structure), the longer it takes for the reasoner to infer those data. For this reason, in some cases it could be preferable to re-express as SPARQL 1.1 inserts [] some of the inference rules that we have presented here as OWL logical axioms and SWRL rules. For instance, the rule specified for identifying the overlaps by markup hierarchy could improve the efficiency of the system if expressed in SPARQL as follows:

# Rule 'overlaps by markup hierarchy' in SPARQL
# Namespace IRIs assumed from the ontology URLs given above
PREFIX earmark: <http://www.essepuntato.it/2008/12/earmark#>
PREFIX eoo: <http://www.essepuntato.it/2011/05/overlapping#>
CONSTRUCT { ?a eoo:overlapsByMarkupHierarchy ?b }
WHERE {
?a a earmark:MarkupItem ;
eoo:dominatesDirectly ?x .
?b a earmark:MarkupItem ;
eoo:dominatesDirectly ?x .
?x a earmark:MarkupItem .
# Mirror the inequalities of the SWRL rule (no self-overlap)
FILTER (?a != ?b && ?a != ?x && ?b != ?x)
}

In our experience, this approach considerably reduces the time needed to infer the existing overlapping scenarios in an EARMARK document. The drawback is that all the needed inferences must be implemented manually, including those derived from ontological axioms (e.g., subsumption and property characteristics such as transitivity, irreflexivity and symmetry).

Conclusions

For EARMARK to be able to claim to be a one-stop answer to overlapping needs of markup authors, we still
needed a way to identify when ranges and markup items actually overlap. EARMARK per se does not provide a way to identify overlapping situations: it simply allows them to exist, with each overlapping item ignoring the others. With the EARMARK Overlapping Ontology, on the other hand, it is now possible to identify and explicitly qualify every overlapping situation we encounter. In [] we provide a brief overview of situations and contexts where EARMARK can be and has been used, especially in the domain of Digital Humanities.

Technically, EARMARK is a stand-off notation, and as such it shares the main limitation of all stand-off notations: whenever the source document (the docuverse) is modified outside the control of the author of the EARMARK annotations, the pointers of those annotations may (and often will) become outdated and wrong. In [] we also present some mechanisms for resynchronizing EARMARK pointers with a modified source, which should be able to handle some of these situations.

EARMARK is still evolving. The FRETTA parser [], which converts EARMARK documents into XML and expresses overlapping situations through one of the many existing XML tricks (such as fragmentation, milestones or twin documents) chosen as a parameter, is working and complete. The opposite converter, which generates an EARMARK document from an XML file that uses such tricks to express overlaps, has yet to be completed. Once it is finished, we will have a complete solution to the problem of expressing any markup document with Semantic Web technologies, and we will be able to cover every conversion scenario involving overlapping documents.

Bibliography

Barabucci, G., Di Iorio, A.,
Peroni, S., Poggi, F., & Vitali, F. (2013). Annotations with EARMARK in practice: a fairy tale. In F. Tomasi & F. Vitali (Eds.), Proceedings of the 2013 Workshop on Collaborative Annotations in Shared Environments: metadata, vocabularies and techniques in the Digital Humanities (DH-CASE 2013). New York, New York, US: ACM Press. doi:10.1145/2517978.2517990

Barabucci, G., Peroni, S., Poggi, F., & Vitali, F. (2012). Embedding semantic annotations within texts: the FRETTA approach. In Proceedings of the 2012 ACM Symposium on Applied Computing (SAC 2012): 658–663. New York, New York, US: ACM Press. doi:10.1145/2245276.2245403

Barnard, D., Hayter, R., Karababa, M., Logan, G., & McFadden, J. (1988). SGML-based markup for literary texts: Two problems and some solutions. Computers and the Humanities, 22(4): 265–276. doi:10.1007/BF00118602

Berglund, A., Boag, S., Chamberlin, D., Fernández, M. F., Kay, M., Robie, J., & Siméon, J. (2010). XML Path Language (XPath) 2.0 (Second Edition). W3C Recommendation, 14 December 2010. World Wide Web Consortium. Retrieved from http://www.w3.org/TR/xpath20/

Boag, S., Chamberlin, D., Fernández, M. F., Florescu, D., Robie, J., & Siméon, J. (2010). XQuery 1.0: An XML Query Language (Second Edition). W3C Recommendation, 14 December 2010. World Wide Web Consortium. Retrieved from http://www.w3.org/TR/xquery/

Ciccarese, P., & Peroni, S. (2013). The Collections Ontology: creating and handling collections in OWL 2 DL frameworks. Semantic Web – Interoperability, Usability, Applicability. doi:10.3233/SW-130121

Cowan, J., & Tennison, J. ECLIX: reading XML as LMNL. LMNL wiki. Retrieved from http://lmnl-markup.org/specs/

DeRose, S. J. (2004). Markup Overlap: A Review and a Horse. In Extreme Markup Languages.

DeRose, S. J., Durand, D. G., Mylonas, E., & Renear, A. H. (1990). What is text, really? Journal of Computing in Higher Education, 1(2): 3–26. doi:10.1007/BF02941632

DeRose, S., Daniel, R., Grosso, P., Maler, E., Marsh, J., & Walsh, N. (2002). XML Pointer Language (XPointer). W3C Working Draft, 16 August 2002. World Wide Web Consortium. Retrieved from http://www.w3.org/TR/xptr/

Di Iorio, A., Peroni, S., & Vitali, F. (2009). Towards markup support for full GODDAGs and beyond: the EARMARK approach. In Proceedings of Balisage: The Markup Conference 2009, Balisage Series on Markup Technologies 3. Rockville, Maryland, US: Mulberry Technologies, Inc. doi:10.4242/BalisageVol3.Peroni01

Di Iorio, A., Peroni, S., & Vitali, F. (2011). A Semantic Web approach to everyday overlapping markup. Journal of the American Society for Information Science and Technology, 62(9): 1696–1716. doi:10.1002/asi.21591

Di Iorio, A., Peroni, S., & Vitali, F. (2011). Using semantic web technologies for analysis and validation of structural markup. International Journal of Web Engineering and Technology, 6(4): 375–398. doi:10.1504/IJWET.2011.043439

Durusau, P., & O'Donnell, M. B. (2002). Coming down from the trees: Next step in the evolution of markup? In Extreme Markup Languages.

Durusau, P., & O'Donnell, M. B. (2002). Just-In-Time-Trees (JITTs): Next Step in the Evolution of Markup. In Proceedings of the 2002 Extreme Markup Languages Conference, Montréal, Canada.

Falco, R., Gangemi, A., Peroni, S., & Vitali, F. (2014). Modelling OWL ontologies with Graffoo. In ESWC 2014 Satellite Events - Revised Selected Papers, Lecture Notes in Computer Science. Berlin, Germany: Springer. Postprint available at http://speroni.web.cs.unibo.it/publications/falco-in-press-modelling-ontologies-graffoo.pdf

Gearon, P., Passant, A., & Polleres, A. (2013). SPARQL 1.1 Update. W3C Recommendation, 21 March 2013. World Wide Web Consortium. Retrieved from http://www.w3.org/TR/sparql11-update/

Goldfarb, C. F., & Rubinsky, Y. (1990). The SGML handbook. Oxford University Press.

Hilbert, M., Schonefeld, O., & Witt, A. (2005, August). Making CONCUR work. In Extreme Markup Languages.

Horrocks, I., Kutz, O., & Sattler, U. (2006). The Even More Irresistible SROIQ. In P. Doherty, J. Mylopoulos, & C. A. Welty (Eds.), Proceedings of the 10th International Conference on Principles of Knowledge Representation and Reasoning (KR 2006): 57–67. Palo Alto, California, USA: AAAI Press.

Horrocks, I., Patel-Schneider, P. F., Boley, H., Tabet, S., Grosof, B., & Dean, M. (2004). SWRL: A Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission, 21 May 2004. World Wide Web Consortium. Retrieved from http://www.w3.org/Submission/SWRL/

Huitfeldt, C., & Sperberg-McQueen, C. M. (2001). TexMECS: An experimental markup meta-language for complex documents. Retrieved from http://mlcd.blackmesatech.com/mlcd/2003/Papers/texmecs.html (DAG, noDAG, child-arc-ordered directed graph (CODG), overlap-only (oo) TexMECS, etc.)

Huitfeldt, C., & Sperberg-McQueen, C. M. (2006). Representation and processing of goddag structures: implementation strategies and progress report. In Extreme Markup Languages.

Jagadish, H. V., Lakshmanan, L. V., Scannapieco, M., Srivastava, D., & Wiwatwattana, N. (2004). Colorful XML: one hierarchy isn't enough. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data: 251–262. ACM. doi:10.1145/1007568.1007598

Krötzsch, M., Simancik, F., & Horrocks, I. (2013). A Description Logic Primer. arXiv:1201.4089. The Computing Research Repository. Retrieved from http://arxiv.org/abs/1201.4089

Marcoux, Y. (2008). Graph characterization of overlap-only TexMECS and other overlapping markup formalisms. In Proceedings of Balisage: The Markup Conference (Vol. 1). doi:10.4242/BalisageVol1.Marcoux01

Marcoux, Y., Sperberg-McQueen, C. M., & Huitfeldt, C. (2013). Modeling overlapping structures: Graphs and serializability. In Balisage: The Markup Conference 2013. doi:10.4242/BalisageVol10.Marcoux01

Marinelli, P., Vitali, F., & Zacchiroli, S. (2008). Towards the unification of formats for overlapping markup. New Review of Hypermedia and Multimedia, 14(1): 57–94. doi:10.1080/13614560802316145

Peroni, S., Gangemi, A., & Vitali, F. (2011). Dealing with markup semantics. In Proceedings of the 7th International Conference on Semantic Systems (I-SEMANTICS 2011): 111–118. New York, New York, US: ACM Press. doi:10.1145/2063518.2063533

Peroni, S., Poggi, F., & Vitali, F. (2013). Tracking changes through EARMARK: a theoretical perspective and an implementation. In G. Barabucci, U. Borghoff, A. Di Iorio, & S. Maier (Eds.), Proceedings of the 1st International Workshop on (Document) Changes: modeling, detection, storage and visualization (DChanges 2013), CEUR Workshop Proceedings 1008. Aachen, Germany: CEUR-WS.org. Retrieved from http://ceur-ws.org/Vol-1008/paper6.pdf

Prud’hommeaux, E., & Carothers, G. (2013). Turtle - Terse RDF Triple Language. W3C Candidate Recommendation, 19 February 2013. World Wide Web Consortium. Retrieved from http://www.w3.org/TR/turtle/

Schonefeld, O. (2007). XCONCUR and XCONCUR-CL: A constraint-based approach for the validation of concurrent markup. In Data Structures for Linguistic Resources and Applications. Proceedings of the Biennial GLDV Conference 2007, Tübingen, Germany. Gunter Narr Verlag.

Sirin, E., Parsia, B., Grau, B. C., Kalyanpur, A., & Katz, Y. (2007). Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2): 51–53. doi:10.1016/j.websem.2007.03.004

Sperberg-McQueen, C. M., & Huitfeldt, C. (2004). Goddag: A data structure for overlapping hierarchies. In Digital Documents: Systems and Principles: 139–160. Springer Berlin Heidelberg. doi:10.1007/978-3-540-39916-2_12

TEI Consortium. (2008). TEI P5: Guidelines for electronic text encoding and interchange. L. Burnard & S. Bauman (Eds.). TEI Consortium.

Tennison, J., & Piez, W. (2002). The Layered Markup and Annotation Language (LMNL). In Extreme Markup Languages 2002.

Witt, A., Schonefeld, O., Rehm, G., Khoo, J., & Evang, K. (2007). On the lossless transformation of single-file, multi-layer annotations into multi-rooted trees. In Proceedings of Extreme Markup Languages 2007, Montréal, Québec.