Towards markup support for full GODDAGs and beyond: the EARMARK approach
Angelo Di Iorio
Department of Computer Science, University of Bologna
Silvio Peroni
Department of Computer Science, University of Bologna
Fabio Vitali
Department of Computer Science, University of Bologna
Abstract
One of the most evident tenets of the literature on overlapping markup is that the
philosophy of documents as trees (as dictated by meta-markup languages such as SGML
and XML) is a simplification that sometimes fails and requires corrections. These
corrections have been proposed at the markup level (e.g., milestones, segmentation),
at the meta-markup level (e.g., LMNL, TexMECS, XCONCUR) or at the level of the
abstract model (e.g., GODDAG). Unfortunately, full GODDAGs do not in general allow
linearization, and for this reason a restricted version of GODDAG, r-GODDAG, has
been proposed that is guaranteed to be linearizable (in TexMECS) and still allows
many useful features beyond trees.
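The following minimal Python sketch makes the contrast with trees concrete. It models a GODDAG-like structure in which two hierarchies share the same text leaves, and an element of one hierarchy overlaps an element of the other. The node ids, generic identifiers and sample text are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Leaves: the text, split at every boundary used by either hierarchy.
LEAVES = {
    "t0": "The quick fox jumps over ",
    "t1": "the lazy dog. ",
    "t2": "It sleeps.",
}

# Non-terminal nodes (hypothetical): id -> (generic identifier, children).
# Two independent hierarchies, verse lines and sentences, share the leaves.
NODES = {
    "poem": ("poem", ["line1", "line2"]),
    "line1": ("l", ["t0"]),
    "line2": ("l", ["t1", "t2"]),
    "prose": ("text", ["s1", "s2"]),
    "s1": ("s", ["t0", "t1"]),
    "s2": ("s", ["t2"]),
}


def parents(nodes):
    """Map each node id to the set of nodes that list it as a child."""
    result = defaultdict(set)
    for node_id, (_, children) in nodes.items():
        for child in children:
            result[child].add(node_id)
    return result


def leaf_set(node_id, nodes):
    """All leaves reachable from node_id, i.e. its textual extent."""
    if node_id in LEAVES:
        return {node_id}
    _, children = nodes[node_id]
    return set().union(*(leaf_set(c, nodes) for c in children))


def overlap(a, b, nodes):
    """True if a and b share content but neither contains the other."""
    la, lb = leaf_set(a, nodes), leaf_set(b, nodes)
    return bool(la & lb) and not (la <= lb or lb <= la)


# Leaves with more than one parent: impossible in a single tree.
multi_parent = sorted(n for n, ps in parents(NODES).items() if len(ps) > 1)
print(multi_parent)                   # ['t0', 't1', 't2']
print(overlap("line2", "s1", NODES))  # True: classic overlapping elements
```

Here `line2` and `s1` share the leaf `t1`, but neither contains the other. No single tree can hold both elements. The DAG accommodates them because a leaf may have one parent in each hierarchy.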
In this paper we argue that the problem of linearizing more-than-hierarchical
structures lies essentially in the embedding of markup within content, and that no such
problem arises with an appropriate stand-off approach, which is able to represent full
GODDAGs without restrictions. This opens ample opportunities to deal with
interesting markup features that are describable with GODDAGs but not with
r-GODDAGs, such as non-contiguous elements and virtual elements.
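The sketch below illustrates why stand-off markup sidesteps the embedding problem. The text is stored once, with no tags in it, and each element points to one or more character ranges. A non-contiguous element therefore needs no fragmentation; here it is a hypothetical quotation interrupted by narrative. The `Element` class, the half-open offset convention and the sample text are assumptions made for this example only.

```python
from dataclasses import dataclass

# Sample text (hypothetical); offsets below are half-open [start, end).
TEXT = '"I think," she said, "that we should leave."'


@dataclass(frozen=True)
class Element:
    """A stand-off element: a generic identifier plus one or more ranges.

    The text carries no tags, so an element may select several disjoint
    ranges without being broken into fragments.
    """
    name: str
    ranges: tuple[tuple[int, int], ...]

    def segments(self, text: str) -> list[str]:
        return [text[start:end] for start, end in self.ranges]

    def is_contiguous(self) -> bool:
        return len(self.ranges) == 1


quote = Element("q", ((1, 9), (22, 43)))        # one quotation, two pieces
speaker = Element("narrative", ((11, 19),))     # an ordinary contiguous element

print(quote.segments(TEXT))    # ['I think,', 'that we should leave.']
print(speaker.segments(TEXT))  # ['she said']
print(quote.is_contiguous())   # False
```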
We also discuss whether a specific constraint of full GODDAGs is really
necessary once all residual hopes of embeddability are given up, and we further
propose a minimal extension to GODDAG, which we call "extended GODDAG" (e-GODDAG),
that, by removing the requirement for names in non-terminal nodes, adds support for
additional interesting markup features such as content repetitions. Admittedly,
e-GODDAGs are even less embeddable than full GODDAGs, but they are just as easily
dealt with by using stand-off markup.
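One way to picture how unnamed non-terminal nodes enable content repetition is sketched below, under our own simplifying assumptions. Ranges act as nodes without a generic identifier, and a named element may list the same range more than once, so text stored once can appear twice in the structure. The `Range` and `Element` classes and the sample text are hypothetical and are not the paper's formalization.

```python
from dataclasses import dataclass, field

TEXT = "Row, row, row your boat. Merrily, merrily."


@dataclass
class Range:
    """Unnamed non-terminal node: selects a slice of the text, has no GI."""
    start: int
    end: int

    def render(self, text: str) -> str:
        return text[self.start:self.end]


@dataclass
class Element:
    """Named markup item; children are Range or Element nodes, in order."""
    name: str
    children: list = field(default_factory=list)

    def render(self, text: str) -> str:
        return " ".join(child.render(text) for child in self.children)


verse = Range(0, 24)
refrain = Range(25, 42)
# The refrain occurs once in the text, but the same Range node is
# listed twice among the song's children: a content repetition.
song = Element("song", [verse, refrain, refrain])

print(song.render(TEXT))
# Row, row, row your boat. Merrily, merrily. Merrily, merrily.
```

Joining rendered children with a single space is a presentation choice of this sketch. A real serializer would take whitespace from the text itself.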
We further propose a meta-syntax for non-embedded markup, called EARMARK, that can
be used for stand-off annotation of textual content and that naturally represents
e-GODDAGs with fully W3C-compliant technologies. EARMARK is based on an
ontologically precise definition of markup that instantiates the markup of a text
document as an OWL document. Through appropriate OWL and SWRL characterizations
it can define structures such as trees, r-GODDAGs, full GODDAGs and e-GODDAGs, and
it can be used to generate validity constraints (including co-constraints) and to
verify adherence to content model patterns.
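As a rough, standard-library-only illustration of stand-off markup expressed as RDF statements, the sketch below emits Turtle-style triples for a text, its ranges and a named element. It does not produce OWL axioms or SWRL rules. Every `ex:` term is a hypothetical placeholder, not the actual EARMARK vocabulary. String literals are escaped with `json.dumps`, an assumption that holds for the simple text used here.

```python
import json


def to_turtle(text, ranges, elements):
    """Serialize a stand-off document as Turtle-style triples (a sketch).

    ranges:   {range_id: (start, end)} half-open character offsets.
    elements: {element_id: (generic_identifier, [child ids in order])}.
    All ex: terms are hypothetical placeholders.
    """
    triples = [("ex:doc", "rdf:type", "ex:Docuverse"),
               ("ex:doc", "ex:hasContent", json.dumps(text))]
    for rid, (start, end) in ranges.items():
        node = f"ex:{rid}"
        triples += [(node, "rdf:type", "ex:Range"),
                    (node, "ex:refersTo", "ex:doc"),
                    (node, "ex:begins", str(start)),
                    (node, "ex:ends", str(end))]
    for eid, (gi, children) in elements.items():
        node = f"ex:{eid}"
        triples += [(node, "rdf:type", "ex:Element"),
                    (node, "ex:hasGeneralIdentifier", json.dumps(gi))]
        # RDF graphs are unordered sets, so plain (element, hasChild, child)
        # triples would lose child order and collapse repeated children.
        # Each child therefore gets its own numbered slot node.
        for index, child in enumerate(children):
            slot = f"ex:{eid}_slot{index}"
            triples += [(node, "ex:hasSlot", slot),
                        (slot, "ex:index", str(index)),
                        (slot, "ex:slotContent", f"ex:{child}")]
    header = ["@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
              "@prefix ex: <http://example.org/earmark-sketch#> ."]
    lines = [f"{s} {p} {o} ." for s, p, o in triples]
    return triples, "\n".join(header + lines)


triples, turtle = to_turtle("Merrily, merrily.",
                            {"refrain": (0, 17)},
                            {"song": ("song", ["refrain", "refrain"])})
print(turtle)
```

The numbered slots are the design point here. Without them, the repeated child of `song` would collapse into a single triple.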
As mentioned, embedding a full EARMARK document is in general not straightforward,
but approaches can be taken in that direction. Segmentation and fragmentation are
strategies for embedding an r-GODDAG-specific feature, such as overlapping elements,
in a strictly hierarchical language. Similarly, a number of strategies exist for
embedding GODDAG and e-GODDAG features in less expressive syntaxes. In the final part
of the paper we discuss our plan to provide, at the metalanguage level, a series of
strategies for embedding the non-hierarchical features of EARMARK. These would be
language-independent mechanisms for expressing e-GODDAG structures in XML (as well
as in TexMECS and LMNL) that tools and readers alike can recognize as such (i.e., as
strategies or tricks), especially for further uses of such documents.
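Below is a minimal sketch of one such embedding strategy, fragmentation. It makes several assumptions:

- The input is two flat hierarchies of contiguous stand-off elements over the same text.
- The first hierarchy (verse lines) is kept intact as the XML hierarchy.
- Each element of the second hierarchy (sentences) is split at the boundaries it crosses.

The `ref` attribute that ties fragments back to their original element is a hypothetical convention chosen for this example. It is not a mechanism defined by EARMARK.

```python
from xml.sax.saxutils import escape, quoteattr

TEXT = "The quick fox jumps over the lazy dog. It sleeps."
# Dominant hierarchy, kept intact: (name, start, end), half-open offsets.
LINES = [("l", 0, 25), ("l", 25, 49)]
# Secondary hierarchy, may cross dominant boundaries: (id, name, start, end).
SENTENCES = [("s1", "s", 0, 39), ("s2", "s", 39, 49)]


def fragment(text, dominant, secondary, root="doc"):
    """Embed two flat stand-off hierarchies into one well-formed XML string.

    Assumes both lists are sorted by start offset, elements within each list
    do not overlap, and the dominant elements cover the whole text (any text
    outside them is dropped). Secondary elements are split wherever they
    cross a dominant boundary; every fragment carries ref="<original id>".
    """
    out = [f"<{root}>"]
    for d_name, d_start, d_end in dominant:
        out.append(f"<{d_name}>")
        pos = d_start
        for s_id, s_name, s_start, s_end in secondary:
            lo, hi = max(s_start, d_start), min(s_end, d_end)
            if lo >= hi:
                continue  # this secondary element shares no text with d
            out.append(escape(text[pos:lo]))  # uncovered text before it
            out.append(f"<{s_name} ref={quoteattr(s_id)}>"
                       f"{escape(text[lo:hi])}</{s_name}>")
            pos = hi
        out.append(escape(text[pos:d_end]))  # trailing uncovered text
        out.append(f"</{d_name}>")
    out.append(f"</{root}>")
    return "".join(out)


xml_out = fragment(TEXT, LINES, SENTENCES)
print(xml_out)
```

Sentence `s1` crosses the line boundary, so it appears as two `<s ref="s1">` fragments. A tool that recognizes the `ref` convention could rejoin fragments sharing a value, which is the kind of recognizable "trick" described above.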
Balisage: The Markup Conference 2009
August 11 - 14, 2009
The materials listed below were provided by the speaker as supplements to a
presentation at Balisage. These materials may include the slides or visuals used in the
presentation; supplementary material, such as code samples or a demonstration application;
and/or the paper accompanying the presentation (if it has not been provided in XML). These
materials have been zipped for easy download and are identified by a brief description of
the contents. The materials themselves are untouched, that is, they have not been tested or
edited by Balisage: The Markup Conference or by Mulberry Technologies, Inc. As such, they
are included on this website AS IS, i.e., as provided by the speaker, with no warranties,
express or otherwise, made by Balisage or Mulberry.
Slides and Materials