Modeling overlapping structures
Graphs and serializability
Université de Montréal, Canada
Black Mesa Technologies
University of Bergen, Norway
The problem of overlapping structures has long been familiar to the structured document community. In a poem, for example, the verse and line structures overlap, and having both available simultaneously is convenient and sometimes necessary (for example, for automatic analyses). However, only structures that nest properly within one another can be represented directly in XML. Proposals to address this problem include XML solutions (based essentially on a layer of semantics) and non-XML ones. Among the latter is TexMecs [HS2003], a markup language that allows overlap (along with many other features).
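To make the notion of overlap concrete, the following minimal sketch treats each element as a half-open span of token positions and reports pairs of spans that intersect without one containing the other. The function names, the span encoding, and the eight-token poem are assumptions made for illustration; they are not taken from TexMecs or from the paper.

```python
from itertools import combinations


def properly_overlap(a, b):
    """Return True if half-open spans a and b intersect but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    intersect = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return intersect and not (a_in_b or b_in_a)


def overlapping_pairs(spans):
    """List the name pairs whose spans properly overlap, in insertion order."""
    return [
        (name_a, name_b)
        for (name_a, span_a), (name_b, span_b) in combinations(spans.items(), 2)
        if properly_overlap(span_a, span_b)
    ]


# Hypothetical poem of 8 tokens; all offsets are invented for this example.
spans = {
    "line1": (0, 4),
    "line2": (4, 8),
    "verse1": (0, 6),  # ends in the middle of line2
    "verse2": (6, 8),
}

print(overlapping_pairs(spans))
```

Any pair reported by `overlapping_pairs` cannot be encoded as two nested elements of a single XML tree without some workaround, which is the situation the abstract describes.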
XML documents, when viewed as graphs, correspond to trees. Marcoux [M2008] characterized overlap-only TexMecs documents by showing that they correspond exactly to completion-acyclic node-ordered directed acyclic graphs. In this paper, we elaborate on that result in two ways.
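The contrast between trees and more general graphs can be illustrated with a small check. The sketch below assumes a graph given as a dictionary mapping each node to its ordered list of children, and tests only the tree property (a single root, at most one parent per node, every node reachable from the root). It does not test completion-acyclicity, whose definition is not given here; the function name and the sample graphs are hypothetical.

```python
def is_tree(children):
    """Return True if the child-ordered directed graph is a rooted tree."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)

    # Count incoming arcs; repeated arcs (multi-graph) count separately.
    parent_count = {node: 0 for node in nodes}
    for kids in children.values():
        for kid in kids:
            parent_count[kid] += 1

    roots = [node for node in nodes if parent_count[node] == 0]
    if len(roots) != 1 or any(count > 1 for count in parent_count.values()):
        return False

    # Reachability from the root rules out detached cycles.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen == nodes


# XML-like structure: every word has exactly one parent.
xml_like = {
    "poem": ["line1", "line2"],
    "line1": ["w1", "w2"],
    "line2": ["w3", "w4"],
}

# Overlap: words w2 and w3 belong both to a line and to verse1.
overlap_dag = {
    "poem": ["line1", "line2", "verse1"],
    "line1": ["w1", "w2"],
    "line2": ["w3", "w4"],
    "verse1": ["w2", "w3"],
}

print(is_tree(xml_like), is_tree(overlap_dag))
```

In the second graph the shared words have two parents, so the structure is a directed acyclic graph rather than a tree.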
First, we cast it in the setting of a strictly larger class of graphs, child-arc-ordered directed graphs, that includes multi-graphs and non-acyclic graphs, and show that, somewhat surprisingly, it does not hold in general for graphs with multiple roots. Second, we formulate a stronger condition, full-completion-acyclicity, that guarantees correspondence with an overlap-only document, even for graphs that have multiple roots.
The definition of a fully-completion-acyclic graph does not in itself suggest an efficient algorithm for checking the condition, nor for computing a corresponding overlap-only document when the condition is satisfied. We present basic polynomial-time upper bounds on the complexity of accomplishing those tasks.