1. Motivation and related work
1.1. Graphs and documents
Many operations are more conveniently performed on a graph representation than on a linear representation of a marked up document, and vice versa. One of the strengths of XML is that XML documents in serial form are readily deserialized into ordered trees, which form a convenient data structure for many useful operations.^{[1]}
Socalled “XML trees” are directed acyclic graphs with single parenthood and a total ordering on leaf nodes. While this constitutes an intuitively natural and generally suitable model for the representation of the structure of most documents, and for most purposes, it also poses a challenge for the representation of complex structures such as overlapping, fragmented or disordered document elements, and multiple coexisting alternative structures, which allow for a more natural representation of complex documents in a wide range of situations.
For such purposes, a different kind of graph representation has been proposed, the socalled Goddag SH2004. Roughly, Goddags (General OrderedDescendant Directed Acyclic Graphs) are like XML trees except that they allow multiple parenthood and do not require a total ordering on leaf nodes. (Thus, XML trees constitute a subset of Goddags.)
Documents using different techniques for representing such structures in XML form (e.g., milestones, fragmentation, virtual elements, etc. B1995 SH1999 W2005) can be mapped onto Goddags, though not without applicationspecific mechanisms typically involving levels of indirection which may appear cumbersome. The experimental markup system TexMecs HS2003 offers mechanisms for the representation of complex structures which can be mapped on to Goddags independently of such knowledge.
Since its introduction, the Goddag data structure has frequently been cited, and it is used as a reference in various works on overlap. (For example, Moore M2012 studied Goddags in the context of access control, and introduced the notion of globally ordered Goddag.) However, the original description of Goddags is rather informal, and exhibits the kinds of gaps, vaguenesses, and ambiguities that have, over time, given informality a bad name among mathematicians and others interested in firm results.
For example, it was conjectured that a linearized document which made use, in addition to the mechanisms of XML, only of markup for overlapping elements, could be represented by a Goddag with a total order on leaf nodes (socalled restricted Goddags), but no proofs were given of this fact. The paper was silent and its authors agnostic about the serializability of graphs with multiple roots, and the relationship between Goddags and markup in terms of serializability was not systematically investigated. In HS2003, it was assumed, but no attempt was made to prove, that all TexMecs documents could be represented as Goddags, or that all Goddags could be serialized as TexMecs documents. The present paper is a modest contribution towards straightening up the situation, by way of a systematic study of the fine point of the complex relationship between markup formalisms like TexMecs and graph structures like Goddags.
1.2. The problem of serialization
The general problem is this: whenever a markup system (be it
XML, TexMecs, or another system) provides more than one way to
represent a given abstract structure,^{[2]}
that same abstract structure can be written out again
(serialized) in more than one way. Can we control the
serialization process to provide the markedup forms we find
easiest to work with at a given moment? Can we tell, by
inspection of a given graph, what serialization formats are
possible for the graph? In many cases, a marked up form using
overlapping elements, seems to at least some observers to be the
most natural
representation of a given document;
when can a graph be serialized using overlap alone, and when
does it require use of the more powerful mechanisms of virtual
or discontinuous elements?
A concrete example may help illustrate the point.
In the following fragment (adapted from D2004), the verse
elements are
empty milestones marking the beginning and end of verses, in
Trojan Horse
style markup. A Goddag structure
representing this fragment would have nodes for the verses,
but those nodes do not correspond one to one with XML elements
in this serialization:^{[3]}
<div xmlns="http://www.teic.org/ns/1.0"> <p> <verse xml:id="Jer.2.1"/> Moreover the word of the LORD came to me, saying, <verse eID="#Jer.2.1"/> <q n="QJer.2.2A"> <verse xml:id="Jer.2.2"/> Go and cry in the hearing of Jerusalem, saying, <q n="QJer.2.2B"> Thus says the LORD: <q n="QJer.2.2C"> I remember you, The kindness of your youth, The love of your betrothal, When you went after Me in the wilderness, In a land not sown. <verse eID="#Jer.2.2"/> <verse xml:id="Jer.2.3"/> Israel [was] holiness to the LORD, The firstfruits of His increase. All that devour him will offend; Disaster will come upon them, </q> <!True Close QJer.2.2C> says the LORD.</q> <verse eID="#Jer.2.3"/> <!* ... *> </q> </p> </div>
The same Goddag structure can also be serialized in an extended form of TexMecs notation.^{[4]}
<div <p <verse@Jer.2.1Moreover the word of the LORD came to me, saying,verse> <q n="QJer.2.2A" <verse@Jer.2.2 Go and cry in the hearing of Jerusalem, saying, <~@Jer.2.2b Thus says the LORD:~> <~@Jer.2.2c I remember you, The kindness of your youth, The love of your betrothal, When you went after Me in the wilderness, In a land not sown.~> verse> <q n="QJer.2.2B" <=@Jer.2.2b=> <q n="QJer.2.2C" <=Jer.2.2c=> <~@Jer.2.3a Israel [was] holiness to the LORD, The firstfruits of His increase. All that devour him will offend; Disaster will come upon them, ~> q> q> <verse@Jer.2.3 <=Jer.2.3a=> says the LORD. verse> <* ... *> q> p> div>
This particular Goddag structure can also be
serialized without virtual elements, just by allowing
the q
and verse
elements
to overlap:
<div <p <verse@Jer.2.1 Moreover the word of the LORD came to me, saying, verse> <q n="QJer.2.2A" <verse@Jer.2.2 Go and cry in the hearing of Jerusalem, saying, <q n="QJer.2.2B" Thus says the LORD: <q n="QJer.2.2C" I remember you, The kindness of your youth, The love of your betrothal, When you went after Me in the wilderness, In a land not sown. verse> <verse@Jer.2.3 Israel [was] holiness to the LORD, The firstfruits of His increase. All that devour him will offend; Disaster will come upon them, q> says the LORD. verse> q> <* ... etc. ... *> q> p> div>Intuitively, many readers find the overlaponly version of the document simpler and more natural than the version using bilocation tags. But (as demonstrated by M2008), not all Goddag structures can be written out using only overlap, without virtual elements, discontinuous elements, or bilocation tags.
This leads directly and obviously to the questions
When can graphs be serialized
using overlap only? And conversely, when are other
markup mechanisms necessary?
Marcoux M2008 introduced the notions of noDAG and overlaponly (oo) TexMecs as a first step towards answering these questions. A noDAG is a nodeordered directed acyclic graph, i.e., a slight variation on the Goddag, where there is a strict partial ordering on nodes. As a markup language, ooTexMecs is the subset of TexMecs that allows multiple roots and overlapping elements, but not virtual or interrupted elements. Marcoux established that a noDAG is serializable if and only if it is completionacyclic, and that “roundtripping” is possible, in that there is essentially a bijective correspondence between noDAGs and ooTexMecs documents.
1.3 The approach of this paper
In order to investigate whether and how the results of M2008 apply to other classes of graphs, we introduce here the more general notion of a childarcordered directed graph (CODG), and demonstrate that the results from M2008 hold also for CODGs, with the somewhat surprising exception of some CODGs with multiple roots. By defining the stronger notion of “fully completionacyclic” graphs, we succeed in identifying this subset: the ooserializable CODGs are exactly the fullycompletionacyclic ones. We also give basic polynomialtime upper bounds on the complexity of checking fullcompletionacyclicity and of actually computing an ooserialization of fullycompletionacyclic CODGs.
2. Childarcordered directed graphs
2.1 Definition A childarcordered directed graph (CODG for short) G = (V, ch) is a directed graph over a finite nonempty set of vertices (or nodes) V, where ch (for children) is a total mapping from V to finite (and possibly empty) sequences of nodes from V. The set of arcs (or edges) of G, noted E(G), comprises exactly those ordered pairs (v, w) for which
(∃n ∈ N)[ ch(v, n) = w ],
where N represents the set of nonnegative integers. The notation ch(v, n) is used as a shorthand for (ch(v))(n), that is, for the element with index n in the sequence ch(v). We use 0origin indexing; thus, for all v ∈ V, ch(v, 0) denotes the first child of v (or is undefined, if v has no child).
Note: Throughout this paper, the “parent” relation must be understood to be the exact inverse of the “child” relation (we bother to make this explicit because it is not the case in some other models, such as the XPath 1.0 data model).
It is possible for the same child to show up at more than one place in a sequence of children; that is, ch(v, n) = ch(v, m) with m ≠ n is possible. Loops are allowed; that is, ch(v, n) = v for a given n is possible.
Note that (v, w) ∈ E(G) for given v and w tells only part of the story: There could be many distinct values of n for which ch(v, n) = w. Also note that the length of ch(v), i.e., the smallest value of n (≥ 0) for which ch(v, n) is undefined, is greater than or equal to the number of distinct children of v (if v has no child, ch(v) = ∅, which, as a sequence, is of length 0).
CODGs are very loose structures: they can be “multigraphs,” in that more than one arc can link any given pair of nodes. They can have cycles and loops (i.e., cycles of length one). There can be both a direct (length one) and indirect path between any two given nodes.
The rationale for the adjective childarcordered
is that for all
v ∈ V, ch(v) can be seen as inducing an ordering on the arcs going out of
v (the
childarcs
of v).
Examples We present examples of CODGs illustrating some of their features.
Example 2.1 illustrates that CODGs can be disconnected, and that, in a CODG:

A node can have more than one parent (here most simply node d, with parents a and c, but also node b [with parents a and b]). This is a significant departure from the rule of single parenthood in XML.

There can be cycles and loops (cycles of length 1); here the only example is the loop on node b.

The same node can occur more than once as a child of some parent (here node c, which is both second and fourth among the children of node a).

There can be both direct and indirect paths between two nodes (here node a dominates node d both directly and via node c).
Note that the order of the outgoing arcs is usually not shown explicitly in the visual representation of a CODG. We adopt the convention of drawing the arcs going out of any node in order from left to right (even if the arcs must cross each other further down, in order to reach the child node they point to). So the leftmost arc leaving any parent is pointing to that parent's first child, and the rightmost arc points to that parent's last child.^{[5]} Thus, in Example 1, ch(a) = (b, c, d, c).
On the rare occasions that this convention is not practical, we use explicit green arrows between the outgoing arcs to indicate their order, as in the next example.
In Example 2.2, ch(a) = (b, c) and ch(d) = (e, c). Rightpointing arrows, though superfluous, are sometimes shown as a reminder of the implicit convention.
Sibling precedence For all v, ch(v)
induces a siblingprecedence
relation sp(v) among
the children of v, defined by:
sp(v) =_{def} { (w, x) ∈ V × V  (∃m, n ∈ N)[ m < n & ch(v, m) = w & ch(v, n) = x ] }.
Example 2.3 illustrates that parents may order the same nodes differently as children. Thus, note that ch(a) = (b, c), which induces the strict order relation b < c, and ch(d) = (c, b), which induces the strict order relation c < b.
In examples in which no pair of nodes is ordered differently by different parents, we will usually place the green arrows between nodes, rather than between arcs:
Auxiliary concepts We now define a number of auxiliary concepts useful in discussions of CODGs. All of them are secondary concepts in the sense that they are entirely and uniquely determined by the set of vertices and the sequences of children of the graph.
2.2 Definition Let G = (V, ch) be a CODG. Then:

⇒_{G} denotes the (positive) transitive closure of E(G).
Note that ⇒_{G} is not necessarily antireflexive, as E(G) may contain cycles.

⇒^{*}_{G} denotes the reflexive transitive closure of E(G), that is:
⇒_{G} ∪ { (v, v)  v ∈ V }.

sp(G) =_{def} { (v, w, x) ∈ V × V × V  (∃m, n ∈ N)[ m < n & ch(v, m) = w & ch(v, n) = x ] }.
The name “sp” stands for “sibling precedence.” Note that, iff w occurs more than once in the sequence of children of v, then (v, w, w) ∈ sp(G). Note also that it is entirely possible for both (v, w, x) and and (v, x, w) to be members of sp(G). Finally note that sp(G) is the union over all v ∈ V of ({v} × sp(v)), where sp(v) is the siblingprecedence relation induced by ch(v), as defined above at Sibling precedence.

gsp(G) =_{def} { (w, x) ∈ V × V  (∃v ∈ V)[ (v, w, x) ∈ sp(G) ] }.
The name “gsp” stands for “global sibling precedence.” It is the projection of sp(G) onto the last two components. It follows from the observations in the preceding point that there can be loops and cycles in gsp(G).
2.3 Notation Let G = (V, ch) be a CODG. Unless otherwise stated:

V can also be denoted by V(G),

E denotes E(G),

⇒ denotes ⇒_{G},

⇒^{*} denotes ⇒^{*}_{G},

sp denotes sp(G),

gsp denotes gsp(G).
3. Overlaponly documents
The phenomenon we wish to study in this paper is how the structural properties of a CODG relate to the fact that it mimics the containment and precedence relationships among elements in some markedup document. More specifically, we want to consider documents expressed in markup languages that allow overlapping elements, such as TexMecs HS2003. Thus, we need to define a model for such documents.
TexMecs allows many more constructs than element embedding and overlap. However, in this paper, we concentrate on those two, ignoring the others, such as virtual elements, interrupted elements, empty elements, attribute specifications, entity references, generic identifier coindexing (for handling selfoverlap), unordered contents, and comments. This is why we speak of “overlaponly” (or “oo”) documents. When the structure of a CODG corresponds to the containment and precedence relationships of some oodocument (to be defined precisely in a moment), we say the CODG is “ooserializable,” because the oodocument can be viewed as a serialization (a representation in serial form) of the CODG.
Instead of defining documents as character strings with syntactic constraints, we use a more abstract approach that avoids some complications and leads to results that are simpler to formulate. More constraints on the definition of document could later be added to suit specific markup languages if and when desired.
Intuitively, we adopt a tokenized view of the document, where tokens are tags and leaves. Tokens are represented by their (integer) position in the sequence of tokens that make up the document.
The tags in our model of oodocuments correspond, in the XML world, to start and endtags for nonempty elements.
The leaves in our model of oodocuments correspond, in the XML world, to text nodes (#PCDATA) and empty elements. Note that our model abstracts away from the actual textual content of elements and documents, and also ignores the differences among different element types. We claim, however, that our abstraction captures the essential structural aspects of markedup documents with possible element overlap.
3.1 Definition An oodocument is a finite set of pairs of the form (x, y), where x, y ∈ N (the set of nonnegative natural numbers) and x ≤ y, additionally satisfying a number of “wellformedness” constraints (stated below).
The pairs in a document are called ranges. If r = (x, y) is a range, then r_{1} and r_{2} denote respectively x and y.
Intuitively, a range gives the position of a starttag and of the corresponding endtag, or the position of a leaf, in which case, x = y. Formally, if D is an oodocument:
leaves(D) =_{def} { x ∈ N  (x, x) ∈ D },
stags(D) =_{def} domain(D)  leaves(D),
etags(D) =_{def} image(D)  leaves(D).
Note that, as usual:
domain(D) =_{def} { x ∈ N  ( ∃ y ∈ N  (x, y) ∈ D ) }, and
image(D) =_{def} { y ∈ N  ( ∃ x ∈ N  (x, y) ∈ D ) }, and
Oodocuments are subject to the following wellformedness constraints.
For all oodocument D:
D is a partial function over N, i.e., for all x ∈ N, there is at most one y such that (x, y) ∈ D.
D^{1} (that is, the inverse of D) is also a partial function over N, i.e., for all y ∈ N, there is at most one x such that (x, y) ∈ D.
stags(D) ∩ etags(D) = ∅.
Put less formally: No token is both a starttag and an endtag.
It must also be remembered that (as stated at the beginning of the definition) for all range r, r_{1} ≤ r_{2}, which corresponds to the normal rule of syntax that starttag must precede its matching endtag.
Note that we do not require the numbering of token positions to be gapfree, nor do we forbid consecutive leaves without intervening tags. There is also no requirement of an element spanning the whole document: this is of course crucial for oodocuments to be able to correspond to graphs with multiple roots.
3.2 Definition Let D be an oodocument, and r, s ∈ D:

r is said to contain s iff (r_{1} < s_{1} and s_{2} < r_{2}).

r is said to precede s iff (r_{1} < s_{1} and r_{2} < s_{2}).
Note that in the latter case, r and s may or may not overlap. Also note that r cannot both contain and precede s.
4. Correspondence between a graph and a document
Intuitively, a CODG and an oodocument correspond to each other when the nodes of the graph and the ranges of the document can be put in correspondence in such a way that node reachability corresponds to range containment, and gsp corresponds to range precedence.
4.1 Definition A CODG G and an oodocument D correspond to each other iff there exists a bijective mapping g from V(G) to D, such that all of the following conditions hold:

(∀ b, c ∈ V(G)) [(b ⇒ c) iff g(b) contains g(c)]

(∀ b, c ∈ V(G)) [if (b, c) ∈ gsp(G), then g(b) precedes g(c)]
We then say that G and D correspond to each other through g.
4.2 Definition A CODG G is said to be ooserializable iff there exists an oodocument that corresponds to G.
It is clear that every oodocument has a corresponding CODG: use ranges as nodes, and the transitive reduction of range containment as parentchild relation. Then, order all sets of siblings in range precedence order.
It is also clear that some CODGs are not ooserializable: for example, a CODG with cycles would imply (by transitivity) a range containing itself, which is impossible. But are all acyclic CODGs ooserializable? The following examples, inspired from M2008, show that the question is at best not trivial:
Both graphs are acyclic and they differ by just the presence/absence of one arc. Yet, only the first one is ooserializable M2008.^{[6]}
In M2008, Marcoux defined noDAGs, or nodeordered DAGs, as (essentially) directed acyclic graphs (DAGs) in which the nodes are partially ordered in such a way that siblings (children of a common parent), as well as distinct roots, are totally ordered. He then defined the property of completionacyclicity for noDAGs, and showed that ooserializable noDAGs are exactly the completionacyclic ones.
In order to investigate whether the same is true of CODGs, we must define an analogous property for CODGs. The following definition is the natural adaptation of completionacyclicity to CODGs.
4.3 Definition Let G = (V, ch) be a CODG. Then:

ssba(G) =_{def} { (w, x) ∈ V × V  (∃v ∈ V)[ v ⇒ w & (v, x) ∈ gsp & x ⇏ w ] }.
The name “ssba” stands for “shouldstartbefore additions.”

ssb(G) denotes the transitive closure of (E ∪ gsp ∪ ssba).
The relation “ssb” is called the “shouldstartbefore completion” of G.

seaa(G) =_{def} { (w, x) ∈ V × V  (∃v ∈ V)[ v ⇒ w & (v, x) ∈ gsp^{1} & x ⇏ w ] }.
The name “seaa” stands for “shouldendafter additions.” The relation gsp^{1} is the inverse of relation gsp.

sea(G) denotes the transitive closure of (E ∪ gsp^{1} ∪ seaa).
The relation “sea” is called the “shouldendafter completion” of G.
4.4 Notation Let G = (V, ch) be a CODG. Unless otherwise stated:

ssba denotes ssba(G),

ssb denotes ssb(G),

seaa denotes seaa(G),

sea denotes sea(G).
The relations ssb and sea can be understood as meaning: “should <something> in any ooserialization of the CODG,” for example “should end after in any ooserialization of the CODG.” Thus, “(v, w) ∈ ssb” can be read out as: “v should start before w in any ooserialization of the CODG.” In other words, ssb (respectively, sea) represents the start (respectively, end) tagprecedence relations that can be deduced from the topology of the CODG, supposing parentchild relations are interpreted as element containment, and siblingprecedence relations as start and endtagprecedence.
The relations ssba and seaa represent the “additional” arcs (over and above those in E and gsp or gsp^{1}) that must be considered to compute all the possible ssb and sea pairs that can be deduced from the CODG topology.
4.5 Definition A CODG G = (V, ch) is said to be completionacyclic (CA) iff each of ssb(G) and sea(G) is acyclic.
5. Main results
Things are not as simple with CODGs as with noDAGs. There are, it turns out, CODGs that are completionacyclic, yet not ooserializable. Our first result is to show that Example 2.2 above, as well as the following CODG, are in that situation:
Since the number of structurally distinct documents that can possibly correspond to a 5node graph is finite, we could exhaustively enumerate them and verify that none of them correspond to either Example 2.2 or to Example 6. However, that would not be very insightful.^{[7]} We will thus rather proceed by way of a lemma (5.9) that provides a general characterization of ooserializable CODGs, and will be useful for our second main result.
5.1 Definition Let G = (V, ch) be a CODG. The ancestral precedence relation of G, denoted ap(G), is defined as:
ap(G) =_{def} { (v, w) ∈ V × V  (v ⇒ w) & (w ⇏ v) }.
It is easy to show that ap(G) is always a strict partialorder on V. Informally, we could say that ap(G) gets rid of the cycles in G by contracting its (maximal) stronglyconnected components, then reexpanding them to an equal number of disconnected vertices.
5.2 Definition Let G = (V, ch) be a CODG. A root in G is a vertex r ∈ V for which:
(∄w ∈ V)[ (w, r) ∈ ap(G) ].
5.3 Notation Let G = (V, ch) be a CODG. Unless otherwise stated, ap denotes ap(G).
The next result establishes that for any distinct roots v and w, either v and w are in the same stronglyconnected component, or else v and w are unordered in each of ssb and sea.
5.4 Lemma Let G = (V, ch) be a CODG, and v and w two distinct roots in G. Then, either:
(v ⇒ w) & (w ⇒ v)
or
{ (v, w), (w, v) } ∩ ((ssb ∪ sea) − ⇒) = ∅.
Note: For space consideration, most proofs are omitted.
Thus, an important difference between noDAGs and CODGs is that the latter can have unordered root pairs, whereas noDAGs have (by definition) their roots totallyordered.
5.5 Definition A CODG G = (V, ch) is said to be spconsistent iff gsp is an acyclic relation, i.e., iff:
(∀v ∈ V)[(v, v) ∉ transitiveclosure(gsp)].
Note that if G is truly a multigraph, i.e., if some node occurs more than once as a child of the same parent, then G is certainly not spconsistent. However, G could fail to be spconsistent without being a true multigraph, for example if two siblings are ordered differently by two distinct parents.
5.6 Definition A CODG G = (V, ch) is said to be reduced iff no node is both directly and indirectly reachable from some other node, i.e., iff:
(∄v, w, x ∈ V)[ {(v, w), (v, x)} ⊆ E(G) & w ⇒ x].
Note that if there is a cycle in G, then it is not reduced.
5.7 Lemma If a CODG is spconsistent, is reduced, and has a single root, then it is isomorphic to a noDAG; thus, by M2008, it is ooserializable iff it is completionacyclic.
5.8 Definition Let G be a CODG. A singlerooted extension (sre) of a G, is identical to G with an added root that has as children the roots of the original CODG, in some ordering, without repetition.
Note that, in general, a CODG has more than one sre (in effect, n!, where n is the number of roots in the original CODG, i.e., one for each possible ordering of the original roots).
5.9 Lemma A CODG is ooserializable iff it has an sre that is ooserializable.
Proof sketch. (←): Let G be a CODG, and H an ooserializable sre of G. Let D be any serialization of H. Because H has only one root, and D corresponds to H, there must be a range in D that contains all the others. Thus, the first and last tag of H must be matching tags. By “removing” those tags from D, we obtain an ooserialization of G.
(→): Let G be an ooserializable CODG, and D any serialization of G. By “adding” a starttag and a matching end tag at (respectively) the beginning and end of D, we obtain a document that can be shown to correspond to some sre of G.
We are now ready to state our first main result:
5.10 Theorem There exist CODGs that are completionacyclic but not ooserializable.
Proof sketch. The theorem follows from the observations that:

Each of Examples 2.2 and 6 is completionacyclic.

Each of Examples 2.2 and 6 has exactly two sres, each of which is spconsistent, reduced, and (by definition of sre) has a single root.

Each sre of each of Examples 2.2 and 6 is completioncyclic.
By Lemma 5.7, none of Examples 2.2 and 6 has an sre that is ooserializable. Thus, by Lemma 5.9, none of Examples 2.2 and 6 is ooserializable.
Our second main result is easiest seen as a corollary to the proof of the preceding theorem. First, we define:
5.11 Definition A CODG is said to be fullycompletionacyclic (FCA) iff it has an sre that is completionacyclic.
5.12 Theorem A CODG is ooserializable iff it is fullycompletionacyclic.
Proof sketch. The theorem follows from the proof of the preceding theorem and the following lemma:
5.13 Lemma If a CODG is not spconsistent or is not reduced, then it is not completionacyclic.
6. Checking fullcompletionacyclicity
An obvious way to check whether a CODG is fullycompletionacyclic is to try out all possible sres and see if at least one is completionacyclic. From a completionacyclic sre, it would be easy to derive an ooserialization of the CODG. However, since there are n! different sres to check (where n is the number of roots in the CODG), this can be very inefficient. It would be nice to be able to check whether a CODG is fullycompletionacyclic without having to generate all possible sres.
It turns out it suffices to check each pair of roots for a particular condition which is verifiable in polynomial time. Since there are n × (n  1) / 2 rootpairs and checking the condition can be done in polynomial time, it follows that fullcompletionacyclicity can in fact be checked in polynomial time.
The condition to be checked is as follows.
6.1 Definition Let r and s be two roots of some CODG G = (V, ch) that are unordered with respect to ssb(G). We say that r must precede s, noted r ↝ s, iff there exist vertices x and y standing in either (or both) of the following configurations with respect to r and s:
Here, the doublearrows represent the reachability (⇒) relation, not just parentchild relationships. The red doublearrow (with a stroke through it) means the complement of ⇒ (thus, in Configuration 1, s ⇏ x). It does not matter whether or not r ⇒ y (resp. s ⇒ x) in Configuration 1 (resp. Configuration 2). In other words, at least one of r ⇏ y and s ⇏ x must be the case. The dotted green arrow means that (x, y) ∈ (ssb − ⇒), in other words, that x precedes y without being an ancestor of it.
An instance of at least one of those configurations is found in each of the following CODGs:
6.2 Lemma Let G be a CODG, and r and s be two roots in G such that r ↝ s. Then, for each H that is a CA sre of G, (r, s) ∈ gsp(H).
Proof sketch. If there exists no CA sre of G, or if there are no two roots r and s in G such that r ↝ s, then the lemma is vacuously verified. Let thus H be any CA sre of G and, for the sake of contradiction, suppose r and s are two Groots such that r ↝ s and such that (r, s) ∉ gsp(H). Suppose x and y are two vertices as in Configuration 1 above (we prove only the case of Configuration 1; that of Configuration 2 is proved similarly).
By the definition of sre, if (r, s) ∉ gsp(H), then it must be the case that (s, r) ∈ gsp(H), and thus, by construction of ssb(H) (Definition 4.32), that (s, r) ∈ ssb(H). Hence, it follows by Lemma 4 of M2008 and the fact that r and s do not stand in ancestordescendant relationship (being both Groots, they are ⇒incomparable), that (r, s) ∈ sea(H). Similarly, from (x, y) ∈ ssb(H) and x ⇏ y, and y ⇏ x (because (x, y) ∈ ssb(H)), we conclude that (y, x) ∈ sea(H).
Now, by construction of sea(H) (Definition 4.34), and from the facts that r ⇒ x, that (r, s) ∈ gsp^{1}(H), and that s ⇏ x, we conclude that (x, s) ∈ sea(H). So, we have (y, x) ∈ sea(H) (established earlier), and (x, s) ∈ sea(H), and (s, y) ∈ sea(H) (because s ⇒ y). Thus, sea(H) is cyclic, contrary to our hypothesis that H is CA, and so we must reject the hypothesis that (r, s) ∉ gsp(H), and conclude that (r, s) ∈ gsp(H).
QED
6.3 Theorem A CODG is FCA iff it is CA and it does not have any two roots r and s such that r ↝ s and s ↝ r.
Proof sketch.
(⇒) Any FCA CODG is CA. If a CODG had roots r and s such that r ↝ s and s ↝ r, then by Lemma 6.2, it would have a cycle in ssb, and thus could not be CA.
(⇐) Let G be a CA CODG in which no two roots r and s are such that r ↝ s and s ↝ r. Note that by Lemma 5.13 and the fact that G is CA, we know that G is spconsistent and reduced, and will thus take this for granted.
We give an algorithm for constructing an ordering of the roots of G that can be used as the rootorder in a sre H of G which will be shown to be FCA. In the algorithm, ssb(G, ROR) denotes the result of building ssb as per Definitions 4.31 and 4.32, but using gsp(G) ∪ ROR instead of gsp(G)
Algorithm:

Let ROR = { (r, s)  r and s are roots in G and r ↝ s }.

Let SSB = ssb(G, ROR).

WHILE (∃ x, y ∈ V)[ch(x) = ch(y) & {(x, y), (y, x)} ∩ SSB = ∅]

Pick any x and y satisfying the WHILE clause, and let X = x, and Y = y.

Let ROR = ROR ∪ {(r, s)  r and s are roots in G, and r ⇒^{*} X, and s ⇒^{*} Y}.

Let SSB = ssb(G, ROR).

The intuition behind the algorithm is best conveyed with examples.
Essentially, the algorithm goes like this: start with the rootorderings that
are imposed by the topology of the CODG, i.e., those pairs of roots (r, s) for
which r ↝ s. Then, for the other rootpairs, they can basically be ordered
randomly, as long as no silly decision
is taken.
To see what silly decisions
would be, consider
Example 10. A silly decision would be to stick root r between s and t. To
avoid such decisions, the rootordering must be built gradually, considering
one by one (the order does not matter) the unordered pairs of parents of the
same children. For each such pair (x, y), decide of an arbitrary order, then,
make sure all roots reachable upwards from x come before all those reachable
upwards from y (there can be no root reachable both ways, otherwise, x and y
would have been ordered to start with).
Each time such rootorderings are added, the consequences on the global ordering of the CODG are recomputed and propagated down from the roots.
Example 11 provides a more intricate example, in which eight possible pairs of unordered parents of the same children exist: (u, w), (u, x), (v, w), and (v, x), and their inverse. Whichever pair is chosen, it will result in roots r and s being ordered at step 3b, and then, in all eight pairs being ordered at step 3c.
Examples 12 and 13 show that the addition of a leaf can cause some rootpairs to be ordered from the start: (r ↝ s) and (r ↝ t) in Example 12; (r ↝ s) in Example 13.
Proof sketch of termination: At each turn, at least one pair of the kind sought for in (3) is ordered. Indeed, it can be shown that the chosen pair (x, y) causes arcs to be added to ROR that will necessarily order the pair (x, y) itself. Thus, eventually, no pair satisfying the WHILE clause will remain.
Proof sketch that ROR orders all pairs of roots: The existence of an unordered rootpair implies that there is a pair satisfying the WHILE clause. Thus, when no more such pair exists, all the roots have been ordered.
Proof sketch that ssb(G, ROR) is acyclic: Any condition that might cause a cycle would also cause a cycle in { (r, s)  r and s are roots in G and r ↝ s }, a contradiction.
A rough analysis of the algorithm shows that its running time is polynomial (probably with degree at most 3 or 4). Obviously, it could be used to actually build a CA sre of any FCA CODG, and thus an ooserialization of the CODG. This establishes a polynomial upperbound on the task of verifying fullcompletionacyclicity and of generating an ooserialization of a FCA CODG. While interesting, we do not believe these upperbounds to be tight, and hence consider the exact complexity of these tasks to be open questions.
7. Conclusion and future work
In this paper, we defined a class of graphs, childarcordered directed graphs (CODGs), that includes multigraphs and nonacyclic graphs, and investigated the conditions under which a CODG is “ooserializable”, i.e., has a structure which corresponds to that of an overlaponly markedup document. We found that the property of completionacyclicity does not guarantee ooserializability in general for CODGs, by showing that there exist completionacyclic CODGs that are not ooserializable. By contrast, Marcoux has shown that for the less general class of nodeorderedDAGs (noDAGs), completionacyclicity does guarantee ooserializability M2008.
We then defined a condition strictly stronger than completionacyclicity, fullcompletionacyclicity, and showed that it does guarantee ooserializability for all CODGs.
Finally, we presented polynomialtime algorithms for checking fullcompletionacyclicity and for computing an ooserialization of fullycompletionacyclic CODGs. However, we do not believe these algorithms to be optimal. Thus, open questions include determining the exact complexity of — and finding optimal algorithms for — checking completionacyclicity, fullcompletionacyclicity, and of actually computing an ooserialization of a CODG once it is found to be serializable.
Another area of research we hope to pursue in the near future is investigating whether and how some forms of interrupted and virtual elements, as found in TexMecs, can be characterized in terms of graphs.
Appendix A. Notation and symbols
For the convenience of readers who find the notation used here unfamiliar, we list here the symbols and conventional variable names used in this paper.
↝ 
the mustprecede relation on roots of a graph G: r ↝ s if and only if there exist vertices x and y standing in some specific configurations with respect to r and s (see Definition 6.1). 
⇒_{G} 
the positive transitive closure of E(G), for any graph G; sometimes known as the reachability relation of G; often abbreviated to ⇒ when the identity of G is understood. 
⇒^{*}_{G} 
the reflexive transitive closure of E(G), for any graph G; often abbreviated to ⇒^{*} when the identity of G is understood. 
∈ 
is an element of. So 
∪ 
set union. So 
∩ 
set intersection. So 
ap(G) 
the ancestralprecedence relation of a graph G; abbreviated ap when G is understood. This is a binary relation consisting of all node pairs (v, w) for which (informally) v is an ancestor of w and not viceversa. 
ch 
(core meaning) a unary function that maps each node of a graph to a sequence of nodes of that graph. (secondary meaning) a twoargument function that maps from a pair (v, n) (where v is a node and n is an integer) to at most one node among the children of v. 
ch(v) 
for any node v in a graph G,
Note that sequences are typically modeled as
sets of pairs (n, e) where n is a number and e an
element of the sequence. The set of pairs
denoted by ch(v) can thus be treated as a function
from nonnegative integers to nodes in the graph:
for any node v and any suitable integer
n, 
ch(v, n) 
denotes the nth child (counting from 0) of node v. This is a shorthand form for the expression ch(v)(n). 
E(G) 
the set of arcs in a graph G 
G 
the conventional variable for a graph (here invariably a CODG) 
gsp 
a shorthand form of gsp(G) when the identity of the graph G is understood. 
gsp(G) 
the global siblingprecedence relation of a graph G;
written as 
N 
the natural numbers (0, 1, 2, ...) 
n, m 
conventional variables used to represent individual natural numbers 
r 
conventional variable used to denote an arbitrary range. 
r_{1} 
for a given range r, r_{1} denotes the first element of r. 
r_{2} 
for a given range r, r_{2} denotes the second element of r. 
r, s 
conventional variables used to denote two roots of a CODG. 
sea(G) 
the 
seaa(G) 
the 
sp 
a shorthand form of sp(G) when the identity of the graph G is understood. 
sp(G) 
the siblingprecedence relation of a graph G; this is a ternary relation consisting of all node triples (v, w, x) for which (informally) w precedes x among the children of v; note that if w occurs more than once among the children of v, then the triple (v, w, w) appears in sp. If some node x occurs between the two occurrences of w, then (v, w, x) and (v, x, w) are both in sp. 
sre 
singlerooted extension of some graph G. Note that any multirooted graph G has many singlerooted extensions (one for each possible ordering of the roots of G). 
ssb(G) 
the 
ssba(G) 
the 
V 
the set of nodes (or vertices) in a graph; a shorthand for V(G) 
V(G) 
the set of vertices in a graph G 
V × V 
the Cartesian product of V, the set of nodes; that is, the set of pairs (v, w) where v ∈ V and w ∈ V 
V × V × V 
the set of triples (v, w, x) where v, w and x are all ∈ V. 
v, w, x, u 
variables conventionally used for individual nodes in a graph (by convention, all of v, w, x, u ∈ V) 
References
[B1995] David Barnard, Lou Burnard, JeanPierre Gaspart, Lynne A. Price, C. M. SperbergMcQueen, and Giovanni Battista Varile. “Hierarchical Encoding of Text: Technical Problems and SGML Solutions”, in Computers and the Humanities, 29/3 1995, pp. 211231. http://www.springerlink.com/content/p7775247276v88h3/, http://xml.coverpages.org/barnardHierps.gz. doi:10.1007/BF01830617
[D2004] Steven DeRose. “Markup Overlap: A Review and a Horse”. Paper delivered at Extreme Markup Languages, 2004, Montréal. http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html
[HS2003] Claus Huitfeldt and C. M. SperbergMcQueen. TexMECS: An experimental markup metalanguage for complex documents. University of Bergen, January 2001, rev. October 2003. http://mlcd.blackmesatech.com/mlcd/2003/Papers/texmecs.html
[M2008] Yves Marcoux. Graph characterization of overlaponly TexMECS and other overlapping markup formalisms. Proceedings of the Balisage 2008 conference, 1215 august 2008, Montréal (Canada). doi:10.4242/BalisageVol1.Marcoux01
[M2012] Moore, Neil. Multihierarchical documents and finegrained access control (2012). Theses and DissertationsComputer Science. Paper 6. http://uknowledge.uky.edu/cs_etds/6
[SH2004] C. M. SperbergMcQueen and Claus Huitfeldt. GODDAG: A Data Structure for Overlapping Hierarchies. SpringerVerlag (2004). Preprint at http://cmsmcq.com/2000/poddp2000.html
[SH1999] C. M. SperbergMcQueen and Claus Huitfeldt: “Concurrent Document Hierarchies in MECS and SGML”, in Literary and Linguistic Computing, 14 1999, pp. 2942. http://llc.oxfordjournals.org/cgi/content/abstract/14/1/29. doi:10.1093/llc/14.1.29
[W2005] Andreas Witt. “Multiple Hierarchies: New Aspects of an Old Solution”, in: Stefanie Dipper, Michael Götze, and Manfred Stede (eds.), Heterogeneity in Focus: Creating and Using Linguistic Databases, vol. 2 of Interdisciplinary Studies on Information Structure (ISIS), Working Papers of the SFB 632. University of Potsdam, Germany, 2005. (Corrected reprint of an Extreme Markup 2004 paper.) http://www.sfb632.unipotsdam.de/publications/isis02_4witt.pdf
^{[1]} The authors thank Deborah A. Lapeyre and several anonymous Balisage peer reviewers for their extensive help in improving the presentation of this paper.
^{[2]} We focus here exclusively on graphs as the abstract structures conveyed by marked up documents.
^{[3]} A number of other equivalent encodings are of
course possible. The q
elements can be fragmented
in TEI style to signal that multiple XML elements together
make up a single logical unit. In a Goddag structure built
from this example, one would expect to find one node in the
graph for each logical q
element, rather one for
each q
element in the XML.
<div xmlns="http://www.teic.org/ns/1.0"> <p> <verse n="Jer.2.1">Moreover the word of the LORD came to me, saying,</verse> <verse n="Jer.2.2"> <q xml:id="QJer.2.2A" part="I"> Go and cry in the hearing of Jerusalem, saying, <q xml:id="QJer.2.2B" part="I"> Thus says the LORD: <q xml:id="QJer.2.2C" part="I"> I remember you, The kindness of your youth, The love of your betrothal, When you went after Me in the wilderness, In a land not sown. </q> </q> </q> </verse> <verse n="Jer.2.3"> <q part="M" prev="#QJer.2.2A"> <q part="F" prev="#QJer.2.2B"> <q part="F" prev="#QJer.2.2C"> Israel [was] holiness to the LORD, The firstfruits of His increase. All that devour him will offend; Disaster will come upon them, </q> says the LORD. </q> </q> </verse> </p> </div>
Or the verse
elements can be fragmented.
Or TEI virtual elements can be used to represent parts of
the document structure that do not fit neatly into a
hierarchy.
^{[4]} Readers who have not recently reviewed the definition of TexMecs may need to be reminded of some basics of TexMecs notation. The conventions used here are these; most are adopted from HS2003 but bilocation tags are new; they are introduced in order to allow the serialization of a larger class of graphs.

Start, end, and soletags for an element of type
e
take the forms<e
,e>
<e>
. 
Elements may overlap.

A unique identifier may be assigned to an element by following its generic identifier immediately with
@
and an ID value. These IDs are recognized and handled at the TexMecs level; they do not require declaration or applicationlevel semantics. 
The generic identifier may be omitted from start and endtags, in which case they mark a pseduoelement: an arbitrary portion of the document, typically marked this way in order to assign an identifier to that segment of the text. In the examples in this paper, we assume a convention that wrapping a text node in such an
anonymous
element does not create a new node but merely gives an identifier for the text node. (This is not stated normatively in the definition of TexMecs.) 
The notation
<^e^xyz>
marks avirtual element
, whose type ise
and whose children are those of the element whose ID isxyz
. Virtual elements thus serve as additional parents to nodes already present with other parents.The analogous notation
<^^xyz>
is used to refer to the pseudoelement whose ID isxyz
. 
The notation
<=xyz=>
is abilocation tag
, used to signal that the element whose ID isxyz
appears as a child of the immediately open elements, at the location indicated.Bilocation tags are not defined in HS2003; they are defined here in order to have a convenient notation for the graphs in this paper. When a bilocation tag appears in a TexMecs document, all the same parentchild arcs are created in the document graph as would be created were there a soletag at that location. But the target of the parentchild arcs is not a new element represented by a sole tag, but the element whose ID appears in the bilocation tag.
The example given here could be represented without bilocation tags, by adopting the convention that neither pseudoelements nor virtualelement references to pseudoelements create new nodes during parsing. Verse 3 would look like this using this convention.
<verse@Jer.2.3 <^^Jer.2.3a> says the LORD. verse>
^{[5]} This may sound like a fragile or errorprone convention, but it turns out to work well most of the time, and it makes the diagrams easier to read than attaching numbers to the arcs and drawing them with fewer crossings but out of order.
^{[6]} Example 4 is ooserializable in the TexMecs form
<poem <verse <quote~1 leaves <quote~2 fall verse>
quote~1> quote~2> poem>
.
Example 5 is not ooserializable, in brief, because
its EA (endsafter) relation has a cycle among the first
quote
element, the verse
element,
and the text node containing the word fall
.
Let us refer to them as q_{1}, v, and f for short.

q_{1} follows v among the children of the
poem
element, and thus q_{1} must end after v. 
v dominates f, and thus v must end after f.

f is not dominated by q_{1}, but it is dominated by a following sibling of q_{1} (namely, the second
quote
element), and thus f must end after q_{1}.
^{[7]} The curious reader may, however, be interested in the enumeration. We cannot give it in full, but we can sketch it here.
In Example 6, nodes a and e are nonterminals (which means they must correspond to elements in TexMecs) and nodes b, c and d are childless (which means they may correspond either to empty elements or to spans of character data in TexMecs, that is, to leaf nodes). One way to begin the enumeration is to observe that in overlaponly TexMecs the elements a and e will each have one start and one endtag. There are six possible orders for these four tags:

<a <e a> e>

<a <e e> a>

<a a> <e e>

<e <a a> e>

<e <a e> a>

<e e> <a a>
To each of these six patterns for the representation of nodes a and e there correspond 210 possible TexMecs documents with nodes b, c, and d interleaved among the tags for a and e. (Node b can be situated in any of five locations: before the first tag, after the first tag, after the second, after the third, after the fourth. The tag for node c can be placed in any of six locations (before or after any of the four start and endtags and node b); the tag for node d can be placed in any of seven locations. 210 = 5 × 6 × 7.) The beginning of the enumeration might look like this:

d c b <a <e a> e>

c d b <a <e a> e>

c b d <a <e a> e>

c b <a d <e a> e>

c b <a <e d a> e>

c b <a <e a> d e>

c b <a <e a> e> d

d b c <a <e a> e>

b d c <a <e a> e>

b c d <a <e a> e>

b c <a d <e a> e>

…

d b <a c <e a> e>

b d <a c <e a> e>

b <a d c <e a> e>

b <a c d <e a> e>

b <a c <e d a> e>

…
To each of these 210 interleavings of b, c, and d into the four start and endtags for a and e, there correspond eight TexMecs documents. Each of nodes b, c, and d may be either a text node or an empty element, so there are eight (two to the third power) combinations. The first pattern in the preceding list corresponds to the following eight TexMecs documents, and each of the other 210 expands similarly.

d c b <a <e a> e>

<d> c b <a <e a> e>

d <c> b <a <e a> e>

<d> <c> b <a <e a> e>

d c <b> <a <e a> e>

<d> c <b> <a <e a> e>

d <c> <b> <a <e a> e>

<d> <c> <b> <a <e a> e>
In total, then, there are 6 patterns for the start and endtags of a and e, 210 ways to interleave b, c, and d into those patterns, and 8 ways to realize each interleaving, for 10,080 (6 × 210 × 8) structurally possible ooTexMecs documents for this configuration of five nodes. Since each of the five nodes can be either a leaf node or a nonterminal, there are 32 possible configurations. These vary in their number of possible realizations, but we hope it is now clear why enumerating the distinct documents for fivenode graphs seems unlikely to be a helpful approach.