Discontinuity in TexMecs, Goddag structures, and rabbit/duck grammars
C. M. Sperberg-McQueen
Member of the technical staff
World Wide Web Consortium / MIT
Claus Huitfeldt
Associate Professor (førsteamanuensis)
Department of Philosophy, University of Bergen
That the textual phenomena of interest for markup are not always hierarchically arranged is well known and widely discussed. Less frequently discussed is the fact that they are also not always contiguous, so that the units of our analysis cannot always correspond to single elements in the document. Various notations for discontinuous elements exist, but the mapping from those notations to data structures has not been well analysed or understood. And as far as we know, there are no standard mechanisms for validating discontinuous elements. We propose a data structure (a modification of the Goddag structure) to better handle discontinuous elements: we relax the rule that every pair of elements where one contains the other be related by a path of parent/child links. Parent/child links are then not an automatic result of containment. We conclude with a brief sketch of the issues involved in extending current validation mechanisms to handle discontinuity.
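As a rough illustration of the proposed relaxation, the following Python sketch models elements as nodes over a sequence of text leaves and keeps containment separate from parent/child links. It is only a toy model built on stated assumptions, not the Goddag structure as defined in the paper: the class `Node`, the functions `contains`, `is_contiguous`, and `has_path`, and the sample elements `s`, `q`, `attr`, and `w` are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical element node: a name, the leaves it covers, explicit children."""
    name: str
    leaves: frozenset  # indices of the text leaves this element covers
    children: list = field(default_factory=list)  # explicit parent/child links


def contains(a, b):
    """Assumed definition: a contains b if b's leaves are a subset of a's leaves."""
    return a is not b and b.leaves <= a.leaves


def is_contiguous(node):
    """True if the covered leaf indices form one unbroken run."""
    idx = sorted(node.leaves)
    return idx == list(range(idx[0], idx[-1] + 1))


def has_path(a, b):
    """True if b is reachable from a by following explicit child links."""
    stack = list(a.children)
    while stack:
        n = stack.pop()
        if n is b:
            return True
        stack.extend(n.children)
    return False


# Five text leaves, numbered 0..4. A quotation q is interrupted by an
# attribution at leaf 2, so q is a discontinuous element.
s = Node("s", frozenset({0, 1, 2, 3, 4}))
q = Node("q", frozenset({0, 1, 3, 4}))
attr = Node("attr", frozenset({2}))
w = Node("w", frozenset({3}))

# Parent/child links are stated explicitly; they are not derived from containment.
s.children = [q, attr, w]

print(is_contiguous(q))  # q covers two separate runs of leaves
print(contains(q, w))    # w's leaf lies inside q's leaves
print(has_path(q, w))    # yet no parent/child path links q to w
```

In this toy model `q` contains `w` by leaf coverage while no path of child links runs from `q` to `w`, which is the kind of configuration the relaxed rule is meant to permit.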