An XML user steps into, and escapes from, XPath quicksand
David J. Birnbaum
Professor and Chair
Department of Slavic Languages and Literatures, University of Pittsburgh
<djbpitt@pitt.edu>
Until recently, the admirable and impressive eXist XML database sometimes failed to optimize queries with
numerical predicates. For example, a search for $i/following::word[1] would retrieve all
<word> elements on the following axis and
only then apply the predicate as a filter to return only the first of them. This was
enormously inefficient when $i pointed to a node near the beginning of
a very large document, with many thousands of following
<word> elements. Lacking the Java
programming skills to write optimization code for eXist itself, the author, an end-user, describes two types of optimization implemented in the more
familiar XML, XPath, and XQuery. Both reduced the number of nodes that needed to be
accessed and thus improved response time substantially.
A subsequent optimization introduced by the eXist developers into the eXist code base is described in an addendum to this paper. Although this revision partially obviates the need for the work-arounds developed earlier, the analysis of the efficiency of various XPath approaches to a single problem continues to provide valuable general lessons about XPath.
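The behavior described above can be illustrated with a minimal XQuery sketch. It is not taken from the eXist code base or from the work-arounds discussed in this paper; the sample document, the <text> and <line> element names, and the variable names other than $i are assumptions made for illustration. It spells out the "retrieve everything, then filter" evaluation next to the query as written, to show that the two select the same node while touching very different numbers of candidates.

```xquery
xquery version "3.0";

(: Hypothetical sample document; only the <word> element name comes from the paper. :)
let $doc :=
    <text>
        <line><word>alpha</word><word>beta</word></line>
        <line><word>gamma</word><word>delta</word></line>
    </text>

(: $i points at a node near the beginning of the document. :)
let $i := ($doc//word)[1]

(: What the unoptimized evaluation amounted to:
   materialize every <word> on the following axis, then apply the predicate. :)
let $all := $i/following::word
let $naive := $all[1]

(: The query as written; an engine that optimizes numerical predicates
   could stop after locating the first match. :)
let $direct := $i/following::word[1]

return
    <result candidates="{count($all)}" same="{$naive is $direct}">{
        string($direct)
    }</result>
```

On this sample both expressions select the second <word> element. The candidates attribute reports how many nodes the unoptimized strategy would have to collect before filtering, a number that grows with the size of the document even though only one node is wanted.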