A Virtualization-Based Retrieval and Update API for XML-Encoded Corpora
Postdoctoral Fellow in Digital Humanities and High Performance Computing
Providing support for flexible automated tagging of text-oriented XML documents (i.e. text with interspersed markup) is a challenging problem. It requires tag-aware full-text search (i.e. the ability to skip some tags or to make whole sections of the document invisible), match points, and transparent updates. This paper describes an API that addresses this problem. Based on the virtualization of selected sections of the XML document, the API produces a tag-aware representation, backed by the document, that can be transparently searched (by keyword or regular expression) and updated, offering support for natural linguistic reasoning.
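As an illustration only, the virtualization idea can be sketched in Python with the standard `xml.etree.ElementTree` module. The class `VirtualText`, its methods `search` and `wrap`, and the `hidden` parameter are hypothetical names, not the API presented in this paper. The sketch assumes that markup is skipped while element content stays visible, that elements listed in `hidden` (here a `<note>`) are made invisible together with their content, and that an update only touches a match lying within a single text node.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    elem: ET.Element  # element owning the string
    slot: str         # "text" (inside elem) or "tail" (after elem)
    start: int        # offsets of the string in the virtual text
    end: int


class VirtualText:
    """Hypothetical tag-aware view over an ElementTree document.

    Markup is skipped, visible text/tail strings are concatenated, and each
    piece remembers which node it came from so edits can be written back.
    """

    def __init__(self, root, hidden=()):
        self.root = root
        self.hidden = set(hidden)  # tags whose whole content is made invisible
        self.refresh()

    def refresh(self):
        """(Re)build the virtual text and its offset map from the tree."""
        self.parent = {c: p for p in self.root.iter() for c in p}
        self.segments, self._pos = [], 0
        parts = []
        self._collect(self.root, parts)
        self.text = "".join(parts)

    def _collect(self, elem, parts):
        if elem.tag in self.hidden:
            return  # skip the subtree; its tail is handled by the parent
        self._add(elem, "text", parts)
        for child in elem:
            self._collect(child, parts)
            self._add(child, "tail", parts)  # text following the child

    def _add(self, elem, slot, parts):
        s = getattr(elem, slot)
        if s:
            start = self._pos
            self._pos += len(s)
            self.segments.append(Segment(elem, slot, start, self._pos))
            parts.append(s)

    def search(self, pattern):
        """Regex search over the virtual text; returns (start, end) match points."""
        return [m.span() for m in re.finditer(pattern, self.text)]

    def wrap(self, start, end, tag):
        """Write-back update: wrap virtual range [start, end) in a new element.

        Only ranges lying inside a single text node are handled in this sketch.
        """
        seg = next((g for g in self.segments
                    if g.start <= start and end <= g.end), None)
        if seg is None:
            raise ValueError("range spans several text nodes")
        s = getattr(seg.elem, seg.slot)
        a, b = start - seg.start, end - seg.start
        new = ET.Element(tag)
        new.text, new.tail = s[a:b], s[b:]
        setattr(seg.elem, seg.slot, s[:a])
        if seg.slot == "text":
            seg.elem.insert(0, new)  # becomes the first child of the owner
        else:
            parent = self.parent[seg.elem]
            parent.insert(list(parent).index(seg.elem) + 1, new)  # follows owner
        self.refresh()  # node structure changed; rebuild the offset map


doc = ET.fromstring(
    "<p>Dante <hi>Alig</hi>hieri wrote<note>see also</note> the Commedia.</p>")
view = VirtualText(doc, hidden={"note"})
print(view.text)                   # Dante Alighieri wrote the Commedia.
print(view.search(r"Alighieri"))   # [(6, 15)]
start, end = view.search(r"Commedia")[0]
view.wrap(start, end, "title")
print(ET.tostring(doc, encoding="unicode"))
```

Because the view concatenates text across the `<hi>` boundary, the search for `Alighieri` succeeds even though the word is split by markup, and the hidden `<note>` does not interrupt the sentence. Wrapping a match that crosses nodes would require splitting several text nodes at once, which this sketch deliberately rejects rather than guessing at a policy.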