A Virtualization-Based Retrieval and Update API for XML-Encoded Corpora
Abstract
Providing support for flexible automated tagging of text-oriented XML documents (i.e., text with interspersed markup) is a challenging issue.
It requires tag-aware full-text search (i.e., the ability to skip some tags or to make whole sections of the document invisible), reporting of match points, and transparent updates.
This paper describes an API that addresses this issue.
Based on the virtualization of selected sections of the XML document, the API produces a tag-aware representation, backed by the document, that is transparently searchable (using keyword search or regular expressions) and updatable, thus supporting natural linguistic reasoning.
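The idea of a searchable, updatable view backed by the document can be illustrated with a small sketch. This is not the paper's API: the class `VirtualText`, its methods `search` and `replace`, and the `hidden` parameter are hypothetical names, and the sketch assumes Python's standard `xml.etree.ElementTree`. Tags are skipped by concatenating only character data, elements listed in `hidden` are made invisible, each regex match is mapped back to match points in the tree, and an update on the virtual string is written through to the underlying text nodes.

```python
import bisect
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    """One run of character data: an element's .text or its .tail."""
    node: ET.Element
    attr: str    # "text" or "tail"
    start: int   # offset of the run in the virtual string
    length: int


class VirtualText:
    """Hypothetical tag-aware view of an element: markup is skipped, subtrees
    whose tag is in `hidden` are invisible, and updates write through to the tree."""

    def __init__(self, root: ET.Element, hidden: frozenset = frozenset()):
        self.root = root
        self.hidden = hidden
        self.refresh()

    def refresh(self) -> None:
        """(Re)build the virtual string and the run map from the backing tree."""
        self.segments: list[Segment] = []
        parts: list[str] = []
        pos = 0

        def add(node: ET.Element, attr: str) -> None:
            nonlocal pos
            value = getattr(node, attr)
            if value:  # ignore None and empty runs
                self.segments.append(Segment(node, attr, pos, len(value)))
                parts.append(value)
                pos += len(value)

        def walk(elem: ET.Element) -> None:
            if elem.tag in self.hidden:
                return  # invisible subtree (its tail is added by the parent)
            add(elem, "text")
            for child in elem:
                walk(child)
                add(child, "tail")  # a child's tail is content of elem

        walk(self.root)
        self.value = "".join(parts)
        self.starts = [s.start for s in self.segments]

    def locate(self, pos: int) -> tuple[ET.Element, str, int]:
        """Map a virtual offset (0 <= pos < len(value)) to (node, attr, offset)."""
        seg = self.segments[bisect.bisect_right(self.starts, pos) - 1]
        return seg.node, seg.attr, pos - seg.start

    def search(self, pattern: str):
        """Yield (start, end, match_points) for each non-empty regex match;
        the match points locate the first and last matched characters."""
        for m in re.finditer(pattern, self.value):
            if m.end() > m.start():
                yield m.start(), m.end(), [self.locate(m.start()), self.locate(m.end() - 1)]

    def replace(self, start: int, end: int, new: str) -> None:
        """Write-through update of the virtual span [start, end), start < end.
        `new` goes into the first run the span touches; the rest of the span is
        cut from the following runs, so the surrounding markup is kept."""
        first = True
        for seg in self.segments:
            seg_end = seg.start + seg.length
            if seg_end <= start or seg.start >= end:
                continue  # this run does not overlap the span
            lo = max(start, seg.start) - seg.start
            hi = min(end, seg_end) - seg.start
            text = getattr(seg.node, seg.attr)
            setattr(seg.node, seg.attr, text[:lo] + (new if first else "") + text[hi:])
            first = False
        self.refresh()


root = ET.fromstring("<p>afr. <b>abri</b><b>cot</b> <note>n. 1</note>fruit</p>")
view = VirtualText(root, hidden=frozenset({"note"}))
print(view.value)  # afr. abricot fruit

for start, end, points in view.search("abricot"):
    # the match spans two <b> elements; points locate its first and last characters
    print(start, end, [(n.tag, attr, off) for n, attr, off in points])
    # 5 12 [('b', 'text', 0), ('b', 'text', 2)]

view.replace(5, 12, "ABRICOT")
print(ET.tostring(root, encoding="unicode"))
# <p>afr. <b>ABRICOT</b><b /> <note>n. 1</note>fruit</p>
```

Putting the whole replacement into the first run is only one possible policy; it empties the second `<b>` element rather than redistributing the new text across the original markup boundaries. How the described API resolves such cases is not stated here.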
Balisage: The Markup Conference 2010
August 3 - 6, 2010
Author's keywords for this paper: XML; corpus; API; text; retrieval; update; algorithm; virtual; virtualization; string; context