SGF - An integrated model for multiple annotations and its application in a linguistic
domain
Maik Stührenberg
Daniela Goecke
Abstract
Seamless integration of various, often heterogeneous linguistic resources (in terms of
their output formats) and merging of the respective annotation layers are crucial tasks for
linguistic research. After a decade of concentration on the development of formats in order
to structure single annotations for specific linguistic issues, a variety of specifications
to store multiple annotations over the same primary data has been developed in recent
years. Among these approaches, three main architectures can be identified: Prolog-based
architectures, XML-related approaches and graph-based models that follow the XML syntax.
However, these architectures are not free of disadvantages when used in real-world
applications. In the Sekimo project, the XML-based Sekimo Generic Format (SGF) was developed for the purpose of
storing multiple annotations on the same primary data and examining relationships between
elements of different annotation layers without prior conversion. SGF is based on the
design principles of graph-based approaches but makes use of the XML-inherent tree
structures whenever possible to reduce processing costs. Data stored in SGF can be
analysed with standard XML-related specifications such as XPath, XSLT or XQuery; in
our project, this is done in the linguistic application domain of anaphora resolution.
Balisage: The Markup Conference 2008
August 12 - 15, 2008
Author's keywords for this paper: Concurrent Markup