SGF - An integrated model for multiple annotations and its application in a linguistic domain
Seamless integration of heterogeneous linguistic resources, which often differ in their output formats, and merging of their respective annotation layers are crucial tasks for linguistic research. After a decade in which development concentrated on formats for structuring single annotations for specific linguistic issues, a variety of specifications for storing multiple annotations over the same primary data has been developed in recent years. Among these approaches, three main architectures can be identified: Prolog-based architectures, XML-related approaches, and graph-based models that follow the XML syntax. However, each of these architectures has disadvantages when used in real-world applications. In the Sekimo project, the XML-based Sekimo Generic Format (SGF) was developed for storing multiple annotations on the same primary data and for examining relationships between elements of different annotation layers without prior conversion. SGF is based on the design principles of graph-based approaches but makes use of XML-inherent tree structures whenever possible to reduce processing costs. Data stored in SGF can be analysed with standard XML-related specifications such as XPath, XSLT or XQuery; in our project, such analyses are carried out in the linguistic application domain of anaphora resolution.
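The general idea of relating elements from different annotation layers over shared primary data can be illustrated with a minimal sketch. The XML layout below is purely hypothetical and is not the actual SGF schema: the element and attribute names (`corpus`, `primaryData`, `segment`, `layer`, `entity`, `antecedent`) are assumptions chosen for illustration. The sketch uses the limited XPath subset of Python's standard library to look up, for each anaphor in a coreference layer, the sentence of a separate syntax layer that contains it and the sentence that contains its antecedent.

```python
import xml.etree.ElementTree as ET

# Hypothetical standoff document: NOT the real SGF schema.
# Segments hold character offsets into the primary data (end is exclusive);
# each annotation layer refers to segments instead of wrapping the text.
SAMPLE = """<corpus>
  <primaryData>Peter sleeps. He snores.</primaryData>
  <segments>
    <segment id="s1" start="0" end="13"/>
    <segment id="s2" start="14" end="24"/>
    <segment id="s3" start="0" end="5"/>
    <segment id="s4" start="14" end="16"/>
  </segments>
  <layer name="syntax">
    <sentence segment="s1"/>
    <sentence segment="s2"/>
  </layer>
  <layer name="coreference">
    <entity id="e1" segment="s3"/>
    <entity id="e2" segment="s4" antecedent="e1"/>
  </layer>
</corpus>"""


def sentence_index(root, spans, entity):
    """Return the position of the sentence whose span contains the entity."""
    start, end = spans[entity.get("segment")]
    sentences = root.iterfind("layer[@name='syntax']/sentence")
    for i, sent in enumerate(sentences):
        s_start, s_end = spans[sent.get("segment")]
        if s_start <= start and end <= s_end:
            return i
    return None


def anaphor_pairs(xml_string):
    """List (anaphor text, antecedent text, sentence distance) triples."""
    root = ET.fromstring(xml_string)
    text = root.findtext("primaryData")
    # Map each segment id to its (start, end) character offsets.
    spans = {
        seg.get("id"): (int(seg.get("start")), int(seg.get("end")))
        for seg in root.iterfind("segments/segment")
    }
    entities = {
        e.get("id"): e
        for e in root.iterfind("layer[@name='coreference']/entity")
    }
    pairs = []
    for ent in entities.values():
        ante = entities.get(ent.get("antecedent"))
        if ante is None:
            continue  # entity has no antecedent, so it is not an anaphor here
        a_start, a_end = spans[ent.get("segment")]
        b_start, b_end = spans[ante.get("segment")]
        distance = sentence_index(root, spans, ent) - sentence_index(root, spans, ante)
        pairs.append((text[a_start:a_end], text[b_start:b_end], distance))
    return pairs


if __name__ == "__main__":
    print(anaphor_pairs(SAMPLE))
```

Because both layers point into the same segment inventory, the cross-layer query needs no conversion step between formats; a comparable query could presumably be phrased in XPath 2.0, XSLT or XQuery against a real SGF instance.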