An XML engine to model and query multimodal concurrent linguistic annotations

Application to the OTIM Project

Julien Seinturier

Post-Doctorate

Laboratoire LSIS UMR CNRS 6168, Université du Sud Toulon et Var

Elisabeth Murisasco

Professor

Laboratoire LSIS UMR CNRS 6168, Université du Sud Toulon et Var

Emmanuel Bruno

Lecturer

Laboratoire LSIS UMR CNRS 6168, Université du Sud Toulon et Var

Copyright © 2011 by the authors. Used with permission.

expand Abstract

expand Julien Seinturier

expand Elisabeth Murisasco

expand Emmanuel Bruno

Balisage logo

Proceedings

expand How to cite this paper

An XML engine to model and query multimodal concurrent linguistic annotations

Application to the OTIM Project

Balisage: The Markup Conference 2011
August 2 - 5, 2011

Introduction

In this paper, our intention is to present an XML engine defined to model and query multimodal concurrent annotated data. This work stands in the context of the OTIM (Tools for Multimodal Annotation) project which aims at developing conventions and tools for multimodal annotation of a large conversational French speech corpus (http://aune.lpl.univ-aix.fr/~otim/) [[Blache 2010b]]. This interdisciplinary project is funded by the French ANR agency; it groups together Social Science and Computer Science researchers. Within OTIM, our objective is to provide linguists with a unique framework to encode and manipulate numerous linguistic domains: morpho-syntax, prosody, phonetics, disfluencies, discourse, gesture and posture [[Blache 2010]]. For that, it has to be possible to bring together and align all the different pieces of information (called annotations) associated to a corpus.

Since some years, several works have studied concurrent markups/annotations associated to the same data, in particular in the context of XML documents. These documents are usually called multistructured. Indeed, XML documents are mainly hierarchical. The hierarchy, captured in a tree-like structure corresponds to one level of analysis of the data contained in the document. Concurrent markup opens a way for dynamically linking and aggregating documents with different structures associated to the same data. The CONCUR feature of SGML [[Goldfarb 1990]] first pointed out this need in the nineties in context of document-centric encoding where some applications needed to consider more than one hierarchy over the same text.

The main problem with concurrent structures is that merging every hierarchy in an unique XML document implies overlapping: structures cannot be merged in order to get a well-formed XML document without using a flat representation or hyperlinks that make the structure difficult to query with standards XML languages like XPath (http://www.w3.org/TR/xpath/), XQuery (http://www.w3.org/TR/xquery/) or XSLT (http://www.w3.org/TR/xslt/). See [[Witt 2004]][[DeRose 2004]] for a review about the problem of overlapping and about multiple hierarchies. Another approach is to keep one document by hierarchy and to solve the problem of the concurrent querying. We stand our work in this last approach: we want to keep each structure safe to use available XML tools and languages for its manipulation.

Main proposals are usually classified in three categories [[Bruno 2006]][[Portier 2010]].

Historical solutions which propose syntactic solutions for the representation of multiple hierarchies in the same text, like the CONCUR feature [[Hilbert 2005]] of SGML or TEI specifications [[TEI P4]] (e.g. a flat representation or a main (hierarchical) structure and the use references (ID/IDREF) for the description of the other structures). These solutions make impossible querying by means of standards XML languages.

Proprietary graph based-model (and possible alternative syntax to XML) like LNML [[Tennison 2002]], TexMecs[http://decentius.aksis.uib.no/mlcd/2003/Papers/texmecs.html], XCONCUR [http://www.xconcur.org/], Annotation graphs [[Bird 2001]] (coming from the linguistic domain coupled to specific extension of XPath) or MVDM [[Djemal 2008]]. They allow the concurrent markup relying on a granule of data common to all the structures. Nevertheless, hierarchical structures are easier to exploit compared to the graph structure which remain complex when a large number of structures exist. XML syntax for serialization or import/export XML syntax do not make easy their querying by XML standard languages.

Finally, the third category is XML compatible contributions; they generally propose extensions of XML data model (XDM) (http://www.w3.org/XML/Datamodel.html) to consider a set of XML trees [[Le Maitre 2006]][[Jagadish et al. 2004]] or several trees sharing their leaves [[Dekhtyar 2005]][[Bruno 2007]][[Bruno 2007b]] [[Chatti 2007]]. For querying, these proposals define extensions of XPath or XQuery in order to navigate between different structures either by extending the notion of step in XPath [[Jagadish et al. 2004]] or by adding new axis [[Dekhtyar 2005]]. Another solution is to extend the semantics of the XQuery filter [[Bruno 2006]] to use as much as possible an unchanged XQuery structure. Adding Allen’s relations [[Allen 1991]] by means of function definitions enable to deal with overlapping [[Bruno 2006]][[Chatti 2007]].

Besides, some linguistic projects have a similar objective than OTIM, for instance NITE (http://groups.inf.ed.ac.uk/nxt/), AGTK (http://weblex.ens-lsh.fr/projects/xitools/logiciels/AGTK/agtk.htm), ATLAS [[Bird et al. 2000]], PAULA (http://www.sfb632.uni-potsdam.de/~d1/paula/doc/), XStandoff [[Sperberg-McQueen 2000]]. These projects rely on graph-based model. They generally propose toolkits for multi-level annotation by means of libraries of data and annotation management. The multiplication of annotation schemes and coding formats is a severe limitation for interoperability. One solution consists in developing higher level approaches (e.g. GrAF [[Ide 2007]]) or annotation graphs based formats on top of which conversion routines between tools can be developed (see the Atlas Interchange Format [[Sthrenberg 2009]]). However, these experiments still remain very programmatic.

Interoperability of linguistic annotated resources requires over all to be independent from the coding format. This means to specify and organize the information to be encoded independently from the constraints or restriction of the format (or the annotation tool), then to encode the information into an standard XML format, readable whatever the edition or annotation system. Moreover we have made the choice not to provide new anootation tools for linguists as they have a long experience and they are efficient with their own tools.

This paper is organised as follows: the context and motivations are presented, then we describe the project objectives and its functional architecture. A visual UML representation of the linguistic knowledge is proposed before the OTIM XML representation and Java framework are explicited. The end of the work shows the methods for querying and managing linguistic annotations within the XML engine and the implementation of the whole corpus construction and management before a conclusion.

Context and motivations

The OTIM project can be summarized in two main steps.

The first step concerns the multimodal annotation of a conversational speech between two persons. It is under the responsibility of linguists; annotation is done according to different levels of linguistic analysis. Each expert has to annotate the same data flow according to its knowledge domain and the nature of the signal on which he annotates (signal transcription or signal). Experts generally use dedicated tools like PRAAT http://www.fon.hum.uva.nl/praat/, ANVIL http://www.anvil-software.de/ or ELAN http://www.lat-mpi.eu/tools/elan/. The qualifier multimodal is due to the nature of the studied corpus which is composed of text, sound, video.

Within the project OTIM, linguists propose an encoding for annotating spoken language data, with the acoustic signal as well as its orthographic transcription. They have chosen to use Typed Feature Structures [[Carpenter 1992]][[Copestake 2003]] (TFS) to represent in an unified view the knowledge and the information they need for annotation. TFS representation is usual for linguists: it aims at normalizing, sharing and exchanging annotation schemas between experts. Linguistic annotation tools rely on native and not often open formats which are not directly interoperable. TFS provides an abstract description using a high level formalism independent from coding languages and tools.

The second step concerns the representation and manipulation of multimodal annotation. To analyze and find correlations between annotated linguistic domains, it is necessary to consider them grouped together: it implies the definition of a formal model for describing and manipulating them in a concurrent way. The main difficulty in defining a data model comes from the heterogeneity and the distribution of the resources. Concurrent manipulation consists in querying annotations belonging to two or more modalities or in querying the relationships between modalities. For instance, we need to be able to express queries over gestures and intonation contours (what kind of intonation contour does the speaker use when he looks at the listener?) and to query temporal relationships (in terms of anticipation, synchronization or delay) between both gesture strokes and lexical affiliates. The results of queries could be useful to help in constructing new annotations or to extend existing ones.

The paper focuses on the second step. It describes an XML engine based on an architecture dedicated to the construction and the exploitation of multimodal annotated linguistic corpus. Our theoretical standpoint being to share data and resources, we will use open standards from the XML universe (see http://www.w3.org).

Linguistic knowledge is captured by means of three types of information:

  • Properties: the set of characteristics of an object. An object is a type of information to be annotated in the corpus ,

  • Relations: the set of relations that an object has with other objects,

  • Constituents: complex objects are composed of other objects called constituents.

Typed Feature Structures proposes a formal presentation of each object in terms of feature structures and type hierarchies: properties are encoded by features, constituency is implemented with complex features, and relations make use feature structure indexing; each linguistic domain is represented as a hierarchical model.

Figure 1: TFS of the prosodic domain

png image ../../../vol7/graphics/Seinturier01/Seinturier01-001.png

TFS description of the prosodic domain within the OTIM project.

For example, Figure 1 graphically describes TFS representation of the prosodic domain. Notice that every feature of the domain related to signal is a sub-feature of the OtimObject that is constituted of an INDEX feature in order to be referred and a LOCALISATION feature that represents an interval, which boundaries are defined by the features START and END, with temporal value (usually milliseconds). Prosodic phrases are of two different types: ap (accentual phrases) and ip (intonation phrases). Accentual phrases are constituted of two appropriate features: the LABEL, which value is simply the name of the corresponding type, and the list of CONSTITUENTS, in this case a list of syllables. The features of type ip contain the list of its CONSTITUENTS (a set of ap) as well as the description of its CONTOUR which is a prosodic event, situated at the end of the ip and is usually associated to an ap. The prosodic phrases are formally defined as set of syllables. A syllable (syl) is constituted of features: STRUCT that describes the syllable structure (for example CVC, CCVC, etc.), the position of the syllable in the word (POSITION), its possibility to be accented or prominent (resp. ACCENTUABLE, PROMINENCE). Features of type const_syl, contains two different features: a set of phonemes, denoted PHON, and the type of the constituent (onset, nucleus and coda), denoted CONST_TYPE. Note that each syllable constituent can contain a set of phonemes.

Objectives and functional architecture

TFS is well suited to take into account the heterogeneous characteristics of annotated data. TFS provides an abstract description using a high level formalism independent from coding languages and tools. Nevertheless, due to its theoretical nature, such a representation cannot be used within an applicative framework and has to be implemented into other formalisms.

In this context, our contributions are the following:

  • The representation of the knowledge expressed by means of TFS in an XML formalism. We base our proposal on XDM (XML data model, http://www.w3.org/XML/Datamodel.html),

  • The automatic construction of XML annotated multimodal linguistic corpus. We define an operational multimodal data processing from TFS and (semi) automatic procedures to convert legacy data and annotations to the XML formalism,

  • The concurrent querying of multilevel annotated linguistic data represented in the XML formalism. We use the XML Query Language XQuery that we have extended to multistructured XML documents [[Bruno 2006]], in particular in order to take into account Allen’s temporal relations.

In summary, from the organization of annotations in terms of TFS, we automatically generate a XML schema [http://www.w3.org/XML/Schema] for each linguistic domain. All the annotations are then encoded following this schema and data are represented in standard XML document. This representation provides a high level of modularity for applicative requirements and enables to modify only the structure / schema needed and it can be queried by our XQuery extension which supports Allen’s relations. Notice that the TEI consortium has proposed guidelines for implementing feature structures in XML (http://www.tei-c.org/Guidelines/P5/) however, we have decided not to encode TFS this way because type hierarchies and inheritance are not easily and directly represented and no querying support has be defined. This engine, from TFS to XML data, based on formalisms independent from linguistic coding languages and tools is an element of answer to the question of interoperability.

Figure 2: The OTIM Framework

png image ../../../vol7/graphics/Seinturier01/Seinturier01-002.png

The OTIM project applicative framework.

Figure 2 shows the functional architecture of our XML engine. It is composed of three parts:
  • Linguistics tools and formalisms chosen by linguists in the OTIM project,

  • An XML information system dedicated to the representation and querying of XML annotated multimodal corpus,

  • A Java API that interfaces with both XML information system and linguistic specific annotation tools.

Our approach guarantees that no information is lost when translating one format into the target formalism. However, the Java / XML framework does not provide linguists with a visual representation equivalent to the graphical representation of TFS. Therefore, we have chosen to make a preliminary work within the representation process that leads to XML / Java. This work is the representations of TFS by means of UML diagrams. This representation has two advantages: first it is standard, secondly there is many friendly and ergonomic tools that enable to build UML diagrams like Omondo (http://www.omondo.com/) or Poseidon (http://www.gentleware.com/).

UML Representation

As we have already said, TEI consortium proposes guidelines for implementing TFS in XML (http://www.tei-c.org/Guidelines/P5/). However, OTIM consortium has decided not to encode TFS this way for several reasons [[Blache 2010]]. First, there is a need of representation for type hierarchies and inheritance. Secondly, it is not realistic, or even possible, to encode all information by means of a TFS, that rapidly becomes huge and intractable. This last argument is important in the perspective of interoperability: annotation tools mainly focus on properties annotation (the encoding of object characteristics). Only some of them propose solution for constituency representation (e.g. in terms of primary/secondary tracks). None implement typing machinery. We think then preferable a decentralized representation in which objects are represented separately, their organization in hierarchy being encoded independently from their properties. Such an encoding offers the advantage to be close to the traditional way of encoding annotations without losing the richness of TFS representation.

OTIM UML representation relies on two different views. Static one describes the static structure in terms of objects, attributes, operations and relationships. This view includes class diagrams. Dynamic view emphasizes the dynamic behavior of a system. One of its advantages is that the language includes a set of graphical notation techniques to create and share intuitive visual models. We can represent a TFS description by a set of UML class diagrams by means of the following mapping:

  • to each complex TFS corresponds a class,

  • to each TFS atomic attribute corresponds a class attribute,

  • the inheritance relationship defined on the TFS is represented by an inheritance relationship between classes,

  • the constituency relationships between TFS are represented by aggregation relationships between classes,

Figure 3: The phonetics domain

png image ../../../vol7/graphics/Seinturier01/Seinturier01-003.png

UML representation of phonetics domain.

Figure 3 shows the UML representation of phonetics and prosody domains. This graphical representation provides a global standard view of the two domains and a suitable way for experts to share their knowledge.

XML / Java representation

From TFS to XSchema

XML representation of the knowledge and data within the OTIM framework relies on XSchema. We generate an XSchema for each linguistic domain (for example phonetics and prosody). Data are then represented in different documents validated by the associated XSchemas. This method is motivated by two important points already said:

  • this representation provides a high level of modularity for applicative requirements and enables to modify only the structure / schema needed

  • recents works on multistructured documents open a way for dynamically linking and aggregating various documents with different structure [[Bruno 2006]].

Within such a framework, it is possible to deal with a TFS-like XML representation and to process data as standard XML documents.

For the sake of simplicity, we call OTIM XML our set of XML schemas constructed from TFS. From a linguistic point of view, this set can be seen as the third component mentioned as a perspective in [[Schmidt et al. 2009]]. This component, besides a basic encoding of data exportable into linguistic standard AIF (Atlas Interchange Format) [1], encodes every information concerning the organization as well as the constraints on the structures. In the same way, as TFS are used as a tree description language in theories such as HPSG (Head-Driven Phrase Structure Grammar, http://hpsg.stanford.edu/), the XML schema generated from our TFS representation also plays the same role with respect to the XML annotation data file. On the one hand, basic data can be encoded with AIF, on the other hand, the XML schema encodes all higher level information. Both components (basic data + structural constraints) guarantee against information lost that otherwise occurs when translating from one coding format to another (for example from Anvil to Praat).

The creation of the XSchema from the linguistic knowledge is done from the TFS or the UML representation if it is available. The first step consists in representing all the information available within TFS. Each feature of a TFS appears in the associated XSchema:

  • feature types encoding atomic type are represented with a simple type (integer, string, ...),

  • composite feature types are represented with complex types.

From type definitions, a feature is represented by an XML element. The two relations of the TFS formalism (hierarchy and constituency) are represented respectively by type hierarchy and element aggregation. This representation guarantees that every information described within OTIM XML formalism can be represented by XML documents. Appendix A shows the complete XSchema of the prosodic domain. It shows all required components for annotating the prosody of a dialog within the OTIM project.

The choice of constructing one XSchema by linguistic domain is motivated by two reasons. The OTIM project crosses several domains but a significant part of the data processing is made within each domain. Independence between schemas implies that expert only has to work with XML documents containing the needed information. Moreover, as the knowledge on the domains evolves, a modular approach of the formal representation enables to be more reactive and efficient for updating or revising the XSchemas according to the TFS updating.

From raw annotations to XML documents

XSchemas enable to represent the knowledge of the linguistic domains and to describe the complex structure of data but linguists work with specific tools (like PRAAT or ELAN) that provide data as raw text files or specific XML documents that are not valid with our schemas. Generating valid OTIM XML documents relies on a transformation from the outputs of linguistic tools to the OTIM XML representation.

Documents generation relies on the expression of the link between the data present within the annotation tool outputs and the value of the elements of the OTIM XML documents. This link is called Annotation Scheme and describes what kind of information is contained within raw annotation files and what element is attached to within the OTIM XML document. The annotation scheme also describes good practices to experts for a consistent and repeatable annotation. The construction of a set of valid XML documents is made by processing the raw annotation files with the annotation scheme.

A Java framework for managing linguistic data

The OTIM project aims to fully annotate an 8 hours long corpus of recorded dialog. The raw annotation outputs from linguists' specific tools have to be processed in an efficient way with respect to the OTIM XML description proposed. As linguists are very familiar with their annotation tools (PRAAT, ANVIL, ...), it was not an option to provide new tools from scratch to directly make annotations within OTIM XML representation. Moreover, the diversity of the tools would make difficult to provide and maintain extensions for them in order to conform with the OTIM XML. A solution for providing data processing capability that can deal with both raw annotation files and OTIM XML document is an applicative framework composed of specific modules.

We made the choice of developing a Java framework for OTIM that is composed of two main parts (see Figure 2):

  • an implementation of the OTIM XSchemas within Object paradigm,

  • a set of modules that can interface with both raw annotation files and XML documents.

The first part of the framework is distributed as OTIM API and enables to deal with linguistic annotations with respect to OTIM XSchemas (and also TFS). The link between XML schemas and Java implementation is guaranteed by using the Java Architecture For XML Binding (JAXB) [2]. This technology provides method for automatically generating Java classes from a set of XSchemas. Using JAXB makes implicit and always consistent the link between XSchemas and their Java representation. When a schema is modified, the associated Java classes are automatically regenerated. In most case, a minor change within XSchemas only involves an automatic update of the Java classes (changing an element name, changing a value, ...). When the changes are important: the structure of an XSchema or the annotation scheme are modified, the update of the Java classes need to be manually performed as implementation can be heavily modified. This can be seen as a limit but this kind of changes are rare in a standardidation context.

The second part of the framework is a set of I/O modules that provide data exchange capabilities between OTIM XML documents and raw annotations files. The most important work of these modules is to generate OTIM XML documents from raw annotations. This work is done by parsing annotations and interpreting them against the annotation scheme. Another important need is to reinsert annotations that are expressed within OTIM XML format into a specific annotation tool. The I/O modules enable to export from XML documents to PRAAT or ANVIL tools and so, linguists can see annotations performed by other teams on other domains within their own tools. This capability can be seen as a partial response to interoperability between different annotation tools and enable to make a full pipeline from the annotation step to information system.

Querying and managing annotations in XML

The OTIM project aims to provide a fully annotated corpus represented in XML which can be queried by the standard XQuery language. Annotations can be used in various ways: simple viewing, domain crossing or reinsertion within standard tools. Every need of access to annotations relies on a query on the corpus. We first present general queries on the corpus and then the OTIM specific queries related to time and temporal relations between annotations. These last queries need an extention of XQuery with temporal capabilities.

General queries

Queries are classified into two groups: those involving only a single domain and those taking into account several domains. In addition, for each group we distinguish between filtering queries and queries requiring the construction of new XML documents.

Filtering queries enables to extract a subset of data in the corpus. Filtering on a single domain produces a subset of the involved OTIM XML documents. For example, the following query filters the phonemes from the phonetic annotations (described within the document phonetics.xml) by getting only those that are tagged with the label "A":

for $phoneme in fn:doc("phonetics.xml")/SyntacticPhrase/words/word/phonemes/phoneme[@label="A"]
return $phoneme
Such a quite simple query is very important for linguists as it enables to target only annotations with specific characteristics. One another example of query is the filtering of phonemes according to their temporal anchoring. The following query extracts each phoneme that verifies a condition on its temporal bounds:
for $phoneme in fn:doc("phonetics.xml")/SyntacticPhrase/words/word/phonemes/phoneme
where ($phoneme/TimeInterval/start/@time > 250) and ($phoneme/TimeInterval/end/@time < 500) 
return $phoneme
Filtered fragments can be reorganised in the return clause specifying the structure of the result to match linguistic specific needs. For example, the following query builds an empty element A (searched label) for each filtered phoneme computing its duration:
for $phoneme in fn:doc("phonetics.xml")/SyntacticPhrase/words/word/phonemes/phoneme
where ($phoneme/TimeInterval/start/@time > 250) and ($phoneme/TimeInterval/end/@time < 500) 
return <A duration='{$phoneme/TimeInterval/end/@time - $phoneme/TimeInterval/start/@time}'/>
   
Finally, the following query shows an example of concurent querying based on phonetics and prosody domains. For each accentual phrase (ap) with at least 6 syllables the query builds an element phoneme which is the first phoneme of the phrase as their share same start time (see inner where clause).
for $ap in fn:doc("prosody-IP_AP.xml")/TurnConversationalUnit/phrases/ip/constituents/ap
where count($ap/syllables/syllable) > 6
return 
  <phoneme id='{$ap/@id}'>{$ap/label}
  {for $phoneme in fn:doc("phonemes.xml")/phonemes/phoneme
   where $phoneme/TimeInterval/start/@time = $ap/TimeInterval/start/@time return $phoneme}
</phoneme>
   

Time related queries

Among the queries shown above, some take into account the notion of time. For a corpus of annotations aligned with a signal such as video, time is indeed the dimension with which the studied objects are related. However, even if the time can be taken in account in an absolute way without extending XQuery, the use of temporal relation (such as relations described within Allen algebra) cannot be expressed. A relative expression of time position (before, after, during, ...) is a critical need when linguists process the corpus for determining some complex object (before, meets, overlaps, starts, during, finishes and equal and symetrics).

We have extended XQuery by adding functions that implement Allen's relations within the OTIM XML representation. By inheritance, every object description contains the following structure that enables temporal anchoring:

<xs:complexType name="timeInterval">
  <xs:sequence>
    <xs:element name="start" type="timePoint" minOccurs="0"/>
    <xs:element name="end" type="timePoint" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="timePoint">
  <xs:sequence/>
    <xs:attribute name="time" type="xs:double" use="required"/>
</xs:complexType>
	
<xs:complexType name="Object">
  <xs:sequence>
    <xs:element name="index" type="xs:integer"/>
    <xs:choice minOccurs="0">
      <xs:element ref="TimeInterval"/>
      <xs:element ref="TimePoint"/>
    </xs:choice>
  </xs:sequence>
</xs:complexType>

Allen's function are binaries boolean functions that take in parameter a TimeInterval or a TimePoint XML fragment and that return true if the given relation is verified or false otherwise.

Extending XQuery with specific time functions for OTIM XML representation is a solution that satisfies linguists need for expressiveness. However, due to TFS initial representation, OTIM XML representation is not optimized for processing queries that are significantly based on time. Their use cannot be made ​​on an entire corpus. Among the possible solutions to solve this problem, an explicit representation of time relationships in separate XML documents or an ontological approach (OWL) can be considered.

Implementation

The OTIM framework is implemented within a Java / XML framework. The JAXB technology provides a strong and consistent link between XML Schema and Java representations. The XML management and querying is hosted within one eXist (http://exist.sourceforge.net/) server for web application or within an embedded eXist module for a standalone utilization. At this time, 15 minutes of the corpus have been annotated for 4 domains (phonetics, prosody, syntax, disfluency). The size of an XML document representing the 15 minutes can vary from 150 up to 500 Mo.

Conclusion

In this paper, our intention was to describe an XML engine dedicated to the representation and the concurrent querying of multimodal linguistic annotations implying to analyze and to find correlations between annotated linguistic domains.

From the signal and the signal transcription of a conversational speech between two persons, experts annotate data according to different linguistics domains like prosody, phonetics, morpho-syntax, discourse or gesture/posture. From the TFS provided by linguists that describe the knowledge of the studied domains, from legacy data and annotations, we automatically build an XML corpus which can be queried by means of an extension of XQuery, in a concurrent way. A java framework has been provided and enables to interface with both linguistic tools and OTIM XML information system. With this framework, the problem of the interoperability is solved for all the involved tools (PRAAT, ANVIL, ELAN) and domains within the OTIM project. Moreover, our proposal is based on XML standards.

Nevertheless, we can notice that our XML solution cannot capture the semantics of linguistic knowledge. Moreover, TFS expressivity is limited, for example for temporal relations. Object anchoring is absolute: interval boundaries are represented by the features start and end, with temporal value (an object can also be situated by means of a point). It would be useful to have a relative anchoring. Another limit is due to the underlying model of TFS which is a Directed Acyclic Graph (DAG). When linguists need to annotate co-references or disfluencies which are organized around objects, it would be useful to have an object anchoring which is conflicting with the acyclic graph.

We currently work to propose a knowledge representation formalism which be an alternative to TFS: an ontological approach based on Description Logics [[Baader et al. 2003]] (DL) and on semantic web technologies for the development of a linguistic Knowledge-based Information System. We have already defined a linguistic ontology from the TFS provided by linguists for prosody, phonetics, lexical and disfluency domains. An applicative framework is under development, it is based on semantic web proposals such as OWL DL (Ontology Web Language5) for the representation of this ontology and SPARQL [[Prud’hommeaux 2007]] the querying language of semantic web for its manipulation.

Appendix A. XSchema of the Prosodic domain

<xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="Syllable" type="syllable"/>

  <xs:element name="TimeCompound" type="timeCompound"/>

  <xs:element name="TimeInterval" type="timeInterval"/>

  <xs:element name="TimePoint" type="timePoint"/>

  <xs:element name="TurnConversationalUnit" type="turnConversationalUnit"/>

  <xs:complexType name="turnConversationalUnit">
    <xs:complexContent>
      <xs:extension base="Object">
        <xs:sequence>
          <xs:element name="phrases" minOccurs="0">
            <xs:complexType>
              <xs:sequence>
                <xs:choice minOccurs="0" maxOccurs="unbounded">
                  <xs:element name="ip" type="IntonationalPhrase"/>
                  <xs:element name="ap" type="AccentualPhrase"/>
                  <xs:element name="pp" type="ProsodicPhrase"/>
                </xs:choice>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="Object">
    <xs:sequence>
      <xs:element name="index" type="xs:int"/>
      <xs:choice minOccurs="0">
        <xs:element ref="TimeInterval"/>
        <xs:element ref="TimePoint"/>
        <xs:element ref="TimeCompound"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="IntonationalPhrase">
    <xs:complexContent>
      <xs:extension base="ProsodicPhrase">
        <xs:sequence>
          <xs:element name="constituents" minOccurs="0">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="accentual_phrase" type="AccentualPhrase" minOccurs="0" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="contour" type="Contour" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="ProsodicPhrase" abstract="true">
    <xs:complexContent>
      <xs:extension base="Object">
        <xs:sequence/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="AccentualPhrase">
    <xs:complexContent>
      <xs:extension base="ProsodicPhrase">
        <xs:sequence>
          <xs:element name="label" type="xs:string" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="timeInterval">
    <xs:sequence>
      <xs:element name="end" type="timePoint" minOccurs="0"/>
      <xs:element name="start" type="timePoint" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="timePoint">
    <xs:sequence/>
    <xs:attribute name="time" type="xs:double" use="required"/>
  </xs:complexType>

  <xs:complexType name="timeCompound">
    <xs:sequence>
      <xs:element name="TimeLocations" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:choice minOccurs="0" maxOccurs="unbounded">
              <xs:element name="Interval" type="timeInterval"/>
              <xs:element name="Point" type="timePoint"/>
            </xs:choice>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="Contour">
    <xs:sequence>
      <xs:element name="direction" type="ContourDirection" minOccurs="0"/>
      <xs:element name="function" type="ContourFunction" minOccurs="0"/>
      <xs:element name="position" type="ContourPosition" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="syllable">
    <xs:complexContent>
      <xs:extension base="Object">
        <xs:sequence>
          <xs:element name="accentuable" type="xs:boolean"/>
          <xs:element name="constituents" type="SyllableConstituent" nillable="true" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="position" type="SyllablePosition" minOccurs="0"/>
          <xs:element name="prominence" type="xs:boolean"/>
          <xs:element name="struct" type="SyllableStruct" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="SyllableConstituent">
    <xs:sequence>
      <xs:element name="label" type="xs:string" minOccurs="0"/>
      <xs:element name="phon" type="Phoneme" nillable="true" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="type" type="SyllableConstituentType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="Phoneme">
    <xs:complexContent>
      <xs:extension base="Object">
        <xs:sequence>
          <xs:element name="articulationGestures" type="ArticulationGestures" minOccurs="0"/>
          <xs:element name="category" type="PhonemeCategory" minOccurs="0"/>
          <xs:element name="label" type="xs:string" minOccurs="0"/>
          <xs:element name="phonemeType" type="phoneme_type" minOccurs="0"/>
          <xs:element name="role" type="Role" minOccurs="0"/>
          <xs:element name="sonority" type="PhonemeSonority" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="ArticulationGestures">
    <xs:sequence>
      <xs:element name="glottis" type="Aperture" minOccurs="0"/>
      <xs:element name="lip" type="Lip" minOccurs="0"/>
      <xs:element name="tongue" type="Tongue" minOccurs="0"/>
      <xs:element name="velum" type="Aperture" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="Lip">
    <xs:sequence>
      <xs:element name="aperture" type="Aperture" minOccurs="0"/>
      <xs:element name="protusion" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="Tongue">
    <xs:sequence>
      <xs:element name="bodyConstriction" type="BodyConstriction" minOccurs="0"/>
      <xs:element name="tipConstriction" type="TipConstriction" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="BodyConstriction">
    <xs:sequence>
      <xs:element name="degree" type="xs:string" minOccurs="0"/>
      <xs:element name="location" type="BodyConstrictionLocation" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="TipConstriction">
    <xs:sequence>
      <xs:element name="degree" type="xs:string" minOccurs="0"/>
      <xs:element name="location" type="TipConstrictionLocation" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="Role">
    <xs:sequence>
      <xs:element name="epenthetic" type="xs:boolean"/>
      <xs:element name="liaison" type="xs:boolean"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="SyllablePosition">
    <xs:sequence>
      <xs:element name="rank" type="xs:int"/>
      <xs:element name="syl_number" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>

  <xs:simpleType name="ContourDirection">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Raising"/>
      <xs:enumeration value="Falling"/>
      <xs:enumeration value="Falling_raising"/>
      <xs:enumeration value="Raising_falling"/>
      <xs:enumeration value="Unspecified"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="ContourFunction">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Conclusive"/>
      <xs:enumeration value="NonConclusive"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="ContourPosition">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Final"/>
      <xs:enumeration value="Penultimate"/>
      <xs:enumeration value="Unspecified"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="Aperture">
    <xs:restriction base="xs:string">
      <xs:enumeration value="wide"/>
      <xs:enumeration value="narrow"/>
      <xs:enumeration value="closed"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="BodyConstrictionLocation">
    <xs:restriction base="xs:string">
      <xs:enumeration value="wide"/>
      <xs:enumeration value="narrow"/>
      <xs:enumeration value="closed"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="TipConstrictionLocation">
    <xs:restriction base="xs:string">
      <xs:enumeration value="alveolar"/>
      <xs:enumeration value="dental"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="PhonemeCategory">
    <xs:restriction base="xs:string">
      <xs:enumeration value="vowel"/>
      <xs:enumeration value="consonant"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="phoneme_type">
    <xs:restriction base="xs:string">
      <xs:enumeration value="occlusive"/>
      <xs:enumeration value="fricative"/>
      <xs:enumeration value="nasal"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="PhonemeSonority">
    <xs:restriction base="xs:string">
      <xs:enumeration value="voiced"/>
      <xs:enumeration value="voiceless"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="SyllableConstituentType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Onset"/>
      <xs:enumeration value="Nucleus"/>
      <xs:enumeration value="Coda"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="SyllableStruct">
    <xs:restriction base="xs:string">
      <xs:enumeration value="V"/>
      <xs:enumeration value="CV"/>
      <xs:enumeration value="CCV"/>
      <xs:enumeration value="CVC"/>
      <xs:enumeration value="CCVC"/>
      <xs:enumeration value="CVCC"/>
      <xs:enumeration value="VC"/>
      <xs:enumeration value="CCCV"/>
      <xs:enumeration value="CCVCC"/>
      <xs:enumeration value="CCCVC"/>
      <xs:enumeration value="VCC"/>
      <xs:enumeration value="OTHER"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

	

Bibliography

[Bruno 2007b] Bruno E., Calabretto S., Murisasco E. Documents textuels multistructurés : un état de l’art, Revue en Sciences du Traitement de l'Information in I3 (Information - Interaction – Intelligence) , Vol. 7 (1). 2007

[Bruno 2007] Bruno, E. and Murisasco, E. An xml environment for multistructured textual documents. in Proceedings of the Second International Conference on Digital Information Management (ICDIM’07), pages 230–235. Lyon, France, October 2007.

[Bruno 2006] Bruno, E. and Murisasco, E. Describing and querying hierarchical structures defined over the same textual data in Proceedings of the ACM Symposium on Document Engineering (DocEng 2006), pages 147–154. Amsterdam, The Netherlands, October 2006

[Le Maitre 2006] Le Maitre, J. Describing multistructured xml documents by means of delay nodes. in Proceedings of the 2006 ACM symposium on Document engineering (DocEng ’06), pp 155–164. Amsterdam, The Netherlands, oct 2006.

[Allen 1991] Allen, J.Time and time again : The many ways represent time in International Journal of Intelligent Systems 6(4):341–355, july 1991

[Baader et al. 2003] Baader F., Calvanese D., McGuinness D.L., Nardi D., P. F., Patel-Schneider P.F. The Description Logic Handbook: Theory, Implementation, Applications. Cambridge University Press, Cambridge, UK, 2003. ISBN 0-521-78176-0

[Portier 2010] Portier, P.-E. and Calabretto S. Multi-structured documents and the emergence of annotations vocabularies. in Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5. Montréal, Canada, August 3 - 6, 2010. doi:10.4242/BalisageVol5.Portier01.

[Bird 2001] Bird S. and Liberman M. A formal framework for linguistic annotation. in Speech Communication 33(1,2), pages 23–60. september 2001.

[Bird et al. 2000] Bird, S., Day, D., Garofolo, J., Henderson, J., Laprun, C., and Liberman, M. ATLAS: A flexible and extensible architecture for linguistic annotation 2000

[Ide 2007] Ide, N., Suderman, K. GrAF: A Graph-based Format for Linguistic Annotations in Proceedings of the Linguistic Annotation Workshop LAW'07. 2007

[Dekhtyar 2005] Dekhtyar A. and Iacob I.E. A framework for management of concurrent xml markup. in Data and Knowledge Engineering, 52(2):185–208, 2005

[Chatti 2007] Chatti, N., Kaouk, S., Calabretto, S. and Pinon J.M.. Multix: an xml based formalism to encode multi-structured documents. in Proceedings of Extreme Markup Languages Conference. August 6-10 2007.

[Djemal 2008] Djemal K., Soule-Dupuy and Valles-Parlangeau, C. Modeling and exploitation of multistructured documents. in Proceedings of the IEEE 3rd International Conference on Information and Communication Technologies: From Theory to Applications (ICTTA’ 08). Damascus, Syria, April 2008.

[Goldfarb 1990] Goldfarb, C.-F. and Rubinsky Y. The SGML handbook. Clarendon Press, Oxford, 1990.

[Hilbert 2005] Hilbert M., Schonefeld O. and Witt A. Making concur work. in Proceedings of The Extreme Markup Languages Conference. August 2005.

[DeRose 2004] DeRose, S. Markup overlap : a review and a horse. in In Proceedings of The Extreme markup language Conference. 2004

[Jagadish et al. 2004] Jagadish, H.-V., Lakshmanan, L.-V.-S., Scannapieco, M., Srivastava, D. and Wiwatwattana N. Colorful XML: One Hierarchy Isn’t Enough. in Proceedings of The International Conference on Management of Data (SIGMOD’04), pages 251–262. 2004

[TEI P4] Sperberg-McQueen, C.-M. and Burnard, L. Tei p4 guidelines for electronic text encoding and interchange. 2001

[Sperberg-McQueen 2000] Sperberg-McQueen, C-M and Huitfeldt, C. Goddag: A data structure for overlapping hierarchies. in Proceedings of The Principles of Digital Document and electronic publishing (DDEP/PODDP’00), pages 139–160. 2000

[Tennison 2002] Tennison, J. and Wendell, P. Layered markup and annotation language (lmnl) in Proceedongs of The Extreme Markup Languages Conference. 2002

[Witt 2004] Witt, A. Multiple hierarchies : news aspects of an old solution. in Proceedings of The Extreme markup language Conference. 2004

[Blache 2010b] Blache, P., Bertrand, R., Bigi, B., Bruno, E., Cela, E., Esperrer, R., Ferre, G., Guardiola, M., Hirst, D., Magro, E., JC, M., C, M., Ma, M., Murisasco, E., I, N., P, N., B, P., Laurent, P., J, P.-V., Seinturier, J., N, T., Marion, T., and Stephane, R. Multimodal annotation of conversational data. in In Proceedings of the fourth linguistic annotation workwhop (LAW), pages 186–191. Assocation for computational Lunguistics (ACL). 2010

[Copestake 2003] Copestake, A. Definitions of Typed Feature Structures. in Collaborative Language Engineering: A Case Study in Efficient Grammar-based Processing CSLI Publications, Ventura Hall, Stanford University, Stanford, CA 94305-4115, 2003.

[Prud’hommeaux 2007] Prud’hommeaux, E. and Seaborne, A. Sparql query language for rdf (working draft) Technical report W3C. http://www.w3.org/2009/sparql/wiki/Main_Page

[Schmidt et al. 2009] Schmidt, T., Duncan, S., Ehmer, O., Hoyt, J., Kipp, M., Loehr, D., Magnusson, M., Rose, T., and Sloetjes, H. chapter An exchange format for multimodal annotations in Multimodal corpora. pages 207–221. Springer-Verlag, Berlin, Heidelberg, 2009

[Sthrenberg 2009] Sthrenberg, M. and Jettka, D.A toolkit for multidimensional markup - the development of sgf to xstandoff in Proceedings of Balisage: The Markup Conference 2009 Assocation for computational Lunguistics(ACL), Balisage Series on Markup Technologies, vol. 3. http://www.xstandoff.net/. doi:10.4242/BalisageVol3.Stuhrenberg01.

[Blache 2010] Blache, Philippe and Bigi, Brigitte and Prévot, Laurent and Rauzy, Stéphane and Seinturier, JulienAnnotation schemes, annotation tools and the question of interoperability: from Typed Feature Structures to XML Schemas in Proceedings of the 2nd International Conférence on Global Interoperability for Language Resources (compiled under Act of Congress of 18-20 January, 2010). City University of Hong Kong http://www.yale.edu/lawweb/avalon/raleigh.htm.

[Carpenter 1992] Carpenter, R. L. The Logic of Typed Feature Structures. in volume 32 of Cambridge Tracts in Theoretical Computer Science Cambridge University Press, The Edinburgh Building, Shaftesbury Road, Cambridge CB2 8RU, United Kingdom. 1992



[2] http://www.oracle.com/technetwork/articles/javase/index-140168.html

Author's keywords for this paper: Knowledge Engineering; XML Information System; Application