The Text Encoding Initiative (TEI) began as an international research project in 1987, with the goal of creating guidelines for the representation of texts in digital form. While these guidelines are still its main focus today, the TEI has since evolved into a non-profit consortium with numerous members and an elected body of individuals (the Technical Council) who maintain and expand the Guidelines in response to the needs of the community. TEI’s broad mission of representing “text” has resonated in particular with the academic community, libraries, and cultural heritage institutions, who have widely applied the TEI—and consequently shaped it—as an instrument for online research, teaching, and preservation.
In 2007, the TEI released version P5 of the Guidelines and with it introduced a complete revision of ODD, or One Document Does-it-all, the system for its documentation, schema generation, and customization. This system makes use of literate programming principles in order to keep the documentation, grammar, and constraints rules of the TEI all together in the same TEI XML document. To achieve this, the large documentation text of the Guidelines is encoded in TEI and is peppered with references to formal declarations of elements, attributes, modules, and classes. These formal declarations are themselves expressed using TEI elements, which allows the TEI’s processing tools to generate both human-readable documentation and schemas in a variety of formats. These elements, described in Chapter 22 of the Guidelines, can be used both to define and to customize the various components of the TEI; this allows users to define and document customizations, and to generate human-readable and machine-readable output.
Customizing the TEI is an essential step in the creation of a TEI project: the specification is very large and using it all at once is discouraged. Indeed, the TEI offers customization “exemplars” to users, including TEI Lite, TEI for Manuscript Description, and jTEI (a customization for articles for the Journal of the TEI). Researchers using the TEI are encouraged (most often via workshops and other training sessions) to let their research questions drive their customization design and to create a subset that most closely addresses their needs. Besides selecting a subset, customization in ODD allows encoders to add constraints (such as limiting open attribute values) and to introduce extensive prose documentation tightly coupled with formal declarations.
What is in an ODD
Because ODD is used to both define and customize a markup vocabulary, it takes two files to tango: one to define the vocabulary (e.g. the whole of the TEI) and one to customize it (e.g. TEI Lite). Both kinds of operation are performed with the same element set, but a @mode attribute determines whether something is being added, changed only in part, replaced, or explicitly removed. The absence of @mode means that something new is being declared. The following subsections briefly introduce what is in an ODD.
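As a rough illustration of the @mode values, the fragment below sketches a customization that adds, changes, and deletes declarations. The schema name, the element myElement, and the namespace URI are hypothetical and exist only for this example; module selection is omitted for brevity.

```xml
<schemaSpec ident="modeSketch" start="TEI">
  <!-- module selection omitted in this sketch -->

  <!-- mode="add": declare something the source ODD does not have -->
  <elementSpec ident="myElement" ns="http://example.org/ns/myTEI" mode="add">
    <desc>A hypothetical new inline element.</desc>
    <classes>
      <memberOf key="model.phrase"/>
    </classes>
    <content>
      <textNode/>
    </content>
  </elementSpec>

  <!-- mode="change": only the parts supplied here are altered;
       the nested attDef uses mode="delete" to remove one attribute -->
  <elementSpec ident="head" mode="change">
    <attList>
      <attDef ident="type" mode="delete"/>
    </attList>
  </elementSpec>

  <!-- mode="delete": the whole declaration is removed -->
  <elementSpec ident="said" mode="delete"/>

  <!-- mode="replace" (not shown) would override a declaration wholesale -->
</schemaSpec>
```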
As a TEI document, an ODD can contain extensive prose describing either a new markup language or a customization. This human-readable documentation is typically contained within the TEI <text> element, and can make use of the standard TEI elements, including those for divisions, headings, paragraphs, and snippets of computer code.
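A minimal sketch of such a document might look as follows. The title, the prose, and the schema name are placeholders invented for this example; the header contains only the components that a TEI document requires.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A hypothetical project customization</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sketch.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <head>Encoding verse</head>
        <!-- human-readable documentation, using ordinary TEI elements -->
        <p>Prose documentation goes here, for example explaining why
          <att>enjamb</att> is required on <gi>l</gi> in this project.</p>
        <!-- formal declarations live alongside the prose -->
        <schemaSpec ident="myTEI" start="TEI">
          <!-- specification and customization elements go here -->
        </schemaSpec>
      </div>
    </body>
  </text>
</TEI>
```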
A schema specification (or more)
Specification and customization elements are contained by <schemaSpec>, on which a number of top-level options can be set, such as the schema name, language, namespace, and the possible root elements.
<schemaSpec ident="myTEI" start="TEI" ns="http://www.tei-c.org/ns/1.0">
  <!-- specification and customization elements go here -->
</schemaSpec>
Specifications: a brief overview
The specifications introduced below are defined and referenced using ODD elements that share a similar structure. First, the names of these elements are formed by a term plus “Spec”, such as <classSpec>. Elements that express references to these specifications end in “Ref”, such as <classRef>. Other shared features include the ident attribute to indicate the name of the object being specified or referred to, and documentation elements for providing descriptions and usage examples.
<*Spec ident="name">
  <gloss>An expansion of the name, if necessary</gloss>
  <desc>A description of this specification</desc>
  <!-- definitions depending on the type of specification -->
  <exemplum>
    <!-- Examples of usage -->
  </exemplum>
  <remarks>
    <!-- Any further notes or comments about this specification -->
  </remarks>
</*Spec>
A module provides a name for a set of other formal declarations, which other specifications will use to indicate their membership to the module and that module alone (specifications can only belong to one module).
<moduleSpec ident="namesdates"> <desc>Additional elements for names and dates</desc> </moduleSpec>
Modules are rarely changed, but a customization ODD will use <moduleRef> to indicate which of them are to be included in the customization. This element is also equipped with attributes to exclude or include element members; for example, the following includes the whole “namesdates” module, but without <event> and <listEvent>.
<moduleRef key="namesdates" except="event listEvent" />
This element can also be used to bring in external schemata if necessary.
<moduleRef url="svg.rng" />
A TEI customization needs four modules to be functional: tei, core, header, and textstructure. The ODD processor does not enforce the presence of these modules, however, and it is left to the user to make sure they are included.
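A minimal customization that selects only the four modules the TEI Guidelines treat as essential (tei, core, header, and textstructure) could be sketched as follows. The schema name is a placeholder, and the comments are informal glosses rather than official module descriptions.

```xml
<schemaSpec ident="minimalTEI" start="TEI">
  <!-- infrastructure: classes, datatypes, and macros -->
  <moduleRef key="tei"/>
  <!-- elements common to all kinds of text -->
  <moduleRef key="core"/>
  <!-- the TEI header -->
  <moduleRef key="header"/>
  <!-- default text structure, including the TEI, text, and body elements -->
  <moduleRef key="textstructure"/>
</schemaSpec>
```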
Model classes work similarly to modules, but only accept memberships from element declarations (elements may be members of multiple model classes). A model class can be referenced from within a content model, allowing all members of the model class to appear at that point in the content model. When the class is referenced, one can indicate cardinality and the order (alternation or sequence) of any or all members of the class (see Elements below).
<classSpec module="tei" type="model" ident="model.segLike">
  <desc>groups elements used for arbitrary segmentation.</desc>
  <classes>
    <memberOf key="model.phrase"/>
  </classes>
</classSpec>
Customizations may change model classes to fine-tune class dependencies. Here is an example that allows members of the model.segLike class to appear wherever members of the model.addrPart class are allowed.
<classSpec module="tei" type="model" ident="model.segLike" mode="change">
  <classes>
    <memberOf key="model.phrase"/>
    <memberOf key="model.addrPart"/>
  </classes>
</classSpec>
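Class memberships can also be adjusted one at a time. The sketch below uses mode="change" on <classes>, so that only the memberships listed are touched: the first fragment adds a membership without restating the existing one, and the second removes a membership. Whether removing model.phrase from model.segLike is desirable is a project decision; it is shown here purely as an illustration.

```xml
<!-- add one membership, leaving the existing ones untouched -->
<classSpec type="model" ident="model.segLike" mode="change">
  <classes mode="change">
    <memberOf key="model.addrPart"/>
  </classes>
</classSpec>

<!-- alternatively: remove one membership, leaving any others untouched -->
<classSpec type="model" ident="model.segLike" mode="change">
  <classes mode="change">
    <memberOf key="model.phrase" mode="delete"/>
  </classes>
</classSpec>
```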
Attribute classes declare and provide documentation for a set of attributes. Elements and other attribute classes can inherit from them.
<classSpec module="verse" type="atts" ident="att.enjamb">
  <attList>
    <attDef ident="enjamb" usage="opt">
      <desc>indicates whether the end of a verse line is marked by enjambement.</desc>
      <datatype>
        <dataRef key="teidata.enumerated"/>
      </datatype>
      <valList type="open">
        <valItem ident="no">
          <desc>the line is end-stopped</desc>
        </valItem>
        <valItem ident="yes">
          <desc>the line in question runs on into the next</desc>
        </valItem>
        <valItem ident="weak">
          <desc>the line is weakly enjambed</desc>
        </valItem>
        <valItem ident="strong">
          <desc>the line is strongly enjambed</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</classSpec>
Customizations may adjust dependencies to other attribute classes and will often update and constrain attribute values. The example below makes the @enjamb attribute required (by default it is optional), changes its values to a particular preferred terminology, and closes the list of values, thus disallowing any value that is not from the specified preferred terminology. All of these changes are well within the original specification of the TEI, which is quite permissive, but this case supposes a situation where a mandatory and stricter @enjamb is required by a text encoding project. Note how child elements of <classSpec> that do not need to change are not included (e.g. <desc>). Note also the use of @mode="replace" to override the existing declaration of the attribute.
<classSpec module="verse" type="atts" ident="att.enjamb" mode="change">
  <attList>
    <attDef ident="enjamb" usage="req" mode="replace">
      <valList type="closed">
        <valItem ident="endstop">
          <desc>the line is end-stopped</desc>
        </valItem>
        <valItem ident="light">
          <desc>the line is lightly enjambed</desc>
        </valItem>
        <valItem ident="heavy">
          <desc>the line is heavily enjambed</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</classSpec>
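To make the effect of this customization concrete, here is a sketch of how encoded verse lines would fare against the resulting schema. It assumes that the verse module is included and that <l> inherits @enjamb from att.enjamb; the line content is placeholder text.

```xml
<lg xmlns="http://www.tei-c.org/ns/1.0">
  <!-- valid: the value comes from the closed list -->
  <l enjamb="heavy">First line of verse</l>
  <!-- invalid: "yes" belonged to the original open list,
       but is not part of the customized closed list -->
  <l enjamb="yes">Second line of verse</l>
  <!-- invalid: the attribute is now required -->
  <l>Third line of verse</l>
</lg>
```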
The definition of elements includes their memberships to modules and classes, attributes, and a content model declaration.
<elementSpec module="tagdocs" ident="code">
  <desc>contains literal code</desc>
  <classes>
    <memberOf key="model.emphLike"/>
  </classes>
  <content>
    <textNode/>
  </content>
  <attList>
    <attDef ident="type" usage="opt">
      <desc>the language of the code</desc>
      <datatype>
        <dataRef key="teidata.enumerated"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
Content models can be defined using RELAX NG, or (preferably) using dedicated ODD elements. There are a number of features available to organize the content model, including <alternate> and <sequence> to determine how the referenced elements can be combined, and the @minOccurs and @maxOccurs attributes to set cardinality. Note that each specification element (e.g. <elementSpec>) has a corresponding reference element (e.g. <elementRef>).
<content>
  <alternate>
    <classRef key="model.pLike" maxOccurs="unbounded"/>
    <sequence>
      <elementRef key="summary" minOccurs="0" maxOccurs="1"/>
      <elementRef key="msItem" maxOccurs="unbounded"/>
    </sequence>
  </alternate>
</content>
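For comparison, the same content model could be written directly in RELAX NG (XML syntax) inside <content>. This is a sketch of one plausible reading of the example, in which a missing @minOccurs defaults to 1, so that maxOccurs="unbounded" alone corresponds to “one or more”.

```xml
<content xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <rng:choice>
    <!-- one or more members of model.pLike ... -->
    <rng:oneOrMore>
      <rng:ref name="model.pLike"/>
    </rng:oneOrMore>
    <!-- ... or an optional summary followed by one or more msItem -->
    <rng:group>
      <rng:optional>
        <rng:ref name="summary"/>
      </rng:optional>
      <rng:oneOrMore>
        <rng:ref name="msItem"/>
      </rng:oneOrMore>
    </rng:group>
  </rng:choice>
</content>
```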
Model classes group elements by membership and typically do not impose a specific order. When referenced, however, the @expand attribute can be used to override this behavior. For example, the following content model boils down to ( p*, ab* ) rather than the usual ( p | ab ).
<content> <classRef key="model.pLike" expand="sequenceOptionalRepeatable" /> </content>
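The other values of @expand follow the same logic. The sketch below lists them with the pattern each one would produce, under the simplifying assumption that model.pLike has only the members p and ab; each <classRef> is an alternative, not part of a single content model.

```xml
<!-- ( p | ab ): any one member (the usual behavior) -->
<classRef key="model.pLike" expand="alternate"/>

<!-- ( p, ab ): every member exactly once, in sequence -->
<classRef key="model.pLike" expand="sequence"/>

<!-- ( p?, ab? ): every member at most once, in sequence -->
<classRef key="model.pLike" expand="sequenceOptional"/>

<!-- ( p+, ab+ ): every member one or more times, in sequence -->
<classRef key="model.pLike" expand="sequenceRepeatable"/>

<!-- ( p*, ab* ): every member zero or more times, in sequence -->
<classRef key="model.pLike" expand="sequenceOptionalRepeatable"/>
```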
In a customization, including or removing an element is typically done when selecting a module via <moduleRef>. However, these operations can also be performed with <elementRef> inside a <schemaSpec>. For example, to add the element <msItem> (manuscript item) without including the manuscript description module:
<elementRef key="msItem" />
Or to remove the <p> (paragraph) element without removing the core module (without which TEI would make little sense):
<elementRef key="p" mode="delete" />
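Removing a single element is also commonly written as an <elementSpec> in delete mode, or with @except on the module reference. The sketch below shows the former in a complete (hypothetical) schema specification and mentions the latter in a comment; the schema name is a placeholder.

```xml
<schemaSpec ident="noParagraphs" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- remove a single element while keeping the rest of its module -->
  <elementSpec ident="p" mode="delete"/>
  <!-- roughly equivalent alternative, replacing the core reference above:
       <moduleRef key="core" except="p"/> -->
</schemaSpec>
```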
More minute changes to elements are quite common in a TEI customization and they will range from adjusting the description, to adjusting attribute values, to class memberships. A typical operation would be constraining attributes, for example the @type attribute on the <div> (textual division) element. The @type attribute is derived from membership in the attribute class att.typed. Note the use of @mode="replace" to override the declaration of the @type attribute inherited from att.typed.
<elementSpec ident="div" mode="change">
  <attList>
    <attDef ident="type" mode="replace">
      <valList type="closed">
        <valItem ident="chapter"/>
        <valItem ident="section"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>
Entirely new elements can be added as well, though when customizing TEI, the Guidelines require that new elements and attributes are added under a new namespace. Membership to classes will determine where the element can go; for example model.phrase groups “inline” elements, so a new inline element can simply declare its membership to that class.
<elementSpec ident="opus" ns="myTEI.example.org" mode="add">
  <desc>The opus number or "work number" that is assigned to a musical composition</desc>
  <classes>
    <memberOf key="model.phrase"/>
    <memberOf key="att.global"/>
  </classes>
  <content>
    <textNode/>
  </content>
</elementSpec>
Likewise, a “block” element could be part of the same model class as <div> (model.divLike) or as <p> (model.pLike). When the new element is meant to be a child of another specific element, the parent element’s content model will need to be changed. For example, this is how the new element <opus>, if it is not a member of model.phrase, could be added to TEI’s <title> element only.
<elementSpec ident="title" mode="change">
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <macroRef key="macro.paraContent"/>
      <elementRef key="opus" />
    </alternate>
  </content>
</elementSpec>
Datatypes for attributes and other string content can be specified and used by multiple declarations. W3C XML datatypes can be referred to directly by their name and need not be redefined.
<dataSpec ident="teidata.pointer">
  <desc>defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere.</desc>
  <content>
    <dataRef name="anyURI"/>
  </content>
</dataSpec>
When referenced, datatypes can be restricted to match a given regular expression.
<!-- a fraction: --> <dataRef name="token" restriction="(\-?[\d]+/\-?[\d]+)"/>
Datatypes can be changed by customizations, though it is more common to add or change restrictions or introduce entirely new datatypes.
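As a sketch of introducing a new datatype, the fragment below wraps the fraction pattern shown earlier in a project-specific <dataSpec> and uses it for a new attribute. The identifiers mydata.fraction and ratio, the choice of <measure> as host element, and the namespace URI are all hypothetical.

```xml
<!-- a hypothetical datatype for fractions such as 3/4 -->
<dataSpec ident="mydata.fraction" mode="add">
  <desc>defines attribute values expressed as a fraction of two integers.</desc>
  <content>
    <dataRef name="token" restriction="(\-?[\d]+/\-?[\d]+)"/>
  </content>
</dataSpec>

<!-- a hypothetical attribute that uses the new datatype -->
<elementSpec ident="measure" mode="change">
  <attList>
    <attDef ident="ratio" ns="http://example.org/ns/myTEI" usage="opt" mode="add">
      <desc>expresses the measurement as a fraction.</desc>
      <datatype>
        <dataRef key="mydata.fraction"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
```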
Macros are used to declare predefined strings or patterns. Content models can be defined here just like they are in elements.
<macroSpec module="tei" ident="macro.paraContent">
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <classRef key="model.gLike"/>
      <classRef key="model.phrase"/>
      <classRef key="model.inter"/>
      <classRef key="model.global"/>
      <elementRef key="lg"/>
      <classRef key="model.lLike"/>
    </alternate>
  </content>
</macroSpec>
Customizations may consider introducing new macros, or adding new classes and elements to existing macros.
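For instance, adding an element to an existing macro might be sketched as below, reusing the new <opus> element from the earlier examples. Because the supplied <content> stands in for the original one, the existing references are restated; note that a change like this affects every element whose content model uses the macro.

```xml
<macroSpec ident="macro.paraContent" mode="change">
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <!-- existing references, restated -->
      <textNode/>
      <classRef key="model.gLike"/>
      <classRef key="model.phrase"/>
      <classRef key="model.inter"/>
      <classRef key="model.global"/>
      <elementRef key="lg"/>
      <classRef key="model.lLike"/>
      <!-- the addition made by this customization -->
      <elementRef key="opus"/>
    </alternate>
  </content>
</macroSpec>
```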
Other formal constraints can be documented and specified within the <constraintSpec> element. These can be placed within other specification elements, or elsewhere in the documentation text. The TEI source uses Schematron to express constraints, for example:
<constraintSpec ident="activemutual" scheme="schematron">
  <constraint>
    <s:report test="@active and @mutual">Only one of the attributes @active and @mutual may be supplied</s:report>
  </constraint>
</constraintSpec>
Expressing constraints is a powerful tool for building customizations, particularly when multiple encoders will be working with the schema. Schematron in particular offers many levels of reporting for catching encoding errors and offering suggestions to encoders.
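A customization might attach a project-specific rule to an element in the same way. The sketch below is a hypothetical rule requiring every <div> to have a heading; the constraint identifier is invented, and the example assumes that the processor binds the tei prefix to the TEI namespace when it extracts the Schematron.

```xml
<elementSpec ident="div" mode="change">
  <constraintSpec ident="divNeedsHead" scheme="schematron">
    <constraint>
      <!-- assumption: the "tei" prefix is bound to the TEI namespace
           by the processor that extracts these rules -->
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron" context="tei:div">
        <sch:assert test="tei:head">In this project, every division must have a heading.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

An assert reports when its test fails, whereas the report element used in the TEI example fires when its test succeeds.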
More literate programming
A truly literate programming ODD will couple prose with specifications, yet the elements introduced so far are children of <schemaSpec>, which is somewhat divorced from the <text> element containing the bulk of the human-readable documentation. It is possible, nonetheless, to refer from the prose to specifications in the <schemaSpec>, which a processor will expand when generating documentation. The TEI Guidelines use this mechanism, which simplifies the maintenance of specifications organized into multiple XML files.
<listRef> <ptr target="#ID_OF_SPEC_ELEMENT"/> </listRef>
This is hardly a tight coupling of prose and specification, but it works well for a complex ecosystem such as the TEI Guidelines. It is also possible, however, to do just the opposite: break up specifications into groups to be included within the documentation prose. References within <schemaSpec> can then take care of telling the processor how to put everything back together. The more recent TEI customization for “Simple Print” documents employs this strategy.
The following (abridged) bit of prose describes the selection of elements from the TEI header for the Simple Print customization:
<div>
  <p>A subset of 45 elements is selected from the TEI header module. In addition, <!-- etc. --></p>
  <specGrp xml:id="header">
    <moduleRef key="header" include="abstract availability biblFull catDesc etc" />
    <moduleRef key="corpus" include="particDesc settingDesc"/>
  </specGrp>
</div>
Elsewhere in the document, the <schemaSpec> points back to this and other <specGrp> elements.
<schemaSpec ident="teisimpleprint" start="TEI teiCorpus">
  <specGrpRef target="#base"/>
  <specGrpRef target="#header"/>
  <!-- etc. -->
</schemaSpec>
Processing ODDs: zapping, sourcing, chaining
To generate documentation and schemata, a processor will merge together the source and the customization ODD, resulting in a compiled document containing everything the customization selected from the source, plus the instructions to perform the additions and changes required. This first step is an opportunity to drop anything that is not needed, which makes it possible to write fairly lean customizations. For example, the analysis module has among its members the global attribute class att.global.analytic. In turn, this class is a member of the att.global class, which is referenced by every single element in the TEI. When a customization excludes the analysis module, att.global.analytic will also be dropped without needing to change att.global.
Similarly, when selected classes or elements end up not being referenced anywhere in the compiled ODD, they get “zapped” to avoid unreferenced declarations in the resulting schemata and to exclude unnecessary documentation from the human-readable output.
In a typical TEI customization, only the customization ODD is supplied by the user and the processor obtains the source ODD for the latest release of TEI P5 before compilation. While an altogether different source can be passed to the processor, the user can also indicate in the customization file that certain specifications should be obtained from specific sources. <schemaSpec> and reference elements (e.g. <moduleRef>, <elementRef>) can use the @source attribute to point the processor to a different ODD in which to look for that specification. The TEI Guidelines specify a private URI scheme (tei:x.y.z) to be able to refer to specifications from older versions of the TEI. Because @source can point to any ODD via a URI, it is possible to “chain” ODDs by customizing an existing customization. This example from the TEI Guidelines shows how to extend the customization “TEI Bare”, which doesn’t include <q> (quote), with <q> from version 3.0.0 of the TEI.
<schemaSpec ident="Bare-plus" source="tei_bare.compiled.odd" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core" include="p list item label head author title"/>
  <elementRef key="q" source="tei:3.0.0"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
The only existing ODD processor is a set of XSLT scripts maintained by the TEI. Besides obtaining and running these scripts directly, there are a number of ways to process ODD to generate documentation and schemata.
Command line: the TEI Stylesheets repository on GitHub includes a number of scripts to perform transformations. The script bin/teitorelaxng compiles ODDs and transforms them into RELAX NG. It allows the user to set a non-TEI source ODD as well as a number of other options. Other scripts such as bin/teitohtml5 can generate documentation from a compiled ODD.
Oxygen XML editor: the TEI Oxygen plugin includes the TEI Stylesheets and routines for generating documentation and schemata from TEI ODD customizations.
OxGarage: this online service at https://oxgarage.tei-c.org/ provides both a graphical interface and an API to the TEI Stylesheets. It can compile ODDs and generate documentation and schemata in a number of formats.
Additionally, the TEI has created Roma (https://roma.tei-c.org/), an online tool to create customizations via a user interface, which also interfaces with the TEI Stylesheets to generate documentation and schemata. The interface does not cover the full expressiveness of ODD, but it supports users with less schema design expertise. An entirely new version of Roma is currently in beta (https://romabeta.tei-c.org/). Besides a complete rewrite of the interface, the new version takes advantage of the OxGarage API for processing ODD and covers a wider range of customization operations.
ODD for TEI interchange and beyond
ODD plays an important role in data interchange within the TEI ecosystem: as a large and greatly adaptable format, TEI-encoded documents can look quite different from one another. When carefully crafted, ODD customizations become the key to facilitate TEI interchange because they contain human-readable documentation as well as a formal description as to how a schema differs from the whole TEI specification. Finally, because ODD can be used to both express and customize a markup vocabulary, it has been adopted outside of the TEI. The most notable case is the Music Encoding Initiative, a markup language for representing music notation targeted at library and musicological research that shares many of the documentation and customization principles and needs of the TEI. The Music Encoding Initiative, which also uses ODD for its source and customizations, provides a transformation service online at http://customization.music-encoding.org/, which also applies the TEI Stylesheets to process ODD and generate schemata. ODD has also been used for the definition of the Internationalization Tag Set (ITS); and “various standards proposal designed within ISO committee TC 37 have been totally or partially written in TEI/ODD: MLIF, MAF, ISO 16642 rev., ISOTimeML”. New applications of ODD are still underway, including Martin Holmes’ proposed use for HTML (Holmes 2018).
My thanks to Syd Bauman for his extensive feedback on this piece and to the organizers of the pre-conference Symposium on Markup Vocabulary Customization for inviting me to talk about TEI ODD.
Bauman, Syd. 2011. Interchange vs. Interoperability, in Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). doi:https://doi.org/10.4242/BalisageVol7.Bauman01
Bauman, Syd. 2017. tei_customization: A TEI customization for writing TEI customizations (paper), in Proceedings of the Text Encoding Initiative Conference and Members Meeting, Victoria, British Columbia, Canada, November 11 - 15 2017. https://hcmc.uvic.ca/tei2017/abstracts/t_110_bauman_teicustomization.html
Cummings, James. 2007. The text encoding initiative and the study of literature, A Companion to Digital Literary Studies, ed. Susan Schreibman and Ray Siemens. Oxford: Blackwell, 2008. http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405148641/9781405148641.xml&chunk.id=ss1-6-6
Holmes, Martin. 2018. Using ODD for HTML, in Proceedings of the Text Encoding Initiative Conference and Members Meeting, Tokyo, Japan, September 9 - 13 2018. Pages 240 - 241. https://tei2018.dhii.asia/AbstractsBook_TEI_0907.pdf
Vanhoutte, Edward. 2004. An Introduction to the TEI and the TEI Consortium, Literary and Linguistic Computing, Volume 19, Issue 1, April 2004, Pages 9–16. doi:https://doi.org/10.1093/llc/19.1.9
Wittern, Christian, Arianna Ciula, Conal Tuohy. 2009. The making of TEI P5, Literary and Linguistic Computing, Volume 24, Issue 3, September 2009, Pages 281–296. doi:https://doi.org/10.1093/llc/fqp017
 The story of the TEI has been told several times in writing (Burnard 2000, Vanhoutte 2004, Cummings 2008, Wittern et al. 2009, to name a few). Burnard and Rahtz 2000 explain how the idea behind ODD first originated with Lou Burnard and Michael Sperberg-McQueen in 1998, yet the transition to P5 (concluded in 2007) determined most of the modern shape of ODD that this paper will introduce.
 RELAX NG, DTD, and Schematron are generated directly, but for XML Schemas, the current processing takes a shortcut by converting RELAX NG to XSD using Trang.
 This overview is meant to showcase the capabilities of ODD for creating customizations and it is not intended to be a comprehensive documentation of the language or a tutorial. Please refer to Chapter 22 of the TEI Guidelines for a more comprehensive description of TEI ODD.
 Available on GitHub: https://github.com/TEIC/Stylesheets/releases and SourceForge: https://sourceforge.net/projects/tei/files/Stylesheets/.