CETEIcean: TEI in the Browser
Copyright ©2018 Hugh Cayless and Raffaele Viglianti
The standard method for displaying a TEI document on the web is either to pre-transform it to HTML using XSLT or to transform it to HTML dynamically, again usually with XSLT or possibly XQuery. This method, while perfectly viable, tends to enshrine a particular workflow and division of responsibilities that may not be optimally flexible. We present an alternative approach that permits a lighter-weight development workflow and uses more standard web technologies.
The TEI Stylesheets and (usually to a lesser extent) more specialized XSLT conversions are large, complex software packages in their own right. Their development and maintenance require experienced XSLT developers. Much though we might regret it, this is no longer a widespread skill. Even very sophisticated XSLT transforms usually involve discarding or substantially reformatting some data in the source. That XSLT is capable of this is a strength, but we would argue that it creates a subtle pressure towards rewriting your data model for presentation rather than making use of it. Moreover, since XSLT has become a niche specialty, projects are more likely either to use an existing transformation package as-is or to rely on inexperienced developers customizing such a package. More subtly, because XSLT transforms divorce the presentation of texts from the work done to model them, there is a disjunction between the intellectual labor of encoding texts (often intensive in TEI) and the presentation of those texts online. Browsers today are dynamic and powerful rendering engines, so why not apply them more directly to the TEI’s data model?
Because it does not rely on a transformation step, CETEIcean permits a distributed development workflow. The Digital Latin Library (DLL) is a new initiative to publish digital critical editions of Latin texts. We have been using GitHub Pages and CETEIcean to work on pre-publication texts. A “stub” HTML file is placed on our GitHub Pages site that points to the raw XML in a separate GitHub repository. When the stub file is loaded in a browser, it fetches the source file and displays it. In practice, this means that editors working on the XML source have only to push to GitHub to see how their TEI document will look in a web browser. The editor does not need to be able to run their own XSLT transform, nor is any setup required beyond placing a very simple HTML file in the GitHub Pages repo. This “push to publish” workflow makes it very simple to check your own or others’ work. If the encoder and the editor are not the same person, reviewing the encoder’s work is trivial. Moreover, because the reviewer is looking at a 1:1 representation of the source, errors are more likely to be easy to troubleshoot, as they haven’t gone through a transformation process that might accidentally suppress or obfuscate them.
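A stub page of the kind described here can be very small. The following is a minimal sketch, not the DLL's actual file: it generates the stub's HTML from the URL of a TEI source. The `CETEI` constructor and its `getHTML5` method belong to CETEIcean, while the script and stylesheet paths, the `TEI` placeholder id, the `stubPage` helper and the repository URL are assumptions made for illustration.

```javascript
// Sketch of a "stub" page generator. Assumptions: CETEIcean's script and
// stylesheet are served from js/CETEI.js and css/CETEIcean.css, and the
// page contains a <div id="TEI"> placeholder. Adjust to your own site.
function stubPage(teiUrl) {
  // JSON.stringify yields a quoted JavaScript string literal for the URL.
  const url = JSON.stringify(teiUrl);
  return `<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Work in progress</title>
    <link rel="stylesheet" href="css/CETEIcean.css">
    <script src="js/CETEI.js"></script>
  </head>
  <body>
    <div id="TEI"></div>
    <script>
      // Fetch the TEI source, convert it, and attach it to the page.
      var c = new CETEI();
      c.getHTML5(${url}, function (data) {
        document.getElementById("TEI").appendChild(data);
      });
    </script>
  </body>
</html>`;
}

// Hypothetical repository and file name, for illustration only.
const page = stubPage(
  "https://raw.githubusercontent.com/example-org/example-texts/master/edition.xml"
);
```

Because the stub only names a URL, pushing a new version of the XML is enough to change what the page displays; the stub itself never needs to be regenerated.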
The “push to publish” workflow is also useful in the classroom; we have adopted it to
teach TEI as part of the Introduction to Digital Studies in the Arts and Humanities, a
graduate course at the Maryland Institute for Technology in the Humanities. Students are able
to focus on learning how to encode texts with the TEI and are not required to learn XSLT to
preview or publish their work. By simply adopting an HTML template equipped with CETEIcean,
they are able to engage with issues related to digital publication with a lower learning
curve. With the 1:1 representation of the TEI source, students are able to use CSS to style
their encoding directly. This kind of workflow is of course possible using XSLT 1.0 with an
xml-stylesheet processing instruction in the source XML or using
than XSLT 1.0.
CETEIcean converts all TEI elements to an “HTML” equivalent with a tei- prefix. TEI attributes, which mainly have no namespace, are simply copied over. The @xml:id attributes are converted to their HTML equivalent, @id. Data attributes are used to preserve namespace information, original element name, and whether the element is empty. For most elements, this is sufficient, but for TEI elements which have HTML equivalents, more is necessary. TEI has a couple of constructs roughly equivalent to HTML <a href>, for example, namely <ref target> and <ptr target/> (the former represents linked text with one or more targets and the latter a bare pointer with no content). While linking behaviors can be applied to these post-conversion, those links do not have the full status of HTML links: browsers do not preview the link URL on hover, for example. CETEIcean handles this differently depending on whether the browser in question supports registering Custom Elements. Recent builds of Chrome and Safari support the feature. In Firefox, the support is experimental.
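The name and attribute mapping can be sketched on a plain-object stand-in for a DOM element. This is an illustration under stated assumptions rather than CETEIcean's own code: the `toCustomElement` helper is hypothetical, the `data-*` attribute names are invented for the example, and lowercasing the element name is assumed because custom element names may not contain uppercase ASCII letters (which is also why the original name needs preserving).

```javascript
// Sketch of the name and attribute mapping on a plain-object stand-in for
// a DOM element. The data-* attribute names are illustrative assumptions.
const TEI_NS = "http://www.tei-c.org/ns/1.0";

function toCustomElement(el, prefix = "tei") {
  const attrs = {};
  for (const [name, value] of Object.entries(el.attrs || {})) {
    // xml:id becomes the HTML id; other attributes are copied unchanged.
    attrs[name === "xml:id" ? "id" : name] = value;
  }
  // Preserve what the new element name alone cannot express.
  attrs["data-origname"] = el.name;
  attrs["data-xmlns"] = el.ns;
  const children = el.children || [];
  if (children.length === 0) attrs["data-empty"] = "";
  return {
    // Custom element names must be lowercase and contain a hyphen.
    name: `${prefix}-${el.name.toLowerCase()}`,
    attrs,
    children: children.map((c) =>
      typeof c === "string" ? c : toCustomElement(c, prefix)
    ),
  };
}

const ptr = toCustomElement({
  ns: TEI_NS,
  name: "ptr",
  attrs: { "xml:id": "p1", target: "https://example.org/" },
});
// ptr.name is "tei-ptr"; ptr.attrs.id is "p1"; ptr.attrs.target is unchanged.
```

Nothing is discarded in this mapping, so the original TEI can be reconstructed from the result by reversing each step.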
In browsers with Custom Elements support, the additional behaviors are applied in the element’s constructor. So when the element is added to the DOM, it gets additional behavior. <ptr> elements, for example, get an <a href> element inserted inside them, with the @href and the content set to the @target of the <ptr>. Tables are another case where HTML and CSS can have a hard time with non-HTML table elements (the new CSS Grid Layout module may resolve this issue), so they have their content hidden and replaced with HTML table elements.
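The <ptr> case might look roughly as follows. This sketch uses the current standard registration call, `customElements.define`, with a `connectedCallback`, which may differ from the registration mechanism CETEIcean itself used at the time of writing. The `ptrLink` helper and the `TeiPtr` class name are hypothetical, and a single target is assumed.

```javascript
// Pure helper: what the inserted link should point to and display.
// Assumes a single target; a missing attribute yields an empty string.
function ptrLink(target) {
  const href = (target || "").trim();
  return { href, text: href };
}

// Browser-only part, skipped where Custom Elements are unavailable
// (for example when this file is run under Node).
if (typeof customElements !== "undefined" && !customElements.get("tei-ptr")) {
  class TeiPtr extends HTMLElement {
    // Runs when the element is added to the DOM.
    connectedCallback() {
      if (this.querySelector("a")) return; // already decorated
      const { href, text } = ptrLink(this.getAttribute("target"));
      const a = document.createElement("a");
      a.setAttribute("href", href);
      a.textContent = text;
      this.appendChild(a);
    }
  }
  customElements.define("tei-ptr", TeiPtr);
}
```

Because the inserted element is a genuine HTML anchor, the browser treats it as a full link, including the URL preview on hover.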
In browsers which have not yet implemented Custom Elements, the same effects are achieved using a fallback method which calls the same behavior function. A baseline set of behaviors is defined for CETEIcean, but these may be redefined or extended by the user. Behaviors may be functions or arrays containing one or two strings. The latter are automatically converted to functions that insert the content of the array into the element, prefixing or wrapping its content. One could define a behavior for the TEI <add> element, for example, which would wrap its contents in the Leiden Convention markers for text added to a document.
Figure 1: Behavior Example
The behavior definition: "add": ["`","´"]
The source: <add>an addition</add> (in the TEI namespace)
The output: <tei-add><span>`</span>an addition<span>´</span></tei-add>
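The conversion of an array behavior into a function can be sketched as below. It works on strings rather than DOM nodes so that it stays self-contained. The `behaviorFromArray` and `normalizeBehaviors` names are hypothetical, and wrapping a lone prefix string in a <span> is an assumption carried over from the two-string output in Figure 1.

```javascript
// Sketch: turn an array behavior into a function over an element's inner
// markup (a string here, standing in for DOM content).
function behaviorFromArray(strings) {
  const [before, after] = strings;
  const span = (s) => `<span>${s}</span>`;
  return (content) =>
    after === undefined
      ? span(before) + content // one string: prefix the content
      : span(before) + content + span(after); // two strings: wrap it
}

// Function behaviors pass through untouched; arrays are converted.
function normalizeBehaviors(behaviors) {
  const out = {};
  for (const [name, b] of Object.entries(behaviors)) {
    out[name] = Array.isArray(b) ? behaviorFromArray(b) : b;
  }
  return out;
}

const behaviors = normalizeBehaviors({ add: ["`", "´"] });
const rendered = `<tei-add>${behaviors.add("an addition")}</tei-add>`;
// rendered matches the output shown in Figure 1.
```

The array form keeps the common cases declarative, while a function remains available for anything that needs real logic.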
CETEIcean is not the first solution to permit TEI in the browser. TEI Boilerplate uses an XSLT script to wrap the TEI document in HTML and convert those TEI elements that have specific HTML equivalents, such as those discussed above. The transformation is triggered by an xml-stylesheet processing instruction which has to be added to the source file, and because browser-based XSLT capabilities stalled at XSLT 1.0, it is limited to that version of the language. The now-deprecated Saxon-CE implementation supported XSLT 2.0, and its replacement, Saxon-JS, supports XSLT 3.0 and provides significant performance improvements. Saxon-JS is proprietary, however, which means it is not really viable in an open source environment. As a matter of policy, the TEI Consortium won’t distribute software that requires users to purchase a software license in order to use it.
The DLL’s edition viewer does some DOM manipulation to generate a traditional apparatus criticus from the TEI <app> elements in a text. A source text with inline <app> elements (see Figure 3) is displayed as text (Figure 4) plus apparatus (Figure 5). This kind of transformation is fairly standard, and it would not be unusual to see it done with XSLT instead. What is unique about the DLL viewer is its use of the TEI’s model of textual variation to produce a dynamic apparatus. Besides the traditional appearance of the app. crit., the viewer also generates widgets which manipulate the edition’s DOM (which isomorphically presents the TEI’s model), and thereby allow a reader to make decisions about what should appear in the reading text. The TEI models textual variation by placing variants in parallel inside an <app> element. The variant to appear in the main text is placed in a <lem> and any additional variants go in <rdg> elements. The DLL viewer’s widgets allow readers to change any <rdg> into a <lem> (and the <lem> into a <rdg>), promoting that reading to the main text. The page’s CSS takes care of the rendering, as <lem>s are displayed and <rdg>s are not.
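The swap itself is a small operation on the <app> model. The sketch below is not the DLL viewer's code: it represents an <app> as a plain object, and the `promoteReading` and `readingText` helpers and the sample readings are invented for illustration. In a real DOM an element cannot be renamed in place, so a viewer would create a replacement element instead of changing a name property.

```javascript
// Sketch: an <app> as a plain object whose children mirror <lem> and <rdg>.
// Returns a new object; the original app is left untouched.
function promoteReading(app, index) {
  const target = app.children[index];
  if (!target || target.name !== "tei-rdg") {
    throw new Error("index must point at a tei-rdg child");
  }
  return {
    ...app,
    children: app.children.map((child, i) => {
      if (i === index) return { ...child, name: "tei-lem" }; // promoted
      if (child.name === "tei-lem") return { ...child, name: "tei-rdg" }; // demoted
      return child;
    }),
  };
}

// What a stylesheet that hides tei-rdg would leave visible.
function readingText(app) {
  return app.children
    .filter((c) => c.name === "tei-lem")
    .map((c) => c.text)
    .join("");
}

// Invented readings, for illustration only.
const app = {
  name: "tei-app",
  children: [
    { name: "tei-lem", wit: "#A", text: "alpha" },
    { name: "tei-rdg", wit: "#B", text: "beta" },
  ],
};
const swapped = promoteReading(app, 1);
```

Only the element names change; the readings and their witnesses stay where they are, so the apparatus remains a faithful view of the same TEI model.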
In this way, the edition’s readers can try out different versions of the text and see how the variants affect its flow and meaning. This makes for a much more powerful and intuitive critical edition than is possible in print (or static HTML). Obviously, something similar could be done with an HTML version converted from TEI with XSLT, but having the affordances of the TEI model directly available in the browser makes it easier for a developer to see how that model can be used.
Figure 3: The first lines of Calpurnius Siculus’ first eclogue
Figure 4: Lines 1-5 (from https://digitallatin.github.io/viewer/calpurnius.html#poem1)
Figure 5: Apparatus criticus for lines 1-3 (from https://digitallatin.github.io/viewer/calpurnius.html#poem1)
There are, of course, cases where one might want some radical transformation of one’s source encoding. The natural inclination (and CETEIcean’s default) in rendering a TEI document in a browser is to make the header invisible, for example. After all, it’s the text that is meant to be read. But a number of projects put important information in the <teiHeader>, and they might reasonably want to display that in the rendered document. Critical apparatuses are another example, where a natural display form must be created by extracting information from the text and reformatting it. CETEIcean by itself does not help particularly with these cases, although it is quite possible to accomplish them.
A more important concern is what to do about search engines. Google, for example, used to make some attempt to index pages rendered using AJAX calls, where the page content is not actually present in the page source but is fetched at load time, but deprecated this approach in 2015. The DLL’s pre-publication editions would never have been indexed anyway, because their source is fetched from the raw.githubusercontent.com domain, which excludes all web crawlers. For purposes of the DLL workflow, Google-invisibility is actually an advantage, as it means pre-publication materials can be open, but at the same time not very discoverable, and will not be in competition with the eventual publication. But when those editions are published, we definitely want them to be searchable. The DLL’s solution at publication is to deliver a partially-converted file, i.e. a TEI file already converted to HTML Custom Elements, so that the source is indexable by search engines. An XSLT or XQuery conversion to a Custom Elements format is trivial to write (the core of the DLL implementation is 30 lines of XQuery, for example). An alternative method would be to embed the TEI XML source in the HTML page and have CETEIcean load that, though it is hard to know what a search engine might make of such a chimera. It is worth noting that (anecdotally) TEI Boilerplate does not fare well with search engines.
CETEIcean does not provide a complete replacement for XSLT and XQuery as a means for publishing TEI on the web, but we do think it is a viable alternative for some projects, and is particularly useful in situations where a quick view of work in progress is needed. It especially shines in situations where the TEI’s model of the text can be usefully leveraged to allow interesting dynamic functionality in the browser. The isomorphism of CETEIcean documents to their TEI sources also means it will support things like robust annotation (since annotation targets should be able to be trivially mapped to the source documents) and in-browser editing (because we can easily turn the HTML back into TEI). In sum, it seems like this approach has a great deal of potential and starts to get us out of the cul-de-sac our XSLT dependency had put us in.
 Examples are available at https://digitallatin.github.io/viewer/calpurnius.html and https://digitallatin.github.io/viewer/balex.html. Both examples pull their source text from a separate repository on GitHub.
 See below for a discussion of proprietary XSLT 2.0 and 3.0 browser-based implementations.