Two systems are said to be interoperable if each is able to work with the parts or products of the other, with minimal if any external intervention. When applied to formats of digital texts, interoperability is commonly divided into two kinds, syntactic and semantic.
My distinction between and use of syntactic and semantic is congruent with that of the European Interoperability Framework (European Commission 2010, 23):
Semantic interoperability is about the meaning of data elements and the relationship between them. It includes developing vocabulary to describe data exchanges, and ensures that data elements are understood in the same way by communicating parties.
Syntactic interoperability is about describing the exact format of the information to be exchanged in terms of grammar, format and schemas.
Syntactic interoperability refers to consistency or completeness in encoding, markup, and related conventions attached to that markup. It generally implies the complete, lossless exchange of data, no matter its meaning. We witness syntactic interoperability every day that we use the Web. Up-to-date major browsers accessing the data in any page written validly in a version of Hypertext Markup Language (HTML) will present different readers with the same content and roughly the same display. Likewise, in the realm of textual scholarship, files validly marked up with one of the Text Encoding Initiative (TEI) formats are, in general, syntactically interoperable. A valid TEI file created by one party can be shared with any other to be studied, processed, or otherwise used.
Semantic interoperability stands a level higher, and characterizes systems that can losslessly exchange not just the data but any associated or underlying meaning. For example, the UTF-8 string "France" may be syntactically interoperable with other systems that handle UTF-8, but for it to be semantically interoperable, the underlying significance or meaning, i.e., that the string represents the name of the country France, should also be preserved after exchange. Such semantics admit degrees of interest and importance. For example, in both HTML and TEI, elements such as <p> have some semantic meaning, but one that is, to most users, of little import or precision. HTML 5 has allowed a few other semantically interesting elements, e.g., <article>, but there are not many of these, thus keeping its vocabulary to fewer than 120 elements. In its more concerted effort to support scholarly concepts in markup, the TEI Consortium has produced many more and with even greater precision, e.g., <residence>, so that in its full schema TEI supports nearly 550 elements. TEI encourages projects and users to build on this by customizing the TEI to add their own semantically precise elements, or to remove those that have no relevance to a given project.
But assigning an XML element to every possible concept of interest is impractical, even in a customized TEI scheme. Thousands of concepts could be encoded, but with what result? If an elemental vocabulary gets too large, it winds up being misunderstood or misused. Or it may legitimate interpretations that members of the community may regard as wrongly deviating from standard usage.
An alternative has emerged to making elements the main carrier of semantics. Known loosely and variously as linked data, open linked data, or the semantic web, this set of practices builds upon a recommendation of the World Wide Web Consortium (W3C) called the Resource Description Framework (RDF), a relatively simple data model that envisions data as a network of nodes connected by lines, termed rather misleadingly edges (http://www.w3.org/RDF/).
In everyday usage, edge implies the juncture of two surfaces of one or more solid objects, with no implications for where that edge might begin or end, if it does at all. None of these sine qua nons for real-life edges have a place in the RDF appropriation of the metaphor. A newcomer may be forgiven for objecting that what is depicted looks like a line, not an edge.
In RDF, nodes and edges are ideally identified by uniform resource identifiers (URIs), preferably resolvable uniform resource locators (URLs), so that further information about a thing or concept can be automatically retrieved. The method of transferring semantics thus shifts, from elements and attributes to the data they contain, namely URIs.
RDF conventions have been implemented in markup languages to varying degrees. Across the Internet, RDFa and other forms of structured markup (Microdata, Microformats) have been applied widely, helping HTML become a major vehicle for semantic interoperability. The Web is populated with billions of assertions that are semantically comparable.
The University of Mannheim's Web Data Commons project, http://webdatacommons.org, conducts regular crawls of the entire Web. The project showed that in winter 2014 31% of HTML pages retrieved from 2.01 billion URLs (up from 26% of 2.24 billion in 2013) had some kind of structured markup, resulting in 20.5 billion RDF quads (RDF triples attached to a named graph; this figure is up from 17.2 billion in 2013). See http://webdatacommons.org/structureddata/2014-12/stats/stats.html and http://webdatacommons.org/structureddata/2013-11/stats/stats.html.
For a theoretical reflection on canonical or standardized reference numbers and their place in digital projects, see Kalvesmaki 2014.
In this article I offer three practical ways to make standardized references in TEI more semantically interoperable. The first of these, deployment of Canonical Text Services URNs, is somewhat well known but has not yet been broadly used in TEI cross-references. The second has, to my knowledge, not yet been tried at all, namely, informal communities agreeing to adopt Schematron files, to be added to the prolog of TEI files to standardize cross-references to a work that is frequently cited. My third and final approach shifts to stand-off markup, and I offer a model based upon the Text Alignment Network, a planned TEI-friendly XML format for the interchange of aligned texts.
Standard Cross-References in TEI
[B]ecause the choice of tags is guided by human interpretation, TEI-XML encoded files are in general not interoperable (Schmidt 2014)
Doubts about the interoperability of the XML format supported by the Text Encoding Initiative (TEI) have been voiced on numerous occasions, even within the flagship journal of the TEI, as in the quote above.
See also Schmidt and October 2014 discussions on the public TEI-L listserv, initiated by Roberto Rosselli Del Turco under the subject line "Interchange of TEI documents: examples?": https://listserv.brown.edu/archives/cgi-bin/wa.
One way to encode a standardized cross-reference in TEI is through @cRef, in tandem with <ptr> or <ref> (and sometimes supplemented by <cRefPattern>). But there are other ways as well. One could also use those elements with @type. Or one could use @source. Other methods include the use of <linkGrp>, or even loose, unstructured mechanisms such as <bibl>. (The variety of options, as I shall argue, hampers interoperability.)
A few of these many options are discussed further in this paper. But for ease of discussion, I will concentrate on @cRef, presented in the TEI Guidelines as an ideal solution for an encoder who wishes to create a cross-reference to another work by means of a standardized or canonical reference. The relevant parts of the Guidelines, §3.10.4 and §16.2.5, although accurate, are disjoint, technical, and not clearly connected to everyday usage. So I present the material somewhat differently, from the perspective of an ordinary encoder who is putting a project together and doing the best to follow the Guidelines.
All references to the TEI guidelines are based on version 2.8.0 of the P5 Guidelines, http://www.tei-c.org/Guidelines/P5/, last accessed 3 July 2015.
The Guidelines illustrate @cRef with the example of a text that quotes from the gospel of Matthew, chapter 5 verse 7 (Guidelines §16.2.5). Let us enhance this example by considering the needs of an encoder who is editing works by Anne Brontë and who has decided to encode explicit quotations, including the quotation from Matthew 5:7 that appears at chapter 5, paragraph 18 of Agnes Grey. Because our focus is on both syntax and semantics, let us assume that the encoder wishes to provide a cross-reference that will refer to as many versions of that text as possible, created independently by other encoders or projects, and that will be as useful as possible to the maximum number of users, with a minimum of human intervention for processing the data. Let us also assume that all the TEI transcriptions that exist in the world are both discoverable and available. Of course, this is an unrealistic assumption to make in real life, but the problems associated with discoverability and availability are ubiquitous for this method and every other one, whether discussed in this article or not. Assessing those problems here would be repetitive and tangential to the main point, interoperability.
We turn to the Brontë encoder, who has prepared a plain TEI transcription of Agnes Grey, and now turns to marking up cross-references. Following the TEI guidelines, the encoder tags the quotation with <quote>. Seeing that, of the attributes on offer, only @cRef is designed for canonical references, the encoder ignores the other options. Upon further reading, particularly of the examples, the encoder, feeling that both <ptr> and <ref> are equally valid, decides that the markup is more of a reference than a pointer, so adds <ref> nearby in a valid location. The relevant part of the TEI file might look like
.....
<div type="chapter" n="5">
  .....
  <p xml:base="•••••••••">‘But, for the child’s own sake, it ought not to be encouraged to have such amusements,’ answered I, as meekly as I could, to make up for such unusual pertinacity. <said>‘<quote>“Blessed are the merciful, for they shall obtain mercy</quote><ref cRef="•••"/>.”’</said></p>
  .....
</div>
.....
The encoder has given @xml:base and @cRef dummy values because it is as yet unknown what kind of values are expected. A target Bible text must be chosen, and then it must be interrogated to find out what elements and attributes have been used, and with what values. So the encoder finds one in TEI format. After noting the URL, the encoder studies the file and finds that it has the following structure at the place quoted:
..... <div n="Matt"> ..... <div type="chap" n="5"> ..... <ab type="v" n="7">Blessed are the merciful, for they will be shown mercy.</ab> ..... </div> ..... </div> .....
The encoder therefore replaces ••••••••• with the target URL (let's call it http://example.com/nt.xml) and replaces ••• with Matt 5:7. But the latter, being so far parsable only to humans, must be converted to something a computer can act upon. So the Brontë encoder, again following the Guidelines, adds a statement to the <teiHeader>, something like this:
<teiHeader> ..... <encodingDesc> <refsDecl xml:id="biblical"> <cRefPattern matchPattern="(.+) (.+):(.+)" replacementPattern="#xpath(//div[@n='$1']/div[@n='$2']/ab[@n='$3'])"> <p>This pointer pattern extracts and references the <q>book,</q> <q>chapter,</q> and <q>verse</q> parts of a biblical reference.</p> </cRefPattern> </refsDecl> </encodingDesc> ..... </teiHeader>
The program listing above departs slightly from the official example in the TEI Guidelines (§16.2.5), which uses an XPath expression that assumes that verse labels and positions are isomorphic. That is a false assumption for most modern editions, which suppress or demote verses considered spurious without altering the canonical numbering. The @replacementPattern in my example also takes into account advice at §16.3 that Bible verses should be tagged with <ab>.
This <cRefPattern> stipulates for any TEI processor that Matt 5:7 should be converted to the URL http://example.com/nt.xml#xpath(//div[@n='Matt']/div[@n='5']/ab[@n='7']).
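The conversion that a <cRefPattern> requests can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name expand_cref is hypothetical, the match pattern is applied to the whole reference, and the resulting pointer is simply appended to the @xml:base value, which a real TEI processor may handle differently.

```python
import re


def expand_cref(cref, match_pattern, replacement_pattern, base=""):
    """Expand a canonical reference into a pointer, in the manner of cRefPattern.

    Hypothetical helper: a real TEI processor may resolve patterns differently.
    """
    match = re.fullmatch(match_pattern, cref)
    if match is None:
        return None  # this pattern does not apply to the reference
    # TEI writes back-references as $1, $2, ...; swap in each captured group.
    pointer = re.sub(
        r"\$(\d+)",
        lambda ref: match.group(int(ref.group(1))),
        replacement_pattern,
    )
    # Assumption: the pointer is appended directly to the xml:base value.
    return base + pointer


url = expand_cref(
    "Matt 5:7",
    r"(.+) (.+):(.+)",
    "#xpath(//div[@n='$1']/div[@n='$2']/ab[@n='$3'])",
    base="http://example.com/nt.xml",
)
print(url)
```

Note that the expansion works only because the quoting file and the quoted file happen to agree on the label Matt and on the div/div/ab structure.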
The encoder's job finishes, and the work now moves to those who wish to process, publish, or study the data. This requires the use of some TEI-compliant and -aware processing mechanism, which will take the TEI elements and attributes that have been used for cross-referencing, resolve them to retrieve a string or document fragment, and transform that data according to whatever purpose is intended. Although the end result differs widely from one processor to another, the initial, preparatory step is common across the board. All processors must be programmed to find instances of @cRef, take the string value, find a matching pattern in a <cRefPattern>, create an XPath expression to be applied to the target XML file of Matthew (specified by @xml:base), and then retrieve the document fragment, for later transformation.
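The retrieval step might be sketched as follows in Python. Several simplifications are assumed here: the target file is supplied inline rather than fetched from its URL, the TEI namespace is omitted, the helper name fetch_fragment is hypothetical, and the standard library's ElementTree supports only a subset of XPath, so the expression is made relative before it is applied.

```python
import re
import xml.etree.ElementTree as ET

# Inline stand-in for the target file (assumption: no TEI namespace).
TARGET = """<text><body><div n="Matt"><div type="chap" n="5">
<ab type="v" n="7">Blessed are the merciful, for they will be shown mercy.</ab>
</div></div></body></text>"""


def fetch_fragment(target_root, pointer):
    """Return the elements selected by the #xpath(...) part of a pointer."""
    found = re.search(r"#xpath\((.+)\)$", pointer)
    if found is None:
        return []
    # ElementTree cannot apply absolute paths to an element, so prefix a dot.
    return target_root.findall("." + found.group(1))


root = ET.fromstring(TARGET)
pointer = "http://example.com/nt.xml#xpath(//div[@n='Matt']/div[@n='5']/ab[@n='7'])"
verses = fetch_fragment(root, pointer)
for verse in verses:
    print(verse.text)
```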
But even in this preparatory stage, the processor requires some human intervention. Someone must first step in and configure it to address irregularities not found in other TEI files. The person configuring the processor must study the Brontë text and discern which elements have been used for cross-references, and with what kind of editorial consistency. Perhaps the configurer is surprised to find that the encoder chose <ref> instead of <ptr>, and that the former was left empty. Perhaps the configurer is surprised to find that the Brontë encoder was enamoured of the attraction of @cRef and ignored a simpler solution, that of @source. Perhaps the encoder and configurer will engage in a spirited discussion as to the best use of TEI. Perhaps the configurer and encoder are not on speaking terms, and the encoding stands. The configurer must interrogate the use of the element even further to determine what relationship any given <quote> and <ref> pair share. After all, the former could be the previous sibling, next sibling, parent, or child of the latter. (Of these four valid configurations, three are offered as examples in the Guidelines.) The configurer might find that in a series of adjacent quotes it is difficult to tell which <quote> is paired with which <ref>, and the encoder may not have been consistent. The variety of options in TEI is the source of much work for the person configuring the pre-processor. As Schmidt points out in the quote above, the choice of an element, as well as its placement, is subject to human interpretation, and is therefore detrimental to interoperability.
Such a workflow also requires quite a lot of human intervention and interpretation at both stages (transcription, pre-processing configuration). And not only does it fail to preserve any data required for semantic interoperability, such as URNs, but it can hardly be said to be even syntactically interoperable. The values of @cRef and @replacementPattern are guaranteed to be applicable only to one quoting version and one quoted version. Any attempts to apply the data to other versions of the New Testament (reflected by, say, changing the value of @xml:base) must be preceded by checking the structure and contents of the new file. In addition, once @cRef is used this way, it becomes difficult to use the attribute to refer to works other than the New Testament.
This is most acute when an encoder wishes to use @cRef to point to multiple works, a practice that would tax the limits of <cRefPattern>.
As an interoperable cross-reference mechanism, @cRef thus proves to be rather limited. It may be suitable for a single project depending upon specific files, but it is not prepared to handle a distributed network of independently created TEI files.
@cRef + Canonical Text Services URNs
The limitations of @cRef prompt many TEI users to migrate to more complex TEI linking mechanisms (discussed below). But @cRef need not be abandoned so quickly. Its syntactic and semantic value can be enhanced rather easily through Canonical Text Services (CTS) URNs, a convention that defines a way to coin unique, computer-actionable references to literary works independent of individual versions. A description of the syntax of CTS URNs would take us too far afield, and such descriptions are easily found elsewhere.
Discussed informally at
defined formally at
also Kalvesmaki 2014, paras. 15-24. See esp. notes 12-17, where I
register some concerns about the design of CTS URNs.
In the CTS URN scheme, Matthew 5:7 is identified as urn:cts:greekLit:tlg0031.tlg001:5.7 (the Greek New Testament is catalogued by the Thesaurus Linguae Graecae as author number 0031, and Matthew as work number 001). This URN is said, by definition, to be valid for any version of Matthew.
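To make the anatomy of such a URN concrete, here is a minimal Python sketch that splits one into its colon-delimited components and then splits the work component on periods. The names CtsUrn and parse_cts_urn are hypothetical, and the sketch ignores finer points of the CTS specification such as passage ranges and subreferences.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its main components (simplified sketch)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    # The work component is a dotted hierarchy: textgroup.work.version
    hierarchy = parts[3].split(".")
    textgroup = hierarchy[0]
    work = hierarchy[1] if len(hierarchy) > 1 else None
    version = hierarchy[2] if len(hierarchy) > 2 else None
    # The passage component is optional.
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(parts[2], textgroup, work, version, passage)


matthew = parse_cts_urn("urn:cts:greekLit:tlg0031.tlg001:5.7")
print(matthew)
```

Because no version is named, the parsed URN leaves the version slot empty, which is what allows it to stand for any version of the work.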
Let us revisit the workflow of our example. Above we started with the Brontë encoder, and we placed no special requirements upon the TEI-compliant version of Matthew she or he used. But under the CTS URN method, the process has to start earlier, with the target text. Or rather, more precisely, a new participant is introduced as an intermediary between the New Testament encoder and the Brontë one, namely, a CTS server.
The person who administers a CTS server finds one or more TEI-compliant New Testament texts, and processes those texts, importing them into an RDF-compliant data store. During that process each segment of text is converted into RDF data that connects the text string with a CTS URN (in RDF terms, the latter would be the subject and the former the object). The data could be stored and served in any number of ways, for example as a relational database or as a SPARQL Protocol and RDF Query Language (SPARQL) endpoint.
Whereas the architects of CTS have developed CTS as a SPARQL endpoint, Jochen Tiepmar, at the University of Leipzig, has deployed a CTS server as a MySQL database. See https://github.com/cite-architecture/sparqlcts and http://www.culingtec.uni-leipzig.de/ESU_C_T/node/471
In our example, we start with an administrator of a CTS server, who finds a TEI New Testament. After interrogating the data structure, the administrator imports the verses of the New Testament, along with their proper CTS URNs, into the service. The administrator publishes specifications for the API that state that any queries should target the base URL http://ctsservice.example.com/text, add a question mark, then the CTS URN.
Work shifts to the Brontë transcriber, who now does not need to study the structure of any particular New Testament text. All he or she needs to do is get the base URL for the CTS service, follow the specifications for the API, and encode the novel accordingly, e.g.:
..... <div xml:base="http://ctsservice.example.com/text?"> <p>‘But, for the child’s own sake, it ought not to be encouraged to have such amusements,’ answered I, as meekly as I could, to make up for such unusual pertinacity. ‘<quote>“Blessed are the merciful, for they shall obtain mercy.”</quote><ref cRef="urn:cts:greekLit:tlg0031.tlg001:5.7"/>’</p> .....
This particular CTS URN points to every version of the New Testament held in a particular CTS service. But if the Brontë encoder knows that the quotation is from a specific version of Matthew, say a handwritten diary, and finds that version available in the CTS service, the value of @cRef can simply be narrowed further, by adding a version identifier to the work component of the URN. The two attributes @xml:base and @cRef are all that is required of the transcriber. The syntax of the CTS URN renders <cRefPattern> unnecessary.
The work now shifts to the person configuring the processor, who still must interrogate the Brontë text, to see how elements and attributes have been used for cross-referencing. But once that is accomplished, the processor can be preconfigured by simply concatenating @xml:base and @cRef. Before sending this request to the CTS service, the configurer may wish to restrict the number of versions returned, which is simple enough: the value of @cRef or the SPARQL query is changed to specify the version or versions intended. The text or texts that are returned from the CTS service are then ready for transformation.
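The concatenation step can be sketched in Python. The sketch assumes a quoting file like the one above, reduced here to a hypothetical inline sample without the TEI namespace; the function name cts_request_urls is likewise an assumption. For each @cRef it looks up the nearest @xml:base on the element or its ancestors and joins the two strings, without sending any request.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical, simplified quoting file (no TEI namespace).
QUOTING = (
    '<div xml:base="http://ctsservice.example.com/text?"><p>'
    "<quote>Blessed are the merciful, for they shall obtain mercy.</quote>"
    '<ref cRef="urn:cts:greekLit:tlg0031.tlg001:5.7"/></p></div>'
)


def cts_request_urls(root):
    """Join each @cRef with the nearest @xml:base found on self or ancestors."""
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    urls = []
    for element in root.iter():
        cref = element.get("cRef")
        if cref is None:
            continue
        node, base = element, None
        while node is not None and base is None:
            base = node.get(XML_BASE)
            node = parents.get(node)
        urls.append((base or "") + cref)
    return urls


request_urls = cts_request_urls(ET.fromstring(QUOTING))
print(request_urls)
```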
Under this method, the amount of work required of the transcriber and the pre-processor is reduced considerably. The transcriber does not need to know anything about regular expressions, XPath, and replacement patterns. The person configuring the processor does not need to rewrite any preprocessing stylesheets. The syntactic and semantic interoperability of the Brontë TEI file is increased significantly. The syntactic irregularities inherent in the customary use of @cRef are eliminated by the CTS specifications, which dictate exactly how a valid URN must be constructed. And a new level of semantic interoperability not traditionally part of TEI files has been introduced. In the CTS URN, one has a machine-actionable name not only for a particular passage but for a collection, a work, or, possibly, a specific version. The Brontë encoder has not only pointed to a specific set of texts in a CTS service, but has uniquely named both a work (gospel of Matthew) and a specific part of that work (5:7). That URN can be used by any other system that is CTS URN-aware to collate the assertion governed by @cRef into heterogeneous datasets. And that means that the cross-reference declared in the TEI file of the Brontë transcription has now been released to the semantic web.
This approach to cross-references assumes, of course, that a quoted text is available in a CTS service, an assumption we made at the outset (see above). But the need to have an available CTS server is a reminder that this method introduces a major step into the workflow, and an added point of possible failure in data processing. The relationship between source text, cross-reference, and target text is now mediated. In addition, the extra labor on the part of the CTS administrator is not to be underestimated. CTS services require software packages (e.g., SPARQL endpoints) that must be configured and maintained, requiring server administrator skills well beyond simply uploading a plain XML file to a public server. The average TEI encoder who has a basic website is not likely to be ready to administer a CTS server. There are also, at this time, few examples of CTS services, and only as that number grows will the specifics of other opportunities and shortcomings be made clear.
@cRef + Shared Schematron
At the heart of a CTS URN is a familiar, standardized canonical reference system that has been transformed into a syntactically regularized string, to bridge independently created texts. Another way a community of encoders and projects can exploit so-called canonical references in the name of interoperability is to transform standardized references into an agreed controlled vocabulary, and then to specify the rules for that vocabulary with a Schematron file. Anyone choosing to use the convention need merely add a reference to the Schematron file in the prolog of their TEI documents. This inclusion not only tells other users that the shared cross-reference system has been adopted, but, in the validation process, can weed out bad values and provide contextual help to the TEI encoder who may not know all the rules for the cross-reference system.
The method advocated below resembles somewhat the constraints applied by the schemas developed for the Mary Baker Eddy Library, which regulates the syntax of cross-references within a single corpus to a variety of works. For documentation see http://www.wwp.neu.edu/outreach/seminars/mbel/TEI_development/schemas/mbel.odd; http://www.wwp.neu.edu/outreach/seminars/mbel/TEI_development/schemas/mbel.doc.html#att.pointing; and http://www.wwp.neu.edu/outreach/seminars/mbel/TEI_development/schemas/mbel.isosch. But whereas the Mary Baker Eddy schema focuses on the needs of a single project dealing with multiple works, in this section I deal with the inverse: multiple projects trying to interoperably quote a single work, no matter the specific version.
This method starts further upstream than either the Brontë encoder or a putative CTS server. It begins with the community that wishes to make Matthew and the rest of the New Testament (maybe the Bible in general) open to standardized cross-references. Out of that community a person or project (or perhaps a TEI special interest group) agrees to host and maintain master versions of the schema files. The community agrees to create a pair of Schematron files, one to regulate transcriptions of the New Testament, the other, transcriptions of texts that quote from the New Testament.
The first file defines the structure of the New Testament text and permissible @n values. Let us suppose the community has agreed that any New Testament transcription should have three levels of <div>, one for books, one for chapters, and one for verses. They also agree on a set of abbreviations that should be used for the names of the books. They envision transcriptions of the New Testament having a TEI <text> that looks something like this:
<text> <body> <div n="Mt"> ..... <div n="5"> ..... <div n="7"> <p>μακάριοι οἱ ἐλεήμονες, ὅτι αὐτοὶ ἐλεηθήσονται.</p> </div> ..... </div> ..... </div> ..... </body> </text>
To enforce this structure, the community encodes assorted rules in the first of the two Schematron files. For example, this rule defines permissible book abbreviations:
<rule context="tei:div"> <let name="hierarchy" value="count(ancestor::tei:div) + 1"/> <report test="$hierarchy = 1 and not(matches(@n,'^(Mt|Mk|Lu|Jn|Ac| Ro|1Co|2Co|Gal|Eph|Php|Col|1Th|2Th|1Tim|2Tim|Tit|Phm| Heb|Jam|1Pe|2Pe|1Jn|2Jn|3Jn|Jud|Re)$','x'))" >Book value must be one of the following: Mt, Mk, Lu, Jn, Ac, Ro, 1Co, 2Co, Gal, Eph, Php, Col, 1Th, 2Th, 1Tim, 2Tim, Tit, Phm, Heb, Jam, 1Pe, 2Pe, 1Jn, 2Jn, 3Jn, Jud, Re.</report> ..... </rule>
The example above concisely specifies that the first-level <div>s (those at the book level in the hierarchy) must have values of @n that draw from one of the abbreviations adopted by the community for the twenty-seven books of the New Testament. In the case of Matthew, the agreed abbreviation is Mt. This <report> is but one of many that could be declared within the <rule>. Another could include a specification as to the number of chapters allowed in a particular book. This next <report> specifies that the second-level <div>s pertaining to the book of Matthew must be numbered 1 through 28:
<report test="$hierarchy = 2 and ../@n ='Mt' and @n and not(matches(@n,'^([1-9]|1[0-9]|2[0-8])$'))">Mt has a maximum of 28 chapters.</report>
The verse numbers too can be defined, as here, which specifies that verse numbers for Matthew 5 fall from 1 through 48:
<report test="$hierarchy = 3 and ../../@n = 'Mt' and ../@n = '5' and @n and not(matches(@n,'^([1-9]|[1-3][0-9]|4[0-8])$'))">Mt 5 takes verses 1 through 48.</report>
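For readers less familiar with Schematron, the same three checks can be expressed as a short Python sketch. The function name check_ref is hypothetical, and only the rules quoted above are covered (book abbreviations, chapters of Matthew, verses of Matthew 5); re.VERBOSE plays the part of the 'x' flag in the XPath matches() calls.

```python
import re

# Same patterns as in the Schematron reports above.
BOOKS = re.compile(
    r"""^(Mt|Mk|Lu|Jn|Ac|Ro|1Co|2Co|Gal|Eph|Php|Col|1Th|2Th|1Tim|2Tim|Tit|Phm|
    Heb|Jam|1Pe|2Pe|1Jn|2Jn|3Jn|Jud|Re)$""",
    re.VERBOSE,
)
MT_CHAPTERS = re.compile(r"^([1-9]|1[0-9]|2[0-8])$")
MT_5_VERSES = re.compile(r"^([1-9]|[1-3][0-9]|4[0-8])$")


def check_ref(book, chapter, verse):
    """Return the messages a validator might report for one reference."""
    problems = []
    if not BOOKS.match(book):
        problems.append("Book value must be one of the agreed abbreviations.")
    if book == "Mt" and not MT_CHAPTERS.match(chapter):
        problems.append("Mt has a maximum of 28 chapters.")
    if book == "Mt" and chapter == "5" and not MT_5_VERSES.match(verse):
        problems.append("Mt 5 takes verses 1 through 48.")
    return problems


print(check_ref("Mt", "5", "7"))
print(check_ref("Mt", "5", "49"))
```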
Furthermore, let us suppose that this community agrees with many modern text editors that certain verses should be deprecated, but they do not wish to render a text that includes them as being invalid. For example, Matthew 18:11, widely regarded as spurious, could be flagged in a report, but merely as a warning:
<report test="$hierarchy = 3 and ../../@n = 'Mt' and ../@n = '18' and @n='11'" role="warning">Most critical editions suppress Mt 18.11 as spurious.</report>
Perhaps most important of all, the schema file can declare that every
<div> should have values of
@n such that every
<div> furthest from the root is uniquely citable, what I call the
Leaf Div Uniqueness Rule:
<pattern> <let name="leafdiv-flatrefs" value="for $i in (//tei:div[not(descendant::tei:div)]) return string-join($i/ancestor-or-self::tei:div/@n,' ')"/> <rule context="tei:div"> ..... <let name="this-ref" value="string-join(./ancestor-or-self::tei:div/@n,' ')"/> ..... <report test="not(descendant::tei:div) and count(index-of($leafdiv-flatrefs,$this-ref)) > 1" >Canonical references must be unique. </report> </rule> </pattern>
The <pattern> above binds to the variable $leafdiv-flatrefs a sequence of canonical references for all leaf <div>s. Each item in the sequence is a string made up of all the @n values of a leaf <div> and its ancestors joined by a space, e.g., Mt 5 7. Each item must be unique to the sequence, a rule that is checked by the <report>. If it is not, the duplicate leaf <div>s are marked as invalid. Enforcement of the Leaf Div Uniqueness Rule allows chains of @n joined vertically along an XML hierarchy to act as an ID, one that economically follows the standardized (canonical) reference systems that are familiar to human encoders.
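The logic of the rule can be illustrated outside Schematron with a small Python sketch. The sample document, the omission of the TEI namespace, and the function names are all assumptions made for illustration. The sketch flattens each leaf <div> into a space-joined reference and lists any reference that occurs more than once; the split Proverbs chapter shows that repeated values on mid-level <div>s are not flagged.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical sample: Mt 5 7 is duplicated; Prov 24 is split into two halves.
SAMPLE = """<body>
<div n="Mt"><div n="5"><div n="7"/><div n="7"/><div n="8"/></div></div>
<div n="Prov"><div n="24"><div n="22"/></div><div n="30"><div n="1"/></div>
<div n="24"><div n="23"/></div></div>
</body>"""


def leaf_div_refs(root):
    """Flatten every leaf div into a reference such as 'Mt 5 7'."""
    refs = []

    def walk(element, path):
        if element.tag == "div":
            n = element.get("n")
            if n is not None:
                path = path + [n]
            if element.find(".//div") is None:  # no div descendants: a leaf
                refs.append(" ".join(path))
        for child in element:
            walk(child, path)

    walk(root, [])
    return refs


def duplicate_refs(root):
    """Return flattened references that occur more than once."""
    counts = Counter(leaf_div_refs(root))
    return sorted(ref for ref, count in counts.items() if count > 1)


sample_root = ET.fromstring(SAMPLE)
print(duplicate_refs(sample_root))
```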
The uniqueness rule must apply only to leafmost <div>s, because there are cases where a <div> midlevel in the hierarchy is intentionally split. For example, in the Greek Septuagint (LXX) version of Proverbs, the 30th chapter is split, and interleaved with the two halves of chapter 24 (24.1 - 24.22e [22a - 22e are LXX verses not extant in the Hebrew]; 30.1 - 30.14; 24.23 - 24.34; and 30.15 - 30.33). In this case the @n values of the two halves of a split <div> must be identical. This also explains why the report is tested not against a leafmost <div>'s siblings (which may be only a partial selection of siblings according to the reference system) but against the entire sequence of leafmost <div>s.
Under this approach @n has little if any repetition. Such repetition is found in alternate approaches such as those that use @xml:id in the leafmost <div>s, where components such as the book abbreviation and the chapter number 5 could have been inferred from the ancestors' @xml:id values. Abbreviations of book names and chapter numbers would need to be repeated for all ca. eight thousand verses of the New Testament.
We turn now to the second part of the pair of shared Schematron files, that pertaining to the quoting text and the syntax of the cross-reference. Here rules are defined for the values of @cRef. The community anticipates that the attribute might be used for multiple space-delimited cross-references, and to works other than the New Testament. They anticipate complex quoting files that might look something like this (illustrating the work of an encoder who wishes to add cross-references outside the New Testament, here to Proverbs 11:17):
..... <div type="chapter" n="5"> <p n="18">‘But, for the child’s own sake, it ought not to be encouraged to have such amusements,’ answered I, as meekly as I could, to make up for such unusual pertinacity. ‘<quote>“Blessed are the merciful, for they shall obtain mercy.”</quote><ref cRef="NT.Mt.5.7 HebB.Prov.11.17"/>’</p> </div> .....
The community therefore defines both a prefix for the work (NT) and some character to be used as a delimiter (here a period, but many other nonspacing, nonword characters would also serve). And the community specifies that every value of @cRef that begins with the reserved prefix should construct the cross-reference according to the established rules. For example, this next rule specifies that the second element of any New Testament cross-reference (e.g., the Mt in NT.Mt.5.7) should be one of the acceptable book abbreviations:
<pattern>
  <rule context="@cRef">
    <let name="delimiter" value="'\.'"/>
    <let name="these-refs" value="tokenize(.,'\s+')"/>
    <!-- tokenize(...)[2] selects the book component; matches() accepts only a single string -->
    <let name="invalid-books" value="for $i in $these-refs return if (matches($i,concat('^NT',$delimiter)) and not(matches(tokenize($i,$delimiter)[2],'^(Mt|Mk|Lu|Jn|Ac| Ro|1Co|2Co|Gal|Eph|Php|Col|1Th|2Th|1Tim|2Tim|Tit|Phm| Heb|Jam|1Pe|2Pe|1Jn|2Jn|3Jn|Jud|Re)$','x'))) then true() else false()"/>
    <report test="some $i in $invalid-books satisfies $i = true()">Error in cross-reference no. <value-of select="index-of($invalid-books,true())"/>. Book value must be one of the following: Mt, Mk, Lu, Jn, Ac, Ro, 1Co, 2Co, Gal, Eph, Php, Col, 1Th, 2Th, 1Tim, 2Tim, Tit, Phm, Heb, Jam, 1Pe, 2Pe, 1Jn, 2Jn, 3Jn, Jud, Re, separated from subsequent values by this delimiter: <value-of select="replace($delimiter,'\\','')"/></report>
    .....
  </rule>
</pattern>
The value of @cRef is tokenized into a sequence of space-delimited cross-references, assigned to the variable $these-refs. Another variable checks the ones that begin with NT, and makes sure that the next part (defined by the delimiter, the period) is one of the acceptable abbreviations for a New Testament book. If any value does not conform, @cRef is marked as invalid, and a message is returned, indicating which cross-reference is faulty, as well as a list of acceptable values and the delimiter that should be used to separate parts of a cross-reference.
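The check performed on a @cRef value can be mirrored in a brief Python sketch. The function name faulty_nt_refs is hypothetical; the prefix, the delimiter, and the list of abbreviations are those of the running example. Like the Schematron rule, it reports the positions of faulty cross-references and leaves references to other works alone.

```python
# Book abbreviations agreed by the hypothetical community.
NT_BOOKS = {
    "Mt", "Mk", "Lu", "Jn", "Ac", "Ro", "1Co", "2Co", "Gal", "Eph", "Php",
    "Col", "1Th", "2Th", "1Tim", "2Tim", "Tit", "Phm", "Heb", "Jam", "1Pe",
    "2Pe", "1Jn", "2Jn", "3Jn", "Jud", "Re",
}
PREFIX = "NT"
DELIMITER = "."


def faulty_nt_refs(cref):
    """Return 1-based positions of NT cross-references with an unknown book."""
    faulty = []
    for position, ref in enumerate(cref.split(), start=1):
        if not ref.startswith(PREFIX + DELIMITER):
            continue  # not a New Testament reference; other rules apply
        book = ref.split(DELIMITER)[1]
        if book not in NT_BOOKS:
            faulty.append(position)
    return faulty


print(faulty_nt_refs("NT.Mt.5.7 HebB.Prov.11.17"))
print(faulty_nt_refs("NT.Matt.5.7 NT.Mk.1.1 NT.Xx.1.1"))
```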
Other reports that are found in the first Schematron file can be replicated here as well. For example, allowable chapter and verse numbers can be specified (examples suppressed here for the sake of brevity). That second shared Schematron file could also specify exactly where the <ref> should be placed relative to the <quote>:
..... <report test="$this-val = 'NT' and not(name(../preceding-sibling::*[1]) = 'quote')">An element containing @cRef must come immediately after the closing tag of the matching quote element.</report> .....
This report specifies that the element containing
@cRef must be the very
next sibling of its corresponding
<quote>. This test removes the
guesswork as to where a quotation's cross-reference is to be found, and so saves some
labor on the part of the person configuring a processor.
The blocks of code in the examples above are not necessarily computationally efficient, nor do they necessarily represent the best use of TEI elements. They merely illustrate the types of patterns and rules a community of practice might embrace. Once the community has established their rules, the two master Schematron files are posted in a central location. The community has the freedom to update those rules as the community learns what works and what doesn't, and the updates benefit every user.
Now work shifts to the two different communities of transcribers. The first consists of those who wish to provide a citable transcription of the New Testament. They begin by adding to a pre-existing TEI file an extra prolog statement, for example:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_lite.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_lite.rng" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="http://example.org/schemas/nt/1.0/nt-quotable.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  .....
</TEI>
The transcriber runs the validator, and might find that the once-valid TEI file is now rendered invalid, because it does not follow the new rules precisely. But the explanations provided by the error messages will advise the transcriber on how and where to alter the file to make it valid, so it can be made interoperable with all others.
In fact, the Schematron file could be provided with Schematron Quick Fixes, which in SQF-aware XML processors would allow the invalid data to be corrected with just two clicks or keystrokes, or even automatically. See http://www.schematron-quickfix.com/.
We now turn to the Brontë encoder, who, like the New Testament transcribers, adds a prolog:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="http://example.org/schemas/nt/1.0/quoting-nt.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  .....
</TEI>
And once again, the encoder runs the validator, and the extra Schematron pattern is used to see if the citations to the New Testament conform to the rules agreed upon by the community. If there are any errors, the message specifies exactly where and for what reason. The Brontë encoder edits the file until there are no more error messages.
This process can be repeated as often as one wants, upon any version of any text, whether quoting or quoted. In fact, the two Schematron files can be combined in the same file, to allow a New Testament to be marked with internal cross-references. No matter the context, Schematron reports steer the transcriber into the (usually small) fixes that need to be made.
@cRef alone is sufficient to declare the cross-reference. No <cRefPattern> is necessary. An @xml:base could be supplied, if so desired, but the cross-reference is now applicable to any version of the New Testament that adopts the shared Schematron files.
Work now turns to the processor to do something with the cross-reference. Here, because the structure of every New Testament TEI file has been precisely defined (as a series of tessellated <div>s), very little human intervention is needed. Or, rather, the type of human intervention shifts, primarily to deciding which and how many of the available versions of the New Testament should be processed (compare the same wealth of riches in the CTS method). Once a processor is configured to handle user-defined cross-references to the New Testament, it can be used on any valid file that also uses such cross-references, with no extra work. Naturally, this applies only to the preprocessing phase. How exactly that data will be used (display, statistics, etc.) is determined by what users want.
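A minimal sketch of such a preprocessing step follows, in Python. It assumes, purely for illustration, that each New Testament file nests book, chapter, and verse <div>s, each labeled with @n, and that a cross-reference takes the form NT.Mt.5.7. The function name resolve is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def resolve(cref, version_body):
    """Return the text of the passage named by one cross-reference, or None."""
    prefix, *steps = cref.split(".")
    if prefix != "NT":
        return None  # not a reference this processor has been configured for
    node = version_body
    for n in steps:
        # Descend one level: the child <div> whose @n matches this part.
        node = node.find(f"tei:div[@n='{n}']", TEI_NS)
        if node is None:
            return None  # this version lacks the passage
    return "".join(node.itertext()).strip()
```

Because every conforming version shares the same structure, the same function can be applied unchanged to each version the user chooses to process.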
This method greatly improves both the syntactic and semantic interoperability of TEI files. It requires no new infrastructure, and it supports both customized and standard TEI schemas. The shared Schematron files provide structure and predictability (a controlled vocabulary for cross-references) in areas where encoders most want it. As with CTS, a middleman has been introduced, but it is rather simple and benign: two relatively small Schematron files made available by HTTP request that will normally be cached by users on their local drive for day-to-day work. So maintenance and overhead are rather light.
Note too that the shared Schematron files can be used on TEI Lite, TEI All, or even customized TEI. No one has to use the same version of TEI in order to make New Testament references interoperable. The validation files do not preclude any other markup within <div>. They can be used on any version of the New Testament, partial or complete, in any language, and the books or chapters need not be in a specified order (thereby accommodating unusual editions that adopt alternative orders for the books of the New Testament).
Furthermore, this effort could be extended outside the TEI realm. That same community might create variations of the Schematron file pairs for XHTML 1, thereby allowing web pages to serve as host to syntactically and semantically interoperable transcriptions of New Testaments, or of texts quoting the New Testament.
But this general method also has a few major problems. It might work fine for heavily quoted works, but what about less frequently quoted ones? Organizing a community of practice to agree on rules might be difficult if not impossible for some texts (including, ironically, the Bible). Further, how would reserved keywords (here, NT) within the value of @cRef be minted without conflict? What happens in the case of duplicate or ambiguous prefixes adopted by independent communities? Such questions should be regarded not as reasons for abandonment but as problems that can and should be solved. But those solutions go beyond the scope of this article.
The problem of conflicting prefixes could be solved if they were handled like namespace prefixes. But such "work prefixes" would require new specifications in the TEI Guidelines, to ensure the integrity of the method.
TEI + Stand-off Markup
The three methods discussed so far assume cross-references that are embedded within a transcription. Such inline annotation is the most common way an encoder points from one text to another, not just in TEI but also in HTML. But the TEI Guidelines (§§16.9-16.10) provide for an alternative approach, stand-off markup, where linking and cross-referencing are placed in a file separate from the transcriptions. Such stand-off markup or annotation has a few drawbacks, the most immediate being that it is difficult to see the text to which an annotation applies, either because the files must be navigated and edited independently or because the semantics in the pointing scheme may be difficult for a human to parse (character counting, complex or opaque XPath expressions, etc.). But stand-off markup also has great benefits. It allows multiple complementary or competing annotations to be made of the same base transcription; stand-off markup files can be created, edited, and served independently of any source texts; and it facilitates a division of labor that allows transcribers and annotators to focus independently and concurrently on their discrete tasks.
The current specifications of the TEI Guidelines provide for a specific method of stand-off markup. It presumes that one or more transcription files are to be found somewhere, and an external aligning file stands apart from them. That external file points to the source files either by means of XInclude elements (explained at TEI Guidelines §16.9) or by using <link> (TEI Guidelines §§16.2, 16.7). Common to all these methods is a reliance upon the TEI XPointer scheme, which provides a precise, stable, and expressive reference system that follows a straightforward, consistent syntax. The following examples show two different ways to create a stand-off cross-reference from the Brontë novel's quotation to the New Testament:
.....
<linkGrp>
  <link target="http://example2.com/agnesgray.xml#xpath(//div[@n='5']/p)
                http://example.com/nt.xml#xpath(//div[@n='Matt']/div[@n='5']/div[@n='7'])"/>
</linkGrp>
.....
.....
<body>
  <div>
    <include href="http://example2.com/agnesgray.xml"
             xmlns="http://www.w3.org/2001/XInclude"
             xpointer="range(xpath(//div[@n='5']/p))"/>
    <include href="http://example.com/nt.xml"
             xmlns="http://www.w3.org/2001/XInclude"
             xpointer="range(xpath(//div[@n='Matt']/div[@n='5']/div[@n='7']))"/>
  </div>
</body>
.....
Other examples using <link> would look similar to the second one above. The XPointer framework stands at the heart of them all, pinpointing the precise node or document fragment that is meant. But as currently constructed, this XPointer scheme shares with @cRef a lack of semantics behind the syntax. That is, no information about the meaning of a particular node is built into the XPointer scheme. For the examples above, there is no way to imply in the XPath div[@n='Matt'] that the div means a book and that the @n means the name of that book. In addition, this fragment has currency only within a specific TEI file. Its interoperability is as limited as that of @cRef was shown to be above, since the XPointers are not guaranteed to have any validity for other versions of the same work. For every new version of Matthew or Agnes Grey that the encoder wishes to include, the file structure must be interrogated and a new XPointer expression created.
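The fragility can be seen in a small Python sketch. Both "versions" below are invented, namespace-free stand-ins for real transcriptions, and the second is assumed to label the book Mt rather than Matt. The path is the one used in the examples above, adapted to the XPath subset that Python's standard library supports.

```python
import xml.etree.ElementTree as ET

# The path from the stand-off examples, written for ElementTree's XPath subset.
PATH = ".//div[@n='Matt']/div[@n='5']/div[@n='7']"

# Hypothetical miniature versions; real files would be far larger.
version_a = ET.fromstring(
    "<body><div n='Matt'><div n='5'><div n='7'>"
    "Blessed are the merciful</div></div></div></body>"
)
version_b = ET.fromstring(
    "<body><div n='Mt'><div n='5'><div n='7'>"
    "Beati misericordes</div></div></div></body>"
)


def lookup(root):
    """Return the text the path points to, or None if the path finds nothing."""
    hit = root.find(PATH)
    return None if hit is None else hit.text
```

The path succeeds on the first version and silently finds nothing in the second, even though both contain the same verse. Nothing in the expression tells a processor that Matt and Mt name the same book.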
I propose a different approach to stand-off cross-references, one that relies upon semantically defined alignment. My proposal shares points with the previous two methods (CTS URNs and community-written Schematron files) but is more extensive in scope, anticipating an ecosystem of scholarly texts in which stand-off markup is the norm for all types of annotations, not simply cross-references. This ecosystem is the goal of a project that is still in development, the Text Alignment Network (TAN; http://textalign.net), a suite of XML encoding formats and set of recommended best practices to serve anyone who wishes to encode, exchange, and study varieties of text reuse: translations, quotations, paraphrases, adaptations, summaries, and so forth. In this section I use fragments of examples created in the TAN format to illustrate how stand-off annotation might be used to maximize the syntactic and semantic interoperability of the cross-reference.
Because the TAN format is still under development, examples provided in this article may be rendered invalid in any public release.
Methods discussed above moved the beginning of the encoding workflow earlier, either to a new network of CTS servers or to communities of practice coming up with their own Schematron files. Under the TAN method work begins with what I hope will become an informal community that actively develops and maintains TAN validation schemas, documentation, and examples, and houses those files in a central repository.
To make the format maximally useful to TEI users, TAN defines a minor customization of the TEI All schema, introducing a few constraints. Every transcription file must:
be dedicated exclusively to a normalized text of one version of one work found on one text bearing object;
be uniquely named;
uniquely name the work that has been transcribed;
segment the transcription of the work into a series of nested <div>s, each with @type and @n, specifying the type of division and its name;
observe the Leaf Div Uniqueness Rule (explained above);
define every metadatum with both human-readable names and machine-readable ones (URI/IRIs).
Other TEI markup may be placed inside leaf <div>s, but such markup is likely to be ignored by TAN users, since they are interested in TEI files primarily as a source of normalized, well-segmented transcriptions. Extra markup, such as nuanced, complex cross-references, is expected to be found in a separate file.
So, coming back to our example, we start with the transcriber of Agnes Grey, who makes a few adjustments to the TEI file (explained below):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://textalign.net/release/1/schemas/TAN-TEI.rnc" type="application/relax-ng-compact-syntax"?>
<?xml-model href="http://textalign.net/release/1/schemas/TAN-TEI.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" TAN-version="1" id="tag:textalign.net,2015-04-07:test1">
  <teiHeader> ..... </teiHeader>
  <head xmlns="tag:textalign.net,2015:ns">
    .....
    <declarations>
      <work>
        <IRI>http://dbpedia.org/resource/Agnes_Grey</IRI>
        <name>Agnes Grey</name>
      </work>
      <div-type xml:id="chapter">
        <IRI>http://dbpedia.org/resource/Chapter_(books)</IRI>
        <name>chapter</name>
      </div-type>
      <div-type xml:id="p">
        <IRI>http://dbpedia.org/resource/Paragraph</IRI>
        <name>paragraph</name>
      </div-type>
      .....
    </declarations>
    .....
  </head>
  <body xml:lang="eng">
    .....
    <div type="chapter" n="5">
      .....
      <div n="18" type="p">
        <p>‘But, for the child’s own sake, it ought not to be encouraged to have such amusements,’ answered I, as meekly as I could, to make up for such unusual pertinacity. ‘“Blessed are the merciful, for they shall obtain mercy.”’</p>
      </div>
      .....
    </div>
    .....
  </body>
</TEI>
That is all the Brontë encoder need do. The New Testament transcriber has a similar responsibility:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://textalign.net/release/1/schemas/TAN-TEI.rnc" type="application/relax-ng-compact-syntax"?>
<?xml-model href="http://textalign.net/release/1/schemas/TAN-TEI.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" TAN-version="1" id="tag:textalign.net,2015-04-07:test2">
  <teiHeader> ..... </teiHeader>
  <head xmlns="tag:textalign.net,2015:ns">
    .....
    <declarations>
      <work>
        <IRI>http://dbpedia.org/resource/New_testament</IRI>
        <name>New Testament</name>
      </work>
      <div-type xml:id="bk">
        <IRI>http://dbpedia.org/resource/Book</IRI>
        <name>book</name>
      </div-type>
      <div-type xml:id="ch">
        <IRI>http://dbpedia.org/resource/Chapter_(books)</IRI>
        <name>chapter</name>
      </div-type>
      <div-type xml:id="v">
        <IRI>tag:textalign.net,2015-04-07:div-type:verse:biblical</IRI>
        <name>verse (Bible)</name>
      </div-type>
      .....
    </declarations>
    .....
  </head>
  <body xml:lang="eng">
    <div n="Matt" type="bk">
      .....
      <div n="5" type="ch">
        .....
        <div n="7" type="v"><ab>Blessed are the merciful: for they shall obtain mercy.</ab></div>
        .....
      </div>
      .....
    </div>
  </body>
</TEI>
Starting from the top of both examples, observe the following:
The prolog contains two declarations, one pointing to a customized TEI schema in RELAX-NG (compact syntax) and another pointing to a Schematron file. (These URLs do not resolve; they are merely illustrative.)
The rootmost element, <TEI>, takes @TAN-version and @id. The latter is a user-defined URN naming the file. (Actually, the name applies to all versions of that file, but I avoid a full explanation here.)
There is a new <head> element. The TAN suite has formats for different kinds of data (some of which one would never use TEI to encode). Metadata from one type of TAN file to the next must be predictably and consistently structured. In a word, <teiHeader> is inadequate for TAN files, and would be confusing when juxtaposed with other TAN files. The <tan:head> structures metadata in a manner consistent with other TAN files. The need for predictability is also why it is a sibling, not a child, of <teiHeader>.
The literary work and the division types are defined by <work> and <div-type>, which take what I call an IRI + name pattern, a recurrent feature of all TAN files. One or more <IRI>s supply a computer-readable name in the form of an Internationalized Resource Identifier (IRI, an extension of URI, Uniform Resource Identifier) and one or more <name>s, a human-readable one. The @xml:id provides a local identifier so that the entity, properly defined by its IRI values, can be easily referenced. Thus, the two examples assign the division "chapter" different abbreviations (chapter and ch), but this difference does not matter because the definition, made by <IRI>, is shared.
<body> takes a set of nested <div>s. Any markup inside a leaf <div> is optional, and will be ignored by many users of the file. (For this reason, a bare TAN format for transcriptions is provided, to support users who prefer plain text to TEI.)
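The point about shared definitions can be illustrated with a short Python sketch. It reads the <div-type> declarations from abbreviated copies of the two examples and compares them by IRI rather than by local identifier. The function names are hypothetical, and the rule that two types match when they share at least one IRI is an assumption made for this sketch, not a statement of the TAN specifications.

```python
import xml.etree.ElementTree as ET

TAN = "{tag:textalign.net,2015:ns}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abbreviated declarations taken from the two transcription examples.
BRONTE = """<declarations xmlns="tag:textalign.net,2015:ns">
  <div-type xml:id="chapter"><IRI>http://dbpedia.org/resource/Chapter_(books)</IRI><name>chapter</name></div-type>
  <div-type xml:id="p"><IRI>http://dbpedia.org/resource/Paragraph</IRI><name>paragraph</name></div-type>
</declarations>"""
NT = """<declarations xmlns="tag:textalign.net,2015:ns">
  <div-type xml:id="ch"><IRI>http://dbpedia.org/resource/Chapter_(books)</IRI><name>chapter</name></div-type>
  <div-type xml:id="v"><IRI>tag:textalign.net,2015-04-07:div-type:verse:biblical</IRI><name>verse (Bible)</name></div-type>
</declarations>"""


def div_type_iris(declarations_xml):
    """Map each local @xml:id to the set of IRIs that define that division type."""
    root = ET.fromstring(declarations_xml)
    return {
        div_type.get(XML_ID): {iri.text.strip() for iri in div_type.findall(TAN + "IRI")}
        for div_type in root.iter(TAN + "div-type")
    }


def same_div_type(iris_a, iris_b):
    """Assumed rule: two division types match if they share at least one IRI."""
    return bool(iris_a & iris_b)
```

Under that assumption, the local identifiers chapter and ch are recognized as the same kind of division, while p and v are not.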
The transcribers' work is finished. Before we move to the next phase, however, it is worth noting some important gains in interoperability that have already been made. Because a TAN transcriber is compelled to segment a single work according to a semi-intuitive reference system, and to declare the work and the types of division according to IRI/URIs, we have in place the foundation for computer-actionable alignment. That is, if one were to have one hundred people each independently transcribe a different version of Agnes Grey or the New Testament along TAN rules, it is likely that many of them would structure, define, and label <div>s in a similar fashion. Thus, a good number of these versions will already be prepared for automatic alignment, with no human intervention whatsoever. There will always be some versions encoded differently, of course, and the TAN format provides the tools for an aligner to easily reconcile differences where they exist. But even before the aligner has arrived, the stage has been set for computers to create multilingual editions of versions of the same work with minimal human intervention.
At this point, work shifts to the annotator who wants to encode the cross-reference. The TAN format specifies two formats for cross-referencing. One is designed exclusively for pairs of texts (bitexts) and is used to create clusters of words (or merely letters) that correspond across the bitexts. This format, intended for highly detailed, nuanced, complex work, provides a kind of microscopic alignment. But we focus here on the other kind of format, macroscopic, which is intended to be used to align any number of versions of any number of works, and to specify further alignments on the basis of leaf <div>s (but larger or more precise alignments, down to the level of words, can also be made).
Let us suppose an aligner has found not only our two example TAN transcription files but another version of each work, and wishes to declare a cross-reference from the Brontë novel to the New Testament that applies to all four. That alignment file will look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://textalign.net/schemas/1/TAN-TEI.rnc" type="application/relax-ng-compact-syntax"?>
<?xml-model href="http://textalign.net/schemas/1/TAN-TEI.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TAN-A-div xmlns="tag:textalign.net,2015:ns" TAN-version="1" id="tag:textalign.net,2015-04-07:alignment-test1">
  <head>
    .....
    <source xml:id="bronte">
      <IRI>tag:textalign.net,2015-04-07:test1</IRI>
      <name>Agnes Grey in English</name>
      <location when-accessed="2015-07-13">test1.xml</location>
    </source>
    <source xml:id="bronte-fra">
      <IRI>tag:textalign.net,2015-04-07:test3</IRI>
      <name>Agnes Grey in French</name>
      <location when-accessed="2015-07-13">test3.xml</location>
    </source>
    <source xml:id="nt">
      <IRI>tag:textalign.net,2015-04-07:test2</IRI>
      <name>King James version of the New Testament</name>
      <location when-accessed="2015-07-13">test2.xml</location>
    </source>
    <source xml:id="nt-grc">
      <IRI>tag:textalign.net,2015-04-07:test4</IRI>
      <name>Nestle Aland version of the Greek New Testament</name>
      <location when-accessed="2015-07-13">test4.xml</location>
    </source>
    .....
  </head>
  <body>
    <align>
      <div-ref src="bronte" ref="chapter 5 p 18"/>
      <div-ref src="nt" ref="bk Matt ch 5 v 7"/>
    </align>
  </body>
</TAN-A-div>
The <head> is somewhat long, because four different versions are in play, and they each need the IRI + name pattern (see above) as well as one or more <location>s, to specify where the source has been found. But the <body> is relatively straightforward. A single <align> encloses a set of <div-ref>s, each of which names a particular passage by identifying the source and reference. The pair of <div-ref>s provides a two-way cross-reference that follows a human-friendly syntax that does not require knowledge of XPath, XPointer, regular expressions, and so forth.
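To suggest how little machinery such a reference demands of a processor, here is a Python sketch that splits a @ref value into its steps. It assumes, on the strength of the two values in the example alone, that a reference alternates division-type identifiers and @n values separated by spaces. The full syntax may well be richer, and the function name parse_ref is hypothetical.

```python
def parse_ref(ref):
    """Split a @ref value into (division type, n) pairs, outermost first."""
    tokens = ref.split()
    if not tokens or len(tokens) % 2:
        # Under the assumed syntax, every type must be paired with a value.
        raise ValueError(f"expected alternating type and n values: {ref!r}")
    return list(zip(tokens[0::2], tokens[1::2]))
```

Each pair can then be matched against a <div> by resolving the type through its declared IRI and comparing the value with @n.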
Even though this cross-reference invokes only the sources given the ids bronte and nt, the reference applies to all four sources. That is because the <div>-based alignment rules stipulate that every processor must infer alignment wherever possible and that, unless otherwise specified, alignment is transitive. If two texts are versions of the same work (discerned through the <IRI> values of each source's <work>), then their <div>s should be aligned wherever they can be (using the IRI values of @type and the data values of @n). And if special alignment is made across works (such as the cross-reference above), then that alignment is to be treated as transitive unless otherwise specified. That is, if an <align> says that X ~ (aligns with) Y, then for every A ~ X and every B ~ Y, A ~ B.
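Transitivity of this kind is commonly computed with a union-find structure. The Python sketch below is one possible approach, not a TAN-mandated algorithm. The inferred pairs assume, hypothetically, that the French and Greek versions label the corresponding passages exactly as the English ones do.

```python
def alignment_groups(pairs):
    """Group passages so that alignment is transitive across all given pairs."""
    parent = {}

    def find(item):
        # Follow parent links to the representative of the item's group.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # shorten the chain as we go
            item = parent[item]
        return item

    for left, right in pairs:
        parent[find(left)] = find(right)  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


# The one alignment declared by the aligner in <align>.
EXPLICIT = [(("bronte", "chapter 5 p 18"), ("nt", "bk Matt ch 5 v 7"))]
# Alignments a processor would infer between versions of the same work (assumed labels).
INFERRED = [
    (("bronte", "chapter 5 p 18"), ("bronte-fra", "chapter 5 p 18")),
    (("nt", "bk Matt ch 5 v 7"), ("nt-grc", "bk Matt ch 5 v 7")),
]
```

Calling alignment_groups(EXPLICIT + INFERRED) yields a single group holding all four passages, which is the sense in which one declared cross-reference applies to all four sources.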
There are a number of benefits to the simplified approach illustrated above, but one should be singled out. The value of @when-accessed (a required attribute of <location>) indicates when the aligner last saw a source transcription. If that file is corrected and updated, and the date of the change logged in the source file, then when the aligner validates the alignment file, the Schematron pattern will issue a warning that the source has been updated. The aligner can then decide if the changes have any important consequences. So transcribers can keep their files in a central location and have the liberty of correcting typographical errors. They need not worry about altering any stand-off markup files or hunting down every person using their files. The Schematron schemas do the notifying. Those who depend upon the source file can be automatically informed of any changes, one of the signal strengths of stand-off markup.
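The comparison behind such a warning is simple, as the following Python sketch suggests. How a source file logs its changes is not shown in this article, so the list of ISO-format change dates passed to the function is an assumption, as is the name source_is_stale.

```python
from datetime import date


def source_is_stale(when_accessed, change_dates):
    """Return True if the source changed after the aligner last accessed it."""
    accessed = date.fromisoformat(when_accessed)
    return any(date.fromisoformat(changed) > accessed for changed in change_dates)
```

With the @when-accessed value from the example, source_is_stale("2015-07-13", ["2015-04-07", "2015-08-01"]) returns True, and the aligner would be prompted to review the later change.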
The aligner's task is finished, and work shifts to the processor. Configuration of the pre-processor is a one-time affair that will apply not only to any version of a particular text (as was the case with the method of the shared Schematron file, discussed above) but to any TAN div-based alignment file for any work. That is, those who configure processors do not need to learn the structure of a given work or transcription file. They need only to know the TAN specifications for alignment (i.e., how to interpret a TAN-A-div file). Any TAN-compliant processor can be used on any TAN-A-div file, no matter how many works or versions it has. How the processor uses or transforms the data is another issue altogether, because that depends upon the purpose and questions the transformation serves. But the preliminary pre-processing stage need be configured only once, since all valid TAN files (both transcription and alignment) are interoperable, both syntactically and semantically.
There is obviously much more I should say about TAN alignment, in response to important
questions or concerns. What if independent transcriptions of the same work are discordant,
using different values for
@n? What if division types and works are defined by
different IRI vocabularies? What about versions of the same work that use altogether
different reference systems? What about works that are similar but not really the same?
What about coordinating specific ranges of text smaller than the leaf
<div>? What if a commonly used reference system is misleading or
These questions and more have been anticipated, and will be addressed in the full specifications for the Text Alignment Network. Explaining any single point adequately would involve moving into territory outside the remit of this article, and would raise yet other questions that would require a full discussion of the TAN design principles and rules.
But let us assume for the sake of argument that these concerns are not handled
adequately under TAN specifications. Inevitable shortcomings aside, consider how much
interoperability has been secured in the simple examples above. Like CTS URNs,
TAN-compliant TEI provides a means for uniquely naming literary works. Like the shared
Schematron method, TAN-TEI offers transcribers rules to make their texts consistent and
predictably structured (and therefore citable). And by compelling every <div> to be given a semantically precise definition, TAN specifications allow an otherwise generic element to become highly productive and semantically precise. That is, a transcriber is free to define a <div> to mean a textual division that might be unusual or specific to a field. Thus the world of textual divisions is now opened to the semantic web.
Even if TAN proves to have fatal flaws, I hope these examples inspire someone to create a better stand-off annotation system. If the goal is to allow a cross-reference to apply to any number of versions of any two works, then in-line annotation is not viable, because it indelibly impresses the cross-reference into a single version. To be applicable to other versions the cross-reference must be freed.
Three methods for enhancing the syntactic and semantic interoperability of cross-references in TEI files have been offered: Canonical Text Services URNs, shared Schematron files, and the stand-off markup of the Text Alignment Network. The first two could be implemented now. The principal barrier is practical—getting independent scholars, projects, and groups to adopt a method, try it out, and through trial and experience develop the protocols behind it. The third method needs both experimentation and development before it can be widely used. But all three show that greater interoperability is possible through a few modest adjustments to our approach to TEI. First, make source transcriptions predictably structured. Second, make sure that references to those predictably structured sources are themselves predictably structured. Third, define the syntax of the metadata such that each constituent part retains its semantics, defined by IRIs/URIs. Even if a reader finds one of the three methods disfavorable, that method is successful if, in the end, it catalyzes a better way.
[European Commission 2010] European Commission, ‘Towards Interoperability for European Public Services’, ver. 744final (2010-12-16).
[Kalvesmaki 2014] Joel Kalvesmaki, ‘Canonical References in Electronic Texts: Rationale and Best Practices’, Digital Humanities Quarterly 8.2 (2014).
[Schmidt 2014] Desmond Schmidt, ‘Towards an Interoperable Digital Scholarly Edition’, Journal of the Text Encoding Initiative
[Online], Issue 7 | November 2014, Online since 12 November 2014, connection on 24
2015. URL:http://jtei.revues.org/979; doi:https://doi.org/10.4000/jtei.979.