Encoding the Ethiopic Manuscript Tradition
Encoding and representation challenges of the project Beta maṣāḥǝft: Manuscripts of
Ethiopia and Eritrea
Balisage: The Markup Conference 2017
August 1 - 4, 2017
The Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea project aims to construct a virtual research environment to encode and manage the rich and complex manuscript tradition of the Ethiopian and Eritrean Highlands. The Ethiopic manuscript culture, consisting of varying and difficult-to-identify literary works, is a living tradition. This project explores the difficulties in encoding and managing the relationships not only between the manuscripts themselves, but also between the manuscripts and the broader Ethiopian literary traditions that include Greek and Arabic texts as well.
Pietro
Maria
Liuzzo
Ancient Historian turned TEI and Digital Humanities advocate, he has worked for the
EAGLE project on Greek and Latin epigraphy and now leads all aspects of the technical
development of the project Beta maṣāḥǝft: Manuscripts of Ethiopia and
Eritrea.
Tech. Lead
Universität Hamburg
pietro.liuzzo@uni-hamburg.de
Copyright © 2017 by the author. Used with permission.
Manuscripts
TEI
Ethiopic Literature
Data Architecture
Introduction
The project Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea (Schriftkultur des
christlichen Äthiopiens: eine multimediale Forschungsumgebung)
Beta maṣāḥǝft is a TEI (Text Encoding Initiative, www.tei-c.org) project.
We use GitHub to store our data and we deploy our data with eXist-db
(http://eXist-db.org/). The project is
extremely indebted for their continued support to the TEI and eXist-db communities. We would like to thank the team at the
Hiob Ludolf Centre for Ethiopian Studies, the Akademie der Wissenschaften in Hamburg and
the University of Hamburg for their support to the project and to the preparation of this
contribution.
is a long-term project funded within the framework of the Academies Programme
(coordinated by the Union of the German Academies of Sciences and Humanities) under the
supervision of the Akademie der Wissenschaften in Hamburg funded for 25 years, from 2016–2040.
The project is hosted by the Hiob Ludolf Centre for Ethiopian Studies at the University of
Hamburg and aims to create a virtual research environment for the management of complex data
related to the manuscript tradition of the Ethiopian and Eritrean highlands.
A video presentation of the project is available at
https://youtu.be/bI950izCu2E
This is a huge challenge and opportunity for the community of scholars in Ethiopian
history, philology, codicology and language, as the project will digitally catalogue all
manuscripts already catalogued in the past in print but will also digitize texts in fidal
(the script used by classical Ethiopic or Gǝʿǝz) meaning both manuscript transcriptions and
editions of texts.
The project scope includes also a prosopography, inherited in part from the work of the
Encyclopaedia Aethiopica and a gazetteer of Ethiopian places, both ancient and modern.A small grant has additionally been awarded for this task from the [] Resource Development Grant.
This means not just encoding but also creating semantically meaningful relationships between descriptions of manuscripts,
works, places and persons, in a way which gives justice to the Ethiopian tradition and its
specific characteristics. With a simple network of referenced entities in TEI as the background
architecture, the project produces data at several different levels and serves it in multiple
ways, displaying the contents and the semantic relationships encoded in it.
This article will highlight encoding choices for manuscripts' and works' descriptions and
hint at the further development of persons' and places' records. Also the serialization of
relationships and the data API [] will be presented.
The Distributed Text Services project is currently under development. We are just
early and eager users of it as we believe it will serve very well the community.
Structuring the data
This section describes a few challenges faced in the attempt to structure the data
architecture of the entire Ethiopic literary tradition and living manuscript
production.
The project's first phase deals currently with the digitization of manuscript's catalogues
and their structured description. A second phase will see also the digitization of texts,
including both transcriptions of manuscripts and editions of texts. At the moment the largest
portion of encoded Ethiopic text consists in the additions to manuscripts and already amounts to
the largest existing collection of ancient Ethiopic documentary evidence.
Status
Beta maṣāḥǝft inherited a lot of good work. To begin with, it could avail itself of the
data used for the Encyclopaedia Aethiopica [], a long-term project
which led to the publication of an incredibly useful tool for the Ethiopian studies.
Secondly around 1000 manuscripts recorded in a XML
database by the project where converted to the Beta
maṣāḥǝft TEI schema. These shaped to a large extent the customization of the TEI schema
prepared for the project initial phase, i.e. the digitization of catalogue records starting from out-of-copyright catalogues.
The type of information recorded by had to be included in a TEI formal
manuscript description responding to a schema as similar as possible, for interoperability
purposes, to those of projects like syriaca.org [] and
.
The major improvement introduced in this process was the possibility to nest and
organize msPart
elements as well as msItem
elements, which leads
to a much more interesting description than a flat list of consecutive parts, as these are
often not consecutive especially in a tradition that has mainly complex manuscript objects.
On the other side many pieces of information like incipit, abbreviated words, etc.
would have belonged to the manuscript's
transcription and had to be accommodated in other parts of the teiHeader
as
notes.
<list type="abbreviations">
<item>
<abbr>ፍ፡</abbr> for <expan>ፍቱሐ፡ </expan> (e.g., <locus target="#36ra #36vb #39ra"/>)
</item>
<item>
<abbr>ይ፡ ሕ፡</abbr> for <expan>ይበል፡ ሕዝብ፡ </expan> (fol. <locus target="#40va"/>)
</item>
</list>
Interaction with other projects
During this phase of theorization of the encoding structure and initial schema definition, Beta
maṣāḥǝft has benefited from the work of many predecessor projects, like the ones mentioned
above as well as and the EpiDoc guidelines [] and schema.
The use of relation
elements was introduced at this stage in conformance
with existing ontologies, although in an entirely non canonical way, which we hoped
would allow us to visualize the stratigraphy, showing the history of the manuscript and allow the encoding of what cannot be hierarchically structured.
To describe the connections between the different entities in Beta
maṣāḥǝft we take relations names from the classes of the SAWS [] and SNAP []
projects for example, but we reuse also relation's names used by syriaca.org or simply
dcterms
or geonames
classes and properties names. We have had
sometimes anyway to add some value for relation/@name
but only for relations specific to the Ethiopic milieu. For example bm:hasTabot
indicates a relation between a person and a place, i.e. that the @active
subject of the relation, a person, has a Tabot in the place at @passive
.
The following is an example where we use a relation element to state which works
might be parts of another literary work and we are able to point to the
specific subject and object of the relationship. Note also that we use tags for our bibliography,
exploiting the Zotero API to retrive the TEI needed for the full export of TEI XML files and for the visualization in the app, which uses a Zotero Style common to our centre resources.
<relation name="ecrm:CLP46i_may_form_part_of" active="LIT1978Mashaf" passive="LIT1586Hayman">
<desc>The Maṣḥafa ṭomār is mostly contained in manuscripts with the Hāymānota ʾabaw, where it can precede or follow
the table of contents. It also circulates independently.</desc>
</relation>
<relation name="saws:isVersionInAnotherLanguageOf" active="LIT1978Mashaf#t1" passive="LIT1978Mashaf#t4arabic">
<desc>The Ethiopic version of the Maṣḥafa ṭomār was translated from the Arabic version based on a Syriac version.
Guidi suggests that the Maṣḥafa ṭomār may have been translated in the mid-16th century.
<bibl><ptr target="bm:Guidi1932storia"/><citedRange unit="page">72-73</citedRange></bibl>
<bibl><ptr target="bm:Graf1944GCALI"/><citedRange unit="page">295-297</citedRange></bibl></desc>
</relation>
<relation name="saws:isVersionInAnotherLanguageOf" active="LIT1978Mashaf#t4arabic" passive="LIT1978Mashaf#t7syriac">
<desc></desc>
</relation>
Although many relations might be inferred from the files with their name (for instance the
relation between a manuscript and the repository or collection in which it is preserved), many
and more complex relations, like authorship of a work, fail to be easily represented as a
simple fact stated in an author
element. Often in the Ethiopic tradition as in
many other mediaeval traditions, attested literary works are attributed to authors or cannot
be assigned at all to an editorial will. A work can have as many authors assigned as
needed and we use a relation element for this
purpose in order to connect author or translator and the exact part of work that is
attributed to him in each case.
<relation name="saws:isAttributedToAuthor" active="LIT1036Agains" passive="PRS9481Theodoto"/>
In the following sections more details will be provided on the structuring of single
entities, focusing especially on manuscripts and literary works, which are the ones most
developed at the current stage of the project.
Manuscripts
Anyone who has worked with codicologists and philologists knows it is very difficult to
satisfy everybody. Different specialists will have their own opinion on
the importance of different parts of data and how to represent it.
Corpus analysis should also take into consideration the material structure of supports.
The manuscript collation (structure of leaves and quires) should be considered when
collating philologically texts coming from different manuscripts or manuscripts' parts.
For the collation of the manuscript we use the very nice VisColl tool by Dot Porter
[] to visualize the structure of the quires, extracting the
information from a simple list encoding.
<list>
<item xml:id="q1">
<dim unit="leaf">7</dim>
<locus from="1r" to="7v"/>
I(7; s.l.: 1, stub after 7/fols. 1r-7v)
</item>
<item xml:id="q2">
<dim unit="leaf">8</dim>
<locus from="8r" to="15v"/>
?II(8/fols. 8r-15v)
</item>
<item xml:id="q3">
<dim unit="leaf">8</dim>
<locus from="16r" to="23v"/>
?III(8; s.l.: 2, stub after 6; 7, stub after 1/fols. 16r-23v)
</item>
....
</list>
The
above XML structure is transformed in the intermediate format required by the tool by an
XSLT and is then passed as a parameter to the VisColl XSLT library to produce the following
result
Lists of contents are vital to understand what a manuscript contains and can alone give
all the information a user actually needs. In Beta maṣāḥǝft these summary links to records
with the data about each of the works or parts of works listed in the manuscript.
<msItem xml:id="ms_i1.6">
<locus from="105r" facs="f223"/>
<title type="complete" ref="LIT1812GospelLuke"/>
<msItem xml:id="ms_i1.6.1">
<locus from="105r" facs="f223"/>
<title type="complete" ref="LIT1812GospelLuke#TituliLuke"/>
</msItem>
<msItem xml:id="ms_i1.6.2">
<title type="complete" ref="LIT2713Luke" xml:lang="gez">ወንጌል፡ ዘሉቃስ።</title>
<note>
The text is divided in 85 (or 83) sections, which correspond approximately to the old
<foreign xml:lang="grc">τίτλοι</foreign>
.
</note>
<note>
The pericopes, the chapters of Ammonius and the 343 canons of
<persName ref="PRS3912Eusebius"/>
are indicated in the margins.
</note>
</msItem>
</msItem>
<msItem xml:id="ms_i1.7">
<locus from="163r" facs="f339"/>
<title type="complete" ref="LIT1693John"/>
<msItem xml:id="ms_i1.7.1">
<locus from="163r" facs="f339"/>
<title type="complete" ref="LIT1693John#TituliJohn"/>
</msItem>
<msItem xml:id="ms_i1.7.2">
<title type="complete" ref="LIT2715John" xml:lang="gez">ወንጌል፡ ዘዮሐንስ።</title>
<note>
The text is divided in 20 (or 19) sections, which correspond approximately to the old
<foreign xml:lang="grc">τίτλοι</foreign>
.
</note>
<note>
The pericopes, the chapters of Ammonius and the 232 canons of
<persName ref="PRS3912Eusebius"/>
are indicated in the margins.
</note>
</msItem>
</msItem>
When a title is given in the catalogue, then it is used otherwise the canonical title of the work
is chosen.
While we don't yet have the transcriptions of the main texts of the manuscript, we do
have a lot of transcriptions of incipits, explicits, colophons and additions to the main
manuscripts text of several kinds.
<additions>
<list>
<item xml:id="a1">
<locus target="#205v" facs="f424"/>
<desc type="ScribalNoteCompleting"/>
<q xml:lang="gez">
ለዘአጽሐፋሂ፡ ወለዘጸሐፋሂ፡ ሕቡረ፡ ይምሐሮሙ፡ እግዚኣብሔር፡ በመንግሥተ፡ ሰማያት፡ አሜን። ወለእመቦ፡ ዘወስኩ፡ ወዘአንተጉ፡ ወዘገንጰልኩ፡ ስረዩ፡ ወባርኩኒ፡ ለዓለመ፡ ዓለም፡ አሜን።
</q>
</item>
<item xml:id="a2">
<locus target="#1r" facs="f15"/>
<desc type="DonationNote">
Note from the king
<persName ref="PRS8595SayfaAr"/>
, son of
<persName ref="PRS1854Amdase"/>
, by which he donates this Gospel to the monastery of
<placeName ref="INS0308DQ"/>
at
<placeName ref="LOC2600Dayral"/>
</desc>
<q xml:lang="gez">
<supplied reason="undefined" resp="PRS10747Zotenbe">በ</supplied>
አኰቴተ፡ አብ፡ ወወልድ፡ ወመንፈ
<supplied reason="undefined" resp="PRS10747Zotenbe">ስ፡ ቅ</supplied>
ዱ። ወሀብኩ፡ አነ፡
<persName ref="PRS8595SayfaAr">ሰይፈ፡ ዐርኣድ፡</persName>
<supplied reason="undefined" resp="PRS10747Zotenbe">ን</supplied>
ጉሥ። ወልደ፡
<persName ref="PRS1854Amdase">ኣምደ፡ ጽዮን፡</persName>
ንጉሥ።
<supplied reason="undefined" resp="PRS10747Zotenbe">በ</supplied>
ስመ፡ መንግሥትየ፡ ቈስጣንጢኖስ። ዘንተ፡ ወንጌለ፡ ቅዱስ። ለ
<placeName ref="INS0308DQ">ቤተ፡ ሐዋርያት፡ ዘደብረ፡ ቍስቋም።</placeName>
እንዘ፡ እሰገድ፡ በብረከ፡ መንፈስ። ኀበ፡ ተኀብኣ፡ ውስቴቱ፡ መላኬ፡ ሥጋ፡ ወነፍስ። ምስለ፡ እሙ፡ ንጽሕት፡ ድንግል፡ ዘእንበለ፡ ርኵስ።
<gap reason="ellipsis" extent="unknown" resp="PRS10747Zotenbe"/>
ወለእመቦ፡ ዘሄዶ፡ ወዘተዓገሎ፡ ለዝንቱ፡ ወንጌል። ይኩን፡ ቅዉመ፡ ወርጉመ፡
<supplied reason="undefined" resp="PRS10747Zotenbe">በ</supplied>
ቅድመ፡ አብ፡ ወወልድ፡ ወመንፈስ፡ ቅዱስ። በዝ፡ ዓለም፡ ወበዓለም፡ ዘይመጽእ፡ ለዓለመ፡ ዓለም፡ አሜን፡ ወአሜን።
</q>
</item>
....
</list>
</additions>
These, as well as decorations and keywords, are also separately visualized as a navigation aid and as
small corpora of organized documentary sources.
Dating is a central issue for the semantic organization of text corpora, but the
timespan in a manuscript catalogue is very wide also within each manuscript. Manuscripts
from this live tradition may present multiple layers of modification (material, textual,
etc.) continuing until today. While encoding this is fairly easy because dating
elements can be added anywhere in a TEI structure, this information offers challenges in
the analysis. We encode dates with several attributes like in the following example.
<date calendar="ethiopian" when="1888" when-custom="1881" evidence="internal">፲ወ፰፻፹ወ፩ ዓመት፡ ምሕረት፡ </date>
This is displayed in the record as a simple table, but is also used to extract information
to build a timeline of the manuscript transforming the XML to feed a script using .
A way to encode the manuscript stratigraphy following is
currently under development and will hopefully be presented at the next TEI
conference.
A single work can be contained in several manuscripts in different parts and together
with different texts. The selection and organization of these texts is also very important and
might provide useful information. We then offer a way to pick a work from our canonical
list and see the structure of all manuscripts which contain it side by side. Hoovering on
any title will also highlight if and where the same work appears in another manuscript. The
manuscripts are here organized chronologically as much as possible to give also a good
starting point for an evaluation of the history of the manuscript tradition of the selected
work.
Textual and Narrative Units
The Ethiopic manuscript culture is a living tradition. Manuscripts are still being
produced today. The literary works they contain vary and are difficult to identify. The
author of this paper is not expert on this and will not dive into this issue if not in as
much as it concerns the architecture of the Web application serving the data.
In the Oriental tradition it is difficult to uniquely identify texts under titles (a
problematic category) and thus define distinct corpora. In our project
we try to identify and relate all independently circulating units and to keep a core
distinction between textual and narrative units as suggested by .
Although this is reflected in the encoding of texts, until now no satisfactory approach has
been found to exploit this distinction. A solely narrative unit like a set miracle account
used for saints or kings can be used to identify parts of a text. On the other side
a textual unit might still be considered a literary work in its own right or not.
In the Ethiopic literary tradition the concepts of work (as a
quantifiable unit) and that of collection (as a grouping unit) are not
available in the same way as they are, almost intuitively, for Western European corpora. Ethiopian literature
is characterized especially by its compilatory character. The treatment of work collections
(homilies, hagiographies, etc.) as grouping units is thus crucial to the project. How can
the relationships between texts, manuscripts, narrative units which are so complex and
variable be articulated with certainty in a canonic structure? Working with the outlines
defined by and , it is one of the main
concerns at the current stage to find working definitions of these concepts which can
accommodate all Ethiopian literary works. How should this distinction impact
textual-critical analysis, computation and hermeneutic?
The entire traditional hierarchy of literature falls apart in the face of this problem, but
encoding might still help in organizing the unorganized and TEI provides here the necessary
rigour and flexibility. A work can have parts which do not have independent circulation in
the Ethiopian tradition but can still be used as a reference in a list of contents of a
manuscript.
<msItem xml:id="ms_i1.1">
<locus from="3r" to="10v"/>
<title type="complete" ref="LIT1560Gospel#IntroductionGospels">Introduction to the Gospels</title>
<msItem xml:id="ms_i1.1.1">
<locus from="3r"/>
<title type="complete">Maqdǝma wangel, “Introduction to the Gospels”</title>
</msItem>
<msItem xml:id="ms_i1.1.2">
<title type="complete" ref="LIT1349Epistl">Letter of Eusebius to Carpianus</title>
</msItem>
<msItem xml:id="ms_i1.1.3">
<locus from="4r" to="8v"/>
<title type="complete" ref="LIT1560Gospel#IntroductionCanons">Canon tables</title>
</msItem>
<msItem xml:id="ms_i1.1.4">
<locus from="9r" to="10v"/>
<title type="complete" ref="LIT1560Gospel#IntroductionGessawe" evidence="conjecture"/>
</msItem>
</msItem>
Being able to add as many relation
elements as wanted, means that the articulation of the information does not need to be forced into one value or the assertion of certainty about authorship does not need to be an assumption.
These visualizations should help the user to obtain all the information needed without forcing
too much of our structure to the organization of knowledge traditional of the Ethiopian
context.
The Ethiopian literary tradition cannot be studied outside a wider context of Oriental
traditions. Thus, whenever possible, we will take the relationship between translations of
Greek and Arabic texts in Ethiopic with their Vorlage and versions in other
languages into consideration. The alignment of such units is complex and
requires referring to external reference works for specific traditions when they are
available.
We encode this information in two distinct places. We give all possible titles of a work
and assign types and @xml:id
s and @corresp
s in an orderly fashion.
<title xml:id="t1" xml:lang="en">(Ethiopic) Book of Kings 1</title>
<title xml:id="t2" xml:lang="gez">ነገሥት፡ ቀዳማዊ፡</title>
<title corresp="#t2" xml:lang="gez" type="normalized">Nagaśt qadāmāwi</title>
<title corresp="#t2" xml:lang="en">First Kings</title>
<title xml:id="t3" xml:lang="en">1 Sam (Septuagint)</title>
<title xml:id="t4" xml:lang="en">Book of Samuel 1</title>
In addition to this, we also link a bibliographic reference to relevant clavis in a bibl
element
<listBibl type="clavis">
<bibl corresp="#t3greek" type="CPG">
<ptr target="bm:CPG"/>
<citedRange unit="item">2257</citedRange>
</bibl>
</listBibl>
This information is then visualized in the list view and in the record view.
APIs
All project data is exposed through documented and testable data APIs, which allow
search, basic information and full record retrieval to all accredited users, both in XML and
JSON.
This is the core technology behind the app, developed with and
. It was primarily intended to serve data to our services and to
third parties in the formats required. For example the canonical titles of each record are
always computed from the available information. A sister application (not in the scope of
this presentation), which contains definitions of words, uses the API to get related records
and the text of quoted passages.
This API is already available for third-party users, although the status of the data,
still at its initial phases, does not allow us to advertise it yet. We hope people will find
it useful for example to resolve citations of literary works to the exact passage, or to
include representations of the data we curate in other resources.
Of special interest is the development in the context of the 's
endeavour of collection, text and citations APIs based on the TEI data.
Bibliography
Orlandi, Tito. A
Terminology for the Identification of Coptic Literary Documents. Journal of
Coptic Studies Proceedings of the Ninth International Congress of Coptic Studies, Cairo 2008,
no. 15 (2013): 87–94. doi:10.2143/JCS.15.0.3005414.
Bausi, Alessandro. Composite and Multiple Text Manuscripts: The Ethiopian Evidence. In “One-Volume
Libraries”: Composite Manuscripts and Multiple Text Manuscripts. Proceedings of the
International Conference, Asien-Afrika-Institut, Universität Hamburg, October 7–10, 2010,
edited by Michael Friedrich and Jörg Quenzer. Studies in Manuscript Cultures. Berlin - New
York: De Gruyter, forthcoming.
Uhlig, Siegbert (I-IV), and Alessandro Bausi (IV-V),
eds. Encyclopaedia Aethiopica. Wiesbaden: Harrassowitz,
2003-2014.
Andrist, Patrick, Paul
Canart, and Marilena Maniaci. La syntaxe du codex.
Bibliologia ; 34 ; Bibliologia 34. Turnhout: Brepols, 2013.
Consortium, T. E. I., Lou Burnard, Syd
Bauman, and others. TEI P5: Guidelines for Electronic Text Encoding and
Interchange. TEI Consortium, 2008.
Nosnitsin, Denis. DomLib/Ethio-SPARE Manuscript Cataloguing Database. In Paper Presented at the
COMSt Workshop The Electronic Revolution? The Impact of the Digital on Cataloguing.
Copenhagen, 2012. http://www1.uni-hamburg.de/ethiostudies/ETHIOSPARE/.
E-Codices May 2014.
http://www.e-codices.unifr.ch.
Michelson D.A., ed. Digital
Catalogue of Syriac Manuscripts in the British Library. Syriaca.org:
The Syriac Reference Portal; forthcoming. http://syriaca.org/mss/.
Saint-Laurent J.-N.M.,
Michelson D.A., eds. Qadishē: Guide to the Syriac Saints. The Syriac
Biographical Dictionary. Syriaca.org: The Syriac Reference Portal; forthcoming
2016. http://syriaca.org/q/.
Schwartz D.L., ed. SPEAR: Syriac Persons, Events, and Relations. Syriaca.org: The Syriac Reference
Portal.
http://syriaca.org.
Carlson T.A.,
Michelson D.A., eds. The Syriac Gazetteer. Syriaca.org: The Syriac
Reference Portal; 2014. http://syriaca.org/geo/.
Elliot, Tom, Gabriel Bodard,
Elli Mylonas, Simona Stoyanova, Charlotte Tupman, and Scott Vanderbilt. EpiDoc Guidelines: Ancient Documents in TEI XML, 2013-2007.
http://www.stoa.org/epidoc/gl/latest/.
“EAGLE Project” n.d.
http://www.eagle-network.eu/.
“SAWS Sharing Ancient Wisdoms project”
http://www.ancientwisdoms.ac.uk/method/ontology/.
SNAP:DRGN Standards for
Networking Ancient Prosopographies: Data and Relations in Greco-Roman Names
http://snapdrgn.net/ontology.
https://github.com/leoba/VisColl
http://visjs.org/
http://exist-db.org/
http://wwww.mycore.org/
http://commons.pelagios.org/
https://www.zotero.org/groups/358366/ethiostudies
http://exquery.github.io/exquery/exquery-restxq-specification/restxq-1.0-specification.html
https://github.com/distributed-text-services/