The project Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea (Schriftkultur des christlichen Äthiopiens: eine multimediale Forschungsumgebung)  is a long-term project funded within the framework of the Academies Programme (coordinated by the Union of the German Academies of Sciences and Humanities) under the supervision of the Akademie der Wissenschaften in Hamburg funded for 25 years, from 2016–2040. The project is hosted by the Hiob Ludolf Centre for Ethiopian Studies at the University of Hamburg and aims to create a virtual research environment for the management of complex data related to the manuscript tradition of the Ethiopian and Eritrean highlands.
This is a huge challenge and opportunity for the community of scholars in Ethiopian history, philology, codicology and language, as the project will digitally catalogue all manuscripts already catalogued in the past in print but will also digitize texts in fidal (the script used by classical Ethiopic or Gǝʿǝz) meaning both manuscript transcriptions and editions of texts.
The project scope includes also a prosopography, inherited in part from the work of the Encyclopaedia Aethiopica and a gazetteer of Ethiopian places, both ancient and modern.
This means not just encoding but also creating semantically meaningful relationships between descriptions of manuscripts, works, places and persons, in a way which gives justice to the Ethiopian tradition and its specific characteristics. With a simple network of referenced entities in TEI as the background architecture, the project produces data at several different levels and serves it in multiple ways, displaying the contents and the semantic relationships encoded in it.
This article will highlight encoding choices for manuscripts' and works' descriptions and hint at the further development of persons' and places' records. Also the serialization of relationships and the data API [Distributed Text Services] will be presented.
Structuring the data
This section describes a few challenges faced in the attempt to structure the data architecture of the entire Ethiopic literary tradition and living manuscript production.
The project's first phase deals currently with the digitization of manuscript's catalogues and their structured description. A second phase will see also the digitization of texts, including both transcriptions of manuscripts and editions of texts. At the moment the largest portion of encoded Ethiopic text consists in the additions to manuscripts and already amounts to the largest existing collection of ancient Ethiopic documentary evidence.
Beta maṣāḥǝft inherited a lot of good work. To begin with, it could avail itself of the data used for the Encyclopaedia Aethiopica [EAe], a long-term project which led to the publication of an incredibly useful tool for the Ethiopian studies.
Secondly around 1000 manuscripts recorded in a MyCoRe XML database by the EthioSPaRe project where converted to the Beta maṣāḥǝft TEI schema. These shaped to a large extent the customization of the TEI schema prepared for the project initial phase, i.e. the digitization of catalogue records starting from out-of-copyright catalogues.
The type of information recorded by EthioSPaRe had to be included in a TEI formal manuscript description responding to a schema as similar as possible, for interoperability purposes, to those of projects like syriaca.org [Digital Catalogue of Syriac Manuscripts - Syriaca.orgSPEAR- Syriaca.orgQadishē - Syriaca.orgThe Syriac Gazetteer - Syriaca.org] and eCodices.
The major improvement introduced in this process was the possibility to nest and
msPart elements as well as
msItem elements, which leads
to a much more interesting description than a flat list of consecutive parts, as these
often not consecutive especially in a tradition that has mainly complex manuscript
On the other side many pieces of information like incipit, abbreviated words, etc.
would have belonged to the manuscript's
transcription and had to be accommodated in other parts of the
<list type="abbreviations"> <item> <abbr>ፍ፡</abbr> for <expan>ፍቱሐ፡ </expan> (e.g., <locus target="#36ra #36vb #39ra"/>) </item> <item> <abbr>ይ፡ ሕ፡</abbr> for <expan>ይበል፡ ሕዝብ፡ </expan> (fol. <locus target="#40va"/>) </item> </list>
Interaction with other projects
During this phase of theorization of the encoding structure and initial schema definition, Beta maṣāḥǝft has benefited from the work of many predecessor projects, like the ones mentioned above as well as EAGLE and the EpiDoc guidelines [Epidoc Guidelines 8.23] and schema.
The use of
relation elements was introduced at this stage in conformance
with existing ontologies, although in an entirely non canonical way, which we hoped
would allow us to visualize the stratigraphy, showing the history of the manuscript
and allow the encoding of what cannot be hierarchically structured.
To describe the connections between the different entities in Beta
maṣāḥǝft we take relations names from the classes of the SAWS [SAWS] and SNAP [SNAP:DRGN]
projects for example, but we reuse also relation's names used by syriaca.org or simply
geonames classes and properties names. We have had
sometimes anyway to add some value for
relation/@name but only for relations specific to the Ethiopic milieu. For example
bm:hasTabot indicates a relation between a person and a place, i.e. that the
@active subject of the relation, a person, has a Tabot in the place at
The following is an example where we use a relation element to state which works
might be parts of another literary work and we are able to point to the
specific subject and object of the relationship. Note also that we use Zotero EthioStudies tags for our bibliography,
exploiting the Zotero API to retrive the TEI needed for the full export of TEI XML
files and for the visualization in the app, which uses a Zotero Style common to our
<relation name="ecrm:CLP46i_may_form_part_of" active="LIT1978Mashaf" passive="LIT1586Hayman"> <desc>The Maṣḥafa ṭomār is mostly contained in manuscripts with the Hāymānota ʾabaw, where it can precede or follow the table of contents. It also circulates independently.</desc> </relation> <relation name="saws:isVersionInAnotherLanguageOf" active="LIT1978Mashaf#t1" passive="LIT1978Mashaf#t4arabic"> <desc>The Ethiopic version of the Maṣḥafa ṭomār was translated from the Arabic version based on a Syriac version. Guidi suggests that the Maṣḥafa ṭomār may have been translated in the mid-16th century. <bibl><ptr target="bm:Guidi1932storia"/><citedRange unit="page">72-73</citedRange></bibl> <bibl><ptr target="bm:Graf1944GCALI"/><citedRange unit="page">295-297</citedRange></bibl></desc> </relation> <relation name="saws:isVersionInAnotherLanguageOf" active="LIT1978Mashaf#t4arabic" passive="LIT1978Mashaf#t7syriac"> <desc></desc> </relation>
Although many relations might be inferred from the files with their name (for instance
relation between a manuscript and the repository or collection in which it is preserved),
and more complex relations, like authorship of a work, fail to be easily represented
simple fact stated in an
author element. Often in the Ethiopic tradition as in
many other mediaeval traditions, attested literary works are attributed to authors
be assigned at all to an editorial will. A work can have as many authors assigned
needed and we use a relation element for this
purpose in order to connect author or translator and the exact part of work that is
attributed to him in each case.
<relation name="saws:isAttributedToAuthor" active="LIT1036Agains" passive="PRS9481Theodoto"/>
In the following sections more details will be provided on the structuring of single entities, focusing especially on manuscripts and literary works, which are the ones most developed at the current stage of the project.
Anyone who has worked with codicologists and philologists knows it is very difficult to satisfy everybody. Different specialists will have their own opinion on the importance of different parts of data and how to represent it.
Corpus analysis should also take into consideration the material structure of supports. The manuscript collation (structure of leaves and quires) should be considered when collating philologically texts coming from different manuscripts or manuscripts' parts.
For the collation of the manuscript we use the very nice VisColl tool by Dot Porter [VisColl] to visualize the structure of the quires, extracting the information from a simple list encoding.
<list> <item xml:id="q1"> <dim unit="leaf">7</dim> <locus from="1r" to="7v"/> I(7; s.l.: 1, stub after 7/fols. 1r-7v) </item> <item xml:id="q2"> <dim unit="leaf">8</dim> <locus from="8r" to="15v"/> ?II(8/fols. 8r-15v) </item> <item xml:id="q3"> <dim unit="leaf">8</dim> <locus from="16r" to="23v"/> ?III(8; s.l.: 2, stub after 6; 7, stub after 1/fols. 16r-23v) </item> .... </list>The above XML structure is transformed in the intermediate format required by the tool by an XSLT and is then passed as a parameter to the VisColl XSLT library to produce the following result
Figure 1: Manuscript Contents Comparison
Lists of contents are vital to understand what a manuscript contains and can alone give all the information a user actually needs. In Beta maṣāḥǝft these summary links to records with the data about each of the works or parts of works listed in the manuscript.
<msItem xml:id="ms_i1.6"> <locus from="105r" facs="f223"/> <title type="complete" ref="LIT1812GospelLuke"/> <msItem xml:id="ms_i1.6.1"> <locus from="105r" facs="f223"/> <title type="complete" ref="LIT1812GospelLuke#TituliLuke"/> </msItem> <msItem xml:id="ms_i1.6.2"> <title type="complete" ref="LIT2713Luke" xml:lang="gez">ወንጌል፡ ዘሉቃስ።</title> <note> The text is divided in 85 (or 83) sections, which correspond approximately to the old <foreign xml:lang="grc">τίτλοι</foreign> . </note> <note> The pericopes, the chapters of Ammonius and the 343 canons of <persName ref="PRS3912Eusebius"/> are indicated in the margins. </note> </msItem> </msItem> <msItem xml:id="ms_i1.7"> <locus from="163r" facs="f339"/> <title type="complete" ref="LIT1693John"/> <msItem xml:id="ms_i1.7.1"> <locus from="163r" facs="f339"/> <title type="complete" ref="LIT1693John#TituliJohn"/> </msItem> <msItem xml:id="ms_i1.7.2"> <title type="complete" ref="LIT2715John" xml:lang="gez">ወንጌል፡ ዘዮሐንስ።</title> <note> The text is divided in 20 (or 19) sections, which correspond approximately to the old <foreign xml:lang="grc">τίτλοι</foreign> . </note> <note> The pericopes, the chapters of Ammonius and the 232 canons of <persName ref="PRS3912Eusebius"/> are indicated in the margins. </note> </msItem> </msItem>When a title is given in the catalogue, then it is used otherwise the canonical title of the work is chosen.
Figure 2: Manuscript Contents Summary
While we don't yet have the transcriptions of the main texts of the manuscript, we do have a lot of transcriptions of incipits, explicits, colophons and additions to the main manuscripts text of several kinds.
<additions> <list> <item xml:id="a1"> <locus target="#205v" facs="f424"/> <desc type="ScribalNoteCompleting"/> <q xml:lang="gez"> ለዘአጽሐፋሂ፡ ወለዘጸሐፋሂ፡ ሕቡረ፡ ይምሐሮሙ፡ እግዚኣብሔር፡ በመንግሥተ፡ ሰማያት፡ አሜን። ወለእመቦ፡ ዘወስኩ፡ ወዘአንተጉ፡ ወዘገንጰልኩ፡ ስረዩ፡ ወባርኩኒ፡ ለዓለመ፡ ዓለም፡ አሜን። </q> </item> <item xml:id="a2"> <locus target="#1r" facs="f15"/> <desc type="DonationNote"> Note from the king <persName ref="PRS8595SayfaAr"/> , son of <persName ref="PRS1854Amdase"/> , by which he donates this Gospel to the monastery of <placeName ref="INS0308DQ"/> at <placeName ref="LOC2600Dayral"/> </desc> <q xml:lang="gez"> <supplied reason="undefined" resp="PRS10747Zotenbe">በ</supplied> አኰቴተ፡ አብ፡ ወወልድ፡ ወመንፈ <supplied reason="undefined" resp="PRS10747Zotenbe">ስ፡ ቅ</supplied> ዱ። ወሀብኩ፡ አነ፡ <persName ref="PRS8595SayfaAr">ሰይፈ፡ ዐርኣድ፡</persName> <supplied reason="undefined" resp="PRS10747Zotenbe">ን</supplied> ጉሥ። ወልደ፡ <persName ref="PRS1854Amdase">ኣምደ፡ ጽዮን፡</persName> ንጉሥ። <supplied reason="undefined" resp="PRS10747Zotenbe">በ</supplied> ስመ፡ መንግሥትየ፡ ቈስጣንጢኖስ። ዘንተ፡ ወንጌለ፡ ቅዱስ። ለ <placeName ref="INS0308DQ">ቤተ፡ ሐዋርያት፡ ዘደብረ፡ ቍስቋም።</placeName> እንዘ፡ እሰገድ፡ በብረከ፡ መንፈስ። ኀበ፡ ተኀብኣ፡ ውስቴቱ፡ መላኬ፡ ሥጋ፡ ወነፍስ። ምስለ፡ እሙ፡ ንጽሕት፡ ድንግል፡ ዘእንበለ፡ ርኵስ። <gap reason="ellipsis" extent="unknown" resp="PRS10747Zotenbe"/> ወለእመቦ፡ ዘሄዶ፡ ወዘተዓገሎ፡ ለዝንቱ፡ ወንጌል። ይኩን፡ ቅዉመ፡ ወርጉመ፡ <supplied reason="undefined" resp="PRS10747Zotenbe">በ</supplied> ቅድመ፡ አብ፡ ወወልድ፡ ወመንፈስ፡ ቅዱስ። በዝ፡ ዓለም፡ ወበዓለም፡ ዘይመጽእ፡ ለዓለመ፡ ዓለም፡ አሜን፡ ወአሜን። </q> </item> .... </list> </additions>These, as well as decorations and keywords, are also separately visualized as a navigation aid and as small corpora of organized documentary sources.
Figure 3: Manuscript Additions
Dating is a central issue for the semantic organization of text corpora, but the timespan in a manuscript catalogue is very wide also within each manuscript. Manuscripts from this live tradition may present multiple layers of modification (material, textual, etc.) continuing until today. While encoding this is fairly easy because dating elements can be added anywhere in a TEI structure, this information offers challenges in the analysis. We encode dates with several attributes like in the following example.
<date calendar="ethiopian" when="1888" when-custom="1881" evidence="internal">፲ወ፰፻፹ወ፩ ዓመት፡ ምሕረት፡ </date>This is displayed in the record as a simple table, but is also used to extract information to build a timeline of the manuscript transforming the XML to feed a script using Vis.js.
Figure 4: Dates Display
Figure 5: Timeline
A way to encode the manuscript stratigraphy following Andrist Maniaci et al. 2013 is currently under development and will hopefully be presented at the next TEI conference.
A single work can be contained in several manuscripts in different parts and together with different texts. The selection and organization of these texts is also very important and might provide useful information. We then offer a way to pick a work from our canonical list and see the structure of all manuscripts which contain it side by side. Hoovering on any title will also highlight if and where the same work appears in another manuscript. The manuscripts are here organized chronologically as much as possible to give also a good starting point for an evaluation of the history of the manuscript tradition of the selected work.
Figure 6: Manuscript Contents Comparison
Textual and Narrative Units
The Ethiopic manuscript culture is a living tradition. Manuscripts are still being produced today. The literary works they contain vary and are difficult to identify. The author of this paper is not expert on this and will not dive into this issue if not in as much as it concerns the architecture of the Web application serving the data.
In the Oriental tradition it is difficult to uniquely identify texts under titles (a problematic category) and thus define distinct corpora. In our project we try to identify and relate all independently circulating units and to keep a core distinction between textual and narrative units as suggested by Orlandi 2013. Although this is reflected in the encoding of texts, until now no satisfactory approach has been found to exploit this distinction. A solely narrative unit like a set miracle account used for saints or kings can be used to identify parts of a text. On the other side a textual unit might still be considered a literary work in its own right or not.
Figure 7: Works in the Ethiopic Literature Clavis.
In the Ethiopic literary tradition the concepts of work (as a quantifiable unit) and that of collection (as a grouping unit) are not available in the same way as they are, almost intuitively, for Western European corpora. Ethiopian literature is characterized especially by its compilatory character. The treatment of work collections (homilies, hagiographies, etc.) as grouping units is thus crucial to the project. How can the relationships between texts, manuscripts, narrative units which are so complex and variable be articulated with certainty in a canonic structure? Working with the outlines defined by Orlandi 2013 and Bausi 2010, it is one of the main concerns at the current stage to find working definitions of these concepts which can accommodate all Ethiopian literary works. How should this distinction impact textual-critical analysis, computation and hermeneutic?
The entire traditional hierarchy of literature falls apart in the face of this problem, but encoding might still help in organizing the unorganized and TEI provides here the necessary rigour and flexibility. A work can have parts which do not have independent circulation in the Ethiopian tradition but can still be used as a reference in a list of contents of a manuscript.
<msItem xml:id="ms_i1.1"> <locus from="3r" to="10v"/> <title type="complete" ref="LIT1560Gospel#IntroductionGospels">Introduction to the Gospels</title> <msItem xml:id="ms_i1.1.1"> <locus from="3r"/> <title type="complete">Maqdǝma wangel, “Introduction to the Gospels”</title> </msItem> <msItem xml:id="ms_i1.1.2"> <title type="complete" ref="LIT1349Epistl">Letter of Eusebius to Carpianus</title> </msItem> <msItem xml:id="ms_i1.1.3"> <locus from="4r" to="8v"/> <title type="complete" ref="LIT1560Gospel#IntroductionCanons">Canon tables</title> </msItem> <msItem xml:id="ms_i1.1.4"> <locus from="9r" to="10v"/> <title type="complete" ref="LIT1560Gospel#IntroductionGessawe" evidence="conjecture"/> </msItem> </msItem>
Being able to add as many
relation elements as wanted, means that the articulation of the information does not need
to be forced into one value or the assertion of certainty about authorship does not
need to be an assumption.
Figure 8: Manuscript Contents Comparison
These visualizations should help the user to obtain all the information needed without forcing too much of our structure to the organization of knowledge traditional of the Ethiopian context.
The Ethiopian literary tradition cannot be studied outside a wider context of Oriental traditions. Thus, whenever possible, we will take the relationship between translations of Greek and Arabic texts in Ethiopic with their Vorlage and versions in other languages into consideration. The alignment of such units is complex and requires referring to external reference works for specific traditions when they are available.
We encode this information in two distinct places. We give all possible titles of
and assign types and
@corresps in an orderly fashion.
<title xml:id="t1" xml:lang="en">(Ethiopic) Book of Kings 1</title> <title xml:id="t2" xml:lang="gez">ነገሥት፡ ቀዳማዊ፡</title> <title corresp="#t2" xml:lang="gez" type="normalized">Nagaśt qadāmāwi</title> <title corresp="#t2" xml:lang="en">First Kings</title> <title xml:id="t3" xml:lang="en">1 Sam (Septuagint)</title> <title xml:id="t4" xml:lang="en">Book of Samuel 1</title>In addition to this, we also link a bibliographic reference to relevant clavis in a
<listBibl type="clavis"> <bibl corresp="#t3greek" type="CPG"> <ptr target="bm:CPG"/> <citedRange unit="item">2257</citedRange> </bibl> </listBibl>
This information is then visualized in the list view and in the record view.
Figure 9: Works and connection to other traditions canonical ids.
All project data is exposed through documented and testable data APIs, which allow search, basic information and full record retrieval to all accredited users, both in XML and JSON.
This is the core technology behind the app, developed with eXist-db and RESTXQ. It was primarily intended to serve data to our services and to third parties in the formats required. For example the canonical titles of each record are always computed from the available information. A sister application (not in the scope of this presentation), which contains definitions of words, uses the API to get related records and the text of quoted passages.
This API is already available for third-party users, although the status of the data, still at its initial phases, does not allow us to advertise it yet. We hope people will find it useful for example to resolve citations of literary works to the exact passage, or to include representations of the data we curate in other resources.
Figure 10: Works and connection to other traditions' canonical IDs.
Of special interest is the development in the context of the Distributed Text Services's endeavour of collection, text and citations APIs based on the TEI data.
[Orlandi 2013] Orlandi, Tito. A Terminology for the Identification of Coptic Literary Documents. Journal of Coptic Studies Proceedings of the Ninth International Congress of Coptic Studies, Cairo 2008, no. 15 (2013): 87–94. doi:https://doi.org/10.2143/JCS.15.0.3005414.
[Bausi 2010] Bausi, Alessandro. Composite and Multiple Text Manuscripts: The Ethiopian Evidence. In “One-Volume Libraries”: Composite Manuscripts and Multiple Text Manuscripts. Proceedings of the International Conference, Asien-Afrika-Institut, Universität Hamburg, October 7–10, 2010, edited by Michael Friedrich and Jörg Quenzer. Studies in Manuscript Cultures. Berlin - New York: De Gruyter, forthcoming.
[EAe] Uhlig, Siegbert (I-IV), and Alessandro Bausi (IV-V), eds. Encyclopaedia Aethiopica. Wiesbaden: Harrassowitz, 2003-2014.
[Andrist Maniaci et al. 2013] Andrist, Patrick, Paul Canart, and Marilena Maniaci. La syntaxe du codex. Bibliologia ; 34 ; Bibliologia 34. Turnhout: Brepols, 2013.
[TEI Guidelines] Consortium, T. E. I., Lou Burnard, Syd Bauman, and others. TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium, 2008.
[EthioSPaRe] Nosnitsin, Denis. DomLib/Ethio-SPARE Manuscript Cataloguing Database. In Paper Presented at the COMSt Workshop The Electronic Revolution? The Impact of the Digital on Cataloguing. Copenhagen, 2012. http://www1.uni-hamburg.de/ethiostudies/ETHIOSPARE/.
[Digital Catalogue of Syriac Manuscripts - Syriaca.org] Michelson D.A., ed. Digital Catalogue of Syriac Manuscripts in the British Library. Syriaca.org: The Syriac Reference Portal; forthcoming. http://syriaca.org/mss/.
[Qadishē - Syriaca.org] Saint-Laurent J.-N.M., Michelson D.A., eds. Qadishē: Guide to the Syriac Saints. The Syriac Biographical Dictionary. Syriaca.org: The Syriac Reference Portal; forthcoming 2016. http://syriaca.org/q/.
[Epidoc Guidelines 8.23] Elliot, Tom, Gabriel Bodard, Elli Mylonas, Simona Stoyanova, Charlotte Tupman, and Scott Vanderbilt. EpiDoc Guidelines: Ancient Documents in TEI XML, 2013-2007. http://www.stoa.org/epidoc/gl/latest/.
 Beta maṣāḥǝft is a TEI (Text Encoding Initiative, www.tei-c.org) project. We use GitHub to store our data and we deploy our data with eXist-db (http://eXist-db.org/). The project is extremely indebted for their continued support to the TEI and eXist-db communities. We would like to thank the team at the Hiob Ludolf Centre for Ethiopian Studies, the Akademie der Wissenschaften in Hamburg and the University of Hamburg for their support to the project and to the preparation of this contribution.
 The Distributed Text Services project is currently under development. We are just early and eager users of it as we believe it will serve very well the community.