The Journey of The History of the Accademia di San Luca, c. 1590-1635: Documents from the Archivio di Stato di Roma into and out of XML

Peter M. Lukehart

Associate Dean and Project Leader

Center for Advanced Study in the Visual Arts, National Gallery of Art

Copyright ©2018 Board of Trustees, National Gallery of Art, Washington

Balisage: The Markup Conference 2018
July 31 - August 3, 2018

(SLIDE 1) I want to thank Tommie Usdin for the invitation to speak today. As a former student in one of her intensive summer programs, I learned so much about the basic elements of XML, and this early training helped guide the structuring of our research database, The History of the Accademia di San Luca, c. 1590–1635: Documents from the Archivio di Stato di Roma, from its conception. I am doubly grateful that she invited a traitor to the cause to speak about our reasons for migrating from XML to HTML. Taking a page from the Helsinki lexicon, perhaps what I should initially have said was: I couldn’t imagine that we wouldn’t use XML in perpetuity. Somewhere in that muddle of verb tenses and double negatives lies the tale I have to tell today.

Before engaging with the content of this conference on mark-up, it would be useful to know something about this project and its ambitions. Drawing from the original statutes, the proceedings of meetings, the ledger books, as well as notarial and court records, The Early History of the Accademia di San Luca, c. 1590-1635 brings together a large number of previously unpublished documentary materials concerning one of the first artists’ academies in Europe and the model for most subsequent institutions for the teaching of art worldwide for four centuries. Conceived as two complementary tools for researchers and scholars of early modern Europe, the database of documentation on the website and the printed volume of interpretive studies, The Accademia Seminars: The Early History of the Accademia di San Luca in Rome, c. 1590-1635 (Washington: National Gallery of Art, 2009), shed light on the foundation, operation, administration, and financial management of the fledgling academy from its foundation in 1593 to its consolidation as a teaching institution with its own titular church designed by Pietro da Cortona around 1635.

(SLIDE 2) In 2007, when we first undertook the creation of a research database of documents concerning the early history of the Accademia di San Luca in Rome, we were fully committed to following the guidelines of the Text Encoding Initiative (TEI). Based principally on early modern documents in Latin and Italian, the project seemed perfectly tailored for the rich organizing and searching capabilities TEI provides. It also promised to be a future-proof and sustainable platform, anchored as it was in XML, a non-proprietary text format. Our first step was to secure funding for the project, which we did thanks to a large grant from the Getty Foundation in Los Angeles. One of our first tasks was to hire a consultant for the encoding of the documents. Colleagues suggested that we contact David Seaman, a pioneer and longstanding advocate of TEI, which was one of the wisest steps we could have taken. With David’s help, we created a small team of two art historians, a paleographer who made the official transcriptions of all the documents identified by the team, a text encoder, and a principal investigator (me) responsible for overseeing all aspects of the project.

For its part, the National Gallery of Art (my employer) provided technical support from the Web team (principally responsible for formatting and styling the pages) and IT support in the form of our code writer and web architect, Richard (Ric) Foster. Ric, like David, was able not only to accomplish large quantities of work against tight deadlines, but also to innovate. Ric wrote the code (largely Perl scripts [this being the early naughts]) that allowed us to automate the tagging of personal names, dates, and places (it was David’s bright idea to make all dates machine readable: maddening for the Europeans, but logical for the coders and the database). This process, which shaved weeks, if not months, off the text mark-up, took a huge burden away from the text encoder on our team, who then had to clean up the reduced number of items that remained: implied names (repeated first names without the surname attached) and places (assuming locations within Rome by contextual clues); words that went over multiple lines or folios; people identified only by their title; etc. The encoder was also responsible for marking up key words, document type, and notary. These terms were too idiosyncratic to automate.
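To make the idea of automated tagging concrete, here is a minimal sketch in Python. It is not Ric's code, which was written in Perl and never documented; the name list, the person IDs, the date pattern, and the sample sentence are all invented for illustration. The sketch wraps a known name in a TEI persName element and an Italian-style date in a TEI date element whose when attribute carries the machine-readable form.

```python
import re

# Hypothetical lookup of known names to person IDs (invented for this sketch).
PERSON_IDS = {
    "Giovanni Rossi": "rossi_giovanni",
    "Giovanni Rossi da Siena": "rossi_giovanni_siena",
}

# Italian month names mapped to month numbers.
MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# Assumed date shape: day, month name, four-digit year (e.g. "3 marzo 1601").
DATE_RE = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b", re.IGNORECASE
)

# One alternation, longest names first, so the fuller form of a name wins.
NAME_RE = re.compile(
    "|".join(re.escape(n) for n in sorted(PERSON_IDS, key=len, reverse=True))
)


def tag_dates(text):
    """Wrap each recognized date in <date when="YYYY-MM-DD">."""
    def repl(m):
        day = int(m.group(1))
        month = MONTHS[m.group(2).lower()]
        year = int(m.group(3))
        return f'<date when="{year:04d}-{month:02d}-{day:02d}">{m.group(0)}</date>'
    return DATE_RE.sub(repl, text)


def tag_names(text):
    """Wrap each known name in <persName key="...">."""
    def repl(m):
        return f'<persName key="{PERSON_IDS[m.group(0)]}">{m.group(0)}</persName>'
    return NAME_RE.sub(repl, text)


def auto_tag(text):
    """Apply date tagging, then name tagging, in a single pass each."""
    return tag_names(tag_dates(text))


print(auto_tag("Adi 3 marzo 1601 Giovanni Rossi da Siena riceve scudi 10."))
```

A pass of this kind catches only exact matches. Implied names, places inferred from context, and words broken across lines are exactly the cases that were left to the encoder.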

Since I am addressing serious XML users, I wanted to share several images of the workflow that Ric created for the project (SLIDE 3). The first slide shows the Site Content Production and Processing. The left side addresses conversion of the transcriptions from MS Word files (Accademia team), through XHTML to rough TEI XML (IT data processing) to manual clean-up of the TEI XML (Accademia team) to the development folder (Accademia team). On the right, the processing side, the Accademia team input the second-draft TEI files, the bibliographies, person IDs, and image files all in XLS. When completed, these files would trigger the application of supporting XML. And finally, we arrive at the promotion to the Web, where the content and search apparatus are displayed in HTML; the underlying TEI mark-up was always available as a clickable asset for any researcher who chose to view it. In addition to the names, places, keywords, document types, notaries, and dates, there was also metadata concerning the authors of the content and mark-up; the date it was last worked on; and the source in the archives in Rome. [I should mention, too, that this metadata is not preserved in the current Accademia website; we have to indicate it in the HTML. It is no longer embedded in the documents, which is a loss.]

The second slide (SLIDE 4) is the one I used to take on the road to share with my fellow art historians so that they could get a general sense of the process of production, conversion, processing, and promotion to the Web. It served as a Reader’s Digest version. With a grant from the Samuel H. Kress Foundation in 2010, the year the site was launched, we made presentations to researchers in Rome, Florence, Pisa, Genoa, Paris, London, Oxford, Cambridge, and Toronto, as well as New York, Washington, Los Angeles, and Chicago (these latter domestic events took place over several years at the expense of the Center for Advanced Study in the Visual Arts [CASVA]).

From its launch in 2010 until about 2014, the publicly accessible site performed very well and it served tens of thousands of international researchers. What we had not anticipated, however, was that our bespoke website depended very much on the knowledge and expertise of our web architect, who had created a hybrid of TEI that allowed for the automatization of tag creation just described and for the site to interact with content on other areas of the National Gallery of Art’s (NGA) website (under whose umbrella we function). Ric Foster’s untimely death in 2010 put the project in a precarious position vis-à-vis making corrections and updates, as the coding was not documented. For the first two years we were fortunate not to have any major incidents; by 2012-2013, we wanted to integrate corrections to the transcriptions; add new documents, bibliography and images; and create a mapping function using geographic information systems. Each change to the site, addition of content, upgrade, or new feature required hiring consultants at high cost. Beyond the expense, we also had to find tech support who were able to understand the web architect’s coding and create fixes. We did locate one consultant who was able to accomplish this, but the cost limited the number of times we could call on him for assistance.

Adding to our predicament, the National Gallery of Art had changed platforms for its own website and was henceforth requiring that all projects conform with its web content management program: first Adobe Communiqué 5 (CQ), now Adobe Experience Manager (AEM). The motivation was, in part, to find a content management program that would allow staff members to create and update their own pages independently and with minimal training, and, in part, to deploy one program across web, on-site, and mobile devices. Any outliers were henceforth responsible for providing all of their own maintenance and consulting needs, an expensive and time-consuming prospect as we knew painfully well. Faced with these compelling challenges (gun to head, is the way I analogized it), we decided to yield to forces beyond our control and thus migrated the entire website from TEI to HTML, which took over a year, from mid-2014 to late 2015.

Which brings us to the second part of my talk, where I will summarize the process of migration. My colleagues in IT suggested that I use the word recreate rather than migrate, since we did not have the luxury of converting all or really any of the marked-up text into CQ. Instead, every one of the 1300 names, as well as hundreds of terms, places, document types, notaries, and dates had to be hand tagged. The documents were still the source of content for our website; however, they were no longer the database itself. In describing this process, I need to offer my deepest gratitude to the Accademia team: Silvia Tita, Courtney Tompkins, Chelsea Cole, and Benjamin Zweig (this summer Hannah Segrave has helped to create this brief history of the migration). They did the lion’s share of the work I will now illustrate.

The first slides show the workflow envisioned by the Gallery’s IT staff in tandem with the consultants/developers from Ukraine who were building the architecture for our new site. In these slides (SLIDES 5-7 showing steps 1-4), you see that one of the most essential aspects of the site was that it is bilingual (English and Italian), which was easily mirrored in TEI. In CQ some content could be copied by the developers across the two languages; but others had to be entered by hand by the team. (SLIDE 6) Adobe AEM considers English and Italian versions completely independent …. From the broken English (SLIDE 7) that follows I am relatively certain that this description was created by the Ukrainian team. What our team provided for the developers were storyboards that showed what we wanted the pages to look like and what kind of faceted searching we envisioned. If anyone would like to review these slides, I am happy to share them, but to save time I will mention only highlights.

(SLIDE 8) shows the Translation of authored content. (SLIDE 9) The English pages were considered the templates for the Italian pages. The former were most often copied to become the basis for the latter. (SLIDE 10: select Accademia tags) Here, the tags intended for searching also had to be translated using the Tag Manager. Customized tags (SLIDE 11) had to be created by the developers, and our team then authored and moved the content to the appropriate categories.

At this point, it would be important to mention a political exigency we had not counted on: in the midst of our data migration we learned that our Ukrainian developers lived in the contested region of Crimea just as it was being annexed by Russia. You can imagine our concern both for the safety of our new-found colleagues and for the future of our site. Amazingly, some team members moved in with families in Russia; others found ways to remain online and working in Crimea. In the end, we did not fall behind on our timeline. The geopolitical implications of the annexation of Crimea are another matter.

(SLIDE 12: Person pages) One of the most valuable aspects of our website is that it provides the names of 1300 artists, church officials, government officials, patrons, landlords, tenants, and other inhabitants of the city of Rome in the late sixteenth and seventeenth centuries. The following slide shows how a Person page (SLIDE 13) was created in CQ. Again, the Accademia team provided the desired structure and the developers created the architecture for its production. Our team then provided the relevant items for searching.

One of the advantages of working on the same platform as the NGA website is that we are able (SLIDE 14: step 3) to populate the images on the Person pages with content already in TMS (the collections management system used by the Gallery; that is paintings, drawings, prints, photographs, sculpture, etc.). Further (SLIDE 15), we are easily able to integrate images from the Digital Asset Management system (from our own collections and those of the Gallery). Similarly (SLIDE 16), we can access the Gallery’s library catalogue to link bibliographic references to books in the collection (when the book is outside of our system, the link goes to WorldCat).

I have reserved discussion of the migration of the document page to the end, because the documents (SLIDE 17) and the digital images of them did depend on the naming protocols that we established in the 1.0 version of the site. (SLIDE 18) We reused the titles and the document ID numbers; the repository name, number, and date could therefore be populated from the metadata. (SLIDE 19) As you can see, the summaries and all the tags had to be recreated by hand, using the language and organization of the original site. (SLIDE 20) The text of the transcription, by contrast, could easily be copied (but without the links and mark-up, which had to be deleted). (SLIDE 21) Footnotes in CQ are no longer embedded, and there is no mouseover in the text; rather, they are identified by the location at the bottom of the transcription page(s). It is not an ideal solution, but it is a workable one.
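For readers who want to picture what is lost when the mark-up is deleted, the following Python sketch reduces a small TEI-like fragment to plain text using only the standard library. The fragment, its attribute values, and the function name strip_markup are invented for illustration; this is not a record of how the team actually prepared the transcriptions for CQ.

```python
import xml.etree.ElementTree as ET

# Invented TEI-like fragment; element and attribute values are illustrative only.
SAMPLE = (
    '<p>Adi <date when="1601-03-03">3 marzo 1601</date> '
    '<persName key="rossi_giovanni">Giovanni Rossi</persName> '
    "riceve scudi 10.</p>"
)


def strip_markup(xml_fragment):
    """Return the text content of a well-formed XML fragment, without tags."""
    root = ET.fromstring(xml_fragment)
    text = "".join(root.itertext())
    # Collapse runs of whitespace left behind by line breaks in the source.
    return " ".join(text.split())


print(strip_markup(SAMPLE))
```

The words of the transcription survive intact, but the machine-readable date and the person key vanish along with the tags, which is why that information had to be re-entered by hand as tags in the new system.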

Good news accompanies this tale of the loss of our foundation in XML principles: one of the most significant benefits is that members of the team can now add content (SLIDE 22 new doc and transcription by paleographer Roberto Fiorentini) without learning XML or TEI. As we have seen, there are templates for the new site that are easily mastered, and team members can upload new documents, create tags, add images, bibliography, and the like after a few days of training in CQ/AEM.

In addition, there is greater interoperability with the other areas of digital content on the NGA’s website: the newly migrated website provides faceted search components that allow the user to explore the documents by using names, keywords, document types, places, notaries, and year dates, just as the original site did (SLIDE 23: screen capture). Although the structure of the data is different, search results remain as accurate and complete as before, while including significant enhancements. For example, researchers can now either select a single category for searching or combine guided searches in up to six categories. Searchable names (now numbering around 1,300) (SLIDE 24: Screen capture) include those of artists and artisans as well as individuals constituting a wide swath of the population of Rome who transacted business with members of the Accademia.
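The logic of combining guided searches can be sketched in a few lines of Python. This is an assumption-laden illustration, not the AEM implementation, whose internals are not described here: the facet names follow the six categories just listed, while the document records, the IDs, and the function name search are invented.

```python
# The six search categories described above, as hypothetical field names.
FACETS = ("names", "keywords", "document_types", "places", "notaries", "years")

# Invented sample records; real documents carry many more tags.
DOCUMENTS = [
    {"id": "doc-001", "names": {"Giovanni Rossi"}, "keywords": {"rent"},
     "document_types": {"receipt"}, "places": {"Rome"},
     "notaries": {"Notary A"}, "years": {1601}},
    {"id": "doc-002", "names": {"Giovanni Rossi", "Paolo Bianchi"},
     "keywords": {"dues"}, "document_types": {"minutes"}, "places": {"Rome"},
     "notaries": {"Notary B"}, "years": {1602}},
    {"id": "doc-003", "names": {"Paolo Bianchi"}, "keywords": {"rent"},
     "document_types": {"receipt"}, "places": {"Rome"},
     "notaries": {"Notary A"}, "years": {1601}},
]


def search(documents, **selected):
    """Return IDs of documents matching every selected facet value.

    Each keyword argument names one facet and gives the value to look for;
    selecting several facets narrows the results (logical AND).
    """
    unknown = set(selected) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facet(s): {sorted(unknown)}")
    return [
        doc["id"]
        for doc in documents
        if all(value in doc[facet] for facet, value in selected.items())
    ]


# A single category, then a combination of two categories.
print(search(DOCUMENTS, names="Giovanni Rossi"))
print(search(DOCUMENTS, keywords="rent", years=1601))
```

Because each selected category only narrows the set, a researcher can begin with one facet and add others until the list is manageable.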

The site now provides pages for all of the individuals mentioned in the documents, including references and links to the documents in which their names appear, with a new feature that indicates the role or roles that they played in Roman society and/or the Accademia, if retrievable. For well-known artists or artists who contributed significantly to the life of the Accademia, the site now incorporates artists’ pages that include not only links to the documents in which they are named but also selected bibliographies, related images, and in some cases portraits. The site’s original features have been completely updated and re-edited to correct errors and inconsistencies as well as to incorporate new information. Bibliographies are linked either to the catalog of the National Gallery of Art Library or to WorldCat (SLIDE 25) so that researchers can access complete bibliographic information for every reference.

Most of the works of art originally represented were from the collection of the National Gallery of Art, with about a dozen from other museums that house paintings from the Samuel H. Kress Collection (by special agreement with the Samuel H. Kress Foundation). Hundreds more related works of art by academicians from museums throughout the world have been added in the two years since the migration. We have carefully curated these images from institutions that are now making them freely accessible to the public (such as the NGA, the Metropolitan Museum of Art, the J. Paul Getty Museum, and the Yale University Art Gallery, among others); thus, they are of the highest quality and resolution and there are no restrictions on viewing and downloading them.

In addition, the Accademia project team completed the creation of a mapping feature that allows researchers to locate places mentioned in the documents on four historic maps of Rome. Once again, we have benefited from the NGA’s participation in the International Image Interoperability Framework (IIIF) initiative. Briefly, a growing number of museums, universities, and libraries are making high-definition images freely accessible to the public. This initiative offers researchers and scholars high quality images that can be compared, zoomed for fine-grained analysis, written on (with texts or notes), and layered. I have cued a (SLIDE 26: screencast) screencast of our maps, dating from Étienne Dupérac in 1577 to Antonio Tempesta in 1593, to Giovanni Maggi in 1625, to Giovanni Battista Falda in 1676, so that you can see how we are taking advantage of these new image technologies. Since the time that the screencast was made last summer, we have added map pins and mouseovers with short entries; a series of five longer essays; bibliographies; comparative images; and soon there will be destination links to rare guidebooks from the sixteenth to the eighteenth centuries from the NGA’s library.
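For the technically curious, zooming works in IIIF because a viewer requests only the portion of the image it needs, through a URL of the form identifier/region/size/rotation/quality.format defined by the IIIF Image API. The Python sketch below builds such a URL. The server address and the image identifier are placeholders, not the NGA's actual endpoints, and the size keyword "max" assumes a server that supports it (older servers may expect "full").

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    region is either the string "full" or an (x, y, width, height) tuple
    in pixels; size is passed through as written (e.g. "max" or "512,").
    """
    if region != "full":
        x, y, w, h = region
        region = f"{x},{y},{w},{h}"
    # Identifiers are percent-encoded so that slashes do not split the path.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier: one 512-pixel-wide detail of a map.
print(iiif_image_url("https://example.org/iiif", "maps/tempesta-1593",
                     region=(1000, 2000, 512, 512), size="512,"))
```

A map viewer issues many such requests as the user pans and zooms, which is how a very large scan of a historic map can be explored in detail without downloading the whole image.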

Finally, migration to the platform used by the National Gallery of Art website will ensure the long-term sustainability and extensibility of The History of the Accademia di San Luca, c. 1590‒1635: Documents from the Archivio di Stato di Roma. With apologies for preaching to the choir among the digitally savvy, we can never take long-term maintenance and support for granted.

I have a short coda that provides an afterlife for the original site and the documents marked up in TEI. For the first few months after the migration of the site from XML to HTML, the old site and new site ran parallel to one another, with a short text on the original site explaining that it had been de-commissioned and that a newer version was now available. For a brief time, the old site automatically re-directed users to the new site. After six months, the original site was taken off public view and placed onto the NGA intranet. This decision was made following several discussions between the Accademia team and the Technical Services (TS) department about how to preserve the original site. First, TS tried to make a copy of the original site using the Heritrix archiving software, but it was only partially successful, as images, XML files, and the site’s Cascading Style Sheets (CSS) could not be properly retrieved. Then the decision was made to place it on the NGA’s intranet as a stopgap until such time as a long-term archiving solution could be reached. We went back and forth in discussion about releasing the XML data by means of GitHub or some other such source (with the caveat that the data was now outdated). The Accademia project team is concerned about and committed to the long-term preservation of the NGA’s digital projects, which constitute an important part of the institution’s history and its mission to educate. In the end, we could not find a long-term means of archiving a wide swath of the website; instead, we made an 8-minute screencast of the principal pages (landing page; description of the project and site; search page with sample searches; and all the other major components of the site: images, bibliography, team members, funders, and partners). We have also decided to share the text-encoded data with researchers or organizations who have a legitimate scholarly interest. (SLIDE 27: screencast)

We have focused in this video on the components of the original site and the centrality of TEI to the structure and retrievability of its contents, highlighting typical searches by artist’s name, key term, and date, among others. Together with making available the encoded data, we hope in this way to document our own early history with regard to this successful, if regrettably short-lived, engagement with XML mark-up.


The author acknowledges support from Benjamin Zweig, Robert H. Smith Postdoctoral Research Associate; Silvia Tita, Research Associate; and Hannah Segrave, Summer Graduate Intern on the project described in this paper. He also thanks Veronica Ikeshoji-Orlati, the incoming Robert H. Smith Postdoctoral Research Associate, for her help with technical and editorial matters.