How to cite this paper

Beshero-Bondar, Elisa E. “Text Encoding and Processing as a University Writing Intensive Course.” Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020).

Balisage: The Markup Conference 2020
July 27 - 31, 2020

Balisage Paper: Text Encoding and Processing as a University Writing Intensive Course

Elisa E. Beshero-Bondar

Professor of Digital Humanities

Program Chair of Digital Media, Arts, and Technology

Penn State Erie, the Behrend College

Elisa Beshero-Bondar is a member of the TEI Technical Council, as well as Professor of Digital Humanities and Program Chair of Digital Media, Arts, and Technology at Penn State Erie, the Behrend College. Until June 2020, she was a professor of English Literature and Director of the Center for the Digital Text at Pitt-Greensburg which has featured markup languages as a foundation of a curriculum in Digital Studies. Her projects involve her in experimentations with the TEI, including refining methods for computer-assisted collation of editions and probing questions of interoperability to reconcile diplomatic and critical edition encodings, as with the Frankenstein Variorum. She is the founder and organizer of the Digital Mitford project and its usually annual coding school. Her ongoing adventures with markup technologies are documented on her development site at

Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license


Can learning markup languages and coding constitute a writing intensive experience for university students? Having taught undergraduate students from a wide range of majors from humanities to the sciences to develop web-based research projects with the XML family of languages, the author proposes that such coursework should fulfill a common state university requirement of a research-oriented writing intensive or "writing-across-the-curriculum" course. Although coding and programming skills are often represented as a kind of literacy, the idea that learning markup technologies may constitute an intensive writing experience is less familiar. This paper calls for teaching markup technologies widely in a cross-disciplinary context to give students across the curriculum an accessible yet intensively challenging course that integrates coding and writing to investigate research questions.

Not just any course that introduces students to markup should be considered writing intensive. Arguably, a class that involves tagging exercises without project development and that invites students to write reflectively about the experience is not engaging in an intensive way with the writing work we associate with coding in the full experience of developing a project. This paper argues that a writing intensive course involving the XML family of languages should require algorithmic problem-solving and decision making, research and citation of related projects, task management, and documentation to share the work and help others to build upon it. A course offering such experiences should be accessible to students from several disciplines, whether to a junior year English or history major with little to no programming experience, or a junior year computer science or information technology major with an interest in applied programming, unused to research questions that drive humanities scholarship. In presenting its case, this paper discusses the pedagogical theory and practice of teaching composition and code as well as the concepts of blind interchange and literate programming important to the XML markup community.

Table of Contents

Coding literacies and writing modalities
Writing intensively with the XML family of languages
Starting from a mess
Developing a schema, cultivating interchange, and writing professionally
Writing-intensive querying, processing, and transforming
The writing intensive work of interchange: two university projects on Emily Dickinson
Writing about code to pay a project forward
Coda: Teaching markup languages during a pandemic

Coding literacies and writing modalities

To any academic who remembers the exhilaration of navigating around an Andrew File System and posting on it a paperless syllabus in HTML back in the 1990s, the application of the phrase teaching with technology in the university discourse of the 2010s may have a jarring sound. Scholarly markup geeks from a past century have lived to see technology for education be packaged in learning management systems that securely take care of all of our electronic interactions with students from posting announcements to grades. Even for 90s markup geeks, learning our way around the technologies driving Blackboard, Canvas, or Moodle can be abstruse and fretfully time-consuming, especially when we attempt to apply their byzantine integrations with various proprietary software applications like Zoom or Panopto. Publications supporting educational technology innovations make evident that educational software applications are universally expected to be integrated and accessed from within a learning management system (LMS). For example, an open-source and fully open access technology for web annotation is now packaged and funneled into an LMS with a proprietary gradebook integration, without which faculty may not be aware of its existence or understand how to apply it in their courses.[1] A group of faculty from Seton Hall University found that teaching with technology can be the way to bridge perceived differences between disciplines, especially at the developmental stages of a writing-across-the-curriculum program, and in their work to develop new technologically-enhanced courses, they concentrate on the interactive writing opportunities afforded by faculty’s encounter with a university-wide investment in a learning management system.[2] We now rely on markup-based management architectures that control university website delivery, and our work in the technology of higher education is widely presumed to be about form-filling, at most using a limited tag set from a menu system.
We have been acculturated to expect that composition and digital media classes like Writing for the Web are and should be about writing in WordPress templates that provide carefully constrained and little-investigated access to the code layer. “We do not ride on the railroad; it rides upon us,” wrote Henry David Thoreau, and we might as well update that for today’s university content management systems: we do not direct the content management system, it directs us.[3]

Although we have little practical choice but to commit to developing course materials in frameworks that we do not choose for ourselves, those frameworks (e.g., Blackboard, Canvas, Moodle) are built from XML building blocks which the savvy customer can occasionally find ways to modify under the hood where permitted.[4] XML establishes the context of educational technology in our educational institutions, but our access is carefully gated in ways that make it abstruse and esoteric to modify the framework. Those of us who teach students to write and build projects with markup technologies may be few in number, but we are the ones who are aware of the skills our students rapidly develop and hone over the course of a single semester: skills to construct data models, search interfaces, and informational graphics according to their own design. Learning XML with its family of languages helps students to design their own projects independently from black box software, and also helps make them informed consumers to choose the software they want to commit to particular tasks, to find the tools most amenable to user alterations under the hood and most friendly to transporting the data when the software inevitably upgrades or is no longer supported. Learning markup technologies is a way of learning the structures of the web of information, and much the way we learn the formal genre expectations of an essay or a poem, we learn to adapt formal rules and expectations to contain data and metadata in an XML framework driven by our interests in processing, sharing, and accessing the data. We write documents to be read and acted upon by humans, and we write markup to be read and acted upon by humans and machines.

The XML family of languages has not become more abstruse or difficult to learn since the millennium. Indeed, instruction is much easier by the year 2020 with many tutorials and more powerful processing methods available now.[5] However, the learning of markup technologies and the XML stack has been cultivated too narrowly to be recognized as a general skillset beneficial to students across disciplines. Given universities’ deep investment in XML-based ecosystems, can we imagine a widely-accessible cross-disciplinary course that inverts the relationship between code and form-box, that engages students and faculty in organizing ideas with markup languages that they themselves control? This paper argues that a sustained application of markup technologies in the contexts of document data modeling and web project development should serve as a broadly accessible cross-disciplinary writing-intensive experience for students. This exploration should open doors to students who would not otherwise consider themselves tech savvy programmers to learn computational tools that make for more powerful capacities to write, develop, and interact with the document data and the semantic web.

In awakening students to the technologies that shape composition, classes that teach digital writing modalities (or composition that involves multiple media forms) may provide faculty markup practitioners a well-established and widely lauded context for teaching markup languages in a way that educational institutions can classify as a writing-intensive experience. For writing instructors, digital writing modalities or multimodal writing can motivate dialogue about the medium as the message and the choice of form following function and audience. Addressing recent trends to assign student writing compositions in audio and video formats, Laura Giovanelli and Molly Keener observe that “Well-designed pedagogy recognizes multimodal writing’s potential to foster student agency and ownership as increasingly participatory citizens where literacy means composing in a range of print and digital media, genres, and modes, where students are consumers and ethical creators.”[6] Writing with markup languages can easily be taught in the context of digital multimodal composition that fosters student agency and ownership of the code base. Much as students compose by remixing and ironically applying visual or auditory memes, they might apply markup languages to compose by self-consciously reordering scripts of official documents and develop controlled vocabularies that help communities access their heritage. For example, Jessica Lu’s and Caitlin Pollock’s 2019 HILT course, Introduction to the Text Encoding Initiative (TEI) for Black Digital Humanities, organized training in markup to support new ways of accessing, reading, sharing, and analyzing texts of marginalized people.[7]

Students learning markup technologies need good tools. Semester-long XML-based writing-intensive courses want access to a good syntax-checking code editor (such as the oXygen XML editor), writeable web space with secure FTP access, and possibly an XML database (such as eXist-db) and a GitHub account, in place of the more ubiquitous WordPress account and the Adobe suite as the students’ multimodal writing desk. While I do not expect every digital composition instructor to flock to these technologies (though I wish they might), I do anticipate that those of us who can teach the XML family of languages can do so in a way that constitutes a writing-intensive experience valuable to students across the curriculum, and especially valuable in programs with majors, minors, and certificate programs supporting the digital humanities.

Perhaps the best-known university context for interdisciplinary pedagogy in digital technologies is the call for coding across the curriculum, which sometimes explores the intersections between writing and coding, seeing both as a foundation for literacy in the twenty-first century. Learning to write programming code engages a student in writing commands, conditional expressions, and descriptive annotation as well as metacommentary in documentation. Annette Vee finds parallels to the distinctly imperative aspects of coding in Kenneth Burke’s speech-act theory of human expression as a performance. Vee calls for greater awareness of the intersection of scripting that connects the writing and coding process, since, according to speech-act theory, we write or speak to move an audience to process ideas in response to a scripted delivery, whether that audience is human or machine:

Exploring the nature of language, action, and expression through programming allows us to think about the relationship between writing and speech differently and also to consider the ways in which technologies can combine with and foster human abilities. Computational and textual literacy are not simply parallel abilities, but intersectional, part of a new and larger version of literacy.[8]

When coding is understood as literacy, its nexus with writing becomes explicit when we think of it on the same terms as developing language skills. But the coding literacy movement may not be as profoundly educational as we imagine. A developer writes in Slate that coding literacy books for children are not what students need to learn to code when these books simply lead children to obediently follow scripts to get a so-called correct answer. Rather, the article points out that the learning process needed for coding is simply learning any process thoroughly and well. Far more valuable than code literacy books for children are life experiences like repeatedly taking a piece of furniture apart and putting it back together until we understand how all the pieces connect with each other, or learning how to optimize the efficient use of cookie cutters in rolled dough.[9] Really learning to code is not just about writing a correct syntax to earn points or get the correct answer, but is rather more experimental, creative, and purposeful, with awareness of many possible paths to try. That kind of learning does not belong to computer science departments any more than writing belongs to English departments, as David J. Birnbaum and Alison Langmead make explicit:

The first step toward learning to code is to recognize that computer programming is not computer science; it is more like writing. Everyone can learn to do it, and can be given the opportunity to learn to do it in ways that are appropriate for their disciplines. We offer humanists years of practice in learning to write; let us give them the chance also to learn to code. The second step is to recognize that learning a programming language is like learning a foreign language, except that it is much easier.[10]

These are analogies to learning to write or learning a foreign language, intended to persuade humanists to adopt coding into their disciplines, but the analogies do not in themselves constitute an argument for coding and programming as involving an intensive professional writing experience for students of any discipline.

That argument for the writing-intensiveness of coding can be found outside the humanities. In the context of programming education, Felienne Hermans and Marlies Aldewereld propose that more students would be interested in computer science if they learned to program in a way that followed models for learning to write. They suggest that programming instruction would be improved if it adapted the way writing instructors model examples of their writing process for students to break it down into discrete tasks more efficiently. For example, they cite a study in which elementary school students comprehended scientific concepts better when they were assigned short writing assignments that engaged the topic. They suggest that understanding a real-world context for a code script and writing observantly and reflectively about it makes the material more widely accessible and could help a broader group of kids identify as programmers![11] Indeed, this idea that writing to document your observations can help to learn a science enacts a return to an earlier and more integrated approach to education, before the sciences had become formally distinct from humanities: the late eighteenth century saw the scientific poems of Erasmus Darwin and calls for poetry that would bring zoology and botany to life and encourage sympathy with the natural world.[12] We have known that the act of writing enhances learning across disciplines for a long time now. But further, as Hermans and Aldewereld point out, students can learn programming more easily if they are taught in the mode of writing instruction, with modeled examples and activities that hone observation skills. What might this mean for learning to work with XML? It could involve students reviewing markup, schemas, interfaces, and visualizations from established XML projects as part of their course experience.
For example, students can be given assignments to explore a project site like the Map of Early Modern London or the Shelley-Godwin Archive to look at the code under the hood and describe how they understand the component pieces of the project to fit together. In exploring projects like these, students in the author’s classes have often found a basis for understanding what kinds of research questions and web reading and research tools they might design with the right resources, and students often build on what they learn in their own projects to understand how they can organize information about people, places, contexts, language patterns, revision history, and more.[13]

It is probably no coincidence that Donald Knuth modeled his concept of literate programming on code for document formatting in the 1980s. In this context the programming of a machine to format electronic documents unites fundamentally with the action and reproduction of writing. Knuth’s now-familiar concept seems simple in hindsight: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.[14] The process of literate programming heightens an old association of verbal text with fabric textile, applying Knuth’s concepts of weave and tangle:

One line of processing is called weaving the web; it produces a document that describes the program clearly and that facilitates program maintenance. The other line of processing is called tangling the web; it produces a machine-executable program. The program and its documentation are both generated from the same source, so they are consistent with each other.[15]

Literate programming is part of the XML specifications and became paradigmatic for the XML family of languages by the turn of the new millennium. In 2002 Norm Walsh modeled its application to the DocBook XSLT stylesheets by applying namespaces to permit the tangling of actionable code with documentation.[16] Responding to Walsh’s work, Eric van der Vlist prepared a clear and thoroughly-documented explanation of literate programming applied to XML with embedded RELAX NG code, readily transformable into multiple formats to produce schema validation checking as well as human-readable web-ready documentation.[17] And over the course of its development from the early 1990s onward, the One-Document-Does-It-All (ODD) system has modeled literate programming for the purpose of compiling and delivering the Guidelines of the Text Encoding Initiative (TEI) as a combination of documentation and processing code instantiating the schema rules of the community-maintained XML vocabulary.[18] With its early and continuing investment in literate programming, from its specifications to its schema modeling, the XML family of languages should be understood as thoroughly writerly.
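The general principle of keeping prose and executable code in one source by means of namespaces can be sketched briefly. The following is only an illustration of that principle, not Walsh's or van der Vlist's actual vocabulary: the doc namespace URI and the doc:note element are hypothetical. It relies on the fact that XSLT permits top-level elements in a non-XSLT namespace, which a processor may simply ignore when running the stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="http://example.org/ns/doc"
    exclude-result-prefixes="doc">

  <!-- Prose for human readers, held in a hypothetical documentation namespace. -->
  <doc:note>Every title in the source document becomes a top-level heading,
    so that a reader of the web page sees at once what the document is called.</doc:note>

  <!-- Executable code: the template rule that the prose above describes. -->
  <xsl:template match="title">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>

</xsl:stylesheet>
```

A second stylesheet could, in principle, select only the doc:note elements to produce readable documentation from the same file, which is the sense in which the program and its explanation share one source.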

Student coders can gain an introductory experience in literate programming in designing ODD schemas with descriptive glosses and explanation encoded together with schema rules in a project. Whether or not we understand the drafting of an ODD customization to constitute an experience in embedded programming, we can recognize its value as a writing intensive experience, because it involves students writing the rules for a project to work systematically and designing its data for consistency and precision, and because it provides for description and explanation that can make a project sharable with others. Markup, documentation, and programming work in the development of XML-based projects should be promoted within educational institutions as a distinctly intensive experience of writing applied to design resources soundly and well.
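A small fragment may suggest how prose and rules sit side by side in an ODD. This is a minimal sketch under stated assumptions: it imagines a hypothetical poetry project that constrains the type attribute on the TEI line-group element to two values of its own choosing, and the wording of each description is invented for illustration. The exact mode values a given customization needs should be checked against the TEI Guidelines.

```xml
<!-- Hypothetical ODD fragment: constrains @type on <lg> to a project's own terms. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
    ident="lg" module="core" mode="change">
  <attList>
    <attDef ident="type" mode="change" usage="req">
      <!-- Documentation written by the students for future contributors. -->
      <desc>names the poetic form of this line group, using the terms
        the project team agreed on.</desc>
      <!-- Schema rule: only the listed values will validate. -->
      <valList type="closed" mode="replace">
        <valItem ident="couplet">
          <desc>two lines that rhyme with each other</desc>
        </valItem>
        <valItem ident="quatrain">
          <desc>four lines, regardless of rhyme scheme</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

Each description is at once a sentence addressed to a human collaborator and a label attached to a rule that a validator enforces, which is why drafting it requires both precise wording and a precise data model.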

Writing intensively with the XML family of languages

The aspect of XML encoding that makes it so problematic or impossible for interoperational processing (even within a shared vocabulary like the TEI) is the semantic naming of tags, what Desmond Schmidt has called its illocutionary force.[19] This trouble for universalized processing is also a feature of XML that makes it writerly and scholarly in nature, and gives it power as a research tool, a power that is enhanced when new coders learn not just how to tag but also how to manage and process their tagging as a controlled system. Confronting the challenge of processing the markup and sharing it with others outside one’s discourse community (oneself, one’s team, or one’s semester class) is what we recognize as intensive about a course in writing with markup. An introductory writing experience with markup languages may familiarize students with the data structure and well-formedness and some of the issues of transformation and sharing the data, but intensification of the writing challenge begins with confronting the management issues of a project: writing a customization and designing schema validation rules. Still more intensification is applied when students learn how they can navigate and process the markup data.

This issue with the writerly nature of XML coding and processing is not well appreciated in the larger context of digital humanities work or by professional developers eager to facilitate the publication process and separate it from the markup practice. Coders can produce complicated, even intricate structures with XML and TEI and give impatient developers headaches, but we would do well to encounter those structures as building materials the way writers do and give our students the tools to work with them. Rather than designing projects primarily to suit the needs of developers or content management systems, markup practitioners can learn the writing intensive way to control the developing tools as their writing instruments. Taught how to manage and process their own markup, students will design their own markup more efficiently and systematically than they do when shown only how to tag texts, or tag according to the rules imposed by a content management system that magically shows their work. When ruled by an external content management system, students are given an illusion that code is correct when it looks good on the screen and conforms to expectations. Tagging according to the rules of an externally imposed publication software is not a writing-intensive experience, because it is stripped of creative experimentation, rigorous decision making, and intellectual challenge.

A course in text encoding could take a concentration within a discipline like English Literature, as Kate Singer’s class at Mount Holyoke did, to concentrate collectively on constructing a digital edition of a collection of poems. Singer found that the broad-based ‘humanities language’ of the TEI enabled students to question, historicize, and reconsider the poetic terminology we use to describe poems. She and her students found that the controlled vocabulary of the TEI gave considerable latitude to a community of scholars to rethink or apply old conventional literary terms as they saw fit. Because the TEI elements for poetry express simple structural forms rather than specialized terms (line-groups rather than stanzas, for example), Singer’s students recognized that it was up to the encoder to apply specialized, historically specific poetic terms in their customized application of the TEI. In the context of a course, that work of deciding the appropriate system of terms for oneself can be excitingly experimental. The pedagogical benefit of engaging students in markup and its applications was to foster decision-making, documentation, and design thinking, as Singer found her students eager to take on design decisions for customizing their own interface for their edition. Not only did they benefit by gaining tech skills, but they also became more observant readers of poetry as well as the interfaces and infrastructures of larger-scale digital scholarly editions they encountered. This kind of interpretive markup may, finally, give us some inkling of how TEI might be used as an analytical tool for smaller-scale, case-based projects perfect for undergraduates as they learn to parse and categorize their own textual situations.[20] Courses like these prioritize the intellectual engagement of a class with the document objects they are investigating, and here the markup is clearly a research and investigation tool.
Fitting the students’ markup to an externally imposed uniform publication framework would have made the work less messy and easier to publish, but would have stifled the students' experimentation and removed them from the intellectual decision-making process of doing their own project development. Even unfinished work in the course of a short semester is a stepping stone to renewed engagement in a process of structured work with document data modeling, querying, checking, testing, and transforming to share their work. Such work can lead to impressive senior thesis projects.

The XML family of languages was designed not only to be widely accessible, but also to be a vocabulary that the writer controls, consults, remixes, and transforms. It does not take very long to acquaint new coders with the rules of how to tag a document, or how to turn a plain text document into an XML document, though often people experience a little frustration with figuring out what they can do with attributes. The first week or two of a class that involves markup methods can orient people to the basic rules of well-formedness, but that almost immediately introduces an engaging intellectual experience when we ask students to develop their own hierarchies to organize what they are reading, when we invite our students to try to recognize what is implicit and find ways to use elements, attributes, or comments to make that explicit.
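A one-sentence illustration may make the point concrete. The following minimal sketch marks up the plain-text sentence "Frankenstein was first published anonymously in 1818." All element and attribute names here are hypothetical choices an encoder might make, not a prescribed vocabulary.

```xml
<note>
  <!-- The source sentence never names the author; the attribute makes her explicit. -->
  <title>Frankenstein</title> was first published
  <publication author="Mary Shelley" signed="no">anonymously</publication>
  in <date when="1818">1818</date>.
</note>
```

The elements name what the words already say, the attributes record what a reader has to infer, and the comment explains the encoder's reasoning to whoever opens the file next.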

Starting from a mess

Just as we think of free-writing as a valuable exercise to start a first preliminary draft in a composition class, in teaching markup, a certain amount of mess and unreliability is okay as we are figuring out what we want to prioritize. Often new coders introduce far more differently named elements than they really need in ways that would be baffling to keep track of in a fully developed project. To understand how to code helpfully and meaningfully, a student needs to confront the problem of sharing and reproducibility. Thinking about how to share a decision process and a set of rules for an XML project with others is cultivating the awareness of audience that is emphasized in rhetoric and composition classes as a means to craft better sentences, to trim out verbosity, to outline a thesis project.

The road to improvement can be based on the same principles of understanding how to convey relationships, and how to prioritize a main idea with a subordinate clause. A student might write a paragraph like the following in a draft that could benefit from some rounds of revision:

Historically, women have had a tough time when it comes to writing novels and combatting prejudices and sexism. Many female authors have had to publish their novels with a male pseudonym or as an anonymous author. When writing Frankenstein, Mary shelley wrote it among her friends to see who could write the best horror story and she did not tag it with her name. The famous story was left anonymous so her friends wouldn't have a prejudice view when reading it. The story ended up being her most famous book and she was a female writer, who wrote a horror story about a male creator and a frightening, male creature.[21]

Instructors who teach writing courses understand the complexities of advising a student on how to revise writing like this. We often comment that the student’s ideas are good or interesting, but we need a stronger sense of how the ideas connect, and every sentence needs to support a single central idea. In this case we think the central idea is about how Mary Shelley, like other women authors in English history, opted to conceal a female identity in publishing her work. Beyond simple misspellings and missed capitalization, we can identify conceptual problems of subordination, especially evident in the last sentence where the fame of the book and the female identity of the writer are placed at the same level as the main idea and represented out of chronological sequence. The ordering of ideas and the decision of where to place subordinate clauses is a problem we can associate with organization or hierarchy. Rewriting such a paragraph sometimes involves reorganizing and condensing, pulling together apparently disconnected parts in the first draft. For example, here is my own attempt to rewrite the student’s paragraph:

Historically, many women authors faced a sexist and prejudiced publishing industry and opted to conceal their identities either by publishing with a male pseudonym or anonymously, as Mary Shelley did with the anonymous first publication of Frankenstein in 1818. The novel, a horror story about a male creator and a frightening male creature, became her most famous work, and eventually was published with her name on the title page.

My suggested rewriting of this seems a little unsatisfactory because I sense I have removed something interesting that the student might have developed, a loose end from the draft, something about the idea of a female author writing about a male scientist creator and a male creature, whose violent conflict drives this book. Would the student have wanted to explore that issue of a woman author investigating male conflict? Is that perhaps the topic of another essay entirely? Writing and rewriting can impose order but also cut out possible avenues of development.

I can find similar examples that demonstrate issues with subordination in students’ first XML encoding efforts. In my roughly seven consecutive years of teaching undergraduates to code with the XML family of languages, I notice a recurring pattern in the first three weeks of a semester: that some students have difficulties with conceptualizing dependencies, much like the issues we identified in the student paragraph about Mary Shelley. Students just starting to code often prepare shallow hierarchies. For example, instead of bundling list items together into a wrapped cluster, they make a very flat tree where every line is its own entry, a child of the root element. These students benefit from seeing examples of nested markup, and also from understanding something about how the markup may be processed to work out how attributes can refine the markup by helping to categorize, describe, point out related resources, or clarify something unclear in the source document. To help students discover many different ways they could apply markup, I have found that inviting my students to write their first markup on a recipe, one that contains an interesting variety of ingredients, measurement units, and activities, provides a very clear and easily recognizable sense of structure with lots of categories of information. I ask students to envision a scenario for the encoding that tries to create a system for filing documents:

First, read this recipe for homemade bread, and pretend you are filing it with hundreds of other recipes that you need to fit a set purpose, such as running a restaurant, in which you need to keep track of kinds and quantities of ingredients required. XML is written to store information, and when we apply it to a situation with numbers and units, like with coding recipes, the code we write can help make computerized calculations, and help optimize searching across a collection for particular kinds of ingredients. Your code might be designed to help categorize ingredients by what part of the grocery store they can be found in. The challenge of the assignment is to write code that helps categorize ingredients, mark necessary equipment, and stages for processing, but the system you develop is up to you.[22]

It is fascinating to see the variety of encodings students submit for this assignment, with no two being much alike. With a few exceptions, the students are usually able to submit well-formed XML by day 2 of the course, but they sometimes don't quite understand the concept of nesting structures or demonstrating relationships, as in this tagging of the ingredients list of a sourdough recipe:
 <recipe type="allAges" name="sourdoughIngredients">
 <measurement>1 1/4 cups</measurement> (<amount>160 grams</amount>)
 <ingredient>white bread flour</ingredient>, plus more for dusting
 <measurement>1/4 cup</measurement> (<amount>38 grams</amount>)
 <ingredient>stone-ground whole-wheat flour</ingredient>
 <measurement>1/4 cup</measurement> (<amount>32 grams</amount>)
 <ingredient>stone-ground whole rye flour</ingredient>
 <measurement>1/2 teaspoon</measurement>
 <ingredient>instant yeast or bread machine yeast</ingredient>
 <measurement>1 teaspoon</measurement>
 <ingredient>table salt</ingredient>
 <measurement>1/4 cup</measurement> (<amount>55 grams</amount>) 
 <ingredient>dry fermented cider</ingredient>
 <measurement>1/2 cup</measurement> (<amount>120 grams</amount>)
 <ingredient>lukewarm water</ingredient> (<temperature>80 degrees</temperature>), 
 plus an optional 1 tablespoon recipe></recipe>
 <!--jgb: You may substitute a Pilserner beer for the dry fermented cider. -->
Here we see a common problem for new learners of XML: thinking that white space is sufficient to relate like to like, rather than recognizing the need to position wrapper elements. The student has not quite understood that the <measurement> and <amount> elements are not really associated together. Nor has the student tried to apply attributes, yet, but they have ventured an XML comment about ingredient substitution. The code is promising for its regularity and consistency, but lacks an understanding of how to work with the XML tree hierarchy. Another student seems to have a stronger grasp on the assignment, but even here we can find some issues that relate to writing problems of redundancy or overreliance on a particular word, here the attribute @type being overused:
<recipe type="bread" name="country loaf (pain de campagne)">
  <measurement type="cup">1 1/4 cups (<measurement type="gram">160 grams</measurement>)</measurement>
  <!-- sd: is this a good/okay way to do measurement types? it seems weird but i don't really know -->
  <ingredient type="dry">white bread flour</ingredient>, plus more for dusting 
  <measurement type="cup">1/4 cup (<measurement type="gram">38 grams</measurement>)</measurement>
  <ingredient type="dry">stone-ground whole-wheat flour</ingredient>
  <measurement type="cup">1/4 cup (<measurement type="gram">32 grams</measurement>)</measurement>
  <ingredient type="dry">stone-ground whole rye flour</ingredient>
  <measurement type="tsp">1/2 teaspoon</measurement>
  <ingredient type="dry">instant yeast or bread machine yeast</ingredient>
  <measurement type="tsp">1 teaspoon</measurement>
  <ingredient type="dry">table salt</ingredient>
  <measurement type="cup">1/4 cup (<measurement type="gram">55 grams</measurement>)</measurement>
  <ingredient type="wet">dry fermented cider</ingredient> (may substitute Pilsener beer; see
  <measurement type="cup">1/2 cup (<measurement type="gram">120 grams</measurement>)</measurement>
  <ingredient type="wet">lukewarm water</ingredient> (<temp>80 degrees</temp>), 
  plus an <measurement type="tsp">optional 1 tablespoon </measurement> 
  <step n="1"><equipment type="utensil">Whisk</equipment> together the flours, yeast and salt in a 
  <equipment type="bakeware">mixing bowl</equipment></step>. 
  <step n="2">Combine the cider and water in a <equipment type="bakeware">liquid measuring cup</equipment></step>. 
  <step n="3">Add the liquid to the flour mixture; use a <equipment type="utensil">spatula</equipment> 
  or <equipment type="utensil">bench scraper</equipment> or your hand moistened with water to blend them for about a minute
  </step>. The dough should be shaggy yet cohesive. 
  <step n="4">Cover the bowl with a <equipment type="cloth">towel</equipment>; 
  let the dough rest for <time>20 minutes</time></step>. 
  <step n="5">Moisten your kneading hand. If the dough seems stiff, 
      add the optional tablespoon of water.</step>
  <step n="6">Stretch one edge of the dough (still in the bowl), then press it into the center of
  the bowl. Repeat this about a dozen times, moving clockwise to catch all sides of the dough</step>. 
(This should take <time>1 or 2 minutes</time>.) 
  <step n="7">Turn the dough over so the seams are on the bottom</step>. 
  <step n="8">Cover and let rest for <time>20 minutes</time></step>. 
  <step n="9">Repeat the clockwise stretching and folding two more
        times, with <time>20-minute</time> rests after each</step>. 
  <step n="10">Cover and refrigerate <time>at least 8 hours and up to 24 hours</time></step>. 
The dough should have doubled. If it hasn't, leave it on the counter until it does. 
   <step n="11">Lightly flour a work surface</step>. 
   <step n="12">Use a <equipment type="cloth">pastry cloth</equipment> or clean
            <equipment type="cloth">dish towel</equipment> to line a round <equipment
            type="bakeware">colander</equipment>. Dust the cloth with flour</step>. 
  <step n="13">Transfer the dough to the floured work surface. Fold the edges toward the center to create
        a round shape, turning it over so the seams are on the bottom</step>. 
  <step n="14">Let it rest for <time>5 minutes</time>, then transfer to the colander, seam side up</step>. 
  <step n="15">Cover with a <equipment type="cloth">towel</equipment> and let the dough rise for
            <time>1 1/2 hours</time>.</step>
  <step n="16"><time>Thirty minutes before baking</time>, 
       place a <equipment type="bakeware">cast-iron Dutch oven (lid on)</equipment> 
       or <equipment type="bakeware">enameled cast-iron pot with a lid (on)</equipment> 
       in the oven; preheat to <temp>475 degrees</temp></step>. 
  <step n="17">Carefully remove the hot pot from the oven.</step>
  <step n="18">Turn the dough out onto the counter so the seams are on the bottom</step>. 
  <step n="19">Use <equipment type="utensil">kitchen scissors</equipment> 
      to make 8 snips on the top of the dough in an evenly spaced spoke pattern, 
      each about 1/4-inch deep</step>. 
  <step n="20">Lift the dough and carefully drop it into the hot pot. 
         Immediately cover with the hot lid</step>. 
  <step n="21">Bake for <time>30 minutes</time>, then reduce the heat to 
       <temp>450 degrees</temp></step>. 
  <step n="22">Uncover and bake for <time>8 to 10 minutes</time> or
        until the crust is dark brown</step>. 
Try to minimize the amount of time the oven door is open. The bread is done when its internal temperature 
registers <temp>205 degrees</temp> on an <equipment type="utensil">instant-read thermometer</equipment> 
and the loaf sounds hollow when knocked on the underside. 
  <step n="23">Transfer the loaf to a <equipment type="bakeware">wire rack</equipment> 
         to cool for at least <time>1 hour</time> before cutting</step>. 
The nested <measurement> elements inside <measurement> elements show a problem the student was trying to solve with hierarchy, and make a good opportunity for the instructor to discuss with the student how to deal with all the information given about equivalent units (grams to cups). We could suggest the student use just one measurement element and try encoding the equivalency information in attribute values, for example. And we need to think about the representation of fractions: if the code is going to be processed by a computer to, say, triple this recipe, how might we write markup to represent the numerical quantities and conversion factors? What is remarkable here for an early XML assignment is the student’s decision to mark types of equipment within the steps. The student could reconsider the variety of attributes, but the effort to track categories of measurement as well as categories of ingredient (wet vs. dry) is admirable on a first experience with angle brackets.
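One way to make this discussion concrete is a minimal sketch, here in Python, of what a revised encoding might make possible. It assumes a hypothetical revision of the student's markup: each ingredient is bundled in an <item> wrapper, and a single <measurement> element carries the fraction and the gram equivalent in attributes. The names item, quantity, unit, and grams are illustrative assumptions, not the student's code or a course schema.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

# Hypothetical revision of the student's markup: one <item> wrapper per
# ingredient, one <measurement> element, quantities stored in attributes.
# The names item, quantity, unit, and grams are illustrative assumptions.
RECIPE = """
<recipe type="bread" name="country loaf (pain de campagne)">
  <item>
    <measurement quantity="5/4" unit="cup" grams="160">1 1/4 cups (160 grams)</measurement>
    <ingredient type="dry">white bread flour</ingredient>
  </item>
  <item>
    <measurement quantity="1/4" unit="cup" grams="38">1/4 cup (38 grams)</measurement>
    <ingredient type="dry">stone-ground whole-wheat flour</ingredient>
  </item>
  <item>
    <measurement quantity="1/2" unit="tsp">1/2 teaspoon</measurement>
    <ingredient type="dry">instant yeast or bread machine yeast</ingredient>
  </item>
</recipe>
"""


def scale(recipe_xml, factor):
    """Return (ingredient, scaled quantity, unit, scaled grams or None) tuples."""
    root = ET.fromstring(recipe_xml)
    rows = []
    for item in root.findall("item"):
        measurement = item.find("measurement")
        # Fraction parses strings such as "5/4" exactly, avoiding float rounding.
        quantity = Fraction(measurement.get("quantity")) * factor
        grams = measurement.get("grams")
        scaled_grams = int(grams) * factor if grams is not None else None
        rows.append(
            (item.find("ingredient").text, quantity, measurement.get("unit"), scaled_grams)
        )
    return rows


tripled = scale(RECIPE, 3)
for name, quantity, unit, grams in tripled:
    print(name, quantity, unit, grams)
```

Keeping the human-readable text ("1 1/4 cups") as element content while placing a machine-readable fraction in an attribute is only one possible design; the point for students is that the wrapper element is what associates a quantity with its ingredient.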

Even when students quickly learn how to apply hierarchies, they are, of course, prone to inconsistencies before they learn to write schema validation code, as in the following example:

 <step n="9"><process type="action">Lightly flour</process> a 
     <item type="equimpent">work surface.</item> 
     Use a <item type="equipment">pastry cloth</item> or 
     <item type="equipment">clean dish towel</item> to 
     <process type="action">line a 
     <item type="equipment"><adj type="equipment">round</adj> 
 <step n="10"><process type="action">Dust</process> the 
     <item type="equipment">cloth</item> with
     <item type="ingredient">flour</item>.</step>
 <step n="11"><process type="action">Transfer</process> the 
     <item type="ingredient">dough</item> to the 
     <adj type="equipment">floured</adj>
     <item type="equipment">work surface</item>.</step>
This student reveled in coming up with complex hierarchies in his first week of coding, but a typo means that his attribute values are inconsistently marked. Such imprecision might pass unnoticed as a relatively harmless error in a student essay, but here it becomes an opportunity to introduce the power of schema writing to students, to write their own spell checkers for their attribute values and control which elements and attributes are permitted to appear at each level of the hierarchy.
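In the course itself this control comes from Relax NG, where an attribute pattern such as attribute type { "equipment" | "ingredient" } would reject the misspelled value. As a rough stand-in for what such a rule checks, the following Python sketch reports any <item> whose @type falls outside an allowed list. The allowed list and the wrapping <recipe> element are assumptions made for illustration, and the sample is a shortened, well-formed excerpt of the student's steps.

```python
import xml.etree.ElementTree as ET

# Assumed list of permitted @type values on <item>, for illustration only.
ALLOWED_ITEM_TYPES = {"equipment", "ingredient"}

# Shortened, well-formed excerpt of the student's steps, wrapped in an
# assumed <recipe> root element.
SAMPLE = """
<recipe>
  <step n="9"><process type="action">Lightly flour</process> a
    <item type="equimpent">work surface</item>.</step>
  <step n="10"><process type="action">Dust</process> the
    <item type="equipment">cloth</item> with
    <item type="ingredient">flour</item>.</step>
</recipe>
"""


def unexpected_types(xml_text, allowed):
    """List (step number, attribute value, element text) for disallowed @type values."""
    root = ET.fromstring(xml_text)
    problems = []
    for step in root.iter("step"):
        for item in step.iter("item"):
            value = item.get("type")
            if value not in allowed:
                problems.append((step.get("n"), value, item.text))
    return problems


problems = unexpected_types(SAMPLE, ALLOWED_ITEM_TYPES)
print(problems)
```

A schema does this work declaratively and reports the problem as the student types, which is why schema writing, rather than a script like this one, is the tool introduced at this stage.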

In the first week of my class, students usually move from encoding a recipe to marking up a poem or a piece of historical correspondence. Encoding a different genre of document can lead students to recognize different kinds of data and to make observations about the formal dimensions and organization of patterns. They also often take more of an interest in referencing different kinds of information like names, dates, people, and places, as well as images, motifs, and rhyme. Quite frequently, student beginners will take the text content of a document and repeat it in an attribute value, for example by wrapping code around a name as given in a text and using that same name in an attribute value on the element. They continue until they receive some suggestions that they might want to use the attribute as a key for a standard identifier whenever this individual is mentioned by any of their various names. Students who have difficulty constructing dependent clauses may find the preparation of an informative, non-redundant hierarchy just as challenging as their composition courses. While their first efforts are observably messy, they can be discussed in terms of how to simplify if one were to prepare a large collection and wanted to work systematically with a particularly interesting and tractable kind of data.
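The benefit of using an attribute as a key can be sketched briefly in Python. The sample sentence below is invented for illustration, as are the identifier values #MWS and #PBS; the element name persName and the attribute ref follow TEI usage, though the sketch omits the TEI namespace for simplicity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample: the same person is named in two different ways, but the
# @ref key (a hypothetical identifier) stays constant across mentions.
SAMPLE = """
<letter>
  <p><persName ref="#MWS">Mrs. Shelley</persName> wrote to say that
  <persName ref="#PBS">Shelley</persName> was unwell, and
  <persName ref="#MWS">Mary</persName> would stay at home.</p>
</letter>
"""


def mentions_by_key(xml_text):
    """Group the surface forms of names under their shared @ref identifier."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for name in root.iter("persName"):
        grouped[name.get("ref")].append(name.text)
    return dict(grouped)


mentions = mentions_by_key(SAMPLE)
print(mentions)
```

Had the attribute simply repeated the text ("Mrs. Shelley", "Mary"), the two mentions of one person could not be gathered without further interpretation.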

Developing a schema, cultivating interchange, and writing professionally

Students often improve their markup dramatically when they learn to write Relax NG schema code that creates rules for encoding. This may be a first data-modeling experience for students in a general education context, when they are called upon to think in a meta or higher order reflective way about formalizing their code, and to make it possible for others to understand and apply it. Learning to write schema code also leads to writing comments to explain decisions and document the code. The following example pairs a short coded document with a student’s schema, and a conversation with another student who was reviewing the code and offering advice. First, here is the XML the student prepared with some good-natured snark from my own assignment instructions:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="01-22_SCHEMA_rngEx02.rnc" type="application/relax-ng-compact-syntax"?>
    <intro> Make sure you do the following: </intro>
    <step num="1"> 1) <act type="spec">Save</act> your <obj type="comp">Relax NG file</obj>
        <desc type="rel">with the <obj type="comp">.rnc extension</obj> at the <obj type="conc"
                >end</obj></desc> and <act type="unspec">work with it</act> in the <desc type="adj"
        <obj type="comp">file directory</obj>
        <desc type="rel">with your <obj type="comp">.xml file</obj>.</desc>
    <step num="2"> 2) <act type="spec">Associate</act> your <obj type="comp">.rnc schema</obj>
        <desc type="rel">with your <obj type="comp">.xml file.</obj></desc>
        <exp>(You are <desc type="rel">finished</desc> with this <obj type="conc">exercise</obj> if
            your <obj type="comp">XML</obj> is <desc type="rel"><act type="spec">associated</act>
                with your schema</desc> and <desc type="adj">both</desc>
            <obj type="comp">files</obj> have <act type="spec">come out</act>
            <desc type="adj">"green"</desc> in <obj type="comp">oXygen</obj>.)</exp>
    <step num="3"> 3) <act type="spec">Upload</act>
        <desc type="adj">BOTH</desc>
        <obj type="comp">files</obj> here. We <act type="unspec">need to see</act> your <desc
            type="rel"><obj type="comp">.xml file</obj> and your <obj type="comp">.rnc
<!-- bb_1/22/20: I wanted to keep it simple for once, so I literally used the assignment as a text. Sue me.--> 
Here is the Relax NG schema in compact syntax, where a new pattern of writing extensive commentary is emerging in the student (whose initials are bb). The peer-reviewing student is amp, who had learned to write schemas and design projects in a previous semester:
datatypes xsd = ""
start = root

#bb_1/22/20: root is the root element
root = element root{intro, step+}

#bb_1/22/20: step is one of our 'highest' content objects on the hierarchy
step = element step{num, (exp|act|obj|desc|text)+}
#amp: This would be better as mixed content! (We've already touched on this in class, but I'll leave a note here as well) So, it would look like this: step = element step{num, mixed{exp | act | obj | desc)+}}

#bb_1/22/20: intro is a misc element
intro = element intro{text}

#bb_1/22/20: other important places in the hierarchy
exp = element exp{(type|act|obj|desc|text)*} #exp = explanation
desc = element desc{(type|act|obj|text)*}    #desc = description

#amp: This is the same as before: these two would be better written with mixed content! So: exp = element exp{mixed{(type | act | obj | desc)*}}
#desc = element desc{mixed{(type | act | obj)*}}

act = element act{type, text}                #act = action
obj = element obj{type, text}                #obj = object

#bb_1/22/20: attributes
num = attribute num{xsd:int}
type = attribute type{text}     #types: 
                                  #comp = computer
                                  #conc = concept
                                  #rel = relate
                                  #adj = adjective
                                  #spec = specific
                                  #unspec = unspecific

#amp: My first comment is about comments! I'm glad to see you using comments 
not only to leave notes to the instructors, but also to give information about decisions 
that you're making while coding. 
This is so helpful when it comes to projects that you're going to share publicly, 
because it allows others to see these choices and better understand your code. 
So, great work with that! 

#amp: Another thing that I like with your schema is the organization! 
You have clearly designated sections for your elements and attributes that make your code really clear. 

#amp: For the future, I would try working more with the other repetition indicators, 
as well as with datatypes! In this specific assignment, you could code for the date of the 
assignment, and use xsd:date or xsd:YearMonth (or both!) to give even more metadata in 
your xml. Overall, this is a well-organized, simple schema that you'll be able to develop 
into a more complex schema if you were to continue with it (and with future projects!)
The beginner student, bb, is experimenting with tidy, highly legible commentary on his code, including mapping out attribute values he might use to replace the ambiguous text. But we also see students beginning to hear from each other about project work on which they will eventually collaborate. Students are getting used to seeing a code file as a site of productive conversation and ongoing revision.

When students learn only tagging and do not learn how to write schema validation or process their own code to visualize and analyze the data they have marked, they remain unaware of what the markup makes possible or of the problems imposed by imprecision and inconsistency. They may be tagging correctly, but not in a way that communicates meaningfully or reliably. As they first become aware of the human unreliabilities in applying markup, they may come away with a limited idea that the tree structures we create are necessarily subjective and arbitrary. They would be reinforced in this thinking by old arguments from those who find embedded markup a source of intrusive confusion. For example, Johanna Drucker has asserted that embedded markup confuses levels of discourse:

Putting content markers into the plane of discourse (a tag that identifies the semantic value of a text relies on reference, even though it is put directly into the character string) as if they are marking the plane of reference is a flawed practice. Markup, in its very basis, embodies a contradiction. It collapses two distinct orders of linguistic operation in a confused and messy way.[23]

Certainly messiness and confusion can be attributed to an ill-conceived data model as well as to a piece of disorganized and unrevised writing. But embedded encoding itself is only a mess to the extent that it defies comprehension, navigation, and processing by an informed reader of markup who is a member of a community of practice. Against Drucker’s dismissal of markup as mess we should counter with the frequent practice in the markup community of blind interchange, as Syd Bauman defined it in 2011:

you want my data; you go to my website or load my CD and download or copy both the data of interest and any associated files (e.g., documentation or specifications like a TEI ODD, a METS profile, or the Balisage tag library); based on your knowledge of my data that comes from either the documents themselves or from the associated files (or both), you either change my data to suit your system or change your system to suit my data as needed. Human intervention, but not direct communication, is required.[24]

Far from posing a mess, the actions of documenting descriptive markup make it sustainable and sharable when the encoder is removed in time and space and technological delivery system from those who encounter the code. Blind interchange is the benefit of the tangle and weave of literate programming, and it is reinforced by communities that encounter the code and interact with it. Understood in this way, the mess of markup is indistinguishable from the mess of writing; both may be ordered with care and explanation for an audience, first an audience of one’s instructor and peers, and then an audience that one does not necessarily meet in person.

Writing-intensive querying, processing, and transforming

Teaching students how to query their XML code and how to transform it for publication encourages new systems of thinking and gives writers access to their own means of production. The learning required takes weeks, not years, and is suitably incorporated in a university semester without needing specialized computing prerequisites. Not until we teach students how to customize, query, or transform their markup can they engage with it in a way that educational institutions might characterize as writing intensive. By way of reference, the Pennsylvania State University’s cross-curricular definition of a writing intensive course requires that writing be used to help students learn course content, as well as ways of writing in the discipline, and that it have formal expectations delivered in structured assignments. These expectations are familiar to us from university composition courses in the requirements for research papers and thesis documents with structured sections and appendices, but they can also be communicated in the context of encoding XML projects according to a carefully developed schema and a well-documented codebase. Most important and pertinent to XML project development is the expectation that writing-intensive courses engage in significant rounds of revision:

  • Writing seen as processes that develop through iterations

  • Writing in the course includes a combination of formal and informal assignments[25]

Markup and coding with the XML family of languages become an intensive writing experience when students return to their code and revise it, to better document decisions for the project team, to better document decisions for readers outside the project, to improve the precision of the data, and to simplify the categories to make the code more coherent in categorizing and processing information. When students learn to inspect the code and share its customization in project teams, the markup becomes subject to intensive review and systematic revision to make it sharable rather than subjective. The more this is done, and the more experience that students and scholars gain, the more prepared they are to share in wider conversations. For example, markup practitioners in the classicist community share applications of the EpiDoc guidelines, and a medievalist graduate student prepares to speak at the annual conference of the Text Encoding Initiative or at the annual Kalamazoo International Congress on Medieval Studies. As with peer-reviewed scholarship in any discipline, the XML code-base is optimally subject to heated debate, and decisions are made as befits communities of practice.

Because the XML family of languages is amenable to rapid learning, a student can become a stack developer easily in the course of a semester, as Clifford Anderson observed of the course he taught students in XQuery:

XQuery makes it possible for students to become productive without having to learn as many computer science or software engineering concepts. A simple four or five line FLWOR expression can easily demonstrate the power of XQuery and provide a basis for students' tinkering and exploration.[26]

As XQuery developers know, the simple for, let, where, order by, and return statements that make a FLWOR are good ways to introduce students to programming concepts quickly. They give students powers to construct all kinds of new data structures from an XML document, whether HTML, SVG, structured text formats like CSVs to be imported into spreadsheets, or JSON formats for structured maps and arrays. Students work at the intersections of different data formats while exploring what they can build out of XML trees. Bringing students to work at these intersections leads code-writers to make challenging decisions about streamlining the code-base, making it more legible, tractable, XPath-able. Learning XPath and writing XSLT or XQuery to process XML moves a coder from following rules obediently so that others will someday process the data, to becoming an active intellectual investigator who can wield markup as a skilled professional writer.
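For readers who do not write XQuery, the shape of a FLWOR can be approximated in a few lines of Python. This is an analogy under stated assumptions, not XQuery itself: the sample recipe, its grams attribute, and the function name are hypothetical, and the comments mark which line plays the role of each FLWOR clause.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: a simplified ingredient list with an assumed @grams attribute.
RECIPE = """<recipe>
  <ingredient type="dry" grams="160">white bread flour</ingredient>
  <ingredient type="dry" grams="38">stone-ground whole-wheat flour</ingredient>
  <ingredient type="wet" grams="120">lukewarm water</ingredient>
  <ingredient type="wet" grams="55">dry fermented cider</ingredient>
</recipe>"""


def dry_ingredients_csv(xml_text):
    """Build a CSV of dry ingredients, heaviest first, from the XML tree."""
    root = ET.fromstring(xml_text)
    rows = []
    for ingredient in root.iter("ingredient"):      # for: walk each node
        grams = int(ingredient.get("grams"))        # let: bind a variable
        if ingredient.get("type") == "dry":         # where: filter
            rows.append((ingredient.text, grams))
    rows.sort(key=lambda row: row[1], reverse=True)  # order by: sort descending
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ingredient", "grams"])
    writer.writerows(rows)                          # return: construct the output
    return out.getvalue()


result = dry_ingredients_csv(RECIPE)
print(result)
```

The same five moves could just as well construct HTML, SVG, or JSON; only the final step changes.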

The writing intensive work of interchange: two university projects on Emily Dickinson

University students can and do create projects built to last, that is, launched in a way that others can build on and continue based on the documentation they provide. As a case in point, let us consider two markup projects, decades apart, addressing the poetry of Emily Dickinson.

The first is a proof-of-concept proposal for a PhD thesis prepared by Michele Ierardi, then a graduate student at the University of Virginia. As of 2020, it is accessible on the web only from the Wayback Machine: Translating Emily: Digitally Re-Presenting Fascicle 16.[27] The project applied 1990s HTML and an early form of JavaScript to render Emily Dickinson’s handwritten variants on her own poems in a way that did not demote those variants to a footnote, but gave them equal space using the capacities of hypertext. On reading one of the poems on the website, the reader would encounter Dickinson’s own different versions of a line in slowly flashing text. The editor’s hope in designing her interface was to make readers more aware of Dickinson’s open-endedness, in not cancelling out multiple versions of a line so that all possibilities could coexist. The site was a proof of concept that did not materialize into a PhD thesis project, but it persisted and influenced my teaching of American Literature courses when I wanted to share Dickinson’s unusual writing process with my students and give them an experience of an interesting and accessible (if slightly hypnotizing) digital edition interface. The JavaScript on the site ceased to function around 2010, and soon thereafter I began seeking a way to continue accessing this cleanly and simply encoded project in a way that would still benefit my students.

In 2015, soon after I had begun teaching courses in coding and the XML stack at Pitt-Greensburg, I was fortunate to find a group of students interested in poetry and fascinated by the possibilities of restoring a digital archive. The students and I contacted Michele Ierardi and obtained her permission to reconstruct her site. This involved converting the code from HTML with unmatched tags to TEI P5, as well as adding new research. My students investigated additional versions of the poems and added more data to include TEI critical apparatus markup that would encode Dickinson’s variants as well as other printed versions of the same poems in a series of editions published after Dickinson’s death. Their new goal was to build on Ierardi’s work and create a readable interface for comparing the multiple versions of Dickinson’s poems, and to begin expanding that work to include other fascicles, or bundles of poems that Dickinson created beyond the original collection Ierardi presented. My students’ site is the second Emily Dickinson project, strongly bound to the first.[28] These students have since graduated from university, but they continue to work on this project, adding a new fascicle and tinkering with the interface, and I understand from the ongoing project director, Nicole Lottig, that she intends to continue coding and developing the site to display all of Dickinson’s fascicles as a long-term project. An excellent sampling of the project’s interface for reading a Dickinson poem and seeing its variant texts and images together is its display of Poem 1605, which shows how most of the early print editions cut out Dickinson’s entire last stanza and typically ignored her variants. The editors share their TEI code from the interface, which applies critical apparatus markup with parallel segmentation in the TEI. The students opted to represent all witnesses even when they were silent, as demonstrated in their coding decisions for the last stanza of poem 1605. 
They also faced a significant challenge to encode Dickinson’s uncanceled variant passages in her manuscript, and found a way to do this by applying the @type attribute with values of "var0" and "var1":

      <l n="17"><app>
         <rdg wit="#df16 #fh">And then a Plank in Reason, broke,</rdg>
         <rdg wit="#ce #poems3"></rdg>
      </app></l>
      <l n="18"><app>
         <rdg wit="#df16 #fh">And I dropped down, and down—</rdg>
         <rdg wit="#ce #poems3"></rdg>
      </app></l>
      <l n="19"><app>
         <rdg wit="#df16 #fh" type="var0">And hit a World, at every plunge,</rdg>
         <rdg wit="#df16" type="var1">And hit a World, at every Crash—</rdg>
         <rdg wit="#ce #poems3"></rdg>
      </app></l>
      <l n="20"><app>
         <rdg wit="#df16 #fh" type="var0">And Finished knowing—then—</rdg>
         <rdg wit="#df16" type="var1">And Got through—knowing—then—</rdg>
         <rdg wit="#ce #poems3"></rdg>
      </app></l>
In their transformation to HTML (linked above), the students applied JavaScript and CSS to the variant data coded in these attributes to produce a dynamic and distinctive reading interface permitting the reader a ready view of comparison data.

In preparing this project, the students had the benefit of an earlier and simpler markup model, as well as a sense of purpose in giving a remarkably interesting project a new lease on life. As undergraduates from a range of majors in English, Creative Writing, and Information Sciences, they needed to study the TEI P5 guidelines and essentially took a crash course in manuscript encoding and textual scholarship in the course of a semester. They learned to transform the code and designed the interface in the fall of 2015, and then redesigned and improved the interface while investigating a new research question in the following spring 2016 term. Over the course of one year, in the context of university coursework, the students not only designed a new reading interface but also explored a serious research question of how these editions compare to one another.

Writing XSLT and XQuery on the project, these students produced SVG visualizations of Comparative Dash Reduction (measuring which editions most frequently normalized Dickinson’s dash punctuation into commas, semicolons, or periods), and a network analysis to investigate which editions share the most variants in common. The network analysis was created with XQuery to pull and calculate data from the TEI critical apparatus markup they had modelled for the poems. It explored how frequently published versions aligned with Dickinson’s writing in the manuscript versions, based on generating counts with XPath of how frequently a particular version (coded as a reading witness) aligned with other versions. The students wrote XQuery code to extract this data into a simple TSV (tab-separated values) file and plotted it in Cytoscape network analysis software. Their programming work with XQuery depended on their care in designing the rules for the project schema and frequently correcting the markup.
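The counting behind such a network can be sketched in Python as a rough illustration. This is not the students' XQuery: it assumes a simplified, namespace-free excerpt of the apparatus for lines 19 and 20 (wrapped in an assumed <lg> element), counts every pair of witnesses that share a reading, and writes the pairs as TSV rows with hypothetical column names. Note that witnesses grouped on an empty reading are counted as agreeing with each other in their silence.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified excerpt of the apparatus markup, without the TEI namespace;
# the <lg> wrapper is an assumption made so the sample has a single root.
TEI = """<lg>
  <l n="19"><app>
    <rdg wit="#df16 #fh" type="var0">And hit a World, at every plunge,</rdg>
    <rdg wit="#df16" type="var1">And hit a World, at every Crash—</rdg>
    <rdg wit="#ce #poems3"></rdg>
  </app></l>
  <l n="20"><app>
    <rdg wit="#df16 #fh" type="var0">And Finished knowing—then—</rdg>
    <rdg wit="#df16" type="var1">And Got through—knowing—then—</rdg>
    <rdg wit="#ce #poems3"></rdg>
  </app></l>
</lg>"""


def shared_reading_counts(tei_text):
    """Count, for each pair of witnesses, how many readings they share."""
    root = ET.fromstring(tei_text)
    counts = Counter()
    for rdg in root.iter("rdg"):
        witnesses = sorted(rdg.get("wit", "").split())
        for pair in itertools.combinations(witnesses, 2):
            counts[pair] += 1
    return counts


def to_tsv(counts):
    """Serialize pair counts as an edge list; column names are hypothetical."""
    lines = ["source\ttarget\tweight"]
    for (source, target), weight in sorted(counts.items()):
        lines.append(f"{source}\t{target}\t{weight}")
    return "\n".join(lines) + "\n"


edges = shared_reading_counts(TEI)
print(to_tsv(edges))
```

An edge list of this kind, with one weighted row per pair of witnesses, is a common input format for network analysis software.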

The student website documents its methodology extensively and I now use it as a model for my current students to prepare documentation that features code and coding decisions. Along the way of producing it and sustaining their codebase, the project team cultivated multiple GitHub repositories with issue tracking as they turned to new sources of data and worked to combine Fascicles 6 and 16 into a new site. Their writing intensive experience involved countless messages to each other to fix broken code, make a visualization work, update the website, refine the CSS and JavaScript. The professional experience with web development took them far beyond what would be possible in a course in tagging and markup alone, or a course in web development within a content management system. The writing intensive part involved recursively producing and testing and refining their own interface. And the project keeps on giving to future students.

In Fall 2019, a colleague of mine from the History department, William Campbell, did me the great honor of taking my coding course, following a tradition at Pitt begun when I took David Birnbaum’s XML-stack coding course on Obdurodon in Spring 2013.[29] Campbell launched The Brecon Project, together with students on his team, to study the manuscript tradition of the foundation charter of a Reformation-era collegiate church and school in Wales. The students did not need to know medieval Latin to work on the document data modeling or even to apply critical apparatus markup with a tightly controlled schema combining Relax NG and Schematron, generated from a TEI ODD customization that they devised and revised over the course of their project meetings. For the students and my History colleague involved in the project, this was their introduction to the TEI critical apparatus as a document data model. They turned to previous projects from our course to follow some examples of critical apparatus markup in order to understand how to prepare their own. Without needing to consult the Dickinson project team, the Brecon team was able to adapt and build on the example of their markup to take their own study in new directions, recognizing how their project diverged. For the Brecon team, the text of the charter was a prose text rather than a bundle of poems, but nevertheless required a modeling of textual variation over time. Alyssa Argento, a returning student who was mentoring project teams and continuing to learn XSLT and SVG on her own, took on the challenge of trying to show how the manuscript and print witnesses compared quantitatively: which versions of the charter shared the most material in common, inspired by the example of the network analysis on the Dickinson project. 
Finding her computer unable to install the latest version of the network analysis software that the Dickinson team had relied on for their visualization, Argento studied the project data and worked out how to arrange eight witnesses as nodes in a circle. She then produced a network graph, with weighted edges and sized nodes based on calculations she made with XPath, to provide a detailed visual summary of how much the eight different versions shared across 25 sections of the charter. Having produced a static network visualization, she then studied how to make it interactive by applying JavaScript to address attributes on the SVG elements and to associate those SVG elements with corresponding columns and rows in HTML tables containing data from each section of the charter. Her interactive visualization, accessible at represents work that she envisioned and worked out by herself with occasional input from me and the project team, and while it needs work in the documentation area, it represents a line of succession from earlier projects in my course.[30] I share it here to demonstrate what is possible for undergraduates to build on their own with the benefit of learning the XML stack.
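
The kind of calculation behind such a graph can be sketched briefly. What follows is a hypothetical illustration in Python, not Argento's XSLT: the sample apparatus, the four witness sigla, the function names, and the sizing formulas are all invented for the example, and the project's own code relies on the XPath count() function rather than on anything shown here. The sketch counts how many readings each pair of witnesses shares, places the witnesses evenly around a circle, and writes SVG lines and circles whose stroke widths and radii reflect those counts.

```python
import math
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical TEI-style apparatus: invented sample data, NOT the Brecon charter.
SAMPLE = """
<text>
  <app n="1"><rdg wit="#A #B #C">text</rdg><rdg wit="#D">texte</rdg></app>
  <app n="2"><rdg wit="#A #B">carta</rdg><rdg wit="#C #D">charta</rdg></app>
  <app n="3"><rdg wit="#A #D">rex</rdg><rdg wit="#B">regis</rdg><rdg wit="#C">regi</rdg></app>
</text>
"""


def shared_counts(root):
    """Count, for each pair of witnesses, the readings they attest together."""
    counts = {}
    for rdg in root.iter("rdg"):
        wits = sorted(w.lstrip("#") for w in rdg.get("wit", "").split())
        for pair in combinations(wits, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def circle_layout(names, cx=200, cy=200, r=150):
    """Place each witness at an equal angular step around a circle."""
    step = 2 * math.pi / len(names)
    return {
        name: (cx + r * math.cos(i * step), cy + r * math.sin(i * step))
        for i, name in enumerate(names)
    }


def build_svg(counts, names):
    """Draw weighted edges first, then sized nodes on top of them."""
    pos = circle_layout(names)
    svg = ET.Element(
        "svg",
        {"xmlns": "http://www.w3.org/2000/svg", "width": "400", "height": "400"},
    )
    for (a, b), n in sorted(counts.items()):
        (x1, y1), (x2, y2) = pos[a], pos[b]
        ET.SubElement(svg, "line", {
            "x1": f"{x1:.1f}", "y1": f"{y1:.1f}",
            "x2": f"{x2:.1f}", "y2": f"{y2:.1f}",
            "stroke": "gray",
            "stroke-width": str(n),  # edge weight = shared readings
        })
    for name in names:
        # Node size grows with the total readings shared with any other witness.
        total = sum(n for pair, n in counts.items() if name in pair)
        x, y = pos[name]
        ET.SubElement(svg, "circle", {
            "id": name, "cx": f"{x:.1f}", "cy": f"{y:.1f}",
            "r": str(5 + 2 * total),
        })
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    counts = shared_counts(ET.fromstring(SAMPLE))
    print(build_svg(counts, ["A", "B", "C", "D"]))
```

Giving each circle an id attribute matches the approach described above, in which JavaScript addresses attributes on the SVG elements to link them to rows and columns in HTML tables. The same layout would extend to eight witnesses by passing eight names.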

The many student projects developed in the two sibling University of Pittsburgh coding courses taught by David J. Birnbaum (see and by me with our respective cohorts of student peer-mentors over the past decade are now my richest data set for comprehending the possibilities of interchange and up-conversion and development on a code base.[31] Our students show us how this work is not only writing intensive in the moment of application, but intensifies over time as we learn new ways of doing things and build on the model of previous projects.

Writing about code to pay a project forward

When students prepare a project in markup, they are not simply writing papers to be filed in a course-specific context. They are preparing a research site, and their work can often be continued by themselves or others. Students can be building beyond the constraints of a single semester, and even if they are not tempted to return to the project, they can leave scaffolding behind for others to continue the work or alter its direction, or to retrieve the source document files and start afresh. Awareness of the potential energy of the work they are doing can give shape to an encounter with markup languages in the course of a semester. The energy input over a course of weeks puts emphasis on preparing material that others can read and reuse. Assignments for a writing intensive experience can be constructed with attention to:

  1. reading the code and documentation of other projects, critiquing it, and building on it

  2. preparing documentation meant to be read by peers and professor(s) working with you, and meant to be read by others who access your code from a repository. Such documentation may include:

    • Kanban board workflow management

    • files in a GitHub repository

    • developing task lists and issue tracking

    • responding to questions and assisting your teammates on a project discussion forum like Slack

These are informal writing-intensive activities that involve taking responsibility for method and processes and for managing the intellectual content of a project. Studying how copyright applies to code and markup and choosing a license for sharing the work should be part of this experience. Students should learn how to credit their own and others’ work, and how to transport their data files when they need to move to a new publishing environment.

Giving students access to the full set of tools in the XML arsenal and establishing both immediate (in-semester) short-range and long-range possibilities for their work will introduce students from any background to the potential of writing with markup languages to form and connect with communities of practice. This cannot fail to be a professionalizing experience. Even in developing projects that are not successful, there will be opportunities to recognize in failures what to document, how to redo the work differently, or how another group might start over if a team must sunset the work. Whether or not students go on to use markup again after the course is over, they will have engaged with a powerful form of writing that marks up, investigates, curates, propagates, and conserves textual data. This is the very definition of a writing-intensive experience that is both professionalizing and cross-disciplinary in its reach.

Coda: Teaching markup languages during a pandemic

Teaching a writing-intensive course in markup modalities requires little distressing adaptation for a remote learning environment. In March 2020, when the author’s university (then the University of Pittsburgh at Greensburg) closed the campus and moved all courses online, there was little difficulty in transferring learning materials to a new format, because we had already been relying on tutorials and assignments we had written and posted on the web and, more importantly, because we had already developed a sense of community through asynchronous conversation cultivated not in the learning management system but rather in GitHub and on Slack. Before the pandemic quarantine, during January and February, project teams had already developed asynchronous connections, reinforced by their own emojis (a rubber duck meme associated with rubber duck debugging, for example). The coding class might properly be recognized as a flipped class, in which most of the learning was already taking place in an applied context outside of class, while students and faculty met together in class to review content, to learn how to interact with an unfamiliar interface, to review issues students were having, or to help address something that was not working. It was easy to continue managing a course in which students had been trained to work with project management tools and to write and share their documentation in GitHub repositories and over Slack channels. We did miss the in-person interaction on which this class relied, with instructors able to look over the shoulders of students to help resolve a problem on their computers. Synchronous virtual meetings could not replace this, and sharing screen captures in asynchronous chat became more necessary. Students were certainly challenged to verbalize what was not working properly when an instructor could not physically come around behind the student’s computer to see what was going wrong. 
These communication problems were sometimes resolved by connecting with an instructor over Zoom to share a computer screen. Adding comments to code stored in the class’s shared eXist-db XML database and in their project GitHub repositories continued as it had from the beginning of the semester. Because this class had cultivated tools for communicating and working together online, its students were perhaps less challenged and more bonded virtually than students in courses managed only within the learning management system. In the pandemic crisis of 2020, the writing intensive nature of shared project development seemed especially beneficial in supporting our hive of student coders. Build teams of developers in a class, and a pandemic may slow but not stop them.

[1] On the day of this writing, 13 July 2020, the promotes its proprietary LMS integration over its always free web service, even in waiving fees for educational institutions for the fall 2020 semester. Nevertheless, a paid-for integration with an LMS is unnecessary to anyone with access to a web browser and a capacity to follow clear and simple instructions to create a public or private annotation group.

[2] Kelly A. Shea, Mary McAleer Balkun, Susan A. Nolan, John T. Saccoman, and Joyce Wright, One More Time: Transforming the Curriculum Across the Disciplines Through Technology-Based Faculty Development and Writing-Intensive Course Redesign Across the Disciplines Volume 3 (2006) Accessed 25 April 2020.

[3] Henry David Thoreau, Where I lived and what I lived for Walden: A Fluid-Text Edition. Digital Thoreau. Version A (1847), para 17, Accessed 2020-04-12.

[5] The author finds it is becoming easier to teach XPath, XSLT, and XQuery since the introduction of XPath 3.0 and 3.1 with the simplicity of reading the application of multiple functions using the simple map and arrow concatenator. These have made it easier to follow a simple step-by-step thinking process in chaining functions together. This improvement in XPath readability is a refinement of our available writing instruments making them far more accessible to new learners.

[6] Laura Giovanelli and Molly Keener, How to talk about copyright so kids will listen, and how to listen about copyright so kids will talk in Grace Veach, Teaching Information Literacy and Writing Studies: Volume 2, Upper-Level and Graduate Courses (Purdue University Press, 2019-01-15), 225. ProQuest ebook.

[7] Lu and Pollock further articulate their approach to TEI as a productive building block for Black digital humanities work in a Digital Dialogue at the University of Maryland, Design, Development, and Documentation: Hacking TEI for Black Digital Humanities, 5 November 2019.

[8] Vee, Annette, Coding Literacy: How Computer Programming Is Changing Writing, The MIT Press, 2017. Accessed 14 Apr. 2020.

[9] Joe Morgan, I’m a developer. I won’t teach my kids to code and neither should you. Slate. 6 December 2018. Accessed 13 July 2020.

[10] David J. Birnbaum and Alison Langmead, Task-Driven Programming Pedagogy in the Digital Humanities in New Directions for Computing Education: Embedding Computing Across Disciplines, ed. Samuel B. Fee, Amanda M. Holland-Minkley, and Thomas E. Lombardi (Springer International Publishing, 2017) 76.

[11] Felienne Hermans and Marlies Aldewereld, Programming is Writing is Programming Programming ’17: Companion to the first International Conference on the Art, Science and Engineering of Programming No.33 (April 2017) 1-8; 7.

[12] For context, see Sharon Ruston, The Application of Natural History to Poetry Literature and Science Hub University of Liverpool. Accessed 14 April 2020.

[13] The Map of Early Modern London provides especially helpful, transparent documentation of their markup and editorial practice throughout the site with a strong investment in teaching resources to assist new coders.

[14] Donald Knuth, Literate Programming, The Computer Journal 27:2 (1984) 97-111, 97. Accessed 25 April 2020.

[15] Knuth 2.

[16] Norman Walsh, Literate Programming in XML presented at XML 2002, 15 October 2002. Accessed 25 April 2020.

[17] Eric Van der Vlist, Literate Programming: Generating Relax NG Schemas in Relax NG, O’Reilly Books, 2003.

[18] Sebastian Rahtz and Lou Burnard, Reviewing the TEI ODD System DocEng ’13: Proceedings of the 2013 ACM symposium on document engineering, September 2013, 193-196. Accessed 25 April 2020.

[19] Desmond Schmidt, Towards an Interoperable Digital Scholarly Edition, Journal of the Text Encoding Initiative Issue , November 2014. Accessed 28 April 2020.

[20] Kate Singer, Digital Close Reading: TEI for Teaching Poetic Vocabularies, The Journal of Interactive Technology & Pedagogy issue 3: 15 May 2013. Accessed 28 April 2020.

[21] The quote is taken from a student website project submitted to the author’s Nineteenth-Century British Literature survey course in Fall 2019.

[22] XML Exercise 1 Coding and Data Visualization course taught at Pitt-Greensburg, 2019-2020. The author’s series of assignments developing from XML to Relax-NG, regular expressions and document up-conversion, XPath, XSLT, XQuery, as well as HTML, SVG, CSS, and JavaScript are generally accessible at Accessed 14 July 2020.

[23] Johanna Drucker, SpecLab: Digital Aesthetics and Projects in Speculative Computing (Chicago: University of Chicago Press, 2009) n. 26. pp. 205-206. Drucker cites Dino Buzzetti as originating the argument of a fundamentally confused data model in XML in his paper, Text Representation and Textual Models, ACH-ALLC 1999 Conference Proceedings. Accessed 28 April 2020.

[24] Bauman, Syd, “Interchange vs. Interoperability,” Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011).

[25] Features of a Writing Intensive Course, Penn State Learning, 2020. Accessed 28 April 2020.

[26] Anderson, Clifford B. On Teaching XQuery to Digital Humanists, Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014).

[27] Michele Ierardi, Translating Emily: Digitally Re-Presenting Fascicle 16, ~1999. The Wayback Machine capture of 14 October 2019: Accessed 28 April 2020.

[28] Nicole Lottig, Brooke Stewart, Alex Mielnicki, Brooke Lawrence, and Rebecca Parker, Emily Dickinson, Newtfire, 2015. Accessed 28 April 2020.

[29] David J. Birnbaum, Digital Humanities, Obdurodon. Accessed 28 April 2020.

[30] Argento’s XSLT code is a celebration of the XPath count() function, accessible on the Brecon GitHub repository:

[31] David Birnbaum’s course in Spring 2013 oriented me to the XML family of languages and also inspired me to attempt organizing my own classes and teaching materials on markup technologies.

Author's keywords for this paper:
markup; writing intensive; literate programming; interchange; code literacy; general education requirement; pedagogy; writing across the curriculum; coding across the curriculum; Text Encoding Initiative; TEI XML; Relax NG; XML family of languages; XPath; XSLT; HTML