
Markup Vocabulary Ecosystems
a Balisage pre-conference symposium

Symposium chair: Jeff Beck, US National Library of Medicine

Monday, July 30, 2018

Monday 8:00 am - 9:00 am (location: Conference level)

Symposium Registration & Breakfast

Pick up your conference badge in the Gleason Boardroom and join us for breakfast in Baker before taking your seat in Sinequa, the conference room.

Monday 9:00 am - 9:15 am

Introductions, Greetings, and Announcements

Monday 9:15 am - 9:45 am

Transcending structure: Applying shared markup vocabularies with your friends and enemies

Jeff Beck, US National Library of Medicine

Markup makes it easier to share. We share documents with our peers, our partners, and even our competitors. Communities of interest form, they define document structures, test them in practice, and affirm them by adoption. Joining a community has obvious advantages: reduced development costs, ease of interchange, tried and tested tools, and an available pool of authors, editors, and developers already familiar with the vocabulary.

Over time, the pace of vocabulary evolution slows naturally. The major structures are developed, applied, tested, and accepted. New structures are added more slowly, and more reluctantly. The community has transitioned into maintenance mode, where large-scale refactorings and backwards-incompatible changes are known to have burdensome costs and “best practices” are known to make sharing easier.

What can the “Markup Community in General” do to support these stricter best practices communities?

Monday 9:45 am - 10:15 am

Organizational and funding options for markup vocabulary creation and maintenance

Todd Carpenter, NISO

There are many ways to create a markup vocabulary and many forums in which it can be done. Creating and maintaining markup vocabularies requires significant ongoing volunteer time and effort, significant funding, or both. In light of this, it often makes sense for a multi-institution group to undertake the creation and management process, particularly when interchange is a goal. The community has examples of this consensus model, such as the TEI (which was created by a grant-supported project and is maintained by a consortium created for the purpose) and the STS (which was originally a derivative of JATS, further developed by ISO, and then donated to NISO for the establishment of consensus and for maintenance). Selection of an organizational home and source of funding can have marked effects on vocabularies. The organizational structure affects representation, who has a voice in the process, intellectual property concerns (e.g., patents, copyrights, other standards), and decision making policies. Costs involved in creating and maintaining markup vocabularies begin at conception and continue through development into maintenance and promotion. These costs include editing, hosting, publishing and distribution, and management of the standards process. Real-world examples of the organization and funding of successful markup vocabularies will provide patterns others may find useful.

Monday 10:15 am - 10:45 am

Coffee Break

Monday 10:45 am - 11:30 am

“Be in the room where it happens”: Digital preservation at Portico and the JATS ecosystem

Sheila Morrissey, John Meyer, & Sushil Bhattarai, all of ITHAKA

Institutions such as Portico, which are engaged in ensuring that the digital record of our time is accessible, usable, discoverable, and verifiable for the very long term, continually face the challenge of processing and managing content at very large scales, often with minimal, and sometimes diminishing, resources to accomplish the task.

A key resource in meeting the challenge of preserving born-digital and digitized scholarly literature has been the JATS (formerly NLM) standard, and the community of practice centered on those standards. We will be talking about our shared experience in developing those standards: what motivated our participation, what benefits we have seen, and what challenges we still face.

Monday 11:30 am - 12:15 pm

The Universal Business Language ecosystem and the OASIS TC process

G. Ken Holman

UBL, the Universal Business Language, is a complete system for structuring a family of 81 business documents around a common library of business objects. With both normative XML schemas and non-normative JSON schemas, UBL provides a complete ecosystem for electronic commerce. UBL was developed by an OASIS technical committee. Using the OASIS TC Process allowed UBL to be submitted directly to ISO. UBL is not simply a collection of schemas, however: it is an expression of a whole system of collaborative development and support for the life cycle of specifications and tools for business processes and documents, conducted under the process governance by a group of volunteers from around the world.

Monday 12:15 pm - 1:30 pm


Monday 1:30 pm - 2:00 pm


Norman Walsh

DocBook is a general purpose XML schema particularly well suited to books and papers about computer hardware and software (though it is by no means limited to these applications). DocBook has been under active maintenance for more than 20 years; it began life as an SGML document type definition. Because it is a large and robust schema, and because its main structures correspond to the general notion of what constitutes a “book,” DocBook has been adopted by a large and growing community of authors writing books of all kinds. DocBook is supported “out of the box” by a number of commercial tools, and there is rapidly expanding support for it in a number of free software environments. These features have combined to make DocBook a generally easy to understand, widely useful, and very popular schema. Dozens of organizations are using DocBook for millions of pages of documentation, in various print and online formats, worldwide.

Monday 2:00 pm - 2:30 pm


Debbie Lapeyre, Mulberry Technologies

The Journal Article Tag Suite is an application of NISO Z39.96-2015, which defines a set of XML elements and attributes for tagging journal articles. BITS, the Book Interchange Tag Suite, and NISO STS, the NISO Standards Tag Suite, are applications of NISO Z39.96-2015 for books and standards. All of the models share a common foundation, customized to meet the needs of specific document types.

Monday 2:30 pm - 3:00 pm


Kristen James Eberlein, Eberlein Consulting

DITA, the OASIS Darwin Information Typing Architecture, is an XML-based specification for modular and extensible topic-based information. DITA is specializable, which allows for the introduction of specific semantics for specific purposes without increasing the size of other DTDs, and which allows the inheritance of shared design and behavior and interchangeability with unspecialized content.

Monday 3:00 pm - 3:30 pm

Coffee Break

Monday 3:30 pm - 4:00 pm


Syd Bauman, Northeastern University

TEI, the Text Encoding Initiative, was founded in 1987 to develop guidelines for encoding machine-readable texts of interest to the humanities and social sciences. It is a text-centric community of practice in the academic field of digital humanities and has operated continuously since the 1980s. The community currently runs a mailing list, meetings, and a conference series, and maintains an eponymous technical standard, a journal, a wiki, a GitHub repository, and a toolchain. The TEI Guidelines, which collectively define an XML format, are the defining output of the community of practice. The format differs from other well-known open formats for text (such as HTML and OpenDocument) in that it is primarily semantic rather than presentational.

Monday 4:00 pm - 4:30 pm

Structural metadata & standardization failures: Just a little bit of history repeating

Jerome McDonough, University of Illinois

The design of the Extensible Markup Language has placed a premium on modularity and promoting the re-use and intermixture of pre-existing tag sets in the service of new goals. While this design tends to promote standardization, it clearly does not guarantee it, as the multiplicity of competing XML languages for rights expression or word processing demonstrates. This paper examines the history and evolution of structural metadata standards within the digital library community to help identify factors leading to production of multiple markup languages competing for similar or identical ecological niches.

Monday 4:30 pm - 5:30 pm

Open Discussion

symposium participants

After hearing about a wide variety of markup vocabularies, what have we learned? What do they seem to have in common? What could one community learn from another? How could we all improve use of shared markup vocabularies? This is the time for observations, suggestions, and complaints. What do you think?