
Balisage 2018 Program

Monday, July 30, 2018
symposium: Markup Vocabulary Ecosystems

Tuesday, July 31, 2018

Tuesday 8:00 am - 9:00 am (location: Baker)

Conference Registration & Breakfast

Pick up your conference badge in Baker (across the hall from Sinequa, the conference room) and join us for breakfast.

Tuesday 9:00 am - 9:15 am (location: Sinequa)

Welcome and Introductions

Tuesday 9:15 am - 9:45 am

YAMC? Why are we here? Why are we here again?

B. Tommie Usdin, Mulberry Technologies

There is nothing new about markup, or even generic markup. (I have been working with generic markup for 40 years!) So what is there to talk about after all this time? What are we accomplishing by gathering at Balisage: The Markup Conference? Why do some of us find events like this one valuable? What can you do to make it valuable to you and to the others here? Not only is markup old hat, XML is 20 years old, and some people in the outside world keep trying to tell us that its time has passed.

Groups are still gathering to create shared markup vocabularies in order to enable high-quality information sharing. Scholars are using bespoke markup vocabularies to enable them to focus on the works they are reading, interpreting, and writing. Trendy end-user displays are being populated by solid, maintainable XML content. An ever-improving tool set is available to users of marked-up documents. We learn from each other’s projects, tools, techniques, and experiences, and enjoy the process!

Tuesday 9:45 am - 10:30 am

In praise of XML

Steven Pemberton

It’s about sixty years since the start of public computing; fifty years since the term “software crisis” was coined; Europe is celebrating thirty years of the internet this year; and we’re celebrating twenty years of XML this year too. It is a milestone year for XML, and an important juncture as well. The last W3C XML WG has finished its work. What next? Where should we be heading, what do we want to achieve next, and how should we do it?

Tuesday 10:30 am - 11:00 am

Break

Tuesday 11:00 am - 11:45 am

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Tuesday 11:45 am - 12:30 pm

Documentation of XSLT stylesheets with code intelligence

Vasu Chakkera, Sapient Corporation

We benefit more from documenting why certain functionality was implemented, or coded in a particular way in an XSLT stylesheet, than from the typical “what the code does” comment. K7:XSLTDocuMentor is a personal project (non-commercial) to create XSLT stylesheet documentation from both inline stylesheet comments and documentation living outside the stylesheet. The external documentation lives in XML files, written in a variant of DocBook, that are generated by script and populated by XSLT analysts. These files are then used to generate configurable HTML documentation that provides the text as well as 1) hyperlinks to named templates, global variables and functions, imported/included templates and 2) reports of code violations such as potentially overridden functions, single-expression <xsl:choose>s, unused variables, and the like. Code violation criteria are defined in user-configurable rule sets.

Tuesday 12:30 pm - 2:00 pm (location: Social Circle - lobby level)

Lunch

Please check computer bags, backpacks, briefcases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.

Tuesday 2:00 pm - 2:45 pm

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Tuesday 2:45 pm - 3:30 pm

Implementing and using concurrent document structures

C. M. Sperberg-McQueen, Black Mesa Technologies

Markup necessarily expresses a view of the text that it marks up; it codifies the boundaries of structures and their interrelationships in a precise way. It is incontrovertibly the case that multiple structures exist in many documents; lines of verse and sentences within them, for example. In cases where these different structures are applied to the same text, when the collected sequences of characters in the leaf nodes are the same, it is possible to document the structures concurrently. The syntax, and the possibilities of data models that support both dominant and recessive hierarchies, open up interesting avenues for exploration.

Tuesday 3:30 pm - 4:00 pm

Break

Tuesday 4:00 pm - 4:45 pm

Using Excel spreadsheets to communicate XML analysis to subject matter experts

Betty Harvey, Electronic Commerce Connection

What is the best approach for analyzing large XML datasets? Reading thousands (or possibly millions) of pages of raw XML to fully understand the markup constructs is not feasible. CSS stylesheets are useful for displaying XML, but that’s not practical at large scales either. Creating Excel spreadsheets to hold analysis information is a very useful tool for understanding the full range of XML data constructs. This approach is also understandable to the stakeholders who control the datasets. This paper will describe an approach for creating document analysis Excel spreadsheets using XSLT and XML datafiles bundled using XLink master documents.

Tuesday 4:45 pm - 5:30 pm

In defense of style guides

Ari Nordström

The markup world has for more than four decades obsessed over document structures and schema languages for representing and validating those structures. From time to time, markup enthusiasts have managed to pry themselves away from schema languages long enough to create languages for manipulating structures and rendering those structures on presentation surfaces. But they’re missing the point! To be sure, all this language development has enabled mechanical processing of data, but there’s no assurance that it will make the data comprehensible. That’s where real style guides and the editors who apply and enforce them come in. It’s all about supporting semantics, the real information of which all that markup should be the servant. Come pay your respects to Strunk and White!

Tuesday 5:30 pm - 6:30 pm (location: Gathering Place - lobby level)

Reception

Please join us for cheese, wine, and conversation!

Tuesday 8:00 pm - 10:00 pm (location: Baker)

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room. We'll have desserts, coffee, a comfortable place to talk, and possibly a toy or two worth a look.

Wednesday, August 1, 2018

Wednesday 8:00 am - 9:00 am (location: Baker)

Conference Registration & Breakfast

Pick up your conference badge in Baker (across the hall from Sinequa, the conference room) and join us for breakfast.

Wednesday 9:00 am - 9:45 am

TAGML: A markup language of many dimensions

Ronald Haentjens Dekker, Elli Bleeker, Bram Buitendijk, & Astrid Kulsdom, KNAW Humanities Cluster
David J. Birnbaum, University of Pittsburgh

The virtues and limitations of the XML tree paradigm have been discussed and criticized ad infinitum, but a more general question is how any model and markup language (need to) align with the functional requirements of an intuitive and effective workflow. How should we make decisions about document modeling? The TAG document structure, the TAGML markup language, and the Alexandria TAG reference implementation point toward a combination of model, syntax, repository, and workflow that begins to offer users an integrated framework for expressing their interpretation of the structural properties of text and document.

Wednesday 9:45 am - 10:30 am

Metaphors we code by: Taking things a little too seriously

Mary Holstege, MarkLogic Corporation

Computer information and software are abstractions. We comprehend them through the use of metaphors. Different metaphors lead us to understand our information and our processing of it in different ways. They lead us to focus on certain aspects of the experience over other aspects. We use metaphors in talking about markup, but rarely think about them. Being mindful of what our metaphors are telling us implicitly allows us to see what we are missing. By taking the metaphor a little too seriously, we can look to the non-metaphorical domain as a source of inspiration for good practices.

Wednesday 10:30 am - 11:00 am

Break

Wednesday 11:00 am - 11:45 am

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Wednesday 11:45 am - 12:30 pm

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Wednesday 12:30 pm - 2:00 pm (location: Social Circle - lobby level)

Lunch

Please check computer bags, backpacks, briefcases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.

Wednesday 2:00 pm - 2:45 pm

The journey of “The History of the Accademia di San Luca, c. 1590–1635” into and out of XML

Peter Lukehart, National Gallery of Art

Wednesday 2:45 pm - 3:30 pm

Panel discussion: Why successful XML/SGML projects are reimplemented or decommissioned

James Mason
Bob Yencha
panelist to be announced

We’ve seen it happen again and again. The responsible parties survey the available technologies, choose XML/SGML, and complete the project successfully. The solution meets the (original) requirements and solves the (initial) problems. And then at some point, for whatever reasons, the project is reimplemented with other technologies or decommissioned entirely. Perhaps it’s XRX (XML, REST, XQuery) applications rewritten in Ruby. Perhaps it’s an XML application using XQuery and XSLT reworked in JavaScript and HTML. There are plenty of examples of projects first built in XML/SGML and then rebuilt using non-XML technology or retired. Is this a problem? Is it an opportunity? Is it BOTH a problem and an opportunity? How can we ensure that our projects can survive not just changes in technology but changes in organizational techno-culture?

Wednesday 3:30 pm - 4:00 pm

Break

Wednesday 4:00 pm - 4:45 pm

Stand-off bridges in the Frankenstein Variorum Project

Elisa E. Beshero-Bondar, University of Pittsburgh at Greensburg
Raffaele Viglianti, Maryland Institute for Technology in the Humanities at the University of Maryland

The Frankenstein Variorum Project works with multiple editions of a single novel, originating in several divergent markup systems. To reconcile these editions, we have had to flatten the original hierarchical structures and identify low-level units of lateral intersection, points shared in common across editions, in order to construct “bridge” or intermediary formats that can be compared automatically. We transform the output of the comparison into a TEI format we call stand-off parallel segmentation, in which stand-off pointing mechanisms operate like a switchboard: they connect the individual editions, which for the most part can remain undisturbed by the comparison process. The TEI “stand-off bridge” can help overcome the silo effect of specially encoded editions. Far from being an ephemeral support structure, the stand-off bridge provides a “backbone” for the variorum project because it improves the interoperability and interchangeability of all the markup ecosystems involved. The stand-off bridge allows us to reconstitute the hierarchies in a way that expresses intersections essentially as a graph structure of nodes with edge pointers to comparable nodes.

Wednesday 4:45 pm - 5:30 pm

Markup ethics: Trolley problems for text encoders

Allen H. Renear, University of Illinois – Urbana-Champaign

We are engineering; we are solving problems, improving reliability, effectiveness, efficiency. But more generally encoding decisions determine whether, how, how much, when, and for whom the information in our documents will be useful. This seems to be important not just instrumentally, but with respect to larger human interests as well, or even to the very largest human interests. Just how else to explain the earnestness, anger, fear, and tears one sees at Balisage? But problems abound, and some tradeoffs appear not just incalculable, but incommensurable. Left track or right?

Wednesday 8:00 pm - 10:00 pm (location: Baker)

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room. Will someone bring out a card game this evening?

Thursday, August 2, 2018

Thursday 8:00 am - 9:00 am (location: Baker)

Conference Registration & Breakfast

Pick up your conference badge and join us for breakfast.

Thursday 9:00 am - 9:45 am

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Thursday 9:45 am - 10:30 am

White-hat web crawling: Industrial strength web crawling for serious content acquisition

Mark Gross, Tammy Bilitzky, Rich Dominelli, & Allan Lieberman, Data Conversion Laboratory (DCL)

Much original source material today appears only on the web or with the web version as the copy of record. We have been developing methods and bots to facilitate high-volume data retrieval from hundreds of websites, in a variety of source formats (HTML, RTF, DOCX, TXT, XML, etc.), in both European and Asian languages. We produce a unified data stream which we then convert into XML for ingestion into derivative databases, data analytics platforms, and other downstream systems. We will examine the thought behind our approaches, the analysis techniques we used to detect and deal with website and content anomalies, our methods to detect meaningful content changes, and our approaches to verification.

Thursday 10:30 am - 11:00 am

Break

Thursday 11:00 am - 11:45 am

Dynamic style

Steven J. DeRose, Consultant

Many capabilities of hypertext can be realized by declarative markup and styling technologies. This has been made evident on the web through the widespread adoption of HTML markup styled with CSS properties. But sometimes you need to reach beyond declarative limitations to enable truly interactive hypertext. This, also, can be seen on the web through the near ubiquity of JavaScript. But today’s JavaScript solutions are stand-off. Using them requires knowing how to navigate the structure of the document and find the relevant elements. What if JavaScript could be employed from within CSS? Could this simplify things? How would it make long-neglected hypertext capabilities easier to achieve?

Thursday 11:45 am - 12:30 pm

CETEIcean: TEI in the browser

Hugh Cayless, Duke Collaboratory for Classics Computing (DC3)
Raffaele Viglianti, Maryland Institute for Technology in the Humanities (MITH)

The typical method for displaying a TEI document on the web is to use XSLT or XQuery to pre-transform it into static HTML or dynamically transform it when requested. CETEIcean is a JavaScript library designed to render TEI XML directly in a modern web browser using custom elements. CETEIcean was developed to support a lightweight TEI presentation workflow that requires neither pre-display document transformation nor a complex server-side architecture. This makes possible a distributed web-based document preparation workflow. Method explained; examples shown; limitations discussed.

Thursday 12:30 pm - 2:00 pm (location: Social Circle - lobby level)

Lunch

Please check computer bags, backpacks, briefcases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.

Thursday 1:15 pm - 2:00 pm (during lunch -- location: Sinequa)

Balisage Bard

Lynne Price, Gamemaster

Once again, Balisage Bard gives you the opportunity to exercise your literary creativity with poems, short stories, jokes, and songs. Subject matter must be related to Balisage (markup, venue, papers, and so forth). Read your effort during the game session. Translations of works in languages other than English are not required but will be appreciated. There is a two-minute time limit for each presentation. As many submissions as time permits will be taken; authors will be called in the order they sign up (there will be a sign-up sheet at conference registration). If time permits, additional volunteers will be accepted during the game.

Thursday 2:00 pm - 2:45 pm

Easing the road to declarative programming in XSLT for imperative programmers

Abel Braaksma, Abrasoft

Programmers who learned their trade in mainstream languages like C, C#, Java, Python, PHP, Objective-C or Ruby sometimes find it challenging to switch their mindset from such imperative languages to the declarative nature of XSLT. In imperative languages you tell the computer what to do, step by step. In declarative and functional languages, you tell the computer what result you wish for, and how that depends on your input. You guide the processor with a soft hand and give it suggestions, instead of imperatively making finite decisions for the compiler one by one. There’s no need to become a fully fledged functional programmer and understand all its paradigms before you can be relatively versatile with writing effective XSLT stylesheets. After mastering the basics of the declarative mindset and learning to think not in opening and closing tags, but instead in trees and traversals, we can work with XSLT stylesheets without resorting to frustratingly deeply nested <xsl:if> and <xsl:choose> elements. I hope to help both seasoned XSLT programmers and interested imperative programmers to not be afraid of the wolf.

Thursday 2:45 pm - 3:30 pm

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Thursday 3:30 pm - 4:00 pm

Break

Thursday 4:00 pm - 4:45 pm

Adventures in automating

Katherine Ford & Will Thompson, O’Connor

Significant efforts to improve and automate long-standing manual processes are fraught with peril. A third party tool might appear attractive at first, but when we tried one, it turned out to be difficult to integrate with our existing engine’s data and processing model. Building our own solution revealed that technologies long thought moribund — like XSLT in the browser — might turn out to be alive and well, even on the modern, mobile web. What is sometimes judged as a fault with technology may simply be a fault of imagination and understanding. And throughout, perhaps the most important part of any solution is a lucid understanding of the problem.

Thursday 4:45 pm - 5:30 pm

Fractal information is

Wendell Piez

We wrestle often with the granularity of data formats, object models, interfaces, and APIs: their strengths, their weaknesses, and the supports they provide to creators and consumers. Opinion is often muddled or extrapolated from limited experience: “X is lightweight”, “Y is ‘self-describing’”, “everyone prefers Z”. This is a fractal experience; there is self-similarity across scales. Issues that arise at one level of the system have weird echoes elsewhere. Indeed, one way of discriminating among options (XML, HTML, Markdown, JSON, YAML, SAX, DOM, etc.) is to consider their different approaches to the problem of managing the chaos and representing (ir)regularity. This examination leads to a better understanding of how to exploit their differences to make them work better together.

Thursday 8:00 pm - 10:00 pm (location: Baker)

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room. We might be talking about markup or the organization of electronic materials, but we might just as easily be talking about astronomy, butterflies, scuba diving, antique cars, or ... something else entirely.

Friday, August 3, 2018

Friday 8:00 am - 9:00 am (location: Baker)

Breakfast

Join us for breakfast.

Friday 9:00 am - 9:45 am

PreTeXt: An XML vocabulary for scholarly documents

Robert A. Beezer, University of Puget Sound

Need to write a textbook or scholarly article for mathematics and the physical sciences? Have too many special challenges for existing vocabularies like DocBook and TEI? Try PreTeXt instead. Researchers in the sciences are often comfortable with LaTeX, but it lacks the flexibility of XML for repurposing text for multiple outputs, particularly those based on HTML. PreTeXt combines an easy-to-learn XML vocabulary with simple escapes to TeX notation and graphics formats common in the sciences. While PreTeXt borrows common elements like paragraphs and lists from HTML, it adds elements appropriate to the subject matter, like “theorem” and “proof”. Supported by an online community, PreTeXt has already been used for dozens of books.

Friday 9:45 am - 10:30 am

How are dependent works realized?

Jacob Jett & David Dubin, University of Illinois

When a work of authorship is published in a new edition, what exactly is the relationship between the edition and the contribution of the author or authors? Specifications in the FRBR family offer contrasting accounts of how we should understand the relationships among the edition, its text, and the work of authorship realized by both of them. The intellectual contribution of markup in a digital edition adds a further wrinkle.

Friday 10:30 am - 11:00 am

Break

Friday 11:00 am - 11:45 am

Scaling XML using a Beowulf cluster

John J. Chelsom, University of Victoria
Jay H. Chelsom, Abingdon School

Scale up or scale out? A large, XML-centric application such as the cityEHR health records system lends itself naturally to implementation with XML technologies: XForms, REST, and XQuery. One thing is certain about records systems: there will always be more records. Eventually, the system will outgrow its initial environment — and, in the long run, the bigger one that replaces it. Scaling out becomes the only answer. Can you scale a mission-critical medical records application on a cluster of Raspberry Pi computers in a Beowulf cluster? The possibly surprising answer is “yes”.

Friday 11:45 am - 12:30 pm

Why are we here?

C. M. Sperberg-McQueen, Black Mesa Technologies

Sometimes our technological specifications give rise to unexpected ecological niches which in turn give rise to unexpected communities.

Friday 12:30 pm - 2:00 pm (location: Social Circle - lobby level)

Lunch

Please check computer bags, backpacks, briefcases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.

Relax at the Cambria and enjoy talking about markup over lunch. For participants who must rush off, wrapping materials and bags are supplied so you can take your sandwich with you to enjoy in the cab or at the airport (but do not eat on Metro!).