
Balisage 2017 Program

Tuesday, August 1, 2017

Tuesday 8:00 am - 9:00 am

Conference Registration & Breakfast

Pick up your conference badge in Baker (across the hall from Sinequa, the conference room) and join us for a light breakfast.

Tuesday 9:00 am - 9:15 am

Welcome and Introductions

Tuesday 9:15 am - 9:45 am

It is time to make ourselves clear

Tommie Usdin, Mulberry Technologies

We, the markup community, have for too long pussy-footed around in a misguided effort to get along with the unenlightened. We have compromised, equivocated, and taken one thing after another into consideration. That time is over. It is time for us to insist that the world straighten up and fly right. To stand up and put our collective feet down! Start marking up documents with explicit tags, no more of this word-processor hide-the-markup stuff. Separate content from format! Make all publications accessible! Enable interoperability! We know what's right; let's do it and demand that others do, too!

Well, if they don't mind. And if they can afford it. And if it won't break any current systems, and nobody is offended. Of course.

Tuesday 9:45 am - 10:30 am

Doing digital humanities today: what does it take? A view from the NEH

Brett Bobley, Office of Digital Humanities, National Endowment for the Humanities

What does it take to do good digital humanities work nowadays? What counts as solid work? What counts as cutting-edge? Projects involving cultural-heritage data and serving long-term scholarly goals have often illuminated issues in the management of information. The Director of the Office of Digital Humanities at the National Endowment for the Humanities tells us about the current state of the art in digital humanities: what trends are visible in the field, and what tradeoffs face those working in this field.

Tuesday 11:00 am - 11:45 am

Patterns and antipatterns in XSLT micropipelining

David J. Birnbaum, University of Pittsburgh

The program logic of pipelining is often expressed by nesting function calls within one another. An alternative formulation assigns intermediate results to a number of convenience variables. But in XSLT, legibility can break down in a sea of parentheses, and maintenance is a challenge when both variables and references need to change or be reordered as you add, delete, or rearrange steps in the pipeline. We can avoid these challenges by expressing the pipeline as a sequence of simple steps (for example, operations in a visitor pattern or a table of matching value pairs for string replacement). Single steps are easier to read and also easier to edit because adjustments to a step are self-contained and do not affect other steps.
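
By way of analogy only (the talk concerns XSLT; this Python sketch is not taken from it), the contrast between nested calls and a table of matching value pairs for string replacement might look like this. The replacement rows and function names are invented for illustration.

```python
from functools import reduce

# Hypothetical replacement table: each (old, new) row is one self-contained pipeline step.
REPLACEMENTS = [
    ("&", "and"),   # step 1: spell out ampersands
    ("  ", " "),    # step 2: collapse double spaces
    ("--", "-"),    # step 3: reduce doubled hyphens
]

def nested_pipeline(text: str) -> str:
    # Nested-call style: the last step is written outermost, so reading order is reversed,
    # and inserting or reordering a step means rebalancing parentheses.
    return str.replace(str.replace(str.replace(text, "&", "and"), "  ", " "), "--", "-")

def table_pipeline(text: str, table=REPLACEMENTS) -> str:
    # Table-driven style: fold the input through the rows in order; adding, deleting,
    # or reordering a step touches only one row of the table.
    return reduce(lambda acc, row: acc.replace(row[0], row[1]), table, text)
```

Both functions compute the same result; only the second keeps each step independent of its neighbours.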

Tuesday 11:45 am - 12:30 pm

Making a difference by processing JSON as XML

Robin La Fontaine, DeltaXML

Anyone who has ever published more than one version of a document can readily understand the benefits of tracking changes within it. Systems and APIs that exchange JSON haven’t typically been able to take advantage of such tracking, though the problems of changing JSON structures are essentially the same as in XML. This paper looks beyond JSON Patch (a fine specification as far as it goes) to a more general mechanism for representing changes in JSON, one that includes the context of the changes so that new ways of processing change can be supported. Along the way, it introduces a lossless, bidirectional transformation from JSON to XML, making the more mature XML processing infrastructure available to JSON developers. The best of both worlds.
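
One widely known round-trippable JSON/XML mapping is the XML representation of JSON defined for the XPath 3.1 functions fn:json-to-xml and fn:xml-to-json, in the namespace http://www.w3.org/2005/xpath-functions. The Python sketch below converts in both directions using that vocabulary. It illustrates the general idea only and is not necessarily the representation this paper proposes; the function names to_xml and from_xml are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

FN = "http://www.w3.org/2005/xpath-functions"  # namespace of the XPath 3.1 JSON-as-XML vocabulary

def to_xml(value, key=None):
    """Convert a parsed JSON value to an element in the fn:json-to-xml vocabulary."""
    if isinstance(value, dict):
        elem = ET.Element(f"{{{FN}}}map")
        for k, v in value.items():
            elem.append(to_xml(v, key=k))  # map entries carry their key as an attribute
    elif isinstance(value, list):
        elem = ET.Element(f"{{{FN}}}array")
        for v in value:
            elem.append(to_xml(v))
    elif isinstance(value, bool):  # test bool before numbers: bool is a subclass of int
        elem = ET.Element(f"{{{FN}}}boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        elem = ET.Element(f"{{{FN}}}number")
        elem.text = json.dumps(value)  # keep the JSON lexical form of the number
    elif isinstance(value, str):
        elem = ET.Element(f"{{{FN}}}string")
        elem.text = value
    elif value is None:
        elem = ET.Element(f"{{{FN}}}null")
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    if key is not None:
        elem.set("key", key)
    return elem

def from_xml(elem):
    """Convert an fn:json-to-xml element back to the equivalent Python JSON value."""
    tag = elem.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    if tag == "map":
        return {child.get("key"): from_xml(child) for child in elem}
    if tag == "array":
        return [from_xml(child) for child in elem]
    if tag == "boolean":
        return elem.text == "true"
    if tag == "number":
        return json.loads(elem.text)
    if tag == "string":
        return elem.text or ""  # an empty string may come back from parsing as None
    if tag == "null":
        return None
    raise ValueError(f"unexpected element: {elem.tag}")
```

Because every JSON value maps to exactly one element type, converting to XML, serializing, parsing, and converting back yields the original value, so XML tools can work on the intermediate form.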

Tuesday 2:00 pm - 2:45 pm

An XSLT translator for the openEHR

John Chelsom

Building a pure XML electronic health records system such as cityEHR eventually requires translating EHR documents. But before that can happen, it is first necessary to translate the patterns of ISO 13606/openEHR from its specialized Archetype Definition Language (ADL) into something that can be processed with XML tools like XSLT. While ADL began as a domain-specific language analogous to XSD, it has its own unique syntax that is not XML. Nonetheless it has been possible to create recursive string processors in XSLT to convert ADL templates into OWL/XML assertions. As a result, the cityEHR system can be built in pure XML without resorting to proprietary Java processors for openEHR.

Tuesday 2:45 pm - 3:30 pm

How many hamsters does it take? Under the hood at PMC

Jeff Beck, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

PubMed Central (PMC) is a free full-text XML-based archive of biomedical and life sciences journal literature at the U.S. National Library of Medicine. Publishers submit XML, images, and supplemental files for their articles; the text is converted to a common JATS XML format and loaded into the database cleanly. The power of XML compels it! But that is not the whole story (or even a true story). Policies, miscommunications, and technical misunderstandings conspire against our Utopian XML workflow. We will share the details of how we get 30,000 new articles into the archive each month.

Tuesday 4:00 pm - 4:45 pm

Testing Schematron using XSpec (LB)

Vincent Lizzi, Taylor & Francis

Schematron is a powerful, flexible, and user-friendly tool for validating and reporting on XML content. Developing a Schematron schema can involve a lot of testing to ensure that each Schematron rule works as expected; a robust test suite may contain multiple XML samples for every Schematron rule, so as to test both passing and failing conditions. XSpec, an open source unit test and behavior driven development framework for XSLT and XQuery, now has the ability to test Schematron schemas. Tests for a Schematron schema can be described using XSpec test scenarios, and the tests can be run automatically by XSpec. The end result is a report showing which tests passed and which failed. The new support for Schematron testing in XSpec enables test-driven development for Schematron and automated regression testing for Schematron in a continuous integration environment.

Tuesday 4:45 pm - 5:30 pm

Publishing multiple editions

Murray Maloney

Murray Maloney will report on an academic book project that produced multiple editions of the same book in print, ePub, and PDF, each year for four years. The book is a multi-disciplinary study of the art and science of organizing. Murray contributed and edited content, participated in the design, and coordinated production. He will talk about what they wanted to do, what they did in each successive edition, what didn’t get done, what worked, what didn’t, what they would do differently in retrospect, and what they would do the same, if they had it to do over.

Tuesday 5:30 pm - 6:30 pm


Please join us for Cheese, Wine, and Conversation!

Tuesday 8:00 pm - 10:00 pm

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room. We'll have desserts, coffee, a comfortable place to talk, and possibly a toy or two worth a look.

Wednesday, August 2, 2017

Wednesday 8:00 am - 9:00 am

Conference Registration & Breakfast

Pick up your conference badge in Baker (across the hall from Sinequa, the conference room) and join us for a light breakfast.

Wednesday 9:00 am - 9:45 am

Translating imperative algorithms into declarative, functional terms

C. M. Sperberg-McQueen, Black Mesa Technologies

When developing in XSLT or XQuery, it is sometimes useful or necessary to re-implement standard algorithms in declarative and functional ways. This can be challenging because standard algorithms are often described in imperative terms unsuitable for use in XSLT or XQuery, which are declarative and functional languages. Earley parsing illustrates some of the challenges which arise. Earley’s parsing algorithm is interesting because it can parse an input string against any context-free grammar in Backus-Naur Form, including grammars that are not well-behaved and so are unsuitable for recursive-descent or table-driven approaches. Re-thinking the Earley algorithm not only makes it easier to implement in XSLT and XQuery, but also helps make clear why the parser is both complete (it will always find a parse if there is one) and correct (any parse it finds will be a real parse).
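
For readers who have not met Earley's algorithm, here is a compact recognizer in Python. It is a generic sketch, not the reformulation presented in the talk: items are immutable tuples, and the item set at each input position is computed as the least fixpoint of prediction and completion, which also copes with empty rules. The grammar encoding (a dictionary from nonterminals to tuples of symbols, with any other symbol treated as a terminal) and the name earley_recognize are assumptions of this sketch. It only recognizes; it does not build parse trees.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if tokens can be derived from start.

    grammar maps each nonterminal to a list of right-hand sides (tuples of symbols);
    any symbol that is not a key of grammar is treated as a terminal.
    An item is an immutable tuple (lhs, rhs, dot, origin).
    """

    def next_symbol(item):
        _, rhs, dot, _ = item
        return rhs[dot] if dot < len(rhs) else None

    def step(k, chart, current):
        # One round of prediction and completion at input position k.
        def items_at(i):
            return current if i == k else chart[i]

        predicted = {(sym, rhs, 0, k)
                     for item in current
                     for sym in [next_symbol(item)]
                     if sym in grammar
                     for rhs in grammar[sym]}
        completed = {(lhs, rhs, dot + 1, origin)
                     for done in current if next_symbol(done) is None
                     for (lhs, rhs, dot, origin) in items_at(done[3])
                     if dot < len(rhs) and rhs[dot] == done[0]}
        return current | predicted | completed

    def closure(k, chart, seed):
        # Least fixpoint of step(): repeat until no new items appear.
        # Iterating to a fixpoint also handles completions of empty (nullable) rules.
        current = frozenset(seed)
        while True:
            expanded = frozenset(step(k, chart, current))
            if expanded == current:
                return current
            current = expanded

    chart = []  # chart[k] is the finished item set for input position k
    seed = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for k in range(len(tokens) + 1):
        items = closure(k, chart, seed)
        chart.append(items)
        if k < len(tokens):
            # Scanning: advance every item that expects the next token.
            seed = {(lhs, rhs, dot + 1, origin)
                    for (lhs, rhs, dot, origin) in items
                    if dot < len(rhs) and rhs[dot] == tokens[k]}
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[-1])
```

For example, with the ambiguous, left-recursive grammar {"E": [("E", "+", "E"), ("n",)]}, which a naive recursive-descent parser cannot handle, earley_recognize(grammar, "E", ["n", "+", "n"]) accepts the input.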

Wednesday 9:45 am - 10:30 am

Automatically denormalizing document relationships (LB)

Will Thompson, O'Connor's

What do you do if your data is stored in XML documents, but you need to perform queries on the data that might be better suited to a relational database? Automatic denormalization at the time of data creation can solve many problems, particularly with many-to-many relationships. Reading denormalized documents is a reliably simple and fast operation compared to relational joins, and the cost of additional processing at write time is repaid with net gains in database workloads that are not dominated by writes.

Wednesday 11:00 am - 11:45 am

It’s more than just overlap: Text as graph; Refining our notion of what text really is—this time for sure! (LB)

Ronald Haentjens Dekker, Huygens Institute for the History of the Netherlands & David J. Birnbaum, University of Pittsburgh

The XML tree paradigm has several well-known limitations for document modeling and processing. Some of these have received a lot of attention (especially overlap), some have received less (e.g., discontinuity, simultaneity, transposition, white space as crypto-overlap). Many of these have work-arounds, also well known, but—as is implicit in the term “work-around”—these work-arounds have disadvantages. Because they get the job done, however, and because XML has a large user community with diverse levels of technological expertise, it is difficult to overcome inertia and move to a technology that might offer a more comprehensive fit with the full range of document structures with which researchers need to interact both intellectually and programmatically. A high-level analysis of why XML has the limitations it has can enable us to explore how an alternative model of Text as Graph (TAG) might address these types of structures and tasks in a more natural and idiomatic way than is available within an XML paradigm.

Wednesday 11:45 am - 12:30 pm

The secret life of schema in web protocols and software APIs (LB)

David Lee, Nexstra, Inc

Harvesting a large corpus of data and then attempting to generate a schema for it often results in an ugly schema and a confused API. Yet there are software tools designed for refactoring Java classes that can be applied to a JSON schema to return a much cleaner schema. Starting from OpenAPI documents, one can auto-generate corresponding Java code that produces XML documents representing, in high fidelity, the underlying data that the API exposes only indirectly and incrementally. Similarly, it is possible to create an XQuery implementation of an exhaustive search over the API that creates a semantic representation in XML and RDF stored in a NoSQL database.

Wednesday 2:00 pm - 2:45 pm

The concrete syntax of documents: Purpose and variety

Mary Holstege, MarkLogic

In the mid-1980s, a research group built an ambitious language-development environment supporting parsing and rendering of samples of a programming language that was itself under development, and for which standard models of context-free grammars were not suitable. We learned a lot from that project. Those lessons extend naturally to structured documents: separate the presentation from the structure; run rules both ways; be aware that language versioning is a form of language translation; separate the type of an abstract syntax unit from its role within the parent construct; realize that presentation order relates closely to layout across space; observe that presentation order for non-lists is an aspect of concrete syntax; separate the abstract geometry from the concrete geometry; ... There is more.

Wednesday 2:45 pm - 3:30 pm

Pointy brackets for poets: Can an English Major Use XML?

Syd Bauman, Northeastern University

For nearly thirty years the Women Writers Project has been training university students in the humanities to encode SGML and XML documents and to edit marked up texts, without the WYSIWYG interfaces that are sometimes thought to be absolutely essential for domain experts interacting with marked up data. A historical survey of the tools and training methods used in the project will be followed by an attempt to identify what can be learned from the project's experience: what works, what doesn't work, and what (we think) are the ideal circumstances for teaching XML.

Wednesday 4:00 pm - 4:45 pm

Encoding the Ethiopic manuscript tradition

Pietro Maria Liuzzo, Universität Hamburg

The Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea project aims to construct a virtual research environment to encode and manage the rich and complex manuscript tradition of the Ethiopian and Eritrean Highlands. The Ethiopic manuscript culture, consisting of varying and difficult-to-identify literary works, is a living tradition. This project explores the difficulties in encoding and managing the relationships not only between the manuscripts themselves, but also between the manuscripts and the broader Ethiopian literary traditions that include Greek and Arabic texts as well.

Wednesday 4:45 pm - 5:30 pm

Your Standard Average Document Grammar

Peter Flynn, University College Cork, Ireland

For all the surface differences, we are all working from the same fundamental view of document structure, a Standard Average Document Grammar (similar in spirit to the ‘Standard Average European’ grammatical model with which linguists describe European languages). Most prose-based XML applications adopt or adopt-and-modify one of a few public document grammars. Fundamentally, these document grammars are all expressions of the same logical view of prose structure. This Standard Average Document Grammar includes nested headed sections, restrictions on what may and may not occur, links to referenced portions of the document, and citations of outside material. The modifications and customizations users make to these document grammars are informative both in their variety and their similarity, and in the fact that they all fit so comfortably within the Standard Average Document Grammar.

Wednesday 8:00 pm - 10:00 pm

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room. Will someone bring out a card game this evening?

Thursday, August 3, 2017

Thursday 8:00 am - 9:00 am

Conference Registration & Breakfast

Pick up your conference badge and join us for a light breakfast in Baker (across the hall from Sinequa, the conference room).

Thursday 9:00 am - 9:45 am

XML applications on the web: Implementation strategies for the Model component in a Model-View-Controller architectural style

Zahra Al-Awadi, Anne Brüggemann-Klein, Michael Conrads, Andreas Eichner, & Marouane Sayih, Technical University of Munich (TUM)

How can we use XML, XQuery, and SCXML (State Chart XML) to implement the Model component in a Model-View-Controller web application? First we must be able to do function decomposition of XQuery functions that perform updates (a task rendered more complex by XQuery's restrictions on updating expressions). Then we would like a systematic method of using UML state diagrams in the design of the web application and of integrating an SCXML processor into the implementation of the Model component. A BaseX extension implementing the WebSockets protocol enables us to make the Model observable and thus to realize multi-player games that require server push. All these practices are compatible with domain-driven design and model-driven solutions; they pave the way for XML developers to create XML-based applications on the web.

Thursday 9:45 am - 10:30 am

SOCRview: a case study in web application development

John Cooper, SAGE Publications

SOCRview is part of the SAGE Online Content Repository (SOCR); it provides generalized content access to other SOCR services, an access API for technical users and, through a very thin XSLT 1.0 layer, a generalized web browser interface. A RESTful web application layer exposes content — including transformed, packaged, and listed or analyzed content — to users with varying levels of technical expertise. SOCRview exposes this content through persistent, readable, and meaningful URIs. From the first proof-of-concept through to the fully realized service, the system teaches a number of lessons.

Thursday 11:00 am - 12:30 pm

Panel: Optimization

Abel Braaksma, Adam Retter, and Tommie Usdin, Mulberry Technologies

So you think you wanna optimize your XML, do ya?

Well, do you?

Panelists discuss how to optimize for interchange and interoperability. And, while they're at it, reduce the file size, increase the readability, future-proof it by making sure it conforms to all applicable standards now and forever, and … What do you mean, I can’t have it all? XML is supposed to enable all of these things!

Conference participants will chime in with questions, opinions, and counter-examples. Someone is almost guaranteed to quote Donald Knuth or Michael Jackson. Premature optimization is the root of all evil, yes, but what exactly is premature? What is the expected gestation period for optimization? Are we optimizing for file size, processing speed, retrieval speed, loading speed, longevity of data, ease of comprehension without having to check the manual to discover that “pglg” means “programListing”? Are we optimizing our XML, our XSLT, our XQuery, our XProc, or something else? By the end of the discussion, optimization will no longer seem quite as simple or straightforward as it did before -- but you'll be able to do a much better job of it.

Thursday 1:15 pm - 2:00 pm (during lunch)

Balisage Bard

Lynne Price, Gamemaster

Exercise your literary creativity with poems, short stories, jokes, and songs. Subject matter must be related to Balisage (markup, venue, papers, and so forth). Read your effort during the game session. Translations of works in languages other than English are not required but will be appreciated. There is a two-minute time limit for each presentation. As many submissions as time permits will be taken; authors will be called in the order submissions are received.

Thursday 2:00 pm - 2:45 pm

Entity services in action with NISO STS (LB)

Matt Turner, MarkLogic

Standards impact nearly every industry and government process, and standards organizations like BSI and ISO have been leading a change in how to provide a variety of audiences not just with the standards documents themselves but with valuable data about standards and the process of standardization. Now, there is a new data standard for standards called NISO STS (National Information Standards Organization Standards Tag Set). Working with ISO and sample content from ISO, MarkLogic has created a demonstration of NISO STS using MarkLogic 9's new Entity Services feature. This session will review the industry impact of being able to leverage this new standard at the database layer and includes a live demo.

Thursday 2:45 pm - 3:30 pm

Pilot project to identify plagiarized images in STM journal submissions (LB)

Mark Gross, Data Conversion Laboratory & Ari Gross, CVISION Technologies

While detection of plagiarism in the textual portion of documents is commonly done today, detecting it in images is a different challenge. What if an image isn't simply copied but is rescaled or cropped or has its color space altered? A pilot project dissects images using tiled descriptors that can be compared, in spite of image transformations.

Thursday 4:00 pm - 4:45 pm

Interactive web applications demonstrating SaxonJS

Wendell Piez, Piez Consulting Services

SaxonJS promises “real” XSLT in the browser. Old-timers are thrilled, cool kids are showing interest, and many people are very intrigued. The architecture is still characterized by a strong distinction between logical and presentation layers, but it is now possible to program user interaction in the browser as event-driven transformation logic, using XSLT alone. The unit of composition (the “work”) now corresponds to the unit of delivery (no longer a “page” but a “resource”). Most importantly, it is now possible to build and deploy interactive web sites with XML and XSLT alone -- no Java, no JavaScript, no specialized server app or complex batch processing. But to deploy, you need a web server, a compiled XSLT stylesheet, and a certain amount of infrastructure. XML Jelly Sandwich, a starter XSLT kit hosted on GitHub, can provide infrastructure of sufficient quality for testing. Cool demos of TEI-tagged poetry and BITS-tagged prose meditations may help convince you to try SaxonJS.

Thursday 4:45 pm - 5:30 pm

Compiling XSLT3, in the browser, in itself (LB)

John Lumley, jωL Research & Saxonica; Debbie Lockett & Michael Kay, Saxonica

This paper describes the development of a compiler for XSLT 3.0 that can run directly in modern browsers. It exploits a virtual machine written in JavaScript, Saxon-JS, which interprets an execution plan for an XSLT transform, consuming source documents and interpolating the results into the displayed web page. Ordinarily these execution plans (Stylesheet Export Files, SEF), which are written in XML, are generated offline by the Java-based Saxon-EE product. Saxon-JS has been extended to handle dynamic XPath evaluation by adding an XPath parser and a compiler from the XPath parse tree to SEF. By constructing an XSLT transform that consumes an XSLT stylesheet and creates an appropriate SEF, exploiting this XPath compiler, we have managed to construct an in-browser compiler for XSLT 3.0 with high levels of standards compliance. This opens the way to dynamic transforms and to in-browser stylesheet construction and execution, and offers a potential route to language-portable XSLT compiler technologies.

Thursday 8:00 pm - 10:00 pm

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room. We might be talking about markup or the organization of electronic materials, but we might just as easily be talking about astronomy, butterflies, scuba diving, antique cars, or ... something else entirely.

Friday, August 4, 2017

Friday 8:00 am - 9:00 am


Join us for a light breakfast in Baker (across the hall from Sinequa, the conference room).

Friday 9:00 am - 9:45 am

Using DITA to create security configuration checklists

Joshua Lubell, National Institute of Standards and Technology

Security configuration checklists, represented using the Extensible Configuration Checklist Description Format (XCCDF), are frequently used to monitor computers and other information technology products for compliance with security policies. XCCDF syntax is not easy to author. Current practice is to maintain it with a fairly ad hoc approach to both authoring and content reuse, documented in XSLT scripts and Makefiles that contain directory dependencies. This small-scale case study investigates implementing shorthand XML vocabularies for XCCDF rules and profiles as specializations of the DITA “concept” and “map” types, respectively. The representation of an XCCDF benchmark document as a specialized DITA map type makes explicit the high-level checklist structure currently implicit in the Makefiles and XSLT and could simplify the shorthand-to-XSLT transforms. In addition, DITA provides a more stable mechanism for reuse of content fragments. Preliminary results look very promising!

Friday 9:45 am - 10:30 am

Life, the universe, and CSS tests

Tony Graham, Antenna House

The W3C CSS Working Group maintains a CSS test suite that already contains more than 16,000 tests and is growing constantly. Tracking the results of running such a large number of tests on a PDF formatter is more than anyone could or should want to do by hand. The system needs to track when a test's result changes so that the changes can be verified and the test's status updated. Finding differences is not the same as checking correctness. An in-house system for running the tests and tracking their results has been implemented as an eXist-db app. Is it a masterpiece of agile development, or an example of creeping featurism?
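
Tracking when a result changes amounts to comparing successive runs. The Python sketch below is only an illustration, not the eXist-db application the talk describes: it lists every test whose recorded result differs between two runs. The function name, the test file names, and the use of dictionaries mapping test names to result strings are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def changed_tests(previous: Dict[str, str],
                  current: Dict[str, str]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (test, old_result, new_result) for every test whose result differs between runs.

    A missing entry (None) means the test did not appear in that run. A change only
    marks a test for human review; it says nothing about whether the new result is correct.
    """
    names = sorted(previous.keys() | current.keys())
    return [(name, previous.get(name), current.get(name))
            for name in names
            if previous.get(name) != current.get(name)]
```

Only the flagged tests then need someone to verify the new output and update the test's status; unchanged tests can be skipped.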

Friday 11:00 am - 11:45 am

Bridging the gap between XML and RDF validation

Kurt Cagle, Semantical LLC

Users of RDF, while having access to the vast expressive power of OWL, have not (unlike users of conventional XML applications) had a convenient way of building applications, validating documents, or constructing user interfaces. The Shape Constraint Language, SHACL, a SPARQL-friendly validation language that bears a lot of resemblance to XSD, may at last provide builders of RDF information bases some of the conveniences that XSD users have long enjoyed. SHACL could act as a unifying bridge between the world of RDF and those of XML and JSON and thus may enable processing pipelines that involve multiple worlds.

Friday 11:45 am - 12:30 pm

Text. You keep using that word ...

C. M. Sperberg-McQueen, Black Mesa Technologies

Every data representation constitutes a data interpretation. What are SGML, XML, and other tools for descriptive markup telling us about the nature of text?