Balisage 2017 Program - Preliminary

Tuesday, August 1, 2017

Tuesday 8:00 am - 9:00 am

Conference Registration & Breakfast

Pick up your conference badge near the conference room and join us for a light breakfast.

Tuesday 9:00 am - 9:15 am

Welcome and Introductions

Tuesday 9:15 am - 9:45 am

It is time to make ourselves clear

Tommie Usdin, Mulberry Technologies

We, the markup community, have for too long pussy-footed around in a misguided effort to get along with the unenlightened. We have compromised, equivocated, and taken one thing after another into consideration. That time is over. It is time for us to insist that the world straighten up and fly right. To stand up and put our collective feet down! Start marking up documents with explicit tags, no more of this word-processor hide-the-markup stuff. Separate content from format! Make all publications accessible! Enable interoperability! We know what's right; let's do it and demand that others do, too!

Well, if they don't mind. And if they can afford it. And if it won't break any current systems, and nobody is offended. Of course.

Tuesday 9:45 am - 10:30 am

Doing digital humanities today: what does it take? A View from the NEH

Brett Bobley, Office of Digital Humanities, National Endowment for the Humanities

What does it take to do good digital humanities work nowadays? What counts as solid work? What counts as cutting-edge? Projects involving cultural-heritage data and serving long-term scholarly goals have often illuminated issues in the management of information. The Director of the Office of Digital Humanities at the National Endowment for the Humanities tells us about the current state of the art in digital humanities: what trends are visible in the field, and what tradeoffs face those working in this field.

Tuesday 11:00 am - 11:45 am

Patterns and antipatterns in XSLT micropipelining

David J. Birnbaum, University of Pittsburgh

The program logic of pipelining is often expressed by nesting function calls within one another. An alternative formulation assigns intermediate results to a number of convenience variables. But in XSLT, legibility can break down in a sea of parentheses, and maintenance is a challenge when both variables and references need to change or be reordered as you add, delete, or rearrange steps in the pipeline. We can avoid these challenges by expressing the pipeline as a sequence of simple steps (for example, operations in a visitor pattern or a table of matching value pairs for string replacement). Single steps are easier to read and also easier to edit because adjustments to a step are self-contained and do not affect other steps.
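The step-list idea can be illustrated outside XSLT. The following is a minimal Python sketch, not the author's XSLT formulation: the replacement table and the names `REPLACEMENTS`, `apply_step`, and `run_pipeline` are hypothetical, chosen only to show a pipeline expressed as a table of matching value pairs rather than as nested calls or a chain of intermediate variables.

```python
from functools import reduce

# Hypothetical table of (match, replacement) pairs; each row is one
# self-contained pipeline step. Adding, deleting, or reordering a row
# does not require touching any other row.
REPLACEMENTS = [
    ("&", "and"),
    ("  ", " "),
    ("colour", "color"),
]


def apply_step(text, pair):
    """Apply a single replacement step to the text."""
    old, new = pair
    return text.replace(old, new)


def run_pipeline(text, steps=REPLACEMENTS):
    """Thread the text through every step in order (a left fold)."""
    return reduce(apply_step, steps, text)


result = run_pipeline("salt  & pepper colour")
```

The nested-call equivalent would be `text.replace(...).replace(...).replace(...)` or three wrapped function calls; in the table form, the order of the rows is the order of the pipeline.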

Tuesday 11:45 am - 12:30 pm

Making a difference by processing JSON as XML

Robin La Fontaine, DeltaXML

Anyone who has ever published more than one version of a document can readily understand the benefits of tracking changes within it. Systems and APIs that exchange JSON haven’t typically been able to take advantage of such tracking, though the problems of changing JSON structures are essentially the same as in XML. This paper looks beyond JSON Patch (a fine specification as far as it goes) to a more general mechanism for representing changes in JSON, one that includes the context of the changes so that new ways of processing change can be supported. Along the way, it introduces a loss-less, bi-directional transformation from JSON to XML, making the more mature XML processing infrastructure available to JSON developers. The best of both worlds.
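As a rough illustration of what a loss-less, bi-directional JSON-to-XML mapping can look like, here is a minimal Python sketch. It is not the transformation described in the paper: the element names (`map`, `array`, `string`, `number`, `boolean`, `null`) and the `key` attribute are assumptions loosely modeled on the XPath 3.1 `json-to-xml` vocabulary, and the sketch ignores namespaces and characters that XML cannot represent directly (such as control characters).

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, key=None):
    """Convert a parsed JSON value into an XML element (hypothetical vocabulary)."""
    if isinstance(value, dict):
        elem = ET.Element("map")
        for k, v in value.items():
            elem.append(to_xml(v, k))
    elif isinstance(value, list):
        elem = ET.Element("array")
        for v in value:
            elem.append(to_xml(v))
    elif isinstance(value, bool):
        # Checked before numbers because bool is a subclass of int in Python.
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem = ET.Element("null")
    elif isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = json.dumps(value)
    else:
        elem = ET.Element("string")
        elem.text = value
    if key is not None:
        # Object member names are kept in an attribute so any key string survives.
        elem.set("key", key)
    return elem


def from_xml(elem):
    """Invert to_xml: rebuild the JSON value from the element."""
    if elem.tag == "map":
        return {child.get("key"): from_xml(child) for child in elem}
    if elem.tag == "array":
        return [from_xml(child) for child in elem]
    if elem.tag == "boolean":
        return elem.text == "true"
    if elem.tag == "null":
        return None
    if elem.tag == "number":
        return json.loads(elem.text)
    # An empty string serializes to an empty element, whose text parses as None.
    return elem.text or ""


doc = {"name": "a<b", "tags": ["x", ""], "n": 1.5, "i": 3, "ok": True, "none": None}
xml_bytes = ET.tostring(to_xml(doc))
round_tripped = from_xml(ET.fromstring(xml_bytes))
```

Once the data is in XML form, element-level comparison and change-tracking tools can operate on it, and `from_xml` returns the result to JSON.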

Tuesday 2:00 pm - 2:45 pm

An XSLT translator for the openEHR

John Chelsom

Building a pure XML electronic health records system such as cityEHR eventually requires translating EHR documents. But before that can happen, it is first necessary to translate the patterns of ISO 13606/openEHR from its specialized Archetype Definition Language (ADL) into something that can be processed with XML tools like XSLT. While ADL began as a domain-specific language analogous to XSD, it has its own unique syntax that is not XML. Nonetheless it has been possible to create recursive string processors in XSLT to convert ADL templates into OWL/XML assertions. As a result, the cityEHR system can be built in pure XML without resorting to proprietary Java processors for openEHR.

Tuesday 2:45 pm - 3:30 pm

How many hamsters does it take? Under the hood at PMC

Jeff Beck, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

PubMed Central (PMC) is a free full-text XML-based archive of biomedical and life sciences journal literature at the U.S. National Library of Medicine. Publishers submit XML, images, and supplemental files for their articles, the text is converted to a common JATS XML format, and everything loads into the database cleanly. The power of XML compels it! But that is not the whole story (or even a true story). Policies, miscommunications, and technical misunderstandings conspire against our Utopian XML workflow. We will share the details of how we get 30,000 new articles into the archive each month.

Tuesday 4:00 pm - 4:45 pm

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Tuesday 4:45 pm - 5:30 pm

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Wednesday, August 2, 2017

Wednesday 8:00 am - 9:00 am

Conference Registration & Breakfast

Pick up your conference badge and join us for a light breakfast.

Wednesday 9:00 am - 9:45 am

Translating imperative algorithms into declarative, functional terms

C. M. Sperberg-McQueen, Black Mesa Technologies

When developing in XSLT or XQuery, it is sometimes useful or necessary to re-implement standard algorithms in declarative and functional ways. This can be challenging because standard algorithms are often described in imperative terms unsuitable for use in XSLT or XQuery, which are declarative and functional languages. Earley parsing illustrates some of the challenges which arise. Earley’s parsing algorithm is interesting because it can parse an input string against any context-free grammar in Backus-Naur Form, including grammars that are not well-behaved and so are unsuitable for recursive-descent or table-driven approaches. Re-thinking the Earley algorithm not only makes it easier to implement in XSLT and XQuery, but helps make clear why the parser is both complete (it will always find a parse if there is one) and correct (any parse it finds will be a real parse).

Wednesday 9:45 am - 10:30 am

To Be Announced

Invited speaker

Wednesday 11:00 am - 12:30 pm

Panel: Vocabulary selection

The most appropriate vocabulary to use for any prose corpus is DITA. Er, DocBook. I mean HTML 5. No, JATS. NIEM. TEI! Pick one. Whatever. Users contemplating any other option are at best misguided. Proponents of each of these vocabularies will explain why their vocabulary is a compelling choice to be THE default tag set. Each will also be asked to discuss circumstances in which their tag set of choice may not be appropriate.

Wednesday 2:00 pm - 2:45 pm

The concrete syntax of documents: Purpose and variety

Mary Holstege, MarkLogic

In the mid-1980s, a research group built an ambitious language-development environment supporting parsing and rendering of samples of a programming language that was itself under development, and for which standard models of context-free grammars were not suitable. We learned a lot from that project. Those lessons extend naturally to structured documents: separate the presentation from the structure; run rules both ways; be aware that language versioning is a form of language translation; separate the type of an abstract syntax unit from its role within the parent construct; realize that presentation order relates closely to layout across space; observe that presentation order for non-lists is an aspect of concrete syntax; separate the abstract geometry from the concrete geometry; ... There is more.

Wednesday 2:45 pm - 3:30 pm

Pointy brackets for poets: Can an English Major Use XML?

Syd Bauman, Northeastern University

For nearly thirty years the Women Writers Project has been training university students in the humanities to encode SGML and XML documents and to edit marked up texts, without the WYSIWYG interfaces that are sometimes thought to be absolutely essential for domain experts interacting with marked up data. A historical survey of the tools and training methods used in the project will be followed by an attempt to identify what can be learned from the project's experience: what works, what doesn't work, and what (we think) are the ideal circumstances for teaching XML.

Wednesday 4:00 pm - 4:45 pm

Encoding the Ethiopic manuscript tradition

Pietro Maria Liuzzo, Universität Hamburg

The Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea project aims to construct a virtual research environment to encode and manage the rich and complex manuscript tradition of the Ethiopian and Eritrean Highlands. The Ethiopic manuscript culture, consisting of varying and difficult-to-identify literary works, is a living tradition. This project explores the difficulties in encoding and managing the relationships not only between the manuscripts themselves, but also between the manuscripts and the broader Ethiopian literary traditions that include Greek and Arabic texts as well.

Wednesday 4:45 pm - 5:30 pm

Your Standard Average Document Grammar

Peter Flynn, University of Cork, Ireland

For all the surface differences, we are all working from the same fundamental view of document structure, a Standard Average Document Grammar (similar in spirit to the ‘Standard Average European’ grammatical model with which linguists describe European languages). Most prose-based XML applications adopt or adopt-and-modify one of a few public document grammars. Fundamentally, these document grammars are all expressions of the same logical view of prose structure. This Standard Average Document Grammar includes nested headed sections, restrictions on what may and may not occur, links to referenced portions of the document, and citations of outside material. The modifications and customizations users make to these document grammars are informative both in their variety and their similarity, and in the fact that they all fit so comfortably within the Standard Average Document Grammar.

Wednesday 8:00 pm - 10:00 pm

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room.

Thursday, August 3, 2017

Thursday 8:00 am - 9:00 am

Conference Registration & Breakfast

Pick up your conference badge and join us for a light breakfast.

Thursday 9:00 am - 9:45 am

XML applications on the web: Implementation strategies for the Model component in a Model-View-Controller architectural style

Zahra Al-Awadi, Anne Brüggemann-Klein, Michael Conrads, Andreas Eichner, & Marouane Sayih, Technical University of Munich (TUM)

How can we use XML, XQuery, and SCXML (State Chart XML) to implement the Model component in a Model-View-Controller web application? First we must be able to do function decomposition of XQuery functions that perform updates (a task rendered more complex by XQuery's restrictions on updating expressions). Then we would like a systematic method of using UML state diagrams in the design of the web application and of integrating an SCXML processor into the implementation of the Model component. A BaseX extension implementing the WebSockets protocol enables us to make the Model observable and thus to realize multi-player games that require server push. All these practices are compatible with domain-driven design and model-driven solutions; they pave the way for XML developers to create XML-based applications on the web.

Thursday 9:45 am - 10:30 am

SOCRview: a case study in web application development

John Cooper, SAGE Publications

SOCRview is part of the SAGE Online Content Repository (SOCR); it provides generalized content access to other SOCR services, an access API for technical users and, through a very thin XSLT 1.0 layer, a generalized web browser interface. A RESTful web application layer exposes content — including transformed, packaged, and listed or analyzed content — to users with varying levels of technical expertise. SOCRview exposes this content through persistent, readable, and meaningful URIs. From the first proof-of-concept through to the fully realized service, the system teaches a number of lessons.

Thursday 11:00 am - 12:30 pm

Panel: Optimization

So you think you wanna optimize your XML, do ya?

Well, do you?

Panelists discuss how to optimize for interchange and interoperability. And, while they're at it, reduce the file size, increase the readability, future-proof it by making sure it conforms to all applicable standards now and forever, and … What do you mean, I can’t have it all? XML is supposed to enable all of these things!

Conference participants will chime in with questions, opinions, and counter-examples. Someone is almost guaranteed to quote Donald Knuth or Michael Jackson. Premature optimization is the root of all evil, yes, but what exactly is premature? What is the expected gestation period for optimization? Are we optimizing for file size, processing speed, retrieval speed, loading speed, longevity of data, ease of comprehension without having to check the manual to discover that *pglg* means *programListing*? Are we optimizing our XML, our XSLT, our XQuery, our XProc, or something else? By the end of the discussion, optimization will no longer seem quite as simple or straightforward as it did before -- but you'll be able to do a much better job of it.

Thursday 1:15 pm - 2:00 pm (during lunch)

Balisage Bard

Lynne Price, Gamemaster

Exercise your literary creativity with poems, short stories, jokes, and songs. Subject matter must be related to Balisage (markup, venue, papers, and so forth). Read your effort during the game session. Translations of works in languages other than English are not required but will be appreciated. There is a two-minute time limit for each presentation. As many submissions as time permits will be taken; authors will be called in the order submissions are received.

Thursday 2:00 pm - 2:45 pm

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Thursday 2:45 pm - 3:30 pm

Reserved for Late-breaking Information

This spot on the program has been reserved for late-breaking news. The topic and speaker will be announced in July.

Thursday 4:00 pm - 4:45 pm

Interactive web applications demonstrating SaxonJS

Wendell Piez, Piez Consulting Services

SaxonJS promises “real” XSLT in the browser. Old-timers are thrilled, cool kids are showing interest, and many people are very intrigued. The architecture is still characterized by a strong distinction between logical and presentation layers, but it is now possible to program user interaction in the browser as event-driven transformation logic, using XSLT alone. The unit of composition (the “work”) now corresponds to the unit of delivery (no longer a “page” but a “resource”). Most importantly, it is now possible to build and deploy interactive web sites with XML and XSLT alone -- no Java, no JavaScript, no specialized server app or complex batch processing. But to deploy, you need a web server, a compiled XSLT stylesheet, and a certain amount of infrastructure. XML Jelly Sandwich, a starter XSLT hosted on GitHub, can provide infrastructure of sufficient quality for testing. Cool demos of TEI-tagged poetry and BITS-tagged prose meditations may help convince you to try SaxonJS.

Thursday 4:45 pm - 5:30 pm

Publishing Multiple Editions

Murray Maloney

Murray Maloney will report on an academic book project that produced multiple editions of the same book in print, ePub, and PDF, each year for four years. The book is a multi-disciplinary study of the art and science of organizing. Murray contributed and edited content, participated in the design, and coordinated production. He will talk about what they wanted to do, what they did in each successive edition, what didn’t get done, what worked, what didn’t, what they would do differently in retrospect, and what they would do the same, if they had it to do over.

Thursday 8:00 pm - 10:00 pm

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room.

Friday, August 4, 2017

Friday 8:00 am - 9:00 am

Breakfast

Join us for a light breakfast.

Friday 9:00 am - 9:45 am

Using DITA to create security configuration checklists

Joshua Lubell, National Institute of Standards and Technology

Security configuration checklists, represented using the Extensible Configuration Checklist Description Format (XCCDF), are frequently used to monitor computers and other information technology products for compliance with security policies. XCCDF syntax is not easy to author. Current practice is to maintain it with a fairly ad hoc approach to both authoring and content reuse, documented in XSLT scripts and Makefiles that contain directory dependencies. This small-scale case study investigates implementing shorthand XML vocabularies for XCCDF rules and profiles as specializations of DITA “concept”s and “map”s respectively. The representation of an XCCDF benchmark document as a specialized DITA map type makes explicit the high-level checklist structure currently implicit in the Makefiles and XSLT and could simplify the shorthand-to-XSLT transforms. In addition, DITA provides a more stable mechanism for reuse of content fragments. Preliminary results look very promising!

Friday 9:45 am - 10:30 am

Life, the universe, and CSS tests

Tony Graham, Antenna House

The W3C CSS Working Group maintains a CSS test suite already composed of more than 16,000 tests and growing constantly. Tracking the results of running such a large number of tests on a PDF formatter is more than anyone could or should want to do by hand. The system needs to track when a test's result changes so that the changes can be verified and the test's status updated. Finding differences is not the same as checking correctness. An in-house system for running the tests and tracking their results has been implemented as an eXist-db app. Is it a masterpiece of agile development, or an example of creeping featurism?

Friday 11:00 am - 11:45 am

Bridging the gap between XML and RDF validation

Kurt Cagle, Semantical LLC

Users of RDF, while having access to the vast expressive power of OWL, have not (unlike users of conventional XML applications) had a convenient way of building applications, validating documents, or constructing user interfaces. The Shapes Constraint Language, SHACL, a SPARQL-friendly validation language that bears a lot of resemblance to XSD, may at last provide builders of RDF information bases some of the conveniences that XSD users have long enjoyed. SHACL could act as a unifying bridge between the world of RDF and those of XML and JSON and thus may enable processing pipelines that involve multiple worlds.

Friday 11:45 am - 12:30 pm

Text. You keep using that word ...

C. M. Sperberg-McQueen, Black Mesa Technologies

Every data representation constitutes a data interpretation. What are SGML, XML, and other tools for descriptive markup telling us about the nature of text?