Balisage 2016 Program

Tuesday, August 2, 2016

Tuesday 8:00 am - 9:15 am

Conference Registration & Continental Breakfast

Pick up your conference badge outside the White Flint Amphitheater and join us for a light breakfast.

Tuesday 9:15 am - 9:45 am

Welcome and Introductions

Tuesday 9:45 am - 10:30 am

Representing overlapping as change in XML

Robin La Fontaine, DeltaXML [Paper] [EPUB]

Changes in an XML document may affect not only element and attribute content but, more problematically, the markup hierarchy. Markup for tracking structural changes must represent multiple, often overlapping, structures in the same document. Thus the perennial problem of overlap becomes a subset of the problem of managing change to structured documents, such as versions of documents amended over time. Our work started with a traditional delta format for two documents, which easily represents inline changes, but handles hierarchy change by duplicating content. In order to avoid duplication, we introduce a distinction between the name of the element (its tag) and the element content, so that assertions can be made separately. We then introduce @dx (change) and @dxTag (change tag) attributes to mark changes. This representation allows us to define overlapping hierarchies in a completely XML way without declaring a dominant hierarchy and while keeping element fragmentation to a minimum. While this solution probably will not scale for large numbers of variants, it shows promise for many classes of documents.

Tuesday 11:00 am - 11:45 am

The Mystical Principles of XSLT: Enlightenment through Software Visualization

Evan Lenz, Lenz Consulting Group [Paper] [EPUB]

The mature XSLT developer has an inner seeing about how a stylesheet works that can seem almost mystical to an outsider. But demystification is possible using an XSLT visualizer, making the structure of a transformation visible. Due to its functional nature, XSLT is particularly well-suited to software visualization, because an XSLT transformation can be represented and viewed as a static dataset. A subset of XSLT visualization (using a “trace-enabled” stylesheet to generate representations of transformation relationships) was used to empower non-programming staff to predict, understand, and manipulate content enrichment rules. We would like to generalize these case-specific techniques into a general tool for XSLT. There are challenges including scalability (memory usage), what to visualize and what not to, avoiding noise for the user, and whether to store annotations externally or within the result document.

Tuesday 11:45 am - 12:30 pm

Graceful tag set extension

B. Tommie Usdin, Mulberry Technologies, Inc.
Deborah A. Lapeyre, Mulberry Technologies, Inc.
Laura Randall, NCBI/NLM/NIH
Jeffrey Beck, NCBI/NLM/NIH [Paper] [EPUB]

It’s well understood that inventing a new tag set or XML vocabulary is time-consuming, complicated, and expensive. Not only must the semantics of the tag set be established, but an ecosystem of tools to support that vocabulary must be designed and built. Consequently, organizations often choose to adapt an existing vocabulary to suit their particular needs rather than starting from scratch. This decision is expected to have immediate benefits: a shorter development cycle, simpler customization, and reduced costs. Sometimes this is the case. Some changes lead to compatible documents that interoperate gracefully with existing tools. But this is not always the case. The authors explore when and how vocabulary changes will be compatible and when they will be disruptive.

Tuesday 2:00 pm - 2:45 pm

(LB) Moving toward common vocabularies and interoperable data

Todd Carpenter, NISO

The maturity and stability achieved by a standard like XML, far from being an impediment to creativity, should give users confidence in the opportunities for innovation. The extensibility that has served as an inspiration to those who wish to take ownership of their information since the days of SGML gives us the opportunity to identify more robust metadata and more unique information elements in our documents. Publishers who think towards the future have the opportunity and the duty to exploit the strengths of XML so that the added value of their markup allows their products to rise above the dull gray sea of featureless HTML5.

Tuesday 2:45 pm - 3:30 pm

(LB) Saxon-JS - XSLT 3.0 in the Browser

Debbie Lockett and Michael Kay, Saxonica [Paper] [EPUB]

Saxon-JS is an XSLT 3.0 run-time written in pure JavaScript. We've effectively split the Saxon product into its compile-time and run-time components. The compiler runs on the server, and generates an intermediate representation of the compiled and optimized stylesheet in a custom XML format. It's the same compiler whether you want to execute in the browser or on the server. Saxon-JS, running in the browser, reads in the compiled stylesheet and executes it. It has all the same event-handling machinery as Saxon-CE, so it can be used to write fully interactive applications using XSLT's declarative programming model. Because it only handles the run-time, it's much smaller than Saxon-CE, and so we've been able to add a lot of the useful XSLT 3.0 features like support for maps, arrays, try/catch, and JSON.

Tuesday 4:00 pm - 4:45 pm

Discerning the intellectual focus of annotations

Jacob Jett, Timothy W. Cole, David Dubin, & Allen H. Renear, University of Illinois at Urbana-Champaign [Paper] [EPUB]

Much attention has been given to strategies for anchoring annotations in digital documents, but very little to identifying what the annotation is actually about. We may think of annotations as being about their anchors, but that is not typically the case. Two annotations may have the same anchor, such as a string of characters, but one annotation is about the sentence represented by that string and the other about the claim being made by that sentence. Identifying targets and making this information available for computational processing would provide improved support for a variety of information management tasks. We discuss this problem and explore a possible extension to the W3C Web Annotation Data Model that would help with annotation target identification.

Tuesday 4:45 pm - 5:30 pm

The hard edges of soft hyphens in XML

Syd Bauman, Northeastern University [Paper] [EPUB]

End-of-line soft hyphens turn out to be hard. In print, end-of-line hyphens may have been introduced by the typesetter to avoid overlong lines (soft hyphens), or they may be a normal part of the word’s spelling and appear in every occurrence of the word in question, even in the middle of a line (hard hyphens). Textual editors and historians of language study end-of-line hyphenation carefully, in order to establish authorial usage and to document shifts in spelling over time. At the Women Writers Project, soft hyphens are encoded (somewhat problematically) with SOFT HYPHEN (U+00AD). Handling words that split over a line break properly so that they can be reconstituted for word lists, search results, and other products turns out to be very difficult. Handling soft hyphens correctly requires a sensitivity to neighboring nodes which is not impossible in XSLT but which doesn’t follow the usual pattern of XSLT processing.
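
As a rough illustration of the reconstitution problem described above, here is a minimal plain-text sketch in Python. It is not the Women Writers Project's method and it sidesteps the hard part (in XML the two halves of a word may sit in different nodes); the function name and sample string are hypothetical. It only shows the basic rule: a SOFT HYPHEN (U+00AD) before a line break is dropped and the word rejoined, while an ordinary hyphen is left alone.

```python
import re

SOFT_HYPHEN = "\u00ad"  # SOFT HYPHEN, as named in the abstract


def rejoin_soft_hyphens(text: str) -> str:
    """Rejoin words that were split across a line break by a soft hyphen.

    Matches: soft hyphen, optional trailing spaces, a line break,
    optional leading indentation. Hard hyphens ("-") are not touched.
    """
    return re.sub(SOFT_HYPHEN + r"[ \t]*\r?\n[ \t]*", "", text)


# Hypothetical sample: one soft-hyphenated word, one hard-hyphenated word.
sample = "a remark\u00ad\nable well-\nknown book"
print(repr(rejoin_soft_hyphens(sample)))
```

The hard hyphen in "well-known" survives together with its line break, which is why word lists and search indexes need a further decision about how to treat such cases.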

Tuesday 6:30 pm -

Related Event: 30 Years of Standard Generalized Markup Language

SGML became ISO 8879 in 1986. Come celebrate this important anniversary. A description is available at SGML Birthday Party; reservations are made through the Birthday Party Page. Balisage attendees are welcome and are requested to sign up in advance. Attendance is free.

Wednesday, August 3, 2016

Wednesday 8:00 am - 9:00 am

Conference Registration & Continental Breakfast

Pick up your conference badge outside the White Flint Amphitheater and join us for a light breakfast.

Wednesday 9:00 am - 9:45 am

FOXpath: an expression language for selecting files and folders

Hans-Jürgen Rennau, Traveltainment GmbH [Paper] [EPUB]

FOXpath (short for folder XPath) is a new expression language which enables XPath-like addressing of files and folders in a file system. The first version of the language can be thought of as a modified copy of XPath 3.0, with node navigation removed and file system navigation added; the language’s data model is XDM 3.0, without modifications. A second more powerful version of the language was created by merging the first version back into XPath 3.0. The result is FOXpath 3.0, a superset of XPath 3.0. FOXpath 3.0 supports node navigation, file system navigation, and a free combination of both functionalities within a single path expression. A reference implementation exists, and there are opportunities for extending the new functionality beyond file systems.

Wednesday 9:45 am - 10:30 am

focheck XSL-FO validation framework

Tony Graham, Antenna House [Paper] [EPUB]

XSL-FO documents are typically generated by an XSLT transform and thus rarely edited by hand. However, validating XSL-FO markup can be a useful check of the correctness of the transformation. The focheck Validation Framework is an open source project that bundles a RelaxNG Schema for validating the structure of FO files with Schematron rules for validating property expressions. Validation uses an expression parser generated as an XSLT program by the REx parser generator; Schematron rules are mostly autogenerated, using an XSLT stylesheet to extract property value definitions from the XML version of the XSL 1.1 Recommendation. Since its introduction in 2015, focheck has added extension formatting objects, properties specific to Antenna House Formatter V6.3, and some Schematron Quick Fixes (SQF). Both GitHub and Oxygen framework versions of focheck are available. See what it can do!

Wednesday 11:00 am - 11:45 am

Hidden markup – The digital work environment of the “Digital Dictionary of Surnames in Germany”

Franziska Horn, Technische Universität Darmstadt
Sandra Denzer, Technische Universität Darmstadt
Jörg Hambuch, Academy of Sciences and Literature Mainz [Paper] [EPUB]

What’s in a name? A lot more than you might think! Besides the linguistic, historical and sociological significance of personal names, just getting them into a dictionary and database presents many technological challenges. When the editors of such a dictionary are only incidentally computer users, creating a rich TEI-based infrastructure requires building a good user interface while disguising XSLT, databases, and other favorite tools of the XML literate. The editing interface depends on the customization of a common XML editor, while the remainder of the system is created using other readily available tools.

Wednesday 11:45 am - 12:30 pm

Trials of the Late Roman Republic: Providing XML infrastructure on a shoe-string for a distributed academic project

C. M. Sperberg-McQueen, Black Mesa Technologies [Paper] [EPUB]

We associate Ancient Rome with the Rule of Law, but actual legal records from the Republic are scarce. What we do know about Roman trials, compiled from oratory, history, and other indirect sources, was summarized in book form almost a generation ago. Although this book was produced on computers, its markup was typographic and sparse. Bringing the book into the modern XML world so that it can be edited and revised by historians with minimal computing experience, and queried for future uses requires creative strategies in devising markup and processing the results with XSLT and other tools. The project needs both an editing interface and a query interface, and both need to disguise the markup. The interface currently proposed depends on XForms, with such things as more XSLT and an XQuery engine running in the background. The project is very much a work in progress, but its development has potential interest for practitioners in other fields of XML application.

Wednesday 2:00 pm - 2:45 pm

Framing the problem: Building customized editing environments and workflows

Wendell Piez, Piez Consulting Services [Paper] [EPUB]

The maturity of the XML technology stack has made it easier now than it has ever been to construct whole functional operating environments for an application, fit to purpose, with tools made to take advantage of the XML. How do we exploit this moment? In designing and building such workflows and environments, what considerations regarding user needs, end products, interim work states, validation needs, and constraints on the system need to be taken into account? How can we best take advantage of existing libraries, tools, specifications, and platforms? And how can we achieve and communicate a clear understanding of the framework we are constructing, as distinct from the tools, tool libraries, platforms and specifications which support it and help realize it?

Wednesday 2:45 pm - 3:30 pm



What do you wish was better? If you could improve just one thing about the markup ecosystem, what would it be? Is someone on the internet wrong? We invite ten minute presentations on a topic of your choice. Slides, if you want them, must be provided before the lunch break. The limit of ten minutes will be strictly enforced - and five minute presentations are welcomed. Provide a title and a one sentence summary by close-of-business Tuesday. In the event of overlap, er, overflow, a subset of speakers will be selected.

Wednesday 4:00 pm - 4:45 pm

Print quality eXchange: Applying an agile development methodology to XML schema construction

Dianne Kennedy, IDEAlliance [Paper] [EPUB]

To assure brand integrity, corporations such as Coca-Cola and Procter & Gamble receive, score, and track the quality of their print suppliers over time. Historically, this information has used proprietary, incompatible formats. In 2015, Idealliance launched an effort to develop an XML-based print quality exchange message format (Print Quality eXchange) for use by brand scoring and tracking systems. We developed the XSD schema for PQX using agile software development techniques with two Webex working sessions a week for 44 iterations. The iterative agile process was faster, made use of an online collaborative website, and enabled print quality reporting products to commercialize PQX immediately following publication.

Wednesday 4:45 pm - 5:30 pm

Tracking documents (and toys)

Ari Nordström [Paper] [EPUB]

What is it? There is hardly a more fundamental question. But a fundamental answer can be remarkably elusive. This is especially the case in systems where documents, for example, may change over time and be published in several languages. Does a child’s “box of toys” remain the same box of toys even as the years pass and the particular toys in the box change and become more sophisticated? Are there a few key characteristics that suffice to answer the fundamental question: what is it? Do a base identifier, a version, and some notion of a rendition (a particular language encoding, for example) suffice? If they do not, then how can we uncover a better answer?

Wednesday 8:00 pm - 10:00 pm

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room.

Thursday, August 4, 2016

Thursday 8:00 am - 9:00 am

Conference Registration & Continental Breakfast

Pick up your conference badge outside the White Flint Amphitheater and join us for a light breakfast.

Thursday 9:00 am - 9:45 am

Integrating top-down and bottom-up cybersecurity guidance using XML

Joshua Lubell, National Institute of Standards and Technology [Paper] [EPUB]

The Cybersecurity Framework provides top-down guidance for managing and reducing cybersecurity risk; the NIST SP 800-53 security control catalog provides a bottom-up list of specific security measures. Relating an organization’s cybersecurity requirements to both of these disparate sources can be a challenge, but an XML-based approach can help. The approach described here has been implemented using XForms and XSLT; the implementation makes it easier to use two complementary, but differently structured, guidance specifications together. An example scenario demonstrates how the software implementation can help a security professional select the appropriate controls for restricting unauthorized access to an Industrial Control System. The implementation and example show the benefits of the XML-based approach and suggest its potential application to disciplines other than cybersecurity.

Thursday 9:45 am - 10:30 am

From GitHub to GitHub with XProc: An approach to automate documentation for an open source project with XProc and the GitHub Web API

Martin Kraetke, le-tex publishing services [Paper] [EPUB]

For all too many developers, documentation comes last. The move from Subversion to Git provided the impetus to rethink storage of both code and documentation. It also provided an opportunity to use XSLT to harvest input for documentation from the code itself: XProc pipelines, XSLT transforms, XML Catalogs, and other sources. The process also allows the integration of human-generated documentation, such as tutorials written in DocBook. Final documentation is generated by more XSLT, which generates HTML pages which are committed with the source code to the GitHub repository.

Thursday 11:00 am - 11:45 am

Extracting funder and grant metadata from journal articles; Using language analysis to automatically extract metadata

Mark Gross, Tammy Bilitzky, & Richard Thorne, Data Conversion Laboratory [Paper] [EPUB]

Scientific, Technical, and Medical (STM) journal articles are an explosively growing data source with increasing requirements for complex metadata. It has become critical that publishers know the sources of funding for journal articles so that they can meet the associated open source distribution obligations and so that funding organizations can track publications associated with their funding. This cannot be managed by hand. Over the past few years, we have been improving techniques to extract specific kinds of metadata automatically from scientific and legal documents. We report on recent work done to extract funding source and grant information from within STM journal articles and on implications for other kinds of metadata gathering. These textual analysis techniques and this methodology are applicable to other metadata extraction in contexts such as legal documents.

Thursday 11:45 am - 12:30 pm

XML in the air: changing workflows with TTML (Timed Text Markup Language)

Andreas Tai, Institut für Rundfunktechnik, Munich [Paper] [EPUB]

Although broadcast TV subtitles are well established, digital production workflows are changing. Increasingly, the internet is a primary channel for distribution. Audio and video standards have already adapted, but changes to broadcast subtitle workflows are only just beginning. Timed Text Markup Language (TTML) is the leading contender to replace legacy subtitle file and transmission formats for digital and hybrid broadcasting. TTML is a format for authoring, exchange, and presentation of subtitles. Used at different stages in the workflow, TTML addresses some, but not all, of the current problems in media distribution. We examine how TTML succeeds and where it falls short. We view each shortcoming as an opportunity for further advancement. Whether it’s a question of adapting TTML to non-XML environments or encouraging broader use of XML technologies in new areas, there is much to learn from these efforts.
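
To make the format concrete, the following is a minimal sketch in Python that reads a tiny TTML document and lists its cues. The sample document and the function name are made up for illustration; the sketch assumes only the TTML namespace `http://www.w3.org/ns/ttml` and `p` elements carrying `begin` and `end` attributes, and it ignores styling, regions, and nested timing.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"  # TTML namespace

# Hypothetical two-cue subtitle document, for illustration only.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Good evening.</p>
      <p begin="00:00:04.000" end="00:00:06.000">Here is the news.</p>
    </div>
  </body>
</tt>"""


def list_subtitles(ttml_text):
    """Return (begin, end, text) tuples for every <p> in document order."""
    root = ET.fromstring(ttml_text)
    cues = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        # itertext() also collects text inside inline children such as <span>.
        text = "".join(p.itertext()).strip()
        cues.append((p.get("begin"), p.get("end"), text))
    return cues


for begin, end, text in list_subtitles(SAMPLE):
    print(begin, end, text)
```

Because TTML is ordinary namespaced XML, a generic XML parser is enough for this kind of inspection; the workflow questions raised in the abstract begin where such generic tooling is not available.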

Thursday 1:15 pm - 2:00 pm (during lunch)

Balisage Bard: The markup verse and lyrics game

Lynne Price, Gamemaster

Exercise your literary creativity with poems, short stories, jokes, and songs. Subject matter must be related to Balisage (markup, venue, papers, and so forth). Read your effort during the game session. Translations of works in languages other than English are not required but will be appreciated. There is a two-minute time limit. Submissions pertaining to SGML are particularly welcome in honor of this year's 30th anniversary of publication of the SGML standard. As many submissions as time permits will be taken; authors will be called in the order submissions are received. To enter, send email to [email protected] or sign up at the conference desk. A panel of judges will select the winner. Results will be announced at the closing session on Friday.

Thursday 2:00 pm - 2:45 pm

SGML in the age of XML

Betty Harvey, Electronic Commerce Connection, Inc. [Paper] [EPUB]

Today (2016!), there are organizations, especially in the military, who have SGML documents and/or requirements to meet SGML-based specifications. Given the unfashionability of SGML and the shrinking availability of SGML tools and SGML expertise, these organizations face significant challenges. How can they best approach the task of working with existing SGML document collections? What about a requirement to create SGML that will integrate cleanly into existing SGML document collections to be processed with existing SGML tools? What questions should someone facing an SGML requirement ask? What resources are they going to need? How much can they do with XML infrastructure to meet SGML requirements and where must they “cut over” to SGML? How should they make SGML if they really need to? How can they leverage XML tools while maintaining SGML source requirements?

Thursday 2:45 pm - 3:30 pm

Marking up and marking down

Norman Walsh [Paper] [EPUB]

Markup provides a means of annotating a text such that its important characteristics are readily apparent. Simplicity of annotation and richness of meaning are often at odds. Through one lens, we can see the evolution of markup as developing along this fault line. TANSTAAFL. SGML provided mechanisms that reduced the complexity of annotation at considerable cost in implementation. XML reduced implementation cost at the expense of simplicity in annotation. HTML attempted to simplify annotation complexity and implementation cost by choosing a single tag set and inventing entirely new extension mechanisms. Online communities like GitHub and Stack Overflow have abandoned angle brackets in favor of Markdown, CommonMark, AsciiDoc, and other markup reminiscent of wiki syntax or SGML SHORTREF. Why am I in this basket and where are we going?

Thursday 4:00 pm - 4:45 pm

(LB) A MicroXPath for MicroXML (AKA A New, Simpler Way of Looking at XML Data Content)

Uche Ogbuji, Zepheira [Paper] [EPUB]

There has always been tension in the development and in the community reception of the XML stack of technologies, across the many poles of technology and philological interests which have been attracted by XML's success. For some, XML was too spare for rich data applications, and needed additional support from schema systems and sophisticated query systems, culminating in XSLT 3.0 and XQuery. Others craved greater and greater simplicity, allergic to any constructs complicating things too far beyond the basis of elements, attributes and text. This latter camp came together in 2012 to create a MicroXML specification. MicroXML is a radical simplification, stripping away namespaces, syntactic quirks such as CDATA Sections, the various trappings of DTDs, and much more. Nevertheless, there is need for systems to process it, and these systems start with a basic data model of the MicroXML, and a basic language for processing documents using that data model. In other words, MicroXML needs an XPath.

Thursday 4:45 pm - 5:30 pm

(LB) A catalog of Functional programming idioms in XQuery 3.1

James Fuller, MarkLogic

A wild rummage through the treasure chests of the highest towers and deepest vaults of Functional Programming, revealing functional idioms in XQuery 3.1 form: memoisation, partitioning, currying, numerical methods, algebraic data types, grammar-based parsing, constraint programming and more. Each idiom is presented as an answer to a common programming problem, provided as a handy reference catalog. The exploration will end with a few examples of how easily functional idioms compose to solve more complex problems.
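
For readers unfamiliar with the idioms named above, here is a small sketch of three of them (memoisation, currying, and composition). It is written in Python as a stand-in, not in XQuery 3.1, and it is not taken from the talk's catalog; the helper names `curry2` and `compose` are hypothetical.

```python
from functools import lru_cache, reduce


# Memoisation: cache results so each fib(n) is computed only once.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# Currying: turn a two-argument function into a chain of one-argument ones.
def curry2(f):
    return lambda a: lambda b: f(a, b)


# Composition: compose(f, g)(x) == f(g(x)), applied right to left.
def compose(*funcs):
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


add = curry2(lambda a, b: a + b)
add_ten = add(10)                      # partially applied adder
double = lambda x: x * 2
add_ten_then_double = compose(double, add_ten)

print(fib(30), add_ten(5), add_ten_then_double(5))
```

The last line hints at the abstract's closing point: small idioms such as a curried adder and a composition helper combine into larger behaviour without any new machinery.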

Thursday 8:00 pm - 10:00 pm

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room.

Friday, August 5, 2016

Friday 8:00 am - 9:00 am

Continental Breakfast

Join us for a light breakfast.

Friday 9:00 am - 9:45 am

The ugly duckling no more: Formatting DITA without DITA-OT

Autumn Cuellar & Jason Aiken, Quark Software [Paper] [EPUB]

DITA is a popular document standard across a range of industries. As it grows in popularity outside the scope of technical publications, businesses become more insistent that output from DITA should conform to branding imposed across the organization. In this context, formatting documents using DITA-OT may be insufficient to meet document production requirements. We discuss some of the ways in which PDF can be generated and explore using page layout software to design complex, visually rich templates for DITA and other XML document formats.

Friday 9:45 am - 10:30 am

Approximate CSS Styling in XSLT

John Lumley, jωL Research & Saxonica [Paper] [EPUB]

Many readers will be familiar with the task of using XSLT to generate content (HTML, for example, or SVG) that will subsequently be styled with CSS. Often, this separation of structure and presentation is useful and desirable. But that is not always the case: consider a publication that for business process reasons needs a more fixed presentation, or the application of formatting in environments, such as XSL Formatting Objects, where the individual properties are applicable but the external mechanisms of CSS are not. In those environments it may be necessary to approximate the CSS styling results through other, more direct means. Transforming a CSS stylesheet into an XSLT transform that projects an approximation of the styling from the CSS onto a target XML document affords one such approach to this problem.

Friday 11:00 am - 11:45 am

Web-Based job aids using RESTXQ and JavaScript for authors of XML documents

Amanda Galtman, MathWorks [Paper] [EPUB]

Publishers have known for years that structured markup languages can facilitate processing of documents into final deliverables. However, making the markup palatable to authors is a big challenge. In two browser-based applications that take advantage of the document XML, authors have come to see XML as an ally and not just a necessary evil. Compared to desktop tools, the web applications are more convenient for users and less affected by hard-to-predict inconsistencies among users’ computers. One application analyzes file dependencies and produces custom reports that facilitate reorganizing files. The other helps authors visualize their network of topics in their documentation sets. Both applications rely on XQuery and the RESTXQ web API. The visualization application also uses JavaScript, including jQuery and the D3 libraries.

Friday 11:45 am - 12:30 pm

Creatures of a season

C. M. Sperberg-McQueen, Black Mesa Technologies

Thoughts about permanence, longevity, and transience.