Monday, July 27, 2020

Monday 10:00 10:30 EDT

Welcome to Balisage 2020

B. Tommie Usdin Mulberry Technologies

Balisage 2020 is both totally new and comfortably familiar. Balisage regulars will recognize many of this year’s presenters and welcome some new points of view on familiar topics. Logistically, technologically, we are on a new path. As markup designers, theorists, and practitioners, we are used to tiptoeing near the edge from time to time. I was saddened when I had to admit that Balisage-as-usual could not happen in 2020. I was delighted when it became clear that, because we are now virtual, many old friends will be able to rejoin us this year, and I hope that this new format will let us welcome some newcomers to the Balisage community.

Monday 11:00 11:30 EDT

High-Quality Microsoft Word documents from XML: The Wordinator

Eliot Kimber Contrext, LLC

Many products make XML from Microsoft Word, but consider the reverse: making Word versions of your XML documents, thus using MS Word as a document composition engine. The Wordinator enables automatic creation of high-quality Word documents from XML source. It uses an extension of the Word2DITA project’s SimpleWP (Simple Word Processing markup language) as the input to an Apache POI-based Java application that generates Word documents. XSLT generates the SimpleWP XML, managing the mapping of source XML elements to Word constructs and styles. I consider, in particular, the separation of concerns between the XSLT that generates the SimpleWP XML and the Java code that generates the Word documents.

Monday 12:00 12:30 EDT

XSLT 3.0 on ordinary prose

Norman Tovey-Walsh

You work with text and documents for a living, and XSLT 3.0 comes out. You hear it’s great and really want to try it, so you read about some features (streaming, maps, arrays, higher order functions) and when you look at some applications, you first think “that’s for data not text”. But maybe 3.0 is for you too, really. Using DocBook as a prototypical text-application, I will demonstrate why XSLT 3.0 solutions are just better and easier than anything that’s been possible before. (Examples to include: CALS table processing, image sizing support, callouts, and structuring code for easy extensibility.)

Monday 14:00 14:30 EDT

Toward a function library for statistical plotting with XSLT and SVG

David J. Birnbaum Department of Slavic Languages and Literatures, University of Pittsburgh

Visualizing quantitative information is not always about economics and finance. Research in computational textual humanities often uses descriptive statistics and graphic visualization to communicate quantitative information about textual objects. There is no standardized function library in XSLT or XQuery comparable to the ones available in JavaScript, Python, and many statistical packages. So here is the beginning of such a library. I describe my assumptions and design principles then illustrate a few functions. (Statistical terms will be explained, at least in brief!)

Monday 15:00 15:30 EDT

A new \u: Extending XPath regular expressions for Unicode

Joel Kalvesmaki

XPath regular expressions are richly equipped to handle Unicode, from specific code points (e.g., for a tab) to blocks (e.g., \p{IsArrows} for all arrows U+2190...21FF) to entire classes (e.g., \p{L}+ for one or more letters, from any alphabet). But why stop there? What if you want all variations on the letter b, or any character that has a tilde? For that, you need a new \u!
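The idea of matching "all variations on the letter b" or "any character that has a tilde" can be approximated with Unicode canonical decomposition. The following is a minimal Python sketch of that idea only, not the proposed \u syntax or the speaker's implementation; the function names `base_letter`, `has_mark`, and `variants_of` are hypothetical. It also misses characters that have no canonical decomposition (for example, letters with hooks or strokes).

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # U+0303 COMBINING TILDE


def base_letter(ch: str) -> str:
    """Return the first character of ch's canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch)[0]


def has_mark(ch: str, mark: str) -> bool:
    """True if the decomposition of ch contains the given combining mark."""
    return mark in unicodedata.normalize("NFD", ch)[1:]


def variants_of(base: str, candidates: str) -> list:
    """Keep the candidate characters whose decomposition starts with base."""
    return [c for c in candidates if base_letter(c) == base]


# b, b with dot above, b with dot below, b with line below, and an unrelated c
sample = "b\u1e03\u1e05\u1e07c"
b_variants = variants_of("b", sample)

# n with tilde, a with tilde, plain n
tilde_flags = [has_mark(c, COMBINING_TILDE) for c in "\u00f1\u00e3n"]
```

Here `b_variants` keeps the four b-based characters and drops `c`, while `tilde_flags` is true only for the two characters that decompose to a base letter plus a combining tilde.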

Monday 16:00 16:30 EDT

Ontologies as Filters: Utilizing ontologies as a mechanism for abstracting data collection during a crisis (LB)

Kurt Cagle Semantical LLC

The Covid-19 pandemic, in addition to affecting public health, has highlighted the challenges involved in attempting to collect, curate, and disseminate data sets throughout the US, the EU, and the UK. Wildly divergent datasets require an incredible amount of effort both to harmonize the contents and to comply with privacy legislation such as the GDPR. A potential federated solution might be built around knowledge graphs that could act as both data catalogs and harmonizing proxies. Such an application could be launched quickly and could work across multiple open source and commercial platforms. This talk is intended to discuss that process and to illustrate how it applies to the markup world as well when dealing with narrative information.

Tuesday, July 28, 2020

Tuesday 10:00 10:30 EDT

Four basic building principles (patterns) for XML schemas

Anne Brüggemann-Klein Technical University of Munich (TUM)

Practitioners have long identified four distinct patterns for construction of XSD schemas, known by the picturesque names “Salami Slice”, “Venetian Blind”, “Russian Doll”, and “Garden of Eden”, and based on two binary choices: Are all the element declarations global, or (apart from the intended document root) local? Are all the type definitions global, or (apart from the built-in types) local? Informal discussions often focus on the effect of pattern choice for schema re-use, encapsulation, coupling, and cohesion. But a more formal approach is needed to determine whether choice of pattern affects the languages we can define with the schemas we can write. Do all four patterns have the same expressive power? Or are some capable of defining things not expressible in the others?
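The two binary choices map onto the four pattern names as conventionally described in XSD design guides. A small Python sketch of that mapping follows; the function name `schema_pattern` and its parameters are hypothetical and are not drawn from the talk.

```python
def schema_pattern(elements_global: bool, types_global: bool) -> str:
    """Name the XSD design pattern for a pair of design choices.

    elements_global: element declarations are global (True) or, apart
                     from the document root, local (False).
    types_global:    type definitions are global (True) or, apart from
                     the built-in types, local (False).
    """
    names = {
        (False, False): "Russian Doll",
        (True, False): "Salami Slice",
        (False, True): "Venetian Blind",
        (True, True): "Garden of Eden",
    }
    return names[(elements_global, types_global)]


# Enumerate all four combinations of the two choices.
all_patterns = {
    (e, t): schema_pattern(e, t)
    for e in (False, True)
    for t in (False, True)
}
```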

Tuesday 11:00 11:30 EDT

Disabled by default

Bethan Tovey-Walsh

Accessibility is not a single, straightforward concept. For a particular user, the accessibility of any resource is determined by a web of factors, first by the nature and severity of the disability (cognitive, physical, or mental), then influenced by poverty, tech access, language, and many other factors. Designing content to take account of accessibility on this wide scale is a daunting task. Markup is well placed to address accessibility, because markup is optimized to encourage choice. It allows us to say what things are, and choose later (or, better, allow the user to choose) what that means for how content is displayed, printed, spoken, or otherwise manifested in the output. It also allows us to say how things relate to each other, so that we can easily offer choices of the same content in different formats. Concrete examples of what this means in some common markup outputs will highlight things we could be doing in our own practice to encourage more accessible content creation from markup.

Tuesday 12:00 12:30 EDT

Balisage website accessibility demonstration

Liam Quin Delightful Computing

Making your website more accessible does not need to be a huge, production-altering migration. Even if your site is old and crufty, and you cannot completely kill that 1980’s web design, there are minor changes and accessibility-friendly tweaks that can make a major accessibility difference. On a page-by-page, feature-by-feature basis, we will take a look at part of the previous Balisage Proceedings website, then look at the new, improved website for the same page/feature/function, and discuss how they differ, why the new site is improved, and how we got there (including some of the dead-ends and trade-offs made along the way).

Tuesday 14:00 14:30 EDT

Text encoding and processing as a university writing intensive course

Elisa E. Beshero-Bondar Penn State Erie, the Behrend College

Why should writing and coding be seen as opposing activities? Both depend foundationally on analytical processes, and learning to code is akin to learning a foreign language. Whether students of coding or literature are applying an existing system such as TEI or developing their own tools, they must think intensively about their corpus of data, and that analysis should be reflected back into their own writing, as well as their understanding of literature. Tagging poetry, for example, requires decisions about different kinds of information like names, dates, people, and places, as well as patterns such as images, motifs, and rhyme. For the students engaged in the process, not only is writing intensive in the moment of application, but it intensifies over time with learning new ways of doing things and building on the model of previous projects.

Tuesday 15:00 15:30 EDT

Saxon-JS meets XSpec Unit Testing: Building high quality into your web app

Amanda Galtman MathWorks

With Saxon-JS, you can create web applications that run XSLT code in a web browser. With unit testing, you can develop and maintain high quality software. And the XSpec tool provides unit testing for XSLT transforms. What could be a better match! Well, yes, there are some challenges: XSpec running with Saxon-EE cannot access the web browser, the DOM, or the JavaScript processor that influences your web application. Nor does XSpec natively understand the interactive XSLT features of Saxon-JS. We found two ways of bridging these gaps: we can mock up the parts of the Saxon-JS operation that XSpec cannot natively access, or we can run XSpec tests directly in the browser using Saxon-JS. Each approach has pros and cons; we discuss how we chose between them in a specific project involving user documentation for a software API.

Tuesday 16:00 16:30 EDT

An XML infrastructure for spell checking with custom dictionaries

C. M. Sperberg-McQueen Black Mesa Technologies LLC

Spell checking has been available in desktop tools for a long time, but dealing with checking in XML documents presents problems that are less likely to be found in conventional text editors, beginning with the presence of markup interleaved with the text to be checked. Many XML sources present additional problems because they are transcriptions of documents with multiple or archaic languages. Nonetheless, with an appropriate theory of language, it is possible to build a simple tool, perhaps with an XForm interface, for interactive checking of XML documents, even outside the scope of other XML processing.

Wednesday, July 29, 2020

Wednesday 10:00 10:30 EDT

Cooking up something new: An XML and XSLT experiment with recipe data

Peter Flynn

Everyone who cooks knows the recipe for a recipe: title, list of ingredients, method of preparation. While the pattern may be common knowledge, published recipes are often full of errors. Could XML help the cook or the cookbook writer? Armed with analysis (and a resulting schema), a creator of recipes could avoid numerous common errors and publish consistent recipes. What is needed, then, are tools that don’t require the creator to learn XML. Both the analysis and suggestions for tools are available for those who knead them.

Wednesday 11:00 11:30 EDT

Experiences from declarative markup to improve the accessibility of HTML

Vincenzo Rubano & Fabio Vitali Department of Computer Science and Engineering, University of Bologna

Producing accessible content for the Web is largely (if we go by the focus of most existing recommendations) a matter of adding reasonably simple markup with a clear declarative meaning to documents. How then can it be that producing really accessible content happens so rarely? One reason is that normally-abled designers have serious difficulty perceiving the difference between correct and incorrect assistive markup, and existing tools do little to reduce this handicap. We believe that better results can be achieved by using a framework of accessible web components capable of enforcing best practices, automated tools for checking accessibility, and a new approach to manual tools to let developers and content creators examine visually the accessibility markup so that they can make sense of its impact on people with disabilities.

Wednesday 12:00 12:30 EDT

Creating class attributes with XSLT? (LB)

Gerrit Imsieke le-tex

When going from one XML application, such as JATS, to another, perhaps HTML, the stylesheet author may not be able to follow any simple pattern of translation. Attributes that may be useful in the target may not be present in the source, or what is an attribute in the target may be an element in the source. That which is represented as a class attribute in the source might become a style attribute or even a wrapper element in the target. What conventions should the writer employ to ensure consistency, especially when the stylesheet may be combined with others, particularly in a layered application? This presentation looks at the implications of stylesheets not only for JATS but also for TEI and DocBook.

Wednesday 14:00 14:30 EDT

Accessibility metadata statements

Madeleine Rothberg The National Center for Accessible Media at WGBH

Accessibility metadata statements let publishers describe the accessibility features of their publications and make conformance claims. Metadata properties listed in Schema.org enable accessibility statements on a Web page that describes a publication. They also allow statements to be embedded in a packaged publication (such as an EPUB). Accessibility statements can describe, for example: what kinds of media are used; which accessibility features are included (image descriptions, math markup, video captions, etc.) and more. Hazards (flashing images that could cause seizures) can be noted or the absence of hazards confirmed. EPUB has some additional metadata terms important to publications, including conformance statements with credentials (as defined in “EPUB Accessibility 1.0” [https://www.w3.org/Submission/epub-a11y/]). I will examine both basic and complex accessibility metadata statements and offer resources for learning how to implement them.
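As a rough illustration of a basic statement, the Python sketch below assembles Schema.org accessibility properties into JSON-LD that could be embedded in a Web page describing a publication. The property names (`accessMode`, `accessibilityFeature`, `accessibilityHazard`, `accessibilitySummary`) are Schema.org terms; the book title, the chosen values, and the summary text are invented for the example and are not taken from the talk.

```python
import json

# Hypothetical publication; only the property names come from Schema.org.
statement = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Example Textbook",  # invented title
    # What kinds of media are used
    "accessMode": ["textual", "visual"],
    # Which accessibility features are included
    "accessibilityFeature": ["alternativeText", "MathML", "captions"],
    # Confirm the absence of a hazard rather than leaving it unstated
    "accessibilityHazard": ["noFlashingHazard"],
    # Free-text summary for human readers
    "accessibilitySummary": "Images have text alternatives; math is marked up in MathML.",
}

# Serialize for embedding in a <script type="application/ld+json"> element.
serialized = json.dumps(statement, indent=2)
```

An EPUB package would carry comparable statements as metadata entries in its package document rather than as JSON-LD.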

Wednesday 15:00 15:30 EDT

A document-based view of the Risk Management Framework

Joshua Lubell NIST

Cybersecurity professionals know the Risk Management Framework as a rigorous yet flexible process for managing security risk. But the RMF lacks a document focus, even though much of the process requires authoring, reviewing, revising, and accessing plans and reports. It is possible to build such a focus by looking more closely at these documents, starting with the System Security Plan and the roles of key participants responsible for it. Such a document- and role-centric view of the RMF process can lead the way toward more efficient and less error-prone security assurance.

Wednesday 16:00 16:30 EDT

Converting typesetting codes to structured XML

Patrick Andries XCential Corporation

Lauren Wood XCential Corporation and Textuality Services

Before XML, the United States Government Printing Office (GPO) created complex typography using non-hierarchical, line-based typesetting systems characterized by “locator” files which contain lines of typesetting instructions. Our mission is to convert years of locator files that describe US government bills, laws, and statutes (etc.) into structural XML, valid to the United States Legislative Markup (USLM) XML Schema. This was and is complicated, as locator files, in addition to being completely presentation-focused, use stylistic differences to communicate semantic significance. Our iterative analysis grew the mapping specification in stages. The conversion is in two parts. First, Java converts the locator files into hierarchical XML (the Java lexical, syntactical, decomposition, and generational phases are the focus of this paper). Then XSLT improves the resulting XML. Quality control and testing required additional programming and the creation and maintenance of a large set of reference samples.

Thursday, July 30, 2020

Thursday 10:00 10:30 EDT

Marking up microrevisions with major implications: Textual genetic editing in TAG

Elli Bleeker, Bram Buitendijk, & Ronald Haentjens Dekker Research and Development Group, Royal Dutch Academy of Arts and Sciences

Micro-level textual variation is an important fact of life for textual critics; we discuss how it can be expressed idiomatically in markup and how the markup can be used by digital collation tools to achieve a more refined analysis of the textual variation than is possible when the collation tool ignores the markup. Using Virginia Woolf’s manuscript drafts of To the Lighthouse (1927) as a case study, we show that the deletions, insertions, and rewritings which express the author’s struggle with her materials constitute non-linear, discontinuous, and often multi-hierarchical information structures which are easily represented in the “Text As Graph” (TAG) hypergraph data structure, readily expressed using the markup of TAGML, and usefully exploited by the TAG-aware collation tool Hypercollate. Micro-level revisions do not need to be a special case handled with ad-hoc extensions to our markup infrastructure: we can have an infrastructure that handles them naturally and allows us to use them comfortably to improve our understanding of authors, and their works, and their textual revisions.

Thursday 11:00 11:30 EDT

How Suite it is: Declarative XForms Submission Testing (LB)

Steven Pemberton CWI

The original test suite for XForms 1.1 required considerable manual intervention to run. As a part of the XForms 2.0 effort, a new test suite is being designed and built that tests features by introspection, without user intervention, so that the XForm itself can report if it has passed or not. This talk will give an overview of how the test suite works and in particular discuss the issues involved with submission to the server, how to deal with aspects of the HTTP protocol that were designed before XML and XForms were created, and how you go about introspecting something that has left the client before you can cast your eyes on it.

Thursday 12:00 12:30 EDT

Document similarity: Transcription, edit distances, vocabulary overlap, and the metaphysics of documents

Claus Huitfeldt University of Bergen

C. M. Sperberg-McQueen Black Mesa Technologies LLC

There are many contexts in which the similarity of documents is a critical point of concern. In textual criticism, for example, understanding the faithfulness of a transcription to the exact text of a particular document is key. Or, in a more mundane example, a user may wish to understand the similarities of doc.docx, doc_final.docx, doc_final_v2.docx, and doc.html! There are many measures of the “distance” between sets of documents, but a woeful absence of such measures that consider the richer structure of marked-up documents. Could we do better?
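Two of the conventional measures named in the title, edit distance and vocabulary overlap, can be sketched in a few lines of Python. This is a generic illustration (Levenshtein distance over characters and Jaccard overlap over whitespace-separated words), not the measures the authors propose; the function names are hypothetical. Both treat a document as flat text and ignore markup structure, which is the gap the abstract points to.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def vocabulary_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the lower-cased word sets of two texts."""
    va, vb = set(a.lower().split()), set(b.lower().split())
    if not va and not vb:
        return 1.0  # two empty vocabularies are treated as identical
    return len(va & vb) / len(va | vb)


distance = edit_distance("kitten", "sitting")
overlap = vocabulary_overlap("the cat sat", "the cat ran")
```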

Thursday 14:00 14:30 EDT

What is a diagram, really?

Steven DeRose Docugami

One rationale sometimes offered for the use of XML to represent text documents is that text documents consist by nature of ordered hierarchies of content objects (OHCO). But XML is used with success for much more than text documents: math, music, vector graphics, and more. When XML is used for diagrams, does it fit their inner nature? What is the inner nature of a diagram? Could it be that it consists of objects whose type is determined by their content? Could it be that those objects can have a hierarchical organization? Could it be that even in a two-dimensional space they are ordered? Could diagrams be…ordered hierarchies of content objects?

Thursday 15:00 15:30 EDT

Syntax-From-Doc: A case study of powering IDE code completion from XML documentation

C. Edward Porter SAS

Syntax highlighting is a profoundly useful feature for developers. To make it work, you must produce a concise and accurate description of the syntax of your language. For a rich language with a long, organic history, this can be a significant challenge. SAS engaged in a multi-year project using multiple markup languages to produce a scalable, automated solution: building the syntax descriptions directly from the language documentation.

Thursday 16:00 16:30 EDT

Workflows for white hats: Challenges and potentials of declarative markup for systems security assurance

Wendell Piez

Declarative markup (not just XML but also its predecessors and near neighbors) has succeeded in many kinds of information management tasks. Most especially, it has proven to be broadly applicable in technical publishing. The needs of the Open Security Controls Assessment Language (OSCAL) do not appear, on the surface at least, to be similar to publishing. And yet, below the surface we find striking similarity. Declarative markup, perhaps, can get the job done by doing the job.

Friday, July 31, 2020

Friday 11:00 11:30 EDT

XML for art: A case study

Mary Holstege

Art, it has been argued, lives where there is complexity: between the boundaries of pure order and pure chaos. Generative art applies judicious amounts of randomness to the pure order of algorithmic logic to produce candidate works of art, which the artist selects from. The XML stack forms a palette for these works. Algorithms can be expressed in XSLT and XQuery, randomness can be introduced through simple and higher-order functions, and SVG can be the canvas. Along the way, common problems of annotation, knowledge management, and searching arise. Surprisingly, perhaps, so does the issue of the separation of form and content.

Friday 12:00 12:30 EDT

Pipelined XSLT transformations

Ari Nordström

Sure, you can do almost anything in a monster, monolithic XSLT program. But you can make life much easier for yourself, and produce code that’s easier to maintain as well, if you break your process into steps and bind them together with XProc. Pipelining not only speeds and simplifies coding, it also aids debugging, testing, and even documentation. Even if you’re an experienced XSLT coder, it’s worth considering.

Friday 14:00 14:30 EDT

Asynchronous XSLT (LB)

Michael Kay Saxonica

JavaScript, with its dependence on asynchronous interactions, would seem to be intrinsically in conflict with XSLT and its expectations, like reading in whole documents. Saxon-JS 1.0 attempted to work around this conflict with a specialized scheduling instruction added to XSLT. Saxon-JS 2.0 attempts to deal with asynchrony on both the server and the client and has gone further than 1.0. But the work on 2.0 development has given the developers a good opportunity to come up with ideas for a more ambitious solution in the future.

Friday 15:00 15:30 EDT

Fault tolerance, error tolerance, diversity tolerance

C. M. Sperberg-McQueen Black Mesa Technologies LLC

How to react when things are not as we expect them to be.