
Balisage 2014 Program

Monday, August 4, 2014
see: Symposium on HTML5 & XML: Mending Fences

Tuesday, August 5, 2014

Tuesday 7:00am - 9:00am

Conference Registration & Continental Breakfast

Pick up your conference badge outside the White Flint Amphitheater and join us for breakfast.

Tuesday 9:00am - 9:15am

Welcome and Introductions

Tuesday 9:15am - 9:45am

First Person: The art of the elevator pitch

B. Tommie Usdin, Mulberry Technologies

Many of us at Balisage feel that the universe (or our organization, sponsor, client, or mother-in-law) doesn’t have sufficient appreciation of or respect for technologies we know could significantly improve the world. XSLT, techniques for processing overlap, DITA, XQuery, HTML5, even XML, are not given the attention they deserve. This is our fault, at least in part. We as a community need to learn to say less and communicate more, and more persuasively.

Tuesday 9:45am - 10:30am

Analyzing XSLT streamability

John Lumley, jωL Research, Saxonica

Increasing use of XSLT in ‘big data’ problems has highlighted two known shortcomings: processing arbitrarily large documents (i.e. larger than available memory, or potentially of indefinite size), and producing output before an input document has been loaded completely (reducing latency). Both of these issues are addressed in XSLT 3.0 through the introduction of “streaming” facilities. Streaming imposes a number of significant constraints on stylesheet design and introduces a large new suite of rules and technical vocabulary (‘sweep’ and ‘posture’) for describing and analysing streaming properties. We describe a new tool designed to help stylesheet authors understand streamability, the new vocabulary, and the nature of the streamability constraints.

Tuesday 10:30am - 11:00am

Coffee Break

Tuesday 11:00am - 11:45am

(LB) When 57,300,000 Full Text Search Results Are Just Too Many

Pat Case, Congressional Research Service, Library of Congress

The Web changed the paradigm for full-text search. Searching Google for search engines returns 57,300,000 results at this writing, an impressive result set. Web search engines favor simple searches, speed, and relevance ranking. The end user most often finds a wanted result or two within the first page of search results. This new paradigm is less useful for searching collections of homogeneous data and documents than it is for searching the Web. When searching collections, end users may need to review everything in the collection on a topic, may want a clean result set of only those six high-quality results, or may need to confirm that there are no wanted results, because finding no results within a collection sometimes answers a question about a topic or collection. To accomplish these tasks, end users may need more functionality for returning small, manageable result sets. The W3C XQuery and XPath Full Text Recommendation (XQFT) offers extensive end-user functionality, restoring the control that librarians and expert searchers enjoyed before the Web. XQFT offers more end-user functionality and control than any other full-text search standard ever: more match options, more logical operators, more proximity operators, more ways to return a manageable result set. XQFT searches are also completely composable with XQuery string, number, date, and node queries, bringing the power of full-text search and database querying together for the first time. XQFT searches run directly against XML, enabling searches on any elements or attributes. XQFT implementations are standards-driven, based on shared semantics and syntax, so a search written for one implementation is portable and may be used in other implementations.

Tuesday 11:45am - 12:30pm

RELAX NG and DITA: an almost perfect match

Eliot Kimber, Contrext, LLC
George Bina, SyncroSoft

Practical authoring, management, and production of DITA documents requires the use of an XML document grammar, but the DITA architecture is independent of any particular grammar facility. DTDs, XSDs, and RELAX NG have all been used with DITA. The structure of DITA imposes modularity and parameterization constraints that have proved challenging to code and/or maintain in both DTDs and XSDs. RELAX NG provides advantages over the other grammar formalisms, significantly simplifying the integration of modules into working documents and the definition of vocabulary and constraint modules. There are a few improvements to RELAX NG that would make it an even better fit to DITA’s requirements.

Tuesday 12:30pm - 2:00pm

Lunch

Tuesday 2:00pm - 2:15pm

Announcements

Tuesday 2:15pm - 2:45pm

First Person: Semantics and the Internet of things

Kurt Cagle, Avalon Consulting, LLC

It usually starts with a coffeemaker. The number and variety of devices now connected to the internet is astonishing: computers and laptops, phones and tablets, of course, but also game consoles and televisions, heating and cooling systems, automobiles, sensors of every variety. Not far behind are watches, eyeglasses, shoes, jackets, and fobs of all kinds. Beyond simple connectivity, many of these devices carry significant processing power of their own: the ability to recognize faces, extract conversations from noisy rooms, or tell the difference between spoilt milk and stinky cheese in the refrigerator. Looking beyond the obvious concerns about privacy and security, if we want these devices to work for us as well as against us, they will have to be connected in ways that we can leverage. Using semantic technologies and SPARQL could save us from vendor or aggregator lock-in. Maybe.

Tuesday 2:45pm - 3:30pm

(LB) Identity constraints for XML

Anne Brüggemann-Klein, Mustapha Maalej, Marouane Sayih, Technische Universität München

Identity constraints, a fundamental database concept, are built into XML Schema. In this paper, we attempt to explain clearly our reading of XML Schema's identity constraint concepts. We illustrate our reading with examples, in the style of a tutorial. We also illustrate usage styles and limitations of identity constraints in XML Schema. Finally, we demonstrate how a more general notion of identity constraints, adapted to the hierarchical nature of XML documents, can be expressed with XPath 2.0. Hence, the limitations that we have identified can be bypassed with assertions as introduced by XML Schema 1.1.

Tuesday 3:30pm - 4:00pm

Coffee Break

Tuesday 4:00pm - 4:45pm

Standard change tracking for XML

Robin La Fontaine, DeltaXML

XML is generally accepted as the default markup language for structured document and data management systems worldwide. But it has no native ability to track changes. Some document formats provide rudimentary support for change tracking, but no full solution is available. A generic change-tracking standard would allow documents to move from one XML editor to another complete with change history and the ability to roll back to previous versions; it would allow editing applications to track changes in any XML document type; and it would allow software that handles changes in XML to be applied to many different XML document types. This paper outlines one proposed solution to this important problem, representing successive changes or edits to an XML document either in XML markup or in processing instructions. The tracked changes can be an independent addition to a file or can be integrated into the applicable schema.

Tuesday 4:45pm - 5:30pm

Multilevel versioning for XML documents

Ari Nordström, Condesign AB

Most current versioning systems produce a new ‘version’ with every save, which is not what is needed for versions of complex XML documents in which every text or graphic module is version-controlled separately. These systems provide no way to distinguish significant from insignificant versions or to extract (easily and conveniently) exactly the previous version wanted. A multilevel XML-based versioning abstraction layer operating ‘on top of’ the current eXist versioning system addresses the problem. This versioning model places new versions on different ‘levels’ (stages) based on explicit checkin/checkout operations that can move the resources up or down in the versioning structure. Document components are versioned using the ordinary versioning system and version abstractions for each resource are identified using a basic URN namespace, which is tracked in an XML URN/URL mapping document.

Wednesday, August 6, 2014

Wednesday 7:00am - 9:00am

Conference Registration & Continental Breakfast

Pick up your conference badge outside the White Flint Amphitheater and join us for breakfast.

Wednesday 9:00am - 9:45am

Extending XQuery with pattern matching

Benito van der Zander, University of Lübeck

Pattern matching in a broad sense is a common feature of modern functional programming languages, answering the question: does this complex structured object have a form that is the same as this other complex structured object, for some definition of “the same”. In XQuery, we sometimes describe path expressions, switch, and typeswitch statements as performing pattern matching, but these are merely impoverished flavors of matching when compared to the real thing. General pattern matching can be integrated into the syntax and semantics of the XQuery language. We demonstrate that this pattern matching can be used to match JSONiq as well as XML, and we summarize real-world experience using it for large-scale data mining in library systems.

Wednesday 9:45am - 10:30am

XQuery Topic Tools

Hans-Jürgen Rennau, Traveltainment GmbH

XQuery is well-suited for the agile development of command-line tools. But as their number grows - and their configurability evolves - at a fast pace, users may get confused. A possible answer is topic tools. A topic tool is many tools in one, providing them with a single point of access and a single, integrated command-line interface. The basic idea evolves into a generic model of topic tool interface and behaviour, and the model is supported by a simple framework for topic tool development. Goals are an optimal user experience on the one hand, maximizing simplicity, consistency and expressiveness; and a highly efficient and agile development process on the other hand, centred on the idea of perennial extensibility: first delivery after a few hours, incremental growth for years. Key features are code generation, declarative interface definition, automated input validation and the automated translation of user input into a request message, hiding input complexity behind a rich interface. The framework is 100% XQuery-based and thus requires no additional installation, just access to the framework code.

Wednesday 10:30am - 11:00am

Coffee Break

Wednesday 11:00am - 12:30pm

MathML: Technology and practice

Scott Dineen, Optical Society of America
R. Alexander Miłowski, University of Edinburgh
Kennett Rawson, IEEE
Lauren Wood, Design Science

A panel of users and vendors will discuss topics including authoring, web browsing, ebooks, accessibility, conversion, and proofreading MathML in practice. MathML is key to encoding documents in many contexts, because mathematics is key to communication in science, engineering, medicine, and a wide variety of other disciplines. It sounds easy. There is a tag set for math: encode your math with it and use that. Is it in fact that easy?

Wednesday 12:30pm - 2:00pm

Lunch

Wednesday 2:00pm - 2:15pm

Awards

Wednesday 2:15pm - 2:45pm

(LB) In pursuit of streamable stylesheet functions in XSLT 3.0

Abel Braaksma, Exselt

It is only a matter of time until the XSLT 3.0 Working Draft becomes a Candidate Recommendation, locked for changes and suitable for implementors to adopt; it has been in Last Call since December 2013. One of the bugs reported to the Working Group was about the inability to create stylesheet functions that take streamable nodes as an argument. The group considered the omission and decided to ask me to write up a proposal. After several iterations, an addition to the specification was made that enables authors of library packages to write library functions that work in streaming and non-streaming scenarios alike. While the change to the specification is minimal, the impact for package authors and stylesheet authors in general is potentially big and opens up a whole world of new possibilities in streaming.

Wednesday 2:45pm - 3:30pm

NIEM: Implementation experience

Priscilla Walmsley, Datypic

The U.S. National Information Exchange Model (NIEM) is a 6000-element XML vocabulary used by U.S. government entities and their information-sharing partners at the federal, state and local levels. While adoption of NIEM is growing, there is a lot of variation in how it is being managed. Using NIEM, like using any large complex XML model, presents challenges including approaches to customization, interoperability among differing subsets, and versioning. A variety of approaches to implementing NIEM and integrating it into existing organizational infrastructure are discussed and compared.

Wednesday 3:30pm - 4:00pm

Coffee Break

Wednesday 4:00pm - 4:45pm

Markup formats in context: A comparison of the strengths of some widely-used markup systems

Liam R. E. Quin, W3C

Many text markup formats are popular today. Some of these, such as JSON and Markdown, have risen in popularity recently; others, such as SGML and troff, have waned. Whenever a format becomes popular it gains proponents who want to see it used everywhere, for everything, forever, right away. A possibly over-simplistic analysis of the rhetorical nature of some of these formats provides a basis for comparison. The results of this analysis suggest areas of use for the different formats and demonstrate that, far from being in competition with one another, the formats complement one another.

Wednesday 4:45pm - 5:30pm

How to survive the coming namespace winter

R. Alexander Miłowski, University of Edinburgh
Norman Walsh, MarkLogic

Is XML condemned to be an orphaned syntax with a dimly lit future within the Web browser? What can information providers with rich sources of XML do, other than down-translate to HTML? The evolving Web Components environment may provide a solution! With some simple translations, stylesheets and scripts, it will be possible to wrap custom XML in a minimum amount of HTML and serve it over the Web. The browsers will never know they’re being tricked into delivering XML.

Thursday, August 7, 2014

Thursday 7:00am - 9:00am

Continental Breakfast

Breakfast outside the White Flint Amphitheater.

Thursday 9:00am - 9:45am

Hierarchies within range space: from LMNL to OHCO

Wendell Piez, Piez Consulting Services

Documents, we understand, are ordered hierarchies of content objects, that is to say, ordered hierarchies of smaller pieces that may themselves be ordered hierarchies. Simultaneously, we recognize that they have multiple concurrent hierarchies: poems have lines and stanzas, on the one hand, and phrases and sentences on the other. In the general case, neither hierarchy subsumes the other; they stand disjoint, entangled. It is a well-known and much-discussed consequence of the design of XML that it can represent only a single hierarchy. Encodings like LMNL offer alternative syntaxes for representing these overlapping hierarchies. Examining the overlap in texts suggests some interesting things about the evolution, purposes, and uses of the concepts of hierarchy in literary artifacts, and by implication in any hierarchical model such as XML.

Thursday 9:45am - 10:30am

Overlapproaches in documents: a definitive classification (in OWL, 2!)

Silvio Peroni, Francesco Poggi, Fabio Vitali, University of Bologna

Yes, you read it right: there is overlap everywhere, including in the title! Overlap has been a source of endless fascination at Balisage. The authors revisit earlier work, including their own, to develop an OWL ontology of overlap situations. Together with their EARMARK framework of stand-off markup in RDF, they are seeking a complete solution to the problem of expressing any marked up document with Semantic Web technologies, especially in the domain of Digital Humanities.

Thursday 11:00am - 11:45am

Non-hierarchical structures: How to model and index overlaps

Faegheh Hasibi & Svein Erik Bratsberg, both of the Norwegian University of Science and Technology

Overlap is a common phenomenon seen when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, and tree-based indexing and query processing techniques cannot be used for them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA from annotated documents; the algorithm can efficiently process non-hierarchical structures and is associated with formal proofs, ensuring that transformation of the document to the data model is valid. To enable high performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both reachability and overlapping relationships.
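For readers unfamiliar with the starting point of the last sentence, classic XML pre-post indexing can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: it covers only ordinary trees, not the TGSA extension for overlapping structures described in the abstract, and the `Node` class and the `number` and `is_ancestor` functions are hypothetical names chosen for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """A hypothetical tree node; pre/post hold the ranks assigned by number()."""
    name: str
    children: list[Node] = field(default_factory=list)
    pre: int = -1
    post: int = -1


def number(root: Node) -> None:
    """Assign pre-order and post-order ranks in one depth-first traversal."""
    pre_counter = count()
    post_counter = count()

    def visit(node: Node) -> None:
        node.pre = next(pre_counter)    # rank when the node is first entered
        for child in node.children:
            visit(child)
        node.post = next(post_counter)  # rank when the node is left

    visit(root)


def is_ancestor(u: Node, v: Node) -> bool:
    """u is a proper ancestor of v iff u comes first in pre-order and last in post-order."""
    return u.pre < v.pre and v.post < u.post


# Usage: a tiny poem tree with a stanza (containing a line) and a title.
line = Node("line")
stanza = Node("stanza", [line])
title = Node("title")
poem = Node("poem", [stanza, title])
number(poem)
print(is_ancestor(poem, line))     # True
print(is_ancestor(stanza, title))  # False
```

Once the ranks are stored, an ancestor/descendant test is two integer comparisons rather than a tree walk, which is why this style of index suits large repositories. The abstract's point is that overlapping components break the single-tree assumption this sketch relies on.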

Thursday 11:45am - 12:30pm

Document lattices: Equivalence, compatibility, and contradiction in document markup

C. M. Sperberg-McQueen, Black Mesa Technologies LLC
Yves Marcoux, Université de Montréal
Claus Huitfeldt, University of Bergen

What does markup mean and how do we discuss it? Given a body of content, there may be more than one way of marking it up. Without succumbing to the purely syntactic approach of XML differencing programs, there are ways of analyzing the semantic import of different document markup vocabularies. Using these techniques, we can discover whether two documents are logically equivalent, whether one subsumes the other, whether they are unrelated, or whether they are contradictory. Applying lattice models allows us to visualize these relationships. Finding correspondences may allow construction of translation inference rules, and these may tell us something about the expressivity licensed by different markup approaches.

Thursday 12:30pm - 2:00pm

Lunch

Thursday 1:15pm - 2:00pm

Balisage Bluff: An Entertainment

Balisage Attendees

Balisage Bluff: markup-truth may be stranger than fiction! Participants will listen to short stories that involve markup, the greater Washington DC area, or have some other connection to the conference. The audience will be challenged with identifying which stories are true (or close to it) and which are mostly fabricated.

Do you have a story to tell? Stories will be limited to 2 minutes, but even so there are a lot of Balisageurs with great tales to tell. Volunteer by sending email to info@balisage.net, or by talking with Lynne Price, gamemaster, on site. If there are more than ten volunteers, ten will be randomly selected. If we have more time in the actual session, volunteers will be recruited from the audience/participants.

Thursday 2:00pm - 2:45pm

(LB) JSOX: A Justly Simple Objectization for XML

Steven J. DeRose

XML can be as easy to work with as JSON. However, this has not been obvious until now. JSON is easy because it supports only datatypes that are already native to JavaScript and uses the same syntax to access them (such as [1:10], [“x”], and “.” notation). XML, on the other hand, supports additional datatypes, and is most commonly handled via SAX or DOM, both of which are low-level and meant to be cross-language. Typical developers want high-level access that feels “native” in the language they are using. These shortcomings have little or nothing to do with XML, and can be remedied by a different API. Software that demonstrates this is presented and described. It uses Python’s richer set of abstract datatypes (such as tuples and sets), and provides native Python style syntax with richer semantics than JSON or JavaScript.

Thursday 2:45pm - 3:30pm

(LB) NoXML: Extending the relevance of XPath by breaking the chains of the DOM

David Lee, MarkLogic Corporation

XPath is an ingenious invention and the core strength, if not the foundation, of the success of XML. Through its life it has been enhanced, redefined, specified, extended and embedded into nearly every XML technology. XPath and XDM (the data model of XPath 2.0 and XQuery 1.0) intricately bind XML (the serialization format) and XML technologies (the languages) into a powerful and successful set of Data Specific Languages (DSLs) that power the XML Ecosystem. It is this very success, however, that is both pushing the original boundaries of XML processing and holding these languages back from the prominence they once claimed. The elegance and power of XPath are at risk: we keep pushing the use cases of what were once XML-only languages, yet we are held back by an XML-only data model for XPath.

Thursday 4:00pm - 4:45pm

GameX — event-based programming with XML technology

Marouane Sayih, Martin Kuhn, & Anne Brüggemann-Klein, all of the Technische Universität München

GameX, a student project at Technische Universität München, is a ‘serious’ browser game that is intended to further systemic thinking in players. GameX is implemented almost exclusively with XML technology, which makes the game essentially platform independent. XML lends itself to involving domain experts in all phases of development, and to model-driven designs that can adapt easily to changing requirements. Browser games, however, are quintessentially event-driven, reactive systems. How can such applications be built using the XML technology stack? GameX uses XForms, SVG, XProc, XSLT, and XQuery, as well as the native HTML DOM, to put the event-driven programming paradigm into practice on an implementation platform of XML technology.

Thursday 4:45pm - 5:30pm

XForms user interfaces for small arcane nontrivial datasets

Joshua Lubell, National Institute of Standards and Technology

Small Arcane Nontrivial Datasets (SANDs) are frequently complex enough to warrant custom software for access and editing, yet too small or specialized to justify a full-blown server-based database application. Such data is typically presented in tabular form within documents or as editable spreadsheets. To test the alternative of using XForms as a user interface for SANDs, an application was built for browsing a conformance test suite for Product and Manufacturing Information, a formal specification of a product's functional and behavioral requirements as they apply to production. XForms proved a much better match than tabulations for the underlying data model. To further test the concept, XForms was evaluated for use with the NIST Special Publication 800-53 security control catalog, which is a comprehensive catalog of security controls for managing cyber-risk, many parts of which are already available in XML form. The model-view-controller (MVC) software pattern of XForms seems well-suited for creating specialized applications for tailoring and navigating this catalog.

Friday, August 8, 2014

Friday 7:00am - 9:00am

Continental Breakfast

Breakfast outside the White Flint Amphitheater.

Friday 9:00am - 9:45am

Meeting the twin challenges of Open Data for DATA Act compliance and delivering next generation industry services

David R.R. Webber, Oracle Public Sector

The Digital Accountability and Transparency Act (DATA Act) was signed into law in May 2014. A legislative mandate for data transparency, the act requires the US Department of the Treasury and the White House Office of Management and Budget (OMB) to change the recording of U.S. federal spending into open standardized data formats that must be published online. Disclosure that can currently take months or years must now be available directly. New tools and methodologies will be required to meet this challenge, hopefully based on open public semantic techniques, shared vocabularies, and common service specifications. Implementations of the NIEM model (National Information Exchange Model) have lessons and techniques that may help, and open source and public tool sets are available. Illustrations from health care services, financial reporting, emergency preparedness, election management, and municipal services across the United States, Europe, and Asia show these new techniques being used successfully.

Friday 9:45am - 10:30am

Using XML to publish the Foreign Relations of the United States series

Joseph Wicentowski, U.S. Department of State, Office of the Historian

The U.S. State Department’s Office of the Historian uses XML data and XML technology as a key publishing, archiving, and research tool. The core motivation for adopting XML was to transform the Foreign Relations of the United States series, the official documentary history of U.S. foreign policy, into a modern online publication with a searchable archive (http://history.state.gov/historicaldocuments) using TEI markup and eXist-db. Challenges the project has faced include striking a balance between the rich possibilities of semantic markup and the costs of entry and quality control, and accounting for a range of styles, metadata, and editorial practices across the more than 150-year life of the series. Benefits the project has realized from selecting XML data and technologies include a number of applications (many unforeseen at the outset), including the release of ebooks and APIs from the data, contributions to open government data initiatives, new forms of analysis and visualization, and an increasingly, if not yet fully, digital publishing workflow.

Friday 11:00am - 11:45am

(LB) Methodology For Providing National Information Exchange Model (NIEM) Model Understanding to XML and NIEM Novices

Betty Harvey, Electronic Commerce Connection

NIEM is a U.S. government initiative to enable the sharing of data in many domains. The NIEM model relies heavily on the use of references to create relationships between data. It also relies on different namespaces for each domain. Many large government projects have mandated that NIEM be used for exchange of data between government agencies, states, and other trading partners. NIEM data models are very complex. One of the challenges with using NIEM is presenting a complex data model in a way that lets business analysts, SMEs, and programmers, technical and non-technical individuals alike, understand its complex elements, relationships, and bi-directional linkages between pieces of information.

Most of the projects have software development life cycle (SDLC) artifacts, e.g., UML models, data dictionaries, business analysis documents, etc. However, these artifacts do not provide the clarity of schema design needed from a NIEM and XML perspective. This 'crazy' mechanism (out of the norm) provides an understandable artifact of a very large NIEM schema that was provided to thousands of diverse trading partners for a very large federal and state government program.

Friday 11:45am - 12:30pm

(LB) On Teaching XQuery to Digital Humanists

Clifford Anderson, Vanderbilt University

XQuery provides an excellent means for teaching programming to digital humanists because it works seamlessly with their existing XML data, has an elegant and simple core with a well-structured standard library, and can be used in conjunction with XML databases to develop end-to-end web applications. However, current teaching materials for XQuery do not address the needs of digital humanists, presupposing implicit knowledge of programming concepts that they frequently lack. Based on experience teaching XQuery to digital humanists (including alt-ac professionals, archivists, faculty members, graduate students, and librarians) in three distinct settings (a weekly training session for librarians, a graduate seminar on digital humanities, and a two-week NEH-supported Institute for Advanced Topics in Digital Humanities), I suggest how the XML community might develop resources to widen the appeal and accessibility of XQuery.

Friday 12:30pm - 1:15pm

First Person: Seeing things whole

C. M. Sperberg-McQueen, Black Mesa Technologies

Sometimes we need to focus on the trees, or the leaves on the trees. Sometimes we need to focus on the forest.