
Balisage 2008 Detailed Program

Tuesday 9:45—10:30
Cool versus useful

B. Tommie Usdin, Mulberry Technologies

True versus Useful, or True versus Likely-to-be-useful, are tradeoffs we find ourselves making all the time in document modeling and many other markup-related situations. But Cool versus Useful is a far more difficult tradeoff, especially since our world now includes a number of very cool techniques, tools, and specifications. Cool toys can have a lot of gravitational pull, attracting attention, users, projects, and funding. Unfortunately, there is sometimes a disconnect between the appeal of a particular tool or technology and its applicability in a particular circumstance.

Tuesday 11:00—11:45
REST Oriented Architectures (ROA): Taking a resourceful approach to web data

Kurt Cagle, O’Reilly Networks

The paradigm shift away from Service Oriented Architectures (SOAs) toward Resource Oriented Architectures (ROAs) can be expected to continue. The cost of developing “editors” for the service-oriented silos of data now piling up on the web often exceeds the value of those silos to the services that need to provide such “editors”. Within organizations, the complexity and heterogeneity of data increasingly resists management via name/value approaches. It is simpler and more efficient to view the web as a giant database, refocusing development efforts on query-oriented substrates, rather than on verb-oriented ones. In the general case, it is easier to get data from users and to provide it to them via ROAs.

Tuesday 11:45—12:30
Informal ontology design: A wiki-based assertion framework

Murray Altheim

Wiki software has historically provided little support for even simple organizational structure. But when the wiki grows large, a flat organization no longer works well. The addition of ‘category links’ to each page helps, but these are often ad hoc or undefined and themselves need structuring. Our wiki architecture permits expression of an underlying structure using a wiki-like syntax: user-authored assertions are dynamically harvested to create a Topic Map graph, which mirrors the explicit structure of the wiki and provides it with an underlying classification system. The resulting implementation is a combination of wiki plugins, event handlers to capture and process their output, a harvester to scan the assertions on the wiki, and a manager to maintain the set of assertions and to respond to queries.

Tuesday 2:00—2:45
(FP) XML: It was not televised after all ...

Eduardo Gutentag, Sun

Few may remember that XML was launched with an explicit technical and social agenda centered on the revolutionary idea that you own the content you produce. This has now come full circle and has become an almost trivial assertion (albeit far from being universally true). Yet in the meantime, while no TV cameras were watching, it facilitated another revolution, which in turn has had a global transformative effect on the way we define the words “content”, “ownership” and even “freedom”. Will this now unleash another deep and almost antithetical change, centered on the equally revolutionary concept that ownership of an idea is not necessarily vested in the person who comes up with it?

Tuesday 2:45—3:30
Optimized Cartesian product: A hybrid approach to derivation-chain checking in XSD 1.1

Maurizio Casimirri, Paolo Marinelli, & Fabio Vitali, University of Bologna

Conditional type assignment in XSD 1.1 allows an element’s type to be selected at validation time by evaluating XPath expressions that determine which of several possible types to assign. Conditional type assignment on child elements makes it challenging to verify statically whether one type is a legal restriction of another. The current draft of XSD 1.1 adopts a dynamic approach to the problem, which means that some schema errors may remain undetected if they are not exposed by the document instance. We propose a hybrid solution, partly dynamic and partly based on static analysis, for verifying that a restriction actually restricts its base type.
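
By way of illustration, conditional type assignment in XSD 1.1 attaches xs:alternative elements to an element declaration; the type actually assigned depends on which XPath test succeeds first (the element and type names below are invented for illustration):

    <xs:element name="payment"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- the first alternative whose test succeeds supplies the governing type -->
      <xs:alternative test="@method = 'card'" type="CardPaymentType"/>
      <xs:alternative test="@method = 'cash'" type="CashPaymentType"/>
      <!-- fallback when no test matches -->
      <xs:alternative type="PaymentType"/>
    </xs:element>

The static-analysis question is then whether a type such as CardPaymentType can be shown to be a legal restriction of its base without relying on particular document instances.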

Tuesday 4:00—4:45
Office Suite Markup: Is It Worthwhile?

Patrick Durusau

There is a lot of smoke, but is there any fire? Microsoft, Sun, IBM, Oracle, Google, Red Hat, and others are contending over two XML vocabularies for office documents (word processor, spreadsheet, presentation tool, etc.) at the moment. Fans of OOXML or ODF are hard to find among people who have spent the last twenty years researching and developing markup and markup systems. In fact, many in the markup community dismiss these efforts as unimportant and/or uninteresting. Patrick Durusau, editor of ODF, who has called for the co-evolution of OOXML and ODF in an ISO context, thinks they are important, possibly very important. Come find out what has captured the interest of at least one topic map and overlapping markup theorist.

Wednesday 9:00—9:45
Topic maps in near-real time

Sam Hunting, Universal Pantograph

A topic map is an editorial product in which everything known to the topic map about each member of some set of subjects of conversation is co-located. When a community allows its members to contribute content about existing and new members of the set of subjects, the resulting publishable topic map may take time to produce, even if it is produced without human editorial intervention, because the co-location entailment may require significant changes to the graph of nodes that must ultimately bear a one-to-one correspondence to the subjects under discussion. When a topic map will be published as a corresponding set of interconnected web pages, it may be economically vital to use existing web publishing tools, such as Drupal, even if they were not originally designed especially for topic map publishing. A new topic map module for the Drupal open-source content management system now supports the publication of collaboratively written topic maps in near-real time, using a plug-in architecture that can be extended to support specific information sets.

Wednesday 9:00—9:45
SGF: An integrated model for multiple annotations and its application in a linguistic domain

Maik Stührenberg & Daniela Goecke, University of Bielefeld

Linguists must often merge multiple annotation layers, in heterogeneous formats, that describe the same primary data. In recent years several approaches have been proposed for storing such multiple annotations: Prolog fact-based architectures, XML-related approaches, and graph-based models using XML syntax. In real-world applications, however, these architectures have serious practical disadvantages. The XML-based Sekimo Generic Format (SGF) is based on graph-based design principles but uses the tree structures inherent in XML to reduce processing complexity and costs. SGF data can be analyzed using standard XML tools such as XPath or XQuery, as illustrated by our own project on the detection of anaphoric antecedents.

Wednesday 9:45—10:30
Using Atom categorization to build dynamic applications

Alex Milowski

Atom feeds provide the ability to categorize both the feed and its entries. This categorization provides a simple way for feed authors to associate terms and semantics with their feed contents. This talk will demonstrate how such author-generated categorization can be used to build web applications dynamically from feeds and how Atom categories map into the world of RDF and the “Semantic Web”.
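
As a reminder of the mechanism, an Atom entry may carry any number of category elements, each with a term, an optional scheme, and an optional label (the values below are invented for illustration):

    <entry xmlns="http://www.w3.org/2005/Atom">
      <id>http://example.org/entries/42</id>
      <title>Conference schedule posted</title>
      <updated>2008-08-12T09:45:00Z</updated>
      <!-- author-supplied categorization that an application can harvest -->
      <category term="schedule"
                scheme="http://example.org/categories"
                label="Schedules and programs"/>
    </entry>

A term/scheme pair of this kind is the natural candidate for mapping into RDF vocabulary terms.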

Wednesday 9:45—10:30
(LB) An event-centric API for processing concurrent markup

Oliver Schonefeld, University of Tübingen

A programmer can choose from two basic APIs when working with XML documents: one provides an event-centric view of the document (SAX), the other an object-centric view (DOM). This presentation introduces an event-centric programming interface for working with XCONCUR documents, inspired by XML’s SAX API, which makes parsing XCONCUR documents straightforward.

Wednesday 11:00—11:45
(LB) xmlsh - a command language (shell) based on the philosophy of the Unix Shells, designed for XML

David A. Lee, Epocrates

xmlsh, an Open Source project, is a command language (shell) modeled after the philosophy of traditional Unix shells but designed to support XML natively. Largely backwards compatible with the Unix shells, xmlsh is designed for both interactive and script use. It has built-in support for XML data (documents and sequences) as expressions, variables, files, and pipelines. Support for fully multithreaded XML and text pipelines, as well as direct execution of OS processes and commands in a portable and familiar syntax, allows developers to construct complex jobs composed of XML tasks and traditional text and file operations easily and portably. Written in pure Java and integrated closely with the Saxon XQuery and XSLT library, xmlsh is portable to any platform that runs the Java 1.6 JDK.

Wednesday 11:00—11:45
Discontinuity in TexMecs, Goddag structures, and rabbit/duck grammars

C. M. Sperberg-McQueen, World Wide Web Consortium / MIT
Claus Huitfeldt, University of Bergen

Our Montréal conferences have long had a fascination with the problems caused by overlapping structures in markup. One special category of markup problems not fully examined in past conferences comprises those caused by discontinuous structures: document components that may be logical units but which cannot be easily marked as such because of interruptions. It may be possible to construct a graph structure which more nearly reflects our intuitive notions about how documents are constructed, if we retain the principle that parent/child and ancestor/descendant relations imply that the ancestor contains the descendant, but jettison the converse principle that any element properly contained by another element is necessarily a descendant of (dominated by) that other element.

Wednesday 11:45—12:30
(LB) State of the art of streaming: Why the W3C XProc and XSLT WGs and ISO SC34 WG 1 are looking closely at streaming

Mohamed Zergaoui, member of the XProc and XSLT 2.0 WGs

XML has been out for 10 years and is now mainstream. XML is now recognized for its value (Unicode, structure, extensibility) and is no longer seen only as something heavy and difficult to use. XML still needs to improve its capacity to be processed more naturally as a stream of information. After a short presentation about where we come from, we will look very closely at the ongoing work related to streaming processing, especially in the XSLT WG, the XProc WG, and ISO DSDL. We will also discuss some interesting approaches to finding workarounds for streaming and propose some new areas of use for XML.

Wednesday 11:45—12:30
Graph characterization of overlap-only TexMecs and other overlapping markup formalisms

Yves Marcoux, Université de Montréal

A criterion for determining whether any given graph can be serialized as a document with overlapping markup is described. This provides an exact and complete characterization of the element-containment relationships expressible in markup formalisms allowing overlap, such as TexMECS (without interrupted or virtual elements). Such a characterization will allow DOM-based applications to determine, for example, whether a given modification to a document would preserve its ability to be serialized using overlapping markup.

Wednesday 2:00—3:30
(FP) Dirty laundry: Committee disasters, what happened, what we learned

Jon Bosak, Sun
Mavis Cournane, Cognitran
Patrick Durusau
James David Mason, Y-12 National Security Complex
David Orchard, BEA Systems
Lauren Wood, Sun

Markup standards and projects are created, managed, and sometimes destroyed through group process. While this process is often a bit bumpy, there are some occasions when it goes spectacularly badly. Tales of these committee disasters can be not only entertaining, but also (and more importantly) informative. Panelists will spend a maximum of 10 minutes each describing a committee or working-group disaster of some sort, including what went wrong, how it could have been prevented, and how it could have been (or how it was) resolved. Participants may anonymize their tales of woe, provided they assure us that the events they describe actually occurred and that they were actually involved.

Wednesday 4:00—4:45
(LB) Beyond the Semantic Web: the Semantic Space

Pierre Lévy, University of Ottawa

Today, the sharing of semantics remains a conundrum. Semantics can be shared within a universe of discourse, but individuals and communities cannot be relieved of the need to define their own universes of discourse. The emergence of collective intelligence is increasingly seen as necessary for human survival, but it is difficult for people who live in diverse universes of discourse to know when they are talking about the same things. Collective intelligence — the ability of a community to exhibit self-sustaining, rational behaviors — is related to its participants’ ability to understand each other.

Diverse minds can create, recognize, and think in terms of diverse sets of distinct concepts and relationships between them. A conceptual addressing system can map such sets into a shared abstract “semantic space” that is structured by an algebraically definable group of transformations. Information Economy Meta Language (IEML) is such a “semantic space addressing system”; it defines a very large space of semantic addresses. A small number of the points in that space — more than 2,500 of them — are now listed in an “IEML Dictionary”, along with interpretations of each of them in several natural languages. A language for compactly specifying sets of locations in the space exists, and a parser that translates expressions in this language into XML is available. A programming language for discovering and asserting relationships between sets of semantics is being developed, along with a variety of related software tools.

The semantic space research program could provide a scientific (measurable, principled, experimentally repeatable) foundation on which technologies and professional disciplines can be created, including distributed collaborative semantic search engines, models and simulations of collective intelligences, tools and editorial practices for the automated production of multimedia documents, and many more.

Thursday 9:00—9:45
(LB) Reconsidering Conventional Markup for Knowledge Representation

David Dubin, University of Illinois at Urbana-Champaign
David J. Birnbaum, University of Pittsburgh

The main attraction of semantic web technologies such as RDF and OWL over conventional markup is the support those tools provide for expressing precise semantics. Formal grounding for RDF-based languages (in, for example, description logics) and their integration with logic programming tools are guided and constrained by issues of decidability and the tractability of computations. Users of these technologies are invited to use less expressive representations, and thereby work within those constraints. Such compromises seem reasonable when considering the roles automated reasoning agents are expected to play by the semantic web community. But where expectations differ, it may be useful to reconsider using conventional markup and inferencing methods that have been applied with success despite their theoretical weaknesses. We illustrate these issues with a case study from manuscript studies and textual transmission.

Thursday 9:00—9:45
Hybrid parallel processing for XML parsing and schema validation

Yu Wu, Qi Zhang, Zhiqiang Yu, & Jianhui Li, Intel Corporation

XML parsing and validation is widely regarded as a performance bottleneck in the processing of very large XML documents. We propose a novel chunk-based algorithm for parallel processing of XML using the multi-core architectures now more and more widely deployed both on desktops and in servers. We partition the XML document into chunks and process the chunks speculatively in parallel, both for parsing and for schema validation, before reintegrating them into a single result. Experimental results show that this approach provides a great overall performance advantage by exploiting the parallelism of multi-core platforms.

Thursday 9:45—10:30
(LB) Linking Page Images to Transcriptions with SVG

Hugh A. Cayless, Carolina Digital Library and Archives, University of North Carolina at Chapel Hill

This paper will present the results of ongoing experimentation with the linking of manuscript images to TEI transcriptions. The method being tested involves the automated conversion of images containing text to SVG, using Open Source tools. Once the text has been converted to SVG paths, these can be grouped in the document to mark the words therein and these groups can then be linked using standard methods to tokenized versions of the transcriptions. The goal of these experiments is to achieve a much more fine-grained linking and annotation mechanism than is so far possible with available tools, e.g. the Image Markup Tool and TEI P5 facsimile markup, both of which annotate only rectangular sections of an image. The method envisioned here would produce a legible tracing of the word, expressed in XML, to which transcripts and annotations might be attached and which can be superimposed upon the original image.
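
To make the approach concrete, a word traced from a page image might end up as a group of SVG paths, identified so that a transcription token can point at it (the coordinates, identifiers, and file names below are invented for illustration):

    <svg xmlns="http://www.w3.org/2000/svg"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         width="2000" height="3000">
      <!-- the scanned page sits underneath the traced words -->
      <image xlink:href="page-042.jpg" width="2000" height="3000"/>
      <!-- one group per word; a transcription token can reference the id -->
      <g id="p42-w17" class="word">
        <path d="M210,340 L260,335 L262,380 L212,384 Z"/>
      </g>
    </svg>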

Thursday 9:45—10:30
The Apache Qpid XML Exchange: High-speed reliable enterprise messaging using open standards and open source

Jonathan Robie, Red Hat

The Advanced Message Queuing Protocol (AMQP) is an open, language-independent, platform-independent standard for enterprise messaging; it provides a simple, coherent architecture for sophisticated messaging applications. Qpid is a multi-language implementation of AMQP being developed at the Apache Software Foundation. The Qpid XML Exchange provides XQuery-based routing for XML content, allowing AMQP and Qpid to be used for mission-critical XML messaging applications. Together, these tools vastly simplify the task of writing XML messaging software.

Thursday 11:00—11:45
An onion of documents and metadata

D. Matthew Kelleher, Albert J. Klein, & James David Mason, Y-12 National Security Complex

This XML stuff is not just theory; it is proving eminently practical in some large organizations now coping with the problem of translating fifty-year-old paper-based workflow into online embedded metadata. The United States DOE Y-12 National Security Complex in Oak Ridge builds products that cannot be tested as finished units, so each component and assembly must be thoroughly inspected and tested. Because such products have a potentially long shelf life, extraordinary measures are necessary to document not only the products but also the computing environment in which the documentation has been prepared, as well as the output data from test equipment. XML applications for the most important paper-based components exist, and by the time of the Balisage conference we hope to have a live pilot and early results to report.

Thursday 11:45—12:30
Structural metadata and the social limitation of interoperability: A sociotechnical view of XML and digital library standards development

Jerome McDonough, University of Illinois at Urbana-Champaign

XML is like a rope: it is extraordinarily flexible; unfortunately, just as with rope, that flexibility makes it all too easy to hang yourself. The apparent simplicity of XML, combined with its flexibility, makes it an all too obvious choice for encoding metadata for the catalogs of digital libraries. However, it may provide too much flexibility. Not only are there competing metadata schemes, such as METS and MPEG-21 DIDL, but it is also possible to create variant interpretations of generic high-level structures within a single metadata scheme. The result? Catalogs lose interoperability. Establishing a standard for a metadata scheme is not enough: libraries must also build community consensus about how to apply the standards.

Thursday 2:00—2:45
(FP) Parser possibilities: Why write a markup parser?

Norman E. Smith, Science Applications International Corporation

Since high-quality validating XML parsers for multiple schema languages are widely available, our community seems to have lost interest in writing new parsers. But there are still many good reasons to roll your own: for the learning experience, because none of the existing ones quite meets your needs, so you can parse multiple markup languages, to write to your own API, and more. My mlParser started modestly and has evolved into a primary tool in my markup toolkit. Let me tell you about the choices I made along the way.

Thursday 2:45—3:30
Properties of schema mashups: dynamicity, semantics, mixins, hyperschemas

Philippe Poulard, INRIA

The Active Schema Language (ASL), based on the Active Tags engine, allows us to experiment with features that might be useful in the next generation of XML schema languages. ASL allows us to build content models on the fly: think of a purchase order that can have a free-item element only if the item elements total $500 or more. ASL can specify a datatype that knows that 68°F is warmer than 19°C and cooler than 22°C. ASL can mix constraints written in different schema languages. And, of course, we can write a schema in ASL for ASL itself. ASL illustrates an important point: Active Tags can help significantly in the design of runnable XML languages.

Thursday 4:00—4:45
(LB) Hypertext Links and Relationships in XML Databases

Anne Brüggemann-Klein & Lorenz Singer, Technische Universität München

Hypertext links are, for semistructured data and narrative documents in XML databases, a fitting analogue to foreign-key references for structured data in relational databases. We encode hypertext links with XLink. For processing the links, we use the XLink processor HyQuery, an XQuery module which turns a native, XQuery-enabled XML database into a hyperdata system. This system is used in a lab course “XML Technology” and in the case study XTunes, a Web application that manages metadata and recordings of classical music.
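
As an illustration of the encoding, a reference from one resource to another might be expressed as a simple XLink roughly as follows (the element name and target URI are invented for illustration):

    <recording xmlns:xlink="http://www.w3.org/1999/xlink"
               xlink:type="simple"
               xlink:href="composers/brahms.xml"
               xlink:title="Johannes Brahms">
      Symphony No. 4 in E minor
    </recording>

A link processor such as the HyQuery module described here can then resolve such links during query evaluation, much as a relational engine follows foreign-key references.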

Thursday 4:00—4:45
Secure publishing using schema-level role-based access control policies for fragments of XML documents

Tomasz Müldner & Robin McNeill, Acadia University
Jan Krzysztof Miziołek, University of Warsaw

Increasing use of XML to store large data collections such as medical records has created a need to generate specialized views of those collections and to control access to the views according to the roles of the viewers. A medical practitioner, for example, might need to view many patients’ records, while a patient should have access only to an individual chart. Granting secure, encrypted, and role-based access to XML fragments drawn from a collection, without requiring a massive encryption overhead, can be done with creative use of path processing that is aware of the schemas behind the collection. Minimal keyrings for encrypting/decrypting can then be generated based on structures allowed by document schemas, and keyrings can be distributed according to roles and rights of different users of the collection.

Friday 9:00—9:45
Putting it all in context: Context and large-scale information sharing with Topic Maps

Peter F. Brown, Pensive

Two major concerns in creating large-scale Topic Maps applications are: first, the role and capture of context within a specific topic map, and second, how the Topic Maps paradigm — specifically the XTM specification — allows the linking and efficient use of Topic Maps deployed on a massive scale, such as the Internet, to support extremely large-scale information sharing. These two subjects initially seem to be separate concerns. However, upon reflection and further discussion, it is clear that they are intertwined and closely related to the issue of scalability.

Friday 9:45—10:30
Translation between RDF and Topic Maps: Divide and translate

Christo Dichev, Darina Dicheva, Boriana Ditcheva, & Mike Moran, Winston-Salem State University

Is translation between RDF and Topic Maps really feasible in the general case? Any affirmative answer to this question depends on achieving the right balance between the competing objectives of semantic fidelity, completeness, and usability of the resulting translation. Such a balance can only be achieved when the ontological correspondences between RDF and Topic Maps are exploited. An analysis of the translation task reveals relevant requirements. A balanced method has been implemented as a plug-in for the TM4L topic maps editor.

Friday 11:00—11:45
Freedom to constrain: Where does attribute constraint come from, Mommy?

Syd Bauman, Brown University

There are many ways to express constraints on classes of XML documents. Some can and should be controlled by document creators, some at the project or local level, and some by remote schema developers. Each type is illustrated by explaining ways to constrain an attribute value to one of an enumerated list of values. The constraint could be expressed formally in a schema language such as RELAX NG, in a rules-based process such as Schematron, in a metadata element such as a TEI header, in a metaschema file such as a literate program, or independently in a separate file. Pros and cons will be given for each, but the best answer may depend on access requirements: will encoders as well as designers need access and change rights to the constraint?
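
Taking the running example literally, the enumeration might be written in RELAX NG’s XML syntax as follows (the attribute name and permitted values are invented for illustration):

    <attribute name="rend"
               xmlns="http://relaxng.org/ns/structure/1.0">
      <!-- only these three values are permitted for the attribute -->
      <choice>
        <value>bold</value>
        <value>italic</value>
        <value>smallcaps</value>
      </choice>
    </attribute>

The Schematron, TEI-header, and literate-programming variants express the same list in very different places, which is exactly where the access-rights question arises.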

Friday 11:45—12:30
Text retrieval of XML-encoded corpora: A lexical approach

Liam R.E. Quin, World Wide Web Consortium

The latest XQuery 1.0 and XPath 2.0 Full-Text 1.0 specification extends XPath 2.0 to support full-text searching. The software lq-text is an open-source text retrieval package dating from 1989. Whereas XPath 2.0 Full-Text is node-based, working over XML document trees and fully XML-aware, lq-text operates over text files by indexing the location of each natural-language word in all the files. Since lq-text has high precision, good performance over large datasets, and flexible concordance generation, enhancing it to add XML support allows interesting comparisons between the two approaches.

Friday 12:30—1:15
But wait, there’s more!

C. M. Sperberg-McQueen, World Wide Web Consortium / MIT

XML has been widely adopted and forms part of the infrastructure of most modern information technology. We have a satisfyingly large collection of XML vocabularies and XML tools. Is it time to declare victory and go home yet? Or is there more to do?
