Balisage

Balisage 2012 Program

Tuesday, August 7, 2012

Tuesday 9:15 am - 9:45 am

Things change, or, the “real meaning” of technical terms

B. Tommie Usdin, Mulberry Technologies

Vocabulary is slippery, especially the sorts of technical jargon we are immersed in at events like Balisage. When we want to talk about a new idea, process, specification, or procedure we have two choices: make up a new word or use a word that is already in use to mean something else. New words may be difficult to remember and awkward to use. Re-purposing an existing word may cause confusion between the “old” and your “new” meaning. In either case, usage of terms changes. The usage of a technical term may mutate over time and may evolve differently in different communities. At times it is useful for a community to pressure users to use terms to mean what they meant when coined, but more often it is simple pedantry to insist that any usage other than that of the person who first introduced the term is incorrect. Our challenge is in finding that balance.

Tuesday 9:45 am - 10:30 am

Type introspection in XQuery

Mary Holstege, MarkLogic

Type introspection allows a program to determine the type of an object at runtime and to manipulate the type of the object as an object in its own right. It can be used as a basis for generic and flexible services, meta-programming, runtime adaptation, and data exploration and discovery. This paper outlines a proposal to provide some type introspection capabilities to XQuery, looking at some design and implementation considerations and demonstrating the application of type introspection in various contexts. Since software to access and navigate among XSD schema components faces many of the same problems outlined here, the relationship of type introspection to schema component paths is explored. (Proceedings, e-book)

Tuesday 11:00 am - 11:45 am

Using XML to implement XML: Or, since XProc is XML, shouldn’t everything else be, too?

Ari Nordström, Condesign

Implementing XProc pipelines using XML throughout can simplify development and eliminate dependencies on non-XML tools, thus enabling people who know XML to take control of their own processes. The author works in an environment where much of the production pipeline is implemented in C# and changes to that pipeline must be made by programmers who know C# but who do not know XML. Expressing processes, pipelines, etc. as XML allows black-boxing of feature sets and provides a blueprint for the total features available without having to go into specifics. This enables users who may not have detailed knowledge of the mechanics of the process or the system to use it aptly and independently. (Proceedings, e-book)

Tuesday 11:45 am - 12:30 pm

(LB) Finally — an XML markup solution for design-based publishers: Introducing the PRISM Source Vocabulary

Dianne Kennedy, IDEAlliance

In April 2010 the design-based publishing world was rocked when the iPad became the first of many new digital publishing platforms for magazine publishers. It finally became clear to magazine publishers that, in order to publish simultaneously in print and across a range of diverse mobile platforms with different aspect ratios, sizes, and operating systems, they must shift from design-focused workflows to a model in which a platform-agnostic content source is used to feed all platforms and designs. The PRISM Source Vocabulary (PSV), which has been developed over the past two years and was posted as a final public draft in mid-June 2012, provides a design-based publishing architecture. PSV leverages rich metadata and controlled vocabulary semantics coupled with semantic HTML5 to enable design-based publishing. (Proceedings, e-book)

Tuesday 2:00 pm - 2:45 pm

Fleshing the XDM chimera

Eric van der Vlist, Dyomedea

So long as XDM, the XQuery and XPath Data Model, was concerned only with traditional XML documents, it was relatively tidy. Version 3.0, however, proposes new features to support such things as JSON maps and could be extended to support RDF triples. How can we support such things that do not map simply into conventional XML? Several possible approaches are examined, along with methods for validation and processing, to extend the XML ecosystem for the future. (Proceedings, e-book)

Tuesday 2:45 pm - 3:30 pm

Serialisation, abstraction and XML applications

Steven Pemberton, W3C / CWI Amsterdam

In principle the advantages of abstraction in programming are well understood. Yet daily interactions with everyday objects can lead us to confuse the concrete with the abstract, and think that the thing we are dealing with *is* the abstraction. Getting the right level of abstraction can have profound consequences. I believe that there are things we are struggling with today that are the consequences of a mistake in an abstraction made in the 1970s. This talk will be about data abstractions, and how we use them in XML applications, with a passing reference to the developments in XForms 2.0, and how declarative applications can make your life easier (or save you money, depending on who's doing the actual work).

Tuesday 4:00 pm - 4:45 pm

XQuery, XSLT, and JSON: Adapting the XML stack for a world of XML, HTML, JSON, and JavaScript

Jonathan Robie, EMC

XML and JSON have become the dominant formats for exchanging data on the Internet. JSON has not yet developed an application stack as mature as XML’s, and the XML application stack has not yet evolved to easily process JSON. The XML stack should evolve to support this new world. A proposal from the XSL Working Group implements maps using higher order functions, as does the rbtree.xq library; a proposal created by members of the XML Query Working Group adds JSON objects and arrays to the XDM data model. These features, introduced to support JSON in XQuery and XSLT, also allow simpler, more efficient processing of intermediate results when processing XML. The two Working Groups expect to agree on a common solution that can be used in both XSLT and XQuery. (Proceedings, e-book)

Tuesday 4:45 pm - 5:30 pm

From XML to UDL: A unified document language, supporting multiple markup languages

Hans-Jürgen Rennau, Traveltainment

The XML node model described in XDM should be changed so as to encompass JSON markup as well as XML markup. Since XML processing technologies like XPath, XQuery, XSLT, and XProc see instances of the node-oriented XDM model, but do not see the surface syntax of their input, they could handle JSON as well as XML, if the XML parser could deserialize JSON as well. The crucial step is to define a new [key] property for nodes in the model, along with associated constructs. The extended node model proposed here is dubbed the Unified Document Language; it defines the construction of documents from building blocks (nodes) which can be encoded in various markup languages (XML, JSON, HTML). (Proceedings, e-book)

Wednesday, August 8, 2012

Wednesday 9:00 am - 9:45 am

Contemporary transformation of ancient documents for recording and retrieving maximum information: When one form of markup is not enough

Anna Jordanous, King’s College London; Alan Stanley, University of Prince Edward Island; & Charlotte Tupman, King’s College London

The Sharing Ancient Wisdoms Project (SAWS) explores the tradition of wisdom literatures in ancient Greek, Arabic, and other languages. The scholarly goal is to enable linking and comparisons within and between documents, their source texts, and texts which draw upon them and to make scholarly assertions about these complex relationships. We use Open Annotation Collaboration (OAC) to record historically important relations among sources. The technical challenge is to mark up RDF triples directly in documents marked up with TEI-bare, the minimal subset of TEI. Basic units of interest are marked as <seg> elements, and relationships are expressed in four attributes on a <relation> element using an ontology that extends the FRBR-oo model. We now have the capacity to extract RDF triples from TEI-tagged documents to use for queries and inferencing concerning a document and its related external documents. (Proceedings, e-book)

Wednesday 9:00 am - 9:45 am

(LB) Using XProc, XSLT 2.0, and XSD 1.1 to validate RESTful services

Jorge L. Williams & David Cramer, Rackspace

Documentation of RESTful services must be accurate and detailed. As a REST service is being developed, the documentation must be kept up to date and its accuracy constantly validated. Once the REST service is released, the documentation becomes a contract; clients may break if an implementation drifts from the documented rules. Also, third-party implementations must adhere to the rules in order for clients to interact with multiple implementations without issue. Ensuring conformance to the documentation is complicated, tedious, and error-prone. We use our existing XML documentation pipeline to generate highly efficient validators which can check a RESTful service (and its clients) for conformance to the documentation at runtime. We validate all aspects of the HTTP request, including message content, URI templates, query parameters, headers, etc. We describe the transformation process and some of the optimizations that enable real-time validation, and discuss challenges including testing the documentation pipeline and the validators themselves.

Wednesday 9:45 am - 10:30 am

(LB) Design considerations in the implementation of a boil-this-corpus-down-to-a-sample-document tool

Charlie Halpern-Hamu, Tata Consultancy Services

Creation of representative sample(s) of a large document collection can be automated using XSLT. Such samples are useful for analysis: as a preliminary document analysis step in vocabulary redesign or conversion, and as a guide to the design of storage, editing, and transformation processing. Design goals are: to work intuitively with default configuration and no schema, to produce plausible output, and to produce a range of outputs from a large representative set to a short but highly complex sample document. The technique can be conceptualized in passes: annotate structures as original or redundant; keep wrappers to accommodate original markup found lower in the hierarchy; retain required children and attributes; and collapse similar structures. Possible settings include redundancy thresholds, text compression techniques, target length, schema-awareness, schema intuitions, how much context to preserve around kept elements, and whether similar structures should be collapsed (overlaid).
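To make the first two passes more concrete, here is a minimal Python sketch (not the presenter's XSLT implementation). It assumes a hypothetical definition of "redundant": an element whose ancestor path and set of attribute names have already been seen earlier in document order. Redundant elements are dropped unless they wrap original markup further down, in which case they are kept as empty wrappers.

```python
import xml.etree.ElementTree as ET

def signature(path, elem):
    # Hypothetical notion of sameness: ancestor path plus sorted attribute names.
    return (path, tuple(sorted(elem.attrib)))

def boil_down(root):
    """Return a pruned copy of root keeping only original structures and their wrappers."""
    seen = set()

    def visit(elem, path):
        here = path + (elem.tag,)
        sig = signature(here, elem)
        original = sig not in seen      # pass 1: annotate as original or redundant
        seen.add(sig)
        kept_children = [c for c in (visit(child, here) for child in elem)
                         if c is not None]
        if not original and not kept_children:
            return None                 # redundant, with nothing original below it
        kept = ET.Element(elem.tag, elem.attrib)
        # pass 2: a redundant element survives only as a wrapper, without its text
        kept.text = elem.text if original else None
        kept.extend(kept_children)
        return kept

    return visit(root, ())
```

For example, `boil_down(ET.fromstring("<book><ch><p>a</p><p>b</p></ch><ch><p>c</p><note>n</note></ch></book>"))` keeps the first `<p>`, drops the repeated ones, and retains the second `<ch>` only because it wraps the first `<note>`.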

Wednesday 9:45 am - 10:30 am

The ontologist: Controlled vocabularies and semantic wikis

Kurt Cagle

Semantic wikis are being used in commercial projects and provide a powerful tool for structuring complex knowledge management systems. A robust high-performance Knowledge Management System can be built using the MarkLogic ecosystem, semantic triples, XForms, SKOS, and SPARQL. Most content management systems work on the premise that relationships between documents can be modeled with simple folder arrangements (and perhaps a few keywords for quick search). This model assumes small document collections and consistent categorization, and it becomes fragile as collections or complexity grow. By combining semantic assertion modeling with RESTful services and XQuery search capability, a system can treat documents, controlled vocabularies, audit traces, and related entities simply as terms within a rich interconnected ontology. These objects can be modeled via RDF/OWL constructs in a way that moves most of the business logic into the server while still providing value to the client, web, and human. (Proceedings, e-book)

Wednesday 11:00 am - 11:45 am

(LB) Meta-stylesheets: Exploring the provenance of XSL transformations

Ashley Clark

When documents are transformed with XSLT, what methods can be used to understand and record those transformations? Though they aren't specifically meant for provenance capture, existing tools and informal practices can be used to manually piece together the provenance of XSLTs. However, a meta-stylesheet approach has the potential to generate provenance information by creating a copy of XSLT stylesheets with provenance-specific instructions. This new method is currently being implemented, using the strategies and workflows detailed here. Even with the complications and limitations of the method, XSLT itself enables a surprising amount of provenance capture. (Proceedings, e-book)

Wednesday 11:00 am - 11:45 am

The MLCD Overlap Corpus (MOC)

Yves Marcoux, University of Montréal; Claus Huitfeldt, University of Bergen; & C. M. Sperberg-McQueen, Black Mesa Technologies

The immediate goal of the MLCD Overlap Corpus (MOC) project is to build a collection of samples of texts and text fragments with overlapping structures. The resulting body of material will include well-understood and well-documented examples of overlap, discontinuity, alternate ordering, and related phenomena in various notations, for use in the investigation of methods of recording such phenomena. The samples should be of use in documenting the history of proposals for dealing with overlap and in evaluating existing and new proposals. (Proceedings, e-book)

Wednesday 11:45 am - 12:30 pm

(LB) Literate programming: A case study and observations

Sam Wilmott

All production programming languages support integrating comments with code. Comments are often used to help the reader understand why coding is done in a particular way, and to document how a program is to be used. Comments embedded in code are not user-friendly documentation. Markup can be added to programming language comments, which allows user documentation to be extracted from the programming code and repurposed for the user. However, this means a lot of information needs to be duplicated, so that there are both "human" and "computer" versions of the same information. The future lies in going a step further and adding markup to a programming language's code itself, so that it can be used within the documentation without duplication. Markup-based Literate Programming gives us the opportunity to bring the advantages of markup in general, and XML in particular, to a wider community. Better and more reliable documentation could improve the practice of computer programming more than any new programming language feature could. (Proceedings, e-book)

Wednesday 11:45 am - 12:30 pm

Luminescent: Parsing LMNL by XSLT upconversion

Wendell Piez, Mulberry Technologies

Among attempts to deal with the overlap problem, LMNL (Layered Markup and Annotation Language) has attracted its share of attention but has also never grown much past its origins as a thought experiment. LMNL’s conceptual model differs from XML’s, and by design its notation also differs from XML’s. Nonetheless, a pipeline of XSLT transformations can parse LMNL input and construct an XML representation of LMNL, with the resulting benefit that further XML tools can be used to analyze and process documents originating from the alien notation. The key is to regard the task as an upconversion: structural induction performed over plain text. (Proceedings, e-book)

Wednesday (approximately) 1:15 pm - 2:00 pm

Balisage Bluff — an Entertainment

Games master: Lynne Price, Text Structure Consulting

Come play Balisage Bluff during today’s lunch break. Listen to stories about experiences with markup. Can you tell which are fabricated and which are strange but true?

Wednesday 2:00 pm - 2:45 pm

CodeUp: Marking up programming languages and the winding road to an XML syntax

David Lee, MarkLogic

We mark up texts, so why don’t we mark up programming languages? If we did mark up programming source code in XML, would we gain the same sorts of benefits as we gain from marking up texts? What sorts of tools could we use: could an XML editor supplement or even replace some part of a programming IDE? And what might program code fully marked up in XML look like? What might workable samples of XML-tagged code look like, and how might they be implemented with XML tools? (Proceedings, e-book)

Wednesday 2:45 pm - 3:30 pm

On XML languages ...

Norman Walsh, MarkLogic

Many XML languages (validation languages, transformation languages, query languages) have been designed for processing XML. Syntactically, some use XML syntax; some do not; some use a mixed syntax, mostly XML with non-XML parts, or use XML only peripherally. A case can readily be made for XML syntax: familiarity, well-defined extensibility points, automatic syntax checking (and thus typically cleaner input), and availability of XML tools. Conversely, a case can equally be made for non-XML syntax: conciseness, familiarity (again!), and availability of non-XML tools. When creating a non-XML syntax there are additional questions such as delimiters, comments, annotations, and conciseness versus readability. To explore the implications of these purely syntactic distinctions, two compact syntaxes for XProc (an XML pipeline language defined with a pure XML-document syntax) are described and compared. (Proceedings, e-book)

Wednesday 4:00 pm - 4:45 pm

Encoding transparency: Literate programming and test generation for scientific function libraries

Mark D. Flood, Matthew McCormick, & Nathan Palmer, Office of Financial Research, Department of the Treasury

Knuth’s original vision of literate programming may rarely have been attained, but it nonetheless suggests some targeted applications for maintaining libraries of scientific function code. We demonstrate a prototype implementation, in XSLT and DocBook, of a system that generalizes the literate programming paradigm as implemented in other projects to a wide variety of languages. The system allows not only for the generation of documentation from comments, but also the production of both pseudocode for translation of routines from one programming language to another, and parameters and valid results for unit testing of those routines. (Proceedings, e-book)

Wednesday 4:45 pm - 5:30 pm

Extending XML with SHORTREFs specified in RELAX NG

Mario Blažević, Stilo International

When SGML was replaced by its simplified successor XML, nobody regretted the omission of SHORTREFs. Or did they? Non-XML syntaxes stubbornly persist in programming languages, schema languages, and most visibly of all in wikis. We present a novel method for specifying a concrete syntax that combines the notational convenience of non-XML markup with the structure, searchability, and tool-chain support of XML, allowing authors to create valid XML without entering any XML tags. Using an extension of RELAX NG to specify a concrete syntax, a parser can read a well-formed XML document conforming to the given concrete syntax specification. The output of the parser is another XML document conforming to the abstract syntax described by the base RELAX NG schema. (Proceedings, e-book)

Thursday, August 9, 2012

Thursday 9:00 am - 9:45 am

Documents as timed abstract objects

Claus Huitfeldt, University of Bergen; Fabio Vitali, University of Bologna; & Silvio Peroni, University of Bologna

At Balisage 2009 and 2010 Renear and Wickett discussed problems in reconciling the view that documents are abstract objects with the view that documents can undergo change. In this paper we discuss a commonly held alternative account of documents in which documents are indeed abstract objects but are associated with different strings or trees at different points in time. This account of documents as timed abstract objects, however, may be subject to the same criticisms as have been raised against the notion of space-time slices. We conclude that either documents are not abstract objects, or else they are abstract objects of a kind which differs from the standard definitions of what abstract objects are. (Proceedings, e-book)

Thursday 9:45 am - 10:30 am

(LB) A standards-related web-based information system

Maik Stührenberg, Oliver Schonefeld, & Andreas Witt, Institut für Deutsche Sprache (IDS) Mannheim

There is an unmanageable number of XML-related standards, which can be grouped in many ways. The many inter-relationships between specifications (and versions) complicate their correct use. We propose a community project to build a platform providing guidance through this jungle. Starting with a very small set of standards and specifications, and constructed as an XRX (XForms, REST, XQuery) application, we offer a starting point for a platform that allows experts to share their knowledge. We have prototyped a web-based information system to serve as a starting point. It currently contains information on 25 specifications, and includes topics such as Meta Language, Metadata, Constraint Language, and standards body. We hope to create a product which will be of use to scholars and researchers around the world. We will publish the annotation format for comments and add further enhancements. After the format is established, we will upload contributed specification sheets into the platform and will open the platform for reading so other people can give feedback in a less technical way. (Proceedings, e-book)

Thursday 11:00 am - 11:45 am

Utilizing new capabilities of XML languages to verify integrity constraints

Jakub Malý & Martin Nečaský, Charles University, Prague

Object Constraint Language (OCL) describes integrity rules that apply to Unified Modeling Language (UML) models but that cannot be diagrammatically expressed in UML. OCL integrity constraints can be verified in XML data using XML technologies like Schematron, XPath/XQuery, and XSLT, using the principles of model-driven architecture. Some constructs typical of OCL constraints are difficult to handle with idiomatic XPath/XQuery expressions, so we have written XSLT 2.0 extension functions to translate some OCL expressions. With new features such as higher-order functions and dynamic evaluation proposed in drafts for XSLT 3.0, XPath 3.0, and XQuery 3.0, necessary constructs such as iterator expressions and free variables can be handled more elegantly, making the transition from OCL to XML technologies much more seamless and transparent. (Proceedings, e-book)

Thursday 11:45 am - 12:30 pm

Testing Schematron in the context of the Clinical Document Architecture (CDA)

Kate Hamilton, Maplekeys Consulting, & Lauren Wood, Textuality Services

The Clinical Document Architecture (CDA) is widely used in healthcare. Its scope is any clinical document or report. The (single) CDA schema that is used to validate all of these reports is derived from a UML model. The element names reflect specializations of various concepts, while the attribute values can refine element meaning, add flavor to the parent/child relationship, reverse the subject and object of a compound expression, negate the meaning, or explain the absence of a value. Separately-defined prose constraints represent the requirements for individual document types such as a Procedure Note or a public-health accounting of bloodstream infections. These report-specific constraints are, of course, not defined in the general CDA.xsd schema. Although the element-attribute relationships can be tested using the schema, the value-driven conditional and alternative rules are best tested using Schematron. We create Schematron and use it in conjunction with the CDA schema to confirm that the CDA documents conform to the relevant specific report constraints and requirements. The Schematron must itself be tested to ensure that the combination of W3C Schema and Schematron correctly checks the rules and that the Schematron error messages point comprehensibly to the real error. (Proceedings, e-book)

Thursday 2:00 pm - 2:45 pm

Lightning visualizations

Balisage Participants (perhaps including you)

The rules are simple:

You show one visual (e.g., a diagram or chart) that speaks to you.
You explain or describe the graphic and what it conveys.
You have three minutes.

The visual may be the presenter’s creation or something the presenter saw and admired. The visuals should be related to markup or markup technologies in some way; visualizations of markup will be welcome, as will compelling diagrams driven by markup (e.g., SVG), and images useful in the understanding or communication of markup-related concepts.

Thursday 2:45 pm - 3:30 pm

Exploring the unknown: Understanding and navigating large XML datasets

Micah Dubinko, MarkLogic

In an age of big data, linked data, and open data, you as a user of XML may often face collections of XML documents with more data than you know what to do with. Often these collections will be written in an unspecified set of vocabularies and use some unknown set of elements, attributes, namespaces, and content models. This paper describes an approach for quickly summarizing as well as guiding exploration into a non-indexed XML database. Probabilistic histograms can be generated to approximate faceted search over large datasets, without the need to build particular index configurations in advance. (Proceedings, e-book)
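One simple way to approximate such a histogram without any index is to sample documents at random and scale the counts up. The Python sketch below is only an illustration of that general idea, not the method described in the paper; it assumes, hypothetically, that the collection is available in memory as a list of XML strings and that element names (in ElementTree's `{namespace}local` form) are the facet of interest.

```python
import random
from collections import Counter
import xml.etree.ElementTree as ET

def estimated_histogram(docs, sample_size, seed=0):
    """Estimate element-name counts over all docs from a uniform random sample."""
    if not docs:
        return {}
    rng = random.Random(seed)
    k = max(1, min(sample_size, len(docs)))
    sample = rng.sample(docs, k)
    counts = Counter()
    for text in sample:
        # iter() visits the root and every descendant element
        for elem in ET.fromstring(text).iter():
            counts[elem.tag] += 1
    # Scale sample counts up to the size of the whole collection.
    scale = len(docs) / k
    return {tag: n * scale for tag, n in counts.items()}
```

Larger samples narrow the error of the estimate at the cost of parsing more documents; the seed makes a given exploration session repeatable.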

Thursday 4:00 pm - 4:45 pm

(LB) Moving sands: Adventures in XML ebook-land

Michel Biezunski, Infoloom

Producing ebooks from XML that are usable on multiple devices can be quite challenging. In doing a project for the US IRS (first step due to be completed in July 2012) targeting multiple devices, some announced but not yet available, which differ in size, features, the formats they can read, and how much of the EPUB standard they implement, the author encountered a variety of problems. Deficiencies in the EPUB standard, undocumented bugs in various devices (known to experts who were the first to experience these weird effects and are sharing their findings), and issues such as fixed layout vs. flowable content added complexity to an already significant task. The topics discussed include unorthodox albeit parsable HTML to accommodate known bugs in some readers (a font change issue on the iPad, a table display issue on the Nook, rendering of graphics). Radical changes needed to ensure a sustainable, minimally intrusive, long-term solution for producing ebooks on various devices will also be discussed. (Proceedings, e-book)

Thursday 4:00 pm - 4:45 pm

XML entropy study

Hervé Ruellan, Canon Research Centre France

Many studies and research efforts have sought to reduce the size of XML documents and increase processing speed using informal, ad hoc, and partial approaches. To provide stronger foundations for such work, we present here a comprehensive formal study of the quantity of information contained in XML documents. For a carefully chosen collection of test documents, we estimate their information content by calculating the entropy of various representations of the documents. We then compare those theoretical results to the effective compactness of textual XML, Fast Infoset, and EXI, and characterize the effectiveness of various techniques for partitioning and indexing the representation of XML documents. (Proceedings, e-book)
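As a rough illustration of what "calculating the entropy of a representation" involves, the Python sketch below computes the zeroth-order Shannon entropy, H = -Σ p_i log2 p_i bits per symbol, of one possible representation: the stream of element names in document order. Both the choice of representation and the order-0 model are assumptions for illustration; the study itself examines various representations.

```python
import math
from collections import Counter
import xml.etree.ElementTree as ET

def shannon_entropy(symbols):
    """Zeroth-order entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def tag_stream_entropy(xml_text):
    """Entropy of the element-name stream and the number of symbols in it."""
    tags = [e.tag for e in ET.fromstring(xml_text).iter()]
    return shannon_entropy(tags), len(tags)
```

For `<r><x/><x/><y/></r>` the stream is `r x x y`, giving 1.5 bits per symbol over 4 symbols, so an order-0 coder needs at least about 6 bits for the structure alone under this model.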

Thursday 4:45 pm - 5:30 pm

XiBIT: XML-in-the-browser interoperability tests

C. M. Sperberg-McQueen, Black Mesa Technologies

XiBIT is an effort to investigate and document the behavior of existing web browsers in the processing and display of XML. This is not conformance testing, but interoperability / consistency testing. XiBIT tests are XML documents which explore one or more dimensions along which XML documents can vary. XiBIT will generate several work products: a set of tests available from a web server, documentation and tabulation of browser behavior on those tests, and a public interface allowing volunteers to submit data recording the behavior of specific browsers in specific environments. (Proceedings, e-book)

Thursday 4:45 pm - 5:30 pm

(LB) Leveraging XML technology for web applications

Anne Bruggemann-Klein, Jose Tomas Robles Hahn, & Marouane Sayih, Technische Universität München

As eBooks evolve into interactive applications, our vision at Electronic Publishing Group (EPT) is to empower authors to write and deploy not only documents and eBooks but whole Web applications using widely available tools without system lock-in. We envision XML technology as open, accessible, well supported technology to be leveraged for Web applications: Information is represented and manipulated with XML technology. Data and programs are deployed on a Web server, stored in an XML database, run by XML processors (XSLT, XQuery, XProc) and accessed from XML-aware Web clients (XForms) via the HTTP protocol. We document a calendar system, CalendarX, as a case study. We illustrate our use of XML technology and the methodology we employed, drawing on ideas from Domain-Driven Design and Abstract State Machines. (Proceedings, e-book)

Friday, August 10, 2012

Friday 9:00 am - 9:45 am

MicroXML: Who, What, Where, When, Why

John Cowan

MicroXML began at the end of 2010 when James Clark wanted to explore a subset of XML. MicroXML wasn't intended to replace XML, but to make something simple enough that people who ran screaming from XML would find it palatable. MicroXML is to XML as XML is to SGML: strip down the spec to the bare bones and start over, adding back as little as possible. James wrote a brief grammar and a really simple data model: everything is an element with a name, an attribute map, and ordered children, either elements or strings. In 2011, I wrote an Editor's Draft that expanded James's writeup to ten pages, corresponding to the 100 pages of XML, namespaces, infoset, xml:base, and xml:id. Now a Community Group at the W3C, chaired by Uche Ogbuji and with James and me as co-editors, is discussing exactly what MicroXML should be.
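The data model described above is small enough to write down directly. The Python sketch below is for illustration only: the class and function names are hypothetical and are not taken from any MicroXML draft. Every node is either an element (name, attribute map, ordered children) or a string.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class Element:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    # Ordered children: each one is either an Element or a string.
    children: List[Union["Element", str]] = field(default_factory=list)

def text_content(node):
    """Concatenate all string descendants in document order."""
    if isinstance(node, str):
        return node
    return "".join(text_content(c) for c in node.children)
```

For example, `Element("p", {"class": "note"}, ["Hello, ", Element("b", children=["world"]), "!"])` has the text content `"Hello, world!"`. There are no separate node types for comments, processing instructions, or namespaces, which is much of the point.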

Friday 9:45 am - 10:30 am

Simplifying XSLT stylesheet development using higher order functions

Abel Braaksma, Abrasoft

Higher-order functions (HOFs) are sometimes considered hard to get your head around, but they can help make stylesheets easier to maintain and can even provide a certain degree of information hiding for XSLT libraries. HOFs can be easy to use, and together with the new packaging features of XSLT 3.0 they can dramatically simplify common tasks in XSLT stylesheet and library development. Built-in functions like fn:filter, fn:map, and fn:fold-left/right have many general applications for filtering, binary search trees, and other tasks. HOFs can offer some new challenges, as well; this paper will also discuss some problems to look out for and some things not to do. HOFs are great fun, and with them programming in XSLT will be even more fun than it already is!

Friday 11:00 am - 11:45 am

(LB) Developing a low-cost functional Class 3 IETM

Betty Harvey, Electronic Commerce Connection

The specifications for U.S. DoD technical documents and IETMs (Interactive Electronic Technical Manuals) were developed in the 1990s based on ISO standards (SGML and HyTime) developed in the 1980s. US DoD contracts continue to specify deliverable technical documentation defined in these specs. The DTDs and stylesheets (FOSI) were developed over 20 years ago. Many of the original tools to create and manipulate these documents are no longer available; in fact, the hardware platforms many of the tools ran on are no longer in existence. This paper will describe an approach that was developed for Cobham Mission Systems Division in Orchard Park, New York for delivering a Class 3 IETM using current technologies. Microsoft Word documents were transformed using XSLT into XML, edited, and then converted to SGML and searchable IETMs. There were a few bumps in the road, which will be discussed, but the presentation will end with a demonstration of a converted Class 3 IETM. (Proceedings, e-book)

Friday 11:45 am - 12:30 pm

(LB) Characterizing ill-formed XML on the web

Liam R. E. Quin, W3C

There are a substantial number of documents on the Web that are served as XML but are not well-formed XML documents. Building on the work of Steven Grijzenhout, who built the Amsterdam XML Corpus, this paper explores the types of errors that occur in XML documents on the Web by document type. An interesting sidelight is an analysis of the document types, or at least top level elements, of XML documents on the Web. The aim is to bring a more XML-centric view to the analysis of the Corpus and to inform work on error recovery in XML parsing.

Friday 12:30 pm - 1:15 pm

Things stay the same, or, the real meaning of technical work

C. M. Sperberg-McQueen, Black Mesa Technologies

What does not change when things change.

There is nothing so practical as a good theory