Balisage Series on Markup Technologies

International Symposium on Processing XML Efficiently

8:00 am - 9:00 am — Continental Breakfast
9:00 am - 9:10 am

Introduction and welcome

Michael Kay, Saxonica
9:10 am - 9:50 am

The XML chip at 6 years

Michael Leventhal & Eric Lemoine, LSI Corporation

The XML chip is purpose-built silicon for high performance XML processing. It has the potential to reduce server costs, to reduce power consumption, and to reduce latency. The paper compares the performance of the XML chip (hybrid specialized hardware and software) with optimized XML software for a number of operations. The benefits of the XML chip increase as CPUs get faster, especially with the introduction of multi-core technology. There are challenges, notably the cost of copying data to and from the co-processor, but the challenges can be overcome. Results show that the use of an XML co-processor can reduce CPU cycles per byte of XML processed by amounts ranging from a factor of 3 to a factor of 50 depending on the workload, while power consumption can be reduced by a factor of 7.

9:50 am - 10:30 am

Hardware and software trade-offs in the IBM DataPower XML XG4 processor card

Richard Salz, Heather Achilles, David Maze, IBM Academy

This presentation will discuss some of the hardware and software trade-offs in the IBM DataPower XML processor, known as the XG4. The XG4 is a PCI card that parses XML, supports XPath and schema validation, and has a generic post-processing engine. It can return events like SAX, build a tree like a DOM, or switch between modes within a document. It is capable of supporting thousands of sessions simultaneously and, because of its pipelined design, can process more than one character per clock tick. The talk will explain some of the features of the card and its device driver, such as memory usage and zero-copy, synchronization of QName identifiers between card and software, and programmability.

10:30 am - 11:00 am — Break
11:00 am - 11:40 am

Parallel bit stream technology as a foundation for XML parsing performance

Rob Cameron, Ken Herdy, Ehsan Amiri, Simon Fraser University

By first transforming the octets (bytes) of XML texts into eight parallel bit streams, the SIMD features of commodity processors can be exploited for parallel processing of blocks of 128 input bytes at a time. Established transcoding and parsing techniques are reviewed, followed by new techniques, including parsing with bitstream addition. Further opportunities are discussed in light of expected advances in CPU architecture and compiler technology. Implications for various APIs and information models are presented, as well as opportunities for collaborative open-source development.
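As a rough illustration of the idea in this abstract (not the authors' implementation), the sketch below transposes a byte string into eight bit streams and uses integer addition to scan past runs of name characters. Python integers stand in for SIMD registers, bit i of each stream corresponds to input byte i, and all function names are hypothetical. The scanning rule `(markers + char_class) & ~char_class` is one commonly described form of addition-based scanning and is assumed here.

```python
import string


def transpose(data: bytes) -> list[int]:
    """Return eight bit streams; streams[k] has bit i set when bit k
    (0 = least significant) of data[i] is set."""
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list[int], value: int, length: int) -> int:
    """Bit stream with bit i set where data[i] == value, computed with
    bitwise logic on the eight streams only."""
    mask = (1 << length) - 1
    result = mask
    for k in range(8):
        stream = streams[k]
        result &= stream if (value >> k) & 1 else (~stream & mask)
    return result


def char_class(streams: list[int], values: bytes, length: int) -> int:
    """Bit stream marking every position whose byte is in `values`."""
    result = 0
    for value in values:
        result |= match_byte(streams, value, length)
    return result


def scan_thru(markers: int, cls: int) -> int:
    """Advance each marker past the run of `cls` positions it sits on.
    The carry produced by the addition ripples through each run."""
    return (markers + cls) & ~cls


def positions(stream: int) -> list[int]:
    """List the indexes of the set bits, lowest first."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


data = b"<doc><a/>x</doc>"
streams = transpose(data)
lt = match_byte(streams, ord("<"), len(data))
letters = char_class(streams, string.ascii_letters.encode(), len(data))

# Place a marker just after each '<', then skip over any tag name there.
name_ends = scan_thru(lt << 1, letters)

print(positions(lt))         # [0, 5, 10]
print(positions(name_ends))  # [4, 7, 11]
```

The marker at position 11 does not move because the byte there is `/`, not a name character. A real implementation would perform the same logic on fixed-width SIMD registers, handling carries between blocks.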

11:40 am - 12:20 pm

Memory management in streaming: buffering, lookahead, or none. Which to choose?

Mohamed Zergaoui, Innovimax

Although the ideal approach to streaming is to process markup events as soon as they are encountered, with no memory needing to be used for storing parts of the input document, this is not always feasible, and in practice it is useful to consider "near-streaming" approaches that involve a limited amount of buffering or lookahead. In the extreme, however, such approaches degenerate until they are indistinguishable from non-streaming processes. This paper attempts a classification of streaming and near-streaming processing methods using different approaches to memory management, and discusses the advantages and disadvantages of each.
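The distinction drawn in this abstract can be sketched with the Python standard library. The example is illustrative only and is not taken from the paper: the first function is purely event-driven and keeps no part of the document, while the second is "near-streaming" in that it buffers one record subtree at a time and releases it once processed. The sample document and the element names `orders`, `order`, and `item` are assumptions made for the sketch.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

DOC = (b"<orders>"
       b"<order><item price='1.5'/><item price='2.5'/></order>"
       b"<order><item price='3'/></order>"
       b"</orders>")


class ElementCounter(xml.sax.ContentHandler):
    """Pure streaming: react to each start event and store nothing
    except a running count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(data: bytes) -> int:
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def totals_per_order(source) -> list[float]:
    """Near-streaming: buffer one <order> subtree, process it, then
    drop it so memory use is bounded by the largest single order."""
    totals = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == "order":
            totals.append(sum(float(item.get("price"))
                              for item in elem.iter("item")))
            root.clear()  # release the subtree that was just processed
    return totals


print(count_elements(DOC))                # 6
print(totals_per_order(io.BytesIO(DOC)))  # [4.0, 3.0]
```

If the buffered unit were the whole document rather than a single order, the second function would be indistinguishable from a non-streaming process, which is the degenerate case the abstract mentions.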

12:20 pm - 1:00 pm

Efficient scripting

David Lee, Epocrates & Norman Walsh, Mark Logic

The efficiency and performance of individual XML operations such as parsing, processing (XSLT, XQuery) and serialization, and the merits of different in-memory document representations, have been widely discussed. However, real-world use cases often involve many operations orchestrated using a scripting environment. The performance of the scripting environment can often overshadow any performance gains in individual operations. In an exploration of real-world scripting, we compare the performance of several scripting languages and techniques on a set of typical XML operations such as generation of a table of contents and conditionally accessing non-XML files identified in XML documents. Based on performance results, we suggest best practices for scripting XML processes. Scripting languages compared include DOS Shell (CMD.EXE), Linux Shell (bash), XMLSH, and XProc (calabash). These are run (where possible) on multiple operating systems: Windows XP, Linux, and Mac OS.

1:00 pm - 2:00 pm — Lunch
2:00 pm - 3:00 pm

Performance of XML-based applications

James Robinson, HighWire Press

This talk examines the performance of XML-based applications from a user viewpoint. HighWire Press is the online publishing operation of Stanford University Library, and currently hosts online journals for over 140 separate publishers. HighWire has developed and released a new XML-based publishing platform, and is in the process of migrating all of its publishers to this new platform. Using the evolution of this platform as a case study, the presentation will discuss the design decisions considered and implemented in order to deliver the required throughput and response time, and describe some of the particular problems that were encountered during the development of this system.

3:00 pm - 3:30 pm

Review and summary

Michael Kay, Saxonica

Based on what has been said and not said during the previous talks, the conference chair will attempt a summing-up of the state of the art in efficient XML processing. Is there a problem, and if so what is it? Are there any technical developments on the horizon that are likely to make a big impact over the next few years? Where should researchers be focusing their efforts to deliver maximum benefit to users? What factors should users be taking into account in selecting technologies and in designing their application architecture?

3:30 pm - 4:00 pm — Break
4:00 pm - 5:30 pm

Open Forum

Symposium Participants

The final session of the day is an open forum. There will be an opportunity for short 5-minute presentations of technologies that have not been covered elsewhere in the day, and a chance to ask questions and air your views. The papers accepted for the symposium reveal some interesting differences of views, and past experience suggests that there will be some lively debate. Be prepared for discussion to continue into the evening.
