How to cite this paper

Lee, David A., and Norman Walsh. “Efficient scripting.” Presented at International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, Montréal, Canada, August 10, 2009. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Balisage Series on Markup Technologies, vol. 4 (2009).

International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth
August 10, 2009

Balisage Paper: Efficient scripting

David A. Lee

Principal senior software engineer

Epocrates, Inc.

David Lee has over 20 years of experience in the software industry and has been responsible for many major projects at small and large companies, including Sun Microsystems, IBM, Centura Software (formerly Gupta), Premenos, Epiphany (formerly RightPoint), and WebGain. As principal senior software engineer at Epocrates, Inc., Mr. Lee is responsible for managing the data integration, storage, retrieval, and processing of clinical knowledge databases for the leading clinical information provider.

Key career contributions include real-time AIX OS extensions for optimizing the transmission of real-time streaming video (IBM), secure encrypted EDI over Internet email (Premenos), porting the Centura Team Desktop system to Solaris (Gupta/Centura), optimizing large enterprise CRM systems (Epiphany), and authoring xmlsh, an open source scripting language for XML.

Norman Walsh

Principal Technologist in the Information & Media group

Mark Logic Corporation

Norman Walsh is a Principal Technologist in the Information & Media group at Mark Logic Corporation, where he assists in the design and deployment of advanced content applications. Norm is also an active participant in a number of standards efforts worldwide: he is chair of the XML Processing Model Working Group at the W3C, where he is also co-chair of the XML Core Working Group. At OASIS, he is chair of the DocBook Technical Committee.

Before joining Mark Logic, he participated in XML-related projects and standards efforts at Sun Microsystems. With more than a decade of industry experience, Mr. Walsh is well known for his work on DocBook and a wide range of open source projects. He is the principal author of DocBook: The Definitive Guide.

Copyright © 2009 David A. Lee and Norman Walsh. Used by permission.


The efficiency and performance of individual XML operations such as parsing, processing (XSLT, XQuery), and serialization, and the merits of different in-memory document representations, have been widely discussed. However, real-world use cases often involve many operations orchestrated by a scripting environment, and the overhead of that environment can overshadow any performance gains in the individual operations. In an exploration of real-world scripting, we compare the performance of several scripting languages and techniques on a set of typical XML tasks, such as generating a table of contents and conditionally accessing non-XML files identified in XML documents. Based on the performance results, we suggest best practices for scripting XML processes. The scripting languages compared are the DOS shell (CMD.EXE), the Linux shell (bash), XMLSH, and XProc (Calabash). These are run, where possible, on multiple operating systems: Windows XP, Linux, and Mac OS.