How to cite this paper

Sperberg-McQueen, C. M. “Sometimes a question of scale.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). https://doi.org/10.4242/BalisageVol3.Sperberg-McQueen02.

Balisage: The Markup Conference 2009
August 11 - 14, 2009

Balisage Paper: Sometimes a question of scale

C. M. Sperberg-McQueen

Black Mesa Technologies LLC

C. M. Sperberg-McQueen is a consultant specializing in the use of XML and related technologies to help preserve public information (especially cultural heritage data) and make it accessible. He has served as co-editor of the XML 1.0 specification, the Guidelines of the Text Encoding Initiative, and the XML Schema Definition Language (XSDL) 1.1 specification. He holds a doctorate in comparative literature.

Abstract

Reflections on size, scale, scaleability, and value.

My friend David Ezell, the chair of the World Wide Web Consortium’s XML Schema Working Group, wrote me recently: “Best laid plans, and careful designs, often degenerate into unintended consequences in the face of SCALE.”^[1]

I have been thinking a lot about that, lately.

When we scale up from a prototype or experimental system to production work, or when we scale up from a niche product to a mass product, things can go wrong in a number of ways:

Maybe things don’t run fast enough, so they can’t keep up with the volume of data that it takes to run something in production.
Maybe we failed to foresee that when usage grows, or the quantity of some particular resource grows (changes in scale), some other resource is going to become scarce, and we’ve neglected to take steps to conserve it.
Maybe we’ve failed to remember that the successful delivery of the first part of a multi-part solution can change the environment within which and in terms of which the problem was identified, in such a way that the second and third parts of the proposed solution no longer fit because the problem has changed shape. Or more generally: we may have failed to see that the salient properties of today’s situations are not necessarily permanent. And although we are trying to plan for the long term, we end up deciding (for example) that namespaces cannot possibly be declared using processing instructions, because Netscape 3 would do the wrong thing with them, and that it is essential that namespaces be deployable in the browser. Neither of those constraints seems to have quite the same force today that it did when the namespace technology was being designed. Netscape 3 is no longer the standard browser in at least many organizations [laughter], and the people responsible for HTML and W3C’s interface to the browser seem to have changed their minds about how essential it is that namespaces be deployed in the browser in the first place.
Or we could simply fail to predict the true complexity of the environment within which our system will be asked to operate, or the true difficulty of the problems it’s going to be asked to help solve.

Thinking about it this way I have begun to think that in one way or another almost every paper at this conference has touched, directly or indirectly, on some problem relating to scale.

First, of course, there’s the problem of speed and streamability.

We’ve heard a lot this week about the problems of speeding up XML processing and constraining its memory requirements in order to allow users to process larger and larger streams of data.

On Monday we had a full day on the topic, starting with extremely clever work first on a chip intended to accelerate XML processing in the software running in the hardware containing the chip [Leventhal / Lemoine 2009], or an entire XML system on the chip, with parser and XSLT processor and schema validator and XML signature processor, and so on [Salz / Achilles / Maze 2009]. And the team from Simon Fraser University led by Rob Cameron told us about an extremely ingenious use of the parallel instructions on newer chips intended to support multi-media, to support instead parallel parsing of XML [Cameron / Herdy / Amiri 2009]. I take a certain pleasure now in recollecting that one of the problems Charles Babbage struggled with for a long time was to improve the behavior and performance of his mechanisms for carrying digits from one column to another, because of course one carry could lead to another carry, could lead to another carry. And making that happen fast was a serious problem for Babbage. It gives me particular pleasure to recollect that, because Rob Cameron has observed that his team have been able to simplify and refine their technique to the point where the entire technique revolves around this one problem of making a carry bit go from one context to another when needed: everything can be reduced to that one problem. It’s nice how much continuity there can be in technological history.
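To make the role of the carry concrete, here is a minimal sketch in Python. It is not taken from the Simon Fraser paper; it assumes a bit-stream representation in which bit i of an integer stands for character i of the text, and it lets ordinary integer addition ripple a carry through a run of name characters. The names `char_class_stream` and `scan_thru` are illustrative assumptions.

```python
def char_class_stream(text, predicate):
    """Return an int whose bit i is 1 when predicate(text[i]) is true."""
    stream = 0
    for i, ch in enumerate(text):
        if predicate(ch):
            stream |= 1 << i
    return stream


def scan_thru(markers, char_class):
    """Advance each marker bit past the run of char_class bits it sits on.

    Adding the markers to the class stream starts a carry that ripples
    through consecutive 1 bits and stops at the first 0 bit; masking with
    ~char_class keeps only the positions where the carries came to rest.
    """
    return (markers + char_class) & ~char_class


text = "<doc><para/>"
name_chars = char_class_stream(text, str.isalnum)
# Place a marker on the character just after each "<".
starts = char_class_stream(text, lambda ch: ch == "<") << 1
ends = scan_thru(starts, name_chars)
# Positions where each element name ends (the first non-name character).
end_positions = [i for i in range(len(text) + 1) if (ends >> i) & 1]
```

A single addition moves every marker to the end of its name at once, which is why the speed of the carry matters so much in this style of parsing.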

Layered above those low-level problems, there are similarly complex problems at higher levels.

On Monday, David Lee demonstrated that XML processing is no exception to what some old-timers have described to me as an eternal law of data processing: context switches are really expensive, and you really want to avoid them [Lee / Walsh 2009]. Why does the flagship Oracle database ship with its own programming language, instead of just using the techniques which are the core of the SQL standard to allow users to call SQL from C or Cobol or Fortran or whatever programming language they want to use? A gray-haired researcher from Oracle once told me, no, it was not because they had hot-shot programmers who just couldn’t rest until they had invented yet another programming language with yet another syntax and yet another function library and yet another set of quirks. It was because every time the database system threw control over the wall to the application and then the application threw control back over the wall to the database system, they lost several hundred milliseconds. A hundred milliseconds here, a few hundred there. It begins to mount up.

Mohamed Zergaoui illustrated that these are not simple problems [Zergaoui 2009]. Even defining what streamability means, and what the rules for streamability ought to be, is not simple. What are the tradeoffs among pure streamability, something that’s not strictly streamable because its memory requirements grow without bound, but grow only very slowly, so it’s sort of mostly streamable, and rules that allow a process to be kinda sorta streamable, on a good day, if the wind doesn’t change? It’s more complicated than you might think at first.
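A small sketch may help show why the boundaries are hard to draw. It assumes Python’s standard `xml.etree.ElementTree` module and a made-up sample document; the function names are hypothetical. Counting elements needs only a counter, while a question that depends on the last element forces the process to remember everything it has seen.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"<log><e>a</e><e>b</e><e>a</e></log>"


def count_elements(source, tag):
    """Nearly streamable: the only state we keep ourselves is one integer."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop content already processed. The emptied element stays attached
        # to its parent, so memory still grows, but only very slowly.
        elem.clear()
    return count


def positions_matching_last(source, tag):
    """Not streamable: every text value must be kept until the end arrives."""
    seen = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            seen.append(elem.text)
        elem.clear()
    if not seen:
        return []
    return [i for i, text in enumerate(seen) if text == seen[-1]]
```

Calling `count_elements(io.BytesIO(SAMPLE), "e")` walks the document once; the second function walks it once too, but its list grows with the input, which is the kind of difference a definition of streamability has to capture.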

Maybe that’s why David Birnbaum found, when he ran into his XPath quicksand [Birnbaum 2009], that the system he was using wasn’t in much of a position to explain to him why his query was painfully slow. It took careful experimentation by systematically changing first one part of the program, then another, in the carefully calibrated process known to professional programmers as random guesswork, to find the bottleneck. And then it took human eyes to examine that part of the query and explain why it was so slow. It then took more random, er, systematic, experimentation to find alternative formulations of the problem that would behave better in that particular implementation.

David concluded by observing that the best performance of all on the problem was achieved when the developers changed the behavior of the implementation on that kind of query, under the hood. But I draw a slightly different moral from his experience escaping from that quicksand: those of us on the user side of the keyboard are not powerless; with application and determination and a little luck we can help ourselves to better performance, even if we can’t get much help from the developers. Work on scalability is not an exclusive club: anybody can play the game, if they are willing to do the work.

Norm Walsh gave hope to those who really would prefer that other people do that work, by observing that as far as we can tell, based on our minimal understanding of what the usage patterns will be for an extremely new language, most typical XProc pipelines will be streamable, so that an aggressively optimizing processor can hope to provide good performance even if the user does nothing [Walsh 2009].

Michael Kay gave us an enlightening discussion of the push/pull duality in pipelines, and illustrated a surprising point about the way things scale over time [Kay 2009]. When one part of a problem or system changes scale dramatically, the character of the problem changes, and we often need new algorithms. So if you look at textbooks published in the 1980s, their description of sorting doesn’t look like the same thing as the descriptions of sorting published in textbooks in the 1970s. Why? Because the kinds of things people needed to sort had changed; or rather, the kinds of things people needed to sort were much the same, but the underlying hardware was so different that you could have an entirely new set of algorithms that performed much better. Over time it can happen that the other factors in the system catch up, and the system regains something like its original overall proportions, the problem changes again, and we may find that techniques developed in the 1970s for program inversion are practically relevant again. Knuth’s discussion of tape sorting has never lost its purely mathematical interest and beauty. But nowadays, programmers may be reading it not for the sake of its mathematical beauty but to help them optimize current systems, which aren’t using tapes but are using storage which, like tapes, is slower than core. (Do I date myself if I call it core? I guess I do. Sorry; I never learned to call it anything else.)
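The push/pull duality mentioned above can be sketched in a few lines of Python. This is an illustration under my own assumptions, not code from Kay’s paper: the same trivial transformation is written once as a stage the consumer pulls from and once as a stage the producer pushes into. The names `pull_upper` and `PushUpper` are hypothetical.

```python
def pull_upper(source):
    """Pull style: the consumer drives, asking this stage for the next item."""
    for item in source:
        yield item.upper()


class PushUpper:
    """Push style: the producer drives, handing each item to this stage."""

    def __init__(self, sink):
        self.sink = sink  # callable that receives each transformed item

    def write(self, item):
        self.sink(item.upper())


events = ["start", "text", "end"]

# Pull: the final consumer iterates; the stage reads upstream on demand.
pulled = list(pull_upper(iter(events)))

# Push: the producer loops; the stage forwards each result downstream.
pushed = []
stage = PushUpper(pushed.append)
for event in events:
    stage.write(event)
```

Both versions produce the same output; what differs is which end of the pipeline holds the control loop, and converting one form into the other is the kind of program inversion the older literature describes.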

Tomasz Müldner’s work on queryable compression for XML shows that, as with time, so also with space: if you apply enough ingenuity you may find ways to reduce space requirements without impairing usability (at least, if we take queryability as a reasonably good proxy for usability) [Müldner et al. 2009].

There’s another sense in which we speak of things scaling up. Some people are interested in any technology only to the extent that it has large numbers of users.

If you want a large number of users, the World Wide Web appears to be an inescapable model, or object of contemplation.

How did that happen?

Why did the World Wide Web become quasi-universal? In the early 1990s it was one of several hypertext systems. HTML and HTTP were not better developed than Guide or System G. In fact, everybody I know who actually knows all of those systems tells me that Guide and System G were much more impressive than the first HTML browsers.

Why was it that the World Wide Web spread, and became quasi-universal, and not Guide or System G?

There’s a story that people tell, that in the early ’90s, this guy at CERN, Tim Berners-Lee, proposed a paper to the European Conference on Hypertext (ECHT). In it, he described a system he wanted to build, which would provide essentially a world-wide hypertext. The paper, famously, was rejected.

How the rejection is interpreted, and thus the next part of the story, depends on who is telling it. I have heard people associated with the World Wide Web Consortium tell it as a story of reviewers who were essentially blind. They were fixed in their ways, they understood how existing systems did things, and the Web wasn’t going to do things that way. So the reviewers said, “It’s impossible; hypertext systems will never scale to those dimensions. It can’t be done; the paper is technically naive.” The story illustrates, in this telling, how new ideas are often misunderstood.

I talked once with someone who had a number of friends among the program committee of the European Conference on Hypertext. From that person, I heard a slightly different version of the story. The reviewers read the paper and said yes, you can build a system like that. We’ve all built toy hypertext systems, systems that don’t do anything to ensure link integrity and don’t care about findability. We’ve built them; but they are not interesting. They are not capable; we are interested in better systems than that. The paper would be interesting if it had some new method of dealing with link rot or discoverability, but there’s nothing here about either topic. So the paper offered nothing new, and was rejected on that ground.

Actually, the reviewers did miss something. Maybe Tim didn’t explain it in the clearest possible way, but the technical innovation in the World Wide Web’s treatment of broken links is to say Yeah, they happen. Deal with it. The mechanism that runs fastest and scales best is the mechanism you don’t have to build at all. By just saying that we won’t have any mechanism at all to deal with those problems, Tim was able to achieve what I believe must have been his primary technical design goal for the World Wide Web: he made it scaleable. It’s decentralized; at least initially Web browsers and Web servers could be relatively simple (I don’t think that modern Web browsers are dramatically simpler than Guide, but the initial ones certainly were); and it encourages low expectations on the part of the user. Those are all recipes for good scaleability.

But those properties don’t explain why the Web became universal; at most, they explain why, when it became universal, it was able to survive the transition.

I think the reason the Web became universal was that it provided an advantageous cost/benefit ratio to huge numbers of people. Lots of people found it worth their while to set up a Web server and start serving pages. It was easy enough to set up the server or to write HTML that even the relatively modest payback in kudos or visibility or convenience was enough to justify the expenditure of effort to make it happen. Of course, the initial Web site hosts were typically technical people interested in doing fun things, or interested in disseminating information, not necessarily people seeking to make a buck. But it illustrates what I have come to call the Paoli Principle, named for Jean Paoli, one of the co-editors of the XML 1.0 specification, who is probably more responsible than any other single human being for getting many different product groups within Microsoft to take an interest in XML, and therefore probably more responsible than any other single person for the ubiquity of XML as an infrastructure in today’s commercial IT environment.

Jean has frequently said: If you ask people to expend five cents worth of effort, then you need to give them five cents worth of benefit, very quickly. Ideally, you’d like to give them ten cents worth of benefit, but at the very least they have to make back their effort. And if you don’t manage that, then it’s only someone who is already suffering severe pain owing to other causes who will persevere with your technology.

This leads me to think about Alex Milowski’s talk and the state of XML on the Web [Milowski 2009]. Is XML on the Web a failure? or a success? Good question; sometimes I think, yes, and sometimes I think, no, to each of those questions in turn, and sometimes vice versa. If what you want is for everybody to use XML, then it’s not a success. Manifestly, not everyone in the world is using XML, and if that’s the goal then the only question left is whether we should keep trying, or accept defeat and move on to do something else with our lives.

If on the other hand your goal, as Murray Maloney was saying this morning in one of the discussions, was to do things the way you want to do them, and not the way some program wants to try to force you to do them, and if like some people (including me) what you want to do is to call things by the names you give them, and not by the names some program tries to make you call them by, then XML on the Web is a huge success. I can do what I wanted to do: I can publish, in XML, on the Web, and browsers will display it appropriately. I can publish XML on the Web and treat HTML purely as a page description language for certain kinds of interactive pages, exactly as I treat PDF as a page description language for paper pages.^[2] I don’t think about the page description language, except when I’m trying to design and describe the page. When I’m editing, I can deal with the ontology I choose.

Some of us only ever wanted to be able to publish on the web in SGML. We probably expected some others, particularly large organizations that had valuable data with valuable semantics, to want to do the same. My memory may be faulty, but I don’t remember anyone in the SGML on the Web WG claiming that the criterion for success should be that the large body of HTML users should throw down their HTML and take up XML. I’m sure everyone thought that would be a wonderful thing if it happened, but I don’t remember anyone saying that that should be the criterion of success. Others, not necessarily members of that working group, who took it as axiomatic that for any technology the only definition of success is universal uptake, naturally assumed that universal uptake was the criterion of XML’s success. They gave it a one- or two-year deadline (we are speaking after all about Web time!) and now ten years later they more or less naturally conclude that the experiment has been a failure, that XML has failed to live up to their expectations, and that we should forget about it.

How do we make XML scale up? Do we need to? Do we wish to?

Perhaps XML’s natural niche is simply a smaller one, in which it is used by many people (beneath the hood), but invisible to most of them (except for a small number of people, like perhaps many in this room, who habitually want to pop the hood and look at what’s happening underneath).

Wide usage can mean having lots of users. But it can also mean wide applicability, ubiquity, applications in many areas, and openness to the wider community, or at least smooth interoperation with the rest of the world.

I’m happy to see papers about work that pushes XML into new and interesting areas of application, and illustrates the problems and opportunities in XML’s coexistence with the non-XML part of the universe (yes, I’m told there is one).

The work reported on at the U.S. National Archives [Nguyen / Harvey 2009], the work on health care data in several papers [Beuchelt et al. 2009], [McCay et al. 2009], Zoe Borovsky’s illustration that yes, you can take XML data and use network analysis tools to visualize it [Borovsky et al. 2009]. Even if it does typically mean stripping out all the markup so the network analysis tools will read it, still, you can get there.

Mohamed Zergaoui’s illustration that modeling tools don’t necessarily have to put angle brackets into the users’ eyes [Cau / Zergaoui 2009].

Peter Flynn’s paper on XML editors helps show us what openness to the rest of the world may entail in practice [Flynn 2009].

Related to this topic are efforts to push XML into the infrastructure, like those reported by Slava Zholudev and Michael Kohlhase, with the goal of making an XML-aware versioned storage system [Zholudev / Kohlhase 2009].

And on the more general topic of our interactions with and our responsibilities to the wider community, of course, the memorable talk by Kurt Cagle this morning [Cagle 2009].

As Patrick Durusau and Kurt Cagle agreed this morning, it is not necessarily our mission to teach others what justice and transparency are. But we can, and I think some of us will choose to, try to develop descriptive markup and its technologies in such a way as to make it easier for technology to support appropriate kinds of transparency and appropriate kinds of responsibility.

Now, this is kind of an idealistic line of thought, about our ethical responsibility as professionals, and it’s not hard to make fun of it. During the first Gulf War a wit [Borenstein 1992] wrote

It should be noted that no ethically-trained software engineer would ever consent to write a DestroyBaghdad procedure. Basic professional ethics would instead require him to write a DestroyCity procedure, to which Baghdad could be given as a parameter.

We are not the sole repositories of truth. And what we do to benefit the causes which we believe to be right, may also — indeed, it almost certainly will also — prove useful to those doing what we believe to be wrong. The Web was built for fine and noble reasons, but it does provide communication and support for racists, and terrorists, and enemies of all the things we may believe to be fine and noble, as well as providing support for those we may agree with. The consequences of our actions as technologists are as likely to be mixed as are the consequences of any other people’s actions.

And yet I can’t quite bring myself to agree with Patrick Durusau’s proposal, this morning, that none of us does anything because we think it will make the world a better place, but only because someone pays us to do it. A former colleague with whom I disagreed about practically everything once hit the nail on the head when he said (to the visible dismay of management, who were disconcerted by this independence of thought) that he chose to work for our common employer because it helped him achieve a specific set of goals for the Web and the world, and that the moment he concluded that further employment with that employer was not helping him and the world achieve the goals he was working toward, he would be gone. Whether we work for employers or for clients, we have the choice: either to work in a particular way so as to have clients, or to have clients in order to allow us to work in the ways that we choose. Or, as a character in Lessing’s play Nathan der Weise put it: Kein Mensch muß müssen. No one can be compelled to be compelled.

The Web shows us that one way to scale up is to make things easy, to allow faults, to make the system robust in the face of error.

But there is another time-honored method of scaling things up, and that is to control error, to mechanize and automate operations, so as to control complexity and minimize the number of things we have to keep in our heads at any one point while we’re trying to do things.

Arithmetic, algebra, symbolic logic are all examples of this approach to scalability. They provide mechanical rules for manipulating formulas, so that we can think about other things than how to preserve the truth of the formula. Formalizations of all kinds follow the same pattern, modeling languages among them.

So it was good to see the papers of Dennis Pagano and Anne Brüggemann-Klein [Pagano / Brüggemann-Klein 2009], and Bruce Bauman [Bauman 2009], about interfacing with and interacting with modeling languages. There is a long way to go from where we are now to really good and comfortable control over semantics and modeling, but the way to make that long journey is to keep moving.

I’m similarly happy to have heard the papers by Jacques Durand [Durand et al. 2009] and Josh Lubell [Lubell 2009] about better validation, and of course, I have a soft spot in my heart for the philosophers who tell us whether documents can be edited or not, or how to formalize their meaning [Renear / Wickett 2009].

There is yet another sense in which a given set of tools can be said to scale, or not to scale. They can scale to harder problems, or they can fail to address harder problems.

Jens Stegmann and Andreas Witt’s paper [Stegmann / Witt 2009], and the paper by Fabio Vitali’s group in Bologna [Di Iorio / Peroni / Vitali 2009], the work by Maik Stührenberg and Daniel Jettka [Stührenberg / Jettka 2009] on stand-off markup, and the work reported by Pierre Edouard Portier and Sylvie Calabretto [Portier / Calabretto 2009] — all of these people working on complex annotations and complex problems help ensure that the technologies we are building will scale in that direction.

And I for one will be forever grateful for the ingenious demonstration by Desmond Schmidt [Schmidt 2009] of an application of an indexing technique based on semi-infinite strings, first developed for the New Oxford English Dictionary,^[3] to the problem of manuscript collation. I certainly never expected to see that application; it’s a beautiful one, and if you didn’t hear the talk I encourage you to read the paper.

Of course, the attempt to scale up can be pushed too far. As Tommie Usdin pointed out in her opening remarks on Tuesday morning [Usdin 2009], standards can be and often are pushed on people who have no pressing need for them. This results, perhaps, from people determined to increase the number of adherents to the standard they themselves have declared allegiance to.

But sometimes the right scale for a standard or technology, the right number of users or application areas, is not large. Some problems are unusual; some requirements are special. Some communities can be small. Some standards meet the needs not of a large community but of a small one; there is no need to make the community forcibly larger, in the interests of standards compliance.

In the same way, technologies have suitable fields of application which may be large or small, broad or narrow. For any technology, there are things it is not suited for, areas where it should not be applied.

Of course, when it comes to descriptive markup I don’t really believe this for an instant. If there are applications within IT to which descriptive markup really should not be applied, I am not at all sure I’ve seen them. But I’ve been told on good authority that this is a theoretical possibility, or at least some smart people think so. Perhaps I’ve just been carrying this hammer for too long, but you know, it’s hard to put down the hammer once you have discovered that pretty much everything turns out to be a nail if you look at it long and intently enough.

It is not standards in themselves that are harmful, but mindless adherence to standards that is harmful.

And similarly scale in itself may cause fewer problems than a mindless devotion to working at a particular scale of things, an unthinking reflexive conviction that to be worthwhile, a thing must be large scale, or small scale, or any particular scale.

Mindlessness is harmful.

Don’t be mindless. Don’t build systems that encourage mindlessness. Let us go out from here, when this conference ends, to work on better ways to use descriptive markup to support mindfulness, to enhance it, to augment it. Nous réaliserons ainsi les vrais avantages du balisage. In that way, we will bring to reality the real benefits of markup, and of Balisage.

References

[Bauman 2009] Bauman, Bruce Todd. “Prying Apart Semantics and Implementation: Generating XML Schemata directly from ontologically sound conceptual models.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Bauman01.

[Beuchelt et al. 2009] Beuchelt, Gerald, Harry Sleeper, Andrew Gregorowicz and Robert Dingwell. “hData - A Simple XML Framework for Health Data Exchange.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Beuchelt01.

[Birnbaum 2009] Birnbaum, David J. “An XML user steps into, and escapes from, XPath quicksand.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Birnbaum01.

[Borenstein 1992] Borenstein, Nathaniel S. “Computational mail as network infrastructure for computer-supported cooperative work.” Presented at 1992 ACM Conference on Computer-supported Cooperative Work, Toronto, Canada, November 1 - 4, 1992. (Apparently a draft of what was later published in Proceedings of the 1992 ACM Conference on Computer-supported Cooperative Work. The proceedings version lacks the material quoted in the text.)

[Borovsky et al. 2009] Borovsky, Zoe, David J. Birnbaum, Lewis R. Lancaster and James A. Danowski. “The Graphic Visualization of XML Documents.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Borovsky01.

[Cagle 2009] Cagle, Kurt. “Open data and the XML community.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Cagle01.

[Cameron / Herdy / Amiri 2009] Cameron, Rob, Ken Herdy and Ehsan Amiri. “Parallel Bit Stream Technology as a Foundation for XML Parsing Performance.” Presented at International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, Montréal, Canada, August 10, 2009. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Balisage Series on Markup Technologies, vol. 4 (2009). doi:https://doi.org/10.4242/BalisageVol4.Cameron01.

[Cau / Zergaoui 2009] Cau, Jean Michel, and Mohamed Zergaoui. “Visual Designers: Those XML tools with no angle bracket at all!” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Zergaoui01.

[Di Iorio / Peroni / Vitali 2009] Di Iorio, Angelo, Silvio Peroni and Fabio Vitali. “Towards markup support for full GODDAGs and beyond: the EARMARK approach.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Peroni01.

[Durand et al. 2009] Durand, Jacques, Stephen Green, Serm Kulvatunyou and Tom Rutt. “Test Assertions on steroids for XML artifacts.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Durand01.

[Flynn 2009] Flynn, Peter. “Why writers don’t use XML: The usability of editing software for structured documents.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Flynn01.

[Gonnet / Baeza-Yates / Snider 1992] Gonnet, Gaston H., Ricardo A. Baeza-Yates and Tim Snider. “New indices for text: Pat trees and Pat arrays.” Chap. 5 in Information Retrieval: Algorithms and Data Structures, edited by W. Frakes and R. Baeza-Yates, 66-82. New York: Prentice-Hall, 1992.

[Kay 2009] Kay, Michael. “You Pull, I’ll Push: on the Polarity of Pipelines.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Kay01.

[Lee / Walsh 2009] Lee, David, and Norman Walsh. “Efficient scripting.” Presented at International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, Montréal, Canada, August 10, 2009. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Balisage Series on Markup Technologies, vol. 4 (2009). doi:https://doi.org/10.4242/BalisageVol4.Lee01.

[Leventhal / Lemoine 2009] Leventhal, Michael, and Eric Lemoine. “The XML Chip at 6 Years.” Presented at International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, Montréal, Canada, August 10, 2009. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Balisage Series on Markup Technologies, vol. 4 (2009). doi:https://doi.org/10.4242/BalisageVol4.Leventhal01.

[Lubell 2009] Lubell, Joshua. “Documenting and Implementing Guidelines with Schematron.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Lubell01.

[McCay et al. 2009] McCay, Charlie, Michael Odling-Smee, Joseph Waller and Ann Wrightson. “Graciously handling a level of change in a complex specification: Configuration management for community-scale implementation of an HL7v3 messaging specification.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Wrightson01.

[Milowski 2009] Milowski, R. Alexander. “XML in the Browser: the Next Decade.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Milowski01.

[Müldner et al. 2009] Müldner, Tomasz, Christopher Fry, Jan Krzysztof Miziołek and Scott Durno. “XSAQCT: XML Queryable Compressor.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Muldner01.

[Nguyen / Harvey 2009] Nguyen, Quyen L., and Betty Harvey. “Agile Business Objects Management Application for Electronic Records Archive Transfer Process.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Harvey01.

[Pagano / Brüggemann-Klein 2009] Pagano, Dennis, and Anne Brüggemann-Klein. “Engineering Document Applications — From UML Models to XML Schemas.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Bruggemann-Klein01.

[Portier / Calabretto 2009] Portier, Pierre-Edouard, and Sylvie Calabretto. “Methodology for the construction of multi-structured documents.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Portier01.

[Renear / Wickett 2009] Renear, Allen H., and Karen M. Wickett. “Documents Cannot Be Edited.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Renear01.

[Salz / Achilles / Maze 2009] Salz, Richard, Heather Achilles and David Maze. “Hardware and software trade-offs in the IBM DataPower XML XG4 processor card.” Presented at International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, Montréal, Canada, August 10, 2009. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Balisage Series on Markup Technologies, vol. 4 (2009). doi:https://doi.org/10.4242/BalisageVol4.Salz01.

[Schmidt 2009] Schmidt, Desmond. “Merging Multi-Version Texts: a Generic Solution to the Overlap Problem.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Schmidt01.

[Stegmann / Witt 2009] Stegmann, Jens, and Andreas Witt. “TEI Feature Structures as a Representation Format for Multiple Annotation and Generic XML Documents.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Stegmann01.

[Stührenberg / Jettka 2009] Stührenberg, Maik, and Daniel Jettka. “A toolkit for multi-dimensional markup: The development of SGF to XStandoff.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Stuhrenberg01.

[Usdin 2009] Usdin, B. Tommie. “Standards considered harmful.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Usdin01.

[Walsh 2009] Walsh, Norman. “Investigating the streamability of XProc pipelines.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Walsh01.

[Zergaoui 2009] Zergaoui, Mohamed. “Memory management in streaming: Buffering, lookahead, or none. Which to choose?” Presented at International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, Montréal, Canada, August 10, 2009. In Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth. Balisage Series on Markup Technologies, vol. 4 (2009). doi:https://doi.org/10.4242/BalisageVol4.Zergaoui02.

[Zholudev / Kohlhase 2009] Zholudev, Vyacheslav, and Michael Kohlhase. “TNTBase: Versioned Storage for XML.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Zholudev01.

^[1] This version of these remarks was prepared by consulting both the notes from which I spoke and a recording of what I said, but I have taken the opportunity to recast a few passages here and there and to correct (silently) some errors made during the presentation.

^[2] It should be noted that PDF readers do have some interactive features; I just don’t use them very often.

^[3] See [Gonnet / Baeza-Yates / Snider 1992] for the indexing technique, which formed the basis of the commercial search tool Pat.
