How to cite this paper

Sperberg-McQueen, C. M. “Fault tolerance, error tolerance, diversity tolerance.” Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). https://doi.org/10.4242/BalisageVol25.Sperberg-McQueen02.

Balisage: The Markup Conference 2020
July 27 - 31, 2020

Balisage Paper: Fault tolerance, error tolerance, diversity tolerance

C. M. Sperberg-McQueen

Founder and principal

Black Mesa Technologies LLC

C. M. Sperberg-McQueen is the founder and principal of Black Mesa Technologies, a consultancy specializing in helping memory institutions improve the long-term preservation of and access to the information for which they are responsible.

He served as editor in chief of the TEI Guidelines from 1988 to 2000, and has also served as co-editor of the World Wide Web Consortium’s XML 1.0 and XML Schema 1.1 specifications.

Copyright ©2020 by the author.

Abstract

How to react when things are not as we expect them to be.

Table of Contents

Zero tolerance
Policing the boundaries
Separation of concerns
Essentialism
Focus
Fault tolerance
Diversity tolerance

Zero tolerance

One of the milestones in the history of programming languages and thus in the history of computing was the introduction of ALGOL 60. You’ll see this if you look at any history of computing; you’ll see this if you look at any of the short papers that you can find in collections of the works of C. A. R. Hoare, Edsger Dijkstra, or many others. And it’s worth asking why ALGOL 60 had such an impact. It was not in fact ever widely used in the U.S. for actual programming, although it was used for the publication of algorithms. It was, I think, more widely adopted in Europe, but nevertheless it was far more a milestone in the history of programming languages than it was a programming language widely used for writing actual software, at least in North America. And it’s worth asking why.

I think perhaps the main reason is the introduction of BNF (Backus-Naur Form) as a grammatical notation. Why? Because suddenly there was no longer a need for casuistry applied to parsers or to programming-language syntax. ALGOL 60 had a formal syntax, and, what is more, it was a decidable syntax, so there was no agonizing over corner cases with dubious syntax. The formalism of BNF made it possible to know whether a given construct should be accepted by the compiler or not. That alone will have saved significant amounts of time in compiler development teams. If the grammar accepts it, the compiler should accept it. If the grammar rejects it, the compiler doesn’t have to compile it.
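
To make the idea concrete, here is a small fragment in the style of the ALGOL 60 report’s BNF (an illustrative sketch, loosely modeled on the report’s productions for numbers, not a quotation from it):

  <digit>            ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  <unsigned integer> ::= <digit> | <unsigned integer> <digit>
  <integer>          ::= <unsigned integer> | + <unsigned integer> | - <unsigned integer>

Given productions like these, the question whether a string such as -307 is an <integer> has a mechanical yes-or-no answer; nobody on the compiler team has to argue about it.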

Now, of course, it was eventually also seen to have produced a reliable foundation for language analysis and the systematic creation of parsers, and even later for the automatic generation of parsers. But just the fact that you no longer had to argue over whether something was syntactically legal or not will, I think, have made a huge difference to a lot of people.

I think I assign the same kind of importance to ISO 8879, the specification of SGML, for a completely analogous reason: because SGML introduced the notion of document grammars, which means that document-processing software can focus more on the interesting problems of actually processing the documents and less on trying to make sense of off-the-wall input which you feel obligated to accept because there is no actual rule against it in the manual.

DTDs essentially freed people writing text processing applications from defensive programming, or at least they reduced the load of defensive programming; they reduced the need for paranoid checking of the input and allowed people to spend the corresponding amount of effort on more interesting things.
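
A small illustrative document grammar (a made-up example, not drawn from any particular application) shows where that load goes:

  <!ELEMENT chapter  (title, para+)>
  <!ELEMENT title    (#PCDATA)>
  <!ELEMENT para     (#PCDATA | emphasis)*>
  <!ELEMENT emphasis (#PCDATA)>

A validating parser armed with these declarations rejects a chapter without a title, or a paragraph floating outside any chapter, before the application ever sees the data; the application code no longer has to check for those cases itself.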

In the essay in which he introduces the concept of literate programming, which was mentioned in one of the papers this year, Donald Knuth makes an interesting observation. He says:

Another surprising thing that I learned while using WEB was that traditional programming languages had been causing me to write inferior programs, although I hadn’t realized what I was doing. My original idea was that WEB would be merely a tool for documentation, but I actually found that my WEB programs were better than the programs I had been writing in other languages. How could this be?

Well, imagine that you are writing a small subroutine that updates part of a data structure, and suppose that the updating takes only one or two lines of code. In practical programs, there’s often something that can go wrong, if the user’s input is incorrect, so the subroutine has to check that the input is correct before doing the update. Thus, the subroutine has the general form:

  procedure update;
  begin if <input data is invalid> then
    <Issue an error message and try to recover>; 
  <Update the data structure>; 
  end.

A subtle phenomenon occurs in traditional programming languages: While writing the program for <Issue an error message and try to recover>, a programmer subconsciously tries to get by with the fewest possible lines of code, in cases where the program for <Update the data structure> is quite short. For if an extensive error recovery were actually programmed, the subroutine would appear to have error-message printing as its main purpose. The programmer knows that the error is really an exceptional case that arises only rarely; therefore a lengthy error recovery doesn’t look right, and most programmers will minimize it (without realizing that they are doing so) in order to make the subroutine’s appearance match its intended behavior.

— [Knuth 1984]

Notice that what Knuth describes is not exclusively, but is largely, focused on finding errors in the input. Explicit grammars for programming languages and documents can take a lot of that load off of our shoulders. We don’t need to spend quite so much time and effort on defensive programming. We don’t need to be quite so paranoid about our input.

It’s not just Donald Knuth. A friend of mine once worked on a program to typeset documents. And when they had finished the programming and had a complete run of the full program, the program, printed out on fan-fold paper, came to about an inch and a half of paper. And then, of course, they started deploying it for alpha testing and beta testing and for actual users, and they got error reports. And after a while, if you printed the program out, it was four inches of fan-fold paper, and eventually six inches of fan-fold paper. And when my friend left the company, it was nine inches or so of fan-fold paper. So you had essentially a six-fold increase in the size of the program during which no significant new functionality had been added, only error checking and error recovery. The program was not complete at nine inches — that’s just when my friend left the company.

Policing the boundaries

Now, if we hope to reap the benefits of cleaner input, it means that schemas and schema languages matter, and we need to understand them as well as we possibly can. Anne Brüggemann-Klein’s paper focuses on the expressive power of various patterns in the definition of XSD schemas — or schemas in any language with both global and local elements and types [Brüggemann-Klein 2020]. (The patterns are meaningless for DTDs because in DTDs all elements are global, and types are present only as conventional uses of parameter entities. The patterns have no direct meaning in RELAX-NG because in RELAX-NG there are no references to elements by name, only to named patterns, and again, types are present only as conventions in the use of named patterns. Although Anne did mention that there is an analysis that may make these patterns relevant to RELAX-NG in a way that I have not yet fully understood.)

But when we do have both global and local elements and types, the four patterns discussed by Anne Brüggemann-Klein do occur, and her paper is a master class in how to reason out the implications of design patterns like these by a judicious application of abstraction and careful formalization. I never hear Anne talk without learning something from her, and I often find myself wishing I could go back to graduate school so I could learn even more.

Of course, even with good schemas and good up-front checking of our data, we will still occasionally have erroneous input, and — it’s embarrassing to admit this perhaps, but we should face facts — every now and then, every once in a while, on an odd Tuesday after a new moon, it might happen that some code we wrote is … wrong.

So, we need to test. The XSpec vocabulary and testing infrastructure is quite popular for people working in the XML stack, but applying XSpec can be challenging when we’re deploying XSLT in some contexts. Some ingenuity may be needed. Fortunately, Amanda Galtman has lots of ingenuity, and in her paper, she describes two quite different ways to approach the problem of using XSpec to test XSLT to be run in the browser [Galtman 2020]. I was impressed with her sober evaluation of the alternatives and particularly impressed with her observation that even though the approach eventually chosen did not allow testing of everything, it did allow some things to be tested, and testing some things is better than not testing anything. Hear, hear!

Steven Pemberton also described an ingenious approach to testing which, like XSpec in part, relies on and exploits the infrastructure it’s testing and manages to make good use of that fact [Pemberton 2020].

In both cases, I noticed, complications arise from interactions with other parts of the software infrastructure: the Saxon-JS library, in Amanda Galtman’s case; the HTTP server accepting submissions, in Steven Pemberton’s case. As Wendell Piez pointed out the other day, no real existing system is complete and self-contained; our systems are never alone in the world [Piez 2020]. I was struck by the fact that both Amanda Galtman and Steven Pemberton in the end found it helpful to mock up an alternative implementation of significant other parts of the system: in Galtman’s case as stubs, and in Steven Pemberton’s case as a fully functional HTTP server (although not one hardened or intended for production use).

Separation of concerns

The more confident we can be about our input data and about the other parts of our software systems, the less time we need to spend on defensive programming and paranoia. In a way, I think explicit syntactic rules like those of ALGOL 60 or document grammars are an instance of the general principle of separation of concerns. By separating well-formedness checking from validation, and separating validation from processing, we can make each of those tasks a little simpler and a little easier to perform cleanly, and the code we write will be a little easier to maintain.

Eliot Kimber illustrated the virtues of separation of concerns in his work on the Wordinator. He broke his work into two tasks: first that of identifying the salient layout features of the input and translating the data into his intermediate simple word-processing XML format, and second the task of writing the result out in DOCX format [Kimber 2020]. By separating the tasks he made each of them simpler. (And separating the tasks also made it possible to use off-the-shelf software for the second task.)

This morning, Ari Nordström showed us a way to build a complex workflow when we have systematically separated concerns by building small, simple XSLT transforms and using them as components in building complex pipelines [Nordström 2020]. Having a clear pattern to use, like the one Ari described, allows us to deal in a routine and systematic way with problems that would otherwise overwhelm us with their complexity. His paper on pipelines is extremely instructive to anyone who deals with complex transforms.

Essentialism

As the mathematician George Pólya says in his book How To Solve It — and as Michael Kay reiterated just about an hour ago — sometimes a more general problem is easier to solve than a specific instance of the general problem, so generalizing any difficult problem may be what it takes to allow us to move forward [Pólya 1945, Kay 2020]. Generalization is a way of identifying the essentials of a problem; it has an important role to play in document analysis in which we seek to identify the essential features of the documents or the non-documentary information we want to represent. We use the answer to the question What are the essential features of these documents? to define the vocabulary.

In his paper, Steve DeRose applied the same principles to diagrams by asking What is a diagram really? [DeRose 2020]. And at a level of analysis which seems to be quite useful to me, he concludes that diagrams are made of objects which we can usefully represent by XML elements or by analogous constructs in other notations, sometimes ordered and sometimes not, linked by relations which can be represented effectively by the same techniques we have developed over the years to represent hyper-links or other non-ordering relations.

If we understand the essential nature of a thing, it is (at least sometimes) easier to deal with all the complications and variations that arise in practice without getting lost in the details and the exceptions. In the ideal case, at least, we can deal with the trees without losing sight of the forest.

Elisa Beshero-Bondar vividly describes the challenges she faces — one faces, anyone faces — helping students identify the essential properties of the documents they’re working with well enough to model these properties and mark up those documents [Beshero-Bondar 2020]. She sees this, as I understand it, as one of the core undertakings of descriptive markup — and I am inclined to agree with her — and as an activity well-suited to encourage reflection in the students. So she has persuaded me — and I hope she can persuade a lot of administrators around the world — that markup is a useful topic for an intensive writing course.

Michel Biezunski’s demo of his Networker system illustrates just how general things can be if we reduce them to essentials. User-specified topics (or things), related in various user-specified ways, together with suitable generic operations on them, suffice in the case of his application for a very general-purpose representation of knowledge [Biezunski 2020].

Claus Huitfeldt’s talk yesterday illustrated in its own way how our understanding of the essential nature of documents can have very concrete effects on attempts to measure similarity between documents [Huitfeldt and Sperberg-McQueen 2020]. (Although it must be admitted that for some practical applications, all of the measurements succeeded equally well or equally badly, independent of the underlying model of the documents. So maybe we need to take this with at least one grain of salt.)

Focus

Identifying the essentials of a problem or a situation is a way of focusing our attention. Of course, with complex artifacts, we focus now on this aspect, now on that aspect of the thing, and the essence we identify may vary.

But focus can help us do better work, so tools and technologies that help us focus are always useful. Joel Kalvesmaki’s library for Unicode regular expressions was motivated, he said, in part by his desire to be able to specify the patterns he needed without losing his focus on the transformation he was writing by being forced to deal with irrelevant detail and to look things up in the Unicode tables [Kalvesmaki 2020]. (That, of course, is a goal to which anyone who has succumbed to the allure of descriptive markup will feel sympathetic. One of the great advantages of descriptive markup is that it allows us to focus on the essentials of a document type without being forced to deal with inessential details. And one of the advantages of being able to specify one of our own vocabularies is, of course, that we get to be the ones who decide what counts as essential.)

In talk after talk, we have seen beautiful use made of domain-specific languages, often as intermediate stages in complex pipelines or workflows.

The XSLT 3.0 features described by Norm Tovey-Walsh can also help us focus [Walsh 2020]. Many of them are really very small; as Michael Kay said in the chat window, they were dead simple to specify and to add to an XSLT processor (although, of course, Michael’s local standards for what counts as dead simple may be broader than those of the rest of us). But they help us solve recurrent problems with minimal fuss and thus minimal disruption to our thought patterns. They are, at least when all goes well, a modest but useful form of intelligence augmentation.

Another reason that ALGOL 60 and BNF made such a difference is that when a BNF grammar is well-done, it provides a vocabulary for discussing syntactic constructs. And in a language with compositional semantics, those syntactic constructs are the units in which we discuss semantics. Someone said the other day, if I understood them correctly, that syntax without semantics is kind of pointless; I would observe that semantics without syntax is almost impossible to describe, or to discuss.

C. Edward Porter showed a compelling example of just how much can depend on having access to the appropriate labeled structures [Porter 2020]. When all of the syntax details of a language are hidden in CDATA sections and comments in the documentation, doing anything with the language documentation except the one application of getting it out on paper (or on screen) will require manual labor. But if you can provide access to the structure of the information — whether that is by more detailed tagging or by software-based inference or by a combination of the two — then we can enable much more effective automatic processing. There is no guarantee that that is going to be simple; some of his system diagrams were rather daunting to me, and I think possibly to others. But we can often observe that if we get better control over one part of the system, we can take the mental effort that used to be absorbed by that part of the system and put it into handling more complex problems. As we improve our own tools, we can achieve more; this has been a constant theme of computing visionaries since the days of Douglas Engelbart and, for that matter, Alan Turing.

From one point of view, what the XML stack provides is essentially a set of tools for identifying structures and labeling them. It’s very thin conceptually, but it is amazingly useful. Having a complete set of tools like this makes it easier to build new vocabularies and new systems, not by de-skilling the operation, but by making it more systematic and routine, in very much the same way that designing a programming language was still as difficult and challenging as it ever was after ALGOL 60 and BNF, but a lot easier nevertheless.

The XML stack makes it feasible to develop much richer, much more sophisticated systems for highly specialized documents that might otherwise not have the frequency necessary to support significant development effort, like the system security assessments that were described by Josh Lubell and whose larger context was illuminated by the analysis of context in Wendell Piez’s paper [Lubell 2020, Piez 2020].

Peter Flynn’s recipes may seem at first glance merely a nice toy, at least for those of us who are not publishers of cookbooks [Flynn 2020]. But the kinds of plausibility and consistency checks that he noticed as being necessary are also applicable to other real-world systems, including systems of great import like the security assessment documents described by Josh and Wendell.

Fault tolerance

Now, prevention of error by systematic means, by formal grammars, and by testing is very helpful, but it’s not enough. It’s helpful in the same way that public health measures for the prevention of disease are helpful. It is not hard to persuade ourselves that more lives have probably been saved by the fact that the cities of the world have learned to supply clean water to their inhabitants than have been saved by all the medicinal interventions in history. But even with good public health measures, people still get sick, and we still need to be able to treat them.

The same applies in technical contexts. These two approaches are sometimes in conflict: some people want to prevent all errors; other people want to recover from errors. And sometimes they require different approaches. Edsger Dijkstra, for example, says in one of his papers that if you are controlling a loop by counting an index variable down from ten to zero, you may be tempted, as a lot of programmers will be, to make the termination condition be i less than or equal to zero. He says, Why are you testing for it to be less than or equal to zero? You’re counting down from ten by one; test for equality to zero. The answer, of course, is Well, if something goes wrong and I end up below zero, I would like the loop to terminate. And Dijkstra says, If the index variable gets below zero, you have a problem. You have a bug; you have an error, and what you have done by making the termination condition overbroad is ensure that you won’t find that error, unless something else also goes wrong.[1]
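
A minimal sketch of the two choices, written here in Python for brevity (my illustration, not Dijkstra’s notation; the loop body is reduced to the bare decrement):

  # Over-broad termination: if a bug ever drives i below zero,
  # the loop still stops quietly and the bug stays hidden.
  i = 10
  while i > 0:
      # ... do the real work for step i ...
      i -= 1

  # Dijkstra's preference: test for the exact expected value, so a state
  # that should be impossible cannot masquerade as normal termination.
  i = 10
  while i != 0:
      # ... do the real work for step i ...
      i -= 1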

Other people, of course, see it differently. There are those who say, roughly: Well, everything in Unix has slightly over-general termination conditions. Everything is defined to do the right thing if the input is correct and to do something more or less plausible if the input is incorrect, but not to die because you can’t always count on the other guy having read the documentation or having followed it. And by over-compensating — by defensive programming — we are able to make systems that are stable despite the fact that they’re built by fallible human beings.[2]

Sometimes this view is formulated in the name of Jon Postel, as Postel’s Law: be conservative in what you send and liberal in what you accept [RFC 1122]. This is sometimes interpreted as meaning send only valid data and don’t check your input; accept anything. When interpreted that way, Postel’s Law will almost certainly induce a very fast, very short race to the bottom. If you have any doubts about this, the early history of the World Wide Web offers absolutely harrowing examples and, for that matter, not just the early history of the World Wide Web. The definition of HTML used to be quite simple, until a sufficient number of people conceived the desire to write a spec that described the actual set of languages that common browsers actually accept. And the result is HTML5; it is an extremely impressive piece of work, but it is also an awful lot more complicated than the specification of HTML ever was before.

It is not always remembered that the draconian error handling of XML — whose specification says that processors are not allowed to recover from well-formedness errors — came from the browser makers in the Working Group. It was resisted by others, including me, on the grounds that the language we were designing, XML, has a lot of redundancy, and the whole point of having that redundancy is to make it easier to detect and recover from errors. And under those circumstances, forbidding error recovery seems very odd. The browser makers — the representatives of Netscape and Microsoft — argued that allowing error recovery for well-formedness errors would lead to the same software bloat in XML parsers as in HTML processors, where each vendor’s error recovery strategy and/or parsing errors had to be reverse-engineered by their competition. Tim Bray commented:

I do not want us lurching over the slippery slope where every little formerly-lightweight piece of useful XML client code is loaded with bloated guess-what-the-author-really-meant heuristics. Empirical evidence would suggest that the danger of this is real.

— [Bray 1997]

However, well-formedness and validity are not going to prevent all errors. They will prevent a lot at relatively low cost, but additional error-protection techniques will be needed for more subtle errors. Eventually these will have increasing costs and decreasing returns, and we won’t be able to afford them. So, in addition to reducing errors up-front as far as possible, we need to be able to recover from them. I would caution people against recovering silently because if we have software that recovers silently from any given error, the consequence will be that the input will gradually be full of that particular error because no one ever notices it because the software recovers silently.

Now, here we reach a slightly awkward point. When we focus on cleanliness of our input data and uniformity in our input data, we sometimes have a tendency to conflate the two. We conflate clean data with uniform data, and conversely, we sometimes have a tendency to conflate error with unexpected variation in the data. And the concept of error expands to become the concept of things we’re not prepared for. But there are a lot of things we need to be prepared for that are not errors.

After C. Edward Porter described the context of his project and showed some of his complicated system diagrams, including systems that were created at widely different times and using widely different technological infrastructures [Porter 2020], Syd Bauman asked Is it harmful to have all these old technologies hanging around and rubbing elbows with all these new technologies? to which my answer is, Harmful or not, it’s a situation we are often going to be in. Any systems that last long enough eventually risk going organic, in the words of one programmer who worked on that document typesetting system.

But even if we manage to keep them from going organic, that is, if we manage to keep some control over their complexity, the complexity of the system is likely to increase. Sometimes we can reduce complexity by re-factoring the system or re-writing from scratch, but not always. And even when we can, it’s not always wise. Joel Spolsky in his often quotable technical blog, Joel on Software, writes in one essay,

There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. …

It’s just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I’ll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn’t have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.

Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it’s like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.

When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.

— [Spolsky 2000]

So, replacing a program or even re-factoring it is not always going to be a panacea. (Now, I note in passing that the kinds of bug fixes Spolsky was describing so eloquently are, many of them, examples of the defensive programming — or paranoid programming — that is often needed in systems that do not or cannot validate their input. So, let us never underestimate the advantages, for our own sanity, of schemas and document grammars when they are judiciously written and consistently applied.)

Also, remember that in any environment with a lot of legacy code that is still running (even though it looks, at first glance, like a big hairy mess and we wonder how it managed to get past the compiler in the first place), there is usually another name for those legacy systems: that would be production systems. And in a commercial environment, there is usually a third term, revenue systems.

Now, for a variety of reasons, some of them irrational and some of them perfectly sound, many organizations are unwilling to throw away their entire system every few years and rebuild from scratch in order to have new technology. At least, that is very seldom true of organizations that have lasted more than a few years. (There might be a correlation.) So, whether we like it or not, the systems we build are going to need to fit into a larger context. As Wendell Piez said yesterday, your system is somebody else’s sub-system [Piez 2020]. We will almost never control the other components in that sub-system, which means that in order for our systems to be useful or even to survive, they need to play well with others.

We know that descriptive markup is a good tool and might provide a better foundation for those other parts of the larger system than they currently have, but not everyone knows that. And if we insist that in order for our systems to shine, everything else in the shop is going to have to change, then what we’re going to do is convince people with decision authority that descriptive markup is a delicate hot-house flower able to survive only in the most favorable conditions and thus not fit for purpose in the real world, which is often imperfect.

Living in a world where not everything is under our control and not everything is in XML (those really are not necessarily the same thing), we have to co-exist with others. Now, in many ways, co-existence with other formats and systems is written deep in the DNA of descriptive markup. Unlike other systems developed at roughly the same time like, for example, the Office Document Architecture, SGML did not attempt to prescribe the set of image formats one could use for documents; instead of prescribing support for a particular format, it provided syntax for declaring the format of the image and left support to the application, which is just as well since very few of the image formats available for standardization in the 1980s have aged particularly well. So, if SGML had chosen an image format, we would now still be suffering from it.

If systems based on descriptive markup must co-exist with older systems, then we need work like the work described by Patrick Andries and Lauren Wood [Andries and Wood 2020]. The logic of locator codes is foreign (or perhaps very foreign) to the way people think if they have spent years working with the standard troff macro sets or the standard DCF Script or Waterloo Script macro sets, let alone if they grew up thinking about document rendering in terms of GML, or SGML, or Scribe, or LaTeX. But the logic does exist, and Patrick Andries and Lauren Wood gave us an enlightening description of how, by patient, dogged work, it is possible to work it out and produce an acceptable translation from the old format into a newer format which some people who had dealt with the older format would never have thought was possible.

Eliot Kimber’s Wordinator goes in the other direction [Kimber 2020]. That can also make sense because it’s true that the document processing software most people have access to is either Word or OpenOffice or some other word processor. (There are other word processors; it is likely that WordPerfect, for example, will die only after its last remaining user has been taken to the cemetery.) So, being able to move documents into Word or other word processors is useful for all the obvious reasons.

And it’s also true that Word may be the best document rendering engine to which people have easy access. Those with a long memory may recall that before we first introduced the XML spec in public, Jon Bosak used a DSSSL stylesheet to translate the SGML source of the XML spec into RTF so that he could import it into Word and use Word to paginate the document and also to tweak the whitespace so as to ensure that we came out at just under twenty pages of normative text.

Now, although it is not explicit in ISO 8879 or in the XML spec, I have always understood there to be a similar kind of openness to variation on the other side of the parser, so to speak. SGML and XML are defined normatively as serialization formats. And what data structures you build from the input is not constrained; or in the case of XDM and DOM, it is constrained by a different and independent specification. One of the most persistent complaints I heard about XML in the late ’90s and early ’00s, from both database theorists and RDF enthusiasts, was that XML has no model. My interlocutors were unimpressed by the response that in the SGML and XML view what you do in the privacy of your own CPU is nobody’s business but yours. Relational database theorists had specified a model; RDF had specified a model; they wanted us to specify a model, too.

Now, in the meantime, I hear that complaint less. I think that in some parts of the universe, it’s due in part to the development of tools to read schemas and generate Java code (or code in other languages) to de-serialize XML into more or less conventional objects in that language and to re-serialize it after processing. In other parts of the world, the widespread use of XSLT and XQuery has led to the de facto identification of XML with XDM, the XPath Data Model. Now, it’s very handy to have such a powerful set of data structures conveniently at hand — out of the box, so to speak — so, it’s not surprising that many of us use the XDM model for our processing wherever possible.

But it is still possible in principle, I think, to read XML data and build whatever data structures you want in whatever programming language you use. In practice, however, most of the work that has been done in recent years on thinking about models of documents other than XDM or DOM has been done outside the XML stack. And of all the work in that vein that could be mentioned, some of the most serious has been that done in Amsterdam by the team led by Ronald Haentjens Dekker. The TEI Guidelines have always (at least, since TEI P3) made hand-waving gestures in the direction of non-hierarchical structures, but the TEI Guidelines have never described exactly how to understand the abstract model instantiated by the TEI facilities for discontinuous elements or virtual elements and the like. What Elli Bleeker showed us glimpses of the other day is a concrete, specific, fully worked-out model for non-linear document structures. Bram Buitendijk has built concrete software to demonstrate that the model is implementable and processable, not just a castle in the air [Bleeker et al. 2020]. The work they are doing is of immeasurable importance. And I thank them for sharing it with us.

Part of the task of toleration at the technical level is tolerating the existence of features aimed at users who are not us and who have problems we do not have. Sometimes, as Norm Tovey-Walsh showed us, those features may turn out to be useful after all, even in our own work [Walsh 2020].

Sometimes, as illustrated by David Birnbaum’s talk about his XSLT library for statistics and plotting [Birnbaum 2020], we need new things because our interests have broadened. We are engaged in our field, minding our own business, going about our work, when our field confronts us with problems that turn out to be shared by other fields (and perhaps are the primary focus of other fields). And in those situations, it’s useful, for those of us at home in the XML stack, if we can address those problems with our XML-based tools. So I’m grateful to David for his work.

SGML was designed largely for documents, with some efforts to ensure it was general enough to be useful for other things. And XML was designed by people who would self-identify as document heads. But it’s helpful if we can use XML for statistical data, and if we can use XQuery and XSLT for statistical calculations, and use an XML vocabulary like SVG to draw the plots.

In a wholly unrelated area, Mary Holstege also described the need for creating more libraries for dealing with new kinds of information [Holstege 2020b]. (I notice, however, that both Mary and David are targeting SVG. Coincidence? I don’t think so.) And although Mary spoke eloquently about the need for more libraries, she also showed stunningly what good use can be made of existing libraries, like the library for geospatial data in MarkLogic.

I talked about defensive programming and how SGML and XML free us from it or at least reduce the burden, but of course defensive programming is not made necessary solely by errors; sometimes what we have to defend against is just the sheer complexity of our situation. And dealing with foreign systems adds to the complexity. Dealing with interactive systems, interrupts, and asynchronous systems adds more complexity.

Fortunately, those of us who rely on the XML stack have a secret — or not so secret — weapon in our struggles to deal with that particular kind of complexity: his name is Michael Kay. He has described some of the challenges of dealing with asynchronous processes in the environment of XSLT, which is a functional language and thus, on the usual account, a language in which time and change don’t really exist in ways that we can touch or manipulate in the program [Kay 2020]. Anyone who wants to develop web applications within XSLT is already in the debt of Mike and his collaborators. And now we live in hope of more debt to come.

Gerrit Imsieke showed us just how many things you may have to take into consideration in order to do something as apparently simple as calculating a value for an HTML @class attribute [Imsieke 2020]. As a user customizing a standard stylesheet, you may have to keep a lot of tiny, tedious details in mind in order to modify the behavior where you want something different without unintentionally breaking something somewhere else. As the creator of a stylesheet or system of stylesheets with a customization layer, you have even more things to keep in mind if you want the downstream customizers to succeed.

Fortunately, Gerrit Imsieke has developed what looks like a useful approach that XSLT developers in both roles can understand and rely on when it’s applied, and he has simplified the task of implementing it by identifying a straightforward pattern and giving it a useful name.

Diversity tolerance

I’ve talked about dealing with the fact that we’re not alone in the world, that there are other people — other problems, other systems — and that there are frailties in our input that can all be subsumed under the term fault tolerance or error tolerance. But what is tolerance exactly? In the Stanford Encyclopedia of Philosophy, Rainer Forst says that tolerance, or toleration, has three essential features:

  1. There is disapproval of or objection to certain beliefs, practices, things, or people.

  2. There is acceptance of those beliefs, practices, things, or people despite our disapproval or objection because there are reasons that say it is better to accept them even though we disapprove.

  3. There is a limit — a boundary — of toleration: there is a boundary between the things that are tolerated and the things that don’t qualify for tolerance [Forst 2017].

Thomas More is a nice example of this. In his book Utopia, he writes in a sort of inspiring, wonderful way about the religious toleration in Utopia [More]. Utopians do not enquire very closely about each other’s religious beliefs; they leave each other in peace in a way that was not actually terribly common in Thomas More’s day. But there are limits to the toleration in Utopia. Atheism is not tolerated, nor is contention, so anybody who talks too loudly or too much about their own religious beliefs is likely to find themselves on the wrong side of the tolerance boundary. John Locke, the English philosopher, also wrote eloquently in favor of religious toleration, but not for Catholics.

Fortunately, when it comes to documents, we are seldom likely to end up burning people at the stake if they are on the wrong side of a tolerance boundary. But our schemas and document grammars sometimes serve as a kind of tolerance boundary, and sometimes that boundary is difficult to locate usefully. Years ago, David Birnbaum gave a talk and wrote a paper under the title In Defense of Invalid SGML [Birnbaum 1998, Birnbaum and Mundie 1999]. And as an example, he asked us to consider dictionaries.

If you look at entries in any dictionary, you will see if you look at ten or a hundred or a thousand of them that they tend to follow a very regular pattern. There is orthographic information. There is pronunciation. There is grammatical information. There may be a description of the etymology of the word. There are senses with examples, and the senses may be subdivided in various ways. And in some cases, one or more of these sub-areas will be omitted, and some of the sub-areas may become very large and complex in themselves and have important substructure.

But that pattern is not quite universal. It will hold for 99% of the entries — maybe more — but in any dictionary there will be a number of entries to which the pattern does not apply. And for a modern dictionary, if you ask the publisher, Why does this entry not follow this usual pattern? there is maybe a 50/50 chance the answer will be: Because the editor was asleep; that is an error. It’s embarrassing, and if you can give us a system that would catch that error, that would be really, really useful.

But there is also maybe a 50/50 chance that what they will say is Oh, well, that entry can’t follow the usual pattern because it’s special in this way or that way. Because let’s face it; the lexicon of any language is a lot like a lot of other linguistic data; there are patterns, and then there is a huge, long tail that goes out with no upper bound in which things are more and more unusual, and less and less standard. And if you want a dictionary to cover the lexicon, you’re going to have to deal with words that don’t follow the usual patterns.

David pointed out that we have a choice; we can perhaps say Well, those words exist, and their entries have unusual structures, so we have to define a grammar that accepts them. The problem is that the easiest way to define a grammar that accepts those extremely unusual structures is to define a grammar that accepts essentially anything: anything can go anywhere. And that tells you absolutely nothing about the pattern you will find in at least 99% of the entries. So, it’s not a very informative document grammar.

David’s suggestion was that the problem is not that our schema languages can’t do this; we can use our schema languages perfectly well to define interesting patterns. The problem is that we’re assuming that invalid input is necessarily erroneous input, and he proposed that we should be able to build systems in which invalid documents could be processed like any other because they might exist. That way you would have a useful grammar, and you would know when it applied. And you would be able to handle cases that didn’t conform to the grammar, and you would know when you needed to take special steps to handle them.

Now, I don’t know of anybody who has actually built a system this way, but you do see attempts to deal with this problem in a kind of similar way, and that is essentially an attempt to provide two different grammars for dictionary entries. The TEI, for example, defines an element called <entry> which has a more or less conventional dictionary structure; it’s not very restrictive. It’s not nearly as restrictive as you would want if you were producing a dictionary, but it’s reasonably restrictive. And there is an alternative element called <entryFree>, the name of which is intended to suggest that inside this element, it is a free-for-all, and anything can happen anywhere. There are no promises and no guarantees.
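
The contrast can be sketched in DTD terms (a deliberately simplified illustration; the element names are borrowed from TEI, but these are not the actual TEI content models):

  <!-- Restrictive: documents the pattern that almost all entries follow. -->
  <!ELEMENT entry     (form, pron?, gramGrp?, etym?, sense+)>

  <!-- Permissive: accepts nearly anything, and so says almost nothing
       about what an entry normally looks like. -->
  <!ELEMENT entryFree (#PCDATA | form | pron | gramGrp | etym | sense)*>

The first declaration tells both the processor and the human reader what to expect; the second merely promises not to object.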

And I’ve heard of similar situations in vocabularies for legislation or other legislative documents because legislative documents tend to follow an extremely rigid, extremely predictable, and extremely well-defined structure almost all of the time. But if the Chair of Appropriations says This needs to happen this way, your DTD does not have a prayer. In no legislature in the world will a DTD or a schema count for more than what is said by the Chair of Appropriations. So, I have heard about legislative systems that also have an element whose name may vary, but whose semantics is essentially All hell breaks loose here. And good luck writing a general purpose processor for it.

What these things remind us is that if we want to reflect the world as it exists, variety and unusual phenomena must be tolerated. And we have to be careful because if we’re not careful, our aversion to errors and our tendency to conflate rarity with error will lead us not just to fail to register variety but to suppress it unintentionally. And in that situation, toleration — conscious toleration — is not really enough because if our attitude towards unusual phenomena — non-standard phenomena, minority views — is merely one of toleration, then unusual views and unusual phenomena will tend to be suffocated, and if we want them to survive, we must not just tolerate them but nurture them.

The German writer, Johann Wolfgang von Goethe, put it very nicely:

Toleranz sollte eigentlich nur eine vorübergehende Gesinnung sein: sie muß zur Anerkennung führen. Dulden heißt beleidigen.

— [Goethe 1829]

Or using the translation from Rainer Forst:

Tolerance should be a temporary attitude only: it must lead to recognition. To tolerate means to insult.

— [Forst 2017]

Now, there is a long history of philosophers and others who have insisted since Plato’s Republic on uniformity as the means of achieving unity and avoiding discord [Plato]. But anyone who has been an outlier, whether labeled as such by others or self-identified, should feel vaguely threatened (or maybe not very vaguely threatened) by any prescription for peace which is achieved through total uniformity in a society or organization because that is the peace of the strait-jacket. Perfect unanimity will tend to lead to monoculture, and monoculture will tend to lead to single points of failure. And we know how well that works out.

Now, taking active steps to encourage variation in our readers, for example, may take the form of doing the kind of accessibility work described by Liam Quin’s report on his accessibility workover of the Balisage proceedings [Quin 2020]. And as Bethan Tovey-Walsh said in her talk the other day, if you haven’t designed for everyone, your design is incomplete [Tovey-Walsh 2020].

But not everyone is the same; often when we offer better affordances for some people, it turns out that those new affordances benefit a much broader group than the initial target group. We sometimes hear this referred to as the curb cut principle because curb cuts, that is, the depressions in curbs to make a smooth surface from the sidewalk to the street, have turned out to benefit many people beyond the wheelchair users who appear to have been the initial target.

But the curb cut principle doesn’t always apply. It doesn’t always work out that way. What helps address one form of disability can under some circumstances exacerbate a different form of disability. What are we to do? Bethan Tovey-Walsh suggested that sometimes the best way forward is multiple presentations of the information, tailored for different profiles of capability and impairment, and presented in such a way that the end user — who is, after all, the best accessibility expert on the needs of that end user — can choose at reading time which presentation to read [Tovey-Walsh 2020]. Sometimes we will need multiple source representations in order to support the varied forms of presentation to the reader. Sometimes, I hope, we will be able to generate different presentations from a single source; that is, after all, one of the original goals of descriptive markup and one of the things we have most experience doing.

Either way, the more alternative forms of information we produce, the more need there is going to be to help people find the right one. Conceptually at least, the library community has been preparing for this moment for years, with the development of the FRBR model and systematic engagement with it throughout libraries: we may argue about whether a film with subtitles and the same film with an audio track describing what is happening visually are, or should be treated as, different expressions of the same work, or different manifestations of the same expression. (And it won’t surprise me in the least to see papers submitted on that topic to future Balisage conferences.) But FRBR has, at least, provided a vocabulary with which we can discuss the question.

Madeleine Rothberg’s talk reminded us that in order to help readers find our documents — and help our documents find their readers — there is infrastructure we are going to need for describing the particular profile of capabilities called upon by a given presentation of the document [Rothberg 2020]. And thanks to the hard work of many people, including Madeleine Rothberg herself, crucial parts of that infrastructure are already there, waiting to be used. What are we waiting for?

Sometimes it’s only a portion of the document that needs multiple presentations: figures, tables, other special callouts. Or more precisely, sometimes different portions of the document will have different sets of alternative presentations which vary in different ways. We are (some of us, at least) used to the idea that users may want to reset the font size. (And Madeleine Rothberg suggested plausibly the other day that maybe even designers are coming around to the idea that designing for the web is not like designing for a glossy magazine in which things stay put where the designer put them once the thing is printed. From her mouth to God’s ears!)

Different readers may and will have different preferences at different locations in the document. In order to allow as many readers as possible to make use of our documents, we are, I suppose, going to find ourselves building more and more documents — more and more websites — that change their presentation dynamically based on user preferences and choices. And we are going to need both better ways of working dynamically with documents, along the lines discussed by Mike Kay and — crucially — good testing infrastructures that can adapt to and handle the requirements of testing dynamic phenomena and testing in what will sometimes be a slightly awkward environment. As I mentioned earlier, Amanda Galtman and Steven Pemberton both showed us ways that we can do better at testing [Galtman 2020, Pemberton 2020]. And Vincenzo Rubano and Fabio Vitali discussed both a development framework that can help us handle the additional complexity of dynamic behaviors correctly and a testing tool built to accommodate and test dynamic changes in the document [Rubano and Vitali 2020].

Along the way, they [Rubano and Vitali] pose a compelling question: we know there are physical differences among humans; some people are different from other people. But in that sentence, who is the some people and who is the other people? How certain are we that these ones are normal and those ones are impaired?

Bethan Tovey-Walsh observed that by some useful definitions, disability is context dependent. Change the context, and as Vincenzo Rubano and Fabio Vitali pointed out, it is sighted web developers who are disabled because they are unable to perceive critical properties of their work product without assistive technology. It is fortunate that Rubano and Vitali are working on tools that may help those of us disabled in that way to cope better with our disability. I have high hopes for the software tools they described, but I think what I will remember longest about their talk is that question: just who is impaired here and who is disabled?

In natural science — in any quantitative discipline — the variation in the data that is not explained away by our theory is often called the residue. And as the name suggests, the residue is often associated with things you are going to throw out. Even with a theory that is perfectly correct, errors in measurement will quickly ensure there is a residue, so we will frequently associate the residue with error.

But it is from the residue that we learn. When Newtonian physics failed to agree with experiments, the natural inference was there was measurement error. But when repeated increasingly careful measurements did not resolve the anomalies, we eventually ended up with the theory of relativity and the theory of quantum electrodynamics. So, we don’t want to just discard the residue; it’s what we learn from.

Thomas Kuhn says scientific revolutions come precisely from our efforts to resolve the anomalies that arise in the theories that have up until then been dominant [Kuhn]. And along the same lines, Isaac Asimov is said to have said The most exciting phrase to hear in science … is not ‘Eureka!’ … but ‘Hmmm, that’s funny ….’[3]

So, the residue may be noise. It may be error, but it may also be precious information which contains the germs of what we are going to learn next. So, there are good, self-interested reasons for not only making our systems more resilient in the face of error, but also for cultivating and nurturing rare or unusual or minority phenomena.

But there is another, deeper reason. It’s a little hard to talk about; I’m not sure how well I can express it. Variation of many kinds is an intrinsic property of humankind as a whole: it’s part of being a human. And as far as anyone can tell, so is making mistakes. So variation and error both seem to me to be intrinsically part of humanity.

Part of the appeal of descriptive markup, it seems to me, has been that it allows us to deal — or at least try to deal — with things as they are or as we understand them to be (or, at least, as our theories would have them be). It allows us to (try to) deal with documents or information, not just in their superficial manifestations, but at a level closer to their essential nature. It allows us as system designers — and it allows our users as users of our systems — to work with documents and information in ways that are more worthy of a human being.

In the end, we should not just tolerate but should embrace diversity of all kinds, not only because it helps us make better, more resilient systems and learn more things, but because doing so helps us and our users to deal with documents as human beings. Documents are made by human beings and made for human beings. Let us make them, as far as we can, serve the full human being and serve all human beings. All of them. All of us.

Thank you for listening. Thank you for attending Balisage. Come again soon![4]

References

[Andries and Wood 2020] Andries, Patrick, and Lauren Wood. Converting typesetting codes to structured XML. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Wood01.

[Beshero-Bondar 2020] Beshero-Bondar, Elisa E. Text Encoding and Processing as a University Writing Intensive Course. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Beshero-Bondar01.

[Biezunski 2020] Biezunski, Michel. Introducing the Networker: Knowledge Graphs Unmediated. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020.

[Birnbaum 1998] Birnbaum, David J. In Defense of Invalid SGML. Presented at Markup Technologies ’98, Chicago, Illinois, November 19 - 20, 1998.

[Birnbaum 2020] Birnbaum, David J. Toward a function library for statistical plotting with XSLT and SVG. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Birnbaum01.

[Birnbaum and Mundie 1999] Birnbaum, David J., and David A. Mundie. The problem of anomalous data: A transformational approach. Markup Languages: Theory & Practice 1.1 (1999): 1-19.

[Bleeker et al. 2020] Bleeker, Elli, Bram Buitendijk and Ronald Haentjens Dekker. Marking up microrevisions with major implications: Non-linear text in TAG. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Bleeker01.

[Bray 1997] Bray, Tim. Re: Sudden death: request for missing input. Email posted to the W3C SGML Working Group <w3c-sgml-wg@w3.org>, 29 April 1997. On the Web at https://lists.w3.org/Archives/Public/w3c-sgml-wg/1997Apr/0277.html.

[Brüggemann-Klein 2020] Brüggemann-Klein, Anne. Four Basic Building Principles (Patterns) for XML Schemas. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Bruggemann-Klein01.

[DeRose 2020] DeRose, Steven. What is a diagram, really? Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.DeRose01.

[Dijkstra 1976] Dijkstra, Edsger. A discipline of programming. Englewood Cliffs, N.J.: Prentice-Hall, 1976. xvii + 217 pp.

[Flynn 2020] Flynn, Peter. Cooking up something new: An XML and XSLT experiment with recipe data. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Flynn01.

[Forst 2017] Forst, Rainer. Toleration. In The Stanford Encyclopedia of Philosophy, Fall 2017 ed., edited by Edward N. Zalta. On the Web at https://plato.stanford.edu/archives/fall2017/entries/toleration/.

[Galtman 2020] Galtman, Amanda. Saxon-JS Meets XSpec Unit Testing: Building High Quality Into Your Web App. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Galtman01.

[Goethe 1829] Goethe, Johann Wolfgang von. Maximen und Reflexionen, Werke 6. 1829. Frankfurt am Main: Insel, 1981.

[Holstege 2020a] Holstege, Mary. Data Structures in XQuery. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Holstege02.

[Holstege 2020b] Holstege, Mary. XML for Art: A Case Study. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Holstege01.

[Huitfeldt and Sperberg-McQueen 2020] Huitfeldt, Claus, and C. M. Sperberg-McQueen. Document similarity: Transcription, edit distances, vocabulary overlap, and the metaphysics of documents. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Huitfeldt01.

[Imsieke 2020] Imsieke, Gerrit. Creating Class Attributes with XSLT (Making Stylesheets Extensible). Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Imsieke01.

[RFC 1122] Internet Engineering Task Force. Requirements for Internet Hosts — Communication Layers. RFC 1122, § 1.2.2 (Robustness Principle). Edited by R. Braden. October 1989 (restating Postel’s Law: Department of Defense. Internet Protocol. RFC 760, § 3.2. Edited by J. Postel. January 1980.).

[Kalvesmaki 2020] Kalvesmaki, Joel. A New \u: Extending XSLT Regular Expressions for Unicode. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Kalvesmaki01.

[Kay 2020] Kay, Michael. Asynchronous XSLT. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Kay01.

[Kimber 2020] Kimber, Eliot. High-Quality Microsoft Word documents from XML: The Wordinator. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Kimber01.

[Knuth 1984] Knuth, Donald E. Literate Programming. The Computer Journal 27, no. 2 (January 1984): 97–111. doi:https://doi.org/10.1093/comjnl/27.2.97.

[Kuhn] Kuhn, Thomas S. The Structure of Scientific Revolutions. 4th ed. 1962. Chicago: University of Chicago Press, 2012.

[Lubell 2020] Lubell, Joshua. A Document-based View of the Risk Management Framework. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Lubell01.

[More] More, Thomas. Utopia. 3rd ed. Edited by George M. Logan. Translated by Robert M. Adams. Cambridge Texts in the History of Political Thought. 1516. Cambridge, United Kingdom: Cambridge University Press, 2016.

[Nordström 2020] Nordström, Ari. Pipelined XSLT Transformations. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Nordstrom01.

[Pemberton 2020] Pemberton, Steven. How Suite it is: Declarative XForms Submission Testing. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Pemberton01.

[Piez 2020] Piez, Wendell. Systems security assurance as (micro) publishing: Declarative markup for systems description and assessment. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Piez01.

[Plato] Plato. The Republic. 10th ed. Edited by G. R. F. Ferrari. Translated by Tom Griffith. Cambridge Texts in the History of Political Thought. Cambridge, United Kingdom: Cambridge University Press, 2012.

[Pólya 1945] Pólya, George. How to Solve It. 1st ed. Princeton, New Jersey: Princeton University Press, 1945.

[Porter 2020] Porter, C. Edward. Syntax-From-Doc: A Case Study of Powering IDE Code Completion from XML Documentation. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Porter01.

[Quin 2020] Quin, Liam. Improving Accessibility of an XML-based Conference Proceedings Web Site. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Quin01.

[Asimov] Quote Investigator. The Most Exciting Phrase in Science Is Not ‘Eureka’ But ‘That’s funny …’. Citing Usenet Newsgroup: comp.sources.games, v01i040: fortune – quote for the day, Part 14/16 (Source code listing for fortune computer program distributed via Usenet). From games-request at tekred.UUCP. June 3, 1987. (Google Usenet groups archive; Accessed February 28, 2015.) On the Web at https://quoteinvestigator.com/2015/03/02/eureka-funny/.

[Rothberg 2020] Rothberg, Madeleine. Accessibility metadata statements. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Rothberg01.

[Rubano and Vitali 2020] Rubano, Vincenzo, and Fabio Vitali. Experiences from declarative markup to improve the accessibility of HTML. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Vitali01.

[Sperberg-McQueen 2020] Sperberg-McQueen, C. M. An XML infrastructure: for spell checking with custom dictionaries. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Sperberg-McQueen01.

[Spolsky 2000] Spolsky, Joel. Things You Should Never Do, Part I. Joel on Software (blog). April 6, 2000. On the Web at https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/.

[Tovey-Walsh 2020] Tovey-Walsh, Bethan Siân. Disabled by default. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Tovey-Walsh01.

[Usdin 2020] Usdin, B. Tommie. Welcome to Balisage 2020. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Usdin01.

[Walsh 2020] Walsh, Norman. XSLT 3.0 on ordinary prose. Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). doi:https://doi.org/10.4242/BalisageVol25.Walsh01.



[1] It’s not unlikely that Dijkstra said this or something like it more than once. The discussion I found first is on pp. 56-57 of A discipline of programming [Dijkstra 1976].

[2] In preparing these remarks for publication in the proceedings, I have searched through some of the usual suspects for the remark I remember and paraphrase here, but I have not found it. So I have made the attribution vaguer than it was in the oral presentation.

[3] This remark is widely attributed to Asimov, for example, in popular quote-of-the-day software [Asimov], but no one seems to know when or where he said it. So it may be apocryphal.

[4] The present text is a very lightly copy-edited version of my remarks at the end of Balisage 2020. The thanks to those who listened were more than usually sincere, since the talk lasted much longer than it should have, and some auditors must have feared that they might have strayed unintentionally into a four-hour speech in the manner of Fidel Castro (though with less discussion of the progress of the sugar cane harvest). Additional thanks are due to Tonya R. Gaylord, who transcribed the talk, chased down references for many things that needed references, and prodded me to clarify some points.
