How to cite this paper

Sperberg-McQueen, C. M. “Knock Down This Wall.” Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). https://doi.org/10.4242/BalisageVol28.Sperberg-McQueen02.

Balisage: The Markup Conference 2023
July 31 - August 4, 2023

Balisage Paper: Knock Down This Wall

C. M. Sperberg-McQueen

Founder and principal

Black Mesa Technologies LLC

C. M. Sperberg-McQueen is the founder and principal of Black Mesa Technologies, a consultancy specializing in helping memory institutions improve the long-term preservation of and access to the information for which they are responsible.

He served as editor in chief of the TEI Guidelines from 1988 to 2000, and has also served as co-editor of the World Wide Web Consortium’s XML 1.0 and XML Schema 1.1 specifications.

Copyright 2023 by the author

Abstract

Life can be comfortable inside a walled garden. But if we wish to engage with the world, we need to knock down those walls.

Table of Contents

Introduction
The Right Thing
Getting to Results
Artificial Intelligence
When Things Don’t Go as Expected
XML Stack
But … Life Happens
Praxis
Albania
Commerce
Unicorns
Just in Time Leather
Conclusion

Introduction

Thank you all for coming. I am speaking to you today from land that is historically part of Kah’p’oo Owinge, also known as Santa Clara Pueblo in New Mexico.

On Monday, Tommie Usdin invited us into a secret garden [Usdin 2023]. It’s secret, and I imagine it as a walled garden because how else can a garden be secret? Also, there’s a long tradition of Christian iconography around the image of the hortus inclusus (or enclosed garden).[1]

I hope that you have enjoyed our visit this week to this secret garden, but what I want to suggest now in closing is that perhaps the time has come to tear down those enclosing walls. I don’t want to tear up the garden, but I do want to knock down the wall or at least put a couple of gates in it.

And to explain why, I think I’m going to need to tell you a story. Some years ago, when I was a senior in college and avoiding whatever work I was supposed to be doing at the time, I was wandering around in the stacks of the main library at my university, picking up books that looked interesting; and I spent half an hour looking at a book on Indian logic. The preface explained that the author had for many years been hoping to write an account of classic Indian logic and how it compares with western logic derived from Greek practice. Eventually he had succeeded in putting a first draft together and had shown it to some friends, and they were enthusiastic about it. One of the nice things about finishing the first draft was that it gave him a better overview of the subject matter and allowed him to see that a different organization would be much better, so he set about the work of restructuring the presentation. He was writing it again from scratch when he was interrupted by some of his friends who had apparently despaired of his perfectionism and his unwillingness to call anything finished. They presented him with galley proofs produced by a typesetter to whom they had given the first draft, and they said, We’re going to publish this. Would you please proofread these galleys and write a preface?

This has been on my mind I think partly because it illustrates a tension that is visible in many places in the world between a desire to get something right — to get the right answer to a question no matter how long it takes — and deadlines — the desire to get some results, even if they are imperfect.

The Right Thing

The descriptive markup community has plenty of people who continue to seek the right answer to the question however long it takes. Patrick Durusau’s paper on Monday is, I think, a good example [Durusau 2023]. I have been hearing people worry about overlap since at least the late 1980s; I have done a fair amount of worrying about overlap myself. And some people are still trying, as Patrick’s paper illustrates, to find a solution that works as generally, and is as convincing for the general case, as XML is for the case where you only really care about one hierarchy. They are trying to find some solution that has a convincing story about a serialization form and a data structure and the conception of a document vocabulary as a document language for which there could be a grammar and for which you can imagine some kind of validation. Patrick’s answer may or may not be the general solution we’ve been looking for. I’m sure the discussion will continue. Of course, many people have solved this problem in a purely pragmatic way; if they had deadlines, they chose an approach, and they did it.

But even if what you really need is a result, things can change — things can get better. Amanda Galtman demonstrated a specific technique using xsl:accumulator that can help a lot with the kind of discontinuous elements that sometimes trouble people [Galtman 2023].

Of course, looking for the right answer no matter how long it takes works most easily when you don’t have a deadline. I talked the other day about finally finding a solution to a practical problem that arose in generating an ebook for Frege [Sperberg-McQueen 2023], but the first time I thought about that ebook was 2014. And I have had the luxury of being able to spend nine years, off and on, thinking about it, before iXML came along and I realized that it was the key to my solution.

Not everybody has that luxury because sometimes you do have a deadline, and when you do have a deadline, quite often you just need results, no matter how you get them. I have always thought that processing instructions in both SGML and XML are a signal that the designers of those languages included people who knew that sometimes you might need to resort to impure methods to get the results, but also people who cared enough about the difference between pure methods and impure methods to want to mark the places where you had resorted to impure methods, to processor-specific instructions and so on, in order to get your results. If you mark those places, you give yourself the chance to go back later and fix things to work with purer methods.

Getting to Results

Sometimes, of course, results matter, deadline or no deadline. Some years ago, I was at dinner with a bunch of people in the XML Schema working group; and the discussion turned to artificial intelligence and to the shift in artificial intelligence that Elisa Beshero-Bondar talked about, away from symbolic computation and towards purely statistical computation [Beshero-Bondar 2023]. And somebody said, You know, it’s really kind of a shame because with symbolic AI once a problem is solved, the result represents insight; the problem crystallizes things, and it presents an advance in human understanding. And Paul Biron, one of the editors of the data types specification, who worked at that time for a very large healthcare organization, said, It doesn’t matter in the least to me. If a black box that we don’t understand will give us better patient outcomes, I would much rather have better patient outcomes than better insight that doesn’t lead to better patient outcomes. What I care about is the result, not the process. And it is a fact that symbolic AI produced some really nice results, but statistical AI has produced more results and better results — more impressive results.

Artificial Intelligence

We may or may not be approaching the singularity that some people have predicted, but clearly things are changing and very rapidly. I am extremely grateful to Uche Ogbuji and Joel Dubinko, who each showed us that there is useful work that AIs can do right now, as well as useful work that is right now perhaps a little beyond them [Ogbuji 2023, Dubinko 2023]. But for how long will it be beyond them? They persuaded me — and I hope they persuaded you — that we do need to get our hands dirty, not just to make sure that we’re not left behind — although that does matter in some respects — but also to make sure that the artificial intelligences and those who are developing them engage fully — or more fully than they have so far — with human diversity in language and otherwise, and that our work with AI can serve the kind of values that the people in this community care about.

If what you care about is the process, then how the results are achieved matters; if what you care about is product, then how they’re achieved doesn’t really matter, as long as the results are correct. And that’s why I think one of the important papers at this conference was the report from Paul Prescod, Ben Feuer, and others, reporting on a project which is attempting to provide a method to allow us to produce testable reproducible answers to the questions Are the results of this auto-markup process any good? How good are they? Where can they be better? [Prescod et al. 2023]. I think there is great promise in the general principle that they described: specifying one or more target documents and then scoring the result by measuring the edit distance (for some edit distance or other) between the results produced by the artificial intelligence (or auto-markup process) and one or the other of the target documents. Like Elisa Beshero-Bondar, I wish that we could combine the strengths of large language models and the statistical approach with the strengths and explanatory power of the symbolic approach, but I have no idea how that will happen [Beshero-Bondar 2023]. All I know is that this will continue to be an important tension and an important challenge.
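The scoring principle can be sketched in a few lines. What follows is a minimal illustration and not the method of Prescod et al.: it assumes plain Levenshtein distance as the edit distance and a crude tokenization into tags and words, and the names tokenize, edit_distance, and score are invented for the example.

```python
import re


def tokenize(markup: str) -> list[str]:
    """Split a marked-up string into tags and words (a crude assumption)."""
    return re.findall(r"<[^>]+>|[^<\s]+", markup)


def edit_distance(a: list[str], b: list[str]) -> int:
    """Classic Levenshtein distance between two token sequences."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (token_a != token_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def score(candidate: str, targets: list[str]) -> int:
    """Distance from the candidate to the closest acceptable target (0 is best)."""
    tokens = tokenize(candidate)
    return min(edit_distance(tokens, tokenize(target)) for target in targets)


# Invented sample data: two equally acceptable targets and one auto-markup result.
targets = ["<p>Hello <em>world</em></p>", "<p>Hello <i>world</i></p>"]
print(score("<p>Hello world</p>", targets))
```

Taking the minimum over several targets is what lets more than one markup of the same text count as correct; a different distance measure or a tree-aware comparison could be swapped in without changing the overall shape.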

When Things Don’t Go as Expected

Sometimes I think that the shift in AI research from symbolic to statistical computation is just one instance of a much larger and more complex pattern. What happens when there is a group of people who expect things to go in one way and find that they don’t go that way? What happens then? This is particularly visible and interesting in cases where you have a community of people who would like to change the world — and who expect to change the world — and then encounter a certain recalcitrance in the world as the world refuses to change in the way that they expected. Early AI researchers thought that general machine intelligence was just around the corner; from the late 1950s on, there were predictions that general intelligence was maybe five years away. But the five-year interval never got shorter and eventually became a joke, until finally AI as a field said We’re not getting anywhere this way; we need other approaches.

For a long time, socialists and Marxists the world over expected world revolution; world revolution was just around the corner. And it is interesting to look at the history of the various flavors of socialism to see how different people react to the fact that world revolution hasn’t happened and now looks less likely than ever. Sometimes people change gears — they change paths — the way the AI field did. The old ideas have failed; we’re going to try something new. So there are plenty of Communists and Bolsheviks who became neoconservatives.

When I first learned SGML and learned about descriptive markup, I found it very hard not to think that world revolution was just around the corner. It was so obvious that this was a better way to do things, so obvious that it was useful both in the short term and in the long term, that surely everybody will soon see that this is a better way to do things. Surely, very soon the whole world will be using SGML, because it’s so obviously the right thing.

XML Stack

After all, look at the stuff we can build on top of descriptive markup and nowadays on top of XML and XML technologies. John Chelsom showed us the other day that classic symbolic methods of AI like forward and backward rule-chaining can be done in XForms — not even calling an external program but writing the executable code in XForms itself [Chelsom 2023]. Ari Nordström argued this morning (and I believe him) that the XML stack provides a nicer basis for content management systems than existing content management systems have [Nordström 2023]. The XML stack is nice for the implementers of the content management systems, because the XML stack provides features useful for document management. And it’s nice for those who maintain and run the content management system: it’s very convenient for people like Ari and me and no doubt most of you, if we can use XML technologies when interacting with the content management system.

Geert Bormans and Srikanth Venkata Subramanian demonstrated how XProc, and XSLT, and the web (which for purposes of this talk I’m going to claim as an application of descriptive markup) can help with quality assurance in an extremely complex and important area and keep the complexity of the problems tractable [Bormans and Subramanian 2023]. Eliot Kimber showed us how DITA keys can make it easier to maintain large bodies of documentation and how you can use the XML stack to get from here to there — from a world in which you’re not using the keys to a world in which you are — even if your conversion window is very, very narrow because you are part of a very large organization [Kimber 2023].

In order to make the most of the strengths of the XML stack, we need of course to hone our skills, so I’m grateful to Amanda Galtman for teaching us how to use XSLT accumulators to spark joy in our code and how to use XSpec to check their work, following the motto trust, but verify [Galtman 2023]. Mary Holstege showed us how a single programmer suitably guided by laziness and impatience (I’m sorry, Mary, you may think of yourself as lazy, but we can do the arithmetic: 150,000 lines of code in three years! You might wish you were lazy, but you’re not) can build her own tools and make an impressive library available in both XQuery and XSLT [Holstege 2023].

People can build beautiful things with the XML stack. And they do.

But as Allen Renear pointed out on Sunday not everyone is sold [Renear 2023]. Lots of people don’t want to use XML. Lots of people who use XML don’t want to see it, so here is another call-out to Geert Bormans and Srikanth Venkata Subramanian and to their users who do want to see XML — who do want an XML editor so they can work with Akoma Ntoso instead of in Word [Bormans and Subramanian 2023]. Huzzah! May that be an omen of things to come! But in the meantime, when we are dealing with a world in which not everyone wants to use XML, what is to be done?

But … Life Happens

It’s important to remember that no single technology or family of technologies is ever going to be the only thing in the world. However good the technology is, complications will arise, complications of very different kinds, sometimes non-technological.

Praxis

We want to build things that last, either for commercial reasons or for other reasons. We want to use descriptive markup maybe to preserve our cultural heritage and public data. We know or hope that our cultural heritage should outlive any single piece of software, but it’s easy to forget when we’re planning a project that we also need the cultural heritage that we’re trying to preserve and the project that is preserving it to outlive us.

Jeff Beck pointed out the other day that for projects to live on, succession planning is necessary [Beck 2023]. And when public vocabularies are so successful that everyone is using a predefined public vocabulary, that will have consequences because it means that fewer people will have experience designing and maintaining vocabularies. We have to be careful not to let our successes poison our future.

Some things are hard to plan for, even when we plan for them. Ash Clark reminded us yesterday that even when you know that performance may be an issue and you plan for it and you think ahead and you test in order to detect and solve performance issues, nevertheless, performance issues can arise when you go into production — performance issues that weren’t there before you went into production [Clark 2023]. Now, we can say, having heard the talk, well, remember the bots. Remember that bots will crawl your site and they will exercise portions of your code that you weren’t thinking were going to be heavily exercised because humans aren’t going to do that. And that’s a good lesson to draw. We should remember that bots will stress our systems.

But what will it be next time? There will always be something that we have not foreseen, and when that problem appears, the solution is very likely to involve the same kind of things that Ash described: studying the problem, thinking, and being willing to re-architect your design [Clark 2023].

Sometimes the things that don’t work out the way we expected are that the world turns out not to match a description that we published. In publishing, as Debbie Lapeyre pointed out earlier today, it’s easy to assume that once something is published it jolly well stays published, and that’s an end to it. That’s almost true, but not always. As Jessica Hymers and Qinqin Lin showed us this morning, it is really, really important to think about how to handle retractions and corrections and how to make sure that they are visible to people who might otherwise be tempted to rely on faulty science [Hymers and Lin 2023].

Sometimes what you need turns out to have been built in, or at least there were some facilities for it, but you will still need to work out interoperable ways of working with those built-in facilities.

Albania

One of the big challenges when you expected the world to change and then it didn’t is that when you are thinking about how the world is going to change and how things are going to be, you make certain plans and adopt certain behaviors, and then when the world doesn’t change the way you expected, you find yourself confronted with the continued existence of people and institutions whose continued existence was not part of your original plan or expectation. In the case of AI, those were, for example, problem areas that resisted solution. In the case of the Communist International, it was the continued existence of non-socialist countries. For XML people who expected the entire world to start using XML, it is perhaps the continued existence of non-XML formats and non-XML users.

There are several things that people can do in this case. Sometimes there are people who refuse to give in, who just continue working for the original goal. Leon Trotsky never gave up on world revolution; that was one of the serious substantive policy differences between Trotsky and Stalin. Trotsky wanted world revolution, and Stalin said, you know, we have control of this country; we really need to try to make it work within this country. And socialism within one country required a certain amount of attention that Trotsky didn’t want to give it because Trotsky wanted to focus on world revolution. But eventually, for most purposes, Trotsky became irrelevant to the way the world went.

But even under Stalin, the Soviet Union took an attitude that, I guess, could be described as hostile co-existence: the non-socialist world continues to exist, but it won’t forever because we are in a contest; ultimately one system will outlive the other. And you get remarks like Khrushchev’s remark when he visited the UN in 1960, We will bury you, which sounded really, really threatening to many Americans because it sounded as though Khrushchev had plans to bring about the state of affairs in which it would be necessary for him to bury us, even though in reality, as I learned years later, he was just citing an old Russian proverb that means We will outlive you.

At this point, I find it necessary, however, to remind people that world revolution was not necessarily part of everyone’s original plan for XML. Converting everyone to XML is not one of the goals in the XML specification. The database management people showed up at W3C and at XML meetings not because anybody told them to, but of their own accord. I remember, again, a Working Group dinner at which I said, Don’t take this wrong. I don’t want to offend you, but why are you here? What does XML have that you need? You know, XML is for documents, and databases are really not document-like. And I’m happy that you’re here, but I don’t understand why. And they said, Well, we have interchange problems. When we need to move data from one database to the other, we have trouble. And I said, What do you mean? You’ve got comma-separated values; surely that’s all you need. I mean, I’m not really a database guy, but I have worked enough with relational databases to know that the tables fit really nicely in comma-separated values. And they said, Yes, but comma-separated values are responsible for about half of our support costs because, well, for one thing, the format looks so simple that many programmers don’t bother to look for a library. They figure they can write it on their own, and they do, and they don’t always get it right, so they have quoting problems and white space problems. And the other thing is that, even if they got it right, comma-separated value formats have no place to identify the character set being used. So the fact that XML was created in part by character set geeks who had been struggling with character set issues for years and therefore built an encoding declaration into XML was partly responsible for the interest of database people in XML. Because XML had a way to declare the character encoding.
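Both complaints are easy to reproduce. The following is a minimal sketch, assuming Python's standard csv module stands in for the library that a hurried programmer might skip; the sample record and the names naive_parse and library_parse are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented record: the second field contains a comma and a quoted word.
line = 'Sperberg-McQueen,"Knock Down This Wall, a ""closing"" talk",2023'


def naive_parse(record: str) -> list[str]:
    """The hand-rolled parser: split on every comma and ignore quoting."""
    return record.split(",")


def library_parse(record: str) -> list[str]:
    """The same record read by a CSV library that honors the quoting rules."""
    return next(csv.reader(io.StringIO(record)))


print(len(naive_parse(line)), naive_parse(line))      # one field too many
print(len(library_parse(line)), library_parse(line))  # the intended three fields

# The second problem: nothing in the bytes of a CSV file names their encoding,
# so a reader has to guess, and a wrong guess raises no error.
name = "Nordström"
wrong_guess = name.encode("utf-8").decode("latin-1")
print(wrong_guess == name)

# An XML document can carry its own encoding declaration, so the parser
# does not have to guess.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>' + name + "</name>"
).encode("latin-1")
print(ET.fromstring(xml_bytes).text == name)
```

The naive split produces four fields where three were intended, and the mis-decoded name differs silently from the original, while the XML parser recovers it from the declared encoding.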

Now, being part of a movement that may change the world is a lot of fun. It’s very satisfying; it gives context and significance to our lives. And giving up on the idea that it’s going to change the world can be psychologically very difficult. But I submit that being part of that kind of movement can give our work meaning whether we end up taking over the world or not. I’ll also note that Charles Goldfarb, for what it’s worth, was always careful, at least when I heard him speaking, not to imply that SGML was out to conquer the world or that SGML must conquer the world or count as a failure.[2] On the contrary, he sometimes positioned SGML and descriptive markup as a sort of niche technology for people who could not afford to take sides in battles between proprietary formats backed by large organizations that they did not control. I remember a talk in which one slide read When giants do battle, choose a different location for your picnic.

There’s a threat to all of us when organizations that are larger than we are are fighting each other and we risk becoming collateral damage. But there’s also a threat when others are competing with us and see our non-existence as advantageous to them. Even if we don’t seek world domination ourselves, they may be seeking world domination. After all, a lot of people have learned the lesson of the network effect and think This world will be so much better if our technology is adopted universally. And any technology that’s not theirs is a threat to them.

So what do you do if you find yourself living in a world full of threats? One approach is to build a wall: ignore them, defend yourself, and pull back your attention to focus on what is close at hand. In a political environment, maybe the best example of this is Albania which was extremely isolated not just from the non-socialist west, but also the rest of the socialist world. East Germany also spent a lot of resources trying to isolate itself.

On a smaller scale, we can build a wall around our garden to make it safe for us to tend our own garden, as Voltaire put it. But sometimes the threats that led us to shield ourselves from the outside world turn out to be not quite so threatening as we thought. You may remember that ominous picture of a very threatening-looking snake in Tommie’s slides [Usdin 2023]. That was a garter snake, of no conceivable threat to any human being attending this conference or not attending this conference.

Sometimes we decide there’s really no threat. We can live in the world; we can interact with the rest of the world in the same way that Scandinavian Social Democrats do not find it necessary to shield themselves off from non-socialists. They engage with others in parliamentary democracies, they enter into coalitions, they form governments. They’re just another political party. They have certain policy preferences, but they don’t regard the rest of the political spectrum as an enemy with whom it is dangerous to enter into any commerce. In a similar way, there are XML users and tools that engage with other formats. And we can learn from their experience.

Commerce

Charles O’Connor and Mark Gross talked about an XML early work-flow where the first step, of course, is moving data into XML from non-XML formats [Gross and O’Connor 2023]. I was impressed with how much they could achieve and with their idea of detecting problems automatically so that problem documents could be routed for human fix-up. That seemed to me, in its own way, the kind of self-awareness — awareness of the limitations and of the things that their system doesn’t do — that Elisa Beshero-Bondar found missing from the AIs with which she ran her experiments [Beshero-Bondar 2023]. I was also impressed by how much mileage they got out of early normalization as a way of increasing their success rate.

Phil Fearon and Gursheen Kaur also exploited early normalization — in their case, just-in-time normalization — as a way of reducing the variability in markup and making it easy for their code to focus on the core task of comparing the two tables rather than comparing the two tables with one hand while, with the other, desperately trying to compensate for the wild variation in the explicitness and quality of the table markup found in some inputs [Fearon and Kaur 2023].

High variability in our input is always a challenge. It is important in cases where it carries meaning, and that’s why I have always been amused at the habit the database people have of attempting (as I like to say) to de-legitimize XML by describing it as semi-structured in contrast to the much more rigid, much more highly structured (in their view) structure of a relational table. I think it’s not really a difference between structured information and semi-structured information; I think it’s a distinction between the kind of structure you find in table salt and the kind of structure you find in DNA. The variability is where a lot of the information is. But if the variability is not carrying information — if it’s accidental — then it’s very helpful to normalize it away.

In some cases, that normalization will be a real challenge, especially if one is dealing with some legacy markup formats. Joel Kalvesmaki gave us a vivid example that will keep some of us awake at night for some time: a reminder of just how strange and wonderful formats can be and just how devious and ingenious the inventors of new formats in the 1960s and ’70s and ’80s could be, especially when they were single-mindedly aiming at a single application of putting ink on paper [Kalvesmaki 2023]. In that context, anything that gets the same ink on paper is fine, and any goal beyond that doesn’t need to be met.

One of the horrifying things, I think, about large and far-sighted organizations is that they see the need for standardization before anybody else does. And if they’re large and self-confident, they move forward on standardizing things, at least for themselves. And the result is that they end up committed to systems that ultimately become utterly, completely unlike anything anyone else is using, so they are completely isolated and they have a much harder time learning from other people because other people started from a different foundation.

Joel’s talk also illustrated, it seemed to me, the huge potential long-term costs of focusing only on results and not also on process [Kalvesmaki 2023]: incomplete documentation, or the complete absence of documentation; no archival copies of earlier versions. Why? Because those things didn’t help people get ink on paper in time for the 4:00pm Fedex pickup.

Unicorns

You know, there is another reason I was thinking about that Indian logician. The real reason I was thinking about that Indian logician is a problem related to existential and universal quantification. By universal quantification, I mean logical sentences like All human beings are mortal; and by existential quantification, I mean sentences like Some human beings are mortal. And it sticks in my mind, in part, because the Indian logician taught me something, in those thirty minutes I spent reading a book that I will never find again (because I can’t remember his name and I don’t remember enough about the book ever to find it again).

He said that the treatment of quantification is one of the crucial differences between Indian logic and Greek logic. The sentence All human beings are mortal is a stronger sentence than Some human beings are mortal, and the stronger sentence entails the weaker sentence. So, in Indian logic, if the universal quantification is true, then an existential quantification will also be true.

Greek logicians made an opposite choice. They said, consider the sentence All unicorns have one horn. Well, that’s true because there are no counter-examples. There are no unicorns that don’t have one horn. And so they established the principle that if the class involved is empty, then the universal quantification is always true.

But an existential quantification — a sentence like Some unicorns have one horn — that, the Greeks really wanted to be true only if there existed at least one that had one horn. And that means the truth of the universal statement does not imply the truth of the existential statement. And in general, in Western logic, an existential statement is taken to entail, as the term existential suggests, the existence of members of that class.
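Most programming languages inherit the convention described here as Greek. A small sketch in Python, whose built-in all() and any() behave this way on an empty collection; the function all_with_existential_import is a hypothetical helper, written only to show the other convention, in which a true universal statement entails the existential one.

```python
# The class of unicorns is empty.
unicorns: list[dict] = []

# Universal quantification over an empty class: true, for lack of counter-examples.
all_have_one_horn = all(u["horns"] == 1 for u in unicorns)

# Existential quantification over an empty class: false, since no witness exists.
some_have_one_horn = any(u["horns"] == 1 for u in unicorns)

print(all_have_one_horn, some_have_one_horn)


def all_with_existential_import(items, predicate) -> bool:
    """Universal quantification that also requires the class to be non-empty."""
    items = list(items)
    return bool(items) and all(predicate(item) for item in items)


# Under this convention the universal claim about unicorns is no longer true.
print(all_with_existential_import(unicorns, lambda u: u["horns"] == 1))
```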

I think of it as natural that we should call a statement using the word some an existential statement, but that’s only because I’ve spent a lot of time thinking about it within the framework of Western logic. The word some and the word existence have no etymological tie, and their semantic tie is, as the Indian author made clear, the doing of Greek logic. And he described the difference in approach this way:[3] he said, two logicians — an Indian and a Greek — went for a walk. And there were sharp rocks, and they hurt their feet on the rocks. And the Greek logician said, You know what we should do? We should get a big piece of leather, and we should cover the entire globe, so when we go for a walk, there is always a layer of leather between our feet and the rock, and we can’t hurt our feet. And the Indian said, Having leather between the rock and our feet would be a good idea, but I think it’s simpler if we just wear sandals.

Just in Time Leather

You will see, I think, how this applies to markup. It seems to me that this goes right to the heart of the problem of co-existence of descriptive markup with other styles of information representation, or XML and other formats. If we had succeeded in bringing the entire world into the practice of descriptive markup and the use of XML, it would be like having leather all over the world; we could go barefoot anywhere, and we would never have to worry about sharp rocks. But we haven’t reached that point, so it’s better if we can just wear shoes when there are rocks.

The approach of making things be XML when we need them to be XML works very nicely for some people. The other day somebody asked me, in a discussion of Google Sheets, Do you know how to sort the rows of a spreadsheet in Google Sheets? And I said, Sure, and they said, How do you do it? I click at the upper left, I export to CSV, I download, I flip over to BaseX, I load it in BaseX as unparsed data, and I use the csv:parse command to turn it into XML. Then, I can sort the XML or do whatever I want with it. And I’m happy. And for some reason, the person I was talking to did not really think of this as a satisfactory answer to the question How do I sort the rows of a spreadsheet in Google Sheets? And even for me, I confess that sometimes that’s a step or two more than I would like to take, and I would like it to be a little simpler. I would be happier, for example, if Google Sheets had an XML export.
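The import-and-sort step can be approximated outside BaseX as well. The sketch below is a rough Python analogue and not BaseX's csv:parse: it assumes the export has a header row, and the element names csv, record, and entry, the name attribute, the function names, and the sample data are all choices made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text: str) -> ET.Element:
    """Turn CSV text with a header row into a small XML tree."""
    root = ET.Element("csv")
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, "record")
        for name, value in row.items():
            entry = ET.SubElement(record, "entry", name=name)
            entry.text = value
    return root


def sort_records(root: ET.Element, key_name: str) -> None:
    """Reorder the record elements in place by the text of one named entry."""

    def key(record: ET.Element) -> str:
        entry = record.find(f"entry[@name='{key_name}']")
        return (entry.text or "") if entry is not None else ""

    root[:] = sorted(root, key=key)


# Invented sample standing in for a spreadsheet exported to CSV.
exported = "item,qty\npears,3\napples,10\nfigs,7\n"
tree = csv_to_xml(exported)
sort_records(tree, "item")
print(ET.tostring(tree, encoding="unicode"))
```

The sort here compares strings; sorting the qty column numerically would need a key that converts the text to a number first.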

And for the people I work with, the problem is that I become a sort of informational black hole. I can pull in information and turn it into XML, and I can do nice things with it. But it’s hard for me to share the information with them if they can’t deal with XML. I suppose that, having sorted the records, I could write them back out to CSV and upload them again and overwrite the spreadsheet. But I’ve never actually tried that, and I don’t know for sure that it would work. Life will be better for me and for the people I work with if we can make it easier to move data into and back out of XML.
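The return trip, from sorted XML records back out to CSV, might look something like the following in Python. It is a sketch under assumptions rather than a tested workflow: it assumes flat records (one level of child elements under each hypothetical record element), takes the column order from the first record, and says nothing about whether the result can safely overwrite the original spreadsheet.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(root):
    """Serialize <record> children as CSV; the first record supplies the header."""
    records = root.findall("record")
    if not records:
        return ""
    fields = [child.tag for child in records[0]]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # A field missing from a record becomes an empty cell.
        writer.writerow({name: rec.findtext(name, default="") for name in fields})
    return out.getvalue()


# Invented sample data: two already-sorted records.
doc = ET.fromstring(
    "<csv>"
    "<record><item>apples</item><qty>12</qty></record>"
    "<record><item>figs</item><qty>7</qty></record>"
    "</csv>"
)
print(xml_to_csv(doc), end="")
```

Anything in the XML that does not fit the flat record shape (attributes, nesting, mixed content) is silently dropped by this sketch, which is one reason the export direction deserves more care than it usually gets.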

There is some work going on in this area. We heard Michael Kay talk this morning about how to do better at XML-to-JSON conversion by using a schema [Kay 2023]. And you will have seen, if you attended his talk, just how complicated that problem is in the general case. Ash Clark talked about how to do better by not necessarily using the standard or default XML representation of JSON but by using another one that would have better performance characteristics in your particular workflow [Clark 2023]. And in invisible XML we have a general tool for moving data into XML. Norm Tovey-Walsh talked on Monday about an important issue in the implementation of invisible XML, an issue that’s important not just for implementers but also for grammar writers [Tovey-Walsh 2023].

Conclusion

Invisible XML is focused on moving things into XML. And I think there is continuing pain and thus a continuing opportunity in the area of export from XML. iXML simplifies import. You could always get stuff into XML; you could write a parser in XSLT or XQuery, using the facilities that are built in. But iXML makes it a lot easier because all you have to do is specify a grammar and an iXML processor will parse your input into XML for you. I wonder if there is an opportunity for a notation that could similarly define export, so that we have a declarative notation that we could compile to XQuery or XSLT and that would produce a suitable export into a text stream that matches the grammar or obeys the rules that we’ve laid out for the notation.
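To make the idea of a declarative export a little more concrete, here is a toy sketch in Python. Everything in it is hypothetical: the rule table is an invented notation, not iXML and not any existing specification, and a real notation would presumably be compiled to XQuery or XSLT rather than interpreted like this. The point is only that the export is described by data (what to emit before, between, and after the children of each element) and that one generic routine does the rest.

```python
import xml.etree.ElementTree as ET


def export(element, rules):
    """Serialize an element to text by looking up its rule in a table."""
    # Each rule is (before, between, after); unlisted elements get no delimiters.
    before, between, after = rules.get(element.tag, ("", "", ""))
    children = list(element)
    if children:
        body = between.join(export(child, rules) for child in children)
    else:
        # Assumption: leaf elements hold plain text; mixed content is ignored.
        body = element.text or ""
    return before + body + after


# Hypothetical rules: dates are joined by hyphens, one date per line.
DATE_RULES = {
    "dates": ("", "\n", "\n"),
    "date": ("", "-", ""),
}

doc = ET.fromstring(
    "<dates>"
    "<date><year>2023</year><month>07</month><day>31</day></date>"
    "<date><year>2023</year><month>08</month><day>04</day></date>"
    "</dates>"
)
print(export(doc, DATE_RULES), end="")
```

Swapping in a different rule table changes the output format without touching the serializer, which is the property a declarative export notation would be after.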

It seems to me that there are two difficulties in data interchange. (I have thought this for a long time; it was part of our thinking when we were working on the Text Encoding Initiative.) One problem is understanding the structure of the data you acquired from somebody else. Where are the field boundaries? Do some things contain other things? Can things repeat? What is data here, and what is delimiter? And stuff like that. The second typically arises after you have answered the first question and understood the structure of the data: you discover that the information you wanted is not there, at least not in the form that you wanted.

Maybe they analyzed the world in a different way so it’s going to be more complicated to get the information you want. Maybe the information you want just isn’t there because, oddly enough, their view of the world is different from your view of the world, and the things that they are interested in turn out not to be the things that you’re interested in. The TEI, I always used to say, can help with the first, but it really cannot help with the second. The only way to help with the second would be to prescribe that everybody has to care about the same things. And that’s not something most people want to do, or even if they do want to do it, it’s not something that would ever succeed; we’re not ever all going to be interested in the same things.

And analogously there are two difficulties in moving data into and out of XML. The first is just a difference in format which we can solve with better import/export tools. We can get data into XML. But the second is the difference — I’ll call it a difference in worldview — that people who created that external data format don’t think the way we do. The people who use it often don’t think the way we do. And so what you get will be XML, but it will not necessarily be XML that uses the principles of descriptive markup. And it’s important to be aware of this, so we can prepare ourselves. There are cases where we will have to relax our concerns about that second thing. We are in a position to make it easier to move data into and out of XML; we are not in a position to make the entire world do descriptive markup. And just getting the data into XML helps a great deal.

If you doubt that, ask yourself: would you rather work with a Word 95 binary file or with the XML that you can extract from a .docx file? I submit to you that no one who has ever worked with either of those, and absolutely no one who has worked with both of those formats, will have any doubt that they are much happier working with the XML than with the binary format. If descriptive markup is as helpful as we say it is, things will always be easier with descriptive markup than without it, and the work we do will be better. And if XML is as helpful as we say it is, then things will be better with XML than without it, and our work will be better.

Gardens are very nice places to retreat from the world, but having rested here, we need to go back out into the world in part because we have things to offer the world. If we tear down the wall or at least punch a couple of gates in it, there are new places we can go. But be careful. The world out there is full of stones, and we’re not going to manage to cover it all with leather. So we’re going to need leather on our feet; put on your shoes. Let’s go some places, and then come back next year and tell us where you’ve been.

Thank you for attending Balisage.

References

[Beck 2023] Beck, Jeffrey. The Future Begins Tomorrow: Succession Planning for XML Infrastructure Resources. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Beck01.

[Beshero-Bondar 2023] Beshero-Bondar, Elisa E. Markup and Migratory Workflows in the Context of AI and Big Data Analytics: Reflections on the Data Modeling Groundwork of the Digital Humanities. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Beshero-Bondar01.

[Bormans and Subramanian 2023] Bormans, Geert, and Srikanth Venkata Subramanian. Unveiling Linguistic Harmony: Asserting Interlingual Synchronicity in Documents. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Bormans01.

[Chelsom 2023] Chelsom, John J. Artificial Intelligence with XForms. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Chelsom01.

[Clark 2023] Clark, Ash. A Wondrous Historie of Intertextual Networks: Or, How Not to Index Your Data. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Clark01.

[Dubinko 2023] Dubinko, M. Joel. Building Applications with Generative AI. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Dubinko01.

[Durusau 2023] Durusau, Patrick. Hypergraphs: Escaping the Surly Bonds of Syntax. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Durusau01.

[Fearon and Kaur 2023] Fearon, Phil, and Gursheen Kaur. Processing Lax XML Element Trees: Fixing HTML Tables with a Content Model Directed XSLT Transform. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Fearon01.

[Galtman 2023] Galtman, Amanda. Accumulators in XSLT and XSpec: Developing, Debugging, and Testing XSLT 3 Accumulators. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Galtman01.

[Gross and O’Connor 2023] Gross, Mark, and Charles O’Connor. Pulling All Production Processes Together with an XML-First System. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Gross01.

[Holstege 2023] Holstege, Mary. Adventures in Single-Sourcing XQuery and XSLT. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Holstege01.

[Hymers and Lin 2023] Hymers, Jessica, and Qinqin Lin. Retractions and Corrections at Scholars Portal Journals. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Hymers01.

[Kalvesmaki 2023] Kalvesmaki, Joel. Serializing the Locator Format of the United States Government Publishing Office as XML. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Kalvesmaki01.

[Kay 2023] Kay, Michael. Schema-Aware Conversion of XML to JSON. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Kay01.

[Kimber 2023] Kimber, Eliot. Turning a Battleship: Migrating ServiceNow Documentation to Use DITA Keys. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Kimber01.

[Nordström 2023] Nordström, Ari. The Dream of a CMS. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Nordstrom01.

[Ogbuji 2023] Ogbuji, Uche. Privately Automating Common, Uncommon, and Surprising Markup Tasks Using AI Large Language Models. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Ogbuji01.

[Prescod et al. 2023] Prescod, Paul, Ben Feuer, Andrii Hladkyi, Sean Paulk and Arjun Prasad. Auto-Markup BenchMark: Towards an Industry-standard Benchmark for Evaluating Automatic Document Markup. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Prescod01.

[Renear 2023] Renear, Allen H. The SGML/XML Approach to Document Processing: [an incomplete] History of Criticisms and Challenges. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Renear01.

[Sperberg-McQueen 2023] Sperberg-McQueen, C. M. Keyboarding Frege’s Concept Writing: A Case Study in the Use of invisible XML. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Sperberg-McQueen01.

[Tovey-Walsh 2023] Tovey-Walsh, Norm. Ambiguity in iXML: And How to Control It. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Tovey-Walsh01.

[Usdin 2023] Usdin, B. Tommie. The Secret Garden. Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Usdin01.



[1] The author gratefully acknowledges the help of Tonya Gaylord for her transcription of this talk; for the most part, I have kept this written form close to what was actually said, but I have not scrupled to reword things when it seemed likely to make them clearer. Deep thanks are also due to Debbie Lapeyre, for her help preparing the talk. Thanks are also due to the conference committee for inviting me to make these closing remarks and to the participants in the conference for listening.

[2] For the benefit of non-SGML users, it should perhaps be pointed out that Charles Goldfarb was one of the developers of GML and the editor of the ISO 8879 standard which defined SGML.

[3] This is not the form in which he made the comparison.
