How to cite this paper

Usdin, B. Tommie. “The Secret Garden.” Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). https://doi.org/10.4242/BalisageVol28.Usdin01.

Balisage: The Markup Conference 2023
July 31 - August 4, 2023

Balisage Paper: The Secret Garden

B. Tommie Usdin

President

Mulberry Technologies, Inc.

B. Tommie Usdin is President of Mulberry Technologies, Inc., a consultancy specializing in XML for textual documents. Ms. Usdin has been working with SGML since 1985 and has been a supporter of XML since 1996. She chairs Balisage: The Markup Conference. Ms. Usdin has developed DTDs, schemas, and XML/SGML application frameworks for applications in government and industry. Projects include reference materials in medicine, science, engineering, and law; semiconductor documentation; and historical and archival materials. Distribution formats have included print books, magazines, and journals, and both web- and media-based electronic publications. She is co-chair of the NISO Z39.96 JATS (Journal Article Tag Suite) Working Group and a member of the BITS Working Group and the NISO STS Standing Committee. You can read more about her at http://www.mulberrytech.com/people/usdin/index.html and see some of her photos on Flickr.

Copyright ©2023 by the author.

Abstract

We in the markup community have built ourselves a beautiful and ever-improving place to work. We can move content into markup, we have a variety of tools to manipulate marked-up content, we can move at will from tool to tool, we create a variety of products from that marked up content, and we believe our marked up content will be long lived. We frequently lament that most of the world doesn’t live in our techno-garden, and we occasionally admit that most of the world doesn’t even know it exists. At Balisage this year we will learn about ways in which our technology is improving. We will hear about some of the projects we are doing with markup and some of the problems we are having. And we will hear (a little) about how we are opening the gate to our garden and interacting with the outside world.

Table of Contents

What’s Growing in the Garden
What’s New and Emerging in the Garden
Garden Infrastructure Activity
Ways to Open the Gates and Increase Traffic Both Into and Out of the Garden
Worrying about Threats to Our Garden
Enjoy the Garden!

Welcome to Balisage. Balisage is an annual celebration of all things markup. It’s a place where a small group of relatively like-minded people can commune with our fellows: we get together and talk to each other. This is a place where we can assume that people understand what markup is, that it is useful and important, and that the benefits of working in a consistent environment significantly outweigh the costs of our always-imperfect specifications and standards. This is a place where we can safely unleash our inner geek and speak in acronyms. We have to explain the details of our tasks, goals, tools, and techniques here, but we don’t need to justify our use of declarative markup. We may be asked about our tool and technology choices, about the particulars of our tools and approaches; we will not be challenged to justify using generic markup.

Welcome to this small, beautiful garden. That’s not the way the rest of the world works. Well, it’s relatively small and isolated in any case. I think it’s beautiful; you may disagree. This is a place where, if I say, “Identify what it is, not what you want to do with it or what it should look like, and you can do many things with it and present it in many ways,” you won’t look at me as if I were crazy. You will wonder what that leads to, because I haven’t said anything interesting yet.

Many of us are the only markup aficionados in our organizations. We spend our days interpreting error messages, customizing software, and justifying indirection and generic markup. For us, being surrounded by others of our ilk is a real joy.

So, what are we going to do in our garden this week?

  • We’re going to look at what is growing in the garden;

  • We’re going to learn what is new and emerging in the garden;

  • We’re going to talk about garden infrastructure activity;

  • We’re going to talk about ways to open the gates and increase traffic both into and out of the garden;

  • Some of us are going to worry about threats to our garden; and

  • We are going to enjoy the garden.

What’s Growing in the Garden

Let’s start with what’s growing in the garden. First, and probably the reason most of us are here, we’ll be enjoying an opportunity to talk about markup in great detail. We are going to enjoy the company of people who understand and value declarative markup and who understand what structured documents are and why we care about them. We are going to learn about tools, techniques, and projects.

We’re going to talk about work, projects, and activities in the real world that involve markup. We’re going to talk about successful projects, projects that are in process, and some work that people hope to do. We’ll hear about what went well and what could have gone better. We’ll learn about work that was harder than anticipated, and perhaps some projects that were just plain disappointing — we’re allowed to talk about that here, you know. We’ll talk about what is, or was, hard as well as what was easier than we promised. And we’ll share what we learned along the way.

The markup space is based on, dependent on, and rich with, standards and specifications. It’s through use of shared specifications that we gain much of the value in markup. At Balisage, we will talk about formal standards, community specifications, and shared understandings (and shared misunderstandings). We’ll talk about the history of some specifications and the future of others. Many of the acronyms we’ll throw around like confetti are the names of standards or specifications: XML, XSLT, iXML, XPath, DITA, JATS, JSON, XQuery, and dozens more. (I thought about making a Balisage Bingo game; I didn’t do it, but I probably should have.) We’re going to talk about how some of us are using specifications, where we find their constraints binding, and how we want them to change. Interestingly, we’re unlikely to spend much energy on those aspects of the specifications and their constraints that we find comfortable, except to talk about people outside our environment who live without those constraints, which makes them, and their documents, very difficult to deal with.

What’s New and Emerging in the Garden

We’re going to discuss what’s new and emerging in the garden. We’ll discuss the future of XML and the XML stack, the future of markup technologies, and the future of some of the projects and activities various members of our community are working on. We’re going to hear about some new (or extended, expanded, or improved) specifications and some new (or extended, expanded, or improved) tools. We’re going to hear about some old and some new problems, perhaps with some new approaches.

Garden Infrastructure Activity

We’re going to talk about garden infrastructure. We’re going to talk about tools. We’ll hear about the principles behind how some tools work and about uses of tools. (It’s unlikely, for example, that we’re going to be able to talk about projects without talking about some of the tools used in the projects.) We’re going to talk about tools some of us wish existed because we want to use them. We will probably also hear about tools in the Open Mic session you heard about a few minutes ago. We’ll talk about tools in regular conference sessions and in sponsor sessions.

Did you understand that not only are we thanking our sponsors from the platform, but also we’re going to hear from some of them? The sponsor sessions are the only place at Balisage where we want to hear details about cool tools and where giving a commercial for your cool tool is encouraged, not discouraged. I encourage you to go to the sponsor sessions because we need what those people are doing. We need to buy their tools and, certainly, to know what they are.

Ways to Open the Gates and Increase Traffic Both Into and Out of the Garden

I don’t think I’ve ever been to a markup-based meeting that didn’t include a discussion of how under-appreciated we are. About how the world does not sufficiently understand our genius. I understand this lamentation of neglect. I sympathize with the wails of frustration that our philosophy is not widely enough adopted, that our tools are not widely enough used, and that problems that could be easily and effectively solved if people listened to us remain unsolved. I, too, sometimes feel this frustration that we are unjustly and ignorantly ignored!

But I hope at Balisage this year we can avoid whining about ingratitude, and instead talk about how we can reach over or through the walls of our comfortable garden to expand the reach of declarative markup. We will, for example, talk about iXML, which is at heart a way to bring our point of view, and our tool chain, to content that was not created, structured, or encoded with generic markup, XML, or the XML stack in mind. I think of iXML as stealth XML, silently bringing the benefits of our world view to environments that view XML with distrust.

We’re going to talk about Artificial Intelligence and Large Language Models in the context of markup. This will make some of us uncomfortable. I’m sympathetic, to an extent. Any time a fashionable topic dominates the conversation, it risks becoming boring, and we’re certainly running that risk at Balisage. But — and this is a big but — any time a group avoids talking about a big part of their world because they feel anxious about it, they risk becoming fossils.

Some of us feel threatened and get defensive. Some want to compromise and get along with everyone. Some don’t. We compromise on little things. Sometimes you identify part of a text as italic because that’s all you know about it. You see that it was printed slanty a few hundred years ago, and since neither the author nor the typesetter is available to ask, you don’t know why. So, you record what you see. You want to print on fixed-size paper, with attractive page breaks and without rivers or ladders, and you have a limited budget for auto-magic typography? You insert clues for the tool doing this specific thing. Some see this as practical and appropriate for people working in the real world. Others see it as compromising our principles, as eroding the foundation of declarative markup.

Worrying about Threats to Our Garden

There are so many threats we can worry about.

As we were soliciting content for Balisage, and after we announced the program, several nay-sayers contacted me. (You wouldn’t think a tiny niche conference like Balisage would attract hate mail, but it does. Very odd.) I was told that declarative markup was under threat, and we — no, I — needed to do something about it.

I’ve been playing the markup game for a long time, and I have seen a lot of such threats and fear-filled responses to such threats. We, as a community of people who find declarative, generic, semi-semantic markup useful, important, powerful, and generally under-appreciated, have faced several existential threats in the last 30 or so years:

  • GUI editors,

  • HTML,

  • XML,

  • SGML,

  • the XML hype, and now

  • AI and LLMs.

There are, no doubt, others. These are the ones I want to talk about.

GUI Editors. That’s the first one that I remember people being really upset about.

Graphical-User-Interface-based editors were terrible. The markup missionaries were convinced that tools that hid the markup would destroy us all. It was essential that people be forced to see the markup, to understand the syntax of the markup that was embedded in their documents. Tools that made the authoring experience resemble a final presentation were misleading and destructive.

Because we could do many different things with marked-up documents (perform semantic as well as presentational magic), making the authoring interface look like just one presentation style would confuse people. It would lead to misuse of tags to make the document look as the author wanted in this tool, leading to badly tagged documents, and death and destruction. Well, I have yet to see a large markup environment in which there wasn’t some tag abuse, so in a sense, these doom-sayers were right. On the other hand, the people I know of who are creating content in explicit declarative markup (meaning they are identifying structures they know to encode information in their content) are using GUI editors.

I doubt that there is much, if any, significant content created in generic markup that is not created in a GUI editor of one sort or another. If you write in XML and you aren’t an XML geek, you write using a GUI editor. If you are an XML geek, you might also use a GUI editor when your subject matter isn’t XML.

And then there’s HTML. Well, we don’t use HTML to tell the difference between blue jeans and chocolate chip cookies on the web. And HTML isn’t the only tag set anyone uses for anything. But in some very interesting ways, it has become the default tag set, a basis from which we vary intentionally when there is reason to. (Take a look at Akoma Ntoso for a fascinating use of HTML.)

HTML was the next existential threat to rich, domain-specific, multi-use markup. And oh, did we panic about it. I remember an SGML conference, probably in 1992, when HTML and its 20 or so tags were all the buzz. HTML was the one tag set to do everything. HTML was going to be used to encode all documents for all uses. HTML was going to take the world by storm! HTML was wonderful. HTML was terrible. HTML was going to destroy the funding for all of our markup-related projects, and all of our documents were going to become worthless because they were going to be encoded in HTML!

I have to admit, I was among the non-believers. At the time, I was working on a vocabulary for a complex set of medical references that had very close to 1,000 tags. Not for any good reason, mind you, but because the client didn’t believe it was a good idea to rely on context for either display or retrieval because it would be unreliable, and this data was important. So, not only did we not have <title> inside <section> (and inside <section> inside <section>), we didn’t have <title> inside <canine-use-section> or <animal-use-section>. We had tags like <canine-use-section-title>. And this was for a believer in SGML.

Anyway, I didn’t think it was likely that this tiny set of tags they were calling HTML would destroy all markup projects (I was right about that). I also didn’t believe it would become particularly widely used or useful (I was wrong about that!). There were impromptu evening sessions to discuss HTML, to teach us what it was, and to start the conversation about how we would combat it. There were a lot of small group discussions in the bar about how to keep our projects safe from the threat and how to convince our funders not to drop their SGML projects because HTML would make them unnecessary. It was intense!

And then there was XML. XML was the next emergent threat. XML was going to destroy all market confidence in things-markup because it was a variation on SGML, and a market could be built only if the SGML standard remained inviolate. Without the anchor of a stable international standard, the entire markup endeavor was doomed! This was so important that efforts to revise the standard were quashed; no matter how good the possible improvement, it was not worth the risk!

Also, the claims for XML were that it would be easier than SGML, so SGML project budgets would all be slashed. Immediately. Catastrophically! Now, this was in the days of large bespoke vocabularies and tools that were technically vocabulary-neutral but in fact based on one vocabulary. A hint: advertising for SGML editors boasted that they could handle not only the vocabulary they were built for (CALS, for example) but also arbitrary DTDs! (Note: You don’t call a customer’s data arbitrary if you want to keep that customer.) While there were a few big public (or public-ish) vocabularies, they were rare. One of the big costs of getting started with a markup project was the document analysis and modeling needed to create your particular tag set. There were people who read the claims for XML and believed that since XML was easier, figuring out what a group’s document structures and requirements were would be faster, cheaper, and easier for XML than for SGML. “We’ve changed syntax and syntactic capabilities, so the analysis and requirements phase will be easier.” Uh. Wishful thinking. (And to look back a crisis or two, actually, one of my big hopes for XML was that it would lead to better, different editing tools, which it hasn’t. But that’s a rant for another day.)

The next threat. Remember SGML? A mere two years later, XML was being touted as the next Great Thing. XML was going to take declarative markup mainstream. Everyone would be using XML, talking XML, and enjoying the benefits of generic markup. Everyone would know how wise the people in the generic markup community were, honoring us for our foresight; and we would finally be appropriately rewarded and recognized.

If, and only if, we hid our shameful past association with SGML and focused only on XML. The acronym SGML was banned from conferences. Not only were events renamed to remove this reminder of the past, presentation proposals that mentioned SGML were rejected, and job titles that included SGML were somehow left out of conference programs. Why? Because we needed to sell the would-be customers on the bright new future with XML.

I’ve said it before; I’ll say it again. SGML is unfashionable; it’s not illegal. And actually, I’m aware of several organizations, still in 2023, using SGML tools and applications that were developed before the total domination of XML in our tiny puddle and that still meet their needs. And I can say that at Balisage without causing thunder and lightning. Some of us still use SGML because it works.

And then there was the XML hype. The next threat to XML was the over-hyping of XML. In comparison to its predecessors, XML is a roaring success. In comparison to the early hype, XML is a total failure. Why, people cry, would anyone use XML to encode (fill in the blank with your favorite file type: configuration files, messages across networks, machine-to-machine communication, smokestack emissions data, whatever)? And if XML is not absolutely perfect for my favorite file type, it must not be any good at all. We need to drop it and never let it rear its ugly head again. Feeling threatened?

Lions, and tigers, and bears! Why am I digging up all this history? Because we have a new existential threat. AI and Large Language Models. Natural Language Processing, machine learning, and … It’s going to destroy the world as we know it. To quote some of yesterday’s conversations, “LLMs make markup obsolete.”

Wait. Haven’t we heard that song before? This time it’s different! This time it’s more frightening! This time the danger is real!

Or at least it’s our catastrophe-du-jour. Artificial Intelligence and Large Language Models: let me start by saying that I know these aren’t the same thing, and I know that they aren’t really all that new. I talk about them together because they seem related, and the reasons to talk about them at a markup conference are pretty much the same. And that’s what we need to do: talk about them.

I know we as a community are worried about them because, as we have put Balisage together, I have heard from quite a few people. I was advised not to take any AI or LLM talks at Balisage 2023. I was advised to cancel those talks and get them off the program because they would frighten people away. I’ve been told that, by talking about LLMs at Balisage, I’m giving aid and comfort to the enemy. (I’m not sure we have enemies.) I have been told by several people that they don’t intend to listen to those Balisage talks.

Now, I’m actually sympathetic to people who choose not to attend talks they find boring or disturbing. I’ve been caught reading newspaper comics during a conference talk I found particularly dull. This wouldn’t be all that bad, but I was moderating the session at the time. Oooops. And I have walked out of movies that were too gory for my tastes. If there is anything at Balisage that you find disturbing, or just plain dull, go play with the cat, make a fresh cup of tea, or see if there is an interesting conversation in one of the social spaces. That’s fine.

But don’t tell me that Artificial Intelligence will destroy civilization as we know it and that all of our jobs will be changed, destroyed, or eliminated. There are those who claim there will be no need for programming, writers will be replaced by apps, and there will be no way to know if what you read is true or the hallucinations of a bot. Talking about using AI and LLMs in a markup context is tantamount to encouraging the destruction of civilization. Oh, come on! Let’s relax. Let’s talk about it.

Is this threat different from the ones I previously talked about? Yes, in several important ways. The most important, in my opinion, is that we’re far from alone in being affected this time. For the first time, we may be in the mainstream!

But I’m not panicking. I’m not banishing talk of these frightening topics from Balisage. Perhaps I’m too uninformed to properly appreciate the threat. Perhaps I’m too naïve. Or too old. Or too deeply indoctrinated. In any case, I think we are facing another new thing that will change the way we work. Meaning, something we should learn about and talk about at Balisage.

Enjoy the Garden!

We’re here to listen, to learn, and to interact. Balisage has fewer talks per hour, and fewer talks per day, than most conferences. That’s because our goal is not to maximize the number of speakers, but to maximize thoughtful interaction. I encourage you to listen to the talks, pose interesting questions in the Q&A, chat in the chat, instigate and participate in Birds of a Feather discussions, and use the social spaces.

Are you ready? Let’s go and enjoy the garden!