How to cite this paper

Usdin, B. Tommie. “The (unspoken) XML gotcha.” Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Usdin01.

Balisage: The Markup Conference 2021
August 2 - 6, 2021

Balisage Paper: The (unspoken) XML gotcha

B. Tommie Usdin

Mulberry Technologies

B. Tommie Usdin is President of Mulberry Technologies, Inc., a consultancy specializing in XML for textual documents. Ms. Usdin has been working with SGML since 1985 and has been a supporter of XML since 1996. She chairs the Balisage conference. Ms. Usdin has developed DTDs, Schemas, and XML/SGML application frameworks for applications in government and industry. Projects include reference materials in medicine, science, engineering, and law; semiconductor documentation; historical and archival materials. Distribution formats have included print books, magazines, and journals, and both web- and media-based electronic publications. She is co-chair of the NISO Z39.96, JATS: Journal Article Tag Suite Working Group and a member of the BITS Working Group and the NISO STS Standing Committee. You can read more about her at http://www.mulberrytech.com/people/usdin/index.html.

Copyright ©2021, Mulberry Technologies, Inc. Used with permission.

Abstract

XML is a platform-neutral way to exchange, share, and manipulate information. But what persuades many to use XML is the claim that XML provides a long-term way to store information, independent of tools (both hardware and software) with their short life spans. Projects spend significant resources on XML setup and then settle into doing the real work, using that XML infrastructure to compile, write, analyze, or whatever it is they do. Until, one day, something doesn’t work. Hardware is retired; software is upgraded; specifications go into new releases. Users get stuck. And when they complain, we respond, “Of course that doesn’t work any more; you have been accumulating technical debt for years! It is time to reinvest.” They thought they had committed to a one-time cost, and now we tell them that it is an ongoing expense. If they had put their documents into their favorite spreadsheet, they complain, they could still import them into the current version. How do we answer that complaint? We (the XMLers) think we described the values of XML plainly and fairly. We (the XML users) think that the claim that XML documents last a long time is relying on a specious technicality, and we have been trapped dishonestly. I live on both sides of this: as a user I want to invest in infrastructure once and have it last; as a developer I want to be able to improve my product without the limitations imposed by backwards compatibility. We as a community often complain that not enough people are using XML. If we really want XML use to grow, we need to address the gotcha that too many XML users are feeling.

Table of Contents

Appendix A. Excerpts of Introduction to XML for XYZ (February, 2001)

Note

This talk was presented at the beginning of Balisage: The Markup Conference 2021. The proceedings of the conference are available at: https://www.balisage.net/Proceedings/vol26/. The conference program is available at: https://www.balisage.net/2021/Program.html.

Like many of us in the markup community (possibly most of the people here), I want the users of declarative markup, which these days really means users of XML, to feel comfortable. I want them to think that their investments in XML are appropriate and that their XML applications at least meet their needs, if not exceed them. I love it when we hear success stories. Conference papers of the genre “How We Did It Good At My Place” are amongst my favorites, although I know that there are at least some of you here who say, “Another case study? Uh, time to check my email.” Well, not me; I love them.

And I cringe when I hear about XML applications that fail. There are a lot of reasons for XML applications to fail; the most common one, in my opinion, is inappropriate expectations. The users don’t ever put it that way. What the users say is that the technology let them down; it doesn’t work. But what that usually means is they expected magic in some form, and they didn’t get it. And they blame XML. And in a way they’re right in that. Well, not XML per se, but the XML community. We, the people who know descriptive markup, usually are the ones who helped them select and/or create their XML application: either in person or through our publications, we helped them design or select a vocabulary, we helped them build their cool applications, and we helped them convert their existing information into XML. We got them started doing business in a new way, which is very, very cool.

We — we as a community and I as an individual — promote XML. I have done this many, many times. In fact, let me share some slides from a presentation I gave in 2001.

The appendix to this paper has excerpts from a presentation I gave in February, 2001. I’m showing it to you, not because I think it is particularly interesting or particularly unusual, but quite to the contrary because I think it is absolutely typical of not only presentations I gave hundreds of times but also of presentations given by many, many other people.

This particular presentation came about because the techie people at the organization that I am calling XYZ said, “We want to move our publication process to XML, and we need somebody to come in and tell our manager and money people what this stuff is that we want to spend money on and why it is a good idea for us.” So, this presentation is essentially “What Is XML and Why Should You Care” for managers. It starts with what XML is trying to achieve: one set of data for many publishing formats; communication of information; reuse of information; platform independence; vendor independence; one data format, many presentation formats; get away from the typesetting file trap; make a whole bunch of things from the same source.

It goes on to: What is XML? It’s a data format; it’s generic markup. XML looks at things as documents; it’s divided into elements and attributes. It’s a data format; you can make stuff that looks like this, and stuff that looks like that, and oh, stuff that looks like this. You can reuse your XML for print and voice synthesis and braille, and you can make electronic things, including HTML, out of your XML.

This is a really familiar song. You’ve all sung it, right? It can and should be generic markup, we say. We publish from XML; XML separates content from format and behavior. It uses an output specification to get there — we call these stylesheets.
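As a rough illustration of separating content from format, the sketch below renders one XML source two ways. The slide vocabulary (`slide`, `title`, `point`) and both rendering functions are hypothetical, invented for this example; plain Python functions stand in for what would normally be stylesheets.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document: content only, no formatting decisions.
SOURCE = """
<slide>
  <title>Why XML?</title>
  <point>One source</point>
  <point>Many outputs</point>
</slide>
"""

def to_html(slide):
    """One output specification: an HTML fragment."""
    items = "".join(f"<li>{escape(p.text)}</li>" for p in slide.findall("point"))
    return f"<h1>{escape(slide.findtext('title'))}</h1><ul>{items}</ul>"

def to_handout(slide):
    """A second output specification: plain text for a handout."""
    lines = [slide.findtext("title").upper()]
    lines += [f"  * {p.text}" for p in slide.findall("point")]
    return "\n".join(lines)

slide = ET.fromstring(SOURCE)
html_output = to_html(slide)
handout_output = to_handout(slide)
print(html_output)
print(handout_output)
```

The source never changes; only the output specification does. Note that both functions are tools with their own lifespans, separate from the lifespan of the document they read.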

I’m not telling anybody at Balisage anything you haven’t already heard a whole lot of times. You can use and reuse your XML; perhaps that’s the most important thing. You can reuse and re-purpose your content; you can make subsets and spin-offs. You can do all kinds of cool things with your XML. It is long-term, software-independent, archivable; your XML will last forever. You can use it for workflow. You can use it for large datasets, especially from disparate sources. You can maintain consistency. It’s easy to learn and use over the long-term.

XML is wonderful.

We tell them that it’s going to change the way you work and you’re going to have to learn some stuff and you’re going to need a little expertise. You need training. You need schemas. You need to make your XML part of your production process. You probably have to convert your backfiles. The bad news, at least as I ended my typical presentations, is there is no free lunch. Just because it’s XML doesn’t mean it’s good. You’re going to have to do more work. The good news is that you can do XML. There are long-term benefits; it will work for you.

There was more, of course. This is a four- or five-minute summary of a 90-minute presentation. As I said, nothing there that we haven’t all heard and probably said a dozen times at least. But I don’t think what I think I was saying is what my audience heard.

I had a similar experience dealing with a technology that I know very little about. I have in a suitcase under a bed a very expensive piece of junk. It’s a custom-made wetsuit that I had made about 20 years ago when I was learning how to dive. Why did I have a custom wetsuit made? Because there is a reason they don’t show Poppin’ Fresh getting into that can.

Putting on a wetsuit that doesn’t fit, especially for someone with my general physique, is not pretty and not comfortable. So, I was measured and re-measured, and a skilled seamstress made me a wetsuit that I could put on and take off easily and that made me a lot more comfortable for long periods of time in the water. And they convinced me that I would go diving more often because it was comfortable. I was told that it would last forever because they had designed it to be adjustable. Should I change shape, they would be able to easily modify the wetsuit; they would be able to add to it or take away from it if my girth changed. This was going to be just wonderful.

What didn’t they tell me? Well, first of all, apparently you have to apply stuff on a regular basis to a wetsuit to keep it pliable, and even if you do, neoprene has a limited lifespan. After somewhere between four and ten years, it becomes brittle and cracks. So, now there is an expensive piece of junk in a suitcase under a bed.

Did I know to ask how to maintain it? No. And even if I did, did I know to ask, how long, even if I maintained it, the thing would last? No. Does everybody who works in the wet environment know all of these things? Yes. Was I being foolish for not knowing to ask? Yes. But I was new to this.

When we who know about markup tell a story about “a platform-neutral way to exchange, share, and manipulate information,” users don’t hear “a platform-neutral way to exchange the information content of your documents, but not the applications you built around them.” They hear “You can move your stuff around.” Yeah.

We tell them XML provides a long-term way to store information independent of tools, both hardware and software with their short lifespans. And this is true. But it doesn’t occur to them that they’re spending a lot of time, energy, and money on tools we just told them have very short lifespans. They spend a lot of money and resources getting set up, then settle in to doing real work with XML with the expectation that they can now focus on their subject matter.

You know how it works: They bring in a team of outside experts, and we get them started. We do a little training, we write some documentation, we help them buy tools, and we help customize the tools. Once we get everything working, we do a little training. Life is good, and we go away to do the same thing for somebody else.

They keep working with their documents. And it’s working, and it’s working, and it’s working. And then one day, it’s not. They didn’t expect that. They don’t know what happened. And they are very, very unhappy about it.

Actually, I know exactly how they feel. I write a lot of slides — the ones in the appendix, for example — using the Mulberry slideshow XML tool chain. We write slides in fairly complex XML because from this XML source we can make slide decks, we can make our handouts, and we can make exercise books for classes. There is a bunch of stuff in the XML from which we make our slides.

A few months ago, I went to make some slides, and it didn’t work. Not only didn’t it work, but the error message I got pointed to the last line of my input file. (You know what it means when an error message points to the last line of your input file? It means some tool is saying, “I don’t know!” The application got lost.) There is nothing helpful about an error message that points to the last line of your input file because it’s a really good bet that is not where the problem is.
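The effect is easy to reproduce. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser; the `deck`/`slide` document is hypothetical and is not the input that failed in the story above. The mistake is an unclosed element on line 2, but the parser only notices at the final end tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical slide deck: the first <slide> (line 2) is never closed.
BROKEN = (
    "<deck>\n"
    "  <slide>\n"
    "    <title>One</title>\n"
    "  <slide>\n"
    "    <title>Two</title>\n"
    "  </slide>\n"
    "</deck>"
)

def reported_line(xml_text):
    """Return the line number the parser blames, or None if well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, _column = err.position  # (line, column) of the reported error
        return line
    return None

last_line = len(BROKEN.splitlines())
print(reported_line(BROKEN), "of", last_line)
```

The parser is technically right: the document stops being well-formed at the last line. That is just not where a person needs to look.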

What had happened? It’s a long story that will be familiar to you all. My favorite photo editor put out a new version with some features I wanted. But the photo editor required a newer version of the operating system than I was running. My ten-year-old machine wouldn’t run the new version of the operating system. So, I got a new machine which ran the new operating system which could run the new photo editor, but which meant I had to get new copies of everything else that I was using. So, I needed a new copy of my favorite XML editor, a new copy of my formatter, and new copies of a lot of tools.

Actually, the XML editor itself worked. I could write the new slides; I just couldn’t convert my XML into anything else because it turned out that the version of XProc that I was expecting to use wasn’t available in the framework I was using. And the version of XSLT in the framework I was using didn’t support the proprietary extensions that had been in the ten-year-old version because there were functional equivalents in newer versions of XSLT that weren’t available then; we didn’t need the proprietary extensions because we had better ways to do it.
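One way to see this kind of breakage coming is to inventory what a stylesheet depends on before changing frameworks. The sketch below, using only the Python standard library, reads the declared XSLT version and lists every namespace other than the XSLT namespace. The stylesheet, the `ext` prefix, its URI, and `ext:write-file` are all made up for illustration and do not correspond to any real processor's extensions.

```python
import io
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet; the "ext" namespace URI is invented for this example.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://example.com/vendor-extensions"
    extension-element-prefixes="ext">
  <xsl:template match="/"><ext:write-file/></xsl:template>
</xsl:stylesheet>"""

def stylesheet_dependencies(stylesheet_bytes):
    """Return (declared XSLT version, {prefix: uri} for non-XSLT namespaces)."""
    version = None
    seen_root = False
    other_namespaces = {}
    events = ET.iterparse(io.BytesIO(stylesheet_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            if uri != XSLT_NS:
                other_namespaces[prefix] = uri
        elif not seen_root:
            # The first "start" event is the root element, which carries the version.
            seen_root = True
            version = payload.get("version")
    return version, other_namespaces

version, namespaces = stylesheet_dependencies(STYLESHEET)
print(version, namespaces)
```

Not every extra namespace is a proprietary extension (some simply belong to the output vocabulary), so the list is a starting point for review rather than a verdict.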

Sigh. I know. This wasn’t XML breaking; this was normal technological change. It also brought me to an absolute full stop, and I was so frustrated trying to chase it down that I actually considered writing slides in PowerPoint. I despise PowerPoint.

I know as much about this stuff as any of our users (maybe more), and it was making me crazy trying to figure out how to deal with it. A lot of our users, when they hit that wall the first time, think we broke our promises. We said their XML stuff would work for the long term, and it doesn’t. If they had done this in PowerPoint and they had had to buy a new version of Office, there would have been a smooth upgrade path. There would have been a button they could push that said “Make your old stuff work in the new one,” and it would have. That doesn’t happen in XML, and it drives them crazy.

So why am I talking about this? I’m talking about this because we, as an XML community, often complain that not enough people are using XML, and we’re indignant about it. They should be. XML is wonderful. Why aren’t they? They must be stupid.

Well, they are probably not stupid, and they’re not using XML for reasons. The people we want are people who are running successful projects and successful businesses, and who do know the stuff they know. The stuff they know just isn’t necessarily the stuff we know.

If we want the use of XML to grow — if we want XML users to be successful — we need to address the gotcha that they’re feeling. They’re feeling that we’re letting them down because we are. We need to make it clear that platform-neutral means they can re-invest in application development at any time they want, but they shouldn’t think the investments that they made in creating their environments will last forever. They think we’re telling them that XML applications are self-maintaining. They are not. And we need to be clear about that.

It would also be nice if we made it a little easier for them to detect what it is that is broken when something does go wrong. I hate error messages that point to the last line of an input file. That is the tool saying, “Nah, nah, I’m not going to help,” or perhaps, “I can’t help.” But let’s see if we can do a little bit better.
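As one small example of doing better, a tool can remember where each element was opened and report the elements still open when parsing fails. This is a minimal sketch using Python's standard `xml.parsers.expat` module; the function name and the sample document are hypothetical, and it illustrates only one possible diagnostic, not how any particular product works.

```python
import xml.parsers.expat

def open_elements_at_failure(xml_text):
    """Return (error_line, [(name, line_opened), ...]), or None if well-formed."""
    stack = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # Record where each element was opened.
        stack.append((name, parser.CurrentLineNumber))

    def end(name):
        stack.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        # Elements still on the stack were open when the parser gave up.
        return err.lineno, list(stack)
    return None

# Hypothetical input: the <slide> opened on line 2 is never closed.
BROKEN = (
    "<deck>\n"
    "  <slide>\n"
    "    <title>One</title>\n"
    "  <slide>\n"
    "    <title>Two</title>\n"
    "  </slide>\n"
    "</deck>"
)

result = open_elements_at_failure(BROKEN)
print(result)
```

The innermost element still open is usually a better place to start looking than the line where the parser stopped.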

Why am I talking about this at Balisage? Why am I talking about this at the beginning of Balisage? Because I want to remind us all that what we’re talking about is important. It is important not just to us, the people who understand markup; it is or could be, perhaps should be, important to the world in general. We are capable of making it so.

I want to start a discussion of what we can do to nurture the understanding of the concepts that we at Balisage find clear, important, and in many cases, obvious; but that the rest of the world does not. They’re not ignoring us because they understand it and dismiss us; they’re ignoring us because they have no idea what it is we’re talking about. We fill our language with jargon, and we, all too often, skip over stuff that we think is obvious that they don’t know.

We, as a community, want XML to thrive and to grow. It is not, at least not in the ways we want. Part of that is because nothing solves all problems, nothing appeals to everyone, and truthfully, there are situations in which there are more appropriate approaches. It would be good if we admitted that. And there are users who will always go with the newest, shiny thing; and we aren’t the newest, shiny thing anymore. Declarative markup, and XML in particular, is something that just works.

But we’re also letting our community down by not communicating as well as we could. We can get better at helping people set reasonable expectations and navigate the process of changing our tools. Also, I encourage tool suppliers in particular, but also consultants, to try harder to make this comprehensible to user people, to subject matter experts.

The next time you are inspired to jump on a horse and charge into some hapless project and move them from some inappropriate technology (“Are they really storing all of their texts in Excel?”) to a much more appropriate technology (“This belongs in XML; this is long-term important.”), stop and think: Are you moving them from something they know how to use and know how to maintain to something completely foreign to them? Are you going to get them there and leave them in the lurch when you leave? Don’t just get them started; make a long-term plan for sustainability not just of their XML documents but of the XML ecosystem that you are helping them set up. If they want declarative markup for long-term stability of their content, set them up for long-term success. You are a false hero if you set them up with a short-term, shiny toy that they can’t maintain.

I want us to stop talking about XML as a document format and start talking and thinking about it as part of an environment. Fortunately, we have talks at Balisage that help us think about why this is important and how we might start making those changes.

Appendix A. Excerpts of Introduction to XML for XYZ (February, 2001)