How to cite this paper

Usdin, B. Tommie. “Explicit markup: a fool’s errand or the next big thing?” Presented at Balisage: The Markup Conference 2019, Washington, DC, July 30 - August 2, 2019. In Proceedings of Balisage: The Markup Conference 2019. Balisage Series on Markup Technologies, vol. 23 (2019). https://doi.org/10.4242/BalisageVol23.Usdin01.

Balisage: The Markup Conference 2019
July 30 - August 2, 2019

Balisage Paper: Explicit markup: a fool’s errand or the next big thing?

B. Tommie Usdin

Mulberry Technologies

ORCID ID: https://orcid.org/0000-0002-7620-1496

B. Tommie Usdin is President of Mulberry Technologies, Inc., a consultancy specializing in XML and SGML. Ms. Usdin has been working with SGML since 1985 and has been a supporter of XML since 1996. She chairs the Balisage conference. Ms. Usdin has developed DTDs, Schemas, and XML/SGML application frameworks for applications in government and industry. Projects include reference materials in medicine, science, engineering, and law; semiconductor documentation; historical and archival materials. Distribution formats have included print books, magazines, and journals, and both web- and media-based electronic publications. She is co-chair of the NISO Z39-96, JATS: Journal Article Tag Suite Working Group and a member of the NISO STS Standing Committee. You can read more about her at http://www.mulberrytech.com/people/usdin/index.html

Abstract

In 1998, at a Balisage predecessor conference, Brian Reid told us we couldn’t have the world we wanted. XML wouldn’t deliver. He used twenty-year-old slides, slides that he had originally presented at a conference in 1981, to make his point. I still want the world that Brian Reid told us we could not have; I still want Brian Reid to have been wrong. I still believe that separating meaning from format will enable our documents to be displayed in many forms and media, that a markup format that makes hierarchy explicit makes complex documents tractable, that when content creators author in systems that make declarative markup visible and use the author’s knowledge to add value to their content, we will be able to make documents sing! And I have the twenty-year-old slides to prove it.

Brian Reid
Teaching Reading
Back to Declarative Markup
Declarative Markup is A Good Thing
How Do We Get There?
Appendix A. Presentation Slides from Brian Reid’s Markup Technologies ’98 Keynote Address

In a recent conversation with a fellow markup enthusiast, I found myself saying: “Yes, I agree that making the information products they want from their content would be easier, and the products would be better, if they authored in a way that captured the structure and some of the meanings as they wrote. But it isn’t going to happen. Ever. Because the average person writing a document is not thinking about the process of writing the document, or the structure of the document, or how they might want to use or reuse the document; they are thinking about the subject matter of the document. They want to use a process to create the document that is as transparent as possible. This means that not only will they use the popular-at-the-moment authoring tool, they will use it as thoughtlessly as possible.”

I do not say this as a criticism of these people; it is simply an observation of the way they do, and probably should, write.

That conversation brought back a conference paper I heard many years ago, from Brian Reid.

Brian Reid

How many of you were at Markup Technologies ’98 in Chicago? At that conference we invited Brian Reid to keynote.

Aside: do you know who Brian Reid is? If you play in the markup space you probably should.

As I recall, it took some persuasion to get him to go to Chicago and talk to a markup conference. He said that he had spoken at the Conference on Research and Trends in Document Preparation Systems in Lausanne, Switzerland, in 1981 and said what he had to say to the SGML world. But … he paused … he had been wrong in 1981 and would be happy to tell us why in 1998. He found the plastic transparencies he had used as the visuals for his talk in 1981, scanned them, and used these scans as the basis for his 1998 talk. They are on his website to this day at: 20 years of abstract markup - Any progress? [Reid, 1998] (We’ll look at a few of them in a moment — and the whole set is the appendix to this paper.)

In 1981 he was singing the praises of what we now call declarative markup. He talked about the goals of Scribe:

Figure 1

Scribe project goals

Provide separation of form and content
Provide clerical support to writer
Provide reasonably good typography
Produce a running professional-quality system to get field experience
Produce good tutorial and reference documentation
Total device-independence for a wide range of printing devices

Figure 2

Language overview

Trivial syntax for
- naming (labeling) regions of text
- marking points in text
Writer identifies type of region, compiler formats it accordingly
Language design thus consists of
- selection of the right abstractions
- specifying semantics of their interaction
Language provides NO direct control over resulting format

Figure 3

Morals

High-level non-procedural systems can work
Representation & application of knowledge is important
Incremental change is wonderful
Lack of low-level control provides wonderful uniformity, but makes enemies
Device independence is possible
Writers need computer tools too

Figure 4

Text partitioned into Environments

Each identifies a text object
- body text
- quotation
- equation
- footnote
- etc.
Specific set of rules applied by compiler from database definition of that object
Environments nest like Algol blocks
Some environments are peculiar to specific classes of document type
- return address; signature
- title page; research funding credit

These slides would fit perfectly into the XML Basics for Text Processing class my group will be teaching in a few weeks. This is EXACTLY the song many of us sing for a living in 2019. This talk was in 1981. He was talking about a working tool that:

Used markup to identify parts of documents by what the information was
Separated content from format
Was platform independent
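
As a rough illustration of the three properties just listed, here is a minimal Python sketch. It is not Scribe's syntax or implementation; the `Env` class, the `RULES` tables, the environment names, and the two "devices" (`plain` and `html`) are all invented for this example. The point it tries to show is that the document names only what each region is, while a separate table of rules, one per device, decides what it looks like.

```python
from dataclasses import dataclass, field


@dataclass
class Env:
    """A named region of text; children are strings or nested Env objects."""
    name: str
    children: list = field(default_factory=list)


# Hypothetical "database" of formatting rules, one table per output device.
# Each rule is (text emitted before the region, text emitted after it).
# The writer never touches these tables; the document only names environments.
RULES = {
    "plain": {
        "document": ("", ""),
        "body": ("", "\n"),
        "quotation": ("    > ", "\n"),
        "footnote": (" [note: ", "]"),
    },
    "html": {
        "document": ("<article>", "</article>"),
        "body": ("<p>", "</p>"),
        "quotation": ("<blockquote>", "</blockquote>"),
        "footnote": ("<small>", "</small>"),
    },
}


def render(node, device):
    """Format a node for one device by looking up rules for its environment.

    Environments nest, so rendering is recursive. Character escaping is
    deliberately omitted to keep the sketch short.
    """
    if isinstance(node, str):
        return node
    before, after = RULES[device][node.name]
    inner = "".join(render(child, device) for child in node.children)
    return before + inner + after


# The content is written once, with no formatting instructions in it.
doc = Env("document", [
    Env("body", ["Reid described Scribe in 1981.",
                 Env("footnote", ["In Lausanne."])]),
    Env("quotation", ["Writers need computer tools too"]),
])

html = render(doc, "html")
plain = render(doc, "plain")
```

Adding a third output device under these assumptions means adding one more table to `RULES`; the document itself does not change.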

I believe that the 1981 conference was the one at which the SGML effort was announced. That is, work on SGML had just begun, and Brian Reid already had a working tool, and a philosophy underlying that tool, that met many of the goals of SGML and now XML. (Do you begin to see why I think we should all know about the work this man has done?)

Note, however, that these are Brian Reid’s 1981 slides.

He revisited the ideas in 1998:

Figure 5

Observations

Most people won’t use abstract markup even if you threaten them
It is hard to find agreement about structure and abstraction
Selection is easier than synthesis, but the universe is not finite

He continued with:

Figure 6

With procedural markup, skill and experimentation can make up for a lack of knowledge.
With descriptive markup, no skill will save you. You must have knowledge.
Any scheme based on knowledge and quality will remain at the fringe.

Declarative markup (SGML, XML, Scribe, or some other syntaxes) was a lost cause. People were not going to do it. People did not want to think about structure. People were not capable of thinking about structure.

He proved this by sharing some user interface research that some well-known organization (I don’t remember who and his slides don’t say) had done.

People were shown this image:

Then they were asked what would happen if a user clicked on the X and hit delete. He told us that most average users expected this:

Aaaargh! PROOF that people are NOT capable of understanding declarative markup. Proof that we are wasting our time talking about, and wishing for, success of generic-markup-based tools.

End of Story. Game Over. Go Home!

Really. End of Story. Game over. Go home! Time to give it up as a bad investment.

Teaching Reading

Perhaps the reason I remember his paper so well is that I don’t want to agree with him. I don’t think declarative markup is impossible, I haven’t given up, and as far as I’m concerned the game is NOT over.

It reminds me of a time, a very long time ago, when I was a student and volunteered with an adult literacy program. The organizers were mostly social workers, and most of the teaching was done by college students; the participants were mostly men in their 50s who were making their way in the world without the ability to read, and generally successfully hiding that fact from their families, friends, and employers. They were smart, hard-working, and generally affable people. It usually took some breakdown in their lives to get them to admit that they couldn’t read, to make the time to go to the literacy workshop, and probably most difficult, to accept help from college students. Some had been unable to complete the tests at the end of a court-ordered educational program, some couldn’t apply for drug rehabilitation or financial assistance, some were shamed by children or grandchildren who asked them to “read me a story.” They were highly unusual; most of their peers (adults in the United States who cannot read) lie, cheat, bluff, and hide to avoid admitting that they cannot read, and (even if detected) will find every possible excuse not to do anything as humiliating as going to a literacy program.

The program started with the alphabet, followed by what I think of as phonics-lite. That is, a rough guide for how to pronounce many letters and letter combinations, and how to look at a word and figure out what it probably sounds like and say it.

After the first day or two, we were down to about half the participants. Most of the others had either decided that they were too stupid to learn to read, or that it was too late for them, or that they had more important things to do with their Tuesday evenings, or that they knew what the word on the Stop sign was, and that was reading, so they could already read and didn’t need this program. We had one or two who insisted that it was all a trick and nobody could actually make sense of those little letters on the page, and it didn’t matter anyway. “If I cook good enough that you pay $2 for a bowl of my chili, why do I need to read?” one asked me.

They fought English phonics like it was a monster. It is not fair, we were told, that some letters were silent, sometimes, and that the same letter combination would make different sounds in different contexts. They wanted reading to be like a game: they wanted clear rules, and they wanted it to be fair. I sympathized. But it isn’t. English may be especially challenging in this respect, but other languages have their own challenges.

For many of our participants, heteronyms (words that are spelled the same, pronounced differently, and have different meanings) were the last straw. (He has a tear in his eye when told to tear a page out of the book.) They couldn’t, they wouldn’t, NOBODY could, do this. It was impossible. End of story. Game over. Go home!

And yet … there were pressures. The grandchildren who wanted stories, the judge who had suspended a sentence on condition of completing the program, the need for independence, the heavy burden of pretending and lying and bluffing to get by. Some of them stuck it out.

The participants were paired with reading partners (the college students) and a book. We started, word by word, to sound out the words, then sentences, in the book. We would figure out each word in a paragraph, then go back and read the whole paragraph. The first few pages took days. It was excruciating — but by the third or fourth session the participants were beginning to get comfortable with the process. They were encouraged!

What were we reading?

Not the first readers their grandchildren were using in grade school! We were reading Mickey Spillane detective novels. (For those of you who have missed these literary classics, Mickey Spillane is the name under which a LOT of detective novels and short stories were published, starting in the 1950s. They are short, trite, usually violent, sexy but not explicit, and sexist even for the time in which they were written. They use short words, short sentences, have a fairly small vocabulary, and are definitely not appropriate for children.) The Big Kill starts:

It was one of those nights when the sky came down and wrapped itself around the world. The rain clawed at the windows of the bar like an angry cat and tried to sneak in every time some drunk lurched in the door. The place reeked of stale beer and soggy men with enough cheap perfume thrown in to make you sick.

Two drunks with a nickel between them were arguing over what to play on the juke box until a tomato in a dress that was too tight a year ago pushed the key that started off something noisy and hot. …

— Spillane, 1951

In retrospect, I think selecting hard boiled detective fiction as the reading material for these farmers, mechanics, painters, and laborers was an act of genius.

Each session ended with a pep talk: “Look how far you’ve come, look how much you got through today, you are making real progress.” And these wonderful patient men would nod, smile, and mutter, “I still can’t read and don’t think I ever will.”

One evening, usually about 6 or 7 sessions into the guided reading part of the program, the team would get to a spot in the story where something exciting was just about to happen. The bad guy was going to shoot the good guy, the detective was about to expose the girl as a spy, or the girl was going to climb into bed with the detective. The bomb was about to go off if our hero didn’t disarm it quickly enough, or the car was heading for a cliff. The teacher would excuse himself or herself for a moment, and vanish for a very long time.

When we got back we asked, “Did he buy her a drink or shoot her?” And the no-longer-totally-illiterate participant knew the answer! The would-be victim’s mother had come to the door, the sheriff stepped out from a shadow, or … something. The participant had stopped thinking about reading and started thinking about the story, and suddenly was reading!

The big problem in teaching reading is that as long as you are thinking about reading you are not, you cannot be, reading. Try it. Try reading something while concentrating on the activity of reading. While you are thinking about reading, you are not reading; in order to read, you have to let go of the process and focus on the content you are reading.

Once the participants had made that leap once, it was relatively easy to get them to do it again and again. From there all it took was a few practice sessions reinforcing the lesson, reading a newspaper, a few government forms, and a children’s book or two. (Books with nonsense words were particularly challenging for new readers.) They had been exposed to the written word their whole lives and had probably picked up a lot of reading basics without knowing it. All they needed was help over one (huge) logical hurdle.

Back to Declarative Markup

So, it wasn’t End of Story. Game Over. Go Home! It was time to learn a new way of thinking, and to practice it enough to be able to do it without thinking about it. We KNOW people can do this; most (probably all) of the people in this room can read. You can read without thinking about it, and you take that ability for granted.

The people Brian Reid was talking about were the declarative markup equivalent of my non-readers. They were smart, they probably started as willing users of this new computer-aided writing tool, but even if they understood the premise behind it (and I suspect some of them did), they hadn’t internalized the concepts of generic markup.

I have spent a lot of time with XML users, or would-be XML users, who have a similar experience. We spend a lot of time with them, learning what the parts of their documents are, and selecting, customizing, or occasionally writing, a vocabulary appropriate to their documents.

It is not unusual for a group of subject matter experts and professional writers, when asked to identify the parts of one of their documents, to start talking about the format of the document. “What is this?” I ask. The answer is sometimes “Times 24 Bold” or “Head 1” or “Bell 24.” “No,” I say, “not just the beginning of the thing I circled, the whole thing.” “Head 1 followed by several paragraphs” is the usual answer. If I push it, I can often get “Head 1, paragraph, paragraph, Head 2, paragraph, paragraph.” And a room full of people who don’t understand why I am being so dense.

With just a little coaching, or in the process of doing document analysis, they learn to identify structures: sections, titles, lists, list items, and footnotes. They learn to name, define, and identify in documents the subject matter that is important to them and their activity: drug name, ferrous alloy, terrestrial location, ammunition caliber, street address, conference start date. I think of this as the equivalent of learning the alphabet.
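
To make the difference between the two answers concrete, here is a small Python sketch of the step from the flat, format-oriented view (“Head 1, paragraph, paragraph, Head 2, paragraph, paragraph”) to nested sections with titles. Everything in it is an assumption made for illustration: the style names, the sample text, the element names `doc`, `section`, `title`, and `para`, and the simplification that heading levels are never skipped. It is not drawn from any particular vocabulary or tool.

```python
import xml.etree.ElementTree as ET

# The flat view a word processor gives you: (style name, text) pairs.
flat = [
    ("Head 1", "Dosage"),
    ("paragraph", "Take with food."),
    ("paragraph", "Do not exceed the stated dose."),
    ("Head 2", "Children"),
    ("paragraph", "Consult a physician."),
    ("Head 1", "Storage"),
    ("paragraph", "Keep below 25 degrees C."),
]


def nest(items):
    """Turn a flat run of styled paragraphs into nested <section> elements."""
    root = ET.Element("doc")
    # stack[i] is the currently open container at depth i; stack[0] is the doc.
    stack = [root]
    for style, text in items:
        if style.startswith("Head "):
            level = int(style.split()[1])
            # A heading at this level closes every open section at the same
            # level or deeper, then opens a new section inside its parent.
            del stack[level:]
            section = ET.SubElement(stack[-1], "section")
            ET.SubElement(section, "title").text = text
            stack.append(section)
        else:
            # Ordinary paragraphs belong to the innermost open section.
            ET.SubElement(stack[-1], "para").text = text
    return root


tree = nest(flat)
xml_text = ET.tostring(tree, encoding="unicode")
```

In the result, “Children” is a section inside “Dosage” rather than merely a smaller heading that happens to follow it, which is the “whole thing” the question in the room was asking about.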

As with learning the alphabet, it is necessary to learn to see structures and subject matter content in your documents. And just as learning the alphabet is no place close to sufficient to enable reading, learning how to name structures in documents is no place close to sufficient to enable writing in a structure-based tool.

Note: this is a sufficient level of knowledge to enable tagging existing documents. But if authors with this level of understanding are expected to produce declaratively marked-up documents, we are expecting them to write without the markup and then go back and add it. We are adding a time-consuming process to the act of writing. One that they don’t see as integral to the process of writing, and that in fact is not integral to the writing as they are doing it.

Worse than that, we are asking them to follow a process that is rife with negative feedback. It is all about errors and warnings. In many structured document editing environments, the most positive feedback you get is silence. And sometimes it is difficult to tell the difference between “Victory; you did it,” “Still processing,” and “Ooops, application crashed, possibly because of your bewilderingly bad data.”

Once the leap is made to thinking about what you are writing as the structures it is, this becomes not just habit-forming but addictive. I have a colleague who can no longer write with a simple text editor without screaming at it. She wants, or perhaps needs, to identify sections, headings, lists, and such as they are created. She prefers to write in a model-driven XML editor, but can set up word processor styles to meet the need. She wants to identify a code block, not specify 10-point Courier indented 3 em spaces. She thinks of it as a code block, not as what it might look like. She thinks of list items as items in a list, or in a nested list, not as starting with a solid bullet or a hollow bullet; not about how far they are indented on the page. Amongst the people in this room, I don’t think that is unusual. Amongst the literate people on planet Earth, it is very unusual indeed. Even amongst the people who write using computer-based tools (don’t forget that many people still write using ink on paper), this is a very unusual point of view.

I believe that Brian Reid was talking about people who have learned the alphabet of structured documents but who have not learned to read them. People who are sounding out one word after another, thinking about reading instead of thinking about the content of the document, and struggling with the process. Those people, with that level of comfort with declarative markup, will never adopt it. They cannot.

But I don’t believe that this is Game Over! Just as those farm hands, cooks, and mechanics could make the leap from the alphabet to sounding out words to reading, so can the authors of today make the leap from presentation to identifying structures in existing texts to composing thoughts in structural terms.

I did it the way most of you probably did. Through exposure, and repetition, and working with systems I was fighting tooth and nail but had to use anyway. It was actually a fairly discouraging process, but it happened so long ago I can barely remember it. I can completely understand someone trying to write a document refusing to use a tool that forces them to stop thinking about their subject matter and think about something unfamiliar in the process of trying to capture their thoughts on some topic.

Declarative Markup is A Good Thing

I believe that we, as a society, would be better off if many, perhaps most, of our documents were encoded with declarative markup. It would make them more discoverable, more accessible, and probably better organized and understandable.

I believe that there is information about many documents that the author knows that would add significant value to the document if it were captured when the author creates the document. Some of this information can be added later, usually at significant cost. Some of it is simply unavailable once the document is separated from the author. (As an example, it is possible for third parties to write descriptions of graphics, but they cannot be sure that they describe the aspect of the graphic that is the most important point the author wanted to make in using that image.)

How Do We Get There?

The success of that adult literacy program I talked about was based on three things:

Motivation
Instruction on fundamentals
Absorbing first materials

Let’s talk about these one at a time.

Motivation. There are a few very special circumstances in which people have the motivation to learn to write in a declarative way. I have worked with helicopter pilots turned helicopter documentation specialists who learned to write in an SGML editor. There are professional editors who work comfortably in grammar-driven editors, and technical writers, and people who are working with very stable structured documents of many types.

But for most writers of most documents, there is no reason to care. Even if their documents will be published in several media, and even if they would be more discoverable if they were better structured, that is generally hidden from the author.

They will only make the investment in learning something new if we make the reasons clear. No, if we make the reward for overcoming the significant hurdle in the path more than worth the effort.

I suspect that the reward will have to vary from user community to user community. I worked with the support-analysis group of a very large organization. Decision-makers would ask this group to research topics of interest. Sometimes they wanted a one-page summary of options and advantages and disadvantages of each, sometimes they wanted in-depth studies that looked like research reports or even dissertations. Sometimes they specified that they needed the information within the next few months, but more often they wanted it as soon as possible. Using the word-processing-based systems, there was usually about a 3-day lag between completion of the analysts’ work and presentation of the document to the requestor. Once we brought in the markup-based application, the analysts had a choice: they could continue to use their familiar word processor, and the publication team would convert the content and format it in the new tools, which would take about 2 days. Less than the old system; this was a win! But if they used the new authoring environment and created marked-up documents, their content would be approved by legal and formatted for delivery within 24 hours. Sometimes faster than that. Lo and behold: most of them came to the classes on how to use the new editor, and most of them learned to use the new tool right away. By several months in there was only one hold-out, and when it became clear that the users preferred the other analysts because their documents were not only delivered more quickly but were also better organized, that one asked for private coaching in the @#(*&^ new system.

Instruction on fundamentals. Well, we have a fair amount of material for this. I don’t think most of it is appropriate to most users. I say this as someone who has written, and taught, classes on SGML and XML for years. We need better instructional material. I’ve seen some that was much worse than mine, but I haven’t seen any that really impressed me. It would be good if we figured out how to teach this stuff.

But I don’t think that is really the problem; the amount and quality of instruction on the alphabet and pronunciation in my literacy class was almost embarrassing. That didn’t matter.

Absorbing first materials. What we REALLY need is the functional equivalent of a Mickey Spillane novel for beginners with declarative markup.

Not reading material, but some instant gratification application that works quickly and gracefully if provided with well marked-up text. I don’t know what that application is. I charge you to start thinking about it. Actually, I hope to see several, dramatically different, applications that provide instant value for the effort of explicitly marking up the structure and content of documents. It is these applications that might, just might, help a significant number of document creators over that big barrier to graceful and comfortable use of declarative markup.

With luck, the presentations and conversation here at Balisage will help one or more of us create just that barrier-breaking tool.

Appendix A. Presentation Slides from Brian Reid’s Markup Technologies ’98 Keynote Address

References

[Reid, 1998] Reid, Brian. 20 years of abstract markup - Any progress? Keynote Address at Markup Technologies ’98, Chicago, Illinois, November 19 - 20, 1998. http://xml.coverpages.org/mt98-papers.html. Presentation slides available at http://www.reid.org/~brian/markup98.ppt.

[Spillane, 1951] Spillane, Mickey. The Big Kill. 1st ptg. edition. New York: New American Library (Signet Book #915), 1951. Available at https://www.amazon.com/Big-Kill-Mickey-Spillane/dp/0451093836.

