<?xml version="1.0" encoding="UTF-8"?><article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0-subset Balisage-1.2"><title>But wait, there’s more!</title><info><confgroup><conftitle>Balisage: The Markup Conference 2008</conftitle><confdates>August 12 - 15, 2008</confdates></confgroup><abstract><para>XML has been widely adopted and forms part of the infrastructure of most modern
        information technology. We have a satisfyingly large collection of XML vocabularies and XML
        tools. Is it time to declare victory and go home yet? Or is there more to do?</para></abstract><author><personname><firstname>C. M.</firstname><surname>Sperberg-McQueen</surname></personname><personblurb><para>C. M. Sperberg-McQueen is a member of the technical staff of the World Wide Web
          Consortium. He has served as co-editor of the XML 1.0 specification, the Guidelines of the
          Text Encoding Initiative, and the XML Schema Definition Language (XSDL) 1.1 specification.
          He holds a doctorate in comparative literature. </para></personblurb><affiliation><jobtitle>Member of the technical staff</jobtitle><orgname>World Wide Web Consortium / MIT</orgname></affiliation></author><legalnotice><para>Copyright © 2008 by the author. Used with permission.</para></legalnotice></info><para>Some members of the audience will already be familiar with the phrase that is my title
    today: <quote>But wait. There’s more</quote>. But not everyone will be, so I had better
    start with a few words about Yuri Rubinsky. Yuri was a significant figure in the history of
    descriptive markup. He ran a software company, SoftQuad, that sold a popular SGML editor called
      <emphasis role="ital">Author/Editor</emphasis> and later on an HTML editor called <emphasis role="ital">HoTMetaL</emphasis>, based on <emphasis role="ital">Author/Editor</emphasis>, and
    later still (after Yuri’s time) SoftQuad created <emphasis role="ital">XMetaL</emphasis>, which is still a popular application although it has changed hands a number
    of times since SoftQuad was acquired by Corel some years ago.<footnote><para>I am indebted to Tonya Gaylord of Mulberry Technologies for transcribing the tape of
        this talk. I have mostly left the wording alone, but I’ve supplied missing words and
        recast a few sentences whose structure got away from me when the talk was being given, or
        whose structure would not be as clear in print as it was in oral presentation.</para></footnote></para><para>Now, Yuri was a sweet — and sometimes sour — but a sweet, friendly,
    persuasive man: well-liked in the community and well deserving of being liked. He made a
    tremendous impact on a lot of us, including many of us in this room. And he died young. So if
    after this talk, you ask some of us old-timers about him, you may catch us lurching
    unpredictably back and forth between laughter and husky-voiced reminiscences.</para><para>As an engaging and persuasive sort of guy, Yuri was, of course, a pretty good salesman. And
    he was also an enthusiastic and committed cheerleader for the technology of descriptive markup
    and the freedom and responsibility that is entailed by the idea of giving ownership of the data
    to the creators of the data. Sometimes these two roles of salesman and cheerleader came into a
    state of, well, if not conflict, then at least a state of tension.</para><para>There’s a story that Yuri was once on a sales call with a colleague talking to some
    potential customers about the benefits of descriptive markup and the virtues of <emphasis role="ital">Author/Editor</emphasis>. He was eloquent, and SGML and <emphasis role="ital">Author/Editor</emphasis> were in fact a pretty good fit for this particular organization, so
    the potential customers were very soon persuaded. They began giving the usual signs of being
    ready to close the deal, but Yuri kept talking, piling advantage upon advantage to the case for
    descriptive markup and SGML, and eventually they were practically tugging at his arms, reaching
    into their pockets for their checkbooks, and his colleague was making <quote>let’s wrap
      it up</quote> noises, and Yuri turned around, fixed them with his eye, and said <quote>But
      wait. There’s more</quote>.</para><para>Now, the gist of that moment, the way the desire for clarity and the eagerness to show
    people all the ramifications and advantages of descriptive markup overpower the short-term
    desire to close the sale, the way that Yuri is so filled with enthusiasm that he can barely stop
    his exposition in order to allow people to hand him their money — these seem somehow so
    characteristic of Yuri and of his infectious enthusiasm that the story is now inextricably
    linked in our memory with Yuri.</para><para>It’s ten years now — ten years and a few months — since the
    Extensible Markup Language, version 1.0, became a recommendation of the World Wide Web
    Consortium. There’s something a little artificial about anniversaries of this kind, and
    I don’t like to observe them too scrupulously or make too much of them. But it is useful
    from time to time to stop and think about things in a broader frame of reference, to reflect on
    where we were some time ago, where we wanted to go from there, where we went in fact, and where
    we would like to go from here. And we can do a lot worse than to use anniversaries like this one
    as occasions for such periodic self-examination.</para><para>Yuri Rubinsky died the winter before the W3C formed the working group that eventually
    produced the XML specification. He had been one of the first and most persuasive members of the
    SGML community to argue that SGMLers should embrace the web as showing how useful and powerful
    an SGML application could be, instead of looking down on it for the shortcomings of its puny
    HTML tag set and the laughable inadequacies of most HTML processors. He wrote a book once, with
    Murray Maloney, called <emphasis role="ital">SGML on the Web</emphasis>, which is still worth
    reading for one of the most lucid descriptions you will find anywhere of the nature of
    descriptive markup [<xref linkend="yr_sotw"/>]. So it’s no surprise, I guess, that when
    I think about XML and its place in a broader context, I find myself thinking about Yuri Rubinsky
    and about the story I just told you and about the phrase <quote>But wait, there’s
    more</quote>.</para><para>Over time I have come to believe that the phrase <quote>But wait, there’s
    more</quote> — or the variant of the phrase which seems to fit Yuri’s willingness
    to accept the web as something we can learn from, <quote>But wait, there’s <emphasis role="ital">less</emphasis></quote>, — I’ve come to believe that these can
    illuminate our situation in a variety of ways. Now, before I talk about some of them, I have to
    explain that I sometimes think there are two kinds of projects in the world. Other times, I
    think only that there often seem to be two kinds of projects in the world. I don’t
    actually know how many there are or whether these are really distinct. There are what you might
    call <quote>barn raisings</quote> — you have something you want to do, you gather the
    materials and people and resources that are necessary to do it, you do the thing, you raise the
    barn, and then everybody goes home again. (I may need to explain to people who aren’t
    from here that all through the US and Canada as settlement progressed westward through the 18th
    and 19th centuries and new land passed under cultivation, the coming together of communities to
    build barns for new farms — to build barns for each other — was an important
    social binding ritual that is still practiced in more conservative social communities such as
    those of the Mennonites and the Amish. So, a barn raising is an important social event as well
    as a finite project with an end.)</para><para>And then there are other projects that you might call <quote>community farming</quote>.
    Again, you have something you want to do, you gather the people and resources you need, and you
    work together to accomplish your goal, but there is no final whistle, there is no point at which
    the roof beam has been raised, the roof is on, the barn is finished, and you can go home,
    because the task you’re talking about is an ongoing one. Once you have plowed the field,
    you have to start planting it. And once you have planted the field, you have to start weeding.
    And so eventually you have to harvest. And once you finish harvesting, you have to repair the
    plow.</para><para>Now, it’s possible to be mistaken about the kind of work something is. The most
    obvious example that comes to my mind immediately is that when standards development
    organizations are young — when they are first created — it is easy to see that
    many people involved think of the formation of, say, a working group as a barn raising. Ah, we
    have a problem, we form a working group, they write the spec, and then they’re done, and
    everybody goes home. When standards development organizations grow older and when individuals
    gain more experience in standards work, they tend less and less, at least in my experience, to
    think of working groups as barn raising projects and more as farming projects because once the
    spec is there, if it’s going to stay around and be used, it will need maintenance, it
    will need errata, it will need amendments, it will need new versions, it will need the
    development of a better test suite, it will need interpretation of difficult passages, and so on
    and so forth.</para><para>Now, there’s always a danger that a working group will just stay around out of
    inertia because its members are so lacking in imagination that they can’t get their
    heads around the idea that their work is done and they should go home, so every now and then an
    outside intervention is necessary to reorganize things. But there’s often a very good
    reason that working groups have a longer lifetime than some people might at first have expected.</para><para>If we thought that the quiet revolution that Eduardo Gutentag was talking about the other
    day was a barn raising, we were wrong [<xref linkend="eg"/>]. Revolutions are almost never barn
    raisings because, remember, if you succeed in your revolution, you suddenly find yourself
    responsible for day-to-day governance and then your work is never done. And we have in many ways
    succeeded in a quiet revolution. But that means there is a never-ending stream of new
    communities needing markup vocabularies. We need better algorithms for validation or for parsing
    or for processing or for styling or for any of the things that we do with marked up data. We
    need to standardize the XML form of office documents, as painful as that experience may be. We
    need to experiment with alternative ways of handling links and validation and styling and so
    forth, and of handling discontinuous structures that overlap. There is always more to be done in our quest to
    make descriptive markup ubiquitous and to help it fulfill the revolutionary potential that we
    see in it.</para><para>Every paper at the conference illustrates one aspect or another of this work, and while I
    would like to discuss them all one by one individually, that would probably take another three
    or four days, and that might make some of you worry about catching your flights, so
    I’ll try to suppress my urge to comment in detail on each paper individually.</para><para>Another sense of the phrase <quote>But wait, there’s more</quote>, is as a design
    reminder. When you’re designing version one, remember there will be more versions. This
    is not, unless of course you’ve managed to design a complete failure — this will
    not be the last version you want to do of this spec, so remember to provide some support for
    versioning your language. In this connection, it is worth suggesting to you that <quote>But
      wait, there’s less</quote>, is a good motto to adopt. Something is better than
    nothing. Correction: Almost anything is better than nothing when it comes to supporting
    versioning.</para><para>It’s very tempting when you’re designing version 1.0 of something that is
    kinda complicated and kinda hard, to say, <quote>Oh, man, we can’t think about everything
      at once. We have to focus. We have to identify non-goals. We have to modularize things. We
      hardly know what is going to be in 1.0, let alone what we might want to put in 2.0. We
      can’t design a versioning system that will allow the addition of the features we will
      need in 2.0 because we don’t know what they are. That is way too complicated; we will
      not get it right, so let’s focus on just the immediate task</quote>. If you allow a
    working group to fall into that line of thinking, you have every likelihood that the working
    group will do nothing at all about versioning. Case in point: the XML Schema 1.0 working group.
    We knew it was important; we spent a lot of time talking about it. And our discussion of it made
    clear that we didn’t have the first idea how to do a really good versioning mechanism,
    how to support all the kinds of changes that we would need to make in future versions of XSD,
    without building a lot of useless mechanisms to support changes that we weren’t going to
    turn out to make.</para><para>The only perfect versioning mechanism — no cost without benefit — is a
    versioning mechanism that predicts exactly what changes are going to be necessary. No versioning
    mechanism designed without clairvoyance can be perfect. Important principle: It doesn’t
    have to be perfect to be useful. Those of you who were here Monday will have heard David Orchard
    mentioning HTML as a good example of a language that has survived versioning very well; he is
    not the only one [<xref linkend="do"/>]. It is a very common example, and in fact,
    they’re quite right: HTML did a great job of supporting versioning. Enthusiasts for HTML
    often will tell you, <quote>That is because they got it right. They did a perfect versioning
      mechanism. They said everything you need to know. The only rule you need is: Ignore what you
      don’t understand</quote>. Well, that is, I think, a slight over-simplification. HTML
    didn’t get it perfectly right. The versioning rule in HTML with regard to support for
    later versions of the HTML spec is quite simple: When you see a tag you don’t
    understand, ignore the tag. That is a good fallback. But as Sandro Hawke pointed out on Monday
      [<xref linkend="sh"/>], the best fallback for <quote>blink</quote> would be some other form of
    highlighting like <quote>red color</quote> or <quote>bold-italic</quote> or
    <quote>underscoring</quote> or <quote>very large</quote>. If you just ignore the tags and print
    the content, the one thing you have failed to do is indicate that that phrase is any different
    from its context, and that is almost certainly not the best possible fallback, although it is
    better than nothing.</para><para>It’s also the case that quite often when you’re extending a vocabulary in
    the ways that HTML has been extended, there are two things you might want to ignore. Sometimes
    you want to ignore the tags. That is the right thing to do for the <code>blink</code> tag and
    the <code>font</code> tag and all sorts of phrase-level tags. Other times what you really want
    to do is ignore the element, which is the right thing to do for, say, the <code>script</code>
    element. If you have read any of the textbooks on JavaScript that were written within the first
    ten years of the introduction of JavaScript, you’ll remember that there is a three-page
    section that says: <quote>Don’t ask why, but at the very beginning of every script
      element you have to repeat these magic formulae. Don’t try to understand it; just do
      it. Alright, if you insist on knowing, this is a common delimiter intended for this processor
      that prevents that from going on, that is a delimiter for this other processor that allows it
      to ignore the first delimiter. Then, there is a special case in that processor that allows it
      to get by despite the fact that it doesn’t understand what is going on. Aren’t
      you glad you insisted on knowing why? Again, don’t try to understand it; just copy
      this into the beginning of every one of your scripts</quote>.<footnote><para>I seem to have exaggerated slightly. In [<xref linkend="fjs"/>], the discussion takes
        two pages (pp. 353f), and in [<xref linkend="nsjs"/>], the core of explanation takes only
        one page (p. 13), not three.</para></footnote> Why? Because HTML didn’t get it perfectly right. There is no way in HTML to
    say that this is an element which, if you don’t understand it, you should ignore as a whole element
    instead of ignoring just its tags.</para><para>Okay, <quote>Ignore what you don’t understand</quote> is not a perfect rule. HTML
    didn’t get it perfectly right. HTML got it maybe a little less than half right. And HTML
    is nevertheless a huge success story when it comes to a language allowing itself to be
    versioned. Why? Because it did something. That glass that is only one-third full is one-third
    full and not two-thirds empty. Almost anything is better than nothing when it comes to
    supporting versioning. There is more to versioning than you understand. As Peter Brown told us,
    when you start out working on versioning, above all, don’t assume you understand what the
    word means [<xref linkend="pb"/>]. Quite true. There is more to versioning than your versioning
    mechanism is going to succeed in supporting. But that is okay. Wait, wait —
    there’s less.</para><para>Exercise humility! This is another way of spinning the phrase <quote>But wait,
      there’s more</quote>, which gives it the sense of <quote>But wait, there’s
      more. Yes, I know, but that is okay. I’m not going to try to be everything to
    everyone</quote>. That has more general applicability. There is more to the problem —
    whatever problem it is you’re working on — than your personal or corporate views
    and interests. There are different people on the working group with different points of view,
    different values, and, yes, hard though it is to remember, different virtues and different
    contributions to make to the collective work, irritating though they will be from time to time.
    We as individuals are not the be-all and end-all of our collective work. There is more to the
    working group and to the work than that.</para><para>Humility can be very hard to cultivate when so many of our working group colleagues not only
    remind us indirectly of our superiority but demonstrate it daily by their obstinacy in opposing
    what is clearly the right technical solution, that is, the one that we favor. But a little
    humility, even if it is not evenly distributed in the working group, can go a long way in
    helping working groups and other organizations to avoid the kinds of disasters that we heard
    about the other day and to minimize their effect when they happen anyway.</para><para>But perhaps the most important application of the phrase <quote>But wait, there’s
      more</quote>, is to the future of descriptive markup itself. Was XML supposed to be a barn
    raising or a farming project? And, independent of what it was supposed to be, what did it turn
    out to be? And how many barns were we intending to raise before we went out into the fields and
    started plowing?</para><para>Now, as I recall it, in 1996 we had a simple, clear plan. We wanted a web-friendly version
    — subset, cutout, profile, call it what you like — of SGML. We wanted a
    web-friendly version of DSSSL. We wanted a web-friendly version of HyTime. That is, we wanted
    web-friendly versions of the three major work items of ISO/IEC JTC1/SC18/WG8, which was the
    group that defined languages for document processing. Oddly enough, I don’t remember
    anyone ever saying we have to have a web-friendly version of the standard page description
    language. That one never seems to have caught on. I don’t understand why not.</para><para>Alright, well, if that was our goal, it’s interesting to note that it’s
    done. We have web-friendly versions of all of those things. SGML begat XML. DSSSL begat XSL and
    XSL-FO. HyTime begat (at some remove, but still there is a direct genealogical relation) XLink
    and XPointer. Now, XLink and XPointer do not have the uptake that we had hoped for. But they are
    there for those who need them.</para><para><!--* Sidebar: [Interjection about DOM.]  *--> DOM was not part of the original program as I
    understood it. More stuff started coming down the pike even before we were finished with that
    original program.</para><para>But if what we were involved in was a barn raising, and we’ve raised all three of the
    barns that we had intended, what are we doing here? I suspect that our presence in this room
    today is an indication that there was more to do. And there <emphasis role="ital">is</emphasis>
    more to do.</para><para>I’ve spent a lot of time in the last few weeks talking to people about what should
    be next, what can be next. Ten years after SGML became an international standard, one way to
    tell the story is that a small group of people ended up trying to solve what seemed to them the
    single most pressing problem of SGML, which was the complexity of the specification, which
    prevented easy software development, which prevented widespread common tools, which prevented
    widespread adoption of markup languages.</para><para>So, if we imagine a similar situation now, what is the small group of people gathering in
    some city far away — because they won’t be us — what are they working
    on, or what should they be working on? Or what should we be working on if we are going to take
    the future into our own hands? Interesting question; there are a lot of different answers. I
    don’t have time to go through all of them. I don’t have time to go very well
    through many of them at all. So I’ll focus on the biggest one. The biggest problem
    — I hate to say this because in my retrospective mood of the last few weeks I’ve
    also been thinking about the first time I ever gave a closing talk at a markup industry
    conference. It was 1992 at SGML ’92, and I gave a talk in which I was asked to predict
    the future, and I identified the biggest problem that faced us and what we should do to make
    progress on it, and the biggest problem then is still the biggest problem today. And that is the
    problem of semantics [<xref linkend="s92"/>]. And it’s humbling to read a talk that you
    wrote 15-16 years ago and see the five or six things that you proposed as the right things to
    work on next and note that absolutely nothing has happened on any of those fronts, with one
    exception, which is an interesting exception. I suggested that one way to get a better grip on
    the semantics of markup would be to make it possible to identify that all X’s are
    Y’s, to identify class relations among markup constructs. Now, on that there has been
    progress. One of the major features of XML Schema 1.0 is precisely a system of type inheritance
    that is intended to and does in fact allow you to say precisely that kind of thing.</para><para>Lots of people talk about XSD 1.0 and 1.1 type inheritance. I notice that no one talks about
    it as semantics. This would be troubling to me except that we have an excellent analogy that
    helps us explain that situation. Some of you will have noticed that whenever the workers in the
    field of artificial intelligence finally solve a problem, that problem ceases to be part of the
    field of artificial intelligence. It’s now just engineering. Artificial intelligence is
    effectively the name of all the interesting things that we would have to do to replicate human
    intelligence in artificial form that we don’t currently know how to do. If we know how
    to do them, they’re no longer AI. They’re just engineering in the same way that
    the difference between a normal computer and a supercomputer is not ever measured in cycles per
    second or floating point operations per second or logical inferences per second; it is measured
    in dollars unadjusted for inflation, and as the cost of computing power has fallen, the threshold
    of being a supercomputer in terms of computational power has risen. In the same way, what we
    call semantics is all those things that we don’t really know how to do very well. Allen
    Renear pointed out to me this morning that some linguists will distinguish — well,
    linguists have always distinguished, you know, phonology and morphology, syntax and then
    semantics. Some linguists now distinguish semantics from pragmatics. Why is that? Is that
    because they know how to do semantics? No, it’s because pragmatics has crystallized out
    as a field that people feel they have some kind of grip on (or at least they think they have a
    grip on how to study it), whereas semantics remains the black hole into which we throw all the
    stuff we don’t know how to do, but we’d like to do someday if we could only
    figure out how.</para><para>Semantics is a single noun, but it clearly doesn’t denote a single thing.
    It’s a cover term for our ignorance. So our goal really, if we want to have any feelings
    of success, shouldn’t be to solve the problem of semantics. It should be to isolate
    substructures within that complex or cultivate regions within that area and make them
    understood, knowing full well all the while that as we do so, they will cease to be regarded as
    covered by the term <quote>semantics</quote>. They may not be semantics, but they will still be
    useful things to be able to do.</para><para>One form in which the problem of semantics presents itself is the problem of design, the
    problem of modeling. What is the right way, a friend of mine asked me recently, to design a
    language? How do I teach the guys who work for me, the ones who actually design the markup
    languages, how to do it right? How do I tell them how to tell the difference between a good
    markup design and a bad one? When I ask the question that way, I am very pessimistic because I
    think the short story is that good design involves hard thinking. And that means it’s
    just hard.</para><para>Also, things seem to be getting worse. In 1992, at least according to the record of my talk,
    we thought we were nearing consensus on what counted as good design for markup languages. But
    the community has grown — the number of people involved in design has grown —
    and there is a lot of suboptimal XML out there. That was one of the main themes of the W3Quebec
    nocturne the other night here. Just how bad is the XML that you have seen in the wild? And the
    answer is: on the whole, pretty bad, some of it. There were some really outstanding examples of
    ugly vocabularies out there.</para><para>I became aware, while thinking about this, that although we may not be able to solve the
    core problem, we may be able to make progress on it if we give ourselves better tools. Thinking
    is hard, and we don’t have the capability to automate it, at least not now, not until
    our friends in the AI department have finished eliminating the field of AI and actually created
    artificial intelligence. Until they do that, we may not be able to directly support hard
    thinking. But any design involves both hard thinking and a lot of bookkeeping, and if we can
    make better tools for the bookkeeping and for visualizing the results of designs, we may be able
    to make it easier for bears of middling brain to do good design.</para><para>In that context, it seems to be a shame that although a number of people have mentioned the
    importance of prose documentation over the course of this conference, text and documentation
    tools and styles and procedures don’t seem to get much play.
    Another prospect that is frequently mentioned in this connection is compact syntaxes.
    I’m of two minds about compact syntaxes, partly because when I committed to SGML and
    XML, I committed hard, and syntax without angle brackets makes me nervous, you know. The road to
    hell is paved with compact syntax. But a lot of people like them a lot, and certainly the
    one-to-one mappability between the XML representation of RELAX NG schemas and the compact syntax
    does seem to have prevented the worst from happening there. And compact syntaxes do have the
    advantage that they allow you to get more information within the visual field of the person
    doing the hard thinking than is otherwise feasible with a verbose notation, and if there is one
    thing we have learned from reading Edward Tufte, it is that getting more information into the
    visual field in a tractable form is a good thing to do.</para><para>I think that our difficulties with semantics are related to the interoperability problems
    that were identified by Jerome McDonough [<xref linkend="jmcd"/>]. I think there may be two sets
    of forces at work in the kinds of problems he was talking about. First, there is a sort of
    social pendulum. As he noted, the rhetoric used to sell XML at the outset was all about freedom
    and independence and autonomy. And there was the promise of interoperability, which is the seed
    of the contradiction, but a lot of emphasis on freedom and autonomy. That is, I think, not an
    uncommon phenomenon. If you are asking me to adopt a new technology with which I’m not
    currently familiar, I have two concerns. I’d like to make sure I’m going to get
    some advantage from it (and stopping you harping at me may be enough of an advantage), but I also
    want to make sure that my costs are limited and that it does not impede the freedom I currently
    have to make up my own mind about certain things. So, I at least tend to be very wary of the
    kinds of ontological commitments a new technology may impose upon me.</para><para>In the Text Encoding Initiative, we had this problem in spades. We were quite up front about
    the fact that markup of documents is a hermeneutic activity. But hermeneutics is part of the
    core activity of everyone in our target usage audience. The last thing a professor of English
    literature wants to hear is: <quote>You should adopt this new technology, and it will force you
      into a particular style of interpretation</quote>. No, no — that way lies complete
    non-adoption. Now, it’s true that if you are very up front about that, you will not have
    the kind of interoperability problems that Jerry McDonough talked about [<xref linkend="jmcd"/>], but not because you have interoperability. The problem simply won’t arise because
    no one will have adopted the technology in the first place.</para><para>So, I think the rhetoric of freedom is likely to be the emphasis in a lot of new
    technologies, and concern about interoperability will come a little later. Now, I think
    it’s rhetorically important, but I also don’t think it’s exactly
    deceptive to say at the outset that XML does help with interoperability. Because <emphasis role="ital">having markup that you understand and that you control is the first prerequisite
      to solving your interoperability problems</emphasis>. As long as my data is controlled by
    anonymous corporation X and your data is controlled by anonymous corporation Y, we have no hope
    of addressing the interoperability problem. Independence from the anonymous entities X and Y
    — or not anonymous, in some cases — is the first prerequisite of
    interoperability.</para><para>The adoption of XML, with all the autonomy that XML entails, did not create the
    interoperability problems. It exposed them; it allowed them to come to the surface. And what you
    will see in the TEI community is that the experience of people in the TEI noticing that they
    don’t have the level of interoperability that they wish for has led to a number of
    movements within the TEI community to make more concrete and fuller agreements. So there is a
    swing from the emphasis on the freedom to the emphasis on interoperability, and that may well
    produce a backlash later on.</para><para>The second set of forces at work here is the fact of incremental consensus. There is only so
    much agreement in the room at any given point, or as Donald Rumsfeld might have said, <quote>You
      ship the spec with the level of consensus you have, not with the level of consensus you might
      wish you had had</quote>. SQL-89 had one of the world’s most eccentric type systems.
    Why? Because they could all agree on integer, and they could all agree on a couple of other
    types, but there were a whole lot of types on which they could not agree so they just left them
    out, with the full expectation that people would extend SQL-89 in different ways. As they did.
    And that led to the well-known complications of SQL-89 interoperability. But the alternative
    would seem to be delaying the spec even further, until you have completely missed your market
    window.</para><para>Sometimes the reason you don’t have consensus on the details like, well, what should
    the date look like, is that you actually have disagreements. Sometimes it’s because some
    people in the room are not ready to reach an agreement on it because they don’t foresee
    that it will be a problem. In any working group, you’re going to have some people who
    know what is going on and predict pretty accurately what is going to happen, and they can say,
      <quote>Gee, you know, if we don’t specify which date format to use, that date field is
      going to be interoperable at this level, but not at the higher levels that we would
    like</quote>. And my experience is that quite often when those people try to explain the problem
    to the other people in the room, they get deer-in-headlights eyes, and at some point, you have
    to say, <quote>Well, they’ll learn eventually, and experience does help people
    learn</quote>. When my university examined client-server software, we adopted Gopher. We
    didn’t even look at HTTP and the World Wide Web. If we had, we would still have adopted Gopher
    because we could understand it; it was simple, and the additional complexity of HTML would have
    seemed utterly unmotivated. Six months’ experience with running and using distributed information
    systems taught us plenty, and after six months, we would have understood why HTML had the
    additional complexity it had, and why HTTP was more complicated than the Gopher Protocol, and at
    that point, we might well have adopted — and in fact did, though it was a couple of
    years later — the World Wide Web instead of Gopher. But without that experience, we were
    not in a position to understand, and that is going to be true of many people in the working
    group room as well.</para><para>Now, I feel terrible saying that since 1992 we have made no progress in semantics because
    I’m acutely aware that there are a lot of people in the room, and a lot of communities
    with people who are not in the room, who have spent a lot of time and effort over the last
    decade working on what they think of as solutions to semantics. Topic Maps on the one hand, RDF
    on the other — how can I stand here and say that the RDF and Topic Map communities have
    made no progress? Well, I won’t say that. But then, what is wrong with them as a
    solution to semantics problems? The problem I have with them is that very little of the work
    that I understand in Topic Maps or RDF connects with the semantics problems that I have in mind
    when I say <quote>We the users of descriptive markup have a problem with semantics</quote>. What
    I said in 1992 was: <blockquote><para>But if data portability is good, application portability is better. If we are to make
        good on the promises we have made on behalf of SGML to our superiors, our users, and our
        colleagues, about how helpful SGML can be to them, we need application portability. And for
        application portability, alas, so far SGML and the world around it provide very little help.</para><para>Application portability is achieved if you can move an application from one platform to
        another and have it process the data in “the same way”. A crucial first step
        in this process is to define what that way is, so that the claim that Platform X and
        Platform Y do the same thing can be discussed and tested. But SGML provides no mechanism for
        defining processing semantics, so we have no vocabulary for doing so.</para></blockquote></para><para>Now, of course, there are lots of good reasons that SGML doesn’t provide a
    vocabulary of processing primitives, and that omission is exactly right. But it simply means that in
    order to solve the problem of application portability, we need to choose and develop —
    choose, establish, develop, provide — some way of getting at those semantics. Do RDF and
    Topic Maps help here? How? Not, alas, in my experience. They offer a lot of functionality. They
    offer many semantically-rich bits. They work very well with data that is extremely regular, like
    triples or associations. They work a little less well with text. But that is where I came in;
    that is what I’m looking for help with.</para><para>Text is not a corner case. If we focus only on the tractable, regular cases because those
    are the ones that are tractable in our attempts to solve semantics, we’re a little bit
    like the drunk who is looking for his keys under the street light instead of where he lost them
    because the light is better there. Intelligence, as Tim Bray used to say, is a textual
    application. There is a reason that the budget of the United States is printed as a book with
    notes, footnotes, and preface and commentary; it’s because the simple array of numbers
    is not the whole story. You get the kind of regularity you get in relational tables, or for that
    matter in triples and associations, by banishing nuance and details to the footnotes. But the
    footnotes will need to be text.</para><para>It’s interesting, of course — we are acutely aware as we write prose
    definitions of the meaning of our markup that even when we write it really, really well, there
    will be readers who are ingenious enough to find ambiguities and uncertainties and vagueness and
    even, God forbid, contradictions in what we have written. Sometimes they’re illusions,
    but quite often, they’re there; we just didn’t see them. Murata Makoto found
    more ambiguities in the XML 1.0 spec than I would have ever imagined, by the simple procedure of
    translating it into Japanese and saying, <quote>Well, how do I translate this sentence? There
      are two different ways. Which does it mean</quote>? Ask me to tell you about the meaning of
      <quote>may not</quote> sometime.</para><para>So, it is natural that people who have been working with prose have always wanted to move to
    some sort of formalism. Allen Renear once <!--* published a paper *--> gave a talk at Extreme
    Markup Languages in which he talked about the ability to have statements in modal logic
    completely replace the prose description because they would be unambiguous, they would be clear,
    they would be precise, they would be compact [<xref linkend="ar"/>]. Before we go there,
    however, we should note that there are no notations that are so precisely defined and so
    widespread and so widely used as, say, programming languages.</para><para>Recently, the <emphasis role="ital">Communications of the ACM</emphasis> published an
    interview with Donald Knuth, who is widely and justifiably regarded as perhaps the greatest
    programmer in the world [<xref linkend="dek"/>]. Remember what Knuth did? One of Knuth’s
    major contributions to programming technology is the introduction of prose into the
    documentation of programs — the invention of literate programming, in which, as he says,
    you say everything twice.<footnote><para>Knuth says, <quote>As I’m writing <emphasis role="ital">The Art of Computer
            Programming</emphasis>, I realized the key to good exposition is to say everything
          twice: informally and formally. The reader gets to lodge it in his brain in two different
          ways, and they reinforce each other. In writing a computer program, it’s also
          natural to say everything in the program twice. You say it in English, what the goals of
          this part of the program are, but then you say it in your computer language. You alternate
          between the informal and the formal. Literate programming enforces this
      idea</quote>.</para></footnote> It’s as if programmers have to be Yossarian in <emphasis role="ital">Catch-22</emphasis>; you will remember that Yossarian was in the hospital because he saw
    everything twice [<xref linkend="jh"/>]. Seeing everything twice is a nice metaphor for the play
    of memory and expectation that is at the heart of the novel <emphasis role="ital">Catch-22</emphasis>, but I won’t go there.</para><para>You have to say everything twice. This is also the mechanism adopted in formal specification
    languages like Z. Every Z textbook I have ever read has some passage where they say <quote>You
      will at some point say, ‘Ah, I have got it’, and you will go home, and you
      will write ten pages of Z formalism, and you will go back to your colleagues, and you will
      say, ‘I have solved it. Here is the design’. You do not have the design. You
      do not have a Z document</quote>. If it is only the formalism, it’s not a
    specification. You must say everything twice. The two ways of saying it reinforce each other,
    clarify each other, disambiguate each other.<footnote><para>One good example is the discussion on page 8 of [<xref linkend="zgb"/>] of the mutual
        reinforcement of formal and informal expression in a specification, and the discussion on
        page 9 of the <quote>mathematical syndrome</quote>.</para></footnote></para><para>So, I sometimes think that what we really need for practical purposes is not just the pure
    formalism that I associate with RDF and, to a slightly lesser extent, with Topic Maps —
    modulo the determined resistance to going the final step to full formalism that Murray Altheim
    was exhibiting the other day [<xref linkend="ma"/>]. And what we really need is not prose by
    itself. What we really need is both. We need a way to embed the formal syntax into prose, as a
    sort of paraphrase. The work Sam Hunting reported on the other day illustrates this very
    precisely [<xref linkend="shunt"/>]. So does the RDFa specification published recently by the
    World Wide Web Consortium. Both of them turn the document author into a kind of Yossarian: they
    enable us to say everything twice, once in the formalism so it’s tractable for machines,
    and once in prose, where the nuances and the hesitations and the limitations can come across to
    the reader.</para><para>I think we also need to have explicit tools for talking about translation from our markup
    into other formalisms. So, the kinds of things that Dichev, Dicheva, and Ditcheva were talking
    about this morning are relevant here [<xref linkend="ddd"/>]. Now, there’s a lot to say
    about translation, translation mechanisms, and tools to support it. I won’t talk about
    them now because they are too important, there’s too much to say, I’ll get
    excited, and you’ll be here for another hour and a half. If we solve these problems or
    even if we just make some progress on them, we will, I think, fairly soon decide that
    they’re not really part of semantics because we know how to deal with them, but we will
    be in a much more comfortable world than we are now.</para><para>We’re almost done here, almost done with this talk, almost done with this
    conference.</para><para>But wait. There is more.</para><para>There is more work for each of us to do. There is more for each of us to learn. There is
    more for each of us to teach the others. But there is more to all of that than I can tell you
    about here now. So really, it’s up to you. Go home now, this barn raising is over. When
    you get home, you’ll have plenty to do, but come back next year and tell the rest of us
    all about it.</para><para>Thank you. Have a safe journey home.</para><bibliography><title>References</title><bibliomixed xml:id="ma" xreflabel="Altheim 2008"> Altheim, Murray. 2008. <quote>Informal
        ontology design: A wiki-based assertion framework</quote>. Proceedings of Balisage 2008,
      Montréal. On the Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/Altheim01/Balisage2008-Altheim01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/Altheim01/Balisage2008-Altheim01.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol1.Altheim01</biblioid>.
    </bibliomixed><bibliomixed xml:id="pb" xreflabel="Brown 2008"> Brown, Peter. 2008. <quote>This paper has no
        version: Versioning as a social construct</quote>. Proceedings of International Symposium on
      Versioning XML Vocabularies and Systems, Montréal. On the Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/Brown02/Balisage2008-Brown02.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/Brown02/Balisage2008-Brown02.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol2.Brown02</biblioid>.
    </bibliomixed><bibliomixed xml:id="ddd" xreflabel="Dichev et al. 2008"> Dichev, Christo, Darina Dicheva,
      Boriana Ditcheva, and Mike Moran. 2008. <quote>Translation between RDF and Topic Maps: Divide
        and translate</quote>. Proceedings of Balisage 2008, Montréal. On the Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/Dichev01/Balisage2008-Dichev01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/Dichev01/Balisage2008-Dichev01.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol1.Dichev01</biblioid>.
    </bibliomixed><bibliomixed xml:id="fjs" xreflabel="Flanagan 2008"> Flanagan, David. 1998. <emphasis role="ital">JavaScript: The definitive guide</emphasis>. Third edition. Sebastopol, CA:
      O’Reilly. </bibliomixed><bibliomixed xml:id="eg" xreflabel="Gutentag 2008"> Gutentag, Eduardo. 2008. <quote>XML: It was
        not televised after all ...</quote>. Proceedings of Balisage 2008, Montréal. On the
      Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/Gutentag01/Balisage2008-Gutentag01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/Gutentag01/Balisage2008-Gutentag01.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol1.Gutentag01</biblioid>.
    </bibliomixed><bibliomixed xml:id="sh" xreflabel="Hawke 2008"> Hawke, Sandro. 2008. <quote>Forward
        compatibility using XML Transform As Needed (XTAN)</quote>. Proceedings of International
      Symposium on Versioning XML Vocabularies and Systems, Montréal. On the Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/Hawke01/Balisage2008-Hawke01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/Hawke01/Balisage2008-Hawke01.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol2.Hawke01</biblioid>.
    </bibliomixed><bibliomixed xml:id="jh" xreflabel="Heller 1961"> Heller, Joseph. 1961. <emphasis role="ital">Catch-22</emphasis>. New York: Simon and Schuster. </bibliomixed><bibliomixed xml:id="shunt" xreflabel="Hunting 2008"> Hunting, Sam. 2008. <quote>Topic maps in
        near-real time</quote>. Proceedings of Balisage 2008, Montréal. On the Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/Hunting01/Balisage2008-Hunting01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/Hunting01/Balisage2008-Hunting01.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol1.Hunting01</biblioid>.
    </bibliomixed><bibliomixed xml:id="dek" xreflabel="Knuth, Feigenbaum, and Shustek 2008"> Knuth, Donald, Edward
      Feigenbaum, and Len Shustek. 2008. <quote>Interview. Donald Knuth: A life’s work
        interrupted</quote>. <emphasis role="ital">CACM</emphasis> 51.8: 31-35. Available on the Web
      at <link xlink:href="http://mags.acm.org/communications/200808/" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://mags.acm.org/communications/200808/</link>. doi: <biblioid class="doi">10.1145/1378704.1378715</biblioid>.
    </bibliomixed><bibliomixed xml:id="jmcd" xreflabel="McDonough 2008"> McDonough, Jerome. 2008.
        <quote>Structural metadata and the social limitation of interoperability: A sociotechnical
        view of XML and digital library standards development</quote>. Proceedings of Balisage 2008,
      Montréal. On the Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/McDonough01/Balisage2008-McDonough01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/McDonough01/Balisage2008-McDonough01.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol1.McDonough01</biblioid>.
    </bibliomixed><bibliomixed xml:id="zgb" xreflabel="McMorran and Powell 1993"> McMorran, Mike, and Steve
      Powell. 1993. <emphasis role="ital">Z guide for beginners</emphasis>. Oxford: Blackwell. </bibliomixed><bibliomixed xml:id="nsjs" xreflabel="Negrino and Smith 2008"> Negrino, Tom, and Dori Smith.
      1999. <emphasis role="ital">JavaScript for the World Wide Web</emphasis>. Third edition.
      Berkeley: Peachpit Press. </bibliomixed><bibliomixed xml:id="do" xreflabel="Orchard 2008"> Orchard, David. 2008. <quote>Versioning
        fundamentals</quote>. Proceedings of International Symposium on Versioning XML Vocabularies
      and Systems, Montréal. On the Web at <link xlink:href="http://www.balisage.net/Proceedings/html/2008/Orchard01/Balisage2008-Orchard01.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.balisage.net/Proceedings/html/2008/Orchard01/Balisage2008-Orchard01.html</link>. doi: <biblioid class="doi">10.4242/BalisageVol2.Orchard01</biblioid>.
    </bibliomixed><bibliomixed xml:id="ar" xreflabel="Renear 2003"> Renear, Allen. 2003. <quote>First thoughts on
        modal logic for document processing</quote>. Talk at Extreme Markup Languages 2003,
      Montréal. </bibliomixed><bibliomixed xml:id="yr_sotw" xreflabel="Rubinsky and Maloney 1997"> Rubinsky, Yuri, and Murray
      Maloney. 1997. <emphasis role="ital">SGML on the WEB: Small steps beyond H.T.M.L.</emphasis>.
      Upper Saddle River, NJ: Prentice Hall PTR. </bibliomixed><bibliomixed xml:id="s92" xreflabel="Sperberg-McQueen 1992"> Sperberg-McQueen, C. M. 1992.
        <quote>Back to the Frontiers and Edges</quote>. Closing Remarks at SGML ’92: the
      quiet revolution, sponsored by the Graphic Communications Association. Danvers, Massachusetts,
      29 October 1992. Available on the Web at <link xlink:href="http://www.w3.org/People/cmsmcq/1992/edw31.html" xlink:type="simple" xlink:show="new" xlink:actuate="onRequest">http://www.w3.org/People/cmsmcq/1992/edw31.html</link>
    </bibliomixed></bibliography></article>
