Creatures of a season
Copyright © 2016 by the author.
There are stories set in imaginary worlds where clans of witches live in the far North and live for hundreds of years. They take men as lovers, but in a very short time the man grows old and dies while the witch has not visibly aged at all. Falling in love with a man is a bittersweet experience for the witches of these worlds: sweet because it’s falling in love, and bitter because the witch knows it cannot last.
Do you ever wonder whether data which have a long lifetime of relevance ever fall in love with a particular piece of software used to display them? The fact of the matter is that we have plenty of data which live much, much longer than any piece of software ever has. There are papyri from Egypt being digitized today which were written 5000 years ago. Contemporary historians build databases today to describe events which happened 2000 years ago. There are important printed books from 200 years ago, and from every age in between. No software has ever lived anything like that long.
And it’s not just academic projects that have data that outlive software. Railroads whose rights of way involve 99-year leases may have existing leases which are close to expiring — and thus close to a century old — and new leases which will run for just about another century. Their data management systems must handle time spans of at least a couple of centuries.
And even on a much shorter timescale, the technical publishers who were among the first promoters and adopters of descriptive markup had data that lived longer or should live longer than the initial application of the data. The problem lies not in the absolute durations involved but in the mismatch between the long-lived and the short-lived.
Now, to help the data stay alive longer, we have often focused on locating its essence. Years ago, at a predecessor conference to this one, a fellow named Dave Sklar identified a distinction that has stuck with me and, I think, with some others probably in the room ever since: a distinction between a sort of gold-standard archival form for the data, utterly independent of any bias toward any particular piece of software, and software-dependent forms tailored to particular packages.
The digital humanist Peter Robinson, talking about how to keep data alive and useful for a longer time, recently warned projects: don’t fall in love with the user interfaces you devote so much time and effort to, because they will not last; they will not stay beautiful forever. He summed it up with the slogan “Your user interface is everyone else’s enemy.”
In many ways this approach of looking for the essence has served us well. SGML is still usable after 30 years, as Betty Harvey showed us yesterday; it is sometimes a little bit of a challenge today to use non-XML SGML, but it’s a soluble challenge [Harvey 2016]. Those data are still there, they are still usable, they’re still being used. That’s not true of every dataset that was in use 30 years ago, or of every data format that was in use 30 years ago.
In consequence, much of our work consists in trying to identify the essence of our information and disentangle it from temporary accretions — from accidents in the philosophical sense of the word, that is, properties which are not part of the essence of the thing. Perhaps surprisingly, this work can take two apparently opposed forms.
First, we can attempt to unify apparently diverse phenomena under a single category. Todd Carpenter talked to us Tuesday about the importance of open standards in unifying communities around common vocabularies which identify the commonalities of interest underlying a variety of superficial appearances [Carpenter 2016]. Hans-Jürgen Rennau showed us Wednesday, on a very, very different level, how much we can achieve if we extend a clean core set of ideas more broadly, in his paper on FOXpath [Rennau 2016].
Now, when we do reach agreement on common vocabularies, it’s useful to have tools to police the boundaries of our agreement. And the most common tool, of course, is an explicit definition of a vocabulary in a form that allows mechanical checking of validity, so we spend a lot of time and effort constructing schemas, either for individual projects or as documentation of community agreement. Techniques like the agile development practice described on Wednesday by Dianne Kennedy are important because they can reduce the cost of such schema development [Kennedy 2016].
Now, it’s possible to argue that declarative, mechanically checkable definitions of validity are not a good idea. Both Charles Goldfarb (the editor of ISO 8879, the SGML specification) and the editors of HTML5 have made, surprisingly enough, arguments of roughly that form. (That’s not an agreement I would have expected to see.) But procedural definitions of validity of the kind proposed for HTML5 make validity harder to check and harder to reason about. They increase the cost of working with the vocabulary in question so that they’re often feasible socially only for a very, very few formats which are thought to be so important that people are willing to put up with that extra cost. And they risk tying the vocabulary fatally to the behavior of one particular piece of software.
Non-procedural definitions of correctness, such as the semantic correctness which — if I understand Lynne Price correctly — Charles Goldfarb wanted to define as part of validity, increase costs even more. Although they make it easier to reason about the set of documents which are semantically correct, they make it much more expensive to identify those documents in practice: such identification cannot be done mechanically, in the general case, and therefore requires human eyeballs. It is extremely important for the practical utility of descriptive markup that we distinguish, like logicians, between the mechanically checkable characteristic of validity and the semantically important property of correctness or truth. Validators for a given notation are thus crucial if we want a vendor-independent definition of correctness. They can require a lot of work, as illustrated by Tony Graham’s discussion of the focheck validation framework for XSL-FO [Graham 2016].
Sometimes our search for the essence leads us, however, not to unify apparently diverse phenomena under a single category but, on the contrary, to distinguish things which on the surface appear to be the same, as in the argument by Jacob Jett, Tim Cole, David Dubin, and Allen Renear the other day that we will build better systems if we distinguish links not just by their anchors but by their targets, which may be different even when the anchors are the same [Jett et al. 2016].
Sometimes our search for the essence focuses on finding the right way to represent the information. Sometimes that will lead us to split, sometimes to lump. Syd Bauman’s discussion of soft hyphens in historical books showed us on Tuesday how much difference the representation we choose can make for our ability to process the data conveniently [Bauman 2016]. But unlike the witches of the stories I was thinking about, our data and the vocabularies in which we express them are unlikely to be timeless and unchanging no matter how hard we work to get things right when we design and make them.
Now, there are philosophers who argue that a complex cultural object like a symphony is a collection of notes in a particular temporal arrangement. Similarly, a text is a particular collection of words — maybe a collection also of other things, but let’s stick with words for the moment — in a particular arrangement. Now, it follows (they say) that a different collection of words or notes — or the same collection in a different arrangement — cannot possibly be the same symphony or the same text. This is good because it gives them a clear, coherent notion of identity for complex objects like symphonies or texts. It does, however, have the unfortunate side effect of requiring us to deny, on any given night when one of the players at the second desk of the second violins accidentally flubs an F-sharp, that on that night the New York Philharmonic has performed Beethoven’s Fifth Symphony: one of the notes was different. It can’t be the Fifth Symphony, because the Fifth Symphony is that arrangement of those notes, not this arrangement of these notes. From this approach it similarly follows that when we correct a typographic or grammatical error in a document, we don’t have the old document in a new form; we have a new document representing a distinct text. A miss is as good as a mile in this world, and nothing except the eternally unchanging can ever retain its identity.
But where we live, below the crystal sphere of the moon, most of us don’t take such an austere view of identity for objects like documents or texts or symphonies, so we need ways of keeping track of things as they change. We need methods of naming them, like the one suggested on Wednesday by Ari Nordström, to keep track of the identity we assert for them — the identity which lies at the core of our document management systems [Nordström 2016] — even in the face of changes to the collection of words, and even in the face of the difficulty of saying, in a way that would satisfy a philosopher, just what it is we actually mean by the phrase “the same document.”

Just as our documents change, our vocabularies for documents will change, so we need ways of managing change in our vocabularies, ways of defining customizations of those vocabularies to support such change, and ways of distinguishing changes we might regard as safe from changes we expect to be unsafe, for some definition of safety — along the lines outlined by Tommie Usdin, Jeff Beck, Laura Randall, and Debbie Lapeyre, who showed us one possible way to design vocabularies to support graceful change and customization [Usdin et al. 2016].
Change is a complicated matter, not just for us but for a lot of fields. Sometimes the direction of change is crucial, and sometimes it’s not. If some users of a public vocabulary change it in one direction, and other users change it in another direction, it will often be the case that both of those modifications co-exist in time, and sometimes they have to co-exist in the same database with documents that use the unchanged vocabulary. The direction of change is irrelevant here, so it’s interesting and important that there is a certain duality between change as a phenomenon in time and variation as a phenomenon that can be regarded as, in a sense, “out of time.” And as Robin La Fontaine showed us in the first talk on Tuesday, markup that was originally devised for recording changes — directional changes — can in fact, if we ignore the direction of change (factor it out of our consideration), be used to record structural variation in documents, or multiple structures within the same document. He provides a new and, to me at least, unexpected technique for dealing with overlapping hierarchies in XML [La Fontaine 2016].
A similar kind of stepping outside of time can make it possible to visualize complex processes, as illustrated by Evan Lenz’s visualizations of XSLT transformations, which translate the temporal sequence of events in the execution of a transformation into a spatial arrangement of inputs and outputs, with indications of correspondence that can make an otherwise very, very complicated set of changes easy to understand visually [Lenz 2016]. This is particularly useful for XSLT because it is a functional language, so the temporal sequence we imagine for the transformation isn’t necessarily the one the transformation engine actually performs. You will have had this experience yourself if you have ever tried the step-by-step debugger for Saxon made available in the Oxygen XML editor; if you’re like me, you will have asked yourself, “How did we get from the evaluation of that expression to the evaluation of that expression, in that sequence?” And the answer, of course, is that because XSLT is functional and declarative, it offers no guarantees of temporal sequence. We always attribute a temporal sequence to things; the spec appeals to temporal sequence not infrequently (although less frequently, perhaps, than it used to, because I keep asking that we eliminate it). And some processors, like Saxon, invert the code in ways which are complicated for people who haven’t studied the work of Michael Jackson (the other Michael Jackson, not the singer) on program inversion, but which have wonderful effects for, among other things, the temporal performance of the transformation.
That same duality between change and state can also be exploited in very practical ways, as illustrated by Damon Feldman’s paper on Monday discussing the construction of systems in which we save messages: not simply using them as triggers for actions, but keeping them as part of the record [Feldman 2016]. There is an important database theorist (whose name I have not been able to recover from the web) who has concluded that, strictly speaking, it’s the database logs which are the real database and which accurately and completely describe the data. The tables and indices that the DBMS code consults when it evaluates a query are merely an optimization; the true database is the transaction log. And if you actually persist the transaction log, you can, as Damon Feldman showed us, build some interesting things with it.
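This is the idea now usually called event sourcing: persist the events themselves and treat any tables as derived caches. A minimal sketch, with invented event shapes and no real message-queue or database machinery:

```python
# Minimal event-sourcing sketch: the append-only log is the system of
# record; the "table" (a dict of balances) is just a cache rebuilt by
# replaying the log.
log = []

def record(event):
    log.append(event)          # persist the message itself

def replay(events):
    """Rebuild the derived state (the 'optimization') from the log."""
    balances = {}
    for kind, account, amount in events:
        if kind == "deposit":
            balances[account] = balances.get(account, 0) + amount
        elif kind == "withdraw":
            balances[account] = balances.get(account, 0) - amount
    return balances

record(("deposit", "alice", 100))
record(("withdraw", "alice", 30))
record(("deposit", "bob", 50))

print(replay(log))  # {'alice': 70, 'bob': 50}
```

Because the log is the record, the derived state can be thrown away and rebuilt at any time, and new views of old events can be computed long after the fact.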
But if we focus only on the essence of our information and identify and cultivate the timeless, unchanging portions of it, we get only part of the story. If we focus only on things that never change, we risk behaving like a witch who never falls in love. We avoid some pain that way, but many people who have fallen in love have been known to say that it’s worth it, even if it is occasionally attended by pain. When Dave Sklar distinguished the gold-standard archival form from software-dependent forms, he wasn’t saying we should never use software-dependent forms; he worked for a software company. He was merely coming clean with the observation that being in SGML didn’t automatically guarantee that a format was a gold-standard archival form: there could be software biases even within different forms of descriptive markup.
When Peter Robinson says your user interface is everyone else’s enemy, he is not saying that we should not develop user interfaces. There have been digital humanities projects that take that route. The digitization of the Archimedes Palimpsest at the Walters Art Gallery in Baltimore is a great example. They have a very nice webpage describing the project, explaining what a palimpsest is, and describing the discovery of the palimpsest of otherwise lost geometrical works by Archimedes underneath a Byzantine prayer book. And they say: this is all about the data. The project is not about software development, so there is no user interface. From those nice webpages explaining what a palimpsest is and so forth, there’s a link to a directory with a list of file names; you can download zip files that contain high-resolution scanned images and other zip files that contain transcriptions, but there is no browsing interface. Because, well, once you’ve worked with a 12th-century prayer book written on parchment that was used in the 8th or 9th century to record a text several hundred years older than that, a user interface that will look good and work for ten or twenty years just doesn’t seem like a useful investment of time.
But that’s not what Peter Robinson was talking about. Peter Robinson was arguing that we need to provide access to the data through means other than our particular user interfaces so that other people can build new user interfaces for our data. He doesn’t want to save our data from the heartbreak of falling in love with a particular user interface; on the contrary, he wants our data to fall in love again and again and again. Isn’t that what reusability of data is all about?
The witches of the story practice what anthropologists call serial monogamy; they’re faithful to one partner at a time. But data reusability can enable not just serial monogamy but promiscuous polyandry: data can work with a second piece of software even while the first piece of software is still in use. To exploit that capability, however, we may have to work at making such reuse easier.
Why should it be as hard as it sometimes is to make new ways to look at our data, or work with our data, or edit our data? Surely it should be easy to make well-customized interfaces for specialized needs, of the kind illustrated by the digital dictionary of family names in Germany described on Wednesday by Franziska Horn, Jörg Hambuch, and Sandra Denzer [Horn, Denzer and Hambuch 2016].
We should work to find new ways to frame our problems and make such custom tools easier to conceive of as well as to develop, as argued by Wendell Piez [Piez 2016]. To do that we have to continue to improve our tactical infrastructure, our tools for document manipulation and transformation; in that context, of course, the work reported by Debbie Lockett and Michael Kay on SaxonJS is extremely promising [Lockett and Kay 2016]. XSLT 1.0 in the browser has made a great many things possible for those of us who have used it; XSLT 3.0, as implemented by SaxonJS, promises to be even more useful.
And we have to continue to improve our command of other tools, as argued on Monday by Greg Murray in his talk on XQuery and how to use it as a full programming language [Murray 2016], not just a query language along the lines of SQL. SQL is carefully designed not to be Turing-complete, but XQuery has been Turing-complete from day one. Anne Brüggemann-Klein showed some of the tools we should master in her discussion of how to apply XML technologies to the fullest in developing web applications [Brüggemann-Klein 2016], as did Jim Fuller in his catalog of functional-programming idioms and how to use those idioms in XQuery [Fuller 2016].
If we do those things — if we cultivate our infrastructure — then we have great opportunities to do really interesting and really useful work. This claim has been illustrated this week by a large number of talks. If this week has any single dominant theme, perhaps it is just how varied are the specialized environments within which descriptive markup has proven useful, and how varied are the ways in which we manage to make use of it. On Monday, Martin Kraetke and Gerrit Imsieke talked about a very impressive single-source, multi-output web application that generates static websites for a bio-medical reference work [Kraetke and Imsieke 2016]. That same day, Ashley Clark and Sarah Connell described using XML technologies to increase the size of the data available about women writers online, in order to make it easier to study the reception of women’s writing [Clark and Connell 2016].
Mark Gross, Tammy Bilitzky, and Richard Thorne provided what I found an inspiring example of how much you can achieve with focused, intelligent work on a specialized problem, producing very interesting and helpful results even with relatively simple, straightforward tools like Lex [Gross, Bilitzky and Thorne 2016]. Joshua Lubell proved once again the power of XForms to build sophisticated and intelligent interfaces for complex information [Lubell 2016], and more generally the power of descriptive markup to help us manage complexity. His efforts show how useful descriptive markup can be even when the source data are not distributed in that form; eating the conversion cost in order to have the data in XML can be worthwhile.
Martin Kraetke showed us the other day how you can use descriptive markup to capture information generated by complicated automatic processes, like the analysis of software dependencies, and how you can make that information available to humans [Kraetke 2016]. And Andreas Tai gave us a wonderful account of how a well-chosen, well-designed vocabulary can help throughout a very complex workflow — or perhaps a complex array of complex workflows — as well as illuminating some of the challenges posed by variations in the usage of that vocabulary, not to mention the complications posed by competition from other formats and other approaches to information management [Tai 2016]. This morning, the paper by Autumn Cuellar and Jason Aiken on how to make DITA look pretty [Cuellar and Aiken 2016] made me think: God, it’s so beautiful! When I think of what SGML documents always used to look like, it’s really inspiring to be able to use good formatting tools and good page-styling tools. John Lumley talked, from a very different angle, about dealing with style information and styling information; his paper illustrates both a very sophisticated use of style information and an extremely sophisticated use of XSLT for transforming styling information from one form to another, equivalent form [Lumley 2016].
Perhaps the essence of data reuse is to allow the data to take many, many different forms in different contexts. If we take that seriously, how can we continue to behave as if we believed that only one of those forms were a correct or plausible representation of the data? So, just as we should make it easy and not hard to transform our data into different forms for processing by different software, or presentation in different formats for different audiences, so we should not make it harder than it needs to be for people to look at and interact with structured information in formats that are free of angle brackets. We may be skeptical, as some of us confessed to being, about the capabilities of Markdown, AsciiDoc, and the other minimalist markup notations described the other day by Norm Walsh [Walsh 2016], but there is no reason to make them harder to use than they already are by nature. If we make it easier to accept data from users in Markdown and similar forms and produce XML data (or perhaps I should say XDM data) that we can use, we will be better able to communicate with other people, and our applications will be able to play better with others. Similar hopes can be attached to the simplified XPath described yesterday by Uche Ogbuji [Ogbuji 2016]. Nothing that makes it easier for programmers in any language to navigate through complex data can possibly be a bad idea.
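Accepting data in a minimalist notation and producing XML from it can be genuinely cheap. As an illustration, here is a sketch that handles only an invented two-feature subset of Markdown — headings and blank-line-separated paragraphs — nothing like a real Markdown processor, and the target vocabulary (doc, head, para) is made up for the example:

```python
import xml.etree.ElementTree as ET

def md_to_xml(text):
    """Convert a tiny Markdown subset ('# ' headings, blank-line-separated
    paragraphs) into an XML tree. A sketch, not a real Markdown parser."""
    doc = ET.Element("doc")
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            ET.SubElement(doc, "head").text = block[2:]
        else:
            ET.SubElement(doc, "para").text = block
    return doc

xml = md_to_xml("# Creatures of a season\n\nData outlive software.")
print(ET.tostring(xml, encoding="unicode"))
# <doc><head>Creatures of a season</head><para>Data outlive software.</para></doc>
```

A real pipeline would of course use a full Markdown parser, but the point stands: the distance between a minimalist notation and usable XDM data can be short.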
There may be a slight paradox here. If we want to play better with others, if we want to be able to share our data with others in the form of well-styled HTML, or PDF, or XML data in a whole variety of vocabularies, or JSON, or CSV, or whatever formats the fashion winds of IT blow in our direction next year or next decade, then we have all the more reason to want to make sure that the center of our processing systems is built on descriptive markup in the vocabulary that we have developed or adapted for it, because one of the core ideas of descriptive markup, and thus of XML, is to call things by their true names so that we can work with them effectively. And working with them effectively means it’s easier to transform them into whatever format is needed for a particular application.
This idea is nothing new for translations from XML into non-XML notations. SGML and XML and related technologies have often been sold as a hub format from which we produce other notations as needed. But the same ideas also apply to translations from one SGML or XML format into another: the translation does not need to be a lossy one. Wendell Piez observed the other day that we’ve often focused on down translation and how to make it easier, and suggested that maybe we should devote more effort to supporting up translation [Piez 2016]. I think he’s probably right. And I also think we might do well to spend more time and effort defining, as it were, non-lossy sideways translations: from one XML notation into other notations that are structurally or otherwise different, but which lose no information and can therefore be converted back. Transformations like the one from DITA into page-layout software that Eliot [Kimber] was discussing this morning don’t have to be one-way, irreversible translations; they can be two-way translations. If we can make them two-way, then we can translate from our hub format into some other format, do the work there that’s usefully done there, and then bring the new information back without information loss.
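A non-lossy sideways translation is, in effect, a pair of transformations that compose to the identity. Here is a toy sketch of that round-trip property, with an invented vocabulary (item, rec, field) chosen purely for illustration; the two forms are structurally different but information-equivalent:

```python
import xml.etree.ElementTree as ET

# Toy "sideways" translation between two structurally different but
# information-equivalent forms: attributes vs. child elements.
# (Invented vocabulary, illustration only.)

def to_other(doc):
    """Translate attribute-based form into element-based form."""
    out = ET.Element("rec")
    for key, value in doc.attrib.items():
        ET.SubElement(out, "field", name=key).text = value
    return out

def from_other(rec):
    """The inverse translation: element-based form back to attributes."""
    doc = ET.Element("item")
    for field in rec:
        doc.set(field.get("name"), field.text)
    return doc

item = ET.fromstring('<item id="42" lang="en"/>')
back = from_other(to_other(item))
# Lossless: the round trip preserves every attribute.
print(back.attrib == item.attrib)  # True
```

The hard part in practice is designing the target vocabulary so that the inverse transformation always exists; the round trip is then a property a test suite can check mechanically.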
This may require more consistent attention to extensibility in our schema design. It may require the development of tools to perform operations on schemas, and on documents that conform to those schemas: tools that can convert a schema with no extensibility points into one that has extensibility points in useful places, allowing us to translate documents conforming to the original schema into another format and back without loss of information.
Now, getting operations like that under intellectual control will be a challenge because it will require us to understand more clearly than we sometimes do, how to define not just a single vocabulary but a family of related vocabularies. We have plenty of experience with families of vocabularies, but I always have the impression that when we are defining families of related vocabularies, we’re operating at, or unfortunately often a little beyond, our intellectual control. It would be nice to be able to understand a family of vocabularies as clearly and as firmly — and have as much grasp and control over them — as we can for single, static vocabularies.
Years ago, a couple of years before Dave Sklar proposed the concept of the gold-standard archival form, Lloyd Harding observed that if we want to take seriously the metaphor we sometimes use of “information refinement,” and think of our processing as taking place in an “information refinery,” then we need better ways to describe how the data that flows through a pipeline at one point relates to the data that flows through the pipeline at another point in the process. I’m not talking just about defining the transformation that takes one to the other, but about understanding statically, without directionality, how the two relate. We need a way to exploit the same kind of duality that Robin La Fontaine’s application of change markup to overlap exploits [La Fontaine 2016].
Perhaps over the years we’ve gotten used to the idea that our documents are described by a single schema: by a document type definition, or a RELAX NG schema, or an XSD schema. The only schema language whose users seem really accustomed to the idea of having multiple schemas is Schematron. But the idea that our documents are not always just one thing — and not always the same thing — goes back a long way in our history. In — well, I guess it was twenty years ago — 1996, Dan Connolly, who was the staff contact of what was eventually known as the XML Working Group, wanted the spec to contain a full definition of what an XML document is: is it a sequence of characters, or what? And he beat on me over that because I was one of the editors, and I resisted, consciously and intentionally, because I did not want to identify an XML document with its serialized form. And even if I had wanted to, I wouldn’t have been comfortable identifying it with just one particular one of its serial forms. Before entity expansion? After entity expansion? And since I didn’t know then how to define what seemed to me the essence of an SGML or XML document, I didn’t want to do what Dan asked.
Now, I still don’t know how to define the essence of an SGML or XML document. I do think I was probably wrong then; I think things might have gone better over the last twenty years if the XML spec had offered not what Dan wanted — a single definition of what constitutes an XML document — but a set of related definitions describing different forms of a document: the definition of what you might call the “raw” view of an SGML or XML document, as a set of entities whose relations are defined through entity declarations and entity references; then the “cooked” view of the document that results from expanding entity references in place; the equivalence relations that hold between those two forms of the document; the abstract structure of elements and attributes and other markup constructs along the lines of the XDM data model; the equivalence relations between that form and the various serialization forms; and the difference between the XDM view, in which any single element has exactly one parent, and the alternative view (compatible in fact with the XML Infoset, and I think intentionally so) in which an element that is part of the replacement text of a general entity may have as many different occurrences as there are references to that entity, each occurrence with a different parent.

One of the points of the entity structure of SGML and XML is surely to allow us to say “this address element will occur, without change, in all of the documents that provide our institutional address,” so that we update it once, not as many times as there are occurrences. You don’t have to agree with me on the details of this; I use it merely to illustrate the point that we have not always regarded our documents as having a single, simple, unchanging nature; their essential property is their ability to change shape in useful and interesting ways.
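The address example can be made concrete. In the “raw” view below there is one entity declaration; in the “cooked” view, two identical address occurrences. (The entity name and the trivial vocabulary are invented for the illustration; Python’s expat-based parser happens to expand internal entities, which is enough to show the two views.)

```python
import xml.etree.ElementTree as ET

# One declaration, many references: the "raw" view has a single entity
# definition; the "cooked" view has two identical address occurrences.
raw = """<!DOCTYPE letters [
  <!ENTITY addr "10 Main Street, Rockville">
]>
<letters>
  <letter><address>&addr;</address></letter>
  <letter><address>&addr;</address></letter>
</letters>"""

cooked = ET.fromstring(raw)  # entity references expanded in place
addresses = [a.text for a in cooked.iter("address")]
print(addresses)  # both occurrences carry the expanded text
```

Edit the declaration once and every occurrence changes; the raw and cooked views are related by a well-defined equivalence, which is just the sort of relation a set of related definitions could make explicit.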
Now, thinking about the various forms in which at any given time we would like to have our data leads me, of course, to think about fashion and about network effects. There’s been a lot of attention in recent years to network effects of technology, that is to say, to situations in which the value to a user of adopting a given technology is increased by an increase in the number of other users adopting the same technology. The classic example often used in explaining the idea is a telephone network; it’s really, really pointless to be the only person in town who has a telephone, but if everyone else in town has a telephone, it can be very useful. There is a sliding scale: the more people who have telephones in a town, the more useful it is for other people also to have a telephone. The same idea can apply to other technologies: the more other people use them, the more useful the technology can be to us.
Perhaps all technologies benefit one way or another from network effects; at the very least, wide adoption means it’s easier to hire people who are familiar with the technology. This is one reason so many people say things like “Given the choice, you should certainly develop in C. Why? Because it’s easy to hire C programmers.” Or sometimes “No, it’s harder to hire C programmers than Java programmers; you should do Java, because more people use Java nowadays.” The short version is simple: you should do this because everyone else is doing this. There’s a funny relation between this proposition and Kant’s categorical imperative, but I don’t propose to go into that here and now.
One of the problems with fashion is that it sometimes leads to irrational choices. I don’t want to sneer entirely at fashion. There was a time when XML was fashionable; it was a bandwagon technology. Lots of people wanted to get involved with XML because they were convinced it was going to be the next big thing, and it was a lot of fun to be involved with XML at that time because, well, let’s face it, not all of us had had much experience of being among the “cool kids” at school, and for a while we got to be among the cool kids, and that was fun.
But there appear to be some technologies whose primary selling point is network effects: you should use this because everybody else is using it. Is there any other reason to use that technology? Well, sometimes not: the network effects are sometimes pretty much the only selling point. But there are other technologies which may benefit from network effects but which don’t depend on them: technologies that will benefit the user whether anyone else adopts them or not, which will provide technical benefit whether they become universally adopted or not, which can live usefully in a niche.
Fashion trends are creatures of a season, and while I don’t think we should sneer
at them, I also don’t think we should take them too seriously. The witches in Philip Pullman’s trilogy His Dark Materials, from which my guiding metaphor comes, do take their attachments seriously.
A witch says in that book,
[M]en pass in front of our eyes like butterflies, creatures of a brief season. We
love them; they are brave, proud, beautiful, clever; and they die almost at once.
They die so soon that our hearts are continually racked with pain. [Pullman 2007] I can relate to that. I bet that almost all of the people in this room who were
using SGML twenty years ago will feel a twinge of pain when I say the name
Panorama. It was beautiful software. My data fell in love with it. I fell in love with it. And it has never been matched since. There have been plenty of attempts at web annotation software; none of them has been anything like a match for it.
But we will perhaps have a quieter life if we take a more dispassionate view. In Pygmalion, George Bernard Shaw described one character (Higgins’ mother, for those who are curious) in a way that I always felt was going to be a huge challenge to the actress but which has stuck in my mind ever since: she is a woman, he says, who no longer takes the trouble necessary to dress out of fashion [Shaw 1916, Act III]. She follows fashion not because she thinks it’s important, but because she no longer thinks it important enough to dress against the fashion. And in the same way, I think we may find ourselves best able to achieve long-term goals if we are able to smile at the latest fashions in IT development and follow them without losing our hearts to them.
Descriptive markup does not require
fashion to be useful; I have written document processing software without SGML and XML, and
I have written with it. To take one concrete example, some years ago I developed
a vocabulary for literate programming, intended as a sort of drop-in addition to some
base document vocabulary. And I wrote a processor for it; I wrote it in C over a
period of, I guess, six months. Not full-time — it was not an official project, so
this was moonlighting, as it were. But you know, six months or so. And a few months
later, I re-implemented it in XSLT over a period of, I think, six hours or so: slightly
better functionality, definitely better reliability, much easier deployment. No network
effects were needed: the benefit I derived from XSLT for this project would have
been nearly as large even if no one else in the world were using XSLT. I've
written document processing software without XML technology: I’ve been there, I’ve
done that, and I’m never going back. And any of you who have learned how to fully
exploit the technology of descriptive markup are likely to share that view. It doesn’t matter whether people outside this room want to use XML or think it’s fashionable or not.
We know what it does for us.
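To make the literate-programming example a little more concrete: the heart of such a processor is a tangle step that assembles code fragments, scattered through a document in whatever order suits the exposition, into compilable order. That step needs only a handful of XSLT templates. The sketch below is a minimal illustration of the idea, not the actual vocabulary or processor described above; the element and attribute names (scrap, ref, name) are hypothetical stand-ins.

```xml
<!-- Minimal "tangle" sketch in XSLT 1.0: emit the code scrap whose
     name matches the $top parameter, recursively expanding embedded
     references to other scraps. All names here are illustrative. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="top" select="'main'"/>

  <!-- Entry point: expand the top-level scrap. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//scrap[@name = $top]"/>
  </xsl:template>

  <!-- A scrap contributes its content, with references expanded. -->
  <xsl:template match="scrap">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- A reference is replaced by the expansion of the named scrap. -->
  <xsl:template match="ref">
    <xsl:apply-templates select="//scrap[@name = current()/@name]"/>
  </xsl:template>
</xsl:stylesheet>
```

The point of the sketch is the one made in the text: nothing here depends on anyone else adopting the same tools; the benefit comes from the descriptive markup itself.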
Now, it’s true: I would like the whole world to use descriptive markup. But not
because I think,
Oh, if the whole world uses it, I’ll get more consulting gigs and make more money,
or my software will be widely adopted and my company will make more money. I wish the whole world would use descriptive markup because descriptive markup is
built around the idea of calling things by their true names and of giving the creators
and owners of data the right and the responsibility for deciding what those names
are. The charter of descriptive markup is freedom and responsibility. Now, if a
lot of people don’t want to accept that freedom or don’t want to exercise that responsibility,
well, I can’t consistently make them do so. You can’t make people be free. But I would like them to have that freedom and use it because on
balance I think it is likely to make the world a better place. No, it doesn’t always
feel as if the advocates of freedom and responsibility were in the majority in our
world — even in the smaller parts of our world that we spend most of our time in.
Maybe that’s true; maybe we are in a minority. Maybe we are in a beleaguered minority. But let’s use the advantages of our technical choices to get along as
well as we can with the majority that surrounds us.
And when we feel the need to commune with others in our minority group — others who share our interest in the long term, in application-independence of information, and in the ontological rightness of data representation — we can always come back to Balisage. Thank you, thank you all for the opportunity you have given to me and to each other this week to learn from you and your experiences. See you next year.
[Bauman 2016] Bauman, Syd.
The Hard Edges of Soft Hyphens. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Bauman01.
[Brüggemann-Klein 2016] Brüggemann-Klein, Anne.
The XML Expert’s Path to Web Applications: Lessons learned from document and from
software engineering. Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington,
DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). doi:https://doi.org/10.4242/BalisageVol18.Bruggemann-Klein01.
[Carpenter 2016] Carpenter, Todd.
Moving toward common vocabularies and interoperable data. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Carpenter01.
[Caton and Vieira 2016] Caton, Paul, and Miguel Vieira.
The Kiln XML Publishing Framework. Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington,
DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). doi:https://doi.org/10.4242/BalisageVol18.Caton01.
[Clark and Connell 2016] Clark, Ashley M., and Sarah Connell.
Meta(data)morphosis. Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington,
DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). doi:https://doi.org/10.4242/BalisageVol18.Clark01.
[Cuellar and Aiken 2016] Cuellar, Autumn, and Jason Aiken.
The Ugly Duckling No More: Using Page Layout Software to Format DITA Outputs. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Cuellar01.
[Feldman 2016] Feldman, Damon.
Message Format Persistence in Large Enterprise Systems. Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington,
DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). doi:https://doi.org/10.4242/BalisageVol18.Feldman01.
[Fuller 2016] Fuller, James.
A catalog of Functional Programming idioms in XQuery 3.1. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Fuller01.
[Galtman 2016] Galtman, Amanda.
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Galtman01.
[Graham 2016] Graham, Tony.
focheck XSL-FO Validation Framework. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Graham01.
[Gross, Bilitzky and Thorne 2016] Gross, Mark, Tammy Bilitzky and Richard Thorne.
Extracting Funder and Grant Metadata from Journal Articles: Using Language Analysis
to Automatically Identify and Extract Metadata. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Gross01.
[Harding 1993] Harding, Lloyd.
The Attachment of Processing Information to SGML Data in Large Systems. Presented at SGML ’93, Boston, Mass., 6 - 9 December 1993.
[Harvey 2016] Harvey, Betty.
SGML in the Age of XML. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Harvey01.
[Horn, Denzer and Hambuch 2016] Horn, Franziska, Sandra Denzer and Jörg Hambuch.
Hidden Markup — The Digital Work Environment of the "Digital Dictionary of Surnames
in Germany". Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Horn01.
[Jett et al. 2016] Jett, Jacob, Timothy W. Cole, David Dubin and Allen H. Renear.
Discerning the Intellectual Focus of Annotations. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Jett01.
[Kennedy 2016] Kennedy, Dianne.
Case Study: Applying an Agile Development Methodology to XML Schema Construction. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Kennedy01.
[Kraetke 2016] Kraetke, Martin.
From GitHub to GitHub with XProc: An approach to automate documentation for an open
source project with XProc and the GitHub Web API. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Kraetke01.
[Kraetke and Imsieke 2016] Kraetke, Martin, and Gerrit Imsieke.
XSLT as a Modern, Powerful Static Website Generator: Publishing Hogrefe’s Clinical
Handbook of Psychotropic Drugs as a Web App. Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington,
DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). doi:https://doi.org/10.4242/BalisageVol18.Kraetke02.
[La Fontaine 2016] La Fontaine, Robin.
Representing Overlapping Hierarchy as Change in XML. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.LaFontaine01.
[Lenz 2016] Lenz, Evan.
The Mystical Principles of XSLT: Enlightenment through Software Visualization. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Lenz01.
[Lockett and Kay 2016] Lockett, Debbie, and Michael Kay.
Saxon-JS: XSLT 3.0 in the Browser. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Lockett01.
[Lubell 2016] Lubell, Joshua.
Integrating Top-down and Bottom-up Cybersecurity Guidance using XML. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Lubell01.
[Lumley 2016] Lumley, John.
Approximate CSS Styling in XSLT. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Lumley01.
[Murray 2016] Murray, Gregory.
XQuery is not (just) a query language: Web application development with XQuery. Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington,
DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). doi:https://doi.org/10.4242/BalisageVol18.Murray01.
[Nordström 2016] Nordström, Ari.
Tracking Toys (and Documents). Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Nordstrom01.
[Ogbuji 2016] Ogbuji, Uche.
A MicroXPath for MicroXML (AKA A New, Simpler Way of Looking at XML Data Content). Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Ogbuji01.
[Piez 2016] Piez, Wendell.
Framing the Problem: Building customized editing environments and workflows. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Piez01.
[Popham 1993] Popham, Michael G.
Conference Report SGML ’93. Boston, MA, USA 6th-9th December 1993. Document SGML/R25 of The SGML Project at Exeter University. Preserved on the Web
at The Cover Pages, http://xml.coverpages.org/sgml93.html.
[Pullman 2007] Pullman, Philip. The Golden Compass. New York: Alfred A. Knopf, 2007. His Dark Materials.
[Rennear/Wickett 2009] Renear, Allen H., and Karen M. Wickett.
Documents Cannot Be Edited. Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 -
14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Renear01.
[Rennear/Wickett 2010] Renear, Allen H., and Karen M. Wickett.
There are No Documents. Presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 - 6,
2010. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5 (2010). doi:https://doi.org/10.4242/BalisageVol5.Renear01.
[Rennau 2016] Rennau, Hans-Jürgen.
FOXpath - an expression language for selecting files and folders. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Rennau01.
[Tai 2016] Tai, Andreas.
XML in the air - How TTML can change the workflows for broadcast subtitles. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Tai01.
[Usdin et al. 2016] Usdin, B. Tommie, Deborah A. Lapeyre, Laura Randall and Jeffrey Beck.
Graceful Tag Set Extension. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Usdin01.
[Walsh 2016] Walsh, Norman.
Marking up and marking down. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Walsh01.
The author thanks Tonya Gaylord for preparing the initial transcript of these remarks, and B. Tommie Usdin and the other conference organizers for their patience. The text has been lightly copy edited for clarity.
 The paper appears to have been given at SGML ’96 in Boston under the unlikely title Graduating from File-based Document Assembly to Info-Based Document ‘Construction’.
 It would not be surprising if he had said this more than once, but the instance in the author’s mind was at the Digital Humanities 2013 conference in Lincoln, Nebraska.
 Perhaps the best known of the philosophers I have in mind is Nelson Goodman, whose Languages of Art acknowledges no near-misses in performance, perhaps because he pays no attention to any notions of similarity between complex things and tries to work solely with identity and non-identity. The application of the argument to documents will be well known to regular attendees at Balisage owing to Allen Renear and Karen Wickett's papers at Balisage [Rennear/Wickett 2009 and Rennear/Wickett 2010].
 Many people have through years of working with XSLT become so used to the XDM view that they have forgotten the other view ever existed. Some of them have occasionally told me with some heat that the XML specification itself requires the XDM view, but they have never cited any specific passage.