The Allure of Gothic Markup

Prioritizing Local Adaptation

Simon St.Laurent

Senior Editor

O'Reilly Media, Inc.

Balisage: The Markup Conference 2013
August 6 - 9, 2013

Copyright © 2013 Simon St.Laurent. Permission for reuse readily granted.

How to cite this paper

St.Laurent, Simon. “The Allure of Gothic Markup: Prioritizing Local Adaptation.” Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). DOI: 10.4242/BalisageVol10.StLaurent01.


XML inherited and worsened SGML's legalistic tendencies, promoting a world of markup built to industrial standards. The sad reality of schemas is that they offer a defective transformation, one so crippled that it dehumanizes people working with them while giving us a broken impression of the world. We can still escape, but need to openly shift from falsely static globals to a world of local adaptation and transformation.

Table of Contents

Original Sin: Deep in the Standards
The Legalistic Applications of SGML
Ground Rules: XML
Pinning Down Butterflies with URIs
The First Hit is Free: Augmented Infosets and the PSVI
Old Magic: Learning About Conversation from Architecture
The Nature of Gothic
Patterns for Conversation
New Magic, from Clark and Crockford to the Present
Not Required
Leaving Gaps
Generic Parsers, not Parser Generators
Peace Through Massive Overbuilding
Peace Through Sprinkling
Peace Through Conflict
Accepting Failure
Valuing Errors
Mechanical Turks
JSON Shakes it Up
Relational Decline
Strictly Local Uses of Schemas
Transition Components - DSDL and MCE
Examplotron: A Bridge?
Toward a New Practice
Negotiation Style
Extending it ourselves
Changes Inside
Rehumanizing Electronic Communications


Markup specialists and their predecessors have wasted decades creating works that open possibilities in the short run but close them in the long run. The continuous headaches of versioning and differentiating vocabularies are a symptom of our failure, of the brittleness we have so enthusiastically embraced.

Seven years ago, speaking on an XML panel at a Web conference, I told attendees to go experiment with vocabularies and try new paths. The browser universe was too constrained, I said, too bound up with ideas about validation, whether HTML or XML or something else. No one seemed enthusiastic about that advice, and I startled myself by recommending it so seriously.

There was no escape, though - it was the right advice, and continues to be.

Much of the markup world has actually turned to experimenting, building very different structures around its work. These experiments mix social organization that distributes decision-making more widely with technical approaches, many of which apply old tools while taking advantage of enhancements to processing environments.

Original Sin: Deep in the Standards

Markup made mistakes early. The culture of agreements first, processing later, appeared in the earliest standards, indeed in the ISO approach to standardization. As Len Bullard described the attitude more recently:

Contracts have to care. If they don’t then the humans won’t.

— xml-dev, 4/9/13 10:36pm

That paranoia, that intrinsic doubt about what humans will do, has percolated deep into markup technologies.

The Legalistic Applications of SGML

SGML required documents to come with a declaration of their structure. Brittleness was built into the system. Documents were not meant to exist by themselves, but rather as part of an SGML application, defined as such:

4.279 SGML Application: Rules that apply SGML to a text processing application. An SGML application includes a formal specification of the markup constructs used in the application, expressed in SGML. It can also include a non-SGML definition of semantics, application conventions, and/or processing.


1. The formal specification of an SGML application normally includes document type definitions, data content notations, and entity sets, and possibly a concrete syntax or capacity set.... (SGML Handbook, 126)

The SGML Handbook notes that SGML applications needn't actually be "applications" as most computer users expect them:

Indeed, there are publishing situations where an SGML application can be useful with no processing specifications at all (not even application-specific ons [sic]), because each user will specify unique processing in a unique system environment. The historical explanation for this phenomenon is that in publishing (unlike, say, word processing before the laser printer), the variety of potential processing is unlimited and should not be constrained.

Goldfarb 1990, page 130

But while processing should not be constrained, document structure must be. SGML documents must contain a DOCTYPE declaration, which must in turn reference (or include) a DTD, and those DTDs rapidly became the battleground over which the users of SGML fought.

Ground Rules: XML

Those battles - or perhaps it is nicer to say negotiations - carried over into the XML world. Although XML allowed documents to go forth boldly naked without a DTD, the expected approach for its use still involved prior agreement. Citing Lewis Carroll's Humpty Dumpty, the W3C's Dan Connolly warned of the dangers of diverse semantics:

For any document to communicate successfully from author to readers, all parties concerned must agree that words all choose them to mean. [sic] Semantics can only be interpreted within the context of a community. For example, millions of HTML users worldwide agree that <B> means bold text, or that <H1> is a prominent top-level document heading...

When communities collide, ontological misunderstandings can develop for several reasons...

The best remedy is to codify private ontologies that serve to identify the active context of any document. This is the ideal role for a well-tempered DTD. Consider two newspapers with specific in-house styles for bylines, captions, company names, and so on. Where they share stories on a wire service, for example, they can identify it as their story, or convert it according to an industry-wide stylebook.

Connolly 1997, page 120.

Phrasing things more positively, Jon Bosak and Tim Bray described the work ahead and why people were eager to do it:

What XML does is less magical but quite effective nonetheless. It lays down ground rules that clear away a layer of programming details so that people with similar interests can concentrate on the hard part—agreeing on how they want to represent the information they commonly exchange. This is not an easy problem to solve, but it is not a new one, either.

Such agreements will be made, because the proliferation of incompatible computer systems has imposed delays, costs and confusion on nearly every area of human activity. People want to share ideas and do business without all having to use the same computers; activity-specific interchange languages go a long way toward making that possible. Indeed, a shower of new acronyms ending in "ML" testifies to the inventiveness unleashed by XML in the sciences, in business, and in the scholarly disciplines.

Bosak and Bray 1999, page 92.

Businesses and developers took up that challenge, and thousands of committees blossomed. XML "solved" the syntactic layer, and it was time to invest millions in defining structure.

Pinning Down Butterflies with URIs

The world of agreements wasn't enough for XML's keepers at the W3C. Tim Berners-Lee's Semantic Web visions required globally unique identifiers for vocabularies. In the period when Berners-Lee considered XML a key foundation for that work, that meant spackling URIs into markup to create globally unique markup identifiers and build them into vocabularies.
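To make "spackling URIs into markup" concrete, here is a minimal sketch using Python's standard library. The two vocabularies and their `http://example.com/...` namespace URIs are invented for illustration; the point is only that a namespace-aware parser reports every name fused with its URI.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a "title" element.
# The namespace URIs are made up for this sketch.
DOC = """\
<catalog xmlns:book="http://example.com/ns/book"
         xmlns:job="http://example.com/ns/job">
  <book:title>The Stones of Venice</book:title>
  <job:title>Senior Editor</job:title>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree reports each name in expanded form: {namespace-URI}local-name.
expanded_names = [child.tag for child in root]
for name in expanded_names:
    print(name)
```

The local name `title` is the same in both cases; only the URI tells them apart, and any code that wants either element has to carry that URI around with it.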

As the SGML community had ventured into hypertext, they too had found difficulties in sharing structures across vocabularies and recognizing those structures. They lacked pretensions of building a single global system, however, and had proposed a very different route: architectural forms.

Architectural forms permit DTD writers to use their own element type names for HyTime structures. Not only is the architectural form notion fundamental to HyTime, it is a new and useful SGML coding technique that can, if used wisely, ease the standardization of tagging practices by steering a route between the Scylla of excessive rigidity and the Charybdis of excessive freedom that threaten such standards when they must serve large, disparate groups of users.

...A standard intended for widespread use [among disparate vocabularies] thus cannot conveniently enforce particular tag names.

DeRose and Durand 1994, page 79.
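A rough sketch of the idea, translated into XML and Python for convenience: two publishers keep their own element names, and each element declares which shared form it conforms to through an attribute. The element names, the sample documents, and the `find_by_form` helper are all hypothetical, and the form attribute is written inline here although a DTD would normally supply it as a fixed value.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: a "HyTime" attribute names the architectural
# form an element conforms to. In SGML practice the DTD would typically fix
# this value, so authors would never type it themselves.
FORM_ATTRIBUTE = "HyTime"

# Two invented local vocabularies with different names for the same structure.
NEWSPAPER_A = '<story><seealso HyTime="clink" linkend="s42">related</seealso></story>'
NEWSPAPER_B = '<article><pointer HyTime="clink" linkend="a7">more</pointer></article>'


def find_by_form(xml_text, form):
    """Return (local element name, attributes) for every element claiming the form."""
    root = ET.fromstring(xml_text)
    return [
        (elem.tag, dict(elem.attrib))
        for elem in root.iter()
        if elem.get(FORM_ATTRIBUTE) == form
    ]


links_a = find_by_form(NEWSPAPER_A, "clink")
links_b = find_by_form(NEWSPAPER_B, "clink")
print(links_a[0][0], links_b[0][0])
```

The processor never needs to know that one newspaper says `seealso` and the other says `pointer`. It recognizes the structure and leaves the names local.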

While there was some work done to bring architectural forms to XML - most visibly David Megginson's XML Architectural Forms - architectural forms lost a bitter battle inside the W3C. While confidentiality makes it difficult to tell precisely what happened, public sputtering suggests that architectural forms' vision of adaptation to local schemas did not fit the W3C Director's intention of building globally understood vocabularies identified by URIs. Instead, the XML world was given a mandate to create vocabularies with globally unique identifiers.

Architectural forms were a limited set of transformations, still deeply intertwined with the contractual expectations of DTDs. The stomping they received at the W3C, however, was a sign of static expectations to come.

The First Hit is Free: Augmented Infosets and the PSVI

As more and more developers took Bosak and Bray's promises seriously, XML reached field after field, migrating far beyond the document-centered territory SGML had considered its home. While the ideas of agreement and validation were popular in many areas, DTDs did not feel like an appropriate answer to many developers. Their expectations had been shaped by databases, strongly typed languages, and fields that wanted more intricate specifications of content.

The W3C responded with XML Schema, a pair of specifications defining a language for specifying deterministic document structures and content, supporting associated processing. Like its DTD predecessor, XML Schema's validation process modified the document, adding default values for attributes (the classic case). Going beyond what DTDs had done, it also annotated the infoset of the reported document with type information accessible to later processing.
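The "classic case" is easy to see with the DTD predecessor. The sketch below uses Python's standard `xml.parsers.expat` module and an invented `note` document. It shows DTD attribute defaulting only, not XML Schema, whose type annotations have no equivalent in this parser.

```python
import xml.parsers.expat

# An invented document whose internal DTD subset declares a default value
# for the "lang" attribute. The <note> element itself specifies no attributes.
DOC = """<!DOCTYPE note [
  <!ATTLIST note lang CDATA "en">
]>
<note>Remember the goblins.</note>"""

seen = {}


def start_element(name, attrs):
    # attrs includes attributes derived from declarations, because the
    # parser's specified_attributes setting is left at its default of 0.
    seen[name] = dict(attrs)


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(DOC, True)

print(seen)
```

What the application receives is no longer quite what the author wrote: the declaration has quietly added to the document on its way through the parser.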

While press releases are not usually a great place to learn about the details of specifications, they are an excellent place to learn about what those specifications are meant to do. In the case of XML Schema, there are even a few to choose from. At the outset, 1999's press release was excited about the potential for establishing standards for many kinds of transactions:

Many applications can benefit from the development of schemas:

  • Databases must, for example, communicate detailed information about the legal values of particular fields in the data being exchanged.

  • Publishing and syndication services must be able to describe the properties of headlines, news stories, thumbnail images, cross-references, etc.

  • For electronic commerce, schemas can be used to define business transactions within markets and between parties, and to provide rules for validating business documents.

When XML is used to exchange technical information in a multi-vendor environment, schemas will allow software to distinguish data governed by industry-standard and vendor-specific schemas....

W3C 1999

In 2001, when the XML Schema Recommendations arrived, the press release described schemas as finally fulfilling the promises XML had previously made:

"XML Schema makes good on the promises of extensibility and power at the heart of XML," said Tim Berners-Lee, W3C Director. "In conjunction with XML Namespaces, XML Schema is the language for building XML applications."

By bringing datatypes to XML, XML Schema increases XML's power and utility to the developers of electronic commerce systems, database authors and anyone interested in using and manipulating large volumes of data on the Web. By providing better integration with XML Namespaces, it makes it easier than it has ever been to define the elements and attributes in a namespace, and to validate documents which use multiple namespaces defined by different schemas.

W3C 2001

It wasn't just that XML Schemas would "make good on the promises of extensibility and power", but, as later specifications demonstrated, that they would provide a foundation for further work in processing XML as strongly typed data. The Post-Schema Validation Infoset, effectively a type-annotated version of documents that passed validation, became the foundation on which XSLT 2.0 and XQuery would build. XML itself didn't require that you use schemas of any kind, but the core toolset incorporated more and more assumptions based on schema capabilities, without any separation of concerns. "XML" practice clearly incorporated XML Schema practice.


XML Schema had a key place in TimBL's own vision of the Semantic Web, as shown at [Semantic Web Architecture]. Perhaps namespaces were more important to him, given their foundation in URIs, but XML Schema also helped drive namespaces deeper into XML with qname content. Over time, however, Berners-Lee has largely lost interest in XML, and the Semantic Web has regularly looked elsewhere for syntax.

As schema proponents had hoped, the technology took off, becoming a central component in a rapidly growing "Web Services" ecosystem, largely built with tools that made it (relatively) easy to bind XML data to program structures using the type information provided by schemas. Schemas served not only as documentation and validation tools but as tools for structuring code. (They also served as configuration for XML editors of various kinds.)
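A toy version of that binding, with Python type annotations standing in for schema types, might look like the following. This is not how any particular Web Services toolkit works; the `Invoice` class, the `CONVERTERS` table, and the `bind` function are all invented for the sketch.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal


@dataclass
class Invoice:
    # The annotations play the role a schema's datatypes would play.
    number: int
    issued: date
    total: Decimal


# Converters keyed by annotated type (a hypothetical stand-in for schema types).
CONVERTERS = {int: int, date: date.fromisoformat, Decimal: Decimal}


def bind(xml_text, cls):
    """Build an instance of cls from child elements named after its fields."""
    root = ET.fromstring(xml_text)
    values = {}
    for field in fields(cls):
        text = root.findtext(field.name)
        if text is None:
            # The structure is assumed fixed, so a missing element is fatal.
            raise ValueError(f"missing <{field.name}>")
        values[field.name] = CONVERTERS[field.type](text.strip())
    return cls(**values)


invoice = bind(
    "<invoice><number>1047</number><issued>2013-08-06</issued>"
    "<total>129.50</total></invoice>",
    Invoice,
)
print(invoice)
```

The convenience is real, and so is the cost: the code now has the document's structure built into it, and a document that departs from that structure simply fails.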

However, as the limitations and intrusions of XML Schema became clearer, alternate approaches appeared. RELAX NG was in many ways a simpler and better thought-out approach to creating comprehensive schemas, while Schematron's rule-based approach offered a very different kind of testing. Examplotron built on Schematron's ideas to create a very different model of a schema, re-establishing the value of using sample documents for conversations about document interchange.
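To give a feel for how different rule-based testing is from a grammar, here is a small sketch in the spirit of Schematron. It is not Schematron itself, and the `RULES` table, the `check` function, and the sample newswire vocabulary are all invented. Each rule names a context, states an assertion about it, and says nothing about anything else in the document.

```python
import xml.etree.ElementTree as ET

# Each rule: (path selecting context elements, test on one context, message).
RULES = [
    (".//story", lambda e: e.find("headline") is not None,
     "story should have a headline"),
    (".//story", lambda e: len(e.findall("byline")) <= 1,
     "story should have at most one byline"),
    (".//caption", lambda e: bool((e.text or "").strip()),
     "caption should not be empty"),
]


def check(xml_text, rules=RULES):
    """Return the message of every rule that fails, once per failing context."""
    root = ET.fromstring(xml_text)
    failures = []
    for path, test, message in rules:
        for context in root.findall(path):
            if not test(context):
                failures.append(message)
    return failures


DOC = (
    "<wire>"
    "<story><headline>Gothic Revival</headline><caption> </caption></story>"
    "<story><byline>A</byline><byline>B</byline></story>"
    "</wire>"
)
problems = check(DOC)
for problem in problems:
    print(problem)
```

Elements the rules never mention pass through untouched, which leaves room for local additions that a comprehensive grammar would reject.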

Old Magic: Learning About Conversation from Architecture

The model of prior agreement, of prior structure, isn't unique to markup. It emerged from bureaucratic models that had grown in both commerce and government during the Industrial Revolution, a period that mixed social tumult with insistence on standardization of products and process. "Top-down" approaches became the norm in a world where manufacturing and engineering reorganized themselves around design and calculation.

Markup emerged from the industrial mind-set common to even the most idealized computing models. Its creators had grown up in a world dominated by industrial models of production, and computers themselves matched that command-and-control drive toward efficiency. Despite the general triumph of the industrial model, it has never really bothered to answer its critics. It hasn't had to - material plenty and the race to keep up have distracted us - but those critics still have things to teach us, even about markup.

John Ruskin in the 19th century and Christopher Alexander in the 20th offer an alternative to industrial models, an opportunity to humanize practice. Unsurprisingly for work centered on human capabilities, conversation is a key tool. Ruskin extends the building conversation to include the lowliest workers, while Alexander pushes further to include current and future users of buildings and structures.

The Nature of Gothic

New fields like to flatter themselves by styling themselves after older ones. While computing is often engineering (or plumbing) minus the structured training, it more typically compares itself to architecture. Like architecture, it hopes to achieve some form of grace in both its visible and invisible aspects, creating appealing structures that will remain standing. While we may have calmed down a bit from the heroic architect model of Howard Roark in The Fountainhead, markup language creators still expect to be able to lay things out as plans and have them faithfully executed by others who will live up to our specifications.

Writing a century before computing's emergence, John Ruskin's The Nature of Gothic (a chapter of The Stones of Venice) offered a very different view of the proper relationship between craftsman and architect. This is long, but even Ruskin's asides have parallels to computing practice, so please pause to study this quote more than its Victorian prose might otherwise tempt you to do:

SAVAGENESS. I am not sure when the word "Gothic" was first generically applied to the architecture of the North but I presume that, whatever the date of its original usage, it was intended to imply reproach, and express the barbaric character of the nations among whom that architecture arose... As far as the epithet was used scornfully, it was used falsely; but there is no reproach in the word, rightly understood; on the contrary, there is a profound truth, which the instinct of mankind almost unconsciously recognizes. It is true, greatly and deeply true, that the architecture of the North is rude and wild; but it is not true, that, for this reason, we are to condemn it, or despise. Far otherwise: I believe it is in this very character that it deserves our profoundest reverence.

...go forth again to gaze upon the old cathedral front, where you have smiled so often at the fantastic ignorance of the old sculptors: examine once more those ugly goblins, and formless monsters, and stern statues, anatomiless and rigid; but do not mock at them, for they are signs of the life and liberty of every workman who struck the stone; a freedom of thought, and rank in scale of being, such as no laws, no charters, no charities can secure; but which it must be the first aim of all Europe at this day to regain for her children.

Let me not be thought to speak wildly or extravagantly. It is verily this degradation of the operative into a machine, which, more than any other evil of the times, is leading the mass of the nations everywhere into vain, incoherent, destructive struggling for a freedom of which they cannot explain the nature to themselves....

We have much studied and much perfected, of late, the great civilized invention of the division of labor; only we give it a false name. It is not, truly speaking, the labor that is divided; but the men:—Divided into mere segments of men—broken into small fragments and crumbs of life; so that all the little piece of intelligence that is left in a man is not enough to make a pin, or a nail, but exhausts itself in making the point of a pin, or the head of a nail. Now it is a good and desirable thing, truly, to make many pins in a day; but if we could only see with what crystal sand their points were polished,—sand of human soul, much to be magnified before it can be discerned for what it is,—we should think there might be some loss in it also.

...Enough, I trust, has been said to show the reader that the rudeness or imperfection which at first rendered the term “Gothic” one of reproach is indeed, when rightly understood, one of the most noble characters of Christian architecture, and not only a noble but an essential one. It seems a fantastic paradox, but it is nevertheless a most important truth, that no architecture can be truly noble which is not imperfect. And this is easily demonstrable. For since the architect, whom we will suppose capable of doing all in perfection, cannot execute the whole with his own hands, he must either make slaves of his workmen in the old Greek, and present English fashion, and level his work to a slave’s capacities, which is to degrade it; or else he must take his workmen as he finds them, and let them show their weaknesses together with their strength, which will involve the Gothic imperfection, but render the whole work as noble as the intellect of the age can make it.

The second mental element above named was CHANGEFULNESS, or Variety.

I have already enforced the allowing independent operation to the inferior workman, simply as a duty to him, and as ennobling the architecture by rendering it more Christian. We have now to consider what reward we obtain for the performance of this duty, namely, the perpetual variety of every feature of the building.

Wherever the workman is utterly enslaved, the parts of the building must of course be absolutely like each other; for the perfection of his execution can only be reached by exercising him in doing one thing, and giving him nothing else to do. The degree in which the workman is degraded may be thus known at a glance, by observing whether the several parts of the building are similar or not; and if, as in Greek work, all the capitals are alike, and all the mouldings unvaried, then the degradation is complete; if, as in Egyptian or Ninevite work, though the manner of executing certain figures is always the same, the order of design is perpetually varied, the degradation less total; if, as in Gothic work, there is perpetual change both in design and execution, the workman must have been altogether set free....

Experience, I fear, teaches us that accurate and methodical habits in daily life are seldom characteristic of those who either quickly perceive or richly possess, the creative powers of art; there is, however, nothing inconsistent between the two instincts, and nothing to hinder us from retaining our business habits, and yet fully allowing and enjoying the noblest gifts of Invention. We already do so, in every other branch of art except architecture, and we only do not so there because we have been taught that it would be wrong.

Our architects gravely inform us that, as there are four rules of arithmetic, there are five orders of architecture; we, in our simplicity, think that this sounds consistent, and believe them. They inform us also that there is one proper form for Corinthian capitals, another for Doric, and another for Ionic. We, considering that there is also a proper form for the letters A, B, and C, think that this also sounds consistent, and accept the proposition. Understanding, therefore, that one form of the capitals is proper and no other, and having a conscientious horror of all impropriety, we allow the architect to provide us with the said capitals, of the proper form, in such and such a quantity, and in all other points to take care that the legal forms are observed; which having done, we rest in forced confidence that we are well housed.

But our higher instincts are not deceived. We take no pleasure in the building provided for us, resembling that which we take in a new book or a new picture. We may be proud of its size, complacent in its correctness, and happy in its convenience. We may take the same pleasure in its symmetry and workmanship as in a well-ordered room, or a skillful piece of manufacture. And this we suppose to be all the pleasure that architecture was ever intended to give us.

Ruskin 1853, paragraphs xxvi-xxviii.

What has architecture to do with markup?

In the virtual world, markup creates the spaces in which we interact. It creates bazaars, agoras, government buildings, and even churches. Markup builds the government office, the sales floor, the loading dock. Markup offers us decorations and distractions. Markup is our architecture.

The concerns that apply to architecture also apply to markup. The markup process, often deliberately, parallels the contemporary design approach. Define a problem. Develop a shared vision for solving it. Hire experts who create plans specifying that vision in detail, and send them to the "workmen" to be built.

Ruskin's friend William Morris, in his Preface to The Nature of Gothic, warns of the costs of that style of work.

To some of us when we first read it, now many years ago, it seemed to point out a new road on which the world should travel. And in spite of all the disappointments of forty years, and although some of us, John Ruskin amongst others, have since learned what the equipment for that journey must be, and how many things must be changed before we are equipped, yet we can still see no other way out of the folly and degradation of Civilization.

For the lesson which Ruskin here teaches us is that art is the expression of man's pleasure in labour; that it is possible for man to rejoice in his work, for, strange as it may seem to us to-day, there have been times when he did rejoice in it; and lastly, that unless man's work once again becomes a pleasure to him, the token of which change will be that beauty is once again a natural and necessary accompaniment of productive labour, all but the worthless must toil in pain, and therefore live in pain. So that the result of the thousands of years of man's effort on the earth must be general unhappiness and universal degradation; unhappiness and degradation, the conscious burden of which will grow in proportion to the growth of man's intelligence, knowledge, and power over material nature.

Morris 1892, paragraphs xxvi-xxviii.

Architecture has largely ignored these concerns, and computing, alas, has taken its lessons from that ignorance. Let us instead take Ruskin's worker free of alienation - valuing savageness and supporting changefulness - as a goal worth achieving.

Patterns for Conversation

Christopher Alexander has the strange distinction of being an architect more revered and imitated in computing than in his own field. When, during an XML conference in Philadelphia, I stopped at the American Institute of Architects to buy A Pattern Language and The Timeless Way of Building, the cashier informed me that I must be a programmer, because "only programmers buy those books. Architects don't."

Architects and Programmers

Programmers have indeed bought A Pattern Language, but for mostly the wrong reasons. The classic text, Design Patterns, cites Alexander as its inspiration and brought him to the wide attention of the computing community:

1.1 What Is a Design Pattern?

Christopher Alexander says, "Each pattern describes a problem which occurs over and over in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice." [AIS+ 77, page x] Even though Alexander was talking about patterns in buildings and towns, what he says is true about object-oriented design patterns. Our solutions are expressed in terms of objects and interfaces instead of walls and doors, but at the core of both kinds of patterns is a solution to a problem in context.

Gamma 1995, pages 2-3.

Even at this stage, however, they have already over-simplified Alexander's approach to patterns, seeing a top-down approach that isn't there. "A solution to a problem in context" is the goal of most non-fiction writing, but the writers of Design Patterns have forgotten the critical question of who solves those problems and how. They seem to have assumed that since Alexander is an architect, these patterns are meant to be applied by architects.

However, Alexander argues that the conversation must be broader. At the top of that same page x, one finds:

It is shown [in The Timeless Way of Building] that towns and buildings will not be able to become alive, unless they are made by all the people in society, and unless these people share a common pattern language, within which to make these buildings, and unless this pattern language is alive itself.

Alexander 1977, page x.

The conclusion of Design Patterns, unfortunately, repeats and amplifies its error, in a way that perhaps only computer programmers could:

4. Alexander claims his patterns will generate complete buildings. We do not claim that our patterns will generate complete programs.

When Alexander claims you can design a house simply by applying his patterns one after another, he has goals similar to those of object-oriented design methodologists who give step-by-step rules for design. Alexander doesn't deny the need for creativity; some of his patterns require understanding the living habits of people who will use the building, and his belief in the "poetry" of design implies a level of expertise beyond the pattern language itself. But his description of how patterns generate designs implies that a pattern language can make the design process deterministic and repeatable.

Gamma 1995, page 356.

It is hard to imagine a more bizarre misreading of Alexander: a projection of top-down design assumptions applied to a text whose primary purpose is to overturn them. Gamma, Helm, et al. provide an unfortunately perfect demonstration of how developers borrow badly from architecture. (Alexander himself recognizes the misreading in Alexander 1996.)

What Alexander actually offers is not a "design process deterministic and repeatable," but tools for conversation. The "level of expertise" is partially aesthetic, but in many ways social. He takes seriously "all the people in society" from the prior quote, and the job of the architect is less to design and more to facilitate. A Pattern Language is not a set of fixed rules that has emerged from practice, but a constantly evolving and shifting foundation that must combine with other local patterns to be of use.

Establishing Continuous Conversation

Unlike most models for including users in design, Alexander's process keeps the conversation going throughout the creation of works, and includes the workers and the users in that conversation. He has learned (perhaps from Ruskin) that treating workers as automata imposes deep costs, and recognized the quality of construction that people have achieved over centuries, even in financially poor environments, without the aid of architects. His building process allows for the layering of detail to respond to particular circumstances rather than laying out rules which must be applied in all cases.

How do patterns work? Alexander tells the story of implementation in The Production of Houses:

In order to get a reasonable house which works well and which nevertheless expresses the uniqueness of each family, the families all used an instrument we call the pattern language... The particular pattern language contained twenty-one patterns...

this language has the amazing capacity to unify the generic needs which are felt by every family, and which make a house functional and sensible, with the unique idiosyncrasies that make every family different, and thus to produce a house which is unique and personal, but also one which satisfies the basic needs of a good house.

this pattern language... allowed us to produce a variety of houses, each one essentially a variant of a fundamental house "type" (defined by the twenty-one patterns together), and yet each one personal and unique according to the special character of the family who used it.

Alexander 1985, pages 175-6.

It didn't all go smoothly, however, as one additional angle, the creation of new patterns, didn't materialize in the earlier work to shape the overall cluster of houses:

the families became enthusiastic about the project as they began to see the richness inherent in the patterns. However, our efforts to get them to modify the language, to contribute other patterns of their own, were disappointing.

Under normal circumstances, the architect-builder of a particular area would also modify and refine these patterns, according to local custom. In this particular project, we were so occupied by the demands of construction that we had little time to undertake work of this sort.

Alexander 1985, page 133.

However, in other projects, he had better luck developing pattern languages based on input from the community:

Once we have learned to take a reading of people's true desires and feelings, we can then describe the patterns that are needed to generate a profound campus environment. The system, or "language" of these patterns can give the community the beautiful world they need and want, the physical organization that will make their world practical, beautiful, life-enhancing, and truly useful.

Alexander 2012, page 131.

For the Eishin Campus, Alexander's team and a large group of administrators, teachers, and students developed 110 patterns specific to that project, informed by the broader list in A Pattern Language but moving beyond it.

The conversation doesn't end when construction starts, either. Patterns apply at all levels of development, from regional planning to finished details. Construction is an opportunity to make changes along the way, as the reality of earlier decisions becomes clear. This approach not only involves the users of the building, but transforms the role of the architect.

it is axiomatic, for us that the people who build the houses must be active, mentally and spiritually, while they are building, so that of course they must have the power to make design decisions while they are building, and must have an active relation to the conception of the building, not a passive one. This makes it doubly clear that the builders must be architects.

Alexander 1985, pages 74-5.

Alexander's approach obliterates the traditional separation between designers and builders, refusing to cooperate with a model he believes creates a "breakdown of the physical environment... It is hardly possible to experience a profound relationship with these places. So the landscape of our era has become, and continues to become, a wasteland." [Alexander 2012, page 80.]

Schematics, Standardization, and Alienation

What has created that wasteland? While Alexander's early writings focus mostly on the positive he hopes to encourage, his later works, works from the field, cannot avoid dealing with a world structured to make his style of work difficult, if not impossible. As the scale of construction has grown, specialization and centralization have led to ever-more detached and standardized approaches that cannot help but produce bad work.

"The great complexity needed by a human settlement cannot be transmitted via paper; and the separation of functions between architect and builder is therefore out of the question. The complexity can only be preserved if the architect and contractor are one. All this makes clear that the architect must be the builder.

And the opposite is true also. In modern times, the contractor and his crew are deeply and sadly alienated from the buildings they produce. Since the buildings are seen as "products," no more than that, and since they are specified to the last nail by the architect, the process of building itself becomes alienated, desiccated, a machine assembly process, with no love in it, no feeling, no warmth, and no humanity.

Alexander 1985, pages 74-5.

While reliance on drawings is one aspect of the problem, the industrial model of component construction adds an entirely new level of potential trouble. Standards limit choices, reinforcing the mistakes created by separation of concerns:

Today's systems of housing production almost all rely, in one form or another, on standardized building components. These components may be very small (electrical boxes, for instance), or intermediate (2x4 studs), or very large (precast concrete rooms); but regardless of their size, buildings are understood to be assembled out of these components. In this sense then, the actual construction phase of the housing production process has become an assembly phase: an occasion where prefabricated components are assembled, on site, to produce the complete houses.

It has been little understood how vast the effect of this has been on housing: how enormous the degree of control achieved, unintentionally, by these components and the demands of their assembly. Yet, as anyone who has intimate knowledge of building knows, these components are merciless in their demands. They control the arrangement of details. They prohibit variation. They are inflexible with respect to ornament, or whimsy, or humor, or any little human touch a person might like to make."

Alexander 1985, page 220.

Changing this is not easy, as Alexander learns repeatedly. His books have become more biting over time, reflecting the hostility of established practice to his different approaches. A world in which inspectors demand to approve signed drawings before allowing construction is not especially compatible with an approach that defines itself in terms of conversation and techniques. His latest book, The Battle for the Life and Beauty of the Earth, makes that conflict explicit, describing it as a "necessary confrontation" between two incompatible approaches to building: System-A and System-B.

System-A is a system of production in which local adaptation is primary. Its processes are governed by methods that make each building, and each part of each building, unique and uniquely crafted to its context.

System-B is, on the contrary, dedicated to an overwhelmingly machinelike philosophy. The components and products are without individual identity and most often alienating in their psychological effect.

The pressure to use such a system comes mainly from the desire to make a profit, and from the desire to do it at the highest possible speed.

Alexander 2012, page 43.

In Alexander's telling, System-B grows from mass production and the ideologies of classicism and industrialism that Ruskin and Morris blasted generations before. System-B has spread from the Victorian factories to every aspect of building construction (and computing as well). System-A, Alexander's preferred system, is older but has largely been discarded in the race to industrialize everything. His more detailed telling reveals another dimension to the problem of System-B: it is not only profit-seeking, but its adherents have been so surrounded by it that they have a difficult time imagining that anything else could work.

System-B is also the worldview that dominates computing. There are occasional counterculture moments and corners that resist System-B. However, even in a software world that at least seems like it should be more flexible than its explicitly industrial hardware side, the march toward the mass production of identical and mildly configurable products (and the standards that facilitate them) continues inexorably.

New Magic, from Clark and Crockford to the Present

While much of the markup world is infused with the System-B concepts that Alexander encourages us to reject, there are corners, influences, and opportunities that can help us cast aside markup's traditional model of design-first-and-then-execute. None of these pieces by itself is exactly a revolution, and some may even seem contradictory. Some of them are indeed side effects of the schema-based approach. While they may seem familiar, combining them offers a path to a different approach and new conversations. All of them point to opportunities for a shift in markup culture.

Not Required

Ever since XML 1.0 permitted documents without a DOCTYPE declaration, it has at least been possible to work with XML in the absence of a schema. While in most of my travels I have only found people using this freedom for experimental purposes or for very small-scale work, conversation on xml-dev did turn up some people who simply avoid using schemas. They do, however, seem to use standard formats, but test and document them by other means. These practices are often criticized, especially when the content leaves those often closed systems [Beck 2011].

Even in the best of these cases, though, throwing off schemas is not enough if the expectations of fixed vocabularies remain behind, as Walter Perry warned over a decade ago in [Perry 2002].

Leaving Gaps

Even the most obsessively controlling schema vocabularies allow developers to leave some space for growth. ANY in DTDs and xs:any in XML Schema are the classics, and variations allow a mixture of precision and openness.

Support for these gaps, however, varies wildly in practice. Both tools and culture push back against the open models. Tools that came from the strictly defined expectations of object definitions have a difficult time dealing with "underspecified" markup. The culture of interoperability testing often encourages maximum agreement. Do open spaces make it easier to create new options, or do they just create new headaches when it's time to move from one version of a defined vocabulary to another? Gaps create tension with many supposed best practices.

Generic Parsers, not Parser Generators

Although it is certainly possible to generate parsers that lock tightly onto a particular vocabulary and parse nothing else, it happens less often than might seem likely. There are parser generators, like [XML Booster], and there are certainly cases where a generated parser is more efficient or more secure than processing the results of a generic parser. However, judging from job listings and general conversation, parser generation has had a much smaller role in XML than it has had, for example, in ASN.1. (I've had a difficult time in ASN.1 conversations even convincing developers that generic parsers were possible and useful.)

Data binding tools, of course, can produce tight bonds even when run on top of a generic parser, but XML's explicit support for generic parsing has at least created the opportunity for looser coupling.
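
As a rough illustration of what generic parsing makes possible, the following Python sketch runs two unrelated documents through exactly the same code. The invoice and recipe vocabularies and the element_census function are invented for this example; only the standard library's ElementTree parser is assumed.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_census(xml_text):
    """Count element names in any well-formed document.

    Nothing here knows about a particular vocabulary or schema.
    """
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


# Two made-up vocabularies go through the same generic code path.
invoice = "<invoice><line qty='2'>widget</line><line qty='1'>gear</line></invoice>"
recipe = "<recipe><step>Mix</step><note>Rest the dough</note></recipe>"

invoice_counts = element_census(invoice)
recipe_counts = element_census(recipe)
```

A generated, vocabulary-specific parser would need to be rebuilt to accept the second document; the generic parser only asks that it be well-formed.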

Peace Through Massive Overbuilding

Some vocabularies have taken an "everything but the kitchen sink" approach. Two of the most popular document vocabularies, DocBook and the Text Encoding Initiative (TEI), both include gigantic sets of components. While both provide support through tools for those components, many organizations use subsets like the DocBook subset used for this paper. While subsets vary, having a common foundation generally makes transformations easy and fallback to generic tools an option. The TEI pizza chef [TEI Pizza Chef], which served up custom DTDs, typically a TEI subset, stands out as a past highlight of this approach.

Building a vocabulary so large that most people work with it only through subsets may seem excessive, but it opens the way to conversation among users of different subsets. In many ways, this is similar to (though operating in the reverse direction of) Rick Jelliffe's suggestion that:

"In particular, rather than everyone having to adopt the same schema for the same content type, all that is necessary is for people to revise (or create) each schema so that they are dialects (in the sense above) of the same language. That "language" is close to being the superset information model."

Jellife 2012.

So long as the superset model is broad enough, peace can be maintained.

Peace Through Sprinkling

Rather than building supersets, some groups have focused on building the smallest thing that could possibly work for their use cases. Dublin Core [DCMI] is probably the most famous of these, though a variety of annotations, from [WAI-ARIA] to [HTML5 Data Attributes], work in a similar way. These approaches offer a range of techniques for adding a portion of information used by a certain kind of processor to other vocabularies. They allow multiple processors to see different pieces of a document, though frequently there is still a unified vision of the document managed by those who sprinkle in these components.

Peace Through Conflict

While I cited Connolly 1997 above, I halted the quote at a convenient point. There is more to that argument - still schema (DTD) focused, but acknowledging conversation beyond the original creation point:

As competing DTDs are shared among the community, semantics are clarified by acclamation [15]. Furthermore, as DTDs themselves are woven into the Web, they can be discovered dynamically, further accelerating the evolution of community ontologies.

Connolly 1997, page 121.

While there has been more visible competition among schema languages than among vocabularies specified with schemas, there are constant overlaps among vocabulary projects as well as some direct collisions. "Acclamation" may be too strong a word, as steady erosion seems a more typical process, but there is certainly motion.

Accepting Failure

Resisting System-B is easiest, perhaps perversely, in a corner of the software universe that has long hoped to make System-B's "design by specification" possible: functional and declarative programming. These styles of software development remove features that add instability to imperative approaches, often in the pursuit of mathematical provability, reliability and massive scale. These design constraints, though intended to maximize industrial-scale processing of information, also make possible a wide range of more flexible approaches to handling information.

The paradigmatic application of these tools in the markup world lies in the technologies we call stylesheets, or style sheets, depending on who is editing at any given moment. While Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) were frequently seen as competitors when XML first arrived, both offer similar capabilities in this regard. They both are (or at least can be) excellent at tolerating failure, with little harm done.

The key to that tolerance is pattern matching - selectors for CSS, XPath for XSLT. If patterns don't match, they don't match, and the process goes on. XSLT offers many more features for modifying results, and is more malleable, but neither of them worries much about whether a document matches its expectations. At worst they produce empty results. XSLT is capable of operating more generically, and of working with content it didn't match explicitly. The XSLT toolset can support reporting and transformation that goes beyond the wildest dreams of schema enthusiasts - and can do much more useful work than validation and annotation along the way.
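
The same tolerance can be sketched outside of CSS and XSLT. The Python fragment below is only an analogy, using ElementTree's limited path syntax rather than full XPath; the RULES table, the render function, and the sample documents are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each pairs an ElementTree path pattern with a label.
RULES = [
    (".//title", "Title"),
    (".//author", "Author"),
    (".//isbn", "ISBN"),  # many documents will not have this
]


def render(xml_text):
    """Apply every rule; rules that match nothing simply add nothing."""
    root = ET.fromstring(xml_text)
    lines = []
    for pattern, label in RULES:
        # findall returns an empty list when nothing matches, so an
        # unmatched rule is skipped and processing goes on.
        for el in root.findall(pattern):
            lines.append(f"{label}: {(el.text or '').strip()}")
    return lines


book = ("<book><title>Stones of Venice</title><author>Ruskin</author>"
        "<binding>cloth</binding></book>")
output = render(book)

# A document from an entirely different vocabulary yields an empty
# result rather than an error.
other = render("<memo><to>Staff</to></memo>")
```

The unexpected binding element and the missing isbn element both pass without complaint, which is the behavior the stylesheet languages offer by default.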

Pattern matching is also central to a number of explicitly functional languages. While they were built for things like mathematical provability, "nine nines" reliability, and structured management of state, those constraints actually give them the power needed to go beyond XSLT's ability to process individual documents. Erlang's "let it crash" philosophy, for example, makes it (relatively) easy to build robust programs that can handle flamingly unexpected situations without grinding to a halt. Failures can be picked up and given different processing, discarded or put in a queue for different handling.
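
Erlang's supervision machinery has no direct equivalent in most languages, but the shape of the idea can be hinted at in a few lines of Python. In this sketch, which is an analogy and not a model of Erlang itself, the worker is deliberately naive and a surrounding function catches its failures and queues them. The order vocabulary and both function names are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import deque


def extract_total(xml_text):
    """Naive worker: assumes an <order> containing a numeric <total>
    and makes no attempt to defend itself against anything else."""
    root = ET.fromstring(xml_text)
    return float(root.find("total").text)


def supervise(messages):
    """Run the worker on each message. A failure is caught here,
    recorded with its cause, and queued for different handling."""
    totals, retry_queue = [], deque()
    for message in messages:
        try:
            totals.append(extract_total(message))
        except (ET.ParseError, AttributeError, TypeError, ValueError) as err:
            retry_queue.append((message, type(err).__name__))
    return totals, retry_queue


messages = [
    "<order><total>12.50</total></order>",
    "<order><sum>9.00</sum></order>",           # unexpected vocabulary
    "<order><total>7</total>",                  # not well-formed
    "<order><total>about ten</total></order>",  # not a number
]
totals, retry_queue = supervise(messages)
```

One good message is processed and three odd ones wait in the queue with a note about what went wrong, while the process as a whole keeps running.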

A calm response to the unexpected opens many new possibilities.

Valuing Errors

Years ago, Walter Perry said in a talk that often the most interesting communications were the ones that broke the rules. They might be mistakes, but they might also be signs of changing conditions, efforts to game the system, or an indication that the system itself was flawed.

Errors and flaws have become more popular since. While much of the effort poured into test-driven development is about making sure they don't happen, a key side effect of that work is new approaches to providing meaningful error messages when they do happen. "Test failed" is useful but incomplete.

In distributed systems, errors aren't necessarily just bugs, an instant path to the discard bin. While the binary pass/fail of many testing approaches encourages developers to stomp out communications that aren't quite right, turning instead to the meaningful error messages (and error handling) side of that conversation can be much more fruitful.

Mechanical Turks

After decades of trying to isolate computing processes from human intervention, some developers are now including humans in the processing chain. After all, it's not difficult to treat such conversations as just another asynchronous call, especially in an age of mobile devices. Not everything has to be processed instantly.

Amazon developed the [Amazon Mechanical Turk] service, named after an 18th-century chess-playing "machine" that turned out to have a person inside of it. It looked like brilliant technology, and was, if humans count. Amazon adds digital management to the approach, distributing "Human Intelligence Tasks" across many anonymous workers. Facebook uses similar if more centralized approaches to censor photos [Facebook Censorship]. The Mechanical Turk model has led to some dire work situations [Cushing 2012] in which humans are treated as cheap cogs in a computing machine, as a System-B industrial approach seeks cheap labor to maximize profit.

Horrible as some of these approaches are, they make it very clear that even large-scale digital systems can pause to include humans in the decision-making process. It isn't actually that difficult. Connecting these services to markup processing, however, requires interfaces for letting people specify what should be done with unexpected markup. "Keep it, it's okay" with a note, or an option to escalate to something stronger (perhaps even human-to-human conversation) may be an acceptable start.
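
What such an interface might look like at its very smallest can be sketched in Python. Everything named here is hypothetical: the KNOWN vocabulary, the triage function, and the sample_reviewer stand-in for a person. In a real system the reviewer call would be an asynchronous request rather than a plain function.

```python
import xml.etree.ElementTree as ET

KNOWN = {"order", "item", "total"}  # hypothetical local vocabulary


def triage(xml_text, ask_human):
    """Return a decision for each unfamiliar element name.

    `ask_human` is any callable that takes an element name and returns
    a decision string; it stands in for a request to a person.
    """
    root = ET.fromstring(xml_text)
    decisions = {}
    for el in root.iter():
        if el.tag not in KNOWN and el.tag not in decisions:
            decisions[el.tag] = ask_human(el.tag)
    return decisions


def sample_reviewer(tag):
    # Stand-in for a person's judgment: "keep it, it's okay" or escalate.
    return "keep" if tag == "giftnote" else "escalate"


doc = ("<order><item>lamp</item><giftnote>Happy birthday</giftnote>"
       "<x-trace id='7'/><total>40</total></order>")
decisions = triage(doc, sample_reviewer)
```

The point of the sketch is only that the unexpected markup reaches a person with a choice attached, instead of causing the document to be rejected.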

JSON Shakes it Up

While XML seemed to be conquering the communications universe, even finally reaching the Web as the final X in AJAX, many developers dreamed of an escape from its strange world of schemas, transformations, and seemingly endless debates about data representation. Douglas Crockford found an answer uniquely well-suited to the Web, extracted in fact from the JavaScript programming language itself. [JSON] (JavaScript Object Notation) rapidly became popular with JavaScript developers. JSON had an innate advantage in that it could bypass same-origin requirements, but its use has spread far beyond those situations.

JSON uses a different syntax, but much more importantly, the nature of the conversation also shifted. Perhaps because it comes from a free-wheeling JavaScript background, expectations of structure have always been loose. Coordination can happen, but reuse and modification is a more common pattern than formal structuring. Many JSON formats are created by single information hubs, rather than across groups of providers, and conversion to internal formats is just a normal fact of life for JSON data consumption.

JSON, while somewhat less readable to humans than markup, was both easy to work with in a JavaScript context and compatible with (actually a subset of) the [YAML] data serialization supported by a few other languages. JSON was just a data format, a means for developers to pass information from one program to another. Although JSON schemas and JSON transformation tools exist, they are relatively minor corners of JSON culture.

Despite those glaring absences, JSON use continues to expand rapidly. It replaced XML as a default format in Ruby on Rails, and dominates current Ajax usage. Perhaps more striking, it is becoming more common in public use, exactly the territory where prior agreement was deemed most important [Bye XML]. It hasn't replaced XML in that space yet, but is claiming a larger and larger share. Documentation and samples, it seems, are enough.

So why stick with markup and not just leap to JSON's more open approach? Mostly because of the tools described in the previous section. Markup understands transformation and decoration better than JSON. Despite its largely schema-free world, JSON is still primarily about tight binding to program structures. The schemas are invisible and often unspecified, but they still exist when a document is loaded.

However, JSON programmers do plenty of transformation internally. They base their expectations of schemas more on sources than on documents, and some have gone so far as to establish simple source-testing regimens that warn them of change. Source-based versioning is also common in the world of JSON APIs. Rather than the URI of a namespace changing or the details of a schema, new versions of APIs are often simply hosted at new URLs, with the changed content coming from a new location to give developers time to adapt.
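
A source-testing regimen of this kind can be very small. The Python sketch below is one guess at what it might look like, not a description of any particular team's practice: it reduces a JSON value to its structure and lists the differences between the structure seen last time and the structure seen now. The function names and sample payloads are invented.

```python
import json


def shape(value):
    """Reduce a JSON value to its structure: key names and type names."""
    if isinstance(value, dict):
        return {key: shape(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Describe a list by the shape of its first item, if any.
        return [shape(value[0])] if value else []
    return type(value).__name__


def shape_changes(old, new, path="$"):
    """List readable differences between two shapes."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(old.keys() | new.keys()):
            here = f"{path}.{key}"
            if key not in new:
                changes.append(f"{here} disappeared")
            elif key not in old:
                changes.append(f"{here} is new")
            else:
                changes.extend(shape_changes(old[key], new[key], here))
        return changes
    if isinstance(old, list) and isinstance(new, list) and old and new:
        return shape_changes(old[0], new[0], path + "[]")
    return [] if old == new else [f"{path} changed from {old!r} to {new!r}"]


last_week = json.loads('{"user": {"id": 7, "name": "Ada"}, "tags": ["a"]}')
today = json.loads(
    '{"user": {"id": "7", "name": "Ada", "email": "a@example.org"}, "tags": ["a"]}'
)
warnings = shape_changes(shape(last_week), shape(today))
```

Here the check reports that an email field has appeared and that the id has turned from a number into a string. It warns rather than rejects, leaving the decision about what to do with the consumer.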

JSON's curly braces may earn sneers from XML developers who prefer their angle brackets, but JSON is doing more with less.


XML was supposed to reach the browser. Mostly, it didn't, but the current state of the browser has much to teach XML. Some of that is ugly, of course. Even without a hard focus on schemas, the power of the browser vendors and the continuing insistence on standardization have limited possibilities. Here we see that it is not only the formalization of schemas but the cultural values surrounding them that create brittleness and stifle experiments.

However, that very brokenness, failed versioning, and the lingering hangovers of old software led to the development of a new software pattern, the polyfill [Sharp 2010]. Polyfills are JavaScript code (sometimes combined with HTML and CSS) that quietly extend a browser to support JavaScript libraries that are missing or markup that it doesn't understand. Building cross-browser polyfills is tricky, but not that much more difficult than creating cross-browser frameworks [Osmani 2011]. Even the limited world of HTML5-specific polyfills is vast [Polyfills List].

While the HTML+CSS+JavaScript architecture is extremely flexible, there are some barriers, largely created by browser makers' efforts to optimize bandwidth and processing. Efforts to create a picture element - combining concerns of responsive design and human interaction - have faced challenges around timing, pre-loading in particular. Media processing is (as usual) one of the hardest challenges in working with the browser environment. Establishing communications and respect between vendors and those using their products is perhaps even more difficult.

The W3C and browser vendors are working to address those challenges as well as create new frameworks that make polyfills easier to build and more efficient. The Shadow DOM [Shadow DOM] and Web Components [Web Components] work both aim directly at making polyfills a more standard part of web development environments. Google's recent work on [Polymer] is an example of a browser vendor pushing hard in this space.

Separately, the Extensible Web Manifesto [Manifesto 2013] encourages this work as a way to shift much work done in JavaScript to work done in markup:

We want web developers to write more declarative code, not less. This calls for eliminating the standards bottleneck to introducing new declarative forms, and giving library and framework authors the tools to create them.

While XML processing models are typically different, offering no tools to extend processors at the document creator's discretion, this approach could prove useful in situations where processors have chosen to place more trust in users. It also clearly offers an option to developers tired of waiting for the W3C, the WHATWG, and the browser vendors to add markup functionality to HTML5.

Relational Decline

Structured data has had a difficult few decades in general. While XML's schemas defined structure, relational database purists (most notably Fabian Pascal) heaped scorn on XML's daring to step outside the sharply defined boundaries of RDBMS tables and joins. Much of the pressure for XML Schema's insistence on deterministic structures and strongly typed data came from communities who considered the constraints in 1990s RDBMS practice to be a good thing - but XML's very success was a key factor in making clear that the relational model was not the only possible story for data.

The challenges of scaling within the constraints of Atomicity, Consistency, Isolation, and Durability (ACID) led to several rapid generations of change in the database community. While there are probably more relational databases deployed today than there were when XML appeared, the NoSQL movement has ended the era when developers only chose among relational databases unless their project was extremely unusual.

This shift has little direct effect on markup processing, but it does reduce the cultural pressures to only create data structures conforming to a well-known schema.


REST is a communications style based on HTTP's limited number of methods, treating those constraints as a virtue teaching us to build with few verbs and many nouns. There is nothing in REST-based work specific to schemas - schemas (or the lack thereof) are a detail of the work that happens inside the local processors.

However, in contrast to their more RPC-based predecessors, which emerged from the CORBA and object-oriented worlds, this lack of specification is still a significant opening. A minimal set of verbs makes it much easier to process a much larger set of nouns, with fewer expectations set up front.

Strictly Local Uses of Schemas

Some developers and organizations see schemas as a limited-use tool, applied primarily in a local context to reflect local quality assurance and document creation needs.

Since the late 1990s, I've suggested that my consulting customers think of a schema not as an integral part of XML document/data exchange, but as a special auxiliary file that can fill up to two supporting roles:

1. A special stylesheet that renders a boolean value (valid/not valid) for QA.

2. A template for rare and high-specialized structured-authoring applications.

If you don't need at least one of those, then you probably don't need a schema.

— David Megginson [Correspondence 2013]

So long as authoring applications expect schemas as input, schemas will be necessary. There are other ways to do quality assurance, of course, but schemas are common. Will developers resist the temptation to apply schemas more broadly than these cases, when tools and practice point them that direction?

Transition Components - DSDL and MCE

Much of the best thinking in markup schemas has worked under the banner of DSDL. RELAX NG, Schematron, and some more obscure pieces demonstrate more flexible alternatives to the W3C's XML Schema. Namespace Validation Dispatching Language (NVDL) finally offers tools for mixing validation approaches based on the namespace qualifiers applied to content. Document Schema Renaming Language (DSRL) offers a simple transformation approach for comparing documents to local schemas. These parts are still tightly bound to schema validation approaches, but they at least add flexibility and add more options.

Markup Compatibility and Extensibility (MCE), coming out of the Office Open XML work, finally asks hard questions about different degrees of "understanding" a document:

Attributes in the Markup Compatibility namespace shall be either Ignorable, ProcessContent, ExtensionElements, or MustUnderstand. Elements of the Markup Compatibility namespace shall be either AlternateContent, Choice, or Fallback.

As Rick Jelliffe describes it:

This is a kind of having your cake and eating it too, you might think; the smart thing that gives it a hope of working is that MCE also provides some attributes PreserveElements and PreserveAttributes which let you (the standards writer or the extended document developer) list the elements that do not need to be stripped when modifying some markup.

I think standards developers who are facing the cat-herding issue of multiple implementations and the need for all sorts of extensions should seriously consider the MCE approach.

Jellife 2009
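
To give a feel for the AlternateContent mechanism, here is a heavily simplified Python sketch. It handles only AlternateContent, Choice (with its Requires attribute), and Fallback; it ignores Ignorable, ProcessContent, and MustUnderstand, treats prefix bindings as document-wide, and drops whitespace around the replaced element. The v2 namespace, the sparkline element, and the function names are invented for illustration, so treat this as a reading aid and not as a conforming MCE processor.

```python
import io
import xml.etree.ElementTree as ET

MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def parse_with_prefixes(xml_text):
    """Parse, and also collect the prefix bindings ElementTree discards."""
    prefixes, root = {}, None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # the first element started is the root
    return root, prefixes


def resolve(parent, prefixes, understood):
    """Replace each mc:AlternateContent with the content of the first
    mc:Choice whose required namespaces are understood, else Fallback."""
    for index, child in reversed(list(enumerate(parent))):
        if child.tag != f"{{{MC}}}AlternateContent":
            continue
        chosen = []
        for option in child:
            if option.tag == f"{{{MC}}}Choice":
                needed = option.get("Requires", "").split()
                if all(prefixes.get(p) in understood for p in needed):
                    chosen = list(option)
                    break
            elif option.tag == f"{{{MC}}}Fallback":
                chosen = list(option)
                break
        parent.remove(child)
        for offset, el in enumerate(chosen):
            parent.insert(index + offset, el)
    for child in parent:
        resolve(child, prefixes, understood)


doc = """<doc xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
     xmlns:v2="http://example.org/v2">
  <mc:AlternateContent>
    <mc:Choice Requires="v2"><v2:sparkline data="1 4 2"/></mc:Choice>
    <mc:Fallback><p>Trend: up</p></mc:Fallback>
  </mc:AlternateContent>
</doc>"""


def child_tags(understood):
    root, prefixes = parse_with_prefixes(doc)
    resolve(root, prefixes, understood)
    return [el.tag for el in root]


new_consumer = child_tags({"http://example.org/v2"})
old_consumer = child_tags(set())
```

A consumer that understands the v2 namespace sees the sparkline; one that does not sees the fallback paragraph. The same document serves both without either side having agreed to a shared schema in advance.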

Examplotron: A Bridge?

[Examplotron] is in fact a schema language, but it is also a possible bridge between expectations from the Age of Schemas and other possibilities. It uniquely combines communications through sample documents with the possibility of validation processing, and seems like a base for describing further transformations. To the extent that a schema technology could work successfully within System-A, Examplotron is clearly the strongest candidate. (Schematron isn't too far behind, but lacks the document-sharing orientation.)

Toward a New Practice

While these many pieces have been opening doors for a long time, they tend to be used in isolation, or in contexts where schema and agreement still rule, if quietly. While no one (to my knowledge) has yet combined them to create a model of networked markup communications operating in Alexander's System-A, there are now more than enough pieces to sketch out a path there.

The first step, however, is more difficult than any of the technical components. System-A requires a shift in priorities, from industrial efficiency to local adaptation. Despite the propaganda for "embracing change" over the last few decades, actually valuing changefulness at the conversation level is more difficult. Schemas constrain it, as does the software we typically build around schemas. The stories around markup have valued the static far more than the changeful.

Change this. Value changefulness, and yes, even savageness, and let the static go.


Many organizations will consider all of these suggestions inappropriate, unless perhaps they can be shown to provide massive cost savings within a framework they think they can manage with their current approach. As the cost savings are likely to be modest, and this style of processing a difficult fit with their organizational style, that is unlikely. I make no claims that this approach can make inroads with organizations that have regimentation, hierarchy, and control near the top of their management value system. Obsessed with security? This is probably not for you.

Adding System-A to a System-B context is especially difficult.

So what does valuing changefulness look like in practice? It changes your toolset from schemas to transformations. It means recognizing that schemas are in fact a weak transformation, converting documents to a binary valid/not result with an optional annotation system. It demands shifting to a model in which you expect to perform explicit transformations on every document you handle. It demands taking Alexander's model of local adaptation seriously.
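
The contrast can be made concrete with a small Python sketch. Both functions below are illustrations built on assumptions: the REQUIRED list, the contact target format, and the fullName-to-name mapping are invented, and the validate function is a stand-in for schema validation rather than a real validator.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("name", "email")  # hypothetical local expectations


def validate(xml_text):
    """The schema view: the whole document collapses to one boolean."""
    root = ET.fromstring(xml_text)
    return all(root.find(tag) is not None for tag in REQUIRED)


def adapt(xml_text):
    """The transformation view: build the local form from whatever is
    usable, and report the rest instead of rejecting the document."""
    root = ET.fromstring(xml_text)
    local = ET.Element("contact")
    leftovers = []
    for child in root:
        # Local adaptation: accept a sender's fullName as our name.
        tag = "name" if child.tag in ("name", "fullName") else child.tag
        if tag in REQUIRED:
            ET.SubElement(local, tag).text = child.text
        else:
            leftovers.append(child.tag)
    return ET.tostring(local, encoding="unicode"), leftovers


incoming = ("<person><fullName>Ada Lovelace</fullName>"
            "<email>ada@example.org</email><pet>cat</pet></person>")
is_valid = validate(incoming)
local_form, leftovers = adapt(incoming)
```

The validation step says only that the document fails. The transformation produces a usable local document and a list of what it did not understand, which is a starting point for conversation with the sender.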

If it's any comfort, XML developers are not the only ones facing this change. Dave Thomas suggests:

"It's clear (to me, at least) that the idea of programming is no longer the art of maintaining state. Instead, we're moving towards coding as a means of transforming state. Object Orientation has served us well with state-based programming. But now we need to move on." [Thomas 2013]

So what does transformation look like? It operates on several levels, but perhaps it is easiest to start with negotiation - the piece of the puzzle that schemas were supposed to let us consider at least temporarily solved.

Negotiation Style

Negotiation in the schema sense is typically a gathering of opinions, working together to hammer out a schema for future communications. To avoid the cost of holding that conversation frequently, it makes sense to gather as many voices as possible, though that "as many" is often slashed by requiring that the voices be 'expert'. After all, too many people in a conversation also add delay, often for little benefit. The result is a formal structure that can be used to share data, a form of contract among the participants and anyone else who wishes to join that conversation.

Most forms of negotiation, however, are much less formal than a diplomatic council or standards organization. People bargain constantly, and not just over prices. The forms of communications within business change constantly, often fitting the chaotic model of shared spreadsheets much more readily than the formal structure of relational databases. While spreadsheets carry all the headaches of individual files that are easily misplaced, there is nothing inherently superior about a tightly-structured centralized database. The advantages of databases emerge almost exclusively when large quantities of data need to be shared according to a formal process. Telling a story with a database typically requires sifting and selecting its information to be presented in a different form.

The negotiation style of spreadsheets typically works according to a simple process: "send me what works for you, and I'll see if I can make it work." There may be massive gaps of spreadsheet prowess, business understanding, tools choice, or sheer power between the participants, but having a concrete basis for the conversation generally eases those, or leads to a request for simplification and a broader conversation.

Exchanging XML documents has a similar concrete effect. Marked up documents are not that difficult to explore so long as you have a rough familiarity with the language used for the markup or willingness to consult a dictionary. Just as there might be questions about how a spreadsheet is structured, there can be conversation about markup structure choices. Just as the contents of a spreadsheet can be questioned, there is room to discuss whether the model the spreadsheet uses for a particular problem is the appropriate one. Just as with spreadsheets, there is rarely need to ask for a deeply formalized representation of the underlying model.

XML has one major advantage over a spreadsheet, however - it is far, far easier to extract information from XML documents than from spreadsheets. Writing code that extracts content from person A's spreadsheet and reliably places it in person B's different spreadsheet is possible, but difficult. Changes make it even harder. Spreadsheets weren't built with that in mind. Markup was.
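
As a rough illustration of how little code that extraction can take, here is a minimal Python sketch using the standard library's ElementTree. The two order vocabularies, the element names, and the internal record shape are all invented for this example; they stand in for whatever persons A and B actually send.

```python
import xml.etree.ElementTree as ET

# Two hypothetical senders describing the same order in different vocabularies.
DOC_A = "<order><customer>Ada</customer><total currency='USD'>12.50</total></order>"
DOC_B = "<purchase buyer='Ada'><amount>12.50</amount><cur>USD</cur></purchase>"


def extract_a(root):
    """Pull the local record out of sender A's (assumed) structure."""
    return {
        "customer": root.findtext("customer"),
        "total": float(root.findtext("total")),
        "currency": root.find("total").get("currency"),
    }


def extract_b(root):
    """Pull the same local record out of sender B's (assumed) structure."""
    return {
        "customer": root.get("buyer"),
        "total": float(root.findtext("amount")),
        "currency": root.findtext("cur"),
    }


record_a = extract_a(ET.fromstring(DOC_A))
record_b = extract_b(ET.fromstring(DOC_B))
```

Both functions yield the same local record. Neither sender had to agree on a shared vocabulary first, and a change on one side touches only that side's extraction function.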

Extending it ourselves

In one context, this is easy. Because the HTML world allows developers to send code along with their markup, the situation in browsers is actually simpler than the world where markup is the sole content of messages. Markup in the browser is surrounded by opportunities to explain what it is and even to do something with it. (Markup plus logic is at least as capable as spreadsheet cells and logic, after all!)

The picture polyfill debacle seems to tell the story of callous browser vendors who have their own bad ideas blowing off the people who use their projects most intensively. In the long run, however, it is more of a pointer to the way things can be. CSS and JavaScript have developed to the point where they can tell the story of markup, with HTML necessary - strictly speaking - for a very few tasks like form fields. ARIA can provide metadata supporting accessibility, and if things are really tough, XSLT (and JavaScript implementations of it) are now available to transform documents in whatever markup vocabulary to an HTML equivalent, perhaps enhanced with SVG or Canvas.

Browser vendors get lots of press, good and bad, for implementing new features and new APIs. The quiet reality, though, is that there is no longer any good reason for them to control our markup vocabularies. We can do that, and we can do it today.

Changes Inside

Supporting a more flexible negotiation style in cases where documents travel without supporting logic requires a major break with prior models of development.

Most software for processing documents created by markup - not all, but by far the majority - follows a consistent process. A parser reads a document, checking its structure against syntactic and possibly structural expectations. During or after that reading, if the document passes, the data is bound to internal structures, possibly using additional information added to the document by the testing process.

Perhaps most important, however, isn't the nature of the parser (generic or not), the schema type used, or the nature of the data binding. The most important aspect of this is that each receiving process only supports a single or a few vocabularies. (The primary exception is for completely generic processing, which doesn't care and generally doesn't know what kind of markup it's dealing with. This is common in editors and some kinds of filters.) Extending that support, changing the vocabulary expectations, requires programmer intervention.

Avoiding programmer intervention has been the foundation of data processing for decades. Much of the joy around schemas celebrated that schemas themselves could be used to create code for handling markup vocabularies. Yes, of course you still needed to add business logic to that code, but generated code would reduce the time needed to write, and adoption of common schemas would even allow the sharing of at least a portion of the business logic code over time.

The design choices in schema and schema culture come from the dream of a machine that would run by itself, with only periodic human intervention when updates were necessary, plus the usual maintenance now considered the cost center of IT.

So what might a different model look like?

The machine doesn't run by itself. Human intervention is routinely acceptable, and facilitated by a toolset that assumes that humans are available to do the mapping previously performed by schemas, data-binding, and similar approaches. All participants in a conversation have their own processing structures, but those structures are explicitly adapted to internal needs, not external conversation. (Yes, it is possible that one of those internal needs will be to facilitate external conversation, but that need should not set the terms in most cases.)

Human intervention isn't required for every message, however. Over time, such a processing system will develop a library of known transformations to and from external vocabularies and in particular to and from known senders. Intervention arises when new or ambiguous information arrives. A new party has joined the conversation, or a previously known sender has added or deleted sections of a document. In short, intervention is necessary when prior mapping failed.
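
One way to picture such a library is sketched below in Python. It is only a sketch under stated assumptions: the class names, the choice to key mappings on sender plus root element, and the sample "acme" order vocabulary are hypothetical, not drawn from any existing tool.

```python
import xml.etree.ElementTree as ET


class MappingFailed(Exception):
    """Raised by a transformation when a document no longer fits it."""


class MappingLibrary:
    """Known transformations, keyed by (sender, root element name)."""

    def __init__(self):
        self._mappings = {}
        self.needs_human = []  # messages waiting for a person to look at them

    def register(self, sender, root_tag, transform):
        self._mappings[(sender, root_tag)] = transform

    def receive(self, sender, xml_text):
        root = ET.fromstring(xml_text)
        transform = self._mappings.get((sender, root.tag))
        if transform is None:
            # A new party, or a new kind of document: ask a human.
            self.needs_human.append((sender, xml_text, "no known mapping"))
            return None
        try:
            return transform(root)
        except MappingFailed as problem:
            # A known sender changed something: ask a human.
            self.needs_human.append((sender, xml_text, str(problem)))
            return None


def acme_order(root):
    """Hypothetical mapping from one sender's order to a local record."""
    customer = root.findtext("customer")
    if customer is None:
        raise MappingFailed("expected a <customer> element")
    return {"customer": customer}


library = MappingLibrary()
library.register("acme", "order", acme_order)
```

Messages that match a registered mapping pass through untouched by human hands. Everything else lands in `needs_human` with a short reason attached, which is the point where a person decides whether to write a new mapping or adjust an old one.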

Intriguingly, this approach can fulfill one of the failed promises of XML Schema:

When XML is used to exchange technical information in a multi-vendor environment, schemas will ... help applications know when it is safe to ignore information they do not understand, and when they must not do so. This means schemas may help make software more robust and systems more able to change and adapt to evolving situations.

W3C 1999

Schemas can flag information as outside of the schema and still pass it on, but they have no way to determine the 'safety' of such information or to encourage change, unless someone is monitoring the schema processing closely. By contrast, such cases are not anomalies in the mapping system - they are just day to day business.

But wait, isn't there still a schema? Isn't the transformation from external representation to internal structure a hidden schema? HTML5 style? Perhaps, especially if there is some kind of document describing that transformation that can be shared. However, unlike the schemas that currently dominate the markup world, these transformations don't describe a single document structure. They describe a set of possible mappings between a document structure and a specific internal structure. That substantially reduces the prospects for (abuse) reuse.

Also, while it is possible that a transformation would return a binary succeeded/failed response like the valid/not-valid result of most schema processing, there is a much broader range of possibilities. Partial mappings may be completely acceptable. Failures may simply be prompts to create new mappings.
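
A minimal Python sketch of a result richer than valid/not-valid might look like the following. The `MappingResult` type, the three status names, and the flat tag-to-field rule table are assumptions made for illustration; a real mapping would likely need more than one level of structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class MappingResult:
    mapped: dict = field(default_factory=dict)    # internal field -> value
    unmapped: list = field(default_factory=list)  # incoming tags with no rule yet
    missing: list = field(default_factory=list)   # expected tags that never arrived

    @property
    def status(self):
        if not self.mapped:
            return "failed"
        if self.unmapped or self.missing:
            return "partial"
        return "complete"


def apply_rules(xml_text, rules):
    """Map child elements to internal fields; `rules` is tag -> field name."""
    root = ET.fromstring(xml_text)
    result = MappingResult()
    seen = set()
    for child in root:
        if child.tag in rules:
            result.mapped[rules[child.tag]] = (child.text or "").strip()
            seen.add(child.tag)
        else:
            # Unknown material is recorded and passed along, not rejected.
            result.unmapped.append(child.tag)
    result.missing = [tag for tag in rules if tag not in seen]
    return result


rules = {"customer": "name", "total": "amount"}
result = apply_rules(
    "<order><customer>Ada</customer><giftwrap>yes</giftwrap></order>", rules
)
```

Here the result is "partial": the customer name was mapped, `giftwrap` is reported as something no rule covers yet, and `total` is reported as missing. Whether that is good enough to proceed is a local decision, not something the mapping decides on its own.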

The cost of programmers, though. What about the cost of programmers? Their time isn't cheap, and we have other things for them to do! Even Bray and Bosak cautiously warned us that programmers were necessary, and schemas would only reduce the amount of time they needed!

These transformations aren't usually that difficult. Simple transformations are easily designed through basic interfaces. Complex transformations may require template logic on the order of XSLT. Especially difficult transformations, particularly of compact markup styles, might require an understanding of regular expressions.
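
To give a sense of scale for the regular expression case, here is a small Python sketch that unpacks a compact attribute value into separate fields. The `panel` element, its `size` attribute, and the "width x height unit" notation are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed compact notation: "<width> x <height><unit>", e.g. "20 x 30.5cm".
SIZE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*(mm|cm|in)\s*$")


def expand_size(element):
    """Return width, height, and unit from a compact size attribute, or None."""
    match = SIZE.match(element.get("size", ""))
    if match is None:
        return None  # not the notation we know; leave it for a human
    width, height, unit = match.groups()
    return {"width": float(width), "height": float(height), "unit": unit}


panel = ET.fromstring('<panel size="20 x 30.5cm"/>')
size = expand_size(panel)
```

The pattern is the only intricate part, and it is a single line. A value that does not match comes back as `None` rather than raising, so it can be routed to a person instead of halting processing.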

Except perhaps for especially twisted regular expressions, however, none of these skills requires what we pretend today is a "rock star" programmer. There are many contexts, but the home context is relatively simple, and transformations themselves are more intricate than deep. While establishing this model might require regular contributions from experts, and having experts available is useful, over time the need for expertise should decline. Transformation should not be as difficult as, say, natural language translation.

Rehumanizing Electronic Communications

Changing tracks is difficult, especially when an entire technology culture has been built on values of industrial and bureaucratic efficiency aimed at minimizing human involvement in small but constant processing decisions. Pioneers created vocabularies for the purpose of controlling a market and leaping ahead of competition, hoping that the resulting brittle processing model would make it too difficult to refactor their advantage away.

However, rather than striving for maximum automation, developers have the opportunity to aim for systems that are both more flexible and more human than current models allow. Developers can build on a model of automating what humans hate to do, rather than automating everything. The three great virtues of the Desperate Perl Hacker displaced by XML's arrival - laziness, impatience, and hubris - can take their rightful and continual place in a community of processing rather than disappearing in a cloud of efficiency. Automate what is convenient for humans communicating, not entire processes.

That changes the role of markup experts as well. Instead of consulting on vocabularies and leaving others to implement them, we have to take a more active role on a smaller number of projects:

Let us imagine building five hundred houses. In today's normal method of building large-scale housing projects, one architect and one contractor often control a rather large volume of houses or apartments...

The architect-builder... has greater powers but more limited domain. Any one architect builder may control no more than twenty houses at a time, but he will take full responsibility for their design and construction, and he will work far more intensely with the individual families, and with the individual details of their houses. Thus, in this model of construction, both design and construction are decentralized....

Of course, this means that the architect-builders play a different role in society. There are more of them - taking the place of the alienated construction workers and architectural draftsmen who now provide the manpower to make the centralized system work...

We envisage a new kind of professional who is able to see the buildings which he builds as works of love, works of craft, individual; and who creates a process in which the families are allowed, and even encouraged, to play their natural role in helping to lay out their houses and helping to create their own community.

Alexander 1985, pages 76-8.

Some of us work this way already, of course.

Working this way also allows us to focus on one of Alexander's most general patterns in his quest for local adaptation: site repair.

Buildings must always be built on those parts of the land which are in the worst condition, not the best....

If we always build on that part of the land which is most healthy, we can be virtually certain that a great deal of the land will always be less than healthy. If we want the land to be healthy all over - all of it - then we must do the opposite. We must treat every new act of building as an opportunity to mend some rent in the existing cloth; each act of building gives us the chance to make one of the ugliest and least healthy parts of the environment more healthy - as for those parts which are already healthy and beautiful - they of course need no attention. And in fact, we must discipline ourselves most strictly to leave them alone, so that our energy actually goes to the places which need it. This is the principle of site repair.

Alexander 1977, pages 509-10.

This means continuous mending, continuous repair, by people close enough to the ground to see what needs work. Continuous refactoring has entered the programming lexicon, and needs to become a dominant part of the markup lexicon.

In a world saturated with the message of industrial efficiency and in a computing culture deeply soaked in expectations of standardizing away as many interactions as possible, this Alexanderine model will be at least as difficult to achieve as it is in architecture. Possible on the margins, yes, perhaps in experiments, but not to be taken seriously. That is a normal starting point for this conversation, and this paper is just an opening.

Markup workers of the world, unite! You have nothing to lose but your chains.


In addition to the Balisage reviewers, I would like to thank Mark Bernstein, Mike Amundsen, Matt Edgar, Walter Perry, and the xml-dev list for helpful reviews, inspiration, and sparring.


[Alexander 1977] Alexander, Christopher, et al. A Pattern Language: Towns, Buildings, Construction. New York: Oxford University Press, 1977.

[Alexander 1979] Alexander, Christopher. The Timeless Way of Building. New York: Oxford University Press, 1979.

[Alexander 1985] Alexander, Christopher, et al. The Production of Houses. New York: Oxford University Press, 1985.

[Alexander 1996] Alexander, Christopher. "The Origins of Pattern Theory, the Future of the Theory, and the Generation of a Living World."

[Alexander 2012] Alexander, Christopher, et al. The Battle for the Life and Beauty of the Earth. New York: Oxford University Press, 2012.

[Amazon Mechanical Turk] Amazon Mechanical Turk.

[Beck 2011] Beck, Jeff. "The False Security of Closed XML Systems." Presented at Balisage: The Markup Conference 2011, Montréal, Canada, August 2 - 5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). doi:10.4242/BalisageVol7.Beck01.

[Semantic Web Architecture] Berners-Lee, Tim. "Semantic Web - XML 2000- slide 'Architecture'"

[Bosak and Bray 1999] Bosak, Jon, and Bray, Tim. "XML and the Second-Generation Web". Scientific American, May 1999. Pages 89-93. doi:10.1038/scientificamerican0599-89.

[TEI Pizza Chef] Burnard, Lou, and Sperberg-McQueen, C. Michael. "TEI Pizza Chef".

[Butterick 2013] Butterick, Matthew. "The Bomb in the Garden"

[Cargill 2011] Cargill, Carl. "Why Standardization Efforts Fail".

[Connolly 1997] Connolly, Dan, et al. "The Evolution of Web Documents: The Ascent of XML", in XML: Principles, Tools, and Techniques. Sebastopol, CA: O'Reilly Media, 1997.

[Web Components] Cooney, Dominic, and Glazkov, Dmitri. Introduction to Web Components.

[JSON] Crockford, Douglas. "Introducing JSON".

[Cushing 2012] Cushing, Ellen. "Dawn of the Digital Sweatshop".

[DeRose and Durand 1994] DeRose, Steven and Durand, David. Making Hypermedia Work: A User's Guide to HyTime. Boston: Kluwer Academic Publishers, 1994. doi:10.1007/978-1-4615-2754-1.

[DCMI] Dublin Core Metadata Initiative. "DCMI Specifications".

[Bye XML] DuVander, Adam. "".

[YAML] Evans, Clark. "YAML: YAML Ain't Markup Language".

[Prollyfill] Extensible Web Community Group.

[Manifesto 2013]

[Gamma 1995] Gamma, Erich, et al. Design Patterns: Elements of Reusable Object-Oriented Software. Boston: Addison-Wesley, 1995.

[Goldfarb 1990] Goldfarb, Charles. The SGML Handbook. Oxford: Oxford University Press, 1990.

[Polyfills List] "HTML5 Cross Browser Polyfills".

[DSDL] ISO/IEC 19757 - DSDL. "Document Schema Definition Languages".

[Jehl 2012] Jehl, Scott. "Polyfilling picture without the overhead".

[Jellife 2009] Jelliffe, Rick. "Safe Plurality: Can it be done using OOXML's Markup Compatibility and Extensions mechanism?".

[Jellife 2012] Jelliffe, Rick. "XML's Dialect Problem: Diversity is not the problem; it is the requirement".

[Katz 2013] Katz, Yehuda. "Extend the Web Forward".

[Marquis 2012] Marquis, Mat. "Responsive Images and Web Standards at the Turning Point".

[XML Architectural Forms] Megginson, David. "XML Architectural Forms".

[Correspondence 2013] Megginson, David. Facebook correspondence.

[Morris 1892] Morris, William. "Preface to the Nature of Gothic", in The Nature of Gothic, London: Kelmscott Press, 1892.

[Osmani 2011] Osmani, Addy. "The Developer’s Guide To Writing Cross-Browser JavaScript Polyfills".

[Perry 2002] Perry, Walter. "Standard Data Vocabularies Unquestionably Harmful".

[Polymer] Polymer Project.

[Ruskin 1853] Ruskin, John. "The Nature of Gothic", in The Stones of Venice, Volume II: The Sea-Stories. Mineola: Dover Phoenix Editions, 2005.

[MCE] "Semantics of MCE".

[Shadow DOM] Shadow DOM.

[Sharp 2010] Sharp, Remy. "What is a Polyfill?"

[Smith 2013] Smith, Michael [tm]. "Getting agreements is hard (some thoughts on Matthew Butterick’s “The Bomb in the Garden” talk at TYPO San Francisco)".

[Spuybroek 2011] Spuybroek, Lars. The Sympathy of Things: Ruskin and the Ecology of Design. Rotterdam: V2_Publishing, 2011.

[Thomas 2013] Thomas, Dave. 4/10/13 8:39pm, elixir-lang

[Usdin 2009] Usdin, B. Tommie. "Standards considered harmful." Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:10.4242/BalisageVol3.Usdin01.

[Examplotron] van der Vlist, Eric. "Examplotron".

[Facebook Censorship] Webster, Stephen C. "Low-wage Facebook contractor leaks secret censorship list".

[W3C 1999] World Wide Web Consortium. "World Wide Web Consortium Releases First Working Drafts of XML Schema Specification"

[W3C 2001] World Wide Web Consortium. "World Wide Web Consortium Issues XML Schema as a W3C Recommendation".

[WAI-ARIA] World Wide Web Consortium. "WAI-ARIA Overview"

[HTML5 Data Attributes] World Wide Web Consortium. "HTML5 -- A vocabulary and associated APIs for HTML and XHTML", Embedding custom non-visible data with the data-* attributes.

[XML Booster] XML Booster.

Author's keywords for this paper: Markup; Schemas; Architecture; Transformation; Polyfill

Simon St.Laurent

Senior Editor

O'Reilly Media, Inc.

A troublemaker, Simon St.Laurent has been working with XML since the early drafts of the specification. His first book on XML, XML: A Primer, went through three editions, each time teaching a new group of developers a variety of bad ideas. Apparently the example using XML to manage lighting inspired several protocols for excessively complicated control systems. His book Cookies may be partially responsible for the erosion of privacy. His other books have done less damage because they haven't sold as well, but he fears that Introducing Erlang and Introducing Elixir may prove to be contributing factors in the development of Skynet.

His more positive contributions include a partially-completed book on hand tool woodworking, various writings on Quakerism, and two adorable children. He has lately become obsessed with hospitality and craft, leading to binges of repentance for past (and current) work.