Markup specialists and their predecessors have wasted decades creating works that
open possibilities in the short run but close them in the long run. The continuous
headaches of versioning and differentiating vocabularies are a symptom of our failure,
of the brittleness we have so enthusiastically embraced.
Seven years ago, speaking on an XML panel at a Web conference, I told attendees to
go experiment with vocabularies, and try new paths. The browser universe was too
constrained, I said, too bound up with ideas about validation, whether HTML or XML
or something else. No one seemed enthusiastic about that advice, and I had startled
myself by recommending it so seriously.
There was no escape, though - it was the right advice, and continues to be.
Much of the markup world has actually turned to experimenting, building very different
structures around their work. They mix social organization that distributes decision-making
more widely with technical approaches, many of them applying old tools but reflecting
enhancements to processing environments.
Original Sin: Deep in the Standards
Markup made mistakes early. The culture of agreements first, processing later, appeared
in the earliest standards, indeed in the ISO approach to standardization. As Len
Bullard described the attitude more recently:
Contracts have to care. If they don’t then the humans won’t.
— xml-dev, 4/9/13 10:36pm
That paranoia, that intrinsic doubt about what humans will do, has percolated deep
into markup technologies.
The Legalistic Applications of SGML
SGML required documents to come with a declaration of their structure. Brittleness
was built into the system. Documents were not meant to exist by themselves, but rather
as part of an SGML application, defined as such:
4.279 SGML Application: Rules that apply SGML to a text processing application. An
SGML application includes a formal specification of the markup constructs used in
the application, expressed in SGML. It can also include a non-SGML definition of
semantics, application conventions, and/or processing.
1. The formal specification of an SGML application normally includes document type
definitions, data content notations, and entity sets, and possibly a concrete syntax
or capacity set.... (SGML Handbook, 126)
The SGML Handbook notes that SGML applications needn't actually be "applications" as most computer
users expect them:
Indeed, there are publishing situations where an SGML application can be useful with
no processing specifications at all (not even application-specific ons [sic]), because
each user will specify unique processing in a unique system environment. The historical
explanation for this phenomenon is that in publishing (unlike, say, word processing
before the laser printer), the variety of potential processing is unlimited and should
not be constrained.
— Goldfarb 1990, page 130
But while processing should not be constrained, document structure must be. SGML
documents must contain a DOCTYPE declaration, which must in turn reference (or include)
a DTD, and those DTDs rapidly became the battleground over which the users of SGML fought.
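The shape of that arrangement survives in XML. As a sketch (element names are hypothetical, and Python's stdlib parser is non-validating, so it reads the declaration without enforcing it), a document carrying its own structural contract looks like this:

```python
import xml.etree.ElementTree as ET

# A document carrying its own structural contract: a DOCTYPE with an
# internal DTD subset declaring which elements may appear where.
doc = """<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo><to>Ed</to><body>Ship it.</body></memo>"""

# expat (behind ElementTree) reads the subset but does not validate
# against it; a validating parser would reject deviations from the DTD.
root = ET.fromstring(doc)
print(root.tag, [child.tag for child in root])
```

A validating SGML system would treat the declarations as binding; here they are simply carried along with the document.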
Ground Rules: XML
Those battles - or perhaps it is nicer to say negotiations - carried over into the
XML world. Although XML allowed documents to go forth boldly naked without a DTD,
the expected approach for its use still involved prior agreement. Citing Lewis Carroll's
Humpty Dumpty, the W3C's Dan Connolly warned of the dangers of diverse semantics:
For any document to communicate successfully from author to readers, all parties concerned
must agree that words all choose them to mean. [sic] Semantics can only be interpreted
within the context of a community. For example, millions of HTML users worldwide agree
that <B> means bold text, or that <H1> is a prominent top-level document heading...
When communities collide, ontological misunderstandings can develop for several reasons...
The best remedy is to codify private ontologies that serve to identify the active
context of any document. This is the ideal role for a well-tempered DTD. Consider
two newspapers with specific in-house styles for bylines, captions, company names,
and so on. Where they share stories on a wire service, for example, they can identify
it as their story, or convert it according to an industry-wide stylebook.
— Connolly 1997, page 120.
Phrasing things more positively, Jon Bosak and Tim Bray described the work ahead and
why people were eager to do it:
What XML does is less magical but quite effective nonetheless. It lays down ground
rules that clear away a layer of programming details so that people with similar interests
can concentrate on the hard part—agreeing on how they want to represent the information
they commonly exchange. This is not an easy problem to solve, but it is not a new one.
Such agreements will be made, because the proliferation of incompatible computer
systems has imposed delays, costs and confusion on nearly every area of human activity.
People want to share ideas and do business without all having to use the same computers;
activity-specific interchange languages go a long way toward making that possible.
Indeed, a shower of new acronyms ending in "ML" testifies to the inventiveness unleashed
by XML in the sciences, in business, and in the scholarly disciplines.
— Bosak and Bray 1999, page 92.
Businesses and developers took up that challenge, and thousands of committees blossomed.
XML "solved" the syntactic layer, and it was time to invest millions in defining structure.
Pinning Down Butterflies with URIs
The world of agreements wasn't enough for XML's keepers at the W3C. Tim Berners-Lee's
Semantic Web visions required globally unique identifiers for vocabularies. In the
period when Berners-Lee considered XML a key foundation for that work, that meant
spackling URIs into markup to create globally unique markup identifiers and build
them into vocabularies.
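The mechanics are simple to sketch: a namespace declaration binds a prefix to a URI, and the URI (not the prefix) becomes part of the element's universal name. A minimal illustration with Python's stdlib, using a made-up namespace URI:

```python
import xml.etree.ElementTree as ET

# A namespace declaration binds the element name to a URI; ElementTree
# reports the result in "Clark notation", {uri}localname.
# The URI below is hypothetical.
doc = '<report xmlns:inv="http://example.com/ns/inventory"><inv:item/></report>'

root = ET.fromstring(doc)
print(root[0].tag)  # the URI, not the prefix, identifies the element
```

The prefix `inv` is disposable; only the URI travels with the name, which is precisely the "globally unique identifier" the Semantic Web vision required.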
As the SGML community had ventured into hypertext, they too had found difficulties
in sharing structures across vocabularies and recognizing those structures. They
lacked pretensions of building a single global system, however, and had proposed a
very different route: architectural forms.
Architectural forms permit DTD writers to use their own element type names for HyTime
structures. Not only is the architectural form notion fundamental to HyTime, it
is a new and useful SGML coding technique that can, if used wisely, ease the standardization
of tagging practices by steering a route between the Scylla of excessive rigidity
and the Charybdis of excessive freedom that threaten such standards when they must
serve large, disparate groups of users.
...A standard intended for widespread use [among disparate vocabularies] thus cannot
conveniently enforce particular tag names.
— DeRose and Durand 1994, page 79.
While some work was done to bring architectural forms to XML - most visibly David
Megginson's XML Architectural Forms - architectural forms lost a bitter battle inside
the W3C. While confidentiality makes it difficult to tell precisely what happened,
public sputtering suggests that architectural forms' vision of adaptation to local
schemas did not fit the W3C Director's intention of building globally-understood
vocabularies identified by URIs. Instead, the XML world was given a mandate to create
vocabularies with globally unique names.
Architectural forms were a limited set of transformations, still deeply intertwined
with the contractual expectations of DTDs. The stomping they received at the W3C,
however, was a sign of static expectations to come.
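Though HyTime's actual mechanism is richer, the core idea of architectural forms can be sketched simply: a generic processor keys off a declared form attribute rather than the local element name, so differently named vocabularies can share processing. The attribute name `arch` below is a hypothetical stand-in for the standard's machinery:

```python
import xml.etree.ElementTree as ET

# Sketch of the architectural-forms idea: local vocabularies keep their
# own element names but declare, via an attribute, which architectural
# structure each element plays. ("arch" is a hypothetical attribute.)
doc = """<report>
  <heading arch="title">Q3 Results</heading>
  <blurb arch="para">Revenue was flat.</blurb>
</report>"""

def architectural_name(elem):
    # Fall back to the local name when no form is declared.
    return elem.get("arch", elem.tag)

root = ET.fromstring(doc)
print([architectural_name(e) for e in root])
```

A generic processor written against `title` and `para` can handle this document even though its tags are `heading` and `blurb`; no prior agreement on tag names is needed.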
The First Hit is Free: Augmented Infosets and the PSVI
As more and more developers took Bosak and Bray's promises seriously, XML reached
field after field, migrating far beyond the document-centered territory SGML had considered
its home. While the ideas of agreement and validation were popular in many areas,
DTDs did not feel like an appropriate answer to many developers, whose
expectations had been shaped by databases, strongly typed languages, and fields that
wanted more intricate specifications of content.
The W3C responded with XML Schema, a pair of specifications defining a language for
specifying deterministic document structures and content, supporting associated processing.
Like its DTD predecessor, XML Schema's validation process modified the document, adding
default values for attributes (the classic case). Going beyond what DTDs had done,
it also annotated the infoset of the reported document with type information accessible
to later processing.
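Attribute defaulting is easy to demonstrate even with a non-validating parser, since expat applies defaults declared in the internal DTD subset; it gives a small taste of how validation machinery quietly rewrites the document a program sees. A sketch with Python's stdlib (element and attribute names hypothetical):

```python
import xml.etree.ElementTree as ET

# An ATTLIST declaration supplies a default value for an attribute that
# the document instance never wrote. (Names are hypothetical.)
doc = """<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ATTLIST memo priority CDATA "normal">
]>
<memo>Ship it.</memo>"""

root = ET.fromstring(doc)
# The parser has already merged in the default: the reported infoset
# differs from the bytes on the wire.
print(root.get("priority"))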
While press releases are not usually a great place to learn about the details of specifications,
they are an excellent place to learn about what those specifications are meant to
do. In the case of XML Schema, there are even a few to choose from. At the outset,
1999's press release was excited about the potential for establishing standards for
many kinds of transactions:
Many applications can benefit from the development of schemas:
Databases must, for example, communicate detailed information about the legal values
of particular fields in the data being exchanged.
Publishing and syndication services must be able to describe the properties of headlines,
news stories, thumbnail images, cross-references, etc.
For electronic commerce, schemas can be used to define business transactions within
markets and between parties, and to provide rules for validating business documents.
When XML is used to exchange technical information in a multi-vendor environment,
schemas will allow software to distinguish data governed by industry-standard and...
— W3C 1999
In 2001, when the XML Schema Recommendations arrived, the press release described
schemas as finally fulfilling the promises XML had previously made:
"XML Schema makes good on the promises of extensibility and power at the heart of
XML," said Tim Berners-Lee, W3C Director. "In conjunction with XML Namespaces, XML
Schema is the language for building XML applications."
By bringing datatypes to XML, XML Schema increases XML's power and utility to the
developers of electronic commerce systems, database authors and anyone interested
in using and manipulating large volumes of data on the Web. By providing better integration
with XML Namespaces, it makes it easier than it has ever been to define the elements
and attributes in a namespace, and to validate documents which use multiple namespaces
defined by different schemas.
— W3C 2001
It wasn't just that XML Schemas would "make good on the promises of extensibility
and power", but, as later specifications demonstrated, that they would provide a foundation
for further work in processing XML as strongly typed data. The Post-Schema-Validation
Infoset, effectively a type-annotated version of documents that passed validation,
became the foundation on which XSLT 2.0 and XQuery would build. XML itself didn't
require that you use schemas of any kind, but the core toolset incorporated more and
more assumptions based on schema capabilities, without any separation of concerns.
"XML" practice clearly incorporated XML Schema practice.
XML Schema had a key place in TimBL's own vision of the Semantic Web, as shown at
[Semantic Web Architecture]. Perhaps namespaces were more important to him, given their foundation in URIs,
but XML Schema also helped drive namespaces deeper into XML with qname content. Over
time, however, Berners-Lee has largely lost interest in XML, and the Semantic Web
has regularly looked elsewhere for syntax.
As schema proponents had hoped, the technology took off, becoming a central component
in a rapidly growing "Web Services" ecosystem, largely built with tools that made
it (relatively) easy to bind XML data to program structures using the type information
provided by schemas. Schemas served not only as documentation and validation tools
but as tools for structuring code. (They also served as configuration for XML editors
of various kinds.)
However, as the limitations and intrusions of XML Schema became clearer, alternate
approaches appeared. RELAX NG was in many ways a simpler and better thought-out approach
to creating comprehensive schemas, while Schematron's rule-based approach offered
a very different kind of testing. Examplotron built on Schematron's ideas to create
a very different model of a schema, re-establishing the value of using sample documents
for conversations about document interchange.
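The contrast is easy to sketch: where a grammar describes everything a document may contain, a Schematron-style rule makes individual assertions about it. Below is a rule-based check in Python (not Schematron syntax; the vocabulary is hypothetical):

```python
import xml.etree.ElementTree as ET

# Rule-based checking in the Schematron spirit: no full grammar, just
# targeted assertions against the document. (Vocabulary is hypothetical.)
doc = """<order>
  <item><name>Widget</name><price>9.99</price></item>
  <item><name>Gadget</name></item>
</order>"""

root = ET.fromstring(doc)
errors = []
for item in root.findall("item"):
    # Rule: every item must carry a price.
    if item.find("price") is None:
        name = item.findtext("name", default="(unnamed)")
        errors.append(f"item '{name}' has no price")
print(errors)
```

The document can contain anything else it likes; only the stated rules are tested, which is a very different contract from a grammar's exhaustive description.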
Old Magic: Learning About Conversation from Architecture
The model of prior agreement, of prior structure, isn't unique to markup. It emerged
from bureaucratic models that had grown in both commerce and government during the
Industrial Revolution, a period that mixed social tumult with insistence on standardization
of products and process. "Top-down" approaches became the norm in a world where manufacturing
and engineering reorganized themselves around design and calculation.
Markup emerged from the industrial mind-set common to even the most idealized computing
models. Its creators had grown up in a world dominated by industrial models of production,
and computers themselves matched that command-and-control drive toward efficiency.
Despite the general triumph of the industrial model, it has never really bothered
to answer its critics. It hasn't had to - material plenty and the race to keep up
have distracted us - but those critics still have things to teach us, even about markup.
John Ruskin in the 19th century and Christopher Alexander in the 20th offer an alternative
to industrial models, an opportunity to humanize practice. Unsurprisingly for work
centered on human capabilities, conversation is a key tool. Ruskin extends the building
conversation to include the lowliest workers, while Alexander pushes further to include
current and future users of buildings and structures.
The Nature of Gothic
New fields like to flatter themselves by styling themselves after older ones. While
computing is often engineering (or plumbing) minus the structured training, it more
typically compares itself to architecture. Like architecture, it hopes to achieve
some form of grace in both its visible and invisible aspects, creating appealing structures
that will remain standing. While we may have calmed down a bit from the heroic architect
model of Howard Roark in The Fountainhead, markup language creators still expect to be able to lay things out as plans and
have them faithfully executed by others who will live up to our specifications.
Writing a century before computing's emergence, John Ruskin's The Nature of Gothic (a chapter of The Stones of Venice) offered a very different view of the proper relationship between craftsman and architect.
This is long, but even Ruskin's asides have parallels to computing practice, so please
pause to study this quote more closely than its Victorian prose might otherwise tempt you to.
SAVAGENESS. I am not sure when the word "Gothic" was first generically applied to
the architecture of the North but I presume that, whatever the date of its original
usage, it was intended to imply reproach, and express the barbaric character of the
nations among whom that architecture arose... As far as the epithet was used scornfully,
it was used falsely; but there is no reproach in the word, rightly understood; on
the contrary, there is a profound truth, which the instinct of mankind almost unconsciously
recognizes. It is true, greatly and deeply true, that the architecture of the North
is rude and wild; but it is not true, that, for this reason, we are to condemn it,
or despise. Far otherwise: I believe it is in this very character that it deserves
our profoundest reverence.
...go forth again to gaze upon the old cathedral front, where you have smiled so often
at the fantastic ignorance of the old sculptors: examine once more those ugly goblins,
and formless monsters, and stern statues, anatomiless and rigid; but do not mock at
them, for they are signs of the life and liberty of every workman who struck the stone;
a freedom of thought, and rank in scale of being, such as no laws, no charters, no
charities can secure; but which it must be the first aim of all Europe at this day
to regain for her children.
Let me not be thought to speak wildly or extravagantly. It is verily this degradation
of the operative into a machine, which, more than any other evil of the times, is
leading the mass of the nations everywhere into vain, incoherent, destructive struggling
for a freedom of which they cannot explain the nature to themselves....
We have much studied and much perfected, of late, the great civilized invention of
the division of labor; only we give it a false name. It is not, truly speaking, the
labor that is divided; but the men:—Divided into mere segments of men—broken into
small fragments and crumbs of life; so that all the little piece of intelligence that
is left in a man is not enough to make a pin, or a nail, but exhausts itself in making
the point of a pin, or the head of a nail. Now it is a good and desirable thing, truly,
to make many pins in a day; but if we could only see with what crystal sand their
points were polished,—sand of human soul, much to be magnified before it can be discerned
for what it is,—we should think there might be some loss in it also.
...Enough, I trust, has been said to show the reader that the rudeness or imperfection
which at first rendered the term “Gothic” one of reproach is indeed, when rightly
understood, one of the most noble characters of Christian architecture, and not only
a noble but an essential one. It seems a fantastic paradox, but it is nevertheless
a most important truth, that no architecture can be truly noble which is not imperfect.
And this is easily demonstrable. For since the architect, whom we will suppose capable
of doing all in perfection, cannot execute the whole with his own hands, he must either
make slaves of his workmen in the old Greek, and present English fashion, and level
his work to a slave’s capacities, which is to degrade it; or else he must take his
workmen as he finds them, and let them show their weaknesses together with their strength,
which will involve the Gothic imperfection, but render the whole work as noble as
the intellect of the age can make it.
The second mental element above named was CHANGEFULNESS, or Variety.
I have already enforced the allowing independent operation to the inferior workman,
simply as a duty to him, and as ennobling the architecture by rendering it more Christian.
We have now to consider what reward we obtain for the performance of this duty, namely,
the perpetual variety of every feature of the building.
Wherever the workman is utterly enslaved, the parts of the building must of course
be absolutely like each other; for the perfection of his execution can only be reached
by exercising him in doing one thing, and giving him nothing else to do. The degree
in which the workman is degraded may be thus known at a glance, by observing whether
the several parts of the building are similar or not; and if, as in Greek work, all
the capitals are alike, and all the mouldings unvaried, then the degradation is complete;
if, as in Egyptian or Ninevite work, though the manner of executing certain figures
is always the same, the order of design is perpetually varied, the degradation less
total; if, as in Gothic work, there is perpetual change both in design and execution,
the workman must have been altogether set free....
Experience, I fear, teaches us that accurate and methodical habits in daily life are
seldom characteristic of those who either quickly perceive or richly possess, the
creative powers of art; there is, however, nothing inconsistent between the two instincts,
and nothing to hinder us from retaining our business habits, and yet fully allowing
and enjoying the noblest gifts of Invention. We already do so, in every other branch
of art except architecture, and we only do not so there because we have been taught
that it would be wrong.
Our architects gravely inform us that, as there are four rules of arithmetic, there
are five orders of architecture; we, in our simplicity, think that this sounds consistent,
and believe them. They inform us also that there is one proper form for Corinthian
capitals, another for Doric, and another for Ionic. We, considering that there is
also a proper form for the letters A, B, and C, think that this also sounds consistent,
and accept the proposition. Understanding, therefore, that one form of the capitals
is proper and no other, and having a conscientious horror of all impropriety, we allow
the architect to provide us with the said capitals, of the proper form, in such and
such a quantity, and in all other points to take care that the legal forms are observed;
which having done, we rest in forced confidence that we are well housed.
But our higher instincts are not deceived. We take no pleasure in the building provided
for us, resembling that which we take in a new book or a new picture. We may be proud
of its size, complacent in its correctness, and happy in its convenience. We may take
the same pleasure in its symmetry and workmanship as in a well-ordered room, or a
skillful piece of manufacture. And this we suppose to be all the pleasure that architecture
was ever intended to give us.
— Ruskin 1853, paragraphs xxvi-xxviii.
What has architecture to do with markup?
In the virtual world, markup creates the spaces in which we interact. It creates
bazaars, agoras, government buildings, and even churches. Markup builds the government
office, the sales floor, the loading dock. Markup offers us decorations and distractions.
Markup is our architecture.
The concerns which apply to architecture also apply to markup. Markup process, often
deliberately, parallels that of the contemporary design approach. Define a problem.
Develop a shared vision for solving it. Hire experts who create plans specifying
that vision in detail, and send them to the "workmen" to be built.
Ruskin's friend William Morris, in his Preface to The Nature of Gothic, warns of the costs of that style of work.
To some of us when we first read it, now many years ago, it seemed to point out a
new road on which the world should travel. And in spite of all the disappointments
of forty years, and although some of us, John Ruskin amongst others, have since learned
what the equipment for that journey must be, and how many things must be changed before
we are equipped, yet we can still see no other way out of the folly and degradation of Civilization.
For the lesson which Ruskin here teaches us is that art is the expression of man's
pleasure in labour; that it is possible for man to rejoice in his work, for, strange
as it may seem to us to-day, there have been times when he did rejoice in it; and
lastly, that unless man's work once again becomes a pleasure to him, the token of
which change will be that beauty is once again a natural and necessary accompaniment
of productive labour, all but the worthless must toil in pain, and therefore live
in pain. So that the result of the thousands of years of man's effort on the earth
must be general unhappiness and universal degradation; unhappiness and degradation,
the conscious burden of which will grow in proportion to the growth of man's intelligence,
knowledge, and power over material nature.
— Morris 1892, paragraphs xxvi-xxviii.
Architecture has largely ignored these concerns, and computing, alas, has taken its
lessons from that ignorance. Let us instead take Ruskin's worker free of alienation
- valuing savageness and supporting changefulness - as a goal worth achieving.
Patterns for Conversation
Christopher Alexander has the strange distinction of being an architect more revered
and imitated in computing than in his own field. When, during an XML conference in
Philadelphia, I stopped at the American Institute of Architects to buy A Pattern Language and The Timeless Way of Building, the cashier informed me that I must be a programmer, because "only programmers buy
those books. Architects don't."
Architects and Programmers
Programmers have indeed bought A Pattern Language, but for mostly the wrong reasons. The classic text, Design Patterns, cites Alexander as its inspiration and brought him to the wide attention of the software development community:
1.1 What Is a Design Pattern?
Christopher Alexander says, "Each pattern describes a problem which occurs over and
over in our environment, and then describes the core of the solution to that problem,
in such a way that you can use this solution a million times over, without ever doing
it the same way twice." [AIS+ 77, page x] Even though Alexander was talking about
patterns in buildings and towns, what he says is true about object-oriented design
patterns. Our solutions are expressed in terms of objects and interfaces instead of
walls and doors, but at the core of both kinds of patterns is a solution to a problem in a context.
— Gamma 1995, pages 2-3.
Even at this stage, however, they have already over-simplified Alexander's approach
to patterns, seeing a top-down approach that isn't there. "A solution to a problem
in context" is the goal of most non-fiction writing, but the writers of Design Patterns have forgotten the critical question of who solves those problems and how. They
seem to have assumed that since Alexander is an architect, these patterns are meant
to be applied by architects.
However, Alexander argues that the conversation must be broader. At the top of that
same page x, one finds:
It is shown [in The Timeless Way of Building] that towns and buildings will not be able to become alive, unless they are made
by all the people in society, and unless these people share a common pattern language,
within which to make these buildings, and unless this pattern language is alive itself.
— Alexander 1977, page x.
The conclusion of Design Patterns, unfortunately, repeats and amplifies its error, in a way that perhaps only computer scientists could manage:
4. Alexander claims his patterns will generate complete buildings. We do not claim
that our patterns will generate complete programs.
When Alexander claims you can design a house simply by applying his patterns one after
another, he has goals similar to those of object-oriented design methodologists who
give step-by-step rules for design. Alexander doesn't deny the need for creativity;
some of his patterns require understanding the living habits of people who will use
the building, and his belief in the "poetry" of design implies a level of expertise
beyond the pattern language itself. But his description of how patterns generate
designs implies that a pattern language can make the design process deterministic and repeatable.
— Gamma 1995, page 356.
It is hard to imagine a more bizarre misreading of Alexander: a projection of top-down
design assumptions applied to a text whose primary purpose is to overturn them. Gamma,
Helm, et al. provide an unfortunately perfect demonstration of how developers borrow
badly from architecture. (Alexander himself recognizes the misreading in Alexander 1996.)
What Alexander actually offers is not a "design process deterministic and repeatable,"
but tools for conversation. The "level of expertise" is partially aesthetic, but
in many ways social. He takes seriously "all the people in society" from the prior
quote, and the job of the architect is less to design and more to facilitate. A Pattern Language is not a set of fixed rules that has emerged from practice, but a constantly evolving
and shifting foundation that must combine with other local patterns to be of use.
Establishing Continuous Conversation
Unlike most models for including users in design, Alexander's process keeps the conversation
going throughout the creation of works, and includes the workers and the users in
that conversation. He has learned (perhaps from Ruskin) that treating workers as
automata imposes deep costs, and recognized the quality of construction that people
have achieved over centuries, even in financially poor environments, without the aid
of architects. His building process allows for the layering of detail to respond
to particular circumstances rather than laying out rules which must be applied in advance.
How do patterns work? Alexander tells the story of implementation in The Production of Houses:
In order to get a reasonable house which works well and which nevertheless expresses
the uniqueness of each family, the families all used an instrument we call the pattern
language... The particular pattern language contained twenty-one patterns...
this language has the amazing capacity to unify the generic needs which are felt by
every family, and which make a house functional and sensible, with the unique idiosyncrasies
that make every family different, and thus to produce a house which is unique and
personal, but also one which satisfies the basic needs of a good house.
this pattern language... allowed us to produce a variety of houses, each one essentially
a variant of a fundamental house "type" (defined by the twenty-one patterns together),
and yet each one personal and unique according to the special character of the family
who used it.
— Alexander 1985, pages 175-6.
It didn't all go smoothly, however, as one additional angle, the creation of new patterns,
didn't materialize in the earlier work to shape the overall cluster of houses:
the families became enthusiastic about the project as they began to see the richness
inherent in the patterns. However, our efforts to get them to modify the language,
to contribute other patterns of their own, were disappointing.
Under normal circumstances, the architect-builder of a particular area would also
modify and refine these patterns, according to local custom. In this particular project,
we were so occupied by the demands of construction that we had little time to undertake
work of this sort.
— Alexander 1985, page 133.
However, in other projects, he had better luck developing pattern languages based
on input from the community:
Once we have learned to take a reading of people's true desires and feelings, we can
then describe the patterns that are needed to generate a profound campus environment.
The system, or "language" of these patterns can give the community the beautiful world
they need and want, the physical organization that will make their world practical,
beautiful, life-enhancing, and truly useful.
— Alexander 2012, page 131.
For the Eishin Campus, Alexander's team and a large group of administrators, teachers,
and students developed 110 patterns specific to that project, informed by the broader
list in A Pattern Language but moving beyond it.
The conversation doesn't end when construction starts, either. Patterns apply at
all levels of development, from regional planning to finished details. Construction
is an opportunity to make changes along the way, as the reality of earlier decisions
becomes clear. This approach not only involves the users of the building, but transforms
the role of the architect.
it is axiomatic for us that the people who build the houses must be active, mentally
and spiritually, while they are building, so that of course they must have the power
to make design decisions while they are building, and must have an active relation
to the conception of the building, not a passive one. This makes it doubly clear
that the builders must be architects.
— Alexander 1985, pages 74-5.
Alexander's approach obliterates the traditional separation between designers and
builders, refusing to cooperate with a model he believes creates a "breakdown of the
physical environment... It is hardly possible to experience a profound relationship
with these places. So the landscape of our era has become, and continues to become,
a wasteland." [Alexander 2012, page 80.]
Schematics, Standardization, and Alienation
What has created that wasteland? While Alexander's early writings focus mostly on
the positive he hopes to encourage, his later works, works from the field, cannot
avoid dealing with a world structured to make his style of work difficult, if not
impossible. As the scale of construction has grown, specialization and centralization
have led to ever-more detached and standardized approaches that, Alexander argues,
cannot help but produce alienation:
The great complexity needed by a human settlement cannot be transmitted via paper;
and the separation of functions between architect and builder is therefore out of
the question. The complexity can only be preserved if the architect and contractor
are one. All this makes clear that the architect must be the builder.
And the opposite is true also. In modern times, the contractor and his crew are deeply
and sadly alienated from the buildings they produce. Since the buildings are seen
as "products," no more than that, and since they are specified to the last nail by
the architect, the process of building itself becomes alienated, desiccated, a machine
assembly process, with no love in it, no feeling, no warmth, and no humanity.
— Alexander 1985, pages 74-5.
While reliance on drawings is one aspect of the problem, the industrial model of component
construction adds an entirely new level of potential trouble. Standards limit choices,
reinforcing the mistakes created by separation of concerns:
Today's systems of housing production almost all rely, in one form or another, on
standardized building components. These components may be very small (electrical
boxes, for instance), or intermediate (2x4 studs), or very large (precast concrete
rooms); but regardless of their size, buildings are understood to be assembled out
of these components. In this sense then, the actual construction phase of the housing
production process has become an assembly phase: an occasion where prefabricated components
are assembled, on site, to produce the complete houses.
It has been little understood how vast the effect of this has been on housing: how
enormous the degree of control achieved, unintentionally, by these components and
the demands of their assembly. Yet, as anyone who has intimate knowledge of building
knows, these components are merciless in their demands. They control the arrangement
of details. They prohibit variation. They are inflexible with respect to ornament,
or whimsy, or humor, or any little human touch a person might like to make.
— Alexander 1985, page 220.
Changing this is not easy, as Alexander learns repeatedly. His books have become
more biting over time, reflecting the hostility of established practice to his different
approaches. A world in which inspectors demand to approve signed drawings before
allowing construction is not especially compatible with an approach that defines itself
in terms of conversation and techniques. His latest book, The Battle for the Life and Beauty of the Earth, makes that conflict explicit, describing it as a "necessary confrontation" between
two incompatible approaches to building: System-A and System-B.
System-A is a system of production in which local adaptation is primary. Its processes
are governed by methods that make each building, and each part of each building, unique
and uniquely crafted to its context.
System-B is, on the contrary, dedicated to an overwhelmingly machinelike philosophy.
The components and products are without individual identity and most often alienating
in their psychological effect.
The pressure to use such a system comes mainly from the desire to make a profit, and
from the desire to do it at the highest possible speed.
— Alexander 2012, page 43.
In Alexander's telling, System-B grows from mass production and the ideologies of
classicism and industrialism that Ruskin and Morris blasted generations before. System-B
has spread from the Victorian factories to every aspect of building construction (and
computing as well). System-A, Alexander's preferred system, is older but has largely
been discarded in the race to industrialize everything. His more detailed telling
reveals another dimension to the problem of System-B: it is not only profit-seeking,
but its adherents have been so surrounded by it that they have a difficult time imagining
that anything else could work.
System-B is also the worldview that dominates computing. There are occasional counterculture
moments and corners that resist System-B. However, even in a software world that at
least seems like it should be more flexible than its explicitly industrial hardware side, the
march toward the mass production of identical and mildly configurable products (and
the standards that facilitate them) continues inexorably.
New Magic, from Clark and Crockford to the Present
While much of the markup world is infused with the System-B concepts that Alexander
encourages us to reject, there are corners, influences, and opportunities that can
help us cast aside markup's traditional model of design-first-and-then-execute. None
of these pieces by itself is exactly a revolution, and some may even seem contradictory.
Some of them are indeed side effects of the schema-based approach. While they may
seem familiar, combining them offers a path to a different approach and new conversations.
All of them point to opportunities for a shift in markup culture.
Ever since XML 1.0 permitted documents without a DOCTYPE declaration, it has at least
been possible to work with XML in the absence of a schema. While in most of my travels
I have only found people using this freedom for experimental purposes or for very
small-scale work, conversation on xml-dev did turn up some people who simply avoid
using schemas. They do, however, seem to use standard formats, but test and document
them by other means. These practices are often criticized, especially when the content
leaves those often closed systems [Beck 2011].
Even in the best of these cases, though, throwing off schemas is not enough if the
expectations of fixed vocabularies remain behind, as Walter Perry warned over a decade
ago in [Perry 2002].
Even the most obsessively controlling schema vocabularies allow developers to leave
some space for growth. ANY in DTDs and xs:any in XML Schema are the classics, and
variations allow a mixture of precision and openness.
Support for these gaps, however, varies wildly in practice. Both tools and culture
push back against the open models. Tools that came from the strictly defined expectations
of object definitions have a difficult time dealing with "underspecified" markup.
The culture of interoperability testing often encourages maximum agreement. Do open
spaces make it easier to create new options, or do they just create new headaches
when it's time to move from one version of a defined vocabulary to another? Gaps
create tension with many supposed best practices.
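The spirit of those open slots can be sketched without any schema machinery at all. The following is a minimal, hypothetical illustration (the element names are invented, not from any real vocabulary): process the elements you recognize, and preserve rather than reject the ones you don't.

```python
import xml.etree.ElementTree as ET

# A hypothetical "order" document whose vocabulary leaves an open slot
# (in the spirit of DTD ANY or xs:any) alongside the agreed elements.
doc = ET.fromstring(
    "<order>"
    "<item>widget</item>"
    "<quantity>3</quantity>"
    "<giftWrap>red ribbon</giftWrap>"  # an extension element never agreed on
    "</order>"
)

KNOWN = {"item", "quantity"}

# Bind what we recognize; keep (rather than reject) what we don't.
recognized = {child.tag: child.text for child in doc if child.tag in KNOWN}
extensions = [child.tag for child in doc if child.tag not in KNOWN]

print(recognized)   # the known content, bound as usual
print(extensions)   # the unexpected content, kept for later conversation
```

Nothing here requires prior agreement on `giftWrap`; the receiver decides what its own processing can safely ignore.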
Generic Parsers, not Parser Generators
Although it is certainly possible to generate parsers that lock tightly onto
a particular vocabulary and parse nothing else, it happens less often than might seem
likely. There are tools for generating parsers, like [XML Booster], and there are certainly cases where that is more efficient or more secure than processing
the results of a generic parser. However, judging from job listings and general conversation,
parser generation has had a much smaller role in XML than it has had, for example,
in ASN.1. (I've had a difficult time in ASN.1 conversations even convincing developers
that generic parsers were possible and useful.)
Data binding tools, of course, can produce tight bonds even when run on top of a generic
parser, but XML's explicit support for generic parsing has at least created the opportunity
for looser coupling.
Peace Through Massive Overbuilding
Some vocabularies have taken an "everything but the kitchen sink" approach. Two of
the most popular document vocabularies, DocBook and the Text Encoding Initiative (TEI),
both include gigantic sets of components. While both provide support through tools
for those components, many organizations use subsets like the DocBook subset used
for this paper. While subsets vary, having a common foundation generally makes transformations
easy and fallback to generic tools an option. The TEI pizza chef [TEI Pizza Chef], which served up custom DTDs, typically a TEI subset, stands out as a past highlight
of this approach.
Building a vocabulary so large that most people work with it only through subsets
may seem excessive, but it opens the way to conversation among users of different
subsets. In many ways, this is similar to (though operating in the reverse direction
of) Rick Jelliffe's suggestion that:
In particular, rather than everyone having to adopt the same schema for the same
content type, all that is necessary is for people to revise (or create) each schema
so that they are dialects (in the sense above) of the same language. That "language"
is close to being the superset information model.
— Jelliffe 2012.
So long as the superset model is broad enough, peace can be maintained.
Peace Through Sprinkling
Rather than building supersets, some groups have focused on building the smallest
thing that could possibly work for their use cases. Dublin Core [DCMI] is probably the most famous of these, though annotation vocabularies ranging from [WAI-ARIA] to [HTML5 Data Attributes] follow a similar path. These approaches offer a range of techniques for adding a portion of information
used by a certain kind of processor to other vocabularies. They allow multiple processors
to see different pieces of a document, though frequently there is still a unified
vision of the document managed by those who sprinkle in these components.
Peace Through Conflict
While I cited Connolly 1997 above, I halted the quote at a convenient point. There is more to that argument -
still schema (DTD) focused, but acknowledging conversation beyond the original creation:
As competing DTDs are shared among the community, semantics are clarified by acclamation.
Furthermore, as DTDs themselves are woven into the Web, they can be discovered
dynamically, further accelerating the evolution of community ontologies.
— Connolly 1997, page 121.
While there has been more visible competition among schema languages than among vocabularies
specified with schemas, there are constant overlaps among vocabulary projects as well
as some direct collisions. "Acclamation" may be too strong a word, as steady erosion
seems a more typical process, but there is certainly motion.
Resisting System-B is easiest, perhaps perversely, in a corner of the software universe
that has long hoped to make System-B's "design by specification" possible: functional
and declarative programming. These styles of software development remove features
that add instability to imperative approaches, often in the pursuit of mathematical
provability, reliability and massive scale. These design constraints, though intended
to maximize industrial-scale processing of information, also make possible a wide
range of more flexible approaches to handling information.
The paradigmatic application of these tools in the markup world lies in the technologies
we call stylesheets, or style sheets, depending on who is editing at any given moment.
While Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) were frequently
seen as competitors when XML first arrived, both offer similar capabilities in this
regard. They both are (or at least can be) excellent at tolerating failure, with
little harm done.
The key to that tolerance is pattern matching - selectors for CSS, XPath for XSLT.
If patterns don't match, they don't match, and the process goes on. XSLT offers many
more features for modifying results, and is more malleable, but neither of them worries
much if a document matches their expectations. At worst they produce empty results.
XSLT is capable of operating more generically, and of working with content it didn't
match explicitly. The XSLT toolset can support reporting and transformation that
goes beyond the wildest dreams of schema enthusiasts - and can do much more useful
work than validation and annotation along the way.
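That tolerance is easy to demonstrate even outside XSLT proper. The sketch below uses Python's standard `xml.etree.ElementTree` (whose `findall` supports a small XPath subset) to show the same behavior the stylesheet languages exhibit: an unmatched pattern yields an empty result, not an error. The document content is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<article><title>Hello</title><para>Body text.</para></article>")

# Patterns that match produce results; patterns that don't simply
# produce nothing - no error, no halt, the process goes on.
titles = doc.findall("title")        # matches this document
sidebars = doc.findall("sidebar")    # no such element here

print([t.text for t in titles])  # ['Hello']
print(sidebars)                  # [] - an empty result, not a failure
```

CSS selectors and XPath in a full XSLT processor behave analogously: silence, not breakage, when expectations go unmet.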
Pattern matching is also central to a number of explicitly functional languages. While
they were built for things like mathematical provability, "nine nines" reliability,
and structured management of state, those constraints actually give them the power
needed to go beyond XSLT's ability to process individual documents. Erlang's "let
it crash" philosophy, for example, makes it (relatively) easy to build robust programs
that can handle flamingly unexpected situations without grinding to a halt. Failures
can be picked up and given different processing, discarded, or put in a queue for later handling.
A calm response to the unexpected opens many new possibilities.
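The "let it crash" attitude translates outside Erlang too. Here is a minimal, hypothetical sketch (the message formats are invented): each failure is caught at a supervisory level and routed to a dead-letter queue for later attention, rather than halting the whole run.

```python
import xml.etree.ElementTree as ET

def handle(message):
    """Parse a message and extract a value; raise on anything unexpected."""
    doc = ET.fromstring(message)
    return doc.find("value").text

messages = ["<msg><value>ok</value></msg>",
            "<msg><valu>typo</valu></msg>",   # doesn't match expectations
            "not even XML"]                   # fails to parse at all

processed, dead_letters = [], []
for message in messages:
    try:
        processed.append(handle(message))
    except Exception:
        dead_letters.append(message)  # picked up later; not fatal now

print(processed)          # ['ok']
print(len(dead_letters))  # 2
```

The handler itself stays simple and is allowed to fail; the supervising loop decides what failure means.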
Years ago, Walter Perry said in a talk that often the most interesting communications
were the ones that broke the rules. They might be mistakes, but they might also be
signs of changing conditions, efforts to game the system, or an indication that the
system itself was flawed.
Errors and flaws have become more popular since. While much of the effort poured
into test-driven development is about making sure they don't happen, a key side effect
of that work is new approaches to providing meaningful error messages when they do
happen. "Test failed" is useful but incomplete.
In distributed systems, errors aren't necessarily just bugs, an instant path to the
discard bin. While the binary pass/fail of many testing approaches encourages developers
to stomp out communications that aren't quite right, turning instead to the meaningful
error messages (and error handling) side of that conversation can be much more fruitful.
After decades of trying to isolate computing processes from human intervention, some
developers are now including humans in the processing chain. After all, it's not difficult
to treat such conversations as just another asynchronous call, especially in an age
of mobile devices. Not everything has to be processed instantly.
Amazon developed the [Amazon Mechanical Turk] service, named after an 18th-century chess-playing "machine" that turned out to have a person
inside of it. It looked like brilliant technology, and was, if humans count. Amazon
adds digital management to the approach, distributing "Human Intelligence Tasks" across
many anonymous workers. Facebook uses similar if more centralized approaches to censor
photos. [Facebook Censorship] The Mechanical Turk model has led to some dire work situations [Cushing 2012] in which humans are treated as cheap cogs in a computing machine, as a System B
industrial approach seeks cheap labor to maximize profit.
Horrible as some of these approaches are, they make it very clear that even large-scale
digital systems can pause to include humans in the decision-making process. It isn't
actually that difficult. Connecting these services to markup processing, however,
requires interfaces for letting people specify what should be done with unexpected
markup. "Keep it, it's okay" with a note, or an option to escalate to something stronger
(perhaps even human-to-human conversation) may be an acceptable start.
JSON Shakes it Up
While XML seemed to be conquering the communications universe, even finally reaching
the Web as the final X in AJAX, many developers dreamed of an escape from its strange
world of schemas, transformations, and seemingly endless debates about data representation.
Douglas Crockford found an answer uniquely well-suited to the Web, extracted in fact from JavaScript's own object literal syntax.
JSON had an innate advantage in that it could bypass same-origin requirements, but
its use has spread far beyond those situations.
JSON uses a different syntax, but much more importantly, the nature of the conversation differs:
expectations of structure have always been loose. Coordination can happen, but reuse
and modification is a more common pattern than formal structuring. Many JSON formats
are created by single information hubs, rather than across groups of providers, and
conversion to internal formats is just a normal fact of life for JSON data consumption.
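That conversion habit is worth seeing concretely. A hedged sketch, with invented field names: the consumer takes what it needs, supplies defaults for what is absent, and silently ignores the rest.

```python
import json

# A hypothetical feed entry; the field names are invented for illustration.
raw = '{"id": 42, "headline": "Markup news", "extra": {"source": "feed-a"}}'

incoming = json.loads(raw)

# Conversion to an internal format is a normal fact of life for JSON
# consumers: take what you need, default what's missing, ignore the rest.
internal = {
    "key": incoming.get("id"),
    "title": incoming.get("headline", "(untitled)"),
    "author": incoming.get("byline", "unknown"),  # absent upstream; defaulted
}

print(internal)
```

No schema mediates this exchange; the mapping itself is the receiver's statement of what it cares about.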
JSON, while somewhat less readable to humans than markup, was both easy to work with
and an effective means for developers to pass information from one program to another. Although
JSON schemas and JSON transformation tools exist, they are relatively minor corners
of JSON culture.
Despite those glaring absences, JSON use continues to expand rapidly. It replaced
XML as a default format in Ruby on Rails, and dominates current Ajax usage. Perhaps
more striking, it is becoming more common in public use, exactly the territory where
prior agreement was deemed most important [Bye XML]. It hasn't replaced XML in that space yet, but is claiming a larger and larger share.
Documentation and samples, it seems, are enough.
So why stick with markup and not just leap to JSON's more open approach? Mostly because
of the tools described in the previous section. Markup understands transformation
and decoration better than JSON. Despite its largely schema-free world, JSON is still
primarily about tight binding to program structures. The schemas are invisible, often
unspecified, but they still exist when a document is loaded.
However, JSON programmers do plenty of transformation internally. They base their
expectations of schemas more on sources than on documents, and some have gone so far
as to establish simple source-testing regimens that warn them of change. Source-based
versioning is also common in the world of JSON APIs. Rather than changing the URI of a namespace
or the details of a schema, new versions of APIs are often simply hosted
at new URLs, with the changed content coming from a new location to give developers
time to adapt.
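A simple source-testing regimen of the kind described can be sketched as follows. This is a hypothetical illustration (the expected keys are invented): periodically fetch a sample from each source and warn when its shape drifts from what was last seen, instead of validating every document.

```python
import json

# Keys we last observed from this hypothetical source.
EXPECTED_KEYS = {"id", "headline"}

def check_source(sample_text):
    """Compare a sample's top-level keys against expectations."""
    keys = set(json.loads(sample_text))
    added = keys - EXPECTED_KEYS     # new fields appearing upstream
    missing = EXPECTED_KEYS - keys   # fields we relied on that vanished
    return added, missing

added, missing = check_source('{"id": 1, "headline": "hi", "tags": []}')
print(sorted(added))    # ['tags'] - something new appeared
print(sorted(missing))  # [] - nothing we depend on has gone away
```

A warning from such a check is a prompt for conversation with the source, not a rejection of its documents.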
JSON's curly braces may earn sneers from XML developers who prefer their angle brackets,
but JSON is doing more with less.
XML was supposed to reach the browser. Mostly, it didn't, but the current state of
the browser has much to teach XML. Some of that is ugly, of course. Even without
a hard focus on schemas, the power of the browser vendors and the continuing insistence
on standardization have limited possibilities. Here we see that it is not only the
formalization of schemas but the cultural values surrounding them that create brittleness
and stifle experiments.
However, that very brokenness, failed versioning, and the lingering hangovers of old
browsers gave rise to the polyfill: script that teaches a browser markup it
doesn't understand. Building cross-browser polyfills is tricky [Osmani 2011], but not that much more difficult than creating cross-browser frameworks. Even the limited
world of HTML5-specific polyfills is vast [Polyfills List].
Not every gap yields to a polyfill, though; some limits are largely created by browser makers' efforts to optimize bandwidth and processing.
Efforts to create a picture element - combining concerns of responsive design and
human interaction - have faced challenges around timing, pre-loading in particular.
Media processing is (as usual) one of the hardest challenges in working with the browser
environment. Establishing communications and respect between vendors and those using
their products is perhaps even more difficult.
The W3C and browser vendors are working to address those challenges as well as create
new frameworks that make polyfills easier to build and more efficient. The Shadow
DOM [Shadow DOM] and Web Components [Web Components] work both aim directly at making polyfills a more standard part of web development
environments. Google's recent work on [Polymer] is an example of a browser vendor pushing hard in this space.
We want web developers to write more declarative code, not less. This calls for eliminating
the standards bottleneck to introducing new declarative forms, and giving library
and framework authors the tools to create them.
While XML processing models are typically different, offering no tools to extend processors
at the document creator's discretion, this approach could prove useful in situations
where processors have chosen to place more trust in users. It also clearly offers
an option to developers tired of waiting for the W3C, the WHATWG, and the browser
vendors to add markup functionality to HTML5.
Structured data has had a difficult few decades in general. While XML's schemas defined
structure, relational database purists (most notably Fabian Pascal) heaped scorn on
XML's daring to step outside the sharply defined boundaries of RDBMS tables and joins.
Much of the pressure for XML Schema's insistence on deterministic structures and strongly
typed data came from communities who considered the constraints in 1990s RDBMS practice
to be a good thing - but XML's very success was a key factor in making clear that
the relational model was not the only possible story for data.
The challenges of scaling within the constraints of Atomicity, Consistency, Isolation,
and Durability (ACID), led to several rapid generations of change in the database
community. While there are probably more relational databases deployed today than
there were when XML appeared, the NoSQL movement has ended the era when developers
chose relational databases by default unless their project was extremely unusual.
This shift has little direct effect on markup processing, but it does reduce the cultural
pressures to only create data structures conforming to a well-known schema.
REST is a communications style based on HTTP's limited number of methods, treating
those constraints as a virtue teaching us to build with few verbs and many nouns.
There is nothing in REST-based work specific to schemas - schemas (or the lack thereof)
are a detail of the work that happens inside the local processors.
However, in contrast to their more RPC-based predecessors, which emerged from the
CORBA and object-oriented worlds, this lack of specification is still a significant
opening. A minimal set of verbs makes it much easier to process a much larger set
of nouns, with fewer expectations set up front.
Strictly Local Uses of Schemas
Some developers and organizations see schemas as a limited-use tool, applied primarily
in a local context to reflect local quality assurance and document creation needs.
Since the late 1990s, I've suggested that my consulting customers think of a schema
not as an integral part of XML document/data exchange, but as a special auxiliary
file that can fill up to two supporting roles:
1. A special stylesheet that renders a boolean value (valid/not valid) for QA.
2. A template for rare and highly-specialized structured-authoring applications.
If you don't need at least one of those, then you probably don't need a schema.
— David Megginson [Correspondence 2013]
So long as authoring applications expect schemas as input, schemas will be necessary.
There are other ways to do quality assurance, of course, but schemas are common.
Will developers resist the temptation to apply schemas more broadly than these cases,
when tools and practice point them that direction?
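Megginson's first role, a "special stylesheet that renders a boolean value," can be sketched without any schema language at all. The following is a hedged stand-in, not a real validator: the required element names are invented, and a production system would use an actual schema processor.

```python
import xml.etree.ElementTree as ET

# A schema reduced to its QA role: a function rendering valid/not valid.
# The required elements ("title", "body") are invented for illustration.
def valid(document_text):
    try:
        doc = ET.fromstring(document_text)
    except ET.ParseError:
        return False
    return all(doc.find(name) is not None for name in ("title", "body"))

print(valid("<doc><title>t</title><body>b</body></doc>"))  # True
print(valid("<doc><title>t</title></doc>"))                # False
```

Seen this way, validation is just one transformation among many, producing a particularly uninformative output.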
Transition Components - DSDL and MCE
Much of the best thinking in markup schemas has worked under the banner of DSDL. RELAX NG, Schematron, and some more obscure pieces demonstrate more flexible alternatives
to the W3C's XML Schema. Namespace-based Validation Dispatching Language (NVDL) finally
offers tools for mixing validation approaches based on the namespace qualifiers applied
to content. Document Schema Renaming Language (DSRL) offers a simple transformation
approach for adapting documents to local schemas. These parts are still tightly
bound to schema validation approaches, but they at least add flexibility and more options.
Markup Compatibility and Extensibility (MCE), coming out of the Office Open XML work, finally asks hard questions about different
degrees of "understanding" a document:
Attributes in the Markup Compatibility namespace shall be either Ignorable, ProcessContent,
ExtensionElements, or MustUnderstand. Elements of the Markup Compatibility namespace
shall be either AlternateContent, Choice, or Fallback.
As Rick Jelliffe describes it:
This is a kind of having your cake and eating it too, you might think; the smart thing
that gives it a hope of working is that MCE also provides some attributes PreserveElements
and PreserveAttributes which let you (the standards writer or the extended document
developer) list the elements that do not need to be stripped when modifying some markup.
I think standards developers who are facing the cat-herding issue of multiple implementations
and the need for all sorts of extensions should seriously consider the MCE approach.
— Jelliffe 2009
Examplotron: A Bridge?
Examplotron [Examplotron] is in fact a schema language, but it is also a possible bridge between expectations
from the Age of Schemas and other possibilities. It uniquely combines communications
through sample documents with the possibility of validation processing, and seems
like a base for describing further transformations. To the extent that a schema technology
could work successfully within System-A, Examplotron is clearly the strongest candidate.
(Schematron isn't too far behind, but lacks the document-sharing orientation.)
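Examplotron's central idea, that the sample document is the schema, can be approximated in a few lines. This sketch (with an invented sample vocabulary) merely derives the set of element paths a sample exhibits and reports where an instance departs from them; real Examplotron compiles samples into far richer validation.

```python
import xml.etree.ElementTree as ET

def paths(elem, prefix=""):
    """Collect every element path a document exhibits."""
    here = f"{prefix}/{elem.tag}"
    found = {here}
    for child in elem:
        found |= paths(child, here)
    return found

sample = ET.fromstring("<book><title>t</title><chapter><para>p</para></chapter></book>")
instance = ET.fromstring("<book><title>t2</title><preface>new</preface></book>")

expected = paths(sample)   # the sample stands in for a schema
seen = paths(instance)
print(sorted(seen - expected))  # paths the sample never promised
```

The report is a conversation starter ("your document has a `preface`; our sample didn't") rather than a rejection.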
Toward a New Practice
While these many pieces have been opening doors for a long time, they tend to be used
in isolation, or in contexts where schema and agreement still rule, if quietly. While
no one (to my knowledge) has yet combined them to create a model of networked markup
communications operating in Alexander's System-A, there are now more than enough pieces
to sketch out a path there.
The first step, however, is more difficult than any of the technical components.
System-A requires a shift in priorities, from industrial efficiency to local adaptation.
Despite the propaganda for "embracing change" over the last few decades, actually
valuing changefulness at the conversation level is more difficult. Schemas constrain
it, as does the software we typically build around schemas. The stories around markup
have valued the static far more than the changeful.
Change this. Value changefulness, and yes, even savageness, and let the static go.
Many organizations will consider all of these suggestions inappropriate, unless perhaps
they can be shown to provide massive cost savings within a framework they think they
can manage with their current approach. As the cost savings are likely to be modest,
and this style of processing a difficult fit with their organizational style, that
is unlikely. I make no claims that this approach can make inroads with organizations
that have regimentation, hierarchy, and control near the top of their management value
system. Obsessed with security? This is probably not for you.
Adding System-A to a System-B context is especially difficult.
So what does valuing changefulness look like in practice? It changes your toolset
from schemas to transformations. It means recognizing that schemas are in fact a
weak transformation, converting documents to a binary valid/not result with an optional
annotation system. It demands shifting to a model in which you expect to perform
explicit transformations on every document you handle. It demands taking Alexander's
model of local adaptation seriously.
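What explicit transformation on every document might look like, in a deliberately small sketch (the external vocabulary here is invented): each arriving document is mapped to a local structure, and content beyond the mapping passes through harmlessly.

```python
import xml.etree.ElementTree as ET

# Transformation, not validation, as the default step: every incoming
# document is explicitly mapped to a local structure.
def to_local(document_text):
    doc = ET.fromstring(document_text)
    return {
        "name": doc.findtext("fullName", default=""),
        "email": doc.findtext("contact/email", default=""),
    }

record = to_local(
    "<person><fullName>Ada</fullName>"
    "<contact><email>ada@example.com</email></contact>"
    "<hobby>markup</hobby></person>"  # extra content passes harmlessly
)
print(record)
```

The mapping is owned locally and adapted locally; no external contract has to change before it can.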
If it's any comfort, XML developers are not the only ones facing this change. Dave Thomas writes:
"It's clear (to me, at least) that the idea of programming is no longer the art of
maintaining state. Instead, we're moving towards coding as a means of transforming
state. Object Orientation has served us well with state-based programming. But now
we need to move on." [Thomas 2013]
So what does transformation look like? It operates on several levels, but perhaps
it is easiest to start with negotiation - the piece of the puzzle that schemas were
supposed to let us consider at least temporarily solved.
Negotiation in the schema sense is typically a gathering of opinions, working together
to hammer out a schema for future communications. To avoid the cost of holding that
conversation frequently, it makes sense to gather as many voices as possible,
though that "as many" is often slashed by requiring that the voices be 'expert'.
After all, too many people in a conversation also adds delay, often for little benefit.
The result is a formal structure that can be used to share data, a form of contract
among the participants and anyone else who wishes to join that conversation.
Most forms of negotiation, however, are much less formal than a diplomatic council
or standards organization. People bargain constantly, and not just over prices.
The forms of communications within business change constantly, often fitting the chaotic
model of shared spreadsheets much more readily than the formal structure of relational
databases. While spreadsheets carry all the headaches of individual files that are
easily misplaced, there is nothing inherently superior about a tightly-structured
centralized database. The advantages of databases emerge almost exclusively when
large quantities of data need to be shared according to a formal process. Telling
a story with a database typically requires sifting and selecting its information to
be presented in a different form.
The negotiation style of spreadsheets typically works according to a simple process:
"send me what works for you, and I'll see if I can make it work." There may be massive
gaps of spreadsheet prowess, business understanding, tools choice, or sheer power
between the participants, but having a concrete basis for the conversation generally
eases those, or leads to a request for simplification and a broader conversation.
Exchanging XML documents has a similar concrete effect. Marked up documents are not
that difficult to explore so long as you have a rough familiarity with the language
used for the markup or willingness to consult a dictionary. Just as there might be
questions about how a spreadsheet is structured, there can be conversation about markup
structure choices. Just as the contents of a spreadsheet can be questioned, there
is room to discuss whether the model the spreadsheet uses for a particular problem
is the appropriate one. Just as with spreadsheets, there is rarely need to ask for
a deeply formalized representation of the underlying model.
XML has one major advantage over a spreadsheet, however - it is far, far easier to
extract information from XML documents than from spreadsheets. Writing code that
extracts content from person A's spreadsheet and reliably places it in person B's
different spreadsheet is possible, but difficult. Changes make it even harder. Spreadsheets
weren't built with that in mind. Markup was.
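Moving content between two parties' structures really is a short, explicit mapping when the content is markup. A hedged sketch, with both vocabularies invented for illustration: extract from person A's document, re-place in person B's.

```python
import xml.etree.ElementTree as ET

# Person A's structure.
a_doc = ET.fromstring("<expense><what>train</what><cost>12.50</cost></expense>")

# Re-place the same content in person B's structure.
b_doc = ET.Element("lineItem")
ET.SubElement(b_doc, "description").text = a_doc.findtext("what")
ET.SubElement(b_doc, "amount").text = a_doc.findtext("cost")

print(ET.tostring(b_doc, encoding="unicode"))
```

The equivalent operation against two differently-laid-out spreadsheets means locating cells by position and hoping nobody inserts a row.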
Extending it ourselves
In one context, this is easy. Because the HTML world allows developers to send code
along with their markup, the situation in browsers is actually simpler than the world
where markup is the sole content of messages. Markup in the browser is surrounded
by opportunities to explain what it is and even to do something with it. (Markup
plus logic is at least as capable as spreadsheet cells and logic, after all!)
The picture polyfill debacle seems to tell the story of callous browser vendors who
have their own bad ideas blowing off the people who use their projects most intensively.
In the long run, however, it is more of a pointer to the way things can be. CSS and
JavaScript leave HTML necessary - strictly speaking - for very few tasks, like form fields. ARIA
can provide metadata supporting accessibility, and if things are really tough, XSLT
can transform whatever markup vocabulary is at hand to an HTML equivalent, perhaps enhanced with SVG or Canvas.
Browser vendors get lots of press, good and bad, for implementing new features and
new APIs. The quiet reality, though, is that there is no longer any good reason for
them to control our markup vocabularies. We can do that, and we can do it today.
Supporting a more flexible negotiation style in cases where documents travel without
supporting logic requires a major break with prior models of development.
Most software for processing documents created with markup - not all, but by far the
majority - follows a consistent process. A parser reads a document, checking its structure
against syntactic and possibly structural expectations. During or after that reading,
if the document passes, the data is bound to internal structures, possibly using additional
information added to the document by the testing process.
Perhaps most important, however, isn't the nature of the parser (generic or not),
the schema type used, or the nature of the data binding. The most important aspect
is that each receiving process supports only one or a few vocabularies.
(The primary exception is for completely generic processing, which doesn't care and
generally doesn't know what kind of markup it's dealing with. This is common in editors
and some kinds of filters.) Extending that support, changing the vocabulary expectations,
requires programmer intervention.
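That conventional pipeline - parse, check against fixed expectations, bind - can be sketched briefly. The `person` vocabulary and field names here are invented for illustration:

```python
import xml.etree.ElementTree as ET

# The single vocabulary this receiving process supports.
EXPECTED_CHILDREN = {"name", "email"}

def bind_person(xml_text):
    root = ET.fromstring(xml_text)          # parse: syntax check
    found = {child.tag for child in root}
    if root.tag != "person" or found != EXPECTED_CHILDREN:
        # structural check failed: the document is rejected outright
        raise ValueError("document does not match expectations")
    # bind the data to an internal structure
    return {"name": root.findtext("name"),
            "email": root.findtext("email")}

person = bind_person(
    "<person><name>Ada</name><email>ada@example.com</email></person>"
)
# Any other vocabulary, or any change to this one, means editing this code.
```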
Avoiding programmer intervention has been the foundation of data processing for decades.
Much of the joy around schemas celebrated that schemas themselves could be used to
create code for handling markup vocabularies. Yes, of course you still needed to
add business logic to that code, but generated code would reduce the time needed to
write, and adoption of common schemas would even allow the sharing of at least a portion
of the business logic code over time.
The design choices in schema and schema culture come from the dream of a machine that
would run by itself, with only periodic human intervention when updates were necessary,
plus the usual maintenance now considered the cost center of IT.
So what might a different model look like?
The machine doesn't run by itself. Human intervention is routinely acceptable, and
facilitated by a toolset that assumes that humans are available to do the mapping
previously performed by schemas, data-binding, and similar approaches. All participants
in a conversation have their own processing structures, but those structures are explicitly
adapted to internal needs, not external conversation. (Yes, it is possible that one
of those internal needs will be to facilitate external conversation, but that need
should not set the terms in most cases.)
Human intervention isn't required for every message, however. Over time, such a processing
system will develop a library of known transformations to and from external vocabularies
and in particular to and from known senders. Intervention arises when new or ambiguous
information arrives. A new party has joined the conversation, or a previously known
sender has added or deleted sections of a document. In short, intervention is necessary
when prior mapping failed.
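A minimal sketch of such a system, with names invented for illustration, might keep a library of transformations keyed by sender and route anything unmapped to a human queue:

```python
# A sketch of a processing system that applies known transformations
# automatically and escalates to a human only when mapping fails.
# All names here are invented for illustration.

class MappingLibrary:
    def __init__(self):
        self.mappings = {}        # sender -> transformation function
        self.needs_review = []    # documents awaiting human intervention

    def register(self, sender, transform):
        self.mappings[sender] = transform

    def process(self, sender, document):
        transform = self.mappings.get(sender)
        if transform is not None:
            try:
                return transform(document)
            except KeyError:      # known sender, but the document changed
                pass
        # new party, or prior mapping failed: queue for a person
        self.needs_review.append((sender, document))
        return None

library = MappingLibrary()
library.register("acme", lambda doc: {"name": doc["fullName"]})

mapped = library.process("acme", {"fullName": "Ada"})   # handled automatically
library.process("newco", {"nom": "Grace"})              # queued for review
```

Registering a mapping for `newco` is exactly the intervention described above: a one-time human act that extends the library for all future messages from that sender.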
Intriguingly, this approach can fulfill one of the failed promises of XML Schema:
When XML is used to exchange technical information in a multi-vendor environment,
schemas will ... help applications know when it is safe to ignore information they
do not understand, and when they must not do so. This means schemas may help make
software more robust and systems more able to change and adapt to evolving situations.
— W3C 1999
Schemas can flag information as outside of the schema and still pass it on, but they
have no way to determine the 'safety' of such information or to encourage change unless
someone is monitoring the schema processing closely. By contrast, such cases are
not anomalies in the mapping system - they are just day-to-day business.
But wait, isn't there still a schema? Isn't the transformation from external representation
to internal structure a hidden schema, HTML5 style? Perhaps, especially if there
is some kind of document describing that transformation that can be shared. However,
unlike the schemas that currently dominate the markup world, these transformations
don't describe a single document structure. They describe a set of possible mappings
between a document structure and a specific internal structure. That substantially
reduces the prospects for reuse (and abuse).
Also, while it is possible that a transformation would return a binary succeeded/failed
response like the valid/not-valid result of most schema processing, there is a much
broader range of possibilities. Partial mappings may be completely acceptable. Failures
may simply be prompts to create new mappings.
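A partial mapping of that kind is easy to picture. In this sketch (field names invented), the result reports both what mapped and what is left over, so leftovers become prompts rather than failures:

```python
# A transformation result richer than valid/not-valid: it returns the
# mapped fields plus the leftovers that need a new mapping.
# All field names are invented for illustration.

def map_fields(document, field_map):
    mapped, leftovers = {}, {}
    for key, value in document.items():
        if key in field_map:
            mapped[field_map[key]] = value
        else:
            leftovers[key] = value    # not an error - a to-do item
    return mapped, leftovers

mapped, leftovers = map_fields(
    {"fullName": "Ada", "twitter": "@ada"},
    {"fullName": "name"},
)
# mapped holds the usable portion; leftovers prompt a human to decide
# whether "twitter" deserves a mapping of its own.
```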
What about the cost of programmers, though? Their time isn't cheap, and we have other
things for them to do! Even Bray and Bosak cautiously warned us that programmers were
necessary, and that schemas would only reduce the time they had to spend.
These transformations aren't usually that difficult. Simple transformations are easily
designed through basic interfaces. Complex transformations may require template logic
on the order of XSLT. Especially difficult transformations, particularly of compact
markup styles, might require an understanding of regular expressions.
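Even the regular-expression case is usually modest. Here is a sketch that unpacks a compact, wiki-like link shorthand (invented for this example) into explicit markup:

```python
import re

# A compact markup style, invented for illustration: [label|url]
compact = "Visit [Balisage|https://balisage.net] for more."

# One substitution unpacks the shorthand into explicit markup.
expanded = re.sub(
    r"\[([^|\]]+)\|([^\]]+)\]",      # capture label and url
    r'<a href="\2">\1</a>',
    compact,
)
# 'Visit <a href="https://balisage.net">Balisage</a> for more.'
```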
Except perhaps for especially twisted regular expressions, however, none of these
skills requires what we pretend today is a "rock star" programmer. There are many
contexts, but the home context is relatively simple, and transformations themselves
are more intricate than deep. While establishing this model might require regular
contributions from experts, and having experts available is useful, over time the
need for expertise should decline. Transformation should not be as difficult as,
say, natural language translation.
Rehumanizing Electronic Communications
Changing tracks is difficult, especially when an entire technology culture has been
built on values of industrial and bureaucratic efficiency aimed at minimizing human
involvement in small but constant processing decisions. Pioneers created vocabularies
for the purpose of controlling a market and leaping ahead of competition, hoping that
the resulting brittle processing model would make it too difficult to refactor their
work away.
However, rather than striving for maximum automation, developers have the opportunity
to aim for systems that are both more flexible and more human than current models
allow. Developers can build on a model of automating what humans hate to do, rather
than automating everything. The three great virtues of the Desperate Perl Hacker
displaced by XML's arrival - laziness, impatience, and hubris - can take their rightful
and continual place in a community of processing rather than disappearing in a cloud
of efficiency. Automate what is convenient for humans communicating, not entire processes.
That changes the role of markup experts as well. Instead of consulting on vocabularies
and leaving others to implement them, we have to take a more active role on a smaller
number of projects:
Let us imagine building five hundred houses. In today's normal method of building
large-scale housing projects, one architect and one contractor often control a rather
large volume of houses or apartments...
The architect-builder... has greater powers but more limited domain. Any one architect
builder may control no more than twenty houses at a time, but he will take full responsibility
for their design and construction, and he will work far more intensely with the individual
families, and with the individual details of their houses. Thus, in this model of
construction, both design and construction are decentralized....
Of course, this means that the architect-builders play a different role in society.
There are more of them - taking the place of the alienated construction workers and
architectural draftsmen who now provide the manpower to make the centralized system
work.
We envisage a new kind of professional who is able to see the buildings which he builds
as works of love, works of craft, individual; and who creates a process in which the
families are allowed, and even encouraged, to play their natural role in helping to
lay out their houses and helping to create their own community.
— Alexander 1985, pages 76-8.
Some of us work this way already, of course.
Working this way also allows us to focus on one of Alexander's most general patterns
in his quest for local adaptation: site repair.
Buildings must always be built on those parts of the land which are in the worst condition,
not the best....
If we always build on that part of the land which is most healthy, we can be virtually
certain that a great deal of the land will always be less than healthy. If we want
the land to be healthy all over - all of it - then we must do the opposite. We must
treat every new act of building as an opportunity to mend some rent in the existing
cloth; each act of building gives us the chance to make one of the ugliest and least
healthy parts of the environment more healthy - as for those parts which are already
healthy and beautiful - they of course need no attention. And in fact, we must discipline
ourselves most strictly to leave them alone, so that our energy actually goes to the
places which need it. This is the principle of site repair.
— Alexander 1977, pages 509-10.
This means continuous mending, continuous repair, by people close enough to the ground
to see what needs work. Continuous refactoring has entered the programming lexicon,
and needs to become a dominant part of the markup lexicon.
In a world saturated with the message of industrial efficiency and in a computing
culture deeply soaked in expectations of standardizing away as many interactions as
possible, this Alexanderine model will be at least as difficult to achieve as it is
in architecture. Possible on the margins, yes, perhaps in experiments, but not to
be taken seriously. That is a normal starting point for this conversation, and this
paper is just an opening.
Markup workers of the world, unite! You have nothing to lose but your chains.