Automatic XML Namespaces

Liam R. E. Quin

Abstract

XML, originally called Web SGML, was designed as a subset of SGML suitable for use on the World Wide Web. Shortly after the publication of XML, the Names in XML specification, better known as Namespaces, was published. One of its primary purposes was to facilitate the exchange of RDF metadata in XML. XML Namespaces are somewhat clumsy and unwieldy, under-specified, and incomplete. They have gained considerable adoption within XML vocabularies, often with reluctance, but have met with resistance from the Web community.

This paper explores some of the difficulties (whether real or perceived) with Namespaces, suggests ways to address some of those difficulties, and also suggests ways to attempt some reconciliation with the Web developer community.

Introduction

The XML community has lived with XML namespaces for a decade. They are useful to the point of seeming indispensable, they are ubiquitous, and yet they are at the same time unwieldy and flawed. Namespace declarations can be inconvenient to remember, and errors in them are frequently the source of subtle and hard-to-diagnose errors. From a programming perspective, namespaces provide scope and disambiguation; from a document authoring perspective, namespaces provide headaches. For an HTML author, working in a world in which the browsers tend to suppress or auto-correct errors, and in which MathML, XHTML, SVG, XForms, Dublin Core and more each have their own namespace URI, the need to pre-declare large sets of namespaces quickly becomes onerous.

In this paper the author proposes a simple system to simplify namespace declaration, and to enhance namespace functionality considerably by introducing a single new feature, without losing the existing benefits. The paper first describes in more detail some of the issues, then summarises the issues with requirements for change, then discusses other proposals, and finally makes a concrete proposal.

XHTML and Namespaces

Consider a typical XHTML document that also uses XForms, SVG, MathML, and has some metadata using the Dublin Core and FOAF. Already we are up to six namespace declarations (including the one for XHTML) and we have hardly begun. SVG uses XLink adding another, and so it continues. Documents with twenty or more declarations are sometimes seen.

Of course, many of these documents are generated automatically rather than hand-authored. Even there, the burden of maintaining the declarations should not be underestimated. To XML people familiar with the mechanisms and overhead of the namespace syntax there may seem (at least at first sight) no problem, but to HTML authors the difference is startling. Is it necessary?

Recall that namespaces are serving two primary functions: they are associating names with the specifications that define them, and they are disambiguating in the case that two specifications define the same name. In practice, conflicts where the same name is defined by multiple specifications are rare (although still important enough to need addressing). For XHTML, the DOCTYPE declaration already is sufficient to bestow HTML-ness, and, within the context of HTML, an svg element only has one plausible interpretation. A significant motivation driving the use of XHTML is that XML tools can be used with the document, and for these tools, SVG-ness is not associated with any particular element name. One could use #FIXED attributes in a document type definition, but we will see later why this is not a satisfactory approach. The HTTP MIME Content Type can also be used to indicate combinations (application/xml+svg+mathml) but since every combination must be registered with IANA, this does not scale; it also doesn't work on a local file system.

A major goal of the work described in this paper, then, must be to eliminate as much syntax as possible without losing the benefit of being able to combine namespaces at will and have both XML and Web tools operate on the result.

The XML community will not be motivated to support a new specification merely to satisfy the needs of some other community. We would need to do more. Happily, there is more to be done. Extensive use of namespaces has demonstrated a need for an easy way for users to define their own namespaces that are a mixture of existing namespaces and their own elements, and to check that

HTML 5, XML and Namespaces

In the last couple of years, a number of individuals have gathered support for renewed work on non-XML versions of HTML. These are also not based on SGML, but instead are an SGML-inspired format. Avowed dislike of XML appears to have stemmed at least in part from misunderstandings and in part from the stricter and more verbose syntax. For these people, robustness, accuracy, error detection and correctness are relatively unimportant: all that matters is that the Web browser render an acceptable result. At the time of writing, the HTML Working Group is considering hard-wiring MathML and SVG namespaces into the HTML specification, so that an svg element would automatically be placed into the SVG namespace. This would make it harder to process the documents with other tools, for example it's tricky to match SVG elements with XPath or with XSLT match expressions if you don't know in advance whether there will be a namespace declaration, and, if there is, whether it will be correct. In fairness it should be noted that, since HTML 5 processors are expected to auto-correct certain classes of syntax errors which XML processors are required not to attempt to correct, one cannot in general expect to process HTML 5 documents with XML tools. None the less it is reasonable to be able to expect to generate HTML documents from XML, and also to use JavaScript, XPath and other tools on the HTML DOM.

Other XML Environments where Namespaces may be Suboptimal

Anywhere that users have to declare a large number of mostly orthogonal (non-overlapping) namespaces is a candidate for improvement: it is particularly unfortunate that users cannot themselves combine namespaces to make new amalgamated ones, such as XSLT plus SVG plus HTML.

Some difficulties when using multiple namespaces today include:

The need to remember long URIs: people often use copy and paste, which can result in extra declarations being pasted in and later causing problems; or, they re-type the URI by hand and make errors, with the result that software later doesn't recognise the namespace correctly.
The need for humans to remember which namespace defines which element or attribute, even where there's no clear functional gain. For example, remembering that href comes from XLink in SVG, and from XHTML in some other vocabulary.
Matching mixed-namespace documents with XPath, whether for XSLT or for XQuery or (hardest of all) stand-alone XPath, is distressingly exciting. The most commonly asked questions on XML support channels are about processing namespaces.

Requirements for a Solution

Freely Available	If you need to pay for the spec, the Web developer community is not interested.
Freely Implementable	There should be no patent encumbrances; since this is in practice not determinable, at the very least the people developing the specification, and the organisation publishing it, mast make effort to ensure that people using or implementing the specification won't suddenly be asked to pay royalties.
Makes life simpler	Although part of the goal of Automatic Namespaces is to enable HTML 5 documents to be namespace-well-formed in memory, it's important to remember that this is a motivation for XML people but not HTML people. Therefore, to gain adoption, Automatic Namespaces must not require the user to understand new concepts. For example, the user should not have to declare or use namespace bindings, since those are the things that are objected to the most.
Easy to Implement	The solution must be easy to implement in JavaScript (for example, an experienced JavaScript programmer should be able to write a complete implementation in under an hour, including the time taken to read the specification). This is partly because otherwise no-one will do it, and partly because a JavaScript implementation that is more than a few hundred bytes will not be appealing to people trying to make Web sites that load quickly. The specification must also be easy to implement in XSLT and other environments.
Compatible with Today's Web	The solution must work in Web browsers that are in use today, at least in the vast majority of cases. People won't upgrade their Web browsers to view Web pages using namespaces.
Gives clear benefits to XML people	I'm not out just to make the HTML 5 people feel vaguely more karmic. They can do that all by themselves. I aim to write a spec that will mean XML users can benefit too.

Existing and Proposed Technologies

Others have identified a need in this area. A recent discussion on the xml-dev list quin2009aelicited an incomplete proposal that will be discussed below, together with two other methods, #FIXED attributes and ISO DSRL. Private communication has also provided a JavaScript-based partial solution that will also be described below.

Default (#FIXED) attributes in a DTD

The idea here is to have a document type definition that supplies xmlns:svg and so forth as #FIXED attribute values. This would be interesting if Web browsers fetched DTDs. One could consider JavaScript to fetch the DTD from the W3C servers, but W3C can't afford the bandwidth, and there are file access restrictions on JavaScript that may make a persistent cache or XML Catalogue approach hard to implement; in addition, both the SYSTEM and the PUBLIC identifiers are fixed, so one cannot serve HTML documents with server-specific values. This DTD-based approach might work fine outside the HTML world, but it turns out that today's Web browsers reject documents containing qnames if the prefix has not been explicitly bound to a URI. This means that a document would not be considered as well-formed without the DTD: progressive rendering as the document loads would have to wait for the DTD, and an unavailable DTD would prevent the document from loading. Worse, since the browsers don't fetch the DTD themselves, the approach of defining default prefixes in a DTD can only lead to documents that the browsers can't load. Since our goal is to reach out and build bridges between HTML and XML worlds and we must (regretfully) dismiss this approach.

Information Technology — Document Schema Definition Languages (DSDL) — Part 8: Document Semantics Renaming Language (DSRL)

ISO Joint Technical Committee ISO/IEC JTC 1, Information Technology, Subcommittee SC 34, Document Description and Processing Languages has recently produced a draft of their Document Semantics Renaming Language (DSRL) . This document does not appear to make clear how a DSRL mapping is located, given an instance document, although separate evidence private communication suggests a plan to use a processing instruction. Possibly the ISO committee would be amenable to an alternative suggestion that is more likely to work in HTML-based Web browsers, since processing instructions interfere with PHP processing, and also cannot portably appear before the end of the document's head element, which may be too late in a world of progressive rendering.

The DSRL specification describes a powerful mechanism to map elements, attributes, processing instructions and entities (!) in the XML document to alternates. You can map any element to a new (namespace, element) combination, where the replacement is part of a validating schema. This specification is almost certainly too complex to implement in the few hundred bytes of JavaScript we can allow ourselves, although the possibility of defining HTML entities using an XML syntax is very alluring. DSRL uses XPath to specify the context in which remapping is to occur. One could thus map every third svg element in the document, if desired.

Overall, DSRL seems very promising. It appears to do what is needed. But, like a US Congress bill, it comes with a lot of additional baggage, some of which is problematic for us, and also is some missing functionality we need:

DSRL files themselves have at least three namespace declarations in them. We want something that doesn't need to have any additional declarations, if possible.
DSRL appears to lack an inclusion facility. One could use XInclude, perhaps, but at the cost of added syntactic complexity that we are trying to avoid. An XML-savvy user could create DSRL files with XSLT or XQuery, but again, that's a level beyond our expectations. We want to be able to combine namespaces to make new ones, and DSRL isn't designed to do that.
DSRL requires an explicit reference from the document to the DSRL file, but using a processing instruction. Processing instructions can cause problems in Web browser environments: they generally work in application/xml documents but in HTML and XHTML documents they can be confused with PHP on the server, and can also be (incorrectly) displayed by a Web browser: any unrecognised markup terminates the HTML HEAD element, so a processing instruction can also break stylesheet and script links in older browsers, and may even be rendered as text content [ISO; XHTML-C1, Appendix C1].

JavaScript-based solutions

The trouble with a purely JavaScript approach is that it works in a Web browser but not in a pure XML environment. It is a bridge that reaches only one shore of the river. But JavaScript can indeed be part of a solution, as we shall see.

First, we should note that, as things stand (April, 2009), HTML 5 says that certain elements, such as svg and math, are to be placed in the namespaces one might expect automatically. Unfortunately, existing Web browsers do not behave this way. Once HTML 5 becomes a W3C Recommendation one might reasonably expect to see implementations, but a great many people will still be using older browsers. This also presents an incompatibility with XPath used from JavaScript, because older browsers put everything in the default (non) namespace.

One can use JavaScript to place elements in the right namespace to make older browsers work the same way as newer ones, and this leads us towards a possible solution.

We should note, however, that any JavaScript that changes the DOM tree representation of the document might break other scripts, so our route to a short-term solution is not without problems.

Another compatibility issue is that XPath name test expressions with no namespace match only elements in the default namespace; a request has been made to bless the idea that they also can match elements in the HTML namespace. There is ongoing discussion in this area at the time of writing.

Disruptive Approaches

In the interest of completeness, it is worth mentioning an approach to changing namespaces that are incompatible with current practice. Ian Hickson has suggested [Hixie2006] adding micro-syntax within the xmlns pseudo-attribute, after the namespace URI, to allow a search path of namespaces. This would cause interoperability problems with existing XML software, but does show that others have considered this problem space in the past. Private communication tells the author that others have also given thought to this problem.

Proposal: Automatic Namespaces

The goals of the Automatic Namespace mechanism are to allow document authors to define their own namespace mix-ins in terms of other namespaces and to refer to them, and also to minimise the amount of syntax needed for declarations—in the case of HTML, ideally, to zero.

The first goal, defining mix-ins, is so that one can have a combination such as HTML, MathML and GeoML, using a single URI to refer to the definition, and share that definition between many documents. A Web browser would act as if a default mix-in had been read; for older Web browsers this could be simulated with JavaScript. This means that if you want the default HTML 5 behaviour, in almost all cases you don't need to do anything at all; but if you want some other combination, or you need to extend the HTML 5 set of namespaces, you can do so easily.

The second goal, minimising syntax, is necessary in order to have any hope of adoption by HTML and XHTML authors.

People reading a draft of this paper commented that a greater barrier to XML adoption in the HTML world was the draconian error-handling, which they believed meant that a Web browser must reject any document that claims to be XML but is not well-formed. This is an unfortunate mis-perception: in fact, the restriction is that the browser must not claim such a resource to be a well-formed XML document, but, once it is not XML it is outside the scope of the XML specification, and error recovery is perfectly acceptable, as long as no claim is made that the original document is itself XML. So it seems to this author that the barrier is not draconian error handling, but browser writers. So, rather than address a problem that appears not to exist, the approach here is to address a real difficulty that might be pointed out as a barrier if the draconian error-handling straw-man were to be removed. There is no possibility of making the unfamiliar familiar without acquaintance, but first impressions count for a lot.

The Automatic Namespace Definition

An automatic namespace definition file is a simple XML document. It does not itself use a namespace, and does not need a DTD (although there is one) or schema (although you can have those too if you like). Let's start with a simple example:

<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
</ns>

The example says that whenever an element called svg is encountered, it introduces a new default namespace, with the given URI, which will apply both to it and to all its children, unless of course they are themselves listed in a namespace file, or unless they have explicit namespace bindings in the document.

This much one could do with DSRL, except that we have not used a namespace declaration for our namespace definition file itself. We also need a way to connect the namespace definition with the document; if we are in an HTML document, we could use a link:

<link rel="ns" href="ns.xml" />

This markup would go in the HTML head, although it is only needed if your namespace differs from the HTML 5 default, or if you are using XHTML. For older (i.e. all existing) browsers we also need to add a link to some JavaScript to make it work:

<script type="JavaScript src="ns.js" />

This is more complexity than declaring the SVG namespace, so it's hard to imagine people doing it. But what if we define several namespaces in one ns file?

<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
  <element>
    <name>math</name>
    <uri>http://www.w3.org/1998/Math/MathML</uri>
  </element>
</ns>

Now we are starting to make some headway, because although we added a lot of garbage, we don't need to change the declarations in the document. We could still do this with DSRL, or by putting the namespaces right into that JavaScript code. Now suppose we want to add the Enterprise Access Control Markup Language (invented for the purpose of example), EACML, and that we have a separate eacml.ns file that defines its element and namespace. We'll go back to our SVG-only example to keep the example listing shorter.

<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
  <element src="eacml.ns />
</ns>

Perhaps this markup is a little odd, but the idea is to have an analogy with HTML script elements. In addition, a non-empty element can supply a namespace URI, so that if the software recognises the URI, it does not need to fetch the ns file.

For XML documents in general, we could use an attribute on the top-level element (or any beneath it):

<mydocument xml:ns="ns.x" />

Such a declaration would need the blessing of the W3C XML Core Working Group, and at the time of writing has neither been proposed to them nor discussed by them.

Attributes

It turns out that some markup languages need to have some of their attributes in a different namespace from their elements. This, of course, is because the languages predate the invention of automatic namespaces!

<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
    <attribute>
      <name>href</name>
      <uri>http://www.w3.org/1999/xlink</uri>
    <attribute>
  </element>
</ns>

Now any href attribute anywhere inside an SVG element (or, more precisely, affixed to an element in the SVG namespace) will be put in the XLink namespace.

Namespace Prefixes

What if you need to disambiguate an element? Or if you need to put a prefixed QName into an attribute?

The first answer is that you really don't want to put prefixed names into attribute values. You might think that you do, but you are deluded. If you should happen to persist, we will honour your delusion. But we will not make it too easy. The answer is that you can bind prefixes in just the way you always did, even in the presence of namespace files.

The original design of automatic namespaces let you name a prefix in the ns file, and use it in the instance, but it turns out you can't do that: your document would not be considered well-formed by Web browsers, which defeats the purpose. The second attempt was to use a prefix character other than the colon, but at that point it seems just as easy to declare the namespace. This is an area of experimentation at the time of writing.

Note that a DSRL-based approach to disambiguation might be to define a rewrite, so that the names of the elements are changed. Automatic Namespace Files do not support renaming elements or attributes beyond associating namespaces with them, partly because of the goal of having documents that work in Web browsers as much as possible, and partly because that seems a lot more than just defining namespace mix-ins.

Changing DSRL

Since DSRL already exists, it seems reasonable to ask how it could be changed to support automatic namespaces.

Support an implicit link to a DSRL definition, supplied by an application (such as a Web browser), rather than requiring a processing instruction. We want to allow XHTML documents to be legal with a minimum of extra work for their authors.
Allow a DSRL processor to recognise a default namespace, so that DSRL documents do not themselves need to use namespace bindings in the most common cases.
Add an inclusion facility, so that one DSRL document can reference another, preferably using a terminology that suggests namespace bindings rather than the renaming of elements.
Ensure that there are royalty-free patent commitments from all authors of the specification.
Ensure that the text of the specification will be freely available, and can be freely reproduced in books, tutorials and elsewhere.

Although DSRL is not exactly aligned with automatic namespaces, it seems worth exploring further.

Conclusions and Ongoing and Future Work

Automatic Namespaces can considerably reduce the amount of syntax at the start of XHTML documents. They can also legitimize HTML 5 parsing, by having a default namespace file that specifies the behaviour, a sort of in-browser thought experiment. Automatic Namespaces can also help with other XML applications, because although currently you'd need to use (say) XSLT to process the namespace file and the instance, this is a straight forward thing to do in may pipeline-based work flows.

Links to other resources, such as schemas and style sheets, human-readable documentation and more could arguably live in the same namespace file in the future. The mechanism proposed is easily extensible by adding new elements.

A simple mechanism that can be implemented in JavaScript for XHTML documents can also be useful for more general XML, and could help to give a way to consider HTML 5 documents to have mixed namespaces without, in most cases, any need to declare them.

References

[Hixie2006] Hickson, Ian, How to make namespaces in XML easier, blog entry from 2006-06-29, online at http://ln.hixie.ch/?start=1151612438.

[ISO] ISO, ISO/IEC 19757-8:2008 Information technology -- Document Schema Definition Languages (DSDL) -- Part 8: Document Semantics Renaming Language (DSRL), online at www.iso.org; €98 fee applies for downloading the PDF.

[XHTML-C1] XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition) A Reformulation of HTML 4 in XML 1.0, W3C Recommendation 26 January 2000, online at www.w3.org/TR/xhtml1/.

Author's keywords for this paper:

XML; Markup; Namespaces

Liam R. E. Quin

XML Activity Lead

W3C

`<liam@w3.org>`

Liam Quin has been involved with declarative, descriptive markup since the early 1980s. He wrote his open-source text retrieval system and first distributed it in the late 1980s

He has worked at the World Wide Web Consortium since 2001, where he is XML Activity Lead, or, informally, Mrs XML.

BalisageThe Markup Conference

Balisage Paper: Automatic XML Namespaces

Liam R. E. Quin

`<liam@w3.org>`

Table of Contents

Introduction

XHTML and Namespaces

HTML 5, XML and Namespaces

Other XML Environments where Namespaces may be Suboptimal

Requirements for a Solution

Existing and Proposed Technologies

Default (#FIXED) attributes in a DTD

Information Technology — Document Schema Definition Languages (DSDL) — Part 8: Document Semantics Renaming Language (DSRL)

JavaScript-based solutions

Disruptive Approaches

Proposal: Automatic Namespaces

The Automatic Namespace Definition

Attributes

Namespace Prefixes

Changing DSRL

Conclusions and Ongoing and Future Work

References

Author's keywords for this paper:

`<liam@w3.org>`

Balisage Series on Markup Technologies