The XML community has lived with XML namespaces for a decade. They are useful to the point of seeming indispensable, they are ubiquitous, and yet they are at the same time unwieldy and flawed. Namespace declarations can be inconvenient to remember, and mistakes in them are frequently the source of subtle and hard-to-diagnose errors. From a programming perspective, namespaces provide scope and disambiguation; from a document authoring perspective, namespaces provide headaches. For an HTML author, working in a world in which the browsers tend to suppress or auto-correct errors, and in which MathML, XHTML, SVG, XForms, Dublin Core and more each have their own namespace URI, the need to pre-declare large sets of namespaces quickly becomes onerous.
In this paper the author proposes a simple system to simplify namespace declaration, and to enhance namespace functionality considerably by introducing a single new feature, without losing the existing benefits. The paper first describes in more detail some of the issues, then summarises the issues with requirements for change, then discusses other proposals, and finally makes a concrete proposal.
XHTML and Namespaces
Consider a typical XHTML document that also uses XForms, SVG and MathML, and has some metadata using the Dublin Core and FOAF. Already we are up to six namespace declarations (including the one for XHTML) and we have hardly begun. SVG uses XLink, adding another, and so it continues. Documents with twenty or more declarations are sometimes seen.
Of course, many of these documents are generated automatically rather than hand-authored. Even there, the burden of maintaining the declarations should not be underestimated. To XML people familiar with the mechanisms and overhead of the namespace syntax there may seem (at least at first sight) no problem, but to HTML authors the difference is startling. Is it necessary?
Recall that namespaces are serving two primary functions: they are
associating names with the specifications that define them, and they are
disambiguating in the case that two specifications define the same name.
In practice, conflicts where the same name is defined by multiple
specifications are rare (although still important enough to need
addressing). For XHTML, the DOCTYPE declaration is already sufficient to
bestow HTML-ness, and, within the context of HTML, an svg
element only has one plausible interpretation. A significant motivation
driving the use of XHTML is that XML tools can be used with the document,
and for these tools, SVG-ness is not associated with any particular
element name. One could use
#FIXED attributes in a document
type definition, but we will see later why this is not a satisfactory
approach. The HTTP MIME Content Type can also be used to indicate
combinations (application/xml+svg+mathml) but since every combination must
be registered with IANA, this does not scale; it also doesn't work on a
local file system.
A major goal of the work described in this paper, then, must be to eliminate as much syntax as possible without losing the benefit of being able to combine namespaces at will and have both XML and Web tools operate on the result.
The XML community will not be motivated to support a new specification merely to satisfy the needs of some other community. We would need to do more. Happily, there is more to be done. Extensive use of namespaces has demonstrated a need for an easy way for users to define their own namespaces that are a mixture of existing namespaces and their own elements, and to check that
HTML 5, XML and Namespaces
In the last couple of years, a number of individuals have gathered
support for renewed work on non-XML versions of HTML. These are also not
based on SGML, but instead are an SGML-inspired format. Avowed dislike of
XML appears to have stemmed at least in part from misunderstandings and in
part from the stricter and more verbose syntax. For these people,
robustness, accuracy, error detection and correctness are relatively
unimportant: all that matters is that the Web browser render an acceptable
result. At the time of writing, the HTML Working Group is considering
hard-wiring MathML and SVG namespaces into the HTML specification, so that
an svg element would automatically be placed into the SVG
namespace. This would make it harder to process the documents with other
tools: for example, it is tricky to match SVG elements with XPath or with
XSLT match expressions if you don't know in advance whether there will be
a namespace declaration, and, if there is, whether it will be correct. In
fairness it should be noted that, since HTML 5 processors are expected to
auto-correct certain classes of syntax errors which XML processors are
required not to attempt to correct, one cannot in general expect to
process HTML 5 documents with XML tools. None the less it is reasonable to
be able to expect to generate HTML documents from XML, and also to use
Other XML Environments where Namespaces may be Suboptimal
Anywhere that users have to declare a large number of mostly orthogonal (non-overlapping) namespaces is a candidate for improvement: it is particularly unfortunate that users cannot themselves combine namespaces to make new amalgamated ones, such as XSLT plus SVG plus HTML.
Some difficulties when using multiple namespaces today include:
The need to remember long URIs: people often use copy and paste, which can result in extra declarations being pasted in and later causing problems; or, they re-type the URI by hand and make errors, with the result that software later doesn't recognise the namespace correctly.
The need for humans to remember which namespace defines which element or attribute, even where there's no clear functional gain. For example, remembering that
href comes from XLink in SVG, and from XHTML in some other vocabulary.
Matching mixed-namespace documents with XPath, whether for XSLT or for XQuery or (hardest of all) stand-alone XPath, is distressingly exciting. The most commonly asked questions on XML support channels are about processing namespaces.
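The matching difficulty in the last item can be reproduced in a few lines. The following is only a minimal sketch, using Python's standard ElementTree module and its limited XPath subset; the two sample documents and the prefix s are invented for illustration. It shows that the same path expression selects the circle element or misses it depending on whether the instance happens to declare the SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# The same markup, once with and once without the SVG namespace declaration.
declared = ET.fromstring(
    '<html><body><svg xmlns="http://www.w3.org/2000/svg">'
    '<circle r="1"/></svg></body></html>'
)
undeclared = ET.fromstring(
    '<html><body><svg><circle r="1"/></svg></body></html>'
)


def count(root, path, namespaces=None):
    """Return how many elements the path expression selects under root."""
    return len(root.findall(path, namespaces))


plain_path = ".//circle"        # name test with no namespace
qualified_path = ".//s:circle"  # name test in the SVG namespace
prefixes = {"s": SVG_NS}        # "s" is an arbitrary prefix chosen here

results = {
    "declared/plain": count(declared, plain_path),
    "declared/qualified": count(declared, qualified_path, prefixes),
    "undeclared/plain": count(undeclared, plain_path),
    "undeclared/qualified": count(undeclared, qualified_path, prefixes),
}
```

Each path finds the element in exactly one of the two documents, so a stylesheet author who cannot predict whether the declaration is present has to write both forms.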
Requirements for a Solution
If you need to pay for the spec, the Web developer community is not interested.
There should be no patent encumbrances; since this is in practice not determinable, at the very least the people developing the specification, and the organisation publishing it, must make an effort to ensure that people using or implementing the specification won't suddenly be asked to pay royalties.
Makes life simpler
Although part of the goal of Automatic Namespaces is to enable HTML 5 documents to be namespace-well-formed in memory, it's important to remember that this is a motivation for XML people but not HTML people. Therefore, to gain adoption, Automatic Namespaces must not require the user to understand new concepts. For example, the user should not have to declare or use namespace bindings, since those are the things that are objected to the most.
Easy to Implement
Compatible with Today's Web
The solution must work in Web browsers that are in use today, at least in the vast majority of cases. People won't upgrade their Web browsers to view Web pages using namespaces.
Gives clear benefits to XML people
I'm not out just to make the HTML 5 people feel vaguely more karmic. They can do that all by themselves. I aim to write a spec that will mean XML users can benefit too.
Existing and Proposed Technologies
Others have identified a need in this area. A recent discussion on
the xml-dev list [quin2009a] elicited an incomplete
proposal that will be discussed below, together with two other methods,
#FIXED attributes and ISO DSRL. Private communication has
Default (#FIXED) attributes in a DTD
The idea here is to have a document type definition that supplies
xmlns:svg and so forth as
#FIXED attribute values. This
would be interesting if Web browsers fetched DTDs. One could consider
may make a persistent cache or XML Catalogue approach hard to implement;
in addition, both the SYSTEM and the PUBLIC identifiers are fixed, so
one cannot serve HTML documents with server-specific values. This
DTD-based approach might work fine outside the HTML world, but it turns
out that today's Web browsers reject documents containing qnames if the
prefix has not been explicitly bound to a URI. This means that a
document would not be considered as well-formed without the DTD:
progressive rendering as the document loads would have to wait for the
DTD, and an unavailable DTD would prevent the document from loading.
Worse, since the browsers don't fetch the DTD themselves, the approach
of defining default prefixes in a DTD can only lead to documents that
the browsers can't load. Since our goal is to reach out and build
bridges between HTML and XML worlds and we must (regretfully) dismiss
Information Technology — Document Schema Definition Languages (DSDL) — Part 8: Document Semantics Renaming Language (DSRL)
ISO Joint Technical Committee ISO/IEC JTC 1, Information Technology, Subcommittee SC 34, Document Description and Processing Languages, has recently produced a draft of its Document Semantics Renaming Language (DSRL). This document does not appear to make clear how a DSRL mapping is located, given an instance document, although separate evidence (private communication) suggests a plan to use a processing instruction. Possibly the ISO committee would be amenable to an alternative suggestion that is more likely to work in HTML-based Web browsers, since processing instructions interfere with PHP processing, and also cannot portably appear before the end of the document's head element, which may be too late in a world of progressive rendering.
The DSRL specification describes a powerful mechanism to map
elements, attributes, processing instructions and entities (!) in the
XML document to alternates. You can map any element to a new (namespace,
element) combination, where the replacement is part of a validating
schema. This specification is almost certainly too complex to implement
the possibility of defining HTML entities using an XML syntax is very
alluring. DSRL uses XPath to specify the context in which remapping is
to occur. One could thus map every third
svg element in the
document, if desired.
Overall, DSRL seems very promising. It appears to do what is needed. But, like a US Congress bill, it comes with a lot of additional baggage, some of which is problematic for us, and it also lacks some functionality we need:
DSRL files themselves have at least three namespace declarations in them. We want something that doesn't need to have any additional declarations, if possible.
DSRL appears to lack an inclusion facility. One could use XInclude, perhaps, but at the cost of added syntactic complexity that we are trying to avoid. An XML-savvy user could create DSRL files with XSLT or XQuery, but again, that's a level beyond our expectations. We want to be able to combine namespaces to make new ones, and DSRL isn't designed to do that.
DSRL requires an explicit reference from the document to the DSRL file, and makes that reference with a processing instruction. Processing instructions can cause problems in Web browser environments: they generally work in application/xml documents, but in HTML and XHTML documents they can be confused with PHP on the server, and can also be (incorrectly) displayed by a Web browser. Any unrecognised markup terminates the HTML HEAD element, so a processing instruction can also break stylesheet and script links in older browsers, and may even be rendered as text content [ISO; XHTML-C1, Appendix C1].
First, we should note that, as things stand (April, 2009), HTML 5
says that certain elements, such as
math, are to be placed in the namespaces one might expect
automatically. Unfortunately, existing Web browsers do not behave this
way. Once HTML 5 becomes a W3C Recommendation one might reasonably
expect to see implementations, but a great many people will still be
using older browsers. This also presents an incompatibility with XPath
default (non) namespace.
Another compatibility issue is that XPath name test expressions with no namespace match only elements in the default namespace; a request has been made to bless the idea that they also can match elements in the HTML namespace. There is ongoing discussion in this area at the time of writing.
In the interest of completeness, it is worth mentioning an approach to changing namespaces that is incompatible with current practice. Ian Hickson has suggested [Hixie2006] adding micro-syntax within the xmlns pseudo-attribute, after the namespace URI, to allow a search path of namespaces. This would cause interoperability problems with existing XML software, but does show that others have considered this problem space in the past. Private communication tells the author that others have also given thought to this problem.
Proposal: Automatic Namespaces
The goals of the Automatic Namespace mechanism are to allow document authors to define their own namespace mix-ins in terms of other namespaces and to refer to them, and also to minimise the amount of syntax needed for declarations—in the case of HTML, ideally, to zero.
The second goal, minimising syntax, is necessary in order to have any hope of adoption by HTML and XHTML authors.
People reading a draft of this paper commented that a greater barrier to XML adoption in the HTML world was the draconian error-handling, which they believed meant that a Web browser must reject any document that claims to be XML but is not well-formed. This is an unfortunate mis-perception: in fact, the restriction is that the browser must not claim such a resource to be a well-formed XML document, but, once it is not XML it is outside the scope of the XML specification, and error recovery is perfectly acceptable, as long as no claim is made that the original document is itself XML. So it seems to this author that the barrier is not draconian error handling, but browser writers. So, rather than address a problem that appears not to exist, the approach here is to address a real difficulty that might be pointed out as a barrier if the draconian error-handling straw-man were to be removed. There is no possibility of making the unfamiliar familiar without acquaintance, but first impressions count for a lot.
The Automatic Namespace Definition
An automatic namespace definition file is a simple XML document. It does not itself use a namespace, and does not need a DTD (although there is one) or schema (although you can have those too if you like). Let's start with a simple example:
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
</ns>
The example says that whenever an element called svg
is encountered, it introduces a new default namespace, with the given
URI, which will apply both to it and to all its children, unless of
course they are themselves listed in a namespace file, or unless they
have explicit namespace bindings in the document.
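To make the rule concrete, here is a minimal sketch of how a processor might apply such a definition to a parsed document. It is not a reference implementation: the function names are invented for this illustration, it uses Python's standard ElementTree module, and it assumes that an element carrying an explicit namespace binding is left untouched and does not change the automatic default in scope for its children, a detail the description above leaves open.

```python
import xml.etree.ElementTree as ET

# The namespace definition from the example above.
NS_FILE = """
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
</ns>
"""

# A hypothetical instance document with no namespace declarations at all.
DOCUMENT = (
    "<html><body><p>A circle:</p>"
    "<svg><circle r='10'/></svg></body></html>"
)


def load_ns_definition(ns_xml):
    """Read each element entry into a dict of element name -> namespace URI."""
    mapping = {}
    for entry in ET.fromstring(ns_xml).findall("element"):
        name = entry.findtext("name")
        uri = entry.findtext("uri")
        if name and uri:
            mapping[name.strip()] = uri.strip()
    return mapping


def apply_automatic_namespaces(elem, mapping, inherited=None):
    """Rewrite tags in place; inherited is the automatic default in scope."""
    if elem.tag.startswith("{"):
        # Explicit binding in the document wins (assumption: scope unchanged).
        current = inherited
    else:
        # A listed name introduces a new default; otherwise keep the old one.
        current = mapping.get(elem.tag, inherited)
        if current is not None:
            elem.tag = "{%s}%s" % (current, elem.tag)
    for child in elem:
        apply_automatic_namespaces(child, mapping, current)


root = ET.fromstring(DOCUMENT)
apply_automatic_namespaces(root, load_ns_definition(NS_FILE))
tags = [e.tag for e in root.iter()]
```

After the call, the svg element and its circle child carry the SVG namespace in ElementTree's {uri}name notation, while html, body and p are unchanged because nothing in the definition names them.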
This much one could do with DSRL, except that we have not used a namespace declaration for our namespace definition file itself. We also need a way to connect the namespace definition with the document; if we are in an HTML document, we could use a link:
<link rel="ns" href="ns.xml" />
This is more complexity than declaring the SVG namespace, so it's hard to imagine people doing it. But what if we define several namespaces in one ns file?
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
  <element>
    <name>math</name>
    <uri>http://www.w3.org/1998/Math/MathML</uri>
  </element>
</ns>
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
  <element src="eacml.ns" />
</ns>
Perhaps this markup is a little odd, but the idea is to have an analogy with HTML script elements. In addition, a non-empty element can supply a namespace URI, so that if the software recognises the URI, it does not need to fetch the ns file.
For XML documents in general, we could use an attribute on the top-level element (or any beneath it):
<mydocument xml:ns="ns.x" />
Such a declaration would need the blessing of the W3C XML Core Working Group, and at the time of writing has neither been proposed to them nor discussed by them.
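As an illustration of the two linking mechanisms just described, the sketch below collects references to namespace definition files from a parsed document. It is a hedged example only: find_ns_references is a name invented here, the xml:ns attribute is the unratified proposal from the text rather than an existing feature, and the sketch assumes well-formed input so that Python's standard ElementTree parser can read it.

```python
import xml.etree.ElementTree as ET

# The xml prefix is bound to this URI by the Namespaces in XML Recommendation.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def find_ns_references(root):
    """Return the ns file references found in document order."""
    refs = []
    for elem in root.iter():
        # Proposed (not standardised) xml:ns attribute on any element.
        xml_ns_value = elem.get("{%s}ns" % XML_NS)
        if xml_ns_value:
            refs.append(xml_ns_value)
        # HTML-style link element whose rel value includes the token "ns".
        local_name = elem.tag.rsplit("}", 1)[-1]
        rel_tokens = elem.get("rel", "").lower().split()
        if local_name == "link" and "ns" in rel_tokens and elem.get("href"):
            refs.append(elem.get("href"))
    return refs


xhtml = ET.fromstring(
    '<html><head><link rel="stylesheet" href="s.css" />'
    '<link rel="ns" href="ns.xml" /></head><body/></html>'
)
generic = ET.fromstring('<mydocument xml:ns="ns.x" />')

xhtml_refs = find_ns_references(xhtml)
generic_refs = find_ns_references(generic)
```

The stylesheet link is ignored because its rel value does not include the ns token; only the ns link and the xml:ns attribute are reported.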
It turns out that some markup languages need to have some of their attributes in a different namespace from their elements. This, of course, is because the languages predate the invention of automatic namespaces!
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
    <attribute>
      <name>href</name>
      <uri>http://www.w3.org/1999/xlink</uri>
    </attribute>
  </element>
</ns>
Now any href attribute anywhere inside an SVG element (or, more precisely, affixed to an element in the SVG namespace) will be put in the XLink namespace.
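The attribute rule can be sketched in the same way. The code below is an assumption-laden illustration rather than the author's implementation: the function names are invented, the sample document is hypothetical, and the rule is keyed on the namespace of the element that carries the attribute, following the "more precisely" reading above. It uses Python's standard ElementTree module.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

NS_FILE = """
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
    <attribute>
      <name>href</name>
      <uri>http://www.w3.org/1999/xlink</uri>
    </attribute>
  </element>
</ns>
"""


def load_attribute_rules(ns_xml):
    """Map element namespace URI -> {attribute name: attribute namespace URI}."""
    rules = {}
    for entry in ET.fromstring(ns_xml).findall("element"):
        element_uri = entry.findtext("uri").strip()
        for attr in entry.findall("attribute"):
            name = attr.findtext("name").strip()
            rules.setdefault(element_uri, {})[name] = attr.findtext("uri").strip()
    return rules


def qualify_attributes(root, rules):
    """Move listed attributes into their namespace on matching elements."""
    for elem in root.iter():
        if not elem.tag.startswith("{"):
            continue  # element is in no namespace, so no rule applies
        element_uri = elem.tag[1:].split("}", 1)[0]
        for name, attr_uri in rules.get(element_uri, {}).items():
            if name in elem.attrib:
                elem.set("{%s}%s" % (attr_uri, name), elem.attrib.pop(name))


# Hypothetical document: one href outside SVG, one on an SVG element.
root = ET.fromstring(
    '<html><a href="page.html">p</a>'
    '<svg xmlns="http://www.w3.org/2000/svg"><use href="#shape"/></svg></html>'
)
qualify_attributes(root, load_attribute_rules(NS_FILE))

html_link = root.find("a")
svg_use = root.find(".//{%s}use" % SVG_NS)
```

Only the href on the SVG use element is moved into the XLink namespace; the href on the un-namespaced a element is left alone.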
What if you need to disambiguate an element? Or if you need to put a prefixed QName into an attribute?
The first answer is that you really don't want to put prefixed names into attribute values. You might think that you do, but you are deluded. If you should happen to persist, we will honour your delusion. But we will not make it too easy. The answer is that you can bind prefixes in just the way you always did, even in the presence of namespace files.
The original design of automatic namespaces let you name a prefix in the ns file, and use it in the instance, but it turns out you can't do that: your document would not be considered well-formed by Web browsers, which defeats the purpose. The second attempt was to use a prefix character other than the colon, but at that point it seems just as easy to declare the namespace. This is an area of experimentation at the time of writing.
Note that a DSRL-based approach to disambiguation might be to define a rewrite, so that the names of the elements are changed. Automatic Namespace Files do not support renaming elements or attributes beyond associating namespaces with them, partly because of the goal of having documents that work in Web browsers as much as possible, and partly because that seems a lot more than just defining namespace mix-ins.
Since DSRL already exists, it seems reasonable to ask how it could be changed to support automatic namespaces.
Support an implicit link to a DSRL definition, supplied by an application (such as a Web browser), rather than requiring a processing instruction. We want to allow XHTML documents to be legal with a minimum of extra work for their authors.
Allow a DSRL processor to recognise a default namespace, so that DSRL documents do not themselves need to use namespace bindings in the most common cases.
Add an inclusion facility, so that one DSRL document can reference another, preferably using a terminology that suggests namespace bindings rather than the renaming of elements.
Ensure that there are royalty-free patent commitments from all authors of the specification.
Ensure that the text of the specification will be freely available, and can be freely reproduced in books, tutorials and elsewhere.
Although DSRL is not exactly aligned with automatic namespaces, it seems worth exploring further.
Conclusions and Ongoing and Future Work
Automatic Namespaces can considerably reduce the amount of syntax at the start of XHTML documents. They can also legitimize HTML 5 parsing, by having a default namespace file that specifies the behaviour, a sort of in-browser thought experiment. Automatic Namespaces can also help with other XML applications, because although currently you'd need to use (say) XSLT to process the namespace file and the instance, this is a straightforward thing to do in many pipeline-based workflows.
Links to other resources, such as schemas and style sheets, human-readable documentation and more could arguably live in the same namespace file in the future. The mechanism proposed is easily extensible by adding new elements.
[ISO] ISO, ISO/IEC 19757-8:2008 Information technology -- Document Schema Definition Languages (DSDL) -- Part 8: Document Semantics Renaming Language (DSRL), online at www.iso.org; €98 fee applies for downloading the PDF.