XML in the Browser: the Next Decade
Copyright © 2009 R. Alexander Milowski. Used by permission.
It was Thursday, March 11th, the last day of XTech 1999 in San Jose, California, just before lunch. We'd just heard a presentation from Microsoft about their vision for client and server XML and what we should expect in IE (Internet Explorer) 5. A few of my colleagues and I were standing in the back, arms crossed, ready for the session to be over. The next presentation [apparao1999-1] was from Netscape about their new Gecko rendering engine, and what came next was going to make our day.
The first six slides went through more technical information than most wanted about how it was all going to work together, and on the seventh slide was a demo. The demo consisted of a simple XML document listing six books, their titles, authors, and ISBNs that had been rendered via CSS natively for the first time in a widely used, open-source, commercial web browser [apparao1999-2]. For some of us, that was delivery on the promise of rendering XML on the web, and it was a surprise to many in the room. It deserved and received a standing ovation.
They could have stopped there with some success but there was more to be seen. A few slides later was a final demo that demonstrated client-side harvesting of information [apparao1999-3]. An IRS document in XML was presented that contained a small box with a button labeled "Contents" on the right. When this button was pressed, TOC items were harvested from the document and a collapsible table of contents was displayed on the left side of the document. When a TOC item was clicked, the document navigated to the item's location in the document. Unbeknownst to the users at the conference, this was accomplished via Simple XLinks [xlink] embedded in the TOC.
Elated and hungry we all went to lunch with "success" on our minds. We had just stood witness to the start of an avalanche, or so we thought, of delivery of XML content to users. We were no longer bound to the perceived limitations of HTML.
Given the demos from 1999, the simple question is: where are we today after a decade of "progress"? Testing with IE 6, IE 7, IE 8, Firefox, Safari, and Android's WebKit-based mobile browser, we get these results:
|Browser||Books Demo||TOC Demo|
|IE 6||No - Blank Page||No - Errors|
|IE 7||No - Blank Page||No - Errors|
|IE 8||No - Blank Page||No - Errors|
script element. The CSS is provided by three separate stylesheets. In the case of all the "recent" versions of IE, the browser fails to render the document and provides no indication of what failed. All the other browsers give a consistent rendering and user
Based on browser usage statistics [usage] and grouping all WebKit-based browsers together, we get a penetration of 32.74% of browsers that can render XML (excluding XLink handling) as of July 9th, 2009. Given that IE fails for both demos and accounts for around 65.5% on that same date, that leaves roughly 1.76% (100% - 32.74% - 65.5%) in an unknown state as to whether they can render and manipulate XML documents. That's not a very good result for a decade of browser development--mainly due to IE's dominance and failures.
The question remains as to where the decade has gone. One large factor has been the stagnation of browser development due to the demise of Netscape and the resulting reluctance of Microsoft to really implement the W3C's recommendations. Only recently has the public--either general or developers--understood the need for conformance to these W3C recommendations and how failing to do so affects both the bottom line and the user's experience.
Nevertheless, the open source community has emerged strong with two viable contenders for core browser technology--Firefox [mozilla] and WebKit [webkit]. While readers are probably more familiar with Mozilla Firefox, the WebKit project is the core technology inside Safari, Chrome, the iPhone's web browser, and Android's web browser. Also, the WebKit project is both open source and supported by large companies such as Google and Apple.
This success has been driven by the fact that HTML, not XML, in conjunction with CSS and ECMAScript has been spiraling towards a consistent target platform--dragging Microsoft kicking and screaming along the way. The Application Provider is then responsible for bridging the gap between any Content Providers and the target application that will properly render and present their content intertwined with an application. Many creative and resourceful developers have found ways around browser quirks and lack-of-conformance issues to provide consistent toolkits for use by the application provider.
The result is that the Web User receives the application and content intertwined as HTML that no longer resembles whatever source the Content Provider supplied. The unfortunate consequence is that users cannot necessarily re-purpose the information they receive. For many this is not an issue but, depending on the user's needs, the inability to re-purpose the information may mean they cannot even read or use the application due to accessibility or other human constraints. Further, because the original content is missing, the user may be unable to use augmentation tools--such as browser extensions--to extract additional information or enhance their experience.
Even with these restrictions, this model has been wildly successful and has delivered, on both the business and user sides, a web with some aspect of ubiquity. All of this is without much XML involved in the client-side delivery of content to the browser. XML has largely been hidden on the server-side of the application.
Any markup that a web browser can natively process with some well-defined non-trivial semantic without the aid of additional constructs (e.g. stylesheets) we'll call an Intrinsic Vocabulary. By that definition, HTML is an intrinsic vocabulary. Notably, XML is not an intrinsic vocabulary as some semantics--at least via something like CSS--are needed to give the browser some instructions as to what to do with a specific XML document.
An application provider can rely upon an intrinsic vocabulary to have some baseline semantic. They can still enhance the semantics by using additional augmentations such as a stylesheet or ECMAScript. In some cases, like SVG or MathML, while a stylesheet may enhance the rendering, the vocabulary itself is self-contained and the mere act of delivering the vocabulary invokes the intended result.
Given a sufficient set of intrinsic vocabularies for linking, diagramming, and specialized communications like Mathematics, an application developer can deliver content to the browser with some expected result and semantics for the user. In the case of domains like Mathematics, by having MathML as an intrinsic vocabulary, augmentation by tools or accessibility can be achieved by the simple fact that the markup is there instead of a representation (e.g. an image of the mathematics).
Unfortunately, the set of intrinsic vocabularies currently available across the different browsers is limited to a subset of HTML 4. MathML [mathml], SVG [svg], and other possible intrinsic vocabularies are limited to specific browsers and their implementations are incomplete.
There are many choices for core intrinsic vocabularies but it is clear that the likely near-term outcomes are the following:
HTML5 - provides needed enhancements to HTML while providing a standard way of including other vocabularies like MathML or SVG and, at the same time, provides an option for an XML syntax.
SVG - provides interactive diagrams that can be affected by stylesheets and/or ECMAScript much like HTML.
MathML - provides essential content models for mathematical, scientific, or education content.
While HTML5 is currently under development, the promise of the ability to mix MathML and SVG into an HTML document is very powerful. Add to that the ability to deliver an HTML document in XML syntax without it being thought of as a separate vocabulary, and we can utilize all the work that has gone into making XML internationalized.
Also, SVG has shown up recently in several browsers. The support for this essential vocabulary will certainly grow over time in the open-source community. Whether commercial browser vendors like Microsoft will support SVG is unknown.
Finally, MathML support is currently only native to Firefox. While MathML was the first XML vocabulary produced by the W3C in April 1998, only the Mozilla developers have chosen to integrate it into their browser--which is, unfortunately, an incomplete implementation. While Mathematics is a universal human language with a long history, intertwined into so many subjects, and involved in so many communications, MathML support has been largely ignored by browser vendors.
Nevertheless, what separates an intrinsic vocabulary from a non-intrinsic vocabulary is the ability to map from one to the other. A non-intrinsic vocabulary can be composed out of intrinsic vocabulary components via some kind of mapping. In contrast, an intrinsic vocabulary is difficult to implement correctly and efficiently. We need our browser vendors to build in support for intrinsic vocabularies as the average developer cannot do so.
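As a minimal sketch of such a mapping, the following Python function turns a small, made-up book list into an HTML list. The element names (`books`, `book`, `title`, `author`, `isbn`) and the sample data are assumptions for illustration, loosely inspired by the 1999 books demo rather than copied from it, and this is not how any particular browser or extension performs the mapping.

```python
import xml.etree.ElementTree as ET

# Made-up sample in a hypothetical, non-intrinsic "books" vocabulary.
BOOKS_XML = """\
<books>
  <book><title>Example Title One</title><author>A. Author</author><isbn>0-000-00000-0</isbn></book>
  <book><title>Example Title Two</title><author>B. Author</author><isbn>0-000-00000-1</isbn></book>
</books>
"""


def books_to_html(xml_text: str) -> str:
    """Map the hypothetical 'books' vocabulary onto HTML, an intrinsic vocabulary."""
    books = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for book in books.findall("book"):
        li = ET.SubElement(ul, "li")
        cite = ET.SubElement(li, "cite")
        cite.text = book.findtext("title", default="")
        author = book.findtext("author", default="")
        isbn = book.findtext("isbn", default="")
        # Text that follows the <cite> element inside the same <li>.
        cite.tail = f" by {author} (ISBN {isbn})"
    return ET.tostring(ul, encoding="unicode")


html = books_to_html(BOOKS_XML)
print(html)
```

The mapping itself is a few lines; the hard part, rendering `ul`, `li`, and `cite` correctly and efficiently, is left to the browser, which is the point of relying on an intrinsic vocabulary.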
Unlike many other desktop browsers, Firefox provides the ability to write "extensions" [extensions] in addition to "plugins". A plugin typically provides:
the ability to handle a specific media type,
the ability to render that media type via an HTML
In contrast, Firefox has a very successful extensions model that provides augmentations to the browser. Extensions can provide what a plugin provides as well as add UI elements (menus, sidebars, etc.) and other internal components. These augmentations can be used in concert to provide a completely new experience for specific tasks or services.
An extension is installed by the user and always present, unlike plugins which are invoked as necessary by the browser to handle a specific media type. Accordingly, the user can add extensions that they rely upon for their "every day" experience when using the browser.
The user can find new extensions by visiting a registry provided by Mozilla. Within Firefox, a user can search and access an application registry (addons.mozilla.org) where developers have uploaded extensions. These extensions have been put through a basic approval process that gives the user a minimum level of confidence that the extension isn't malicious. Afterwards, the same services are used to allow the developer to upload and distribute updates to their extensions.
Somewhat unique to Firefox is the ability to register new internal components via an extension that can be used by other extensions or web pages. These components become part of the browser's ecosystem. As such, an extension developer can truly "extend" the basic core of the browser and add the ability to process new XML vocabularies.
Firefox's extension architecture enables a new application model for developing and deploying markup semantics. Previously, had we wanted to deliver XML content directly to the browser, either it was one of the browser's intrinsic vocabularies or it was delegated to a plugin and accessible only as a standalone or via an HTML 'object' element. Within this new model, we can develop an extension to the browser that understands the XML media type and delegates to our own components using the browser's ecosystem and intrinsic capabilities to render the document.
With this architecture we can extend Firefox such that it can handle any XML vocabulary we choose to send to it as long as it can be uniquely identified either by namespace or media type (preferably by media type). The basic process by which the extension does this is by registering a media type handler component with Firefox's internal registry. This component is responsible for handling, parsing, and otherwise processing the XML data stream coming across any transport Firefox supports (e.g. files, http/https, ftp, etc.).
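To make the identification step concrete, here is a minimal sketch, in Python rather than in Firefox's own component model, of choosing a vocabulary by media type first and falling back to the root element's namespace only when the media type is generic XML. The media type `application/x-example-books+xml`, the namespace `http://example.org/ns/books`, and the function names are all hypothetical; none of this is Firefox's actual registration API.

```python
import xml.etree.ElementTree as ET
from typing import Optional

GENERIC_XML_TYPES = {"application/xml", "text/xml"}

# Hypothetical registrations; both keys are invented for illustration.
BY_MEDIA_TYPE = {"application/x-example-books+xml": "books"}
BY_NAMESPACE = {"http://example.org/ns/books": "books"}


def root_namespace(xml_bytes: bytes) -> Optional[str]:
    """Return the namespace URI of the document element, if it has one."""
    root = ET.fromstring(xml_bytes)
    if root.tag.startswith("{"):
        # ElementTree spells qualified names as "{namespace}local".
        return root.tag[1:].split("}", 1)[0]
    return None


def identify_vocabulary(media_type: str, xml_bytes: bytes) -> Optional[str]:
    """Prefer the media type; consult the namespace only for generic XML."""
    media_type = media_type.split(";", 1)[0].strip().lower()
    if media_type in BY_MEDIA_TYPE:
        return BY_MEDIA_TYPE[media_type]
    if media_type in GENERIC_XML_TYPES:
        return BY_NAMESPACE.get(root_namespace(xml_bytes))
    return None
```

The sketch also suggests why a media type is preferable: it is known from the transport metadata before any content is read, whereas the namespace can only be discovered by starting to parse the document.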
Since we have a non-intrinsic vocabulary, the extension can provide whatever internal semantics are needed to translate, transform, or otherwise orchestrate the use of intrinsic vocabularies like HTML, MathML, SVG, etc. to render the document and provide user interface components to the browser user. From the perspective of the browser user, ultimately, the XML document received is just another tab in their browser window. From the perspective of the developer, the user interface provided can be much richer in UI widgets, semantics, and privileges than what a typical HTML document provides. The end result is a merged view of the application and the document's rendering within the Firefox user interface.
Mobile applications as architected by Google for their Android OS and Apple for their iPhone OS are both remarkably similar to each other as well as similar, in a limited way, to Firefox extensions. A mobile application is essentially a program that runs on the mobile OS platform with access to certain system services. On both the Android and iPhone platforms, one of these system services is the ability to construct a web browser environment based on WebKit.
Much like Firefox's addon registry, the developer uploads the application to the "marketplace" where users can download it and add it to their mobile phone's environment. Unlike a Firefox browser extension, it isn't really merged into the browser and does not augment the general web browser's capability. Instead, it provides a separate launching icon where the user must go to initiate the application.
Even given the limitations in augmenting the general web browser on these platforms, the mobile application can do remarkably similar things. Within the environment a developer can instantiate a browser instance, load content, and manipulate the browser's environment. To some extent, the mobile developer can mimic some of the Firefox browser extension environment by building their own application.
What a developer cannot do is change the browser's handling of media types. If a document is requested that uses some specialized XML vocabulary, it will get rendered using the same rules as if the user were using the platform's browser application. As such, the application developer needs to understand and control what is being given to the browser much more so than within Firefox.
In addition, once the application has rendered an XML document into some kind of HTML/Intrinsic vocabulary application being displayed by the WebKit instance, there are platform-specific limitations as to what kinds of interactions between the application and document can occur. This can be broken down further into these useful application categories:
Affect Global Environment: Can the application provide global objects accessible by any document loaded by the browser instance?
Execute Inside: Can the application execute ECMAScript within the browser's document?
Execute Outside: Can the document execute scripts or access objects within the application's environment?
|OS/Browser||Affect Global Environment||Execute Inside||Execute Outside|
The result of this analysis is that Android applications cannot affect their documents once loaded but their documents can initiate a request causing such a change. As such, an Android application can work around this limitation by a few clever bootstrapping tricks where there is always an internal document which proxies subsequently loaded documents in an
Conversely, an iPhone OS application can affect its documents by executing scripts within them, but the document cannot interact with the application and the application cannot affect the global environment in which the document exists. This severely limits a browser-based application because the document cannot tell the application about an event unless the application regularly inquires about its status. Similarly, there is no ability to pass continuous data streams (e.g. accelerometer events) to a document without constant execution of scripts.
Nevertheless, on both of these mobile application platforms you can build an application that loads, intercepts, and understands XML vocabularies while utilizing the intrinsic abilities of the mobile browser to handle the rendering and UI semantics. The application has to do a lot more of the "heavy lifting" than in the case of a Firefox extension and it also cannot integrate as seamlessly into the browser's internals.
Common to Firefox extensions and applications on the iPhone or Android platforms are:
an "application registry" or "store" where users can readily get new functionality,
the use of the browser as a core application user interface component,
the reliance on HTML and associated intrinsic capability of the browser for application functionality.
Unfortunately, in the case of both the mobile platforms, the browser's integration into the application is limited. While we can possibly write an application that interacts with our XML content, we can only do so within the confines of our application. The regular browser on the mobile platform remains ignorant of what to do with such XML content.
What we want is for the browser itself to be augmented to handle our media type so that the user experience inside and outside of any mobile or desktop application is the same. We don't want to duplicate the browser's architecture for handling transports, media types, and linking that it already does well. Instead, we want to augment the existing known media type handlers and insert a portion (if not all) of our application.
A simplified scenario for how this works internally can be described as this sequence of events:
An XML media type is recognized at the transport layer.
The media type is associated with our embedded application's media type handler for that XML vocabulary.
The XML data stream and metadata are transferred to our application component registered for that media type.
From the XML content received, our embedded application component constructs user interface elements and/or web content documents in the browser's intrinsic vocabularies.
The unified experience of our application facade and the web content documents is presented to the user.
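The sequence above can be sketched as a small dispatch pipeline. This is an illustration under stated assumptions, not any browser's internals: `Response`, `HandlerRegistry`, `note_handler`, the media type `application/x-example-note+xml`, and the `note` vocabulary are all hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Response:
    """What the transport layer hands over: metadata plus the XML data stream."""
    url: str
    media_type: str
    body: bytes


# A handler turns a response into a document in an intrinsic vocabulary (HTML here).
Handler = Callable[[Response], str]


class HandlerRegistry:
    """Hypothetical stand-in for a browser's media type handler registry."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, media_type: str, handler: Handler) -> None:
        self._handlers[media_type] = handler

    def dispatch(self, response: Response) -> str:
        # Steps 1-3: recognize the media type, find its handler, pass the stream on.
        handler = self._handlers.get(response.media_type)
        if handler is None:
            # A real browser would fall back to its default handling instead.
            raise LookupError(f"no handler for {response.media_type}")
        return handler(response)


def note_handler(response: Response) -> str:
    # Step 4: construct a document in an intrinsic vocabulary from the XML received.
    note = ET.fromstring(response.body)
    body = ET.Element("body")
    ET.SubElement(body, "h1").text = note.findtext("subject", default="(no subject)")
    ET.SubElement(body, "p").text = note.findtext("text", default="")
    return ET.tostring(body, encoding="unicode")


registry = HandlerRegistry()
registry.register("application/x-example-note+xml", note_handler)

# Step 5: the result is what would be presented to the user.
page = registry.dispatch(Response(
    url="http://example.org/note.xml",
    media_type="application/x-example-note+xml",
    body=b"<note><subject>Hello</subject><text>Sent as XML.</text></note>",
))
print(page)
```

Because the registry is keyed only by media type, swapping in a different handler for the same XML content changes the experience without changing what is sent over the transport.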
The end result is the user's experience is much like that of any other HTML application they might use a browser to access. The difference is that over the transport they received the XML content rather than some single-purpose rendition of that content. As such, they can choose the embedded application appropriate to the experience that they desire.
The DAISY/NISO standard, ANSI/NISO Z39.86 [daisy3], commonly known as DAISY 3, is an e-book specification developed with accessibility for the visually disabled in mind. While the specification itself is not limited to such special-purpose software environments, the focus of development has been on users with such special needs. In the end, the e-book specification is a collection of XML vocabularies that work together to form a single e-book.
The anatomy of a DAISY 3 book starts with a manifest document called an "OEB Package File". This XML document type was developed by the Open E-Book Forum/International Digital Publishing Forum [idpf] and provides a manifest of all the parts of the DAISY e-book. From such a manifest you can access:
The DAISY DTBook XML instance which contains the e-book content,
The DAISY NCX XML instance which contains navigation information about the e-book (e.g. table of contents),
SMIL XML documents used to provide playback scripts for the e-book content,
Any ancillary media objects used by the playback or book.
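As a rough sketch of reading such a manifest, the Python below sorts the manifest items of a package file into the four kinds of parts listed above. The sample package, its file names, and the function names are made up; the namespace and the media type strings reflect our understanding of the OEB and Z39.86-2005 conventions and should be checked against the specifications before being relied upon.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Made-up package file; hrefs and ids are illustrative only.
PACKAGE_XML = """\
<package xmlns="http://openebook.org/namespaces/oeb-package/1.0/" unique-identifier="uid">
  <manifest>
    <item id="dtbook" href="book.xml" media-type="application/x-dtbook+xml"/>
    <item id="ncx" href="book.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="smil-1" href="chapter1.smil" media-type="application/smil"/>
    <item id="audio-1" href="chapter1.mp3" media-type="audio/mpeg"/>
  </manifest>
</package>
"""

# Assumed media types for the DAISY parts; anything else counts as ancillary.
ROLES = {
    "application/x-dtbook+xml": "content",
    "application/x-dtbncx+xml": "navigation",
    "application/smil": "playback",
}


def local_name(tag: str) -> str:
    """Strip ElementTree's "{namespace}" prefix so matching ignores the namespace."""
    return tag.rsplit("}", 1)[-1]


def classify_manifest(package_xml: str) -> Dict[str, List[str]]:
    """Group manifest hrefs by the role their media type suggests."""
    parts: Dict[str, List[str]] = defaultdict(list)
    for element in ET.fromstring(package_xml).iter():
        if local_name(element.tag) != "item":
            continue
        role = ROLES.get(element.get("media-type", ""), "ancillary")
        parts[role].append(element.get("href", ""))
    return dict(parts)


parts = classify_manifest(PACKAGE_XML)
print(parts)
```

A real reader would also resolve each `href` against the package file's own URL before fetching the part.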
For a browser to open and display such an e-book, assuming we start with the OEB Packaging, the browser must first collect all the related parts and then decide what to render. The starting point of the packaging file gives the typical XML rendering very little to display. As such, just associating a CSS stylesheet or an XSLT transformation for rendering is insufficient.
Solving this requires a browser extension that understands the OEB Packaging file's media type, and invokes a DAISY browser extension. This component is then responsible for locating the different documents linked by the manifest in the OEB Packaging document. The collection of documents located is then used to assemble an appropriate UI within the browser.
The DAISY NCX document is used to provide a navigation aid, such as a table of contents, to the user. This document has links into the DAISY DTBook instance, which is the e-book content. These documents are used to present the user with a browser tab with e-book content via some XSLT transformation.
The book itself can be "played" to the user via the linked SMIL documents. These XML documents describe how the content from the original DAISY DTBook instance should be sequenced. As such, care must be taken in the transformations to preserve the identity of content elements so the SMIL references will work. In the end, the user is presented with playback options that sequence the book's content.
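One way to check that a transformation has preserved element identity is to compare the fragment identifiers the SMIL documents point at with the `id` attributes that survive in the transformed output. The sketch below assumes that SMIL `text` elements reference content as `src="file#id"` and that the transformed output is well-formed XML so the standard library parser can read it; the sample documents and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up playback script: each <par> pairs a content fragment with audio.
SMIL_XML = """\
<smil><body><seq>
  <par><text src="book.xml#p1"/><audio src="chapter1.mp3"/></par>
  <par><text src="book.xml#p2"/><audio src="chapter1.mp3"/></par>
</seq></body></smil>
"""

# Made-up transformation output in which the second paragraph lost its id.
HTML_XML = '<div><p id="p1">First paragraph.</p><p>Second paragraph.</p></div>'


def referenced_ids(smil_xml: str) -> List[str]:
    """Collect fragment identifiers used by SMIL <text src="...#id"> references."""
    ids = []
    for element in ET.fromstring(smil_xml).iter():
        if element.tag.rsplit("}", 1)[-1] != "text":
            continue
        src = element.get("src", "")
        if "#" in src:
            ids.append(src.split("#", 1)[1])
    return ids


def missing_ids(smil_xml: str, html_xml: str) -> List[str]:
    """Return referenced ids that no longer exist in the transformed output."""
    present = {e.get("id") for e in ET.fromstring(html_xml).iter() if e.get("id")}
    return [ref for ref in referenced_ids(smil_xml) if ref not in present]


print(missing_ids(SMIL_XML, HTML_XML))
```

An empty result means every playback reference still has a target; anything else points at a place where the transformation dropped or renamed an `id`.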
The end result is the user "opens" a DAISY book just like they do any other web document. They just follow a link or type in a URL to a DAISY book's packaging document and read the content. They don't need to know that there is some more complicated processing going on behind the interface presented to them.
The crucial point here is that for accessibility, since DAISY was started as an e-book format for blind and otherwise visually disabled people and since the DTBook content is translated into an intrinsic vocabulary (HTML) that the browser already understands, the tools used by these people to read web documents still work. The vendors of such tools like screen readers do not need to add specialized support for the DAISY book reader because, to them, the user is just reading a regular HTML web document. The combination of standardized intrinsic vocabularies and widespread software supported by these vendors means that specialized software like the DAISY browser extension can "hide" in the background and allow the user the same experience they are used to when they browse the web.
This DAISY book extension has been implemented as a Firefox extension and is now open-source. It is available for download from launchpad.net [daisyextension].
Making predictions is certainly risky business. Many of us at that 1999 XTech presentation thought we were at the start of the ability to deliver high quality XML content to users over the web and into their browsers. What we didn't understand was the complexity of the interactivity model being developed within HTML, the explosion of sufficiency from "regular HTML" based web applications, and the relatively high complexity of delivering a true XML application to a client-side browser.
In 2009, we've found ourselves at another crossroads where high quality browser technology is now simultaneously scalable to the mobile platform and open-source as WebKit or Firefox. The promise of WebKit provides the unique ability to contribute to open-source efforts and bridge the gap between the ultimate flexibility of the Firefox Mozilla platform and the streamlined and compliant nature of WebKit. That is, we can make WebKit what we need simply by actively contributing or otherwise supporting its development.
In the past, we waited for the browser vendors to do "the right thing". Now we can make what we want to happen by embracing our open-source browser technologies and have them do "the right thing" because we implemented the code to do so. That's our choice: to contribute or let our ideas fail.
In the spirit of this, I present these challenges for the reader:
We need intrinsic vocabularies and semantics we can rely upon. We must have HTML5, SVG, and MathML.
We won't wait for "someone else" to develop our browser enhancements.
We will embrace the idea of intrinsic vocabularies, like HTML, because such things take an inordinate amount of time to develop.
We will replicate the browser extension model championed by Firefox because it enables direct delivery of XML vocabularies without obscene acts.
We will support open-source and make it easy to use because it is our "big stick" we use to get what we want.
Commercial vendors of browser technologies need to catch up or perish. The drag that has been created by certain browsers not implementing the most basic of recommendations from the W3C has caused enormous delay as well as economic consequences. While it is the user who suffers, they also often have a choice and can choose one that works.
The ability to deliver XML content paired with applications directly to users has existed for quite a while--but only in Firefox. That ability has been buried inside Firefox and delegated to the brave souls who want to dig through the source code. We need to bring that ability to the surface and make it easy to use.
Having only one browser that does "cool things" is not enough. We need to propagate the ability to extend a web browser by extending it at its core. We need the ability to do serious work alongside other components inside the browser in addition to augmenting the user interface to add in our "gadgets". It is really our choice to propagate a new model based on this knowledge and experience for the next decade.
[apparao1999-1] V. Apparao. XML and Related Standards in Gecko - slides from XTech 1999 - http://www.mozilla.org/newlayout/xml/slides/slide1.xml.
[apparao1999-2] V. Apparao. Book Demo - from XTech 1999 - http://www.mozilla.org/newlayout/xml/books/books.xml.
[apparao1999-3] V. Apparao. TOC Demo - from XTech 1999 - http://www.mozilla.org/newlayout/xml/tocdemo/rights.xml.
[usage] Wikipedia. Usage share of web browsers - http://en.wikipedia.org/wiki/Usage_share_of_web_browsers.
[daisy3] DAISY/NISO Standard, officially, the ANSI/NISO Z39.86, Specifications for the Digital Talking Book - http://www.niso.org/standards/resources/Z39-86-2005.html.