Moving sands: Adventures in XML e-book-land

From XML to XML. Is that enough?

Michel Biezunski



Copyright © 2012 Michel Biezunski

expand Abstract

expand Michel Biezunski

Balisage logo


expand How to cite this paper

Moving sands: Adventures in XML e-book-land

From XML to XML. Is that enough?

Balisage: The Markup Conference 2012
August 7 - 10, 2012


Note. In this article, the word “e-reader” is used to designate a reading device, for example a tablet, while the word “reader” (except in the phrase “e-ink reader”) is used to designate a human being performing the action of reading.

Introduction — E-books go mainstream

Contemporary publishers of books, journals, and magazines are moving away from a publishing model based on printed pages. Printed books are industrial products that, once written, edited and formatted, undergo several steps: printing, binding, storing, distributing, displaying, selling. The process of selling e-books is much simpler: once composed, an e-book can be put on a server in an online store, where customers can buy it instantaneously. e-books will never be “out-of-print”, nor will e-books need to be destroyed because they don’t sell enough. The term “e-book” encompasses content beyond text and graphics. Audio and video can be included, as well as software components using scripts.

There are many devices that let us read e-books, including e-ink readers, tablets, smart phones, and computers. E-ink readers are best fit for reading plain books, tablets for magazines and audio/video content. Smart phones, even though they don’t offer the same reading “real estate”, can be used to read the same material. And computers can do all, but they don’t necessarily offer the comfortable experience that users want for leisurely reading. e-books can be downloaded in their entirety and therefore can be read off-line.

The e-book business has become big business. In addition to traditional booksellers such as Barnes & Noble, most book publishers, and, increasingly, news-“papers”, key players now include Apple, Google, Microsoft, Adobe, Samsung, and Amazon. Actors in the sector are both information technology companies and content owners.

E-book formats are basically a set of compressed XML documents, where the main content is stored in XHTML. The EPUB delivery standard has become the prevalent means of document delivery. Amazon is an exception and has adopted its own formats — MOBI/AZW and now KF8. Use of XML in e-books is more tightly controlled than it is in web browsers. In particular, web browsers tolerate XML-like HTML in which validity to a DTD or a schema is almost irrelevant. In contrast, e-books must be entirely conformant to XML schemas, including XHTML for the main content. One would think that this rigor would guarantee that e-books will properly display on most e-readers and eliminate tweaking of content according to the e-reader the way web pages must be tweaked according to the browser. Unfortunately, this is not the case. The primary goal if this article is to show that XML is insufficient to guarantee smooth interoperability for rendering. A secondary goal is to remind us not to underestimate the distinction between a logical structure and its presentation, a distinction that is at the core of structured markup.

XML-based standards


Why is this paragraph a comment?

The EPUB standards are a set of XML vocabularies including the document content (XHTML), the “manifest”, declaring the document entities present in one e-book, the “table of contents” for navigating, the front cover page, and a mimetype declaration. Fine tuning the formatting is achieved through the use of cascading style sheets (CSS). The XML documents are compressed together with the style sheets, the images, any videos, and when applicable, Javascript programs. A compressed package is called an “EPUB” document.

EPUB is created by an industry consortium named IDPF, the International Digital Publishing Forum. The two versions in use of EPUB are 2.0.1 and 3.0. EPUB 2.0.1 was approved in May, 2010, and EPUB 3.0 in October, 2011. In practice, they are augmented with proprietary features that are only available on one particular device; the most famous is the Apple extension to CSS for iPads. The conformance level to a given version of a standard also varies.

EPUB 2.0.1 comprises three specifications: the Open Publication Structure (OPS), the Open Packaging Format (OPF), and the Open Container Format (OCF). The OPS is based on XHTML 1.1 and DTBook (DAISY Digital Talking Book). The style language is a required subset of CSS 2. It allows the inclusion of so-called “XML Islands” which are any complete XML document with a structure different from DTBook or XHTML. This feature was not widely implemented and “out-of-line XML islands” have been dropped in EPUB 3.0.

EPUB 3.0 contains the following specifications: Publications 3.0, Content Documents 3.0, Open Container Format 3.0, Media Overlays 3.0, and EPUB Canonical Fragment Identifiers. The main difference between EPUB 2.0.1 and EPUB 3.0 is that the latter contains more provisions for incorporating diverse media, such as audio and video, and parallels the evolution of HTML towards XHTML 5. EPUB3 was first implemented in Apple’s iBooks, Readium, and Azari. Recent progress has been reported for other devices. EPUB 3 is gaining a lot of momentum. Its adoption is accelerating noticeably in Japan due to its enhanced capabilities for global language support, particularly handling writing directions including vertical writing. Furthermore, EPUB 3 books can be read on an EPUB 2 reader, providing an EPUB 2.0.1 table of contents is provided.[1]

Support for other EPUB 3 features varies. There are only a few signs of implementation of video and audio support and pragmatists in the industry prefer to “wait and see” before guaranteeing that it actually delivers on its promises. Although Javascript is enabled, its support is being largely discarded because of potential security breaches. It may be possible to move forward by restricting Javascript to a set of industry-supported scripts that could be implemented in most e-readers. However, SVG and MathML are now incorporated, and there seems to be more traction on that front. Specific extensions to CSS3 for e-books are also defined in the specification.

In addition to the general framework the EPUB 3 specification provides, satellite specifications are now being defined to offer support for multimedia and indexes. An important one will support fixed layout. Fixed layout contrasts to the flowable content addressed by the base EPUB standard but is needed for documents such as comic books, illustrated books, children books, knitting patterns, textbooks, and magazines. A single publication should include both flowable content and fixed layout, but this requirement seems to be unavailable at this point. Fixed layout must support multiple columns, and a fixed height to width ratio in different page sizes. Rendering fixed layout on a variety of e-readers is a challenging task.


Amazon e-book formats differ from the EPUB standards and are optimized for rendering on Kindle devices. They also contain provisions for support of audio and video directly inspired by HTML 5. The main Amazon formats used today are MOBI/AZW and KF8. MOBI is an open standard that was bought by Amazon. AZW was developed by Amazon specifically for the Kinder e=ink reader and is DRM (Digital Rights Management)- restricted and is locked to a specific device[2] KF8 uses a subset of HTML5 and CSS3, and is designed to be backward-compatible with MOBI.

Rendering remains a challenge

There are two kinds of e-books: flowable content and fixed layout e-books. Flowable content can be reformatted on the fly, depending on the device and the user configuration. Furthermore, some devices automatically redisplay the content depending on the orientation in which they are held. Under these conditions, the traditional notion of a page no longer holds. This obvious difference from the world of published documents consisting of the well-defined pages of paper or PDF, also distinguishes e-books from web pages. Even if browser windows can be reformatted using user-defined local settings, the notion of a web page is pervasive. And fragment identifiers help locate the exact location of any information on a page, regardless of rendering. This location addressing facility is not currently available on an e=book. Technically, flowable content in an e-book is displayed by pages, which depend on the local conditions, and is divided into units which correspond more or less to the chapters of a traditional printed book. Each of these units may correspond to a different physical XHTML file, although the page-break property of a CSS may separate one file into multiple pages. However, an actual page of flowable content on an e-reader is rendered purely dynamically and depends on the conditions set for it. Some e-readers assign page numbers, but these are modified each time there is a change in the display style. Thus, e-books have no notion of an absolute page number nor of a specific URL for a chapter. (While the file structure within the e-book could be used to locate a particular page, the file structure is only visible when the e-book is decompressed, information that e-book developers but not readers are expected to see.)

The second category of e-books are those for which the layout is partially if not strictly fixed. The contents of each page is well defined, and can not be reflowed without constraints. For example, magazines, comic books, children books, art books, and text books often need to link tightly illustrations, audio, or video content to surrounding text. Many pages must be presented in a way that does not dramatically alter the aesthetics of the publication and the meaning of the content. Some e-books also contain software which provide a more interactive experience to the reader, with supplementary constraints about what needs to be displayed under various conditions.

A mechanism called media queries allows an e-book to contain conditional statements, but is insufficient to guarantee that the resulting EPUB will be usable on a variety of e-readers. Thus, the diversity of e-readers, as well as the existence of multiple standards, producing several output versions of many e-books is practically impossible to avoid.

Professionals also advice of not trusting the result displayed when the rendering of one e-reader is emulated on another one. For example, a Kindle application on an iPad, Android tablet, or on a smart phone doesn't reflect in all details the rendering obtained on an actual Kindle device.

Implementation of software for rendering EPUB has specific issues on each e-reader. For example, there is an e-reader-specific limit on the size of acceptable files. As a result, a Kindle 3 may freeze while trying to display a document which an iPad can absorb without difficulty.

The physical characteristics of e-readers vary greatly and have an effect on the way they are handling documents. For example, e-ink Kindles and Nooks truncate tables that don’t fit their screens. The rightmost part of a wide table is simply not displayed, typically after the fourth column. In landscape mode the device may display more than in portrait mode, but there still may not be sufficient space to render the full table even when the font is set to is minimal size. Readers may well be surprised by this loss of information, especially since it contrasts to their experience with the web pages and PDF documents they are accustomed to magnifying or reducing as desired.

Should rumors that Apple is preparing a 7-inch iPad prove true, one would expect that it will render the same information that can be displayed on the current versions. But so far, that remains wishful thinking, because the actual size is often used as a guide for an optimized rendition.

As far as flowable content is concerned, the font size and margins can be customized by every user. If the layout of the page is not fluid enough, some unpredictable things may occur, making the text meaningless. This is for example the case with presentation in which horizontal layout is critical. If a line wraps where it was not designed to do, then the content becomes unreadable. Some e-readers dynamically calculate the page numbers, and those numbers will change dynamically, according to the font size or to whether the device is hold vertically or horizontally. Others only assign “location values”. Some users want to be able to read the documents on their smart phones, which offer a much smaller space than the tablets or e-ink readers.

Another problem is the nature of the documents accepted in an EPUB. Text and images must be treated differently. The rendition of graphics may vary according to the e-reader. For example, some e-readers, such as the iPad, enable full-screen display of graphics, and allow the reader to enlarge even a full-screen image by pinching. Other e-readers, for example the Kindle 3, do not offer this possibility. Some graphics may therefore be unreadable on some e-readers. The new iPad, with the Retina display, has a much higher resolution than the other models, and therefore demands images with greater resolution. These images will not fit on many other e-readers, and may slow them to a halt.

So far, there is no universal way of guaranteeing accessibility and each e-reader handles accessibility requirements (such as Section 508 compliance) differently. Some e-readers include software that enables text to be read aloud. Others do not. EPUB3 provides guidelines for accessible documents, but it has not been implemented yet, and it is too early to say which features are going to be actually used.

By definition, footnotes are displayed in books at the bottom of a page. Since e-books have no notion of a fixed page in flowable content, they have no analogous location. Each e-book or e-book series could specify a location for displaying footnotes (such as the end of a chapter or of the entire book). Such conventions blur any logical distinction between footnotes and endnotes. Another possibility is for footnotes to pop up when the reader moves the cursor over a reference to a footnote.

Bugs in some e-reader software causes improper rendering of some HTML code. For example, a notorious iPad bug, known as the “span bug”, ignores any font change on the HTML <span> element. If several fonts are needed within one iPad document the HTML <span> element should not be used to specify fonts. Apple may well solve this bug in a future release of iBooks software and when that occurs this restriction will not be needed. On other platforms, this bug does not occur, and the <span> element can reliably used for font changes.

Another category of problems results from initial settings that vary among devices. Relying on the default values can therefore produce undesirable differences that would be eliminated if all formatting requirements were explicitly stated. For example, the Kobo e-reader centers some headings that other e-readers left align. Amazon uses different default values in MOBI than in KF8. Using Amazon’s conversion tool to upgrade can therefore break the alignment, indentation, and other formatting features. These issues can be solved if all formatting requirements are explicitly stated.

The tables of contents are encoded according to a given structure, which differs between EPUB2 and EPUB3. The EPUB3 format uses more XHTML tags. Multi-level tables of contents can be created, but it is not always possible to render them as folded. For example, on the iPad, the tables of contents are displayed, with all levels, each sublevel being indented appropriately. On e-readers software such as Sigil, by Google, the tables of contents are “folded”. At first, one sees only the first level elements, and they can be unfolded to reveal the next level headers, etc.

In general, the information on these details is scarce, and much trial and error is required to obtain satisfactory results. Fortunately, the most prominent consultants in the field share information about their successes and failures through web pages and blogs; such resources also provide a forum for exchanging pertinent questions and answers. One such blog, maintained by Liz Castro, is called “Pigs, Gourds and Wikis” ( and includes tweets posted at #eprdctn.


The US Internal Revenue Service publishes documents that help taxpayers compute their taxes. These include Taxpayer Information Publications (TIP) as well as instructions for forms and the associated schedules. These documents are updated each year; they are available in print and on the Web (in PDF, HTML, and also as a topic map called “Tax Map”).

For decades, the IRS has served as a laboratory of innovation for testing new technology in information delivery. It is now launching its first e-books. The IRS is as agnostic as possible in hardware choice and gives preference to formal and de facto standards over proprietary formats. As a result, the IRS plans to make its e-books available on most e-readers currently on the market. However, it has encountered the same issues as those faced by other publishes of e-books for multiple platforms. As a starting point, the work has therefore focused on a single platform, the iPad. Extensions to other platforms are planned to follow quickly.

Seen from the perspective of XML application development, the e-book project seems straightforward. XML source files conforming to a limited number of schemas must be transformed into another XML structure (namely, XHTML and the other EPUB structures). XSLT is a natural method for transforming this material. The XSLT process includes slicing the sources into chapter-like divisions that are small enough to prevent e-readers from crashing, converting static text to cross-references and links to the Web, tagging footnotes and graphics, building the table of contents, and (in a later stage of the project) creating an index.

The road has been bumpier than expected. Limiting initial output to a single e-reader reduced the number of e-reader-dependent quirks that had to be addressed for a proof-of-concept project, but there was no way to plan for undocumented idiosyncrasies. An easy workaround to the <span> bug mentioned above is to use HTML elements such as <cite>, or the rarely-used <samp> element instead of <span>.

The problem with this kind of bug is that it may just be temporary. Perhaps Apple will issue a software upgrade that will solve it. Or not. Or perhaps the upgrade will introduce a new side effect that didn’t exist before. And there is no way to know, the documentation is almost non-existent, and the only source of information is the community of practitioners that exchange this kind of information on specialized forums. And of course, the problem highlighted here is just one of them, and it’s only for the iPad. Other e-readers have their own issues, and these issues are only known at the time of testing.

To provide products that work reliably on a variety of environments, a project must establish the least common denominator among the formatting requirements, and minimize use of platform-specific features. This recommendation is easier to follow for flowable content than for fixed layout, where interoperability between e-readers is even more challenging. Apple, for example, has released software that allows an author to create an e-book that may contain text, images, video, etc., and may be beautifully laid out. However, the result is only usable on Apple platforms. Apple’s commercial motivation for this product is obvious. Until vendors have greater motivation to support platform-independent standards, the technical features that would ensure good quality rendering on various devices are pretty daunting, and we are clearly not there yet.

Moving Forward

Publishing high quality e-books on multiple e-readers requires quality assurance on every platform. The scope of this tedious process is open-ended, since new e-readers appear regularly, although others are discarded. The work area where testing is performed can resemble a toy store with shelves containing numerous devices. Media queries and similar mechanisms for anticipating how e-books will appear on various devices can bring interesting issues to light, but do not substitute for real tests on real devices. Testing is a learning process for accumulating the experience for tracking, categorizing, and fixing in the most possible generic way any detected problems.

Long term solutions require conceptual, abstract thinking about the e-book environment and nature of the issues involve.

There are several kinds of traditional books, not all consisting of flowable content or fixed layout. Flowable content includes novels, essays, academic books, textbooks, reference books, technical manuals, etc. Most books have tables of contents, some have indexes, cross-references and references to parts of other books. Some are illustrated, others are not. In addition to the graphics and photographs, there can be structures with complex layout, such as tables. Tables are sometimes automatically composed from a source in XML (for example, the content may be provided in DocBook). In addition, e-books can be interactive and can contain audio and video clips. Almost every item listed here poses a challenge to e-book publishing.

A novel is typically easy to publish as an e-book. Most e-reader devices support tables of contents. In principle, such mechanisms could work well as well for essays and academic books, but there is a supplementary problem here. How should academic citations to part of an e-book be made? Literate references to books typically identify the cited book and the relevant pages. Since e-books have no fixed page numbers, new conventions for academic citations must be developed. Publishers advise authors of academic material to use numbered sections, and create references by section number. If sections are automatically numbered, however, section numbers can easily become obsolete in a later edition of a work. Perhaps the URI reference system that is pervasive on the Web can be used for citations to content within an e-book. Such conventions will only be useful if widely adopted.

Few current e-books include an index because little e-book publishing software makes it easy to build an index. While the XML community is experienced in building indexes from “invisible tags” stored within an XML source document, knowledge of XSLT in the publishing industry is quite limited. Publishers are much more likely to tweak a CSS stylesheet to create desired formatting than to consider the design of XML source material.

Some presentational effects, such as multi-column layout, are dropped in flowable content within the e-book edition of a book. Such degradation of original formatting may be an aesthetic loss mourned by purists, but is usually not a tremendous loss of information content.

It is a completely different story with tables. As previously noted, tables as represented in printed documents or web pages do not display well in the restricted real estate of an e-reader screen. Even when manufacturers offer the ability to zoom in and out, it is not convenient to view tabular information in limited space. Worse, as previously discussed, e-readers merely crop the tables to fit the maximum width of the device. The amount of information available is proportional to the font size, and depends on whether the device is held in portrait or landscape mode. This problem is much more acute with e-books than traditional publishing because paper-based publishing relied on an implicit notion of using paper of a reasonable width for the content. As more and more material has moved to computer screens, readers change font size or scroll to view tabular material successfully in the vast majority of cases. Now that we are reading on smaller devices, including smart phones, the situation has changed, and the notion of adjustment in the width becomes crucial.

These issues take us back to the early days of logical markup, and the difference between semantic and presentational structures. Practitioners in the XML community, and earlier, in the SGML community, have been aware of these issues from the beginning of generic markup. Tables survive because they are a compact and visually compelling presentation for some kind of information. They will continue to be used as long as there are methods for fitting them on a display of “reasonable width”.

Alternatively, some tables could be presented as lists. For example, a customer list in which the first column contains the first name, the second column the last name, and the third column an email address, could very well be represented as an indented list that can be more easily rendered in an e-book. The same information could still be rendered as a table for print, PDF, or web pages.

We also know that tables—and this is especially visible when managed within spreadsheet software—are views into a database. This model considers each row to be a record and each cell a field. A header row expresses the database schema. A cell therefore is the response to a query about a specific property. In database terminology, the word “table” is actually used interchangeably for “relation”. This database model suggests the possibility of software that offers a form-like user interface for searching for information within a table.

List and database solutions for rendering tabular material require document sources that use semantic markup rather than visual markup to store this type of information.

Other navigational devices need to be adjusted for e-book publishing: footnotes, running headers, indexes, can be rendered a number of ways, and there are provisions in the EPUB 3.0 specification (or in one of its satellites currently under development) to handle these features in interoperable ways. Such features will remain a factor of differentiation among e-reader device manufacturers, because supporting these features are not required.

As we still are in the infancy of the e-book industry, we are in the period of the “Whoa!” effect. It’s great to be able to carry thousands of books on a device for which the battery life can span over a month or to exchange bookmarks with “friends” on a social network. We still accept that e-books do not yet have all the features we have in printed books. As readers become more familiar with e-books, they will demand more of the sophistication and quality they deserve. Serious readers will require the nitty-gritty details that have evolved over centuries of book publishing to become standard in print. How quickly this demand will develop is still difficult to predict. The XML community, especially publishers within the XML community, clearly will have a big part in the future of e-books.


Author's keywords for this paper: e-book