How to cite this paper
Biezunski, Michel. “Moving sands: Adventures in XML e-book-land.” Presented at Balisage: The Markup Conference 2012, Montréal, Canada, August 7 - 10, 2012. In Proceedings of Balisage: The Markup Conference 2012. Balisage Series on Markup Technologies, vol. 8 (2012). https://doi.org/10.4242/BalisageVol8.Biezunski01.
Balisage: The Markup Conference 2012
August 7 - 10, 2012
Balisage Paper: Moving sands: Adventures in XML e-book-land
From XML to XML. Is that enough?
Consultant in Information Technology, Specialized in Publishing Applications,
Creator of the Topic Maps standard.
Copyright © 2012 Michel Biezunski
Converting XML to e-books supported by multiple devices is a significant task. Deficiencies
in the EPUB standard, undocumented bugs in some devices, and issues such as fixed
layout vs. flowable content add complexity. Fortunately, the first experts to
experience weird effects are sharing their findings. This paper continues this sharing
of information by describing experience in a US IRS project. The first step is
scheduled for completion in July 2012. When fully deployed, this project will support
multiple devices, some announced but not yet available, which differ in size,
features, the formats they can read, and how much of the EPUB standard they
implement, if any. This paper analyzes issues that have been encountered while
working on automatic production of e-books using XML technologies and also suggests
a new vision to ensure a sustainable, minimally intrusive, long term solution for
producing e-books on various devices. Relying on standards is necessary but not
sufficient, and some issues that are at the core of structured markup are coming
back: the difference between structure and presentation has never been so acute.
This project reminds us that there is no short circuit to a sound approach of
analysis not only of how information looks, but also of what it means.
Table of Contents
- Introduction — E-books go mainstream
- XML-based standards
- Rendering remains a challenge
- Moving Forward
Note. In this article, the word “e-reader” is used to
designate a reading device, for example a tablet, while the word “reader”
(except in the phrase “e-ink reader”) is used to designate a human being
performing the action of reading.
Introduction — E-books go mainstream
Contemporary publishers of books, journals, and magazines are moving away from a publishing
model based on printed pages. Printed books are industrial products that, once
edited and formatted, undergo several steps: printing, binding, storing, distributing,
displaying, selling. The process of selling e-books is much simpler: once composed, an
e-book can be put on a server in an online store, where customers can buy it
instantaneously. E-books will never be “out-of-print”, nor will e-books need to be
destroyed because they don’t sell enough. The term “e-book” encompasses content
text and graphics. Audio and video can be included, as well as software components
There are many devices that let us read e-books, including e-ink readers, tablets,
phones, and computers. E-ink readers are best fit for reading plain books, tablets for
magazines and audio/video content. Smart phones, even though they don’t offer the
reading “real estate”, can be used to read the same material. And computers can
but they don’t necessarily offer the comfortable experience that users want for
leisurely reading. E-books can be downloaded in their entirety and therefore can
The e-book business has become big business. In addition to traditional booksellers such as
Barnes & Noble, most book publishers, and, increasingly, news-“papers”, key players
now include Apple, Google, Microsoft, Adobe, Samsung, and Amazon. Actors in the
are both information technology companies and content owners.
E-book formats are basically a set of compressed XML documents, where the main content
is stored in XHTML. The EPUB delivery standard has become the prevalent means of
document delivery. Amazon is an exception and has adopted its own formats —
MOBI/AZW and now KF8. Use of XML in e-books is more tightly controlled than it is in web
browsers. In particular, web browsers tolerate XML-like HTML in which validity to a DTD
or a schema is almost irrelevant. In contrast, e-books must be entirely conformant to
XML schemas, including XHTML for the main content. One would think that this rigor would
guarantee that e-books will properly display on most e-readers and eliminate tweaking
content according to the e-reader the way web pages must be tweaked according to the
browser. Unfortunately, this is not the case. The primary goal of this article is to
show that XML is insufficient to guarantee smooth interoperability for rendering. A
secondary goal is to remind us not to underestimate the distinction between a logical
structure and its presentation, a distinction that is at the core of structured markup.
The EPUB standards are a set of XML vocabularies including the document content (XHTML),
the “manifest”, declaring the document entities present in one e-book, the “table of
contents” for navigating, the front cover page, and a mimetype declaration. Fine
tuning the formatting is achieved through the use of cascading style sheets (CSS).
The XML documents are compressed together with the style sheets, the images, any
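The packaging described in this paragraph can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the project: the function name build_epub and the path OEBPS/content.opf are assumptions chosen for the example. The rule that the mimetype entry comes first in the archive and is stored uncompressed comes from the Open Container Format.

```python
import io
import zipfile

# Minimal container file pointing at the package document.
# "OEBPS/content.opf" is a common convention, assumed here for illustration.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(files):
    """Return the bytes of an EPUB archive holding `files` (path -> bytes).

    Hypothetical helper: the caller supplies the package document, the XHTML
    content, the style sheets, and the images under their archive paths.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and is not compressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

A real publication would also need a valid package document and table of contents; the sketch only shows how the pieces are assembled into one compressed file.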
EPUB was created by an industry consortium, the International Digital Publishing
Forum (IDPF). The two versions of EPUB in use are 2.0.1 and 3.0. EPUB 2.0.1 was approved
in May, 2010, and EPUB 3.0 in October, 2011. In practice, they are augmented with
proprietary features that are only available on one particular device; the most famous
is the Apple extension to CSS for iPads. The conformance level to a given version
of a standard also varies.
EPUB 2.0.1 comprises three specifications: the Open Publication Structure (OPS), the
Open Packaging Format (OPF), and the Open Container Format (OCF). The OPS is based
on XHTML 1.1 and DTBook (DAISY Digital Talking Book). The style language is a required
subset of CSS 2. It allows the inclusion of so-called “XML Islands”, each of which is a
complete XML document with a structure different from DTBook or XHTML. This feature
was not widely implemented and “out-of-line XML islands” have been dropped in EPUB
EPUB 3.0 contains the following specifications: Publications 3.0, Content Documents 3.0,
Open Container Format 3.0, Media Overlays 3.0, and EPUB Canonical Fragment
Identifiers. The main difference between EPUB 2.0.1 and EPUB 3.0 is that the latter
contains more provisions for incorporating diverse media, such as audio and video,
and parallels the evolution of HTML towards XHTML 5. EPUB 3 was first implemented in
Apple’s iBooks, Readium, and Azardi. Recent progress has been reported for other
devices. EPUB 3 is gaining a lot of momentum. Its adoption is accelerating
noticeably in Japan due to its enhanced capabilities for global language support,
particularly the handling of writing directions, including vertical writing. Furthermore,
EPUB 3 books can be read on an EPUB 2 e-reader, provided that an EPUB 2.0.1 table of
contents is included.
Support for other EPUB 3 features varies. There are only a few signs of implementation of
video and audio support, and pragmatists in the industry prefer to “wait and see”
is enabled, its support is being largely discarded because of potential security
industry-supported scripts that could be implemented in most e-readers. However,
and MathML are now incorporated, and there seems to be more traction on that front.
Specific extensions to CSS3 for e-books are also defined in the
In addition to the general framework the EPUB 3 specification provides, satellite
specifications are now being defined to offer support for multimedia and indexes. An
important one will support fixed layout. Fixed layout contrasts with the flowable
content addressed by the base EPUB standard but is needed for documents such as
comic books, illustrated books, children’s books, knitting patterns, textbooks, and
magazines. A single publication should be able to include both flowable content and fixed
layout, but this capability seems to be unavailable at this point. Fixed layout
must support multiple columns, and a fixed height to width ratio in different
sizes. Rendering fixed layout on a variety of e-readers is a challenging
Amazon e-book formats differ from the EPUB standards and are optimized for rendering on
Kindle devices. They also contain provisions for support of audio and video directly
inspired by HTML 5. The main Amazon formats used today are MOBI/AZW and KF8. MOBI is
an open standard that was bought by Amazon. AZW was developed by Amazon specifically
for the Kindle e-ink reader and is DRM (Digital Rights Management)-restricted
is locked to a specific device
KF8 uses a subset of HTML5 and CSS3, and is designed to be backward-compatible
Rendering remains a challenge
There are two kinds of e-books: flowable content and fixed layout e-books. Flowable content
can be reformatted on the fly, depending on the device and the user configuration.
Furthermore, some devices automatically redisplay the content depending on the
orientation in which they are held. Under these conditions, the traditional notion of a
page no longer holds. This obvious difference from the world of published documents
consisting of the well-defined pages of paper or PDF also distinguishes e-books from
web pages. Even if browser windows can be reformatted using user-defined local
the notion of a web page is pervasive. And fragment identifiers help locate the
location of any information on a page, regardless of rendering. This location addressing
facility is not currently available in an e-book. Technically, flowable content in an
e-book is displayed by pages, which depend on the local conditions, and is divided into
units which correspond more or less to the chapters of a traditional printed book. Each
of these units may correspond to a different physical XHTML file, although the
page-break property of a CSS may separate one file into multiple pages. However, the
actual page of flowable content on an e-reader is rendered purely dynamically and
depends on the conditions set for it. Some e-readers assign page numbers, but these are
modified each time there is a change in the display style. Thus, e-books have no notion
of an absolute page number nor of a specific URL for a chapter. (While the file
structure within the e-book could be used to locate a particular page, the file
structure is only visible when the e-book is decompressed, information that e-book
developers but not readers are expected to see.)
The second category of e-books consists of those for which the layout is partially if not
fixed. The content of each page is well defined, and cannot be reflowed without
constraints. For example, magazines, comic books, children’s books, art books, and
books often need to link tightly illustrations, audio, or video content to surrounding
text. Many pages must be presented in a way that does not dramatically alter the
aesthetics of the publication and the meaning of the content. Some e-books also
software which provides a more interactive experience to the reader, with supplementary
constraints about what needs to be displayed under various conditions.
A mechanism called media queries allows an e-book to contain conditional statements,
insufficient to guarantee that the resulting EPUB will be usable on a variety of
e-readers. Thus, given the diversity of e-readers, as well as the existence of multiple
standards, producing several output versions of many e-books is practically impossible
Professionals also advise against trusting the result displayed when the rendering of
one e-reader is emulated on another one. For example, a Kindle application on an
Android tablet or on a smart phone does not reflect in all details the rendering
obtained on an actual Kindle device.
Implementation of software for rendering EPUB has specific issues on each e-reader.
For example, there is an e-reader-specific limit on the size of acceptable files.
As a result, a Kindle 3 may freeze while trying to display a document which an iPad
can absorb without difficulty.
The physical characteristics of e-readers vary greatly and have an effect on the way they
handle documents. For example, e-ink Kindles and Nooks truncate tables that do not
fit their screens. The rightmost part of a wide table is simply not displayed,
after the fourth column. In landscape mode the device may display more than in portrait
mode, but there still may not be sufficient space to render the full table even when the
font is set to its minimal size. Readers may well be surprised by this loss of
information, especially since it contrasts with their experience with the web pages and
PDF documents they are accustomed to magnifying or reducing as desired.
Should rumors that Apple is preparing a 7-inch iPad prove true, one would expect that
render the same information that can be displayed on the current versions. But
that remains wishful thinking, because the actual size is often used as a guide
As far as flowable content is concerned, the font size and margins can be customized
by every user. If the layout of the page is not fluid enough, some unpredictable
may occur, making the text meaningless. This is for example the case with presentation
in which horizontal layout is critical. If a line wraps where it was not designed to,
then the content becomes unreadable. Some e-readers dynamically calculate the page
numbers, and those numbers will change dynamically, according to the font size and
whether the device is held vertically or horizontally. Others only assign “location
values”. Some users want to be able to read the documents on their smart phones, which
offer a much smaller space than the tablets or e-ink readers.
Another problem is the nature of the documents accepted in an EPUB. Text and images
must be treated differently. The rendition of graphics may vary according to the e-reader.
For example, some e-readers, such as the iPad, enable full-screen display of graphics,
and allow the reader to enlarge even a full-screen image by pinching. Other e-readers,
for example the Kindle 3, do not offer this possibility. Some graphics may therefore
be unreadable on some e-readers. The new iPad, with the Retina display, has a much
higher resolution than the other models, and therefore demands images with greater
resolution. These images will not fit on many other e-readers, and may slow them to
So far, there is no universal way of guaranteeing accessibility and each e-reader
accessibility requirements (such as Section 508 compliance) differently. Some e-readers
include software that enables text to be read aloud. Others do not. EPUB3 provides
guidelines for accessible documents, but they have not been implemented yet, and it is too
early to say which features are actually going to be used.
By definition, footnotes are displayed in books at the bottom of a page. Since e-books have
no notion of a fixed page in flowable content, they have no analogous location. An
e-book or e-book series could specify a location for displaying footnotes (such as the
end of a chapter or of the entire book). Such conventions blur any logical distinction
between footnotes and endnotes. Another possibility is for footnotes to pop up when the
reader moves the cursor over a reference to a footnote.
Bugs in some e-reader software cause improper rendering of some HTML code. For example,
a notorious iPad bug, known as the “span bug”, ignores any font change on the HTML
<span> element. If several fonts are needed within one iPad document, the HTML <span>
element should not be used to specify fonts. Apple may well solve this bug in a future
release of iBooks software, and when that occurs this restriction will not be needed.
On other platforms, this bug does not occur, and the <span> element can reliably be used
for font changes.
Another category of problems results from initial settings that vary among devices. Relying
on the default values can therefore produce undesirable differences that would be
eliminated if all formatting requirements were explicitly stated. For example, one
e-reader centers some headings that other e-readers left align. Amazon uses different
default values in MOBI than in KF8. Using Amazon’s conversion tool to upgrade can
therefore break the alignment, indentation, and other formatting features. These problems
can be solved if all formatting requirements are explicitly stated.
The tables of contents are encoded according to a given structure, which differs between
EPUB2 and EPUB3. The EPUB3 format uses more XHTML tags. Multi-level tables of contents
can be created, but it is not always possible to render them as folded. For example,
on the iPad, the tables of contents are displayed, with all levels, each sublevel
being indented appropriately. In e-reader software such as Sigil, by Google, the
tables of contents are “folded”. At first, one sees only the first level elements,
and they can be unfolded to reveal the next level headers, etc.
In general, the information on these details is scarce, and much trial and error is
required to obtain satisfactory results. Fortunately, the most prominent consultants
in the field share information about their successes and failures through web pages
and blogs; such resources also provide a forum for exchanging pertinent questions
and answers. One such blog, maintained by Liz Castro, is called “Pigs, Gourds and
Wikis” (http://www.pigsgourdsandwikis.com/) and includes tweets posted at #eprdctn.
The US Internal Revenue Service publishes documents that help taxpayers compute their
These include Taxpayer Information Publications (TIP) as well as instructions for
and the associated schedules. These documents are updated each year; they are available
in print and on the Web (in PDF, HTML, and also as a topic map called “Tax Map”).
For decades, the IRS has served as a laboratory of innovation for testing new
technology in information delivery. It is now launching its first e-books. The IRS is as
agnostic as possible in hardware choice and gives preference to formal and de facto
standards over proprietary formats. As a result, the IRS plans to make its e-books
available on most e-readers currently on the market. However, it has encountered the
same issues as those faced by other publishers of e-books for multiple platforms. As a
starting point, the work has therefore focused on a single platform, the iPad.
Extensions to other platforms are planned to follow quickly.
Seen from the perspective of XML application development, the e-book project seems
straightforward. XML source files conforming to a limited number of schemas must be
transformed into another XML structure (namely, XHTML and the other EPUB structures).
XSLT is a natural method for transforming this material. The XSLT process includes
slicing the sources into chapter-like divisions that are small enough to prevent
e-readers from crashing, converting static text to cross-references and links to the
Web, tagging footnotes and graphics, building the table of contents, and (in a later
stage of the project) creating an index.
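The slicing step can be illustrated outside XSLT. The sketch below is written in Python purely to show the logic and is not the project's stylesheet; the function name slice_into_files and the byte budget are hypothetical, since the actual size limits differ by e-reader and are rarely documented.

```python
def slice_into_files(sections, max_bytes):
    """Group consecutive sections into output files of at most max_bytes.

    `sections` is a list of already serialized XHTML fragments (strings).
    Sizes are measured in UTF-8 bytes. A single section larger than the
    budget is kept whole in a file of its own, because splitting inside a
    section would need knowledge of the markup that this sketch lacks.
    """
    files, current, size = [], [], 0
    for section in sections:
        section_size = len(section.encode("utf-8"))
        # Start a new file when adding this section would exceed the budget.
        if current and size + section_size > max_bytes:
            files.append(current)
            current, size = [], 0
        current.append(section)
        size += section_size
    if current:
        files.append(current)
    return files
```

With four sections of 40, 40, 40, and 10 bytes and a budget of 100 bytes, the first two sections share one file and the last two share another. Keeping sections whole preserves the chapter-like divisions that the table of contents points to.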
The road has been bumpier than expected. Limiting initial output to a single e-reader
reduced the number of e-reader-dependent quirks that had to be addressed for a proof-of-concept
project, but there was no way to plan for undocumented idiosyncrasies. An easy workaround
to the <span> bug mentioned above is to use HTML elements such as <cite>, or the rarely-used
<samp> element instead of <span>.
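As a rough illustration of this workaround, the following Python sketch renames every XHTML <span> element to another element while keeping its attributes and content. It is an assumption-laden example, not the project's actual transformation (which was written in XSLT): the function name replace_spans and its default substitute are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def replace_spans(xhtml_text, substitute="samp"):
    """Return the document with each XHTML <span> renamed to `substitute`.

    Hypothetical helper: attributes such as class are preserved, so the
    style sheet can still select the renamed elements.
    """
    # Serialize XHTML elements without a prefix (xmlns="..." on the root).
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    for element in root.iter("{%s}span" % XHTML_NS):
        element.tag = "{%s}%s" % (XHTML_NS, substitute)
    return ET.tostring(root, encoding="unicode")
```

Because browsers and e-readers usually give <samp> and <cite> their own default styling, the style sheet would then need to state the intended font explicitly for the substituted element.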
The problem with this kind of bug is that it may just be temporary. Perhaps Apple
will issue a software upgrade that will solve it. Or not. Or perhaps the upgrade will
introduce a new side effect that didn’t exist before. There is no way to know:
the documentation is almost non-existent, and the only source of information is the
community of practitioners that exchange this kind of information on specialized forums.
And of course, the problem highlighted here is just one of them, and it’s only for
the iPad. Other e-readers have their own issues, and these issues are only known at
the time of testing.
To provide products that work reliably on a variety of environments, a project must
establish the least common denominator among the formatting requirements, and minimize
use of platform-specific features. This recommendation is easier to follow for flowable
content than for fixed layout, where interoperability between e-readers is even more
challenging. Apple, for example, has released software that allows an author to create
an e-book that may contain text, images, video, etc., and may be beautifully laid out.
However, the result is only usable on Apple platforms. Apple’s commercial motivation for
this product is obvious. Until vendors have greater motivation to support
platform-independent standards, the technical features that would ensure good quality
rendering on various devices are pretty daunting, and we are clearly not there yet.
Publishing high quality e-books on multiple e-readers requires quality assurance on each
platform. The scope of this tedious process is open-ended, since new e-readers appear
regularly, although others are discarded. The work area where testing is performed may
resemble a toy store with shelves containing numerous devices. Media queries and other
mechanisms for anticipating how e-books will appear on various devices can bring
interesting issues to light, but do not substitute for real tests on real devices.
Testing is a learning process for accumulating the experience needed for tracking,
categorizing, and fixing any detected problems in the most generic way possible.
Long term solutions require conceptual, abstract thinking about the e-book environment and the
nature of the issues involved.
There are several kinds of traditional books, not all consisting of flowable content
fixed layout. Flowable content includes novels, essays, academic books, textbooks,
reference books, technical manuals, etc. Most books have tables of contents, some have
indexes, cross-references and references to parts of other books. Some are illustrated,
others are not. In addition to the graphics and photographs, there can be structures
with complex layout, such as tables. Tables are sometimes automatically composed from a
source in XML (for example, the content may be provided in DocBook). In addition,
e-books can be interactive and can contain audio and video clips. Almost every
listed here poses a challenge to e-book publishing.
A novel is typically easy to publish as an e-book. Most e-reader devices support tables of
contents. In principle, such mechanisms could work as well for essays and
books, but there is a supplementary problem here. How should academic citations
of an e-book be made? Literate references to books typically identify the cited
the relevant pages. Since e-books have no fixed page numbers, new conventions for
academic citations must be developed. Publishers advise authors of academic material to
use numbered sections, and create references by section number. If sections are
automatically numbered, however, section numbers can easily become obsolete in
edition of a work. Perhaps the URI reference system that is pervasive on the Web
used for citations to content within an e-book. Such conventions will only be useful
Few current e-books include an index because little e-book publishing software makes
to build an index. While the XML community is experienced in building indexes from
“invisible tags” stored within an XML source document, knowledge of XSLT in the
publishing industry is quite limited. Publishers are much more likely to tweak a
stylesheet to create desired formatting than to consider the design of XML source
Some presentational effects, such as multi-column layout, are dropped in flowable content
within the e-book edition of a book. Such degradation of original formatting may be an
aesthetic loss mourned by purists, but is usually not a tremendous loss of information.
It is a completely different story with tables. As previously noted, tables as represented
in printed documents or web pages do not display well in the restricted real estate of
an e-reader screen. Even when manufacturers offer the ability to zoom in and out, it is
not convenient to view tabular information in limited space. Worse, as previously
discussed, some e-readers merely crop the tables to fit the maximum width of the device. The
amount of information available varies with the font size, and depends on whether
the device is held in portrait or landscape mode. This problem is much more acute in
e-books than traditional publishing because paper-based publishing relied on an
notion of using paper of a reasonable width for the content. As more and more material
has moved to computer screens, readers change font size or scroll to view tabular
material successfully in the vast majority of cases. Now that we are reading on
devices, including smart phones, the situation has changed, and the notion of adjustment
in the width becomes crucial.
These issues take us back to the early days of logical markup, and the difference
between semantic and presentational structures. Practitioners in the XML community,
and earlier, in the SGML community, have been aware of these issues from the beginning
of generic markup. Tables survive because they are a compact and visually compelling
presentation for some kind of information. They will continue to be used as long as
there are methods for fitting them on a display of “reasonable width”.
Alternatively, some tables could be presented as lists. For example, a customer list in
which the first column contains the first name, the second column the last name, and the
third column an email address, could very well be represented as an indented list that
can be more easily rendered in an e-book. The same information could still be rendered
as a table for print, PDF, or web pages.
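The customer-list example above can be sketched as follows. This is a minimal illustration in Python, assuming the table is already available as a header row plus data rows; the function name table_to_list and the sample data are hypothetical, and a production pipeline would more likely do this in the XSLT that generates the XHTML.

```python
from xml.sax.saxutils import escape


def table_to_list(headers, rows):
    """Render a simple table as a nested XHTML list.

    The first cell of each row becomes a top-level item; the remaining
    cells become indented sub-items labelled with their column header.
    """
    lines = ["<ul>"]
    for row in rows:
        lines.append("  <li>%s" % escape(row[0]))
        lines.append("    <ul>")
        for header, value in zip(headers[1:], row[1:]):
            lines.append("      <li>%s: %s</li>" % (escape(header), escape(value)))
        lines.append("    </ul>")
        lines.append("  </li>")
    lines.append("</ul>")
    return "\n".join(lines)


# Sample data invented for the example.
print(table_to_list(
    ["First name", "Last name", "Email"],
    [["Ada", "Lovelace", "ada@example.org"]],
))
```

Because each row flows vertically, nothing is cropped on a narrow screen, at the cost of repeating the column labels for every row.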
We also know that tables—and this is especially visible when managed within spreadsheet
software—are views into a database. This model considers each row to be a record and
each cell a field. A header row expresses the database schema. A cell therefore is
the response to a query about a specific property. In database terminology, the word
“table” is actually used interchangeably for “relation”. This database model suggests
the possibility of software that offers a form-like user interface for searching for
information within a table.
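A minimal sketch of this database view, written in Python under the assumption that the table is available as a header row and data rows: each row becomes a record keyed by the header names, and a lookup answers a query about one property. The function names as_records and lookup and the sample data are hypothetical.

```python
def as_records(header_row, data_rows):
    """Treat the header row as the schema and each data row as a record."""
    return [dict(zip(header_row, row)) for row in data_rows]


def lookup(records, key_field, key_value, wanted_field):
    """Return the wanted field of every record whose key field matches."""
    return [record[wanted_field] for record in records
            if record[key_field] == key_value]


# Sample data invented for the example.
customers = as_records(
    ["First name", "Last name", "Email"],
    [["Ada", "Lovelace", "ada@example.org"],
     ["Alan", "Turing", "alan@example.org"]],
)
print(lookup(customers, "Last name", "Turing", "Email"))
```

A form-like interface on an e-reader would amount to letting the reader choose the key field, the key value, and the wanted field, and then displaying only the matching cells instead of the whole table.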
List and database solutions for rendering tabular material require document sources
that use semantic markup rather than visual markup to store this type of information.
Other navigational devices need to be adjusted for e-book publishing: footnotes, running
headers, and indexes can be rendered in a number of ways, and there are provisions in the EPUB
3.0 specification (or in one of its satellites currently under development) to support
these features in interoperable ways. Such features will remain a factor of
differentiation among e-reader device manufacturers, because supporting these features
is not required.
As we still are in the infancy of the e-book industry, we are in the period of the
effect. It’s great to be able to carry thousands of books on a device for which
battery life can span over a month or to exchange bookmarks with “friends” on a
network. We still accept that e-books do not yet have all the features we have in
printed books. As readers become more familiar with e-books, they will demand more of
the sophistication and quality they deserve. Serious readers will require the
nitty-gritty details that have evolved over centuries of book publishing to become
standard in print. How quickly this demand will develop is still difficult to predict.
The XML community, especially publishers within the XML community, clearly will play a
big part in the future of e-books.