Multimedia & Media Overlays
While audio and video had been incorporated into EPUB files previously,
this version of the spec codified their use by incorporating the HTML5 audio
and video markup. This markup provides a standard way to include
multimedia within EPUB files. In addition, a mechanism for adding media
overlays to books enables new functionality, including text highlighting
for read-along books. Media overlays are defined using the SMIL language.
The adoption of the HTML5 audio and video markup is probably the most
familiar addition in this area. While there was little discussion around the
audio formats to be used (MP3 required and MP4 AAC LC suggested), there was
a great deal of discussion surrounding the video formats to be supported.
The main contenders were MPEG-4 H.264 and VP8. The main point of dissension
was the level of support for each. Originally, all reading systems were going
to be required to support both. However, various reading system developers
expressed concerns about being required to support two formats. There were
also concerns about the patents covering the H.264 format and how licensing
might affect reading systems. After a great deal of discussion and threats
of non-support if both formats were mandated, it was decided that there
would not be a requirement that reading systems support either format.
Instead, an informational paragraph was included recommending the use of one
of the formats, but not precluding the use of another format. In essence,
this will require publishers to create videos in both formats in order to
support the widest range of reading systems, and hopefully no reading
system decides to use an alternative format.
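As a minimal sketch of what supplying both formats might look like in a content document, the HTML5 video element can list one source per format and let the reading system pick the first it can play. The file names here are hypothetical, and WebM is assumed as the container for VP8:

```xml
<!-- Hypothetical file names; the type values are standard MIME types. -->
<video controls="controls" width="320" height="240">
  <!-- H.264 video in an MP4 container -->
  <source src="video/trailer.mp4" type="video/mp4"/>
  <!-- VP8 video in a WebM container -->
  <source src="video/trailer.webm" type="video/webm"/>
  <!-- Shown by reading systems that cannot play video at all -->
  <p>A video trailer is not available on this reading system.</p>
</video>
```

The nested paragraph acts as a last-resort fallback for reading systems with no video support.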
The epub:trigger element was created in order to support markup that
defines user interfaces for controlling multimedia objects when they are
encountered, without requiring scripting. Using this element, it is now
possible to activate media files based on the content.
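The following sketch shows how epub:trigger might wire two plain text spans to a video as pause and resume controls, without any script. The attribute names follow the EPUB 3.0 Content Documents specification as best recalled here; the ids and the file name are hypothetical:

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Trigger sketch</title>
    <!-- A click on the element with id "pause-btn" pauses the video with id "clip" -->
    <epub:trigger ev:observer="pause-btn" ev:event="click" action="pause" ref="clip"/>
    <!-- A click on the element with id "resume-btn" resumes playback -->
    <epub:trigger ev:observer="resume-btn" ev:event="click" action="resume" ref="clip"/>
  </head>
  <body>
    <video id="clip" src="video/clip.mp4" width="320" height="240"/>
    <p>
      <span id="resume-btn">Play/Resume</span>
      <span id="pause-btn">Pause</span>
    </p>
  </body>
</html>
```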
The ability to hear the content is a vital accessibility concern and is
desirable for many other users as well. Media overlays provide a mechanism
that allows the synchronization of text and audio content within a
publication. There are many possible use cases for this functionality beyond
that of accessibility, including learning to read. Within the spec there are
options for computer-based text-to-speech reading as well as audio files
synchronized to the text.
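To illustrate the synchronization, a media overlay document might pair each text fragment with a clip of audio, roughly as sketched below. The file names, fragment ids, and clip times are hypothetical:

```xml
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <!-- Each par plays one audio clip while the referenced text is highlighted -->
    <par id="par1">
      <text src="chapter1.xhtml#sentence1"/>
      <audio src="audio/chapter1.mp3" clipBegin="0s" clipEnd="4.2s"/>
    </par>
    <par id="par2">
      <text src="chapter1.xhtml#sentence2"/>
      <audio src="audio/chapter1.mp3" clipBegin="4.2s" clipEnd="9.8s"/>
    </par>
  </body>
</smil>
```

The granularity is up to the publisher: the text fragments could be words, sentences, or whole paragraphs, depending on how the content document is marked up.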
MathML became a first-class citizen within EPUB3. The spec requires
support for embedded Presentation MathML and allows for processing of
Content MathML based on MathML 3.0. In the prior versions of the spec,
mathematical formulas were usually converted to images. The inclusion of
MathML allows formulas to be marked semantically and displayed. This also
allows the formulas to reflow along with the rest of the text rather than
remain as a static image. The markup also can be used as input to
mathematical engines like Wolfram Alpha, allowing formulas to become
While MathML is supported in the spec, it would still be prudent for
publishers to provide the image renditions of symbols and formulas as
fallbacks for older reading systems. Publishers should also provide
alternative text content for accessibility purposes including
Scripting & Interactivity
similar to what is available in modern browsers. There was a great deal of
discussion about the level of support to be included, due mostly to security
concerns. Most reading systems that are attached to online vendors have
access to personal financial information, which must be protected. The final
decision was that reading systems must provide appropriate levels of
security in order to safeguard sensitive data.
create interactivity within EPUBs. This support is optional and operates
under several restrictions and limitations. Due to this, it is not clear to
what level reading systems will support this new functionality. Reading
systems that are based on browser engines, such as WebKit, will likely
provide support, but other non-browser-based systems may not be able to
provide an equal level of support. Also, due to security concerns, many
reading systems as well. Publishers that opt to incorporate scripting
enabled or not.
provides a means for querying a reading system to determine its capabilities.
This will allow scripts to be developed which can provide their own levels
of fallback capabilities.
Speech & Accessibility
EPUB has always been about supporting adaptive layouts and accessibility.
Most of the enhancements made in EPUB3 had to pass an accessibility litmus
test to ensure that new enhancements would still be accessible. Anything
it. The same applies for MathML and SVG content.
With the removal of the DTBook schema, text-to-speech became more
important within EPUB3 to support accessibility. Portions of the W3C PLS and
SSML specifications are included in EPUB3 to allow publishers to provide
information to TTS engines, including pronunciation guides for terms which
might not be in a standard dictionary. In addition, features found within
the CSS3 Speech Module allow publishers to control speech synthesis options,
such as voice pitch and rate.
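As a rough sketch of how these pieces might appear in a content document, a PLS lexicon can be linked from the head and an individual word can carry an inline SSML phoneme. The lexicon path and the phonetic transcription are illustrative assumptions:

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"
      xml:lang="en">
  <head>
    <title>Pronunciation sketch</title>
    <!-- Associates a PLS pronunciation lexicon with this document -->
    <link rel="pronunciation" type="application/pls+xml"
          hreflang="en" href="lexicon/en.pls"/>
  </head>
  <body>
    <!-- The inline phoneme applies to this one occurrence only -->
    <p>She ordered a
      <span ssml:alphabet="ipa" ssml:ph="təˈmeɪtoʊ">tomato</span> salad.</p>
  </body>
</html>
```

A lexicon suits terms that recur throughout a publication, while the inline attribute suits one-off cases such as a homograph that must be read differently in one sentence.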
Metadata & Semantic Inflection
The metadata capabilities have been greatly expanded to include more
information about a publication as well as attaching complete bibliographic
records. Also, a unique identifier attribute was created, which provides a way
to identify a specific manifestation of a publication. Finally, a
mechanism was created for annotating document markup with more
semantically meaningful information.
Within EPUB3, metadata can be expressed using any combination of
EPUB-specific metadata, DCMES, DCTERMS, as well as other profiles including
PRISM and FOAF. Multiple identifiers can be defined, but none are mandated.
This allows publishers to select how they want to manage identifiers.
However, the unique identifier selected by the publisher to represent the
package is expected to be persistent, in order to support linking and other
applications. Packages must also include a dcterms:modified property as a
timestamp. The package identifier is then made up of a combination of the
unique identifier and the modification date.
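A minimal package metadata section along these lines might look like the sketch below. The identifier value, title, and timestamp are made up for illustration:

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- The identifier referenced by unique-identifier on the package element -->
    <dc:identifier id="pub-id">urn:uuid:12345678-1234-1234-1234-123456789abc</dc:identifier>
    <dc:title>An Example Publication</dc:title>
    <dc:language>en</dc:language>
    <!-- Last-modified timestamp, expressed in UTC -->
    <meta property="dcterms:modified">2011-10-01T12:00:00Z</meta>
  </metadata>
  <!-- manifest and spine omitted from this sketch -->
</package>
```

With these values, the package identifier would be the unique identifier and the modification date joined by an "@" character: urn:uuid:12345678-1234-1234-1234-123456789abc@2011-10-01T12:00:00Z.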
Within each package, a publication can have multiple titles. The
alternate titles may include short titles, subtitles, series information,
display sequences, sortable titles and non-Latin versions of the
Semantic inflection is used to attach additional meaning about the
specific purpose or nature of an element within the content. The spec
defines the epub:type attribute to express domain-specific semantics. This
metadata is not intended for human use, but rather, to assist reading
systems in enhancing the reading experience for users. The spec defines a
Structural Semantics Vocabulary which is the default vocabulary for all EPUB
documents. Within the vocabulary, there are structures including document
partitions (e.g. cover, back matter, etc.), document divisions (e.g. volume,
part, etc.), document sections and components (e.g. epigraph, conclusion,
preamble, etc.), document reference sections (e.g. index, colophon,
glossaries, appendices, bibliographies, etc.), preliminary sections and
components (e.g. errata, copyright page, etc.), complementary content (e.g.
sidebar, marginalia, etc.), notes (e.g. notes, footnotes, etc.), headings
(e.g. bridgehead), titles (e.g. subtitle, covertitle, etc.), document text
(e.g. keyword, topic sentence, etc.), references, pagination, tables, and
lists. In the cases of tables and lists, the semantic inflection is often
used to indicate to media overlays whether that content is escapable or
Support for a standard dictionary markup scheme was discussed and it was
decided that a companion specification would be developed specifically for
this type of content.
EPUB3 also provides a mechanism that can be used to identify and embed
semantic information within the content of a publication. There was a great
deal of discussion about the use of RDFa within EPUB3 and it was decided
that the complexity of implementing a full RDFa engine within a reading
system was too burdensome. However, there is a method through which RDF or
OWL can be inserted into an EPUB file. This can be done using the
epub:switch element as shown below:
</rdfs:subClassOf> <!-- other restrictions could be added here -->
<epub:default> giraffe </epub:default>
The epub:switch element allows XML fragments to be conditionally inserted
into the content of an EPUB document. Reading systems must process each
epub:switch element to determine whether they can render any of the
epub:case elements. The fallback is the epub:default element. In theory,
an RDF/OWL-capable reading system could use the information defined above
to build a taxonomy within the publication, that then could be used to aid
in searching for information within the publication based on the taxonomy.
This method can also be used to insert other markup schemes such as ChemML.
At this point in time, it is not known whether any reading system plans to
support the use of RDF in this manner.
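For comparison, a complete epub:switch for chemical markup might look like the following sketch, modeled on the style of example used in the EPUB 3.0 specification. It assumes the Chemical Markup Language (CML) namespace and that the epub prefix is bound to http://www.idpf.org/2007/ops on an ancestor element; the ids are hypothetical:

```xml
<epub:switch id="cml-switch">
  <!-- Rendered only by reading systems that understand the CML namespace -->
  <epub:case required-namespace="http://www.xml-cml.org/schema">
    <cml xmlns="http://www.xml-cml.org/schema">
      <molecule id="sulfuric-acid">
        <formula id="f1" concise="H 2 S 1 O 4"/>
      </molecule>
    </cml>
  </epub:case>
  <!-- Plain XHTML fallback for every other reading system -->
  <epub:default>
    <p>H<sub>2</sub>SO<sub>4</sub></p>
  </epub:default>
</epub:switch>
```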
Scalable Vector Graphics (SVG) also became a first-class citizen with the
adoption of a subset of SVG 1.1. While it had previously been listed as a
supported graphic format, it is now recognized as a suitable method for
inclusion of content, fonts and images within the spine and table of
contents in addition to the main content. SVG fonts provide the ability to
create more complex typography that can be scaled when the readers reflow
the content. SVG content can be inserted by reference or by
Publications in EPUB3 can now have reading orders other than sequential from
beginning to end. There is also new functionality to enhance
accessibility and navigation, including support for i18n and embedded grammars
(MathML, SVG) within the navigation documents. In addition, CSS can be used
to tailor the display of navigation information.
NCX documents are deprecated in favor of the EPUB Navigation Document
which uses the HTML5 nav element to define navigation information. The NCX
document can still be included to allow EPUB2 reading systems to attempt to
process an EPUB3 document.
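A bare-bones EPUB Navigation Document might look like the sketch below, with the table of contents expressed as an ordered list inside a nav element. The chapter file names and headings are hypothetical:

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>Navigation</title>
  </head>
  <body>
    <!-- epub:type="toc" marks this nav as the table of contents -->
    <nav epub:type="toc" id="toc">
      <h1>Table of Contents</h1>
      <ol>
        <li><a href="chapter1.xhtml">Chapter 1</a></li>
        <li><a href="chapter2.xhtml">Chapter 2</a>
          <ol>
            <li><a href="chapter2.xhtml#section1">Section 2.1</a></li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>
```

Because this is ordinary XHTML, the same file can be placed in the spine and displayed as a visible contents page as well as driving the reading system's navigation.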
EPUB3 defines a new EPUB Canonical Fragment Identifier (CFI) specification
that defines a standardized method for linking into a publication. This
specification enables EPUB reading systems to have an interoperable linking
mechanism, which can, for example, facilitate the sharing of bookmarks and
reading locations across devices.
The CFI is a combination of IRI and URI, HTML ids and named anchors, and
shorthand XPointer. At this time, linking via the CFI is only supported
within an EPUB publication. Another companion specification is planned that
will address linking between EPUB documents.
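To give a sense of the syntax, the sketch below places a CFI in the fragment of an ordinary link. The path itself is the style of example used in the CFI specification; the package file name and the ids in brackets are hypothetical, and the link assumes a package whose children are metadata, manifest, and spine, in that order:

```xml
<!-- Reading the path from left to right:
     /6              third child element of the package document (the spine)
     /4[chap01ref]   second itemref in the spine, with id "chap01ref"
     !               follow that reference into the content document
     /4[body01]      second child element of html (the body), with id "body01"
     /10[para05]     fifth child element of the body, with id "para05"
     /3:10           a text node inside that element, at character offset 10 -->
<a xmlns="http://www.w3.org/1999/xhtml"
   href="package.opf#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)">a
   saved reading location</a>
```

Even step numbers address elements and odd step numbers address the text between them, while the bracketed ids are assertions that let a reading system detect that the document has changed since the CFI was created.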
Styling & Layout
EPUB3 sets CSS 2.1 as its baseline, but incorporates some CSS3 modules
(speech, fonts, text, writing mode, media queries, multi-column, ruby
positioning) to provide advanced layout and styling beyond what was
previously available. The spec also introduces some EPUB-specific CSS
EPUB3 also supports the ability to include multiple style sheets within a
publication. This functionality can be used to change between day and night
reading modes or change the rendering direction. Initially, there were
plans to incorporate page-level layouts (similar to Apple's fixed layout
format) and the ability to target multiple display sizes, all within a
single publication. However, this functionality was deferred to a
separate accompanying specification to be defined at a later time.
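The multiple style sheet support mentioned above could be used for day and night modes roughly as sketched here, using alternate style sheet links tagged with class values that a reading system may recognize. The style sheet paths and titles are hypothetical:

```xml
<head xmlns="http://www.w3.org/1999/xhtml">
  <title>Chapter 1</title>
  <!-- Default style sheet, tagged as suitable for day reading -->
  <link rel="stylesheet" type="text/css"
        href="css/day.css" class="day" title="Day"/>
  <!-- Alternate style sheet the reading system can switch to at night -->
  <link rel="alternate stylesheet" type="text/css"
        href="css/night.css" class="night" title="Night"/>
</head>
```

The same pattern could, in principle, pair style sheets tagged for horizontal and vertical rendering.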
EPUB3 now requires that reading systems support OpenType and WOFF font
formats for embedded fonts. In addition, there are now normative sections
dealing with font obfuscation.
Global Language Support
A specific working group was formed to address the issues surrounding
character sets, writing direction, etc. The work done by the group is viewed
as one of the major improvements to the EPUB specification, allowing it to
be adopted in a much wider range of markets.
EPUB3 uses Pronunciation Lexicon Specification (PLS) documents and Speech
Synthesis Markup Language (SSML) attributes to increase pronunciation
control for rendering natural language in text-to-speech enabled reading
systems. It is also possible to combine CSS Speech and inline SSML phonemes
to provide fine control over ruby.
EPUB3's CSS support enables horizontal and vertical writing as well as
left-to-right and right-to-left writing. In addition, there is better
control over line breaking so that breaks can occur at the character level
for languages that do not use spaces to delimit words. However, reading
systems are not required to support all these capabilities.
Removal of DTBook and XML Islands
While most publishers delivered content in accordance with the EPUB
schema, far fewer used the DTBook syntax, which is targeted at systems
supporting accessibility for print-impaired users. It was decided fairly
early in the process that the two schemas would be unified in order to
increase accessibility to digital content. This was enabled by HTML5's
intrinsic semantic markup capabilities that were similar to those found in
DTBook. In essence, all EPUB3 content is accessible by nature. That being
said, there are still steps that should be taken to make content even more
accessible, including alternate text on images and formulas.
XML islands were also a feature that was little used and caused
interchange issues between reading systems. A survey of publishers and
conversion service providers revealed very little use of this