EPUB3 Overview

  • Multimedia & Media Overlays

    While audio and video had been incorporated into EPUB files previously, this version of the spec codified their use by incorporating the HTML5 audio and video markup. This markup provides a standard way to include multimedia within EPUB files. In addition, a mechanism for adding media overlays to books enables new functionality, including text highlighting for read-along books. Media overlays are expressed using the SMIL language.

    The adoption of the HTML5 audio and video markup is probably the most familiar addition in this area. While there was little discussion around the audio formats to be used (MP3 required and MP4 AAC LC suggested), there was a great deal of discussion surrounding the video formats to be supported. The main contenders were MPEG-4 H.264 and VP8. The main point of dissension was the level of support for each. Originally all reading systems were going to be required to support both. However, various reading system developers expressed concerns about being required to support two formats. There were also concerns about the patents covering the H.264 format and how licensing might affect reading systems. After a great deal of discussion and threats of non-support if both formats were mandated, it was decided that there would not be a requirement that reading systems support either format. Instead, an informational paragraph was included recommending the use of one of the formats, but not precluding the use of another format. In essence, this will require publishers to create videos in both formats in order to support the widest range of reading systems, and one can only hope that no reading system decides to use an alternative format.
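
    As a minimal sketch of what supplying both formats might look like, the fragment below lists an H.264 and a VP8 rendition of the same clip and lets the reading system pick the first source it can play. The file names are hypothetical, and the embedded paragraph is only one possible fallback.

```xml
<!-- Fragment of an XHTML content document; all file names are illustrative -->
<video xmlns="http://www.w3.org/1999/xhtml"
       controls="controls" width="640" height="360">
  <!-- H.264 rendition in an MP4 container -->
  <source src="video/giraffe.mp4" type="video/mp4"/>
  <!-- VP8 rendition in a WebM container -->
  <source src="video/giraffe.webm" type="video/webm"/>
  <!-- Shown only when the reading system cannot render the video element -->
  <p>This reading system cannot play the embedded video.</p>
</video>
```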

    The epub:trigger element was created in order to support markup that defines user interfaces for controlling multimedia objects when they are encountered, without requiring scripting. Using this element, it is now possible to activate media files based on the content.
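
    The following sketch shows how epub:trigger might wire two text labels to a video so that a click pauses or resumes playback without any script. The ids and file name are hypothetical, and the attribute names reflect the EPUB 3.0 Content Documents spec as understood at the time of writing, so they should be checked against the final text.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Trigger sketch</title>
    <!-- ev:observer names the control, ref names the media element acted on -->
    <epub:trigger ev:observer="pause-btn" ev:event="click" action="pause" ref="clip"/>
    <epub:trigger ev:observer="resume-btn" ev:event="click" action="resume" ref="clip"/>
  </head>
  <body>
    <video id="clip" src="video/giraffe.mp4" width="320" height="240"/>
    <p>
      <span id="resume-btn">Play/Resume</span>
      <span id="pause-btn">Pause</span>
    </p>
  </body>
</html>
```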

    The ability to hear the content is a vital accessibility concern and is desirable for many other users as well. Media overlays provide a mechanism that allows the synchronization of text and audio content within a publication. There are many possible use cases for this functionality beyond that of accessibility, including learning to read. Within the spec there are options for computer-based text-to-speech reading as well as for audio files synchronized to the text.
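
    A media overlay document might look roughly like the sketch below, in which each par element pairs a text fragment with the audio clip that narrates it. The file names, ids and clip times are assumptions made for illustration.

```xml
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <!-- Each par plays its audio clip while the referenced text is highlighted -->
    <par id="par1">
      <text src="chapter1.xhtml#sentence1"/>
      <audio src="audio/chapter1.mp3" clipBegin="0s" clipEnd="4.2s"/>
    </par>
    <par id="par2">
      <text src="chapter1.xhtml#sentence2"/>
      <audio src="audio/chapter1.mp3" clipBegin="4.2s" clipEnd="9.8s"/>
    </par>
  </body>
</smil>
```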

  • MathML

    MathML became a first class citizen within EPUB3. The spec requires support for embedded Presentation MathML and allows for processing of Content MathML based on MathML 3.0. In the prior versions of the spec, mathematical formulas were usually converted to images. The inclusion of MathML allows formulas to be marked semantically and displayed. This also allows the formulas to reflow along with the rest of the text rather than remain as a static image. The markup also can be used as input to mathematical engines like Wolfram Alpha, allowing formulas to become interactive.

    While MathML is supported in the spec, it would still be prudent for publishers to provide the image renditions of symbols and formulas as fallbacks for older reading systems. Publishers should also provide alternative text content for accessibility purposes including text-to-speech.
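
    One way to carry both fallbacks is on the math element itself, as in this sketch. The image path is hypothetical; altimg and alttext are the MathML 3 attributes for an image rendition and a text alternative.

```xml
<!-- Presentation MathML for a^2 + b^2 = c^2 with image and text fallbacks -->
<math xmlns="http://www.w3.org/1998/Math/MathML"
      altimg="images/pythagoras.png"
      alttext="a squared plus b squared equals c squared">
  <msup><mi>a</mi><mn>2</mn></msup>
  <mo>+</mo>
  <msup><mi>b</mi><mn>2</mn></msup>
  <mo>=</mo>
  <msup><mi>c</mi><mn>2</mn></msup>
</math>
```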

  • Scripting & Interactivity

    JavaScript was also added to EPUB3 to allow some levels of interactivity similar to what is available in modern browsers. There was a great deal of discussion about the level of support to be included, due mostly to security concerns. Most reading systems that are attached to online vendors have access to personal financial information which must be protected. The final decision was that reading systems must provide appropriate levels of security in order to safeguard sensitive data.

    The addition of JavaScript to the specification provides the ability to create interactivity within EPUBs. This support is optional and operates under several restrictions and limitations. Because of this, it is not clear to what level reading systems will support this new functionality. Reading systems that are based on browser engines, such as WebKit, will likely provide support, but other non-browser-based systems may not be able to provide an equal level of support. Also, due to security concerns, many people disable JavaScript within their browsers and likely will within reading systems as well. Publishers that opt to incorporate scripting should ensure that the user experience is acceptable whether JavaScript is enabled or not.

    Also new to the spec is the epubReadingSystem JavaScript object, which provides a means for querying a reading system to determine its capabilities. This will allow scripts to be developed which can provide their own levels of fallback capabilities.
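
    The sketch below shows one way a content document might use this object: the static markup remains the default, and the script upgrades it only when the reading system reports that DOM manipulation is available. The element id and class names are hypothetical, and the feature name should be verified against the final spec.

```xml
<body xmlns="http://www.w3.org/1999/xhtml">
  <!-- Static content that reads acceptably when scripting is disabled -->
  <div id="quiz" class="static">
    <p>Question 1: Which animal has the longest neck?</p>
  </div>
  <script type="text/javascript">
  //<![CDATA[
    // navigator.epubReadingSystem is undefined outside EPUB3 reading systems
    var rs = navigator.epubReadingSystem;
    if (rs && rs.hasFeature("dom-manipulation")) {
      document.getElementById("quiz").className = "interactive";
    }
  //]]>
  </script>
</body>
```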

  • Speech & Accessibility

    EPUB has always been about supporting adaptive layouts and accessibility. Most of the enhancements made in EPUB3 had to pass an accessibility litmus test to ensure that new enhancements would still be accessible. Anything done through JavaScript must have an accessibility component associated with it. The same applies for MathML and SVG content.

    With the removal of the DTBook schema, text-to-speech became more important within EPUB3 to support accessibility. Portions of the W3C PLS and SSML specifications are included in EPUB3 to allow publishers to provide information to TTS engines, including pronunciation guides for terms which might not be in a standard dictionary. In addition, features found within the CSS3 Speech Module allow publishers to control speech synthesis options, such as voice pitch and rate.
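
    As a rough illustration, a content document might link a PLS lexicon and override the pronunciation of a single word inline with an SSML attribute. The lexicon path is hypothetical and the phonemic string is only an example transcription.

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"
      ssml:alphabet="ipa">
  <head>
    <title>Pronunciation sketch</title>
    <!-- Publication-wide pronunciations come from the linked PLS lexicon -->
    <link rel="pronunciation" type="application/pls+xml"
          hreflang="en" href="lexicon/en.pls"/>
  </head>
  <body>
    <!-- ssml:ph gives the TTS engine a phonemic rendering for this one word -->
    <p>She sliced a <span ssml:ph="təˈmɑːtəʊ">tomato</span> for the salad.</p>
  </body>
</html>
```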

  • Metadata & Semantic Inflection

    The metadata capabilities have been greatly expanded to include more information about a publication as well as attaching complete bibliographic records. Also, a unique identifier attribute was created which allows a way in which to identify a specific manifestation of a publication. Finally a mechanism was created allowing for annotating document markup with more semantically meaningful information.

    Within EPUB3, metadata can be expressed using any combination of EPUB-specific metadata, DCMES, DCTERMS, as well as other profiles including PRISM and FOAF. Multiple identifiers can be defined, but none are mandated. This allows publishers to select how they want to manage identifiers. However, the unique identifier selected by the publisher to represent the package is expected to be persistent, in order to support linking and other applications. Packages must also include a dcterms:modified property as a timestamp. The package identifier is then made up of a combination of the unique identifier and the modification date.
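
    A minimal sketch of the relevant package metadata follows. The UUID, title and timestamp are made-up values, and the manifest and spine are omitted for brevity.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- The unique-identifier attribute above points at this element -->
    <dc:identifier id="pub-id">urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809</dc:identifier>
    <dc:title>Sample Publication</dc:title>
    <dc:language>en</dc:language>
    <!-- Last-modified timestamp, updated whenever the package changes -->
    <meta property="dcterms:modified">2011-05-23T12:00:00Z</meta>
  </metadata>
  <!-- manifest and spine omitted from this sketch -->
</package>
```

    Under these assumed values, the package identifier would be the unique identifier and the timestamp joined by an "@" sign: urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809@2011-05-23T12:00:00Z.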

    Within each package, a publication can have multiple titles. The alternate titles may include short titles, subtitles, series information, display sequences, sortable titles and non-Latin versions of the title.
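
    The sketch below shows how a main title with a sort form and a short title might be declared, using meta elements that refine each dc:title. The ids are hypothetical; the fragment is assumed to sit inside a package element that declares the OPF namespace.

```xml
<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Main title, displayed first and sorted without the leading article -->
  <dc:title id="t-main">The Adventures of Tom Sawyer</dc:title>
  <meta refines="#t-main" property="title-type">main</meta>
  <meta refines="#t-main" property="display-seq">1</meta>
  <meta refines="#t-main" property="file-as">Adventures of Tom Sawyer, The</meta>

  <!-- Short title for space-constrained displays -->
  <dc:title id="t-short">Tom Sawyer</dc:title>
  <meta refines="#t-short" property="title-type">short</meta>
</metadata>
```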

    Semantic inflection is used to attach additional meaning about the specific purpose or nature of an element within the content. The spec defines the epub:type attribute to express domain-specific semantics. This metadata is not intended for human use, but rather, to assist reading systems in enhancing the reading experience for users. The spec defines a Structural Semantics Vocabulary which is the default vocabulary for all EPUB documents. Within the vocabulary, there are structures including document partitions (e.g. cover, back matter, etc.), document divisions (e.g. volume, part, etc.), document sections and components (e.g. epigraph, conclusion, preamble, etc.), document reference sections (e.g. index, colophon, glossaries, appendices, bibliographies, etc.), preliminary sections and components (e.g. errata, copyright page, etc.), complementary content (e.g. sidebar, marginalia, etc.), notes (e.g. notes, footnotes, etc.), headings (e.g. bridgehead), titles (e.g. subtitle, covertitle, etc.), document text (e.g. keyword, topic sentence, etc.), references, pagination, tables, and lists. In the cases of tables and lists, the semantic inflection is often used to indicate to media overlays whether that content is escapable or skippable.
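
    A short sketch of epub:type in use follows. The content and ids are invented; the epigraph, noteref and footnote values are taken from the Structural Semantics Vocabulary.

```xml
<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <!-- The section is ordinary HTML5; epub:type states its role in the book -->
  <section epub:type="epigraph">
    <p>All animals are equal.<a epub:type="noteref" href="#fn1">1</a></p>
  </section>
  <!-- A reading system could show this as a pop-up instead of inline text -->
  <aside epub:type="footnote" id="fn1">
    <p>Source attribution for the epigraph.</p>
  </aside>
</body>
```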

    Support for a standard dictionary markup scheme was discussed and it was decided that a companion specification would be developed specifically for this type of content.

    EPUB3 also provides a mechanism that can be used to identify and embed semantic information within the content of a publication. There was a great deal of discussion about the use of RDFa within EPUB3 and it was decided that the complexity of implementing a full RDFa engine within a reading system was too burdensome. However, there is a method through which RDF or OWL can be inserted into an EPUB file. This can be done using the epub:switch element as shown below:

    <!-- The epub, rdf and rdfs prefixes are assumed to be declared on the root element -->
    <epub:switch id="giraffeOwlSwitch">
      <epub:case required-namespace="http://www.w3.org/2002/07/owl#">
        <Class xmlns="http://www.w3.org/2002/07/owl#" rdf:about="#giraffe">
          <rdfs:label>giraffe</rdfs:label>
          <rdfs:subClassOf>
            <Class rdf:about="#animal"/>
          </rdfs:subClassOf>
          <!-- other restrictions could be added here -->
        </Class>
      </epub:case>
      <epub:default> giraffe </epub:default>
    </epub:switch>

    The epub:switch element allows XML fragments to be conditionally inserted into the content of an EPUB document. Reading systems must process each epub:switch element to determine whether they can render any of the epub:case elements. The fallback is the epub:default element. In theory, an RDF/OWL-capable reading system could use the information defined above to build a taxonomy within the publication, which could then be used to aid in searching for information within the publication. This method can also be used to insert other markup schemes such as Chemical Markup Language (CML). At this point in time, it is not known whether any reading system plans to support the use of RDF in this manner.

  • SVG

    Scalable Vector Graphics (SVG) also became a first-class citizen with the adoption of a subset of SVG 1.1. While it had previously been listed as a supported graphic format, it is now recognized as a suitable method for inclusion of content, fonts and images within the spine and table of contents in addition to the main content. SVG fonts provide the ability to create more complex typography that can be scaled when the readers reflow the content. SVG content can be inserted by reference or by inclusion.
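
    The two insertion styles might look like the sketch below, which shows the same simple drawing once by reference and once by inclusion. The file name and shapes are made up for illustration.

```xml
<body xmlns="http://www.w3.org/1999/xhtml">
  <!-- By reference: the SVG lives in its own file, listed in the manifest -->
  <img src="images/diagram.svg" alt="A circle inside a square"/>

  <!-- By inclusion: the SVG markup is embedded directly in the XHTML -->
  <svg xmlns="http://www.w3.org/2000/svg"
       viewBox="0 0 100 100" width="100" height="100">
    <title>A circle inside a square</title>
    <rect x="5" y="5" width="90" height="90" fill="none" stroke="black"/>
    <circle cx="50" cy="50" r="40" fill="none" stroke="black"/>
  </svg>
</body>
```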

  • Navigation

    Publications in EPUB3 can now have orders other than sequential from beginning to end. In addition there is new functionality to enhance accessibility and navigation, including allowing i18n and embedded grammars (MathML, SVG) within the navigation documents. In addition, CSS can be used to tailor the display of navigation information.

    NCX documents are deprecated in favor of the EPUB Navigation Document which uses the HTML5 nav element to define navigation information. The NCX document can still be included to allow EPUB2 reading systems to attempt to process an EPUB3 document.
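
    A minimal table of contents in the new form might look like this sketch. The chapter file names are hypothetical; nesting is expressed with ordinary ordered lists.

```xml
<nav xmlns="http://www.w3.org/1999/xhtml"
     xmlns:epub="http://www.idpf.org/2007/ops"
     epub:type="toc" id="toc">
  <h1>Contents</h1>
  <ol>
    <li><a href="chapter1.xhtml">Chapter 1</a></li>
    <li><a href="chapter2.xhtml">Chapter 2</a>
      <!-- Sub-entries are nested lists inside the parent entry -->
      <ol>
        <li><a href="chapter2.xhtml#sec1">Section 2.1</a></li>
      </ol>
    </li>
  </ol>
</nav>
```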

  • Linking

    EPUB3 defines a new EPUB Canonical Fragment Identifier (CFI) specification that defines a standardized method for linking into a publication. This specification enables EPUB reading systems to have an interoperable linking mechanism, which can, for example, facilitate the sharing of bookmarks and reading locations across devices.

    The CFI is a combination of IRI and URI, HTML ids and named anchors, and shorthand XPointer. At this time, linking via the CFI is only supported within an EPUB publication. Another companion specification is planned that will address linking between EPUB documents.
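
    To make the addressing scheme more concrete, the sketch below walks a hypothetical CFI through a package document and a content document. All ids and content are invented. Even-numbered steps select child elements (the nth element is step 2n), odd-numbered steps select text, bracketed values are id assertions, and "!" follows the spine reference into the target document.

```xml
<!-- Hypothetical CFI: epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/1:10) -->

<!-- 1. Package document: /6 is the third child element, /4 its second itemref -->
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id">
  <metadata>...</metadata>                      <!-- /2 -->
  <manifest>...</manifest>                      <!-- /4 -->
  <spine>                                       <!-- /6 -->
    <itemref id="titleref" idref="title"/>      <!-- /6/2 -->
    <itemref id="chap01ref" idref="chap01"/>    <!-- /6/4[chap01ref] -->
  </spine>
</package>

<!-- 2. Content document reached through "!": the path restarts at the root -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>         <!-- /2 -->
  <body id="body01">                            <!-- /4[body01] -->
    <p>One</p><p>Two</p><p>Three</p><p>Four</p>
    <!-- /10[para05] is the fifth child element; /1:10 is the point after "9" -->
    <p id="para05">0123456789 and so on</p>
  </body>
</html>
```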

  • Styling & Layout

    EPUB3 sets CSS 2.1 as its baseline, but incorporates some CSS3 modules (speech, fonts, text, writing mode, media queries, multi-column, ruby positioning) to provide advanced layout and styling beyond what was previously available. The spec also introduces some EPUB-specific CSS constructs.

    EPUB3 also supports the ability to include multiple style sheets within a publication. This functionality can be used to change between day and night reading modes or change the rendering direction. Initially, there were plans to incorporate page-level layouts (similar to Apple's fixed layout format) and the ability to target multiple display sizes, all within a single publication. However, this functionality got pushed out to a separate accompanying specification to be defined at a later time.
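
    A day/night pair might be declared as in the sketch below, with the default sheet linked normally and the alternative marked as an alternate style sheet. The file names are hypothetical, and the class values follow the alternate style tags proposal as understood at the time of writing.

```xml
<head xmlns="http://www.w3.org/1999/xhtml">
  <title>Chapter 1</title>
  <!-- Default rendering -->
  <link rel="stylesheet" type="text/css"
        href="css/day.css" class="day" title="Day"/>
  <!-- Offered as an alternative the reading system can switch to -->
  <link rel="alternate stylesheet" type="text/css"
        href="css/night.css" class="night" title="Night"/>
</head>
```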

    EPUB3 now requires that reading systems support OpenType and WOFF font formats for embedded fonts. In addition, there are now normative sections dealing with font obfuscation.

  • Global Language Support

    A specific working group was formed to address the issues surrounding character sets, writing direction, etc. The work done by the group is viewed as one of the major improvements to the EPUB specification, allowing it to be adopted in a much wider range of markets.

    EPUB3 uses Pronunciation Lexicon Specification (PLS) documents and Speech Synthesis Markup Language (SSML) attributes to increase pronunciation control for rendering natural language in text-to-speech enabled reading systems. It is also possible to combine CSS Speech and inline SSML phonemes to provide fine control over ruby.

    EPUB3's CSS support enables horizontal and vertical writing as well as left-to-right and right-to-left writing. In addition, there is better control over line breaking so that breaks can occur at the character level for languages that do not use spaces to delimit words. However, reading systems are not required to support all these capabilities.

  • Removal of DTBook and XML Islands

    While most publishers delivered content in accordance with the EPUB schema, far fewer used the DTBook syntax, which is targeted to systems supporting accessibility for print-impaired users. It was decided fairly early in the process that the two schemas would be unified in order to increase accessibility to digital content. This was enabled by HTML5's intrinsic semantic markup capabilities, which are similar to those found in DTBook. In essence, all EPUB3 content is accessible by nature. That being said, there are still steps that should be taken to make content even more accessible, including alternate text on images and formulas.

    XML islands were also a feature that was little used and caused interchange issues between reading systems. A survey of publishers and conversion service providers revealed very little use of this feature.

What is still missing?

Inter-document linking

Although the CFI spec provides mechanisms for addressing locations within an EPUB file, inter-document linking functionality will not be included at this time. Inter-document linking was proposed to allow EPUB files to reference other EPUB files. The difficulty was in designing a linking mechanism that supported all the following use cases:

  • linking to a specific place within a specific EPUB file (e.g. a reference to a particular passage within a particular version of Tom Sawyer) from another EPUB file

  • linking to a specific place within a non-specific EPUB file (e.g. a reference to a particular passage within whatever version of Tom Sawyer is available to the device) from another EPUB file

  • appropriate actions in cases where the target location is not available on the device (e.g. dead link, access a store, etc.)

The difficulty in developing a complete and thorough specification was deemed too much to accomplish within the timeframe for developing the EPUB3 specs. It is anticipated that work will begin on a new companion spec after the EPUB3 specs have been finally approved.

Annotations

Another class of functionality that is dependent on the CFI is the definition of an interchangeable method for defining annotations on an EPUB. There is a requirement to be able to move annotations between reading devices in much the same manner as the publications themselves can be interchanged. As with the inter-document linking work, this work was postponed until after the final approval of the EPUB3 specs in order to meet deadlines and to allow the CFI to become final before designing something that was dependent on it.

Dictionary Interchange

During the discussions about glossaries and indexes, it was determined that there is a need to include dictionary interchange support. Most reading systems provide dictionary lookup capabilities, but there is not a standardized or extensible method for doing so. The goal of this companion specification will be to define an interchangeable dictionary scheme. In addition, it will provide a method for an EPUB to insert publication specific terms into the dictionary mechanism in order to create a consistent user experience for looking up terms within a publication, no matter what reading system is being used.

Enhanced article support and support for ads

The stated goals of the IDPF board for EPUB3 included the definition of a mechanism for including advertising and enhanced article support. When it was time to create working groups for the different work items, there was lukewarm support for this within the working group. In conjunction with these was a suggestion to define functionality to support hotspots on graphics. Within IDEAlliance, a working group known as nextPUB, consisting mostly of magazine and periodical publishers, has begun work on creating a layer based on EPUB3 that will work well within the magazine industry. The aforementioned goals have essentially been passed to the nextPUB working group for resolution.

Adoption and Market Impact

Whenever a new version of a specification is released, there is inevitable upheaval within the marketplace. This specification will be no different. Publishers will need to revise their production process to support the new format. In addition, they will need to determine how best to take advantage of the new functionality provided within the spec. This must be tempered with knowledge of what is supported across reading systems. Many of the new features within the specification are optional, meaning that capabilities might vary across devices even more widely than they do now.

One of the things the working group did well was to mandate that all EPUB3 reading systems support pre-EPUB3 files. This guarantees that current customer libraries will be readable on devices supporting EPUB3. However, there is no mandate for forward compatibility from current EPUB reading systems, meaning that some dedicated devices may begin a path to obsolescence. While it is possible that many can be updated via software upgrades, there are still device limitations that must be considered. E-Ink based displays cannot refresh quickly enough to support video, meaning that publishers should provide fallbacks any time they incorporate video within an ebook. The video format compromise will also force publishers to create fallbacks to support different formats on different reading systems.

At this point in time it is difficult to predict the level of support for EPUB3 from reading system developers. Several high-profile reading system vendors were part of the working group, but none have made any formal announcement concerning schedules for releasing EPUB3-capable reading systems and apps. It seems logical that vendors of reading systems for non-dedicated devices (such as the iPad or Android tablets) might be able to be more agile in releasing the updates, especially the functionality around embedded audio and video and JavaScript. Dedicated reading systems, such as the Kindle, Nook or Kobo, do not have the hardware capability to support video, meaning new models of the devices will be required. Barnes & Noble has been proactive in releasing the Nook Color, which is capable of supporting the new multimedia capabilities.

Publishers, on the other hand, will be caught between a rock and a hard place, yet again. Conversion vendors are already receiving requests for EPUB3 support, even before reading systems are available. Validation software is also not available yet, making it very difficult to test EPUB3 files. The timing of the final spec release (July 2011) will also compound problems as publishers begin to gear up for the upcoming holiday season without a clear understanding of what capabilities will be available in the different reading systems. It would be prudent for publishers to continue to support the current EPUB standard in order to support the current customer base. However, there will also be pressure on them to begin creating EPUB3 files to take advantage of the new functionality and capabilities. This will result in additional cost burdens for publishers.

At this point, it is unknown how the public will react to the release of EPUB3-based books. For some time, there will likely be additional confusion about which formats are supported on which devices. There is also some interest in how the various online sources will market the new files. It is anticipated that multiple choices will be available for the foreseeable future. How long the stores identify the different versions remains to be seen.

References

EPUB Canonical Fragment Identifier (epubcfi) Specification - 23 May 2011 - http://www.idpf.org/epub/linking/cfi/epub-cfi.html

EPUB 3 Changes from EPUB 2.0.1 - 23 May 2011 - http://www.idpf.org/epub/30/spec/epub30-changes.html

EPUB 3 Content Documents 3.0 - 23 May 2011 - http://www.idpf.org/epub/30/spec/epub30-contentdocs.html

EPUB 3 Media Overlays 3.0 - 23 May 2011 - http://www.idpf.org/epub/30/spec/epub30-mediaoverlays.html

EPUB 3 Overview - 23 May 2011 - http://www.idpf.org/epub/30/spec/epub30-overview.html

EPUB 3 Publications 3.0 - 23 May 2011 - http://www.idpf.org/epub/30/spec/epub30-publications.html

EPUB 3 Structural Semantics Vocabulary - http://www.idpf.org/epub/vocab/structure/#

Author's keywords for this paper:
EPUB; EPUB3; ebook; specification

Eric Freese

Director - Solutions Architect

Aptara

Eric Freese is a veteran of publishing production optimization. From the early days of SGML, he has worked in roles as varied as consultant, software developer, content architect and semantic web technologist in industries including defense, technical publishing, commercial software and legal publishing. In his role as Director/Solutions Architect, Eric helps Aptara customers efficiently transition to cost-saving digital publishing models that simultaneously support eBook and other electronic delivery platforms, as well as traditional print production.