Proceedings

Vivliostyle - Open source, web browser based CSS typesetting engine

Shinyu Murakami

Vivliostyle Inc.

Johannes Wilm

Vivliostyle Inc.

Balisage: The Markup Conference 2015
August 11 - 14, 2015

This work is licensed under a Creative Commons Attribution 4.0 International License, see http://creativecommons.org/licenses/by/4.0/.

How to cite this paper

Murakami, Shinyu, and Johannes Wilm. “Vivliostyle - Open source, web browser based CSS typesetting engine.” Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). DOI: 10.4242/BalisageVol15.Wilm01.

Abstract

We are working on a new typesetting engine, implemented in JavaScript, that uses CSS for styling. In this article we argue why such a project is needed and why we think this approach is the best fit for the digital publishing era: it can unify web, ebook and print publishing.

Table of Contents

Facilitating multi-media publishing workflows
The need for a web-based content solution
Existing HTML-centric print formatters
Using web browsers with JavaScript to create print output
Things needed: common styling specifications for print
Things needed: JavaScript-based general print layout implementations
Conclusion

Facilitating multi-media publishing workflows

Publishing textual content to several media, such as printed books, ebooks and the web, faces a number of challenges. Most workflows seem to incorporate part or all of one of the following common publishing solutions:

  • Small-scale, non-academic text publishing generally relies on text production in word processing applications (namely Microsoft Word), which is exported as HTML. For a web version it is then cleaned and converted to use the tags, attributes and classes used within the site that the text is embedded into. If an ebook version is created at all, it is often created as an EPUB by converting the HTML file further, using a tool such as Calibre. To obtain a print version, the text is imported into and set up within a desktop publishing application, such as Adobe InDesign. If resources for this kind of conversion are lacking, a PDF may be created directly from Microsoft Word, which leads to suboptimal output quality.

  • Small-scale academic text publishing is at times instead done using tools such as LaTeX, which convert human-readable source text into good-looking PDFs that are well suited for print, and which are much better than word processors at additional features such as consistent bibliography management and mathematical formulas. Engines such as pdfTeX convert the LaTeX source files into printable PDF files. For ebook and web output, a transformation to HTML has to occur first. Although conversion tools such as HEVEA, latex2html and TeX4ht exist, conversions seldom go smoothly, and manual cleanup is usually required. The conversion of the input is similarly problematic: unless the author supplies the text in LaTeX format, it needs to be converted from a word processor format, which can seldom be done automatically.

  • Text publishing by larger companies and organizations is often done via an XML step: the original text is first converted from a word processor format to an XML format and then cleaned up manually. It is then converted to PDF, HTML and EPUB using one of a number of different chains of conversion tools. For example, a PDF can be obtained by applying an XSLT stylesheet to the XML file with an XSLT processor and passing the output through an XSL-FO formatter. An HTML file can be obtained from the XML file by applying another XSLT stylesheet with the XSLT processor. An EPUB can be obtained by converting the HTML file. In theory these processes could be entirely automated, but in practice a lot of manual editing is often required at some stage, because the contents contain elements that are very specific to the type of publication in question and that the creators of the conversion software had not anticipated.

  • A slightly different workflow is also XML-centered, but instead of converting the XML directly, the XML is imported into InDesign where it is then styled and adjusted for print. The problem is that if the XML file has been changed and the output file needs to be updated, changes made in InDesign will have to be reapplied.

All these conversion systems have in common that they are rather labor intensive and that separate and different workflow steps are needed for the different output formats. While the most professional solution, involving XML, at least in theory can work with just one source file which can be updated along the way, XML is not easily editable, and it seems as if XHTML is being replaced by the non-XML-conforming HTML5 in the context of much web publishing.

Additionally, the most common way of styling XML files, XSL-FO, is running into trouble. While the number of print products created with XSL-FO is still increasing, and XSL-FO continues to offer some print features that are more advanced than those of CSS, further standardization of XSL-FO seems to have halted indefinitely due to lack of interest, with the W3C believing that CSS will replace it [Graham2014] [Kelly2014].

The need for a web-based content solution

Looking beyond the existing publishing solutions, it was clear to us that none of them works perfectly or automatically. We also noted that the central place XML currently has in many publishing workflows is likely mainly a historical artifact from the period around the turn of the century when HTML was to be replaced by XML in the form of XHTML [Simpson2000]. Because XML is seldom the final output format and is just about absent from the web [Berjon2014], and because far fewer editors exist for editing XML in a rich-text WYSIWYG fashion than for HTML, it creates largely unneeded conversion steps.

If, on the other hand, one used HTML as the main content file format, some steps of conversion could be made much smaller or eliminated entirely:

  • In the case of EPUB, the most common ebook file type, the textual content comes in the form of files containing a restricted version of HTML, and the styling of these pages is defined through a restricted version of CSS, the same language used to style web pages. Conversion from an HTML source file to an EPUB could therefore largely be done automatically. If one is able to restrict the tags, attributes and CSS rules used in the source files, the conversion should in most cases be entirely automatic.

  • When publishing for the web, the source file will in itself already be presentable. If further changes are required, these can be added with simple converters. Such converters can even be written in JavaScript and executed in the browser of the end user. The contents of an article or chapter can in this way be made to fit the style of the website on which they are presented. Should the same source text be used on different sites or presented on different media with different settings (desktop computer vs. tablet with different, user-adjusted zooming), the styling can be adjusted to fit. Because JavaScript is relatively easy to learn, a much wider community of developers is able to convert the source text to the final output format.
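
The restriction on tags mentioned above can be checked mechanically before conversion. The following is a minimal sketch only: the allowlist and the function name `findDisallowedTags` are our own illustrative assumptions, not part of any EPUB tool, and the regular expression ignores comments, CDATA sections and script content, which a production converter would have to handle with a real HTML parser.

```javascript
// Hypothetical allowlist for illustration; a real profile would be derived
// from the EPUB specification and the publisher's own conventions.
const ALLOWED_TAGS = new Set(["h1", "h2", "p", "em", "strong", "a", "ul", "ol", "li", "img"]);

// Return the sorted, de-duplicated names of tags in `html` that are not in `allowed`.
// A regular expression keeps the sketch dependency-free; it is not a full HTML parser.
function findDisallowedTags(html, allowed = ALLOWED_TAGS) {
  const found = new Set();
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9-]*)/g;
  for (const match of html.matchAll(tagPattern)) {
    const name = match[1].toLowerCase();
    if (!allowed.has(name)) {
      found.add(name);
    }
  }
  return [...found].sort();
}

// Example input containing two tags outside the allowlist.
const sample = '<h1>Title</h1><p>Some <font color="red">red</font> and <em>plain</em> text.</p><marquee>x</marquee>';
console.log(findDisallowedTags(sample));
```

A source file for which this check returns an empty list uses only the agreed tag set, which is the precondition for the fully automatic conversion described above.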

While solutions for web and EPUB publishing would not have to change a lot, the situation in print is quite different. As we have seen above, none of the standard print typesetting workflows is centered around HTML, and print does not require the text ever to be converted into a web-centric format. Source files will be a mix of Microsoft Word files, Adobe InDesign files and, in the case of large publishers, XML files.

We believe that many publishers could benefit from switching their workflow to HTML, while some publishers will still benefit from using XML. In either case, they will benefit from switching from XSL-FO to CSS for print styling, because it allows them to use the same or similar style definitions for all types of output.

Existing HTML-centric print formatters

Ours is not the first project to provide print formatting based on HTML and CSS. Two existing formatters of this kind are the Antenna House Formatter and PrinceXML.

Both of these are stand-alone executables that accept CSS and HTML input and output printable PDFs, and at least two major publishers have switched to HTML and CSS for book publishing: O'Reilly Media [McKesson2012] [McKesson2013] [Kleinfeld2013] and the Hachette Book Group [Cramer2012]. Even though the formatters accept fairly common HTML elements, the implementation of each formatter differs slightly. Those who create web-based content, and those who build editors for such content, try to comply not only with existing web specifications but also with the actual implementations of those standards in the most common web browsers. They therefore test how their content renders in Google Chrome, Apple Safari, Mozilla Firefox and Internet Explorer, but not in formatters meant solely for print. Web content that renders without problems in all major browsers will need extra attention before it can be converted by the above-mentioned tools, for two reasons: features are implemented slightly differently in the formatters than in the browsers, and the formatters are relatively slow to support new CSS features, because their vendors implement the core engines on their own with much smaller development teams than the browsers have.

This is one of the difficulties that the print-publishing industry currently faces with CSS typesetting. Other difficulties are that standard CSS does not include rules for everything needed for book styling, and that the extensions that add styling features important for book printing are at a very early stage of development [Bos2013].

Using web browsers with JavaScript to create print output

Pagination.js [1] and simplePagination.js, developed 2012–14, were first attempts at creating an HTML-based print layout system that runs in standard browsers. They are written in JavaScript and add styling and content features that are specific to printed books, such as tables of contents, running headers, footnotes, word indexes, margin notes and page numbers, and they permit the user to style the main content using CSS. Pagination.js has been used in the production of books for the past two years, but it is limited in that it only works in Apple Safari, using the browser's print-to-PDF feature.

The two JavaScript packages were made exclusively for printing books whose general layouts repeat according to specific rules across all pages, not for magazines or books that need page-specific layouts. They are configured through JavaScript function arguments rather than CSS, which means that the end user has to configure layout options in two different ways, depending on whether a given part of the layout is controlled by the browser or by the JavaScript package. Also, both packages have been optimized to work with current browsers and exploit bugs in the rendering engines to obtain better results.
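
To make the contrast between the two configuration styles concrete, the sketch below takes a layout options object of the kind one might pass to a JavaScript pagination package and expresses the same settings as a CSS Paged Media `@page` rule. The option names (`pageWidthMm`, `pageHeightMm`, `marginMm`, `pageNumbers`) and the function `toPageRule` are hypothetical and are not the Pagination.js API; only the generated CSS (`size`, `margin`, `@bottom-center`, `counter(page)`) follows the CSS Paged Media specification.

```javascript
// Hypothetical layout options in the style of JavaScript function arguments.
// These names are illustrative only and are not the Pagination.js API.
const layoutOptions = { pageWidthMm: 148, pageHeightMm: 210, marginMm: 20, pageNumbers: true };

// Express the same options as a CSS Paged Media rule, so that page layout
// can live in the stylesheet next to the content styles.
function toPageRule(options) {
  const lines = [
    `  size: ${options.pageWidthMm}mm ${options.pageHeightMm}mm;`,
    `  margin: ${options.marginMm}mm;`,
  ];
  if (options.pageNumbers) {
    // Margin box holding the current page number at the bottom of each page.
    lines.push("  @bottom-center { content: counter(page); }");
  }
  return `@page {\n${lines.join("\n")}\n}`;
}

console.log(toPageRule(layoutOptions));
```

With layout expressed this way, the end user would configure both content styles and page layout in a single language.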

As a proof of concept and for very specific types of print layout, the existing JavaScript packages currently work fine. But when browsers change or new browsers come along, or when the user wishes to do something slightly different from printing a book according to the offered rules, they are of little use.

Things needed: common styling specifications for print

Because we believe all styles should be configurable through CSS, part of our focus lies in ensuring that the extra elements that are only important for print and other page based media are sufficiently defined in web specifications to ensure interoperability with other projects.

One of the more important specifications is the CSS Paged Media module [CSSPagedMedia]. There are already several typesetting engines supporting CSS Paged Media, the Antenna House Formatter and PrinceXML being among them.

Browsers have implemented ways for users to create PDFs of web pages. Unfortunately, support for the CSS Paged Media specification has not been implemented in the main browsers. The same is true for most ebook display solutions.

Additionally, the typesetting engines that support CSS Paged Media contain proprietary and mutually incompatible vendor extensions, which means that source files cannot easily be moved between engines.

With the Vivliostyle project we prioritize advancing the development of web standards, so that Vivliostyle.js will be interoperable with other current and future web-based print solutions.

We have started to work with the World Wide Web Consortium (W3C) to enhance and promote CSS Paged Media and related specifications such as CSS Page Floats [CSSPageFloats] and CSS Generated Content for Paged Media [CSSGeneratedContentForPagedMedia].

Things needed: JavaScript-based general print layout implementations

The two JavaScript book printing solutions have inspired us, yet we believe that a more general JavaScript-based solution for print layout is needed, one that uses CSS rules to define styles. Unlike the existing solutions, it should work in all major browsers and read the associated CSS to define page layout, so that layout can be defined entirely in CSS. Such a solution should be usable both to prepare web content for print and to use the browser as an ebook reader.
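
At the core of any such solution lies a step that distributes content over pages. The following is a deliberately simplified sketch of that step, not a description of how Vivliostyle.js or Pagination.js works: it assumes the height of each content block is already known (in a browser it would have to be measured), never splits a block across pages, and uses the hypothetical function name `paginate`.

```javascript
// Assign blocks to pages greedily: start a new page whenever the next block
// would overflow the space left on the current page. Heights use arbitrary
// but consistent units. A block taller than the page gets a page of its own.
function paginate(blockHeights, pageHeight) {
  const pages = [];
  let current = []; // indexes of the blocks placed on the current page
  let used = 0;     // height already consumed on the current page
  blockHeights.forEach((height, index) => {
    if (current.length > 0 && used + height > pageHeight) {
      pages.push(current);
      current = [];
      used = 0;
    }
    current.push(index);
    used += height;
  });
  if (current.length > 0) {
    pages.push(current);
  }
  return pages;
}

// Five blocks laid out on pages with 100 units of usable height.
console.log(paginate([40, 50, 30, 100, 20], 100));
```

A general print layout engine additionally has to break paragraphs between lines, place footnotes and floats, and honour CSS break properties, which is where most of the real complexity lies.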

Vivliostyle.js has been under development for half a year [2] and continues to be developed. It parses page-related CSS properties that regular browsers ignore. So far it is able to do basic page styling, including footnotes, page numbering, floats and page headers.
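
To illustrate what reading page-related CSS from a stylesheet involves, the sketch below collects the declarations of simple `@page` rules from a stylesheet string. It is an assumption-laden toy and not the Vivliostyle.js parser: the function name `extractPageRules` is hypothetical, and nested margin boxes (such as `@top-center`) and page selectors (such as `:left`) are not handled, since they require a real CSS parser.

```javascript
// Sample stylesheet; the @page rule uses CSS Paged Media syntax.
const stylesheet = `
body { font-family: serif; }
@page { size: A5; margin: 20mm; }
p { text-indent: 1em; }
`;

// Collect the declarations of simple, non-nested @page rules as plain objects.
function extractPageRules(css) {
  const rules = [];
  const rulePattern = /@page\s*\{([^{}]*)\}/g;
  for (const match of css.matchAll(rulePattern)) {
    const declarations = {};
    for (const part of match[1].split(";")) {
      const colon = part.indexOf(":");
      if (colon === -1) continue; // skip empty fragments between semicolons
      const property = part.slice(0, colon).trim();
      const value = part.slice(colon + 1).trim();
      if (property) {
        declarations[property] = value;
      }
    }
    rules.push(declarations);
  }
  return rules;
}

console.log(extractPageRules(stylesheet));
```

The extracted values could then drive a layout step in JavaScript, so that page size and margins are defined in CSS rather than in script arguments.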

Figure 1: Vivliostyle.js

Vivliostyle.js already provides some of the most common page-layout features including footnotes, page numbers, floats and page headers.

Implementing in JavaScript new features that are in the process of being standardized in W3C specifications fits well with the Extensible Web Manifesto, a document signed by a number of leading web visionaries who are trying to speed up the development of new features on the web. By trying out whether and how things work in JavaScript, we can help the relevant specifications move forward. If successful, this will contribute to the finalization of print-related CSS specifications and to the implementation of some print-related features in browsers.

Conclusion

The world of text publishing is fairly fragmented. Several approaches have been used in the past to try to unify its workflows, and XML was for a long time the most promising. In a publishing world in which more publishers seek alternatives to labor-intensive DTP-based workflows, print solutions need to be developed that convert content automatically, with as few steps as possible from input to final output formats. As we have argued, there is a strong case that a combination of CSS and HTML, despite their inherent shortcomings, represents the most promising path to a unified publishing workflow. However, for HTML and CSS to become a viable alternative for more publishers, time will need to be invested in the development of CSS standards and of early implementations in JavaScript.

References

[Berjon2014] Berjon, Robin. Mending Fences and Saving Babies. Presented at Symposium on HTML5 and XML, Washington, DC, August 4, 2014. In Proceedings of the Symposium on HTML5 and XML. Balisage Series on Markup Technologies, vol. 14 (2014). doi:10.4242/BalisageVol14.Berjon01.

[Bos2013] Bos, Bert. Can you typeset a book with CSS? Presented at 2nd W3C Workshop on Electronic Books and the Open Web Platform, Tokyo, Japan, June 4, 2013. http://www.w3.org/Talks/2013/0604-CSS-Tokyo/.

[Cramer2012] Cramer, Dave. Production as if Digital Mattered: Making books with HTML and CSS. New York, 2012. http://infogridpacific.typepad.com/files/idpf-2012-cramer-smaller.pdf.

[Graham2014] Graham, Tony. Formatting from XML. XML Prague, Prague, Czech Republic, February 14–16, 2014. In XML Prague 2014 Conference Proceedings. http://archive.xmlprague.cz/2014/files/xmlprague-2014-proceedings.pdf.

[Kelly2014] Kelly, Mike. XSL-FO Is Dead, CSS Paged Media Is Prime Suspect. [online]. Rockweb. June 4, 2014. [cited 13 Apr 2015]. http://www.rockweb.co.uk/blog/2014/06/xsl-fo-is-dead,-css-paged-media-is-prime-suspect/.

[Kleinfeld2013] Kleinfeld, Sanders. The Case for Authoring and Producing Books in (X)HTML5. Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6–9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:10.4242/BalisageVol10.Kleinfeld01.

[McKesson2012] McKesson, Nellie. Building Books with CSS3. [online]. A List Apart. June 12, 2012. [cited 13 Apr 2015]. http://alistapart.com/article/building-books-with-css3.

[McKesson2013] McKesson, Nellie. Publisher Case Study: O'Reilly Media. IDPF Seminar: EPUB and the Open Web Platform for Publishers, Noida, India, November 30, 2013. http://idpf.org/sites/default/files/file_attach/oreilly-case-study.pdf.

[Simpson2000] Simpson, John E. Will XML replace HTML? [online]. Xml.com. December 13, 2000. [cited 13 Apr 2015]. http://www.xml.com/pub/a/2000/12/13/xmlhtml.html.

[CSSPagedMedia] http://dev.w3.org/csswg/css-page-3/

[CSSPageFloats] http://dev.w3.org/csswg/css-page-floats/

[CSSGeneratedContentForPagedMedia] http://dev.w3.org/csswg/css-gcpm/, http://dev.w3.org/csswg/css-gcpm-4/



[1] Pagination.js was initially known as BookJS.

[2] Vivliostyle.js is being developed by Toru Kawakubo who built it on top of Peter Sorotokin's EPUB Adaptive Layout JavaScript-based implementation.

Author's keywords for this paper: HTML; JavaScript; CSS

Shinyu Murakami

Vivliostyle Inc.

Shinyu Murakami is the founder of Vivliostyle Inc. Previously, he was the lead developer of the Antenna House Formatter. He started the Vivliostyle project with the economic support of Antenna House.

Johannes Wilm

Vivliostyle Inc.

Johannes Wilm has developed Pagination.js and simplePagination.js and is now working with Vivliostyle. He has been working on a range of LaTeX and HTML-based text layout and editing solutions for academic texts in the social sciences and humanities since the early 2000s. Wilm holds a PhD in anthropology from Goldsmiths College, University of London.