How to cite this paper

Graham, Tony. “Call me Pastichemael: Recreating the Moby-Dick first edition.” Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Graham01.

Balisage: The Markup Conference 2021
August 2 - 6, 2021

Balisage Paper: Call me Pastichemael

Recreating the Moby-Dick first edition

Tony Graham

Antenna House, Inc.

Tony Graham is a Senior Architect with Antenna House, where he works on their XSL-FO and CSS formatter, cloud-based authoring solution, and related products. He also provides XSL-FO and XSLT consulting and training services on behalf of Antenna House.

Tony has been working with markup since 1991, with XML since 1996, and with XSLT/XSL-FO since 1998. He is Chair of the Print and Page Layout Community Group at the W3C and was previously an invited expert on the W3C XML Print and Page Layout Working Group (XPPL) defining the XSL-FO specification, as well as an acknowledged expert in XSLT. Tony is the developer of the ‘stf’ Schematron testing framework and also Antenna House’s ‘focheck’ XSL-FO validation tool, a committer to both the XSpec and Juxy XSLT testing frameworks, the author of “Unicode: A Primer”, and a qualified trainer.

Tony’s career in XML and SGML spans Japan, USA, UK, and Ireland. Before joining Antenna House, he had previously been an independent consultant, a Staff Engineer with Sun Microsystems, a Senior Consultant with Mulberry Technologies, and a Document Analyst with Uniscope. He has worked with data in English, Chinese, Japanese, and Korean, and with academic, automotive, publishing, software, and telecommunications applications. He has also spoken about XML, XSLT, XSL-FO, EPUB, and related technologies to clients and conferences in North America, Europe, Japan, and Australia.

©2021 Antenna House, Inc.

Abstract

Moby-Dick by Herman Melville is frequently used as the example document for EPUB and CSS applications. At around 670 pages, it is also a good choice for demonstrating the automated analysis features of AH Formatter. This presentation describes features of working with – and sometimes augmenting, sometimes correcting – the TEI source for the American first edition of Moby-Dick to create a PDF version in the style of the 1851 original.

Table of Contents

Introduction
Successive Approximations
Styling from Page Images
Front Matter
Title page
Book title
Contents
‘Etymology’ and ‘Extracts’
Body
Chapter separator
Footnotes
Duplicate footnotes
Footnote size
Block
Widows and orphans
Hyphen at end of page
Text
Italics and small-caps
‘Curly’ quotes
Consecutive em dashes
Baseline grid
Headers and Footers
Conclusion
Acknowledgments

Introduction

This paper describes aspects of the stylesheets that were developed to format the first American edition of Moby-Dick by Herman Melville. The stylesheets illustrate one way to approach developing a stylesheet for XSL-FO, and they also illustrate how to use some AH Formatter extensions. The stylesheets were developed for a project to demonstrate how to use the Automated Analysis feature 6 of AH Formatter V7.1 7.

AH Formatter V7.1 is able to automatically detect a range of typographic problems in a formatted document. Solving these problems usually requires editorial or stylistic changes, and sometimes both. Automated analysis of formatting problems is most useful with longer documents. With shorter documents, the user might decide they can find all of the problems just by looking at the few pages.

The first American edition of Moby-Dick was chosen because:

  • Moby-Dick is frequently used as a sample document for EPUB and CSS examples.

  • At around 670 pages when formatted, it is obvious that automated analysis will be both quicker and more consistent than visually inspecting each page.

  • The book is out of copyright.

  • The text is freely available in XML.

  • Scans of the original pages are available on the web. 1 2 3

The source for Moby-Dick 3 is TEI-encoded XML 4 from the Wright American Fiction project 5. Moby-Dick is also available as a Project Gutenberg eBook 11, but the tagging in that version lacks sufficient detail.

Because this was the testbed for the automated analysis feature, the initial emphasis was on getting the text block of the body pages correct. The styles for everything outside the text block – headers and footers, the front-matter, and the advertisements at the back of the first edition – were initially developed as a rough approximation of the formatting used in the first edition. Over time, the styles have been refined to more accurately mimic the printed first edition.

Successive Approximations

Developing a stylesheet for formatting with either XSL-FO or CSS is usually a process of making successive approximations of the final result. This is true whether the look of the document is being developed on the fly, developed according to a design brief, or developed to match an existing document, as with Moby-Dick.

The first draft of a stylesheet will likely produce only a rough approximation of the final result. If you are developing on the fly, then you haven’t made up your mind about the final look at that point anyway. If you are developing according to a design brief, then the first version that you format is likely to have the correct page size and the correct fonts and font sizes for major titles and paragraphs but may omit more context-specific styles such as for the table of contents, index, tables, nested lists, and so on. It is similar for developing styles to match an existing document.

That is usually followed by a sequence of making and reviewing changes to bring the styles closer to the final result. This is true, of course, when you are developing on the fly, because the final result isn’t known until you say that you have the result that you want. It is also true for both developing according to a design brief and developing to match an existing document, because there are additional contexts that you know you have not handled yet and, quite likely, more contexts that neither you nor the designer had anticipated. These might include nesting lists of different types or handling figures or tables immediately after a title or, for Moby-Dick, handling Queequeg’s mark or stage directions and songs.

Successive changes should, of course, bring you closer to the final result. In reality, some changes will have to be redone, and some changes will throw up new problems, but the overall movement is to close in on the final result.

Styling from Page Images

The initial styles for the pages – particularly for the front-matter – were refined by setting a photograph of a page from the first edition as the background image for the corresponding page and adjusting the XSL-FO to match. The following image shows the formatted title page with the photograph of the first edition’s title page as the page background opened in the AH Formatter GUI.

The sequence of steps to adjust the styles to match a page scan used as a background image is:

  1. Modify a copy of the XSL-FO to add axf:bleed and axf:crop-offset properties to each fo:simple-page-master that will have a background image. For example:

    <fo:simple-page-master master-name="First-PageMaster"
                           page-height="7.375in"
                           page-width="4.78in"
                           axf:bleed="0.5in"
                           axf:crop-offset="0.5in">

  2. If necessary, rotate the page image so that the text is as horizontal as possible.

    The first edition is now 170 years old, and the available page images are photographs of pages in the bound book, rather than scans of individual pages. As a result, the lines of text in the images are not always perfectly parallel, either because of the condition of the page or because of the curve of the paper when the page was photographed. The following image shows the variation that can occur: the red lines are parallel, but the lines of text are not.

  3. Specify the page image, scaled and positioned to match the formatted page, as the background image either on the fo:simple-page-master:

    <fo:simple-page-master
        master-name="First-PageMaster"
        page-height="7.375in"
        page-width="4.875in"
        background-image="page-images/MD_Amer_0038.jpg"
        axf:background-size="5.21in"
        background-position="-0.12in -0.15in"
        axf:bleed="0.5in"
        axf:crop-offset="0.5in">

    or on the fo:page-sequence that generates the page:

    <fo:page-sequence
        master-reference="CoverFrontMaster"
        background-image="page-images/MD_Amer_0019.jpg"
        axf:background-size="5.7in"
        background-position="-0.7in -0.3in">

    Because the page images for the first edition are photographs, there was considerable variation in the size and position of the page within each image. Getting the correct size and position was an iterative process of modifying the XSL-FO and viewing the result in the AH Formatter GUI, then repeating the process until the result was satisfactory. Enabling ‘Show Borders’ in the AH Formatter GUI makes it easier to judge how to adjust the background image.

  4. Iteratively modify the XSL-FO then view it in the AH Formatter GUI until the formatted document satisfactorily matches the page from the first edition.

  5. Modify the stylesheets for generating the XSL-FO to recreate the FOs and properties that were arrived at manually.

The result can be quite a close approximation of the original:

The different parts of the front matter of the first edition show considerable variation in fonts, font sizes, and letter- and word-spacing. That, combined with the necessarily imprecise size and position of the background images, has resulted in a range of values for the same properties applied at different places on different pages. When time permits, it should be possible to rationalize these and use fewer, more consistent values and still reproduce the first edition pages with sufficient accuracy. After all, the first edition was printed with a fixed set of founts and with fixed increments of the space that could be added between letters. Font sizes, etc., were unlikely to have been specified in points in America in 1851, but the sizes would have been internally consistent.

Front Matter

The front matter of Moby-Dick comprises:

  • Title page

  • Copyright page

  • Dedication

  • Contents

  • Fly title

  • Etymology

  • Extracts

Title page

As shown previously, it is possible to reproduce the title page fairly accurately.[1]

Book title

The markup for the book’s title does not include sufficient information to accurately reproduce the formatted title:

<docTitle>
  <titlePart>MOBY-DICK;</titlePart>
  <titlePart type="sub">OR, THE WHALE.</titlePart>
</docTitle>

In addition, the book’s title is formatted identically on the fly title page, but the markup there corresponds even less closely to the formatting:

<div type="fly_title">
  <head>MOBY-DICK; OR, THE WHALE.</head>
</div>

Because the stylesheet is specific to Moby-Dick, it was simpler to ignore the markup and to use xsl:analyze-string and generate FOs around parts of the title text:

<xsl:template match="docTitle | div[@type = 'fly_title']/head"
              priority="5">
  <fo:block
      font-size="24pt"
      letter-spacing="0.37em"
      line-height="1"
      text-align="center"
      font-stretch="extra-condensed">
    <xsl:analyze-string
        select="normalize-space(.)"
        regex="OR,">
      <xsl:matching-substring>
        <fo:block
            font-size="8pt" font-variant="all-small-caps"
            font-stretch="normal"
            letter-spacing="0.125em" space-before="30pt">
          <xsl:value-of select="." />
        </fo:block>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <fo:block axf:letter-spacing-side="start">
          <xsl:if test="contains(., 'THE WHALE.')">
            <xsl:attribute name="space-before" select="'30pt'" />
            <xsl:attribute name="letter-spacing" select="'0.9em'" />
          </xsl:if>
          <xsl:analyze-string
              select="."
              regex="\.| ">
            <xsl:matching-substring>
              <fo:inline letter-spacing="0.3em">
                <xsl:value-of select="." />
              </fo:inline>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="." />
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </fo:block>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </fo:block>
</xsl:template>

The document, with a few minor exceptions, is formatted entirely in Source Serif Pro. The font is both open source and a reasonable match for the font that was used for paragraphs in the first edition. However, in the first edition, the title page (and some other titles) uses both narrow and small-capital variants of the text font. Source Serif Pro does not have a narrow version, so the narrow variants are achieved by setting the font-stretch property (for example, font-stretch="extra-condensed") and relying on AH Formatter to adjust each character’s width. Source Serif Pro does have true small caps 8, but font-variant="small-caps" as defined in XSL 1.1 uses small caps only for lower-case letters. Because the small caps in the title are represented in the XML as capital letters, it is necessary to use the font-variant="all-small-caps" AH Formatter extension to format the capital letters in the source as small caps.

Many of the titles in the first edition use letter-spaced characters. Letter spacing is specified with the letter-spacing property, and values such as letter-spacing="0.37em" were arrived at through trial and error to match the appearance of the page image when it was used as the background. However, in the first edition, the letter spacing between an alphabetic character and a following punctuation character is sometimes less than the letter spacing between two alphabetic characters. The solution is to use xsl:analyze-string to generate an fo:inline with a different letter-spacing value around just those characters. A refinement, used elsewhere in the stylesheet, is to also use the axf:letter-spacing-side AH Formatter extension so that all of the added space is placed on the start side of each alphabetic character and so does not contribute to the space between an alphabetic character and a following punctuation character.

Contents

The Table of Contents is formatted in two columns. It is marked up as a list but is rendered as a four-column table to be able to recreate the formatting of the first edition.

The TEI for the Table of Contents begins:

<div type="contents">
  <pb n="v (Table of Contents) " xml:id="VAC7237-00000003"/>
  <head>CONTENTS.</head>
  <list>
    <item>I.—Loomings. <ref target="VAC7237-00000013" rend="right">1</ref>
    </item>
    <item>II.—The Carpet Bag. <ref target="VAC7237-00000017" rend="right">7</ref>
    </item>

The content of the ref elements is each chapter’s page number in the first edition. The target attribute, however, refers to pb milestone elements that mark the start of each two-page spread. At least one of the cross-references was found to point to the spread after the first page of its chapter and had to be corrected. More may yet be found.

The cross-references could not be used anyway because the XSL-FO version does not attempt to recreate the page breaks of the first edition. The cross-references from Table of Contents entries to chapters in the generated PDF are determined from the position of the list item for each chapter in the Table of Contents list:

<!-- Every chapter has a generated ID, and 'EPILOGUE.' is the only
     ToC entry without a page number. -->
<xsl:variable
    name="target"
    select="if (exists(ref))
              then concat('chapter-', position())
            else 'epilogue'"
    as="xs:string" />

The markup for each chapter begins <div type="chapter">, but generating the ID for each chapter could not use position() because some of the pb milestones appear between chapters:

<fo:block
    id="{@type}-{count(preceding::div[@type = current()/@type]) + 1}">

The Table of Contents is formatted as a four-column table to keep the different parts of the Table of Contents entries aligned:

  • Chapter number (in small-caps roman numerals)

  • Em-dash

  • Chapter title and leader dots

    The alignment and spacing of the leader dots is simple with XSL-FO:

    <fo:leader leader-pattern="dots"
               leader-pattern-width="1em"
               leader-alignment="end" />

  • Page number
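
As a rough sketch of how the four parts could map to table cells, the stylesheet below generates one fo:table-row per list item. It is an illustration only, not the templates used for the project: the column widths, the splitting of each entry at its first em dash, and the handling of an entry without a number are all assumptions, and it relies on the items being selected on their own so that position() gives the entry number.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    exclude-result-prefixes="xs">

  <!-- Hypothetical column widths; tune against the page image. -->
  <xsl:template match="div[@type = 'contents']/list">
    <fo:table table-layout="fixed" width="100%">
      <fo:table-column column-width="2.5em" />
      <fo:table-column column-width="1em" />
      <fo:table-column column-width="proportional-column-width(1)" />
      <fo:table-column column-width="2em" />
      <fo:table-body>
        <!-- Selecting only the items makes position() the entry number. -->
        <xsl:apply-templates select="item" />
      </fo:table-body>
    </fo:table>
  </xsl:template>

  <xsl:template match="div[@type = 'contents']/list/item">
    <xsl:variable name="target" as="xs:string"
        select="if (exists(ref))
                  then concat('chapter-', position())
                else 'epilogue'" />
    <!-- Entry text without the ref: number, em dash (U+2014), title. -->
    <xsl:variable name="label" as="xs:string"
        select="normalize-space(string-join(text(), ''))" />
    <xsl:variable name="has-number" as="xs:boolean"
        select="contains($label, '&#x2014;')" />
    <fo:table-row>
      <!-- Column 1: chapter number. Empty when there is no em dash. -->
      <fo:table-cell>
        <fo:block text-align="end">
          <xsl:value-of select="substring-before($label, '&#x2014;')" />
        </fo:block>
      </fo:table-cell>
      <!-- Column 2: the em dash itself. -->
      <fo:table-cell>
        <fo:block>
          <xsl:if test="$has-number">&#x2014;</xsl:if>
        </fo:block>
      </fo:table-cell>
      <!-- Column 3: linked title, then leader dots up to the cell edge. -->
      <fo:table-cell>
        <fo:block text-align-last="justify">
          <fo:basic-link internal-destination="{$target}">
            <xsl:value-of
                select="if ($has-number)
                          then substring-after($label, '&#x2014;')
                        else $label" />
          </fo:basic-link>
          <xsl:if test="exists(ref)">
            <fo:leader leader-pattern="dots"
                       leader-pattern-width="1em"
                       leader-alignment="end" />
          </xsl:if>
        </fo:block>
      </fo:table-cell>
      <!-- Column 4: page number in the formatted PDF, not the 1851 one. -->
      <fo:table-cell>
        <fo:block text-align="end">
          <xsl:if test="exists(ref)">
            <fo:page-number-citation ref-id="{$target}" />
          </xsl:if>
        </fo:block>
      </fo:table-cell>
    </fo:table-row>
  </xsl:template>

</xsl:stylesheet>
```

Using fo:page-number-citation rather than the content of the ref keeps the page numbers correct for the formatted document, whose page breaks differ from those of the first edition.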

‘Etymology’ and ‘Extracts’

The ‘Etymology’ and ‘Extracts’ segments each consist of an introductory narrative page followed by quotes and, in ‘Etymology’, a table. The fonts used for the titles in the first edition are not consistent, so they each needed a separate template.

In both ‘Etymology’ and ‘Extracts’, each quote has an attribution, and each attribution is marked up as following the quoted material:

<cit>
  <q>
    <p>"Very like a whale."</p>
  </q>
  <bibl>
    <title>Hamlet</title>.</bibl>
</cit>

However, if there is enough space on the last line, the attribution is formatted in the same fo:block as the quotation:

If there is not enough space, because either the last line or the attribution is too long, then the attribution is formatted on the next line.

Placing the attribution in the same block is handled by the common XSLT pattern of not formatting the bibl as part of the default processing and instead explicitly selecting its content when processing the q:

<xsl:template match="q/p">
  <fo:block>
    <xsl:apply-templates />
    <xsl:if test="position() = last() and
                  exists(../following-sibling::*[1][self::bibl])">
      <fo:leader leader-pattern="space"/>
      <fo:leader leader-pattern="space" leader-length.optimum="100%"/>
      <fo:inline-container padding-left="2em" padding-right="0.125in"
                           max-width="80%" text-indent="0">
        <fo:block text-align="right">
          <xsl:apply-templates
              select="../following-sibling::*[1]/node()" />
        </fo:block>
      </fo:inline-container>
    </xsl:if>
  </fo:block>
</xsl:template>

<xsl:template
    match="bibl[exists(preceding-sibling::*[1][self::q[p]])]"
    priority="5" />

Placing the attribution either on the last line of the quotation or on the next line is handled by the common XSL-FO pattern of using two fo:leader elements.

Body

The majority of Moby-Dick is 135 chapters consisting largely of text. Melville scholars like to find patterns in the structure of the chapters 9, but when formatting Moby-Dick, the most useful distinction is between paragraph-like blocks of text and other content.

The non-paragraph content includes:

  • A single graphic (for Queequeg’s mark)

  • Inscriptions from tombstones

  • Songs and poems

  • Speeches and stage directions as if for a play

Chapter separator

When a chapter in the first edition ends near the bottom of a page, the next chapter begins on the following page with space before the chapter title:

When a chapter does not end near the bottom of a page, there is an additional separator printed before the chapter title. To complicate matters, the space between the separator and the chapter title is less than the space before a chapter that starts on a new page:

When a chapter ends with some space left at the bottom of a page but not enough space for the separator and the chapter title, the separator is printed at the end of the chapter and the next chapter starts on the next page:

When the first edition was composed manually, it would have been straightforward to add the separator when and where it was needed. It is not quite as straightforward with automated, ‘lights-out’ formatting using XSL-FO. Because the page breaks are not known before the document is formatted, it is not possible to just insert as many separators as needed, and the XSL 1.1 Recommendation does not support conditional processing based on an area’s position on the page.

Two things make this possible with AH Formatter: firstly, the axf:suppress-if-first-on-page extension property makes AH Formatter suppress the separator for a chapter title at the top of a page; and, secondly, the standard space-after.precedence="force" on the fo:block for the separator ensures the correct distance between the separator and the chapter title when the separator is present while allowing the different space-before value on the chapter title to apply when the separator has been suppressed or is on the previous page.

<xsl:template
    match="div[@type = 'chapter'][exists(head[@type = 'sub'] |
           fw[@type = 'head'])]">
  <xsl:if
      test="exists(preceding-sibling::div[@type = current()/@type])">
    <fo:block axf:suppress-if-first-on-page="true" text-align="center"
              padding-top="0.125in"
              space-after="0.2in" space-after.precedence="force"
              axf:baseline-grid="none"
              axf:baseline-block-snap="none">
      <fo:external-graphic src="images/separator.svg" />
    </fo:block>
  </xsl:if>
  <fo:block
      id="{@type}-{count(preceding::div[@type = current()/@type]) + 1}">
    <fo:marker marker-class-name="Chapter-Title">
      <xsl:apply-templates
          select="(fw[@type = 'head'], head[@type = 'sub'])[1]/node()"
          mode="marker" />
    </fo:marker>
    <fo:block-container
        axf:baseline-grid="none"
        axf:baseline-block-snap="none"
        keep-together.within-page="always"
        keep-with-next.within-page="always"
        space-before="{if (exists(preceding::div[1]
                                                [@type = 'chapter']))
                         then '0.5in'
                       else '0.72in'}"
        space-before.conditionality="retain">
      <xsl:apply-templates select="head" />
    </fo:block-container>
    <xsl:apply-templates select="* except head" />
  </fo:block>
</xsl:template>

Footnotes

Footnotes are marked up as a ref containing the footnote marker that refers to the separate note containing the footnote content:

<p>
  <emph>Whaling not respectable?</emph> Whaling is imperial! By old
  English statutory law, the whale is declared "a royal fish."<ref
  rend="super" target="#note_001" xml:id="return_001">*</ref>
  <note place="foot" xml:id="note_001">
    <p><ref target="#return_001">*</ref>See subsequent chapters for
    something more on this head.</p>
  </note>
</p>

The XSL-FO fo:footnote contains both an fo:inline for the footnote marker and an fo:footnote-body for the footnote content, so the XSLT stylesheet does not process the note where it occurs in the document but instead formats the content of the note by using key() to find the note that is referred to by each ref:

<xsl:template match="note[@place = 'foot']" />

<xsl:template match="ref[exists(key('footnote',
                                     substring-after(@target, '#')))]"
              priority="5">
  <fo:footnote
      id="{@xml:id}"
      axf:suppress-duplicate-footnote="true">
    <fo:inline>
      <fo:basic-link
          internal-destination="{substring-after(@target, '#')}">
        <xsl:value-of select="." />
      </fo:basic-link>
    </fo:inline>
    <fo:footnote-body
        id="{substring-after(@target, '#')}"
        font-size="7pt"
        line-height="10pt">
      <xsl:apply-templates
          select="key('footnote',
                      substring-after(@target, '#'))/node()" />
    </fo:footnote-body>
  </fo:footnote>
</xsl:template>
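
The templates above rely on an xsl:key named 'footnote' whose declaration is not shown. A minimal sketch of such a declaration follows; it assumes that each note is indexed by its xml:id, so that the part of @target after the '#' can be used directly as the lookup value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed declaration, not taken from the original stylesheet:
       index each foot-of-page note by its xml:id. -->
  <xsl:key name="footnote"
           match="note[@place = 'foot']"
           use="@xml:id" />

</xsl:stylesheet>
```

With a key like this, key('footnote', 'note_001') would return the note whose xml:id is note_001, wherever it appears in the document.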

Duplicate footnotes

One page of the first edition has two references to the same footnote:

The TEI XML repeats the footnote text:

<p>
  <emph>Whaling not respectable?</emph> Whaling is imperial! By old
  English statutory law, the whale is declared "a royal fish."<ref
  rend="super" target="#note_001" xml:id="return_001">*</ref>
  <note place="foot" xml:id="note_001">
    <p><ref target="#return_001">*</ref>See subsequent chapters for
    something more on this head.</p>
  </note>
</p>
...
<p>
  <emph>The whale never figured in any grand imposing way?</emph> ...
  cymballed procession. <ref rend="super"
  target="#note_002" xml:id="return_002">*</ref>
</p>
<note place="foot" xml:id="note_002">
  <p><ref target="#return_002">*</ref>See subsequent chapters for
  something more on this head.</p>
</note>

XSL 1.1 would render both footnotes, but the axf:suppress-duplicate-footnote extension property causes AH Formatter to generate only one copy of the footnote when both footnotes occur on the same page.

Footnote size

Moby-Dick also includes some whale-size footnotes:

Some things that could have been done were not needed:

  • In the first edition, these two footnotes start on the same page, and it is the second footnote that continues onto a second page. Even so, both footnotes have the same ‘*’ footnote marker in the first edition.

    Because the markers in the first edition are all the same, it is not necessary to use the axf:footnote-number and axf:footnote-number-citation extension elements to generate and use a sequence of footnote markers.

  • It is possible to limit the height of the footnotes using axf:footnote-max-height, but the height of the formatted footnotes is comparable to the height in the first edition, so this also was not necessary.

Block

Widows and orphans

An orphan is too few lines before a page break, and a widow is too few lines after a page break.

The first edition has multiple single-line orphans.

However, the only single lines at the top of a page are single-line dialogue. It is impossible to say how many two-line widows were deliberately forced. For example, page 610 ends with widely-spaced text, and page 611 begins with the last two lines of the paragraph:

Some of the wide spacing is due to the white-space before ‘?’ and ‘!’ in the first edition, but compare the first edition with the fewer lines when the paragraph is formatted:

Similarly, the paragraph on pages 371–373 in the first edition is 26 lines, but is 25 lines when formatted on one page by AH Formatter. The first four formatted lines are identical to the first edition, but then they diverge.

The formatted version uses the XSL-FO 1.1 defaults of orphans="2" and widows="2".

Hyphen at end of page

The first edition has multiple pages that end with a hyphen:

The formatted version specifies hyphenation-keep="page" on fo:root so that words are not hyphenated across a page break. The hyphenation-keep-mode setting in the Option Setting File is not overridden, so AH Formatter pushes only the otherwise-hyphenated word to the next page, not the entire last line.
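
As a sketch of where these settings sit, the minimal FO document below specifies the inherited properties on fo:root. The orphans and widows values simply restate the XSL 1.1 defaults. The page master name, margin, and sample text are placeholders, and hyphenate="true" with language="en" is an assumption, since hyphenation-keep has an effect only when hyphenation is enabled.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
         hyphenate="true" language="en"
         hyphenation-keep="page"
         orphans="2" widows="2">
  <!-- All five properties above are inherited, so every fo:block in
       the document picks them up unless it overrides them. -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="PageMaster"
                           page-height="7.375in"
                           page-width="4.875in"
                           margin="0.5in">
      <fo:region-body />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="PageMaster">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Call me Ishmael.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```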

Text

Italics and small-caps

The markup for text in italics and in small-caps needed to be corrected for proper formatting. In the original TEI XML, italic text was marked by an empty <hi rend="i"/> element at the start of the italic text, but there was no indication of where the italic text ended. It might be argued that not enclosing the italic text makes textual analysis easier, but foreign words (or words thought to be foreign) were marked up with a start and an end tag, for example: <foreign xml:lang="LAT">Folio</foreign>.

Text in small-caps in the first edition was included in the TEI XML as capital letters without any extra markup. It was necessary to find the text that should be small-caps, add markup, and change the text to mixed-case. For example, ‘THE’ at the start of a chapter becomes <hi rend="small-caps">The</hi>.
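
Once the markup has been corrected, the formatting itself can be a simple mapping. The following is a minimal sketch rather than the project's actual templates. It assumes that the corrected italic markup encloses its text in a hi element with rend="i", which the excerpts here do not show, and it uses the standard font-variant="small-caps" because the corrected small-caps text is mixed-case.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumed form of the corrected italic markup: hi now has content. -->
  <xsl:template match="hi[@rend = 'i']">
    <fo:inline font-style="italic">
      <xsl:apply-templates />
    </fo:inline>
  </xsl:template>

  <!-- Mixed-case content, so only the lower-case letters are rendered
       as small caps and the initial capital stays full size. -->
  <xsl:template match="hi[@rend = 'small-caps']">
    <fo:inline font-variant="small-caps">
      <xsl:apply-templates />
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>
```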

‘Curly’ quotes

Moby-Dick makes extensive use of both single- and double-quotes. This includes apostrophes replacing letters in broken English for speech from non-native speakers of English. In the first edition, the left and right quotes are visibly different:

In the TEI source XML, however, all quotes are the same:

<p>"Do you is all sharks, and by natur wery woracious, yet I zay to
you, fellow-critters, dat dat woraciousness—'top dat dam slappin' ob
de tail! How you tink to hear, 'spose you keep up such a dam slappin'
and bitin' dare?"</p>

Converting the straight quotes to ‘curly’ quotes initially seemed straightforward, but it was made complicated by quotes before emphasized text and the difference between left single quotes at the start of quoted text and right single quotes at the start of a word to indicate a dropped letter.

<!-- Convert single and double quotes to 'curly' quotes. -->
<xsl:template match="text()" name="ahf:text">
  <xsl:param name="text" select="." as="text()" />

  <xsl:value-of select="ahf:text($text)" />
</xsl:template>

<xsl:function name="ahf:text" as="xs:string">
  <xsl:param name="text" as="text()" />

  <!-- The replacement that depends on the current node must be
       first. -->
  <xsl:variable
      name="text"
      select="if (matches($text, '&quot;$') and
                  empty($text/following-sibling::node()))
                then replace($text, '&quot;$', '&rdquo;')
              else $text"
      as="xs:string" />
  <!-- Moby-Dick uses broken English for speech from non-native
       speakers of English.  The speech can include words with the
       dropped initial vowel indicated by a right single-quote.
       Handle those before replacing any &apos; with left
       single-quotes. -->
  <xsl:variable
      name="text"
      select="replace($text, '''(s?t?(&quot;|\s|[.,;:]|(balmed|dention|em|gainst|ll|mong|parm|quid|specially|spose|stead|teak|till|[Tt]is|[Tt]was)(,|\.|\s)|$))', '&rsquo;$1')"
      as="xs:string" />
  <xsl:variable
      name="text"
      select="replace($text, '(^|\s|&quot;|—)''([^&quot;]|$)', '$1&lsquo;$2')"
      as="xs:string" />
  <xsl:variable
      name="text"
      select="replace($text, '(^|—|\s)&quot;', '$1&ldquo;')"
      as="xs:string" />
  <xsl:variable
      name="text"
      select="replace($text, '&quot;(\s|[—.,;:]|$)', '&rdquo;$1')"
      as="xs:string" />
  <xsl:variable
      name="text"
      select="replace($text, '([^\s])''([^\s])', '$1&rsquo;$2')"
      as="xs:string" />

  <!-- Variations on '* * *' in 'Extracts'. -->
  <xsl:variable
      name="text"
      select="replace($text, ' \*', '&#xA0;&#xA0;*')"
      as="xs:string" />
  <xsl:variable
      name="text"
      select="replace($text, '\* ', '*&#xA0;&#xA0;')"
      as="xs:string" />

  <xsl:sequence select="$text" />
</xsl:function>
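
The replacement strings use named character entities such as &rdquo; that are not predefined in XML, so the stylesheet presumably declares them somewhere. A minimal, self-contained sketch follows. It assumes declarations in the internal DTD subset and a hypothetical initial template named main (which could be run, for example, with Saxon's -it:main option), and it applies just the two double-quote replacements to a literal string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
  <!-- Assumed declarations for the entities used in the regexes. -->
  <!ENTITY lsquo "&#x2018;">
  <!ENTITY rsquo "&#x2019;">
  <!ENTITY ldquo "&#x201C;">
  <!ENTITY rdquo "&#x201D;">
  <!ENTITY mdash "&#x2014;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical entry point for trying the replacements. -->
  <xsl:template name="main">
    <xsl:variable
        name="text"
        select="'&quot;Very like a whale.&quot;'"
        as="xs:string" />
    <!-- Opening quote: a straight quote at the start of the string or
         after an em dash or white-space. -->
    <xsl:variable
        name="text"
        select="replace($text, '(^|&mdash;|\s)&quot;', '$1&ldquo;')"
        as="xs:string" />
    <!-- Closing quote: a straight quote before white-space, listed
         punctuation, or the end of the string. -->
    <xsl:variable
        name="text"
        select="replace($text, '&quot;(\s|[&mdash;.,;:]|$)', '&rdquo;$1')"
        as="xs:string" />
    <xsl:value-of select="$text" />
  </xsl:template>

</xsl:stylesheet>
```

For this input, the first replacement converts the leading quote and the second converts the trailing one, so the output is “Very like a whale.” with left and right double quotes.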

The ahf:text() XSLT function is also used in other contexts; for example:

<xsl:template match="div[@type = 'fly_title']/bibl">
  <fo:block text-align="center" hyphenate="false" font-size="5pt"
            line-height="10.5pt"
            space-before="2.33in" space-before.conditionality="retain">
    <!-- Provide structure that is not in the source XML. -->
    <xsl:analyze-string select="ahf:text(edition/text())"
                        regex="HERMAN MELVILLE,">
...

Consecutive em dashes

The first edition uses two or three consecutive em dashes as a typographic effect in multiple places, for example:

Most typography books that cover the em dash recommend a thin space before and after the dash. For example, Correct Composition 12 states:

As the dash entirely fills the body sideways, it should have before and after it a thin space to prevent the interference with adjoining characters.

Many digital fonts preserve the letterpress practice that the em dash completely fills its width. However, Source Serif Pro includes built-in white-space before and after the stroke. This is generally useful, but it looks bad when there are consecutive em dashes:

It looked for a time as if it would be necessary to wrap consecutive em dashes with <fo:wrapper font-family="serif"> to select a font whose em dashes would join up. However, a chance (re)discovery of the Unicode two-em dash (U+2E3A) and three-em dash (U+2E3B) characters provided a way to show the correct dashes without changing fonts. More steps were added to the text handling:

<xsl:variable
    name="text"
    select="replace($text, '&mdash;&mdash;&mdash;', '&#x2E3B;')"
    as="xs:string" />
<xsl:variable
    name="text"
    select="replace($text, '&mdash;&mdash;', '&#x2E3A;')"
    as="xs:string" />
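The order of these two steps matters: if the two-dash replacement ran first, it would consume the first two dashes of every run of three, and the three-dash pattern would never match. The following standalone XSLT 2.0 stylesheet is a minimal sketch of that behavior, not code from the Moby-Dick stylesheets. The ex:join-dashes function, its namespace, and the sample string are hypothetical, and numeric character references stand in for the &mdash; entity so that no DTD is needed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="urn:example:dashes"
    exclude-result-prefixes="xs ex">

  <xsl:output method="text" encoding="UTF-8" />

  <!-- Hypothetical helper: joins runs of em dashes (U+2014). -->
  <xsl:function name="ex:join-dashes" as="xs:string">
    <xsl:param name="text" as="xs:string" />
    <!-- Longest run first: three em dashes become U+2E3B. -->
    <xsl:variable
        name="three-joined"
        select="replace($text, '&#x2014;&#x2014;&#x2014;', '&#x2E3B;')"
        as="xs:string" />
    <!-- Any remaining pair of em dashes becomes U+2E3A. -->
    <xsl:sequence
        select="replace($three-joined, '&#x2014;&#x2014;', '&#x2E3A;')" />
  </xsl:function>

  <!-- Initial template, so that no source document is needed. -->
  <xsl:template name="main">
    <xsl:value-of
        select="ex:join-dashes('A&#x2014;&#x2014;&#x2014;B&#x2014;&#x2014;C&#x2014;D')" />
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

When run with main as the initial template (for example, with the -it:main option of Saxon), the output should be A, a three-em dash, B, a two-em dash, C, a single em dash, and D. A lone em dash is left unchanged because neither pattern matches it.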

It is now possible to use the correct dashes from the same font:

Baseline grid

‘Show-through’ occurs when text on the back of a page is visible through the paper. The shadow of the text on the back reduces the legibility of the text on the front. One way to reduce show-through (aside from using thicker paper or only reading the document electronically) is to align the lines of text on the front and back of the page.

This image from the first edition has some show-through, but it also shows that the lines on the front and back mostly line up and that they resume their alignment after the three irregular lines:

Keeping lines aligned front-and-back is straightforward when all of the text is the same font size and has the same line height. It becomes harder when the text includes titles, etc., that have different font sizes, line heights, and space before and after. It is often possible to style a title such that the space before the title, the line height of the title, and the space after the title add up to a multiple of the base line height. However, this will fail if some titles extend over two lines and the line height of the title is not a multiple of the base line height.
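The arithmetic behind that limitation can be sketched in XSLT 2.0. This is an illustration only: the ex:space-after-for-grid function, its namespace, and every size used below are assumptions, not values taken from the Moby-Dick stylesheets.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="urn:example:grid"
    exclude-result-prefixes="xs ex">

  <xsl:output method="text" />

  <!-- Hypothetical helper: the space-after (in points) that pads a
       title out to the next whole multiple of the base line height. -->
  <xsl:function name="ex:space-after-for-grid" as="xs:decimal">
    <xsl:param name="space-before" as="xs:decimal" />
    <xsl:param name="title-line-height" as="xs:decimal" />
    <xsl:param name="lines" as="xs:integer" />
    <xsl:param name="base-line-height" as="xs:decimal" />
    <!-- Height already used by the space before and the title lines. -->
    <xsl:variable
        name="used"
        select="$space-before + $title-line-height * $lines"
        as="xs:decimal" />
    <xsl:sequence
        select="ceiling($used div $base-line-height) * $base-line-height
                - $used" />
  </xsl:function>

  <xsl:template name="main">
    <!-- Assumed sizes: 10.5pt space before, 14pt title lines,
         10.5pt base line height. -->
    <xsl:text>One-line title: </xsl:text>
    <xsl:value-of select="ex:space-after-for-grid(10.5, 14, 1, 10.5)" />
    <xsl:text>pt&#xA;Two-line title: </xsl:text>
    <xsl:value-of select="ex:space-after-for-grid(10.5, 14, 2, 10.5)" />
    <xsl:text>pt&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With these assumed sizes, a one-line title needs 7pt of space after it (10.5 + 14 + 7 = 31.5, which is three base lines), but a two-line title needs 3.5pt (10.5 + 28 + 3.5 = 42, which is four base lines). No single fixed space-after value keeps both cases on the grid, which is the failure described above.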

The AH Formatter baseline grid extension can both align lines to a common baseline and allow lines in specific blocks to either align to their own grid or align to no grid at all. The red lines in the following figure highlight that lines in ordinary paragraphs are aligned to the baseline grid even after the three irregular lines and after a chapter number and title:

The first step is to specify the baseline grid using axf:baseline-grid:

<xsl:template match="body">
  <fo:page-sequence
      master-reference="PageMaster"
      writing-mode="from-page-master-region()"
      initial-page-number="1"
      axf:baseline-grid="root">
    <xsl:call-template name="PageMaster-static-content" />
    <fo:flow flow-name="xsl-region-body" hyphenate="true"
             text-align="justify">
      <xsl:apply-templates />
    </fo:flow>
  </fo:page-sequence>
</xsl:template>

The second step is for blocks that do not use the baseline grid to establish their own grid, also using axf:baseline-grid:

<xsl:template match="body//q">
  <fo:block text-align="center"
            text-indent="0"
            space-before="0.25lh"
            font-size="7pt"
            line-height="9pt"
            axf:baseline-block-snap="before margin-box"
            axf:baseline-grid="new">
    <xsl:apply-templates />
  </fo:block>
</xsl:template>

axf:baseline-block-snap specifies how a block aligns with the baseline grid, if any, of its parent block.

Headers and Footers

The headers and footers in the first edition, when present, are quite simple: just the page number and the chapter title. However, an abbreviated title is used for some chapters, even for chapters that do not have a long title. The TEI XML did not include the running header text, so any abbreviated titles were added as fw (“forme work”) elements [10] (see footnote [2]). For example:

<div type="chapter">
  <head>CHAPTER XXIX.</head>
  <head type="sub">ENTER AHAB; TO HIM, STUBB.</head>
  <fw type="head" place="top-centre">ENTER AHAB.</fw>

It is simple to choose the fw element, if present, in preference to the title text as the content of the fo:marker for the running header:

<fo:marker marker-class-name="Chapter-Title">
  <xsl:apply-templates
      select="(fw[@type = 'head'], head)[1]/node()"
      mode="marker" />
</fo:marker>
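The selection works because the XPath 2.0 comma operator keeps its operands in the order written rather than in document order, so the fw element comes first in the sequence whenever it exists. The following standalone stylesheet is a minimal sketch of the same idiom, not the paper's code. Falling back to head[@type = 'sub'], writing plain text instead of an fo:marker, and treating the elements as being in no namespace are all assumptions made for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <xsl:template match="div[@type = 'chapter']">
    <!-- The comma operator builds the sequence in operand order:
         fw is item 1 when present; otherwise the subtitle head is
         the only item, and [1] selects it. -->
    <xsl:value-of
        select="(fw[@type = 'head'], head[@type = 'sub'])[1]" />
    <xsl:text>&#xA;</xsl:text>
    <!-- By contrast, the union
         (fw[@type = 'head'] | head[@type = 'sub'])[1]
         returns nodes in document order, so it would select the
         head whenever the head precedes the fw element. -->
  </xsl:template>

  <!-- Suppress any text outside chapter divisions. -->
  <xsl:template match="text()" />
</xsl:stylesheet>
```

Applied to the Chapter XXIX fragment shown earlier (once its div element is closed), this sketch should print ENTER AHAB. as the running header text. For a chapter without an fw element, it should print the text of that chapter's subtitle head.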

The abbreviated title is ordinarily centered in the header:

However, even the abbreviated title can be quite long. At least one title is long enough that it cannot be centered in the header without crowding the page number:

The solution is to let the header overflow when it is too wide and to specify axf:overflow-align so the page number remains aligned with the outer edge of the text block:

<xsl:template name="Odd-Header">
  <fo:block
      keep-together.within-line="always"
      text-align="center"
      font-size="8pt"
      border-bottom="1pt solid black"
      axf:leader-expansion="force"
      padding-bottom="5pt"
      margin-bottom="4pt"
      axf:overflow-align="end">
    <fo:page-number color="transparent"/>
    <fo:leader />
    <fo:inline letter-spacing="0.22em">
      <fo:retrieve-marker
          retrieve-class-name="Chapter-Title"
          retrieve-position="last-starting-within-page" />
    </fo:inline>
    <fo:leader />
    <fo:page-number />
  </fo:block>
</xsl:template>

Conclusion

Developing a stylesheet to format the first American edition of Moby-Dick by Herman Melville presented several challenges, some of them posed by the TEI markup of the source. These challenges were solved using a combination of XSLT, XSL-FO, and AH Formatter extensions.

The stylesheets for formatting Moby-Dick are on GitHub at https://github.com/AntennaHouse/moby-dick. The TEI XML source is in a submodule of the main repository. The XML is also available separately at https://github.com/AntennaHouse/moby-dick-tei.

Acknowledgments

Wendell Piez helped me navigate some of the details of TEI markup.

References

[1] Internet Archive. Moby-Dick, or, the Whale. Duke University Libraries. https://archive.org/details/mobydickorwhale01melv/page/n7/mode/2up.

[2] Melville Electronic Library. Moby-Dick Side-by-Side: The American And British First Editions. Melville Electronic Library. https://melville.electroniclibrary.org/moby-dick-side-by-side (Archive).

[3] IU Digital Library Program. Moby-Dick, or, The Whale. Melville, Herman, (1819–1891). http://webapp1.dlib.indiana.edu/TEIgeneral/view?docId=wright/VAC7237&brand=wright (Archive).

[4] IU Digital Library Program. Moby Dick, or, The Whale. http://dogwood.dlib.indiana.edu:8080/xubmit/rest/repository/wright/VAC7237.xml (Archive).

[5] IU Digital Library Program. Wright American Fiction. Indiana University. http://webapp1.dlib.indiana.edu/TEIgeneral/welcome.do?brand=wright (Archive).

[6] Antenna House. Automated Analysis. https://www.antenna.co.jp/AHF/help/en/ahf-analyzer.html.

[7] Antenna House. Antenna House Formatter V7. https://www.antennahouse.com/formatter-v7.

[8] Grießhammer, Frank. Introducing Source Serif 2.0. Adobe Typekit Blog. January 10, 2017. https://blog.typekit.com/2017/01/10/introducing-source-serif-2-0/.

[9] Wikipedia. Moby-Dick. Chapter structure. https://en.wikipedia.org/wiki/Moby-Dick#Chapter_structure.

[10] Text Encoding Initiative. Headers, Footers, and Similar Matter. P5: Guidelines for Electronic Text Encoding and Interchange. https://tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHSK.

[11] Project Gutenberg. The Project Gutenberg eBook of Moby-Dick, by Herman Melville. https://www.gutenberg.org/files/15/15-h/15-h.htm.

[12] De Vinne, Theodore Lowe. Correct Composition. The Century Co., New York, 1904.



[1] At the time of this writing, the formatting of the list of previous Herman Melville novels is not yet styled quite like the first edition. The quotation marks in the first edition are a larger font size than the titles. xsl:analyze-string will be used to add fo:inline elements around the quotation marks to change their font size.

[2] The term “forme work” for headers and footers was completely unknown to me. I checked the indexes of eight printing, composition, or book typography books published between 1904 and 2005, and none of them included “forme work”.
