
Up-Translation and Up-Transformation
a Balisage pre-conference symposium

Monday, 31 July 2017

Monday 8:00 am - 9:00 am (location: Baker)

Conference Registration & Breakfast

Pick up your conference badge and join us for breakfast.

Monday 9:00 am - 9:15 am (location: Sinequa)


Welcome to the symposium

Monday 9:15 am - 9:45 am

Symposium introduction

Evan Owens, Cenveo Publisher Services

Evan Owens, the symposium chair, will introduce the context of up-translation and up-transformation, identify critical issues in workflow solutions, and summarize the themes and topics to be covered by the speakers and in the demos. Content enhancement by up-transformation and/or up-translation has long been an important task and component of markup language technology. Because requirements keep evolving, this has been, and will remain, a perpetual topic.

Monday 9:45 am - 10:30 am

Rebuilding a digital Frankenstein by 2018: Towards a theory of losses and gains in up-translation

Elisa Eileen Beshero-Bondar, University of Pittsburgh at Greensburg

The Bicentennial Frankenstein project is producing a new, freshly collated digital edition of Mary Wollstonecraft Shelley's novel, based on earlier work by the website Romantic Circles and on the Pennsylvania Electronic Edition of Stuart Curran and Jack Lynch. At first glance, the task appears simple: upgrade the older edition to meet current expectations for text encoding, the display of variants, and the use of HTML. But closer examination of these earlier editions (built when digital editions were young and the expansive possibilities of hypertext on the Web were being explored for the first time) raises more general questions worthy of reflection. How do we understand the relationships among generations of digital editions? What aspects of older editions transcend or exceed the capabilities of current encoding practices? What can we learn from a thorough review of the input to our up-translation process?

Monday 10:30 am - 11:00 am


Monday 11:00 am - 11:45 am

Uphill to XML with XSLT, XProc ... and HTML

Wendell Piez, Piez Consulting Services

HTML is a widely familiar vernacular for ad hoc representation of documents, and it can serve as a useful staging ground for breaking down the more complex operations in uphill data transformation. Kept syntactically well-formed and maintained within XML pipelines with well-defined interfaces, HTML can join XSLT and XProc to provide a complete up-conversion or data-enhancement pipeline, especially when the ultimate target is semantically richer than HTML. Lessons learned in a project based on this approach include: “Many steps may be easier than one”; “If it doesn't work, try it the other way around”; and “Validation is in the eye of the beholder”.

Monday 11:45 am - 12:30 pm

Up and sideways

Ari Nordström

Legal documents are long, complicated, and replete with complex systems of annotation and cross-reference. In the particular case of Halsbury’s Laws of England and Wales, the source documents begin as Rich Text Format (RTF). Converting hundreds of megabytes of RTF to structured legal XML is not a challenge to be undertaken lightly. When the goal is fully automated conversions, running for hours, perhaps days, and producing complete, accurate, and flawless XML conforming to a specific legal tag set, the stage is set for an epic battle. Herein the author describes one such conversion involving commercial and open source tools, stylesheets, and pipelines.

Monday 12:30 pm - 2:00 pm (location: Social Circle - Lobby Level)


Please check backpacks, briefcases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.

Monday 2:00 pm - 2:45 pm

Looking for Rumpelstiltskin: a case study of spinning straw into gold

Mary McRae, Orbis Technologies

As a publisher, Staywell was managing content from 6 systems in 5 formats with 3 classification systems and wanted to create repurposable XML. The desire for fine-grained content reuse led them to DITA. A concise set of MS Word styles was developed and used to style approximately 3500 documents so that the DITA Open Toolkit could use a DITA style-to-map file to make XML gold from the straw of Word. XSLT transformed both schema-controlled and well-formed-only HTML into XML and integrated metadata from multiple databases. The cycle is wash/rinse/repeat as data is transformed, checked, and retransformed. In summary: up-translation is a black art, the stuff of conjurers and tricksters.

Monday 2:45 pm - 3:30 pm

Automated up-translation: Addressing the tipping points

Caitlin Gebhard, Inera

Up-translation can be accomplished automatically or manually: automatic translation introduces errors and misses content; manual translation introduces different errors and is time-consuming. The best results are obtained by finding a middle ground between automation and manual tagging. However, finding that middle ground is a challenge unto itself. Addressing that challenge requires carefully balancing investment in software development for automation, automatic flagging of suspect cases for manual review, and designing a tagging and quality assurance workflow that is robust and efficient. Balancing automation with manual review is the key to dealing with the inevitable inconsistencies, ambiguities, and “gotcha” moments found when up-translating scholarly manuscripts to models such as JATS and BITS.

Monday 3:30 pm - 4:00 pm


Monday 4:00 pm - 5:00 pm


Several up-translation tools will be demonstrated.

Monday 5:00 pm - 5:30 pm

Panel Discussion - Q&As

Symposium presenters and participants will wrap up the day with unstructured questions, answers, and discussion.

Monday 8:00 pm - 10:00 pm (location: Baker)

Balisage Hospitality

Stop in to the Balisage Coffee and Conversation room. We’ll have desserts, coffee, a comfortable place to talk, and possibly a toy or two worth a look.