Using BITS for conference paper conversion
Alexander B. Schwarzman
The Optical Society
As is typical for many society publishers, The Optical Society (OSA) has both a journal program and a conference program. Integrating journal articles and conference papers within a single data source opens a pathway to business intelligence analysis over the entire corpus of published research material, which can benefit both programs and advance the society’s mission.
In 2017, having successfully completed a project to convert almost 100 years of its journal legacy material to JATS XML, OSA decided to convert its conference content as well, tag it in a JATS-compatible way, and combine both content segments in a single MarkLogic database. While it is well accepted that JATS and BITS cover the markup needs of journal and book content, respectively, it is less clear which Tag Set is most suitable for tagging conference proceedings.
Even though we thought
we had seen it all in converting journal content, in the course of the project we learned that handling
conference metadata and journal metadata presents very different challenges. In this
paper, we share our experience with using BITS for marking up individual conference
papers and explain how our business decisions shaped the structure of the XML. We demonstrate
that because BITS was explicitly designed to enable the construction of books composed
of units that could be part of many collections, the BITS metadata model is well-suited
for representing a conference paper’s nested collections, both event- and publication-related.
To ensure data quality, we have built workflows, designed XML tools (e.g., Tag Subset,
Schematron), and instituted visual QC procedures that allowed us to achieve our objective.
We conclude the paper with lessons learned from this project and the new opportunities
that its successful completion has opened up.