How to cite this paper

Beck, Jeffrey. “JATS Superhighway: Onramp to a Backward-incompatible Version.” Presented at Balisage: The Markup Conference 2022, Washington, DC, August 1 - 5, 2022. In Proceedings of Balisage: The Markup Conference 2022. Balisage Series on Markup Technologies, vol. 27 (2022). https://doi.org/10.4242/BalisageVol27.Beck01.

Balisage: The Markup Conference 2022
August 1 - 5, 2022

Balisage Paper: JATS Superhighway

Onramp to a Backward-incompatible Version

Jeffrey Beck

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

Jeff Beck is the Program Head for Literature at the National Center for Biotechnology Information at the US National Library of Medicine. He has been involved in the PubMed Central project since it began in 2000. He has been working in print and then electronic journal publishing since the early 1990s. Currently he is co-chair of the NISO Z39.96 JATS Standing Committee and is a BELS-certified Editor in the Life Sciences.

Author's contribution to the Work was done as part of the Author's official duties as an NIH employee and is a Work of the United States Government. Therefore, copyright may not be established in the United States. 17 U.S.C. § 105. If Publisher intends to disseminate the Work outside the U.S., Publisher may secure copyright to the extent authorized under the domestic laws of the relevant country, subject to a paid-up, nonexclusive, irrevocable worldwide license to the United States in such copyrighted work to reproduce, prepare derivative works, distribute copies to the public and perform publicly and display publicly the work, and to permit others to do so.

Abstract

Tag sets change over time. Tag set designers manage a complex system where everything is connected to everything else and new user requirements continue to surface. Tag set users manage complex systems to create, manage, and archive documents. Users strongly resist backward-incompatible change, so as JATS has grown we have made compromises in the design to meet new requirements while maintaining backward compatibility. We think it is time to consolidate redundant models, remove deprecated items, and generally reduce confusion. Can we guide users towards a new, backward-incompatible version in a way that they'll find palatable?

We have a plan. We're going to extend the JATS 1.x schema so that it contains the new 2.0 models in addition to the old models. Then we'll make an "Onramp" subset of 1.x that has the deprecated items removed. Documents valid against the Onramp subset of 1.x will also be valid against 2.0.

Table of Contents

Introduction and Background
Backward Compatibility
Subsetting and the "JATS Lite" Movement
The Swelling Tagset
An Interesting Way Forward
Conclusion
Acknowledgments

Introduction and Background

JATS is a NISO Standard that describes XML elements and attributes and three models for defining journal articles. The work started in 2002 as an extension of the PubMed Central DTD that became the "NLM DTDs".

Backward Compatibility

In 2007, as we were planning to move the NLM DTDs to NISO to become NISO JATS, the NLM DTD Steering Committee decided to do a "cleanup" version of the article models to fix all of the infelicities we had introduced since 2003 by keeping the models backward-compatible.

For example, we had introduced the <permissions> element in version 2.1 (2005) as a container for copyright and license information. In earlier versions, <copyright-statement> and <license> were available in <article-meta>. In version 2.1, they were available within the new <permissions> wrapper but we could not remove them from <article-meta> because of backward-compatibility. When we released NLM DTD 3.0 in 2008, all license and copyright elements had to be enclosed in a <permissions> element.
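To make the 2.1-to-3.0 change concrete, the short Python sketch below (using only the standard `xml.etree.ElementTree` module) flags `<copyright-statement>` and `<license>` elements that sit directly in `<article-meta>`, the pattern 3.0 no longer allowed, and moves them into a new `<permissions>` wrapper. The helper names `unwrapped_rights` and `wrap_in_permissions` are hypothetical, namespaces are ignored, and only the two elements named above are handled; it illustrates the structural change rather than serving as a conversion tool.

```python
import xml.etree.ElementTree as ET

# Elements that NLM DTD 2.1 allowed both directly in <article-meta> and
# inside the new <permissions> wrapper; 3.0 requires the wrapper.
RIGHTS_ELEMENTS = {"copyright-statement", "license"}


def unwrapped_rights(article_meta: ET.Element) -> list[str]:
    """Names of rights elements sitting directly in <article-meta> (pre-3.0 style)."""
    return [child.tag for child in article_meta if child.tag in RIGHTS_ELEMENTS]


def wrap_in_permissions(article_meta: ET.Element) -> None:
    """Move bare rights elements into a new <permissions> wrapper, in place.

    Original document order is preserved; this sketch does not reorder
    children to match the <permissions> content model.
    """
    bare = [child for child in article_meta if child.tag in RIGHTS_ELEMENTS]
    if not bare:
        return
    if article_meta.find("permissions") is not None:
        # Mixed old and new tagging: leave it for a human rather than guess
        # how to merge the two sets of rights information.
        raise ValueError("found both bare rights elements and <permissions>")
    # Put the wrapper where the first bare rights element used to be.
    position = list(article_meta).index(bare[0])
    permissions = ET.Element("permissions")
    article_meta.insert(position, permissions)
    for child in bare:
        article_meta.remove(child)
        permissions.append(child)


meta = ET.fromstring(
    "<article-meta><title-group/>"
    "<copyright-statement>Copyright 2005</copyright-statement>"
    "<license><p>Some license text</p></license></article-meta>"
)
print(unwrapped_rights(meta))  # ['copyright-statement', 'license']
wrap_in_permissions(meta)
print([child.tag for child in meta])  # ['title-group', 'permissions']
```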

The backward-incompatible release of NLM DTD 3.0 was a game-changer for many users, and some PMC submitters are still using NLM DTD version 2.3 or earlier to submit their articles. NISO Z39.96-2012 (JATS version 1.0) became official in August 2012. It and all subsequent JATS versions have been backward-compatible with NLM DTD 3.0.

Since 2012, as the JATS Standing Committee has considered requests for enhancements, we have sometimes found that design decisions that were appropriate in the past are not optimal given current requirements. In normal maintenance mode, we find ourselves making compromises in the tag set in order to meet new requirements while maintaining backward compatibility.

By 2019, we believed that there were enough of these compromises in JATS 1.2 that it was time to consider a backward-incompatible version in which, for example, deprecated structures are removed, redundant models are consolidated, sources of confusion are eliminated, and some structures are tidied up to strengthen JATS for the future.

Given the strong reaction to the backward-incompatible release in the past, the JATS Standing Committee (JATS SC) is trying to smooth the way to a JATS 2.0 version. We have been updating and maintaining the JATS 1.x line while we address the larger "2.0 issues": we released JATS version 1.3 in 2021 and are working on addressing public comments for JATS 1.4. Many of the decisions being made on the "2.0 issues" can be applied in the 1.x line as long as newly unpopular elements and structures are deprecated.

Subsetting and the "JATS Lite" Movement

The article models described in JATS have always been complex, with multiple ways to tag structures even within the more prescriptive "Journal Publishing" model. Many users develop subsets of the models by writing their own, more restrictive schemas to control usage within their organization (lapeyre). Others effectively subset the models by controlling usage with a validation layer, such as Schematron or XSLT, on top of DTD validation. For PubMed Central, we use a subset of JATS defined by the PMC Tagging Guidelines (pmc1) that we control with the PMC Style Checker (pmc2, beck1).
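As a rough illustration of a validation layer that sits on top of DTD validation, the Python sketch below walks an already-parsed instance and reports markup that the DTD permits but a local subset forbids. It is a stand-in for what Schematron or XSLT would do in practice, it performs no DTD validation itself, and the two rules (`DISALLOWED` and `REQUIRED_ATTRS`) are hypothetical examples, not rules taken from JATS or from the PMC Tagging Guidelines.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# Hypothetical house rules layered on top of DTD validation. These are
# illustrative only; they come neither from JATS nor from the PMC
# Tagging Guidelines.
DISALLOWED = {"named-content", "styled-content"}
REQUIRED_ATTRS = {"article": ["article-type"]}


def house_style_errors(root: ET.Element) -> Iterator[str]:
    """Yield messages for markup the DTD allows but the local subset forbids."""
    for elem in root.iter():  # document order, starting with root itself
        if elem.tag in DISALLOWED:
            yield f"<{elem.tag}> is not allowed in this subset"
        for attr in REQUIRED_ATTRS.get(elem.tag, []):
            if attr not in elem.attrib:
                yield f"<{elem.tag}> is missing required @{attr}"


doc = ET.fromstring(
    "<article><body><p>See <named-content content-type='gene'>BRCA1"
    "</named-content>.</p></body></article>"
)
for message in house_style_errors(doc):
    print(message)
# <article> is missing required @article-type
# <named-content> is not allowed in this subset
```

Keeping such rules outside the schema lets an organization tighten usage without maintaining a customized DTD.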

We hear discussion of an official JATS Lite version, similar to TEI Lite, a "customization of the TEI tagset, designed to meet '90% of the needs of 90% of the TEI user community'" (tei1). So far, the JATS Standing Committee has not worked on developing a JATS Lite version, but there is a lot of interest in the community, including a presentation at Balisage 2021 that tried to define the elements that belong in a Lite Subset (imsieke). So far we can agree that an official Lite Subset would be a good idea, but we cannot agree on what should be excluded.

The Swelling Tagset

Imagine a Tag Set in version 1.0 (see Fig. 1). It has the elements and attributes defined in it. As time goes by and additions are requested, if the updates are made to be backward-compatible, the Tag Set will grow (see Fig. 2). New elements, attributes, and structures are added, but nothing is ever taken away.

Figure 1

The initial state of a Tag Set.

Figure 2

The Tag set after the fifth backward-compatible update.

Nothing gets taken away, not because the old structures are still the right ones to use, but because of the fear of breaking backward compatibility and the effect that would have on the uptake of a new version.

An Interesting Way Forward

Currently the JATS Standing Committee is working on the "2.0 list", which revisits existing structures and develops solutions such as a reasonable way to represent multi-language articles. We are also working through user comments submitted through the NISO Public Comment form to keep the JATS 1.x line meeting users' needs.

Given the slow uptake by some users of the last backward-incompatible version over a decade ago, we have been wary of making such deep changes to the article models, although we are sure that the 2.0 structures will be "better."

Tommie Usdin made an interesting proposal on one of our JATS Standing Committee calls as a way to make the new 2.0 structures less frightening to new users. Most of the new things that we are adding, like all of the new elements for multi-language articles, can be added into the 1.x line to get users familiar with them.

This will be just like all previous JATS updates: adding new structures to the existing, swelling Tag Set. As a way to ease the community into a JATS 2.0 version that uses the newer structures, once the 2.0 models are in place in the 1.x line, we will make a Subset of the 1.x schema with all deprecated elements removed. This means that any instance valid against this Subset is also valid against 1.x while already working toward the 2.0 models.

The JATS 1.x release will be backward-compatible with the current JATS line. If users start using this "Onramp" subset, they will get the benefit of the newer structures. The Onramp schema will not be backward-compatible with the current JATS line, but any instance written against it will be valid against 1.x, because the Onramp schema contains no elements or attributes that are not in 1.x.
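The argument rests on a simple containment property: if every structure the Onramp allows is also allowed by 1.x, then every Onramp-valid instance is 1.x-valid, but not the reverse. The Python sketch below approximates a schema as the set of (parent, child) element pairs it permits, which ignores ordering, cardinality, and attributes, and borrows the NLM 2.1 `<permissions>` situation described earlier as a stand-in for "old plus new" structures. The names `FULL`, `DEPRECATED`, `ONRAMP`, `pairs_used`, and `conforms` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A schema is approximated as the set of (parent, child) pairs it allows.
# Real content models also constrain order, cardinality, and attributes,
# so this is a deliberately simplified picture.
FULL = frozenset({
    ("article", "front"),
    ("front", "article-meta"),
    ("article-meta", "permissions"),
    ("permissions", "copyright-statement"),
    ("permissions", "license"),
    # Older placements kept only for backward compatibility:
    ("article-meta", "copyright-statement"),
    ("article-meta", "license"),
})
DEPRECATED = frozenset({
    ("article-meta", "copyright-statement"),
    ("article-meta", "license"),
})
# Onramp-style subset: the full schema minus everything deprecated.
ONRAMP = FULL - DEPRECATED


def pairs_used(root: ET.Element) -> set[tuple[str, str]]:
    """Collect every (parent, child) element pair that occurs in an instance."""
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


def conforms(root: ET.Element, allowed: frozenset[tuple[str, str]]) -> bool:
    """True if the instance uses only pairs the (approximate) schema allows."""
    return pairs_used(root) <= allowed


# The subset relation is what carries validity from the Onramp to the full schema.
assert ONRAMP <= FULL

new_style = ET.fromstring(
    "<article><front><article-meta><permissions><license/></permissions>"
    "</article-meta></front></article>"
)
old_style = ET.fromstring(
    "<article><front><article-meta><license/></article-meta></front></article>"
)
print(conforms(new_style, ONRAMP), conforms(new_style, FULL))  # True True
print(conforms(old_style, ONRAMP), conforms(old_style, FULL))  # False True
```

The old-style instance stays valid against the full schema, which is the backward compatibility users expect, while the Onramp rejects it and so steers new content toward the newer structures.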

Figure 3

The "Onramp" subset that uses newer structures and disallows deprecated ones.

Conclusion

This seems like a reasonable way to get users familiar with the newer structures being added in 2.0 without forcing a wholesale upgrade of systems. As Wendell Piez pointed out when this was presented at JATS-Con this year (usdin), the difficult thing is going to be naming it. It is not just another schema in the 1.x line; the full 1.x schema will be as bloated with old and new structures as you would expect.

Acknowledgments

This work was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health.

References

[lapeyre] Lapeyre DA. “Why Create a Subset of a Public Tag Set.” In: Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010. Available from: https://www.ncbi.nlm.nih.gov/books/NBK47099/

[pmc1] PMC Tagging Guidelines. https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/style.html

[pmc2] PMC Style Checker. https://www.ncbi.nlm.nih.gov/pmc/tools/stylechecker/

[beck1] Beck, Jeffrey D. “How many hamsters does it take? Under the hood at PMC.” Presented at Balisage: The Markup Conference 2017, Washington, DC, August 1 - 4, 2017. In Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies, vol. 19 (2017). https://doi.org/10.4242/BalisageVol19.Beck01.

[tei1] TEI Lite. https://tei-c.org/guidelines/customization/lite/

[imsieke] Imsieke, Gerrit, and Nina Linn Reinhardt. “JATS Blue Lite: The Quest for a Compact Consensus Customization.” Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Imsieke01.

[usdin] Usdin, Tommie. “Thinking about a Convenience Subset of JATS.” JATS-Con Open Session. In: Journal Article Tag Suite Conference (JATS-Con) Proceedings 2022 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2022. Available from: https://www.ncbi.nlm.nih.gov/books/NBK580693/
