Customizing JATS (Journal Article Tag Suite)

Deborah A. Lapeyre

Mulberry Technologies

Copyright ©2019, Mulberry Technologies, Inc. Used with permission.

Symposium on Markup Vocabulary Customization
July 29, 2019

1. Introduction to JATS

Terminology: JATS Suite Versus JATS Tag Set

The Journal Article Tag Suite (JATS) is a growing set of modules which define the elements and attributes from which new tag sets can be developed. Think of the Suite as a build-a-tag-set kit. Individual Journal Article Tag Sets are built using the modules of the Suite and necessary new tag-set-specific modules.

Journal Article Tag Suite for Journal Articles

The core JATS document type is a journal article, and the ANSI/NISO JATS Tag Sets are journal article tag sets. The JATS standard (ANSI/NISO Z39.96-2019 JATS: Journal Article Tag Suite) defines XML elements and attributes that describe the content and/or metadata of journal articles (including research articles; subject review articles; non-research articles; editorials; letters; product, software, and book reviews; and included peer reviews or author responses), with the intent of providing a common format in which publishers, vendors, web hosts, libraries, and archives can produce, exchange, and store journal article content.

The intent of the Tag Suite is to preserve the intellectual content of journal articles independent of the format in which that content was originally delivered. Although it does not model any particular sequence or textual format, the Tag Suite enables a JATS user to capture structural and semantic components of existing tagged material.

JATS was originally developed as a conversion target, because, at that time, most of the large journal publishers used their own proprietary formats and a way was needed both for them to interchange content with each other and for archives, libraries, aggregators, hosters, and vendors to accept content readily from all of them.

Originally, JATS was designed for STEM articles (Science, Technology, Engineering, Mathematics), but it is now used internationally for journals of all disciplines. In addition, we have seen JATS tag sets built for:

  • books [BITS (Book Interchange Tag Suite)]

  • standards [NISO STS (ANSI/NISO Z39.102-2017, STS Standards Tag Suite)]

  • technical reports

  • conference proceedings

  • magazines and newsletters

  • posters

JATS Article Tag Sets

There are (as of August 2019) three official journal article instantiations of the Suite, loosely called the JATS ‘Tag Sets’. These tag sets are built from the elements and attributes defined in the Suite and are intended to provide models for archiving, interchange, processing, publishing, and authoring journal article metadata and/or article content.

  • Archiving (Journal Archiving and Interchange Tag Set) was designed to be a conversion target from other tag sets. Its loose models, lack of required elements, attributes that accept any text value, user-defined name/value pairs in the metadata, generic escape-hatch elements in the narrative content, and information-classing attributes make it easy (during an XML-to-XML conversion) to rename structures while mostly preserving both publishing sequence (reading order) and semantic intent. [Archiving is also known as “Green” from the color palette of its Tag Library documentation.]

  • Publishing (Journal Publishing Tag Set) is a moderately prescriptive tag set that provides a standard format for publishers to regularize and optimize data for their internal repositories, for production processing, and for the initial XML-tagging of journal articles, usually as converted from an authoring format such as Microsoft Word. Publishing is also used by archives and vendors that wish to regularize their content, rather than to accept the sequence and arrangement presented to them by any particular publisher.

    Because this Tag Set is intended to regularize data, the model includes fewer elements and fewer tagging choices than JATS Archiving. These more limited choices produce more consistent data structures that provide a single location of information in a document to simplify searching and that make it easier to display and to produce derivative products. [Publishing is also known as “Blue” from the color palette of its Tag Library documentation.]

  • Authoring (Article Authoring Tag Set) was designed for a user to create a new journal article using model-driven tools (tools that read, interpret, and apply schemas in real time) and submit that article to journals for publication consideration. Because Authoring is optimized for use with tools, it is the most prescriptive tag set in the JATS Suite. It includes many elements whose content must occur in a specified order and limits the options for formatting. For example, Authoring does not allow explicit numbering on list items or citations. These are considered to be formatting decisions determined by a journal’s editorial style and are not appropriate for inclusion in the XML by an author. [Authoring is also known as “Pumpkin” from the color palette of its Tag Library documentation.]

These three JATS Journal Article Tag Sets are available in several flavors:

  • With MathML 2.0 or with MathML 3.0 (never with both)

  • With or without the OASIS CALS Table Model (All public JATS tag sets can use an XHTML-based Table Model)

  • And all the permutations and combinations of those four choices

Thus there are four Archiving Tag Sets, four Publishing Tag Sets, and two Authoring Tag Sets. (Authoring does not provide the CALS table option.)

Other JATS-based Tag Sets

In addition to the three JATS Journal Article Tag Sets (10 flavors in all), there are NISO- or NLM-sponsored tag sets, as well as many non-public subsets and supersets.

  • BITS (Book Interchange Tag Suite)

  • NISO STS (ANSI/NISO Z39.102-2017, STS: Standards Tag Suite)

  • ISOSTS (ISO Standards Tag Set)

BITS (Book Interchange Tag Suite)

BITS is JATS for books: a superset of JATS Archiving intended for journal publishers who are already using JATS. BITS has two top-level elements: <book> and <book-part> (a book part is a major component of a book, called something like chapter, module, lesson, part, etc.). BITS grew from demand for a JATS-compatible Book Model by JATS users who also publish books and want to maintain their books using the same structure and semantics when possible. BITS enables JATS users to use familiar (JATS) tools for books, mix books and articles in databases and presentation systems, use JATS articles as book content (e.g., as a chapter or a section in a chapter), manage large books in multiple files, or publish collections of books (e.g., series).

BITS is a much larger tag set than JATS, although the narrative content is largely the same for both. BITS adds book-specific metadata and is more flexible than JATS, because there is more variety in books than in articles. BITS adds XInclude to accommodate larger documents managed in pieces, and BITS can support cut-and-paste from JATS (i.e., a JATS <article> can become a BITS <book-part> with a few tweaks).

There are two public BITS Book Tag Sets, one using only XHTML-inspired tables and one using OASIS CALS tables. BITS only uses MathML 3.0. BITS is funded by NCBI for NLM and used in the NLM Bookshelf project.

NISO STS (ANSI/NISO Z39.102-2017, STS: Standards Tag Suite)

NISO STS describes the metadata and the full content of normative standards documents (international, national, organizational, and SDO-produced). NISO STS is intended for standards publishing and interoperability and may also be used for non-normative materials such as guides and handbooks, although it was not designed for non-normative material. NISO STS has two top-level elements: <standard> (for standards documents) and <adoption> (for standards documents adopting and embedding other standards documents).

NISO STS was based on ISOSTS, which was based on JATS Publishing. There are two NISO STS Tag Sets available: the Interchange version (without CALS tables) and the Extended version (with CALS tables), each of which is available with either MathML 2.0 or MathML 3.0. NISO STS is funded jointly by the American Society of Mechanical Engineers (ASME) and ASTM International, with support from NISO.

JATS History

The roots of JATS go back to 1985 and the very first DTD (an SGML DTD) ever published. It was produced (in advance of any SGML parsers) by the Association of American Publishers as one of three tag sets (journal articles, books, and journals). The AAP DTD (also known as the AAP Electronic Manuscript Standard and the AAP/EPSIG standard) was ratified as the U.S. standard ANSI/NISO Z39.59 in 1988. By 1993, the AAP Article DTD had metamorphosed into the international ISO 12083 Electronic Manuscript Preparation and Markup, which was reworked over the next few years and reemerged as ANSI/NISO 12083-1995, which was a nearly complete rewrite of the original ISO 12083.

In December 2001, the Harvard University Library under a grant from the Mellon Foundation commissioned a report to address the feasibility of developing a common structure (model/tag set) that could reasonably represent the intellectual content of journal articles. The resulting E-Journal Archival DTD Feasibility Study for the Harvard University E-Journal Archiving Project came to the conclusion that yes, a single model/tag set was possible and probably desirable, but that a model to meet that need did not then exist.

In 2002, NCBI (National Center for Biotechnology Information) of the National Library of Medicine began work on a single model for journal articles, thereafter called the NLM DTD, based on the 1998-2002 PMC DTD that had been written for PubMed Central. This DTD was (and is) in use and widely adopted even outside the realm of STEM articles.

NLM gave NISO the NLM DTD to use as a starting point for JATS, and a NISO JATS Working Group was formed. The first JATS (NISO Z39.96.201x version 0.4) was released by NISO on March 30, 2011 as a Draft Standard for Trial Use. There was a 6-month public Comment Period and, after the comments were resolved, the JATS Working Group released JATS version 1.0 in 2012 (ANSI/NISO Z39.96-2012). JATS became a continuous maintenance standard at NISO under the JATS Standing Committee, which resolves user requests and publishes Committee Drafts once or twice a year. JATS 1.1 was issued in 2015 (JATS 1.1 ANSI/NISO Z39.96-2015), and JATS 1.2 was issued in 2019 (JATS 1.2 ANSI/NISO Z39.96-2019).

Figure 1

JATS Adoption

JATS is how much of the world publishes/interchanges journal articles. JATS is in use in at least 25 countries world-wide (US, UK, Germany, France, Australia, Japan, Russia, Brazil, Egypt, etc.), and most middle-sized and small publishers world-wide publish journal articles in JATS. All of the huge publishers, who typically use their own bespoke tag sets, produce JATS for interchange or archival deposit. Many public archives accept (or require) JATS, for example Australian National Library, British National Library, Europe PMC, ITHAKA/JSTOR, Library of Congress (US), PubMed Central, SciELO, and many others. Conversion vendors all know how to handle JATS; numerous web-hosting and service vendors require or support JATS; and there are more tools and products written every day for authoring in JATS and conversion from Microsoft Word to JATS.

As Jeff Beck of NCBI said at JATS-Con (the JATS user conference) in 2017:

JATS is no longer one of the cool kids; it’s just what you do if you have journal articles.

2. Customizing JATS

Why is JATS Customizable?

The JATS/NLM customization mechanism was designed in the 1980s to synchronize the maintenance of tag sets for organizations that needed to maintain multiple highly-interrelated tag sets. The idea was that most structural models (figures, tables, lists, sidebars) and most inline elements should be the same across all tag sets, with only a few differences where necessary. The top-level elements and their direct descendants would be different.

As an example, consider the following situation. A single organization needs to develop and maintain 25 related tag sets:

  • 6 for journals (all different),

  • two for magazines (one online, one print),

  • one for newsletters and email,

  • two for standards,

  • one for conference proceedings, and

  • 13 more, largely for various types of books and pamphlets.

Each of the 25 document types is defined in a separate DTD; all DTDs share the modular library of components. The customization mechanism must allow the organization to change models whenever they need to but only when they want to.

A modular system with modular overrides allows the organization to localize changes in a very few files. Then a simple find-in-files mechanism, from an editing tool or the operating system, can tell them that, for example: amongst these 25 related vocabularies, there are 6 variations on this element and here are the exact Parameter Entities where that change can be examined or changed.

Who Customizes JATS?

While it does take some XML knowledge and an understanding of DTD structure and Parameter Entities to maintain a JATS-based tag set, it does not require XML experts. The mechanism is simple to learn. Organizational customizations are typically maintained by XML-aware programmers in the journal departments of the organization, but some JATS users have trained their senior editorial staff to do it, and some hire consultants, particularly if they do not modify their tag sets very often.

The 10 JATS Tag Sets and two BITS Tag Sets are maintained in a modular library using the customization mechanism described in this paper. Schema maintenance, documentation, and testing for the 12 are performed by Mulberry Technologies, Inc. and sponsored by the National Center for Biotechnology Information of the US National Library of Medicine.

How to Customize JATS

JATS is distributed in XSD, DTD, and RELAX NG format, but is maintained as a DTD. The customization mechanism for DTDs is modularity and Parameter Entity overrides.

First, a brief syntax reminder on how Parameter Entities look and work. A reference to a Parameter Entity:

  • begins with a percent sign (%),

  • continues with an XML name, and

  • ends with a semicolon (;).

Some examples include: %list.class;, %emphasis.class;, %abstract-model;, %title-elements;, and %abstract-atts;, where the text between the percent sign and the semicolon is the Parameter Entity name. Parameter Entities (like programming language parameters) are established to be overridden. Internal Parameter Entities are for string replacement. First you define a Parameter Entity, then you may use it for string substitution as many times as you want. For example:

  • Define a Parameter Entity list.class

        list.class "def-list | list"

  • Now use the Parameter Entity %list.class; in a content model

        term-list  (title?, (%list.class;)+ )

  • and the parser or other XML processor will see the Parameter Entity as though it were replaced by its string:

        term-list  (title?, (def-list | list)+ )

Precedence is Paramount in Parameter Entities

If you define two Parameter Entities with the same name

  • the first definition encountered is used and

  • all other definitions are ignored.

This is how DTD customization overrides work. External Parameter Entities allow DTDs to be modularized. You define Parameter Entities in your-tag-set-specific customization modules that can be called in first and override the same-named Parameter Entities in the JATS default modules. For example:

  • In your own DTD’s customization module (called in first), list.class is defined as

       list.class  'def-list | list | var-list | term-list'

  • In a JATS default module (called in second), list.class is defined as

       list.class  'def-list | list'

So the operational value of %list.class; in your customized tag set is

      'def-list | list | var-list | term-list'
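
Written out as full declarations, the override above might look like the following sketch. The file name in the first comment and the element name my-block are invented for illustration; var-list and term-list are the custom elements from the example and would need their own declarations elsewhere.

```dtd
<!-- my-custom-classes.ent (hypothetical file name): called in FIRST -->
<!ENTITY % list.class  "def-list | list | var-list | term-list" >

<!-- JATS default classes module: called in SECOND.
     list.class is already defined, so this declaration is ignored. -->
<!ENTITY % list.class  "def-list | list" >

<!-- Any later content model that references the class (my-block is an
     invented element) sees the customized value. -->
<!ELEMENT  my-block    ((%list.class;)+) >
<!-- Effective model:
     ((def-list | list | var-list | term-list)+) -->
```

The second declaration is not an error: XML treats the first declaration of an entity as binding, and a processor may at most warn about the later one.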

Parameter Entities as Far as the Eye Can See!

JATS DTDs define parameter entities for almost everything defined in the DTD modules:

  • Almost all content models

    • %element-name-model; for element content (%abstract-model;)

    • %element-name-elements; to be combined with #PCDATA for elements with mixed content models (%article-title-elements;)

  • All attribute lists (%element-name-atts;)

  • Lots of element classes (logical groupings of elements, such as %list.class;, %citation.class;, %person-name.class;)

  • Many element mixes, commonly occurring functional or structural groupings of elements (%para-level;)

  • Lists of attribute values

All those Parameter Entities allow almost anything in a JATS Tag Set to be changed very easily! Why not everything? Why aren’t all content models represented as Parameter Entities? There are only a very few of the #PCDATA-only models that are not Parameter Entities, and these are typically models for identifiers. The lack of a Parameter Entity is our subtle way of saying “Please don’t redefine this model.” Not providing the Parameter Entity makes it much more difficult to modify that portion of the tag set. An expert could probably make that modification; a beginner probably could not. And that is fine!

As a result of all these Parameter Entities, most JATS Element Declarations look like the one below, with the basic Element Declaration only showing the Parameter Entities, and all actual models and attribute lists located somewhere above the Element Declaration in the document.

    <!ENTITY % ref-list-model "some model or other"       >
    <!ENTITY % ref-list-atts  "a list of attributes"      >

    <!ELEMENT  ref-list        %ref-list-model;           >
    <!ATTLIST  ref-list
                 %ref-list-atts;                            >

While this indirection-on-every-content-model-and-attribute-list style may be slightly harder to learn to read, once you concentrate on the Parameter Entities rather than the Element Declarations, the expressive power of this approach becomes clear. All these Parameter Entities make possible nearly infinite customization. They also make it very simple to compare customizations across related families of tag sets. Have any of our 25 DTDs changed the model of the abstract element? Check all the %abstract-model; Parameter Entities to find out which ones and what they have changed.
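
As a rough sketch of what such an override looks like in practice, a DTD-specific models module could redefine the two Parameter Entities for ref-list while the Element Declaration in the JATS module stays untouched. The model and attribute list shown here are illustrative assumptions, not the JATS defaults.

```dtd
<!-- In a DTD-specific models module, called in before the JATS element
     modules. Illustrative values only, not the JATS defaults. -->
<!ENTITY % ref-list-model  "(title?, ref+)" >
<!ENTITY % ref-list-atts
             "id            ID     #IMPLIED
              specific-use  CDATA  #IMPLIED" >

<!-- In the JATS module (unchanged): its own later definitions of
     ref-list-model and ref-list-atts are ignored, so these
     declarations pick up the overrides above. -->
<!ELEMENT  ref-list  %ref-list-model; >
<!ATTLIST  ref-list
             %ref-list-atts; >
```

Because every customization of this element lives in an entity named ref-list-model or ref-list-atts, a find-in-files search on those two names locates every variation across a family of DTDs.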

JATS Customization Conventions

Your Tag Set will be easier to construct, maintain, and keep in sync with JATS as the Suite is revised if you follow a few simple Parameter Entity construction conventions. These conventions make it easier for someone else to read and understand your JATS customization, and they also make it easier for you to find and make single-circumstance changes. In brief:

  • Content model choices (OR bar) should never contain element names:

    term-list  (list | def-list)            BAD EXAMPLE

  • Instead, content models should name Parameter Entity classes or mixes of elements:

    term-list  ((%list.class;)+ )
    
    title      (#PCDATA | %my-inline.mix;)* 

  • Even choices inside sequences should use Parameter Entities instead of element names:

    body ((%para-level;)*, (%sec-level;)*, sig-block?)
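
The conventions above can be sketched as full declarations. This is a minimal illustration, assuming a class-then-mix-then-model layering and the leading-OR-bar style for mixed-content entities; the particular element names placed in each class and mix are examples, not the JATS defaults.

```dtd
<!-- Classes: OR-bar lists of elements grouped by what they are -->
<!ENTITY % list.class       "def-list | list" >
<!ENTITY % emphasis.class   "bold | italic" >

<!-- Mix: classes combined for use in a particular context -->
<!ENTITY % my-inline.mix    "%emphasis.class;" >

<!-- Models name only classes and mixes in their choices -->
<!ENTITY % term-list-model  "(title?, (%list.class;)+)" >
<!ENTITY % title-elements   "| %my-inline.mix;" >

<!ELEMENT  term-list  %term-list-model; >
<!ELEMENT  title      (#PCDATA %title-elements;)* >
```

With this layering, adding an element to %emphasis.class; changes every mix and model built on it, while a change meant for one element only is made in that element's own model entity.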

JATS is a Modular DTD System

Each JATS DTD is built using a library of DTD-fragment modules. Each module contains definitions for a group of related declarations. Each DTD is free to call in only the library modules it needs. If your target tag set has no bibliographic references, then leave out the modules defining those elements. That said, most JATS-based DTDs use most of the base JATS modules, adding a few modules of their own.

Three non-JATS exemplar models are shown below:

Figure 2

Modules are called into a DTD using External Parameter Entities, one external entity per module called. This is string substitution on a grand scale, where the entire contents of a module file can be called into the DTD at once.

Best practice in using a modular DTD library is to set up a catalog and use it to provide access to the modules of the Suite. Catalogs provide an indirection mechanism that associates an established identifier with a URI, typically a file name. In this example, I will be using an OASIS XML Catalog, but any method of catalog indirection would work.

OASIS XML catalogs use formal public identifiers (FPIs) and establish one FPI and its filename (URI) equivalent for each module used in the tag set. So my customization process would be:

  • Assign each DTD-specific module file a formal public identifier (FPI)

  • Name each DTD-specific file in a catalog entry (an OASIS XML catalog is shown) in the same catalog that already includes all the JATS modules.

        <public
          publicId="-//Mulberry//DTD Mystuff Models v1.0 20190710//EN"  
          uri="my-very-own-custom-models.ent" />

  • In your modules, define one external Parameter Entity for each DTD-specific module

        <!ENTITY % the-custom-models.ent
           PUBLIC "-//Mulberry//DTD Mystuff Models v1.0 20190710//EN" 
           "my-very-own-custom-models.ent"                            >

  • Then reference (call) your module using a Parameter Entity in your DTD. This places the entire file logically into the DTD at the point of reference.

        %the-custom-models.ent;
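
Putting the declaration and the call together, and adding a second module to show the ordering, a DTD might contain something like the sketch below. The second entity name, public identifier, and file name are placeholders invented for this illustration; they are not real JATS identifiers.

```dtd
<!-- Declare, then call, the DTD-specific module first -->
<!ENTITY % the-custom-models.ent
   PUBLIC "-//Mulberry//DTD Mystuff Models v1.0 20190710//EN"
   "my-very-own-custom-models.ent" >
%the-custom-models.ent;

<!-- Only afterwards call the Suite module whose defaults it overrides.
     Placeholder identifier and file name, not real JATS ones. -->
<!ENTITY % suite-default-module.ent
   PUBLIC "-//Example//DTD Placeholder Suite Module v0.0//EN"
   "placeholder-suite-module.ent" >
%suite-default-module.ent;
```

A catalog-aware parser resolves each public identifier through the catalog; the quoted file name that follows it serves as the system identifier if no catalog entry matches.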

JATS-based Custom DTD Assembled from Modules

There is one very important rule when making new customizations using this method. Never, ever copy a JATS module and edit it. Or, more succinctly:

Never change the published JATS modules — Override them.

Each new document type is a new DTD that defines, in its DTD-specific modules, the Parameter Entities that override the JATS default Parameter Entities. Each DTD may use as many of the JATS base modules (or as few) as necessary. The DTD module typically defines only the top-level element (document element) and (maybe) its immediate children, and calls in all the Suite modules it needs to define the rest of the elements. New (non-JATS) elements (Learning Objectives, parts lists, product codes, taxonomic descriptions) are typically defined in their own modules or in the DTD module. In this way, all customizations are isolated in a few DTD-specific modules. When a new version of JATS is issued, the user can just plug in the new JATS modules and use the new version, unless they have overridden something that has changed in the new version and the organization wants those new JATS changes. In such a case, add the contents of the new JATS Parameter Entity (or Entities) to your custom override Parameter Entity (or Entities).
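
As a hypothetical illustration of that last step, suppose a new Suite release added an element to the default list class. The element name new-list is invented for this sketch, and the override is the list.class example used earlier in this paper.

```dtd
<!-- Suppose the new Suite release changes its default to
         def-list | list | new-list
     where new-list is an invented element name. Because the custom
     module is called in first, the new default is still ignored, so
     the override is edited by hand to pick up the addition. -->
<!ENTITY % list.class
             "def-list | list | new-list | var-list | term-list" >
```

Overrides of entities that did not change in the new release need no attention at all.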

Typical Scenario for a New JATS Customization

A JATS user wanting to make a new document type (for Reports, for example) typically creates at least five new modules, adds them to a local JATS library, and, of course, adds the names of all these modules to a catalog.

Table I

Report DTD: Names and models the new top-level element (<report>) and calls in all the needed modules, in order: first the Report-DTD-specific modules and then the JATS modules

Report-custom-modules: Names any new Report-DTD-specific modules created just for this DTD

Report-custom-classes: Overrides for JATS element collections (classes)

Report-custom-mixes: Overrides for JATS structures (mixes)

Report-custom-models: Overrides for JATS content models and attribute lists

Any new element modules: As many modules as necessary to define Report-DTD-specific new elements (taxonomic material, parts lists, whatever is entirely new and cannot be found in ordinary JATS)

Since the DTD module must perform specific functions in a specific sequence, the structure of a customized JATS-based DTD typically looks like the following. This sequence is important because Parameter Entities must be defined before they are used, and the first declaration found is the definitive definition.

The JATS-based DTD module:

  • names and describes itself and its purpose in an initial comment,

  • names the DTD-specific Module of Modules, then invokes it,

  • names the JATS Suite Module of Modules, then invokes it, and then

  • calls in the rest of the modules:

    • invokes the DTD-specific class and mix override modules and the default classes and mixes they override,

    • invokes the Model customization module that overrides the element modules,

    • invokes all the necessary element modules, and then

    • defines the top-level element and its components (as needed).

Figure 3: Structure of a Customized DTD

As an example, here is a fragment of a customized Report DTD:

  • Modules have been named with external Parameter Entities.

  • The DTD fragment calls in all Report-DTD-specific modules, each followed by the JATS modules being overridden.

  • Report-DTD-specific Parameter Entities override JATS Parameter Entities. Anything the new Report DTD did not change is left alone; it is standard JATS.

    %Report-custom-modules.ent;
    %JATS-modules.ent;

    %Report-custom-classes.ent;
    %JATS-default-classes.ent;

    %Report-custom-mixes.ent;
    %JATS-default-mixes.ent;

    %Report-custom-models.ent;

    %JATS-common.ent;
    %JATS-articlemeta.ent;
    %JATS-backmatter.ent;
    %Report-new-stuff.ent;
    <!-- and so on for the other modules the Report DTD needs -->

Now, with all the component modules in place, the DTD can define the new document element (<report>) and maybe new children such as <report-metadata> and <report-body>.
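
A minimal sketch of those final declarations follows, written in the Suite's one-Parameter-Entity-per-model style. The element names come from the paragraph above; the order and optionality of the children, the attribute list, and the body model are assumptions made for illustration.

```dtd
<!-- Placed after all module calls, so that %para-level; and
     %sec-level; are already defined by the mixes modules. -->
<!ENTITY % report-model       "(report-metadata, report-body)" >
<!ENTITY % report-atts
             "id        ID       #IMPLIED
              xml:lang  NMTOKEN  #IMPLIED" >

<!ELEMENT  report  %report-model; >
<!ATTLIST  report
             %report-atts; >

<!-- report-metadata would be declared in the same way. -->
<!ENTITY % report-body-model  "((%para-level;)*, (%sec-level;)*)" >
<!ELEMENT  report-body  %report-body-model; >
```

Reusing the existing mixes inside the body means the new document type inherits paragraphs, lists, figures, tables, and sections without declaring any of them itself.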

Caution: by including a module, you may define a lot that you never use. If you include a whole module (say it defines 20 elements) and your tag set uses only one element from the module, you have just defined many elements that you will not use. So what? This is not a problem, so do not be concerned if some tools warn you about these orphan declarations. The condition “defined but not used” is legal in XML and very handy in modular DTDs!

3. JATS Interchange and Interoperability

JATS was written for the interchange of articles. The expectation in the early years was that each publisher/archive/library would use their own schema (DTD/XSD/RNG) to produce or store journal articles, but then they needed to get their articles into the same form of XML:

  • to put information into a single repository,

  • to exchange information with each other,

  • to sell/display items on the same hosting platform,

  • so vendors do not need to learn another unique tag set, and

  • to share tools and resources.

Therefore, JATS was written as a conversion target and storage format, designed to maximize the number of journal styles and formats that could be usefully tagged as JATS XML. The idea, particularly behind the Archiving Tag Set, was to enable translation into JATS from as many XML journal tag sets and precursor word-processing formats as possible, without semantic loss and with minimal structural impact (rearrangement). The result is journal article tag sets that are very functional for archives and libraries and (with JATS Publishing) for publishing production, vendors, and web-hosters.

By design, JATS is descriptive rather than prescriptive, enabling rather than enforcing. JATS allows multiple ways to tag the same structure; how to tag and how much to tag are assumed to be editorial decisions, not JATS-level requirements. In the default JATS Tag Sets, little is required, but much is possible. For example, JATS allows all of the following for bibliographic reference tagging:

  • very granular markup inside references, e.g., tagging over 40 mostly semantic elements,

  • very little markup inside references, e.g., tagging only face markup,

  • no markup at all inside references, just a reference start and reference end, and

  • end notes mixed in with references in a bibliography (or prohibited from being intermixed).

JATS-based tag sets can record very detailed metadata, but are not required to do so. For example, a JATS Tag Set could record:

  • unique identifiers for authors (ORCID),

  • unique identifiers for institutions (Ringgold, Crossref Open Funder Registry),

  • IDs on all elements (not required, except for internal link targets),

  • detailed publication metadata (e.g., events and publishing history),

  • detailed funding reporting (with ability to map to Crossref Open Funder Registry),

  • linking terminology to tie terms in the text to ontologies/taxonomies, and

  • numeration (i.e., numbers for list items or sections) present in the XML (or not).

Adding Rules for Interchange

As the preceding paragraphs illustrate, the JATS-XML produced by one organization may be significantly different from the JATS-XML produced by a different organization, even when they are using the same flavor of the same base tag set. XML-to-XML transformation may be necessary for seamless interchange and integration.

Specific best practice recommendations for JATS are outside the scope of the ANSI/NISO standard and even outside the scope of the non-normative DTD, XSD, and RELAX NG schemas and the Tag Library documentation. Basic interoperability lies in being a related family of specifications, with changes isolated so differences can be easily determined and resolved if possible.

Detailed recommendations for interchange, how to use JATS in the same way, and what is best tagging practice are being developed by groups such as PubMed Central and JATS4R (JATS for Reuse).

PubMed Central Tagging Guidelines

The PMC/NLM/NIH Guidelines describe PubMed Central’s preferred XML tagging style for submitting articles to PubMed Central. PMC accepts article submissions in the NLM Journal Publishing DTD or the NISO JATS Journal Publishing DTD. This site includes links to tools and resources (such as a style checker, fully-tagged samples, fully-tagged citations, etc.) as well as an email distribution subscription list for updates to the guidelines.

https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/style.html

JATS4R (JATS for Reuse) Best Practice Recommendations

JATS4R is a NISO-sponsored industry consortium of JATS users who develop Best Practice Recommendations for JATS. In their own words, JATS4R is “an inclusive group of publishers, vendors, and other interested organizations who use the NISO Journal Article Tag Suite (JATS) XML standard.” The organization is based on the principle that JATS is broad and inclusive, but reuse and interchange of JATS-tagged documents would be facilitated if JATS users agreed on a single best practice for tagging (or at worst a small number of variations). Therefore the JATS4R active working subgroups are devoted to optimizing the reusability of scholarly content by developing best-practice recommendations for tagging content in JATS XML.

https://jats4r.org/

4. Conclusion: JATS (The Suite) is a “Build-a-Model” Kit

The JATS modules can be thought of as a giant "build-your-own-JATS-model" kit. The inputs to the process are the JATS Suite library, your customized new DTD, and your customized supporting DTD-fragment modules. The JATS Suite modules will provide:

  • most of the structural components for your new model (paragraphs, lists, footnotes, sections, figures, tables, etc.),

  • most of the inline components for your new model (face markup, inline math, abbreviation, custom-styling, etc.), and

  • publishing metadata objects (authors and affiliations, ISBN, copyright and licenses, funding information, etc.)

Parameter Entities are used to make new tag sets. Your custom DTD file and its modules provide:

  • the top-level structure (your document element),

  • all user-specific metadata and semantic elements,

  • any all-new structures, and

  • any changes you want to JATS default models or attribute lists.

The output of the process is a customized semantically fit-to-purpose DTD or several (one document type per tag set).

Once your tag-set-specific DTD has been created, it can be easily transformed into an XSD schema or a RELAX NG schema as needed, to make a JATS Tag Set all tools can use.

Appendix A. The JATS Compatibility Meta-Model

Many people who create vocabularies based on JATS assume that documents tagged according to their new JATS-based models will be compatible with existing JATS documents and the systems that manipulate them. This is not necessarily the case.

Building JATS-Compatible Vocabularies* provides guidance for customizing the JATS Tag Suite in ways that are:

  • predictable (know where to find information),

  • consistent (no semantic surprises), and

  • generally non-destructive (purpose of the individual elements is not compromised).

*http://www.niso.org/apps/group_public/download.php/16764/JATS-Compatibility-Model-v0-7.pdf

The goal of this document is to enable creators and maintainers of JATS-based document models to know when the extensions they make to JATS models are JATS-compatible, and to suggest ways in which they can achieve their modeling goals in a JATS-compatible way.

Tagging consistency and best practices in document creation are outside the scope of this document. JATS compatibility is evaluated on the element/attribute and tag set levels. A structure in a JATS-based model that uses an existing JATS name must have the same semantic meaning as in JATS. Additionally, there are a number of “Properties” that a structure might or might not have. For example: an element might or might not be allowed to contain character data; an attribute might or might not be an XML ID or an XML IDREF; a structure might or might not have a recursive section-like model.

An element or attribute defined by a JATS extension is “JATS-compatible” if it has the same semantic meaning as the object of the same name in JATS and the object matches the corresponding JATS object on all of the Compatibility Properties identified in this document. A tag set that is an extension of JATS is “JATS-compatible” if all of the shared elements are JATS-compatible.

This document is intended to help developers of new JATS-related XML vocabularies create those vocabularies in ways that usefully extend the reach of the JATS vocabularies without conflicting with current JATS vocabularies. It describes those things that must not change about a model for it to be consistent with the JATS models and some best practices to follow when extending JATS.

The high points of the compatibility model are:

Table II

Rule | Implications of the Rule
Respect the Semantic | The first and most important rule of customizing JATS is to respect the semantics of the existing elements and attributes. Use a named structure to mean the same thing JATS means by that named structure. If you change the structure’s meaning, give it a new name.
Linking Direction | Links in JATS go from the many to the one, not the other way around. So, a reference to a section, table, figure, or equation uses an IDREF to point to the ID of the section, table, figure, or equation. [Note: Links in both directions in a user interface can be built from one-way ID/IDREF attributes in the XML files.]
Use Recursive Section Models | JATS uses a recursive Section Model. Sections contain titles, paragraph-level things, and (optionally) sections. So Section levels are computed by context, not indicated in the XML.
Subsetting | A proper subset of any content model or model of attribute values is always allowed. Elements may be removed from an “or” group with many elements. Elements that are required in JATS may be removed or made optional. Values may be removed from the list of specified values of an attribute. Attributes may be removed from elements.
Model as Element or Attribute? | Model it the way JATS does, or use a different name (make your own).
Whitespace Handling | A compatible tag set extension must not change the whitespace handling type for any existing element:
  • Element-like whitespace (element contains only elements)

  • Data-like whitespace (element contains characters and/or mixed content)

  • Preserved whitespace (model specifies whitespace should be preserved)

Alternatives Elements | An alternatives element is a wrapper that says all of these things are equivalent (name-alternatives, aff-alternatives, etc.). For display or counting, you typically want to use only one of the (possibly many) supplied versions, or you may want to treat one as the preferred version, while any others are synonyms.
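
Two of the rules above, Linking Direction and Use Recursive Section Models, can be seen together in a small instance fragment. This is an illustrative sketch rather than an excerpt from a real article: the id values, titles, and text are made up, while <sec>, <title>, <p>, <xref>, and <fig> are standard JATS elements.

    <sec id="s2">
      <title>Methods</title>
      <!-- The reference (one of possibly many) points at the figure (the one) -->
      <p>The workflow is shown in
         <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
      <fig id="f1">
        <label>Figure 1</label>
        <caption><p>Sample workflow.</p></caption>
      </fig>
      <!-- A nested section: its level comes from context, not an attribute -->
      <sec id="s2-1">
        <title>Sampling</title>
        <p>Details of the sampling step.</p>
      </sec>
    </sec>

The <fig> carries no pointer back to the places that cite it. A system that wants "cited in" links can derive them by collecting every <xref> whose rid matches the figure's id.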

Appendix B. Sample JATS Customizations

The next few sections show simple customizations of JATS, illustrating how to:

  • remove a block element,

  • remove an inline element,

  • add a new inline element,

  • add a new block element,

  • constrain an attribute value,

  • constrain the data type of an element,

  • constrain the content model of a block element, and

  • define a new top-level document type.

How to Remove a Block Element in a Class (choice)

Here the JATS user decides there are no poems in their corpus, so they want to delete all mentions of the <verse-group> element.

  • Find-in-files shows all places verse-group is used

  • In the user’s own customization modules, they redefine each class, mix, or model Parameter Entity that includes verse-group

  • Here is the Report-DTD Parameter Entity (defined first) that will override the JATS default:

        <!ENTITY % intable-para.class
               "disp-quote | speech | statement" >
    

  • Here is the JATS default Parameter Entity (called second) that is being overridden:

        <!ENTITY % intable-para.class
               "disp-quote | speech | statement | verse-group" >
    
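
The "defined first" and "called second" ordering relies on an XML rule: when the same entity is declared more than once, the first declaration is binding and later ones are ignored. A minimal sketch of how a DTD driver file might arrange this follows. The file names are placeholders chosen for illustration, not the identifiers shipped with JATS.

    <!-- Customization module: declared and referenced FIRST -->
    <!ENTITY % report-classes.ent SYSTEM "report-classes.ent" >
    %report-classes.ent;

    <!-- JATS default classes module: referenced SECOND. Its declaration
         of %intable-para.class; is ignored because the name is taken. -->
    <!ENTITY % default-classes.ent SYSTEM "default-classes.ent" >
    %default-classes.ent;

Because the default module is never edited, a new release of the JATS modules can be dropped in and only the customization module needs review.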

How to Remove a Block Element in a Sequence

(Remove the <alt-title> and <fn-group> elements inside an article title group)

  • A change to the article title group elements

  • Replace the entire content model

  • Report-DTD Parameter Entity defined first

        <!--       Content model for <title-group> element-->
        <!ENTITY % title-group-model
                     "(article-title, subtitle*)"                   >

  • JATS default Parameter Entity called second

        <!--       Content model for the <title-group> element-->
        <!ENTITY % title-group-model
                     "(article-title, subtitle*, alt-title*, fn-group?)">

How to Remove an Inline Element

(Remove <ruby> as an emphasis element)

  • Find-in-files shows you all the places <ruby> is used

  • In your customization modules: Redefine each class or mix Parameter Entity that includes <ruby>

  • Report-DTD Parameter Entity defined first

        <!ENTITY % emphasis.class
             "bold | italic | monospace" >                        
    

  • JATS default Parameter Entity called second

        <!ENTITY % emphasis.class
             "bold | italic | monospace | ruby" >                            
    

How to Add a New Block Element

(Add <taxonomic-data> anywhere <disp-quote> can be used)

  • Determine where the new block element is to be used (e.g., anywhere <disp-quote> is used)

  • Find the class(es) or mix(es) describing that usage in the JATS default modules

        <!ENTITY % rest-of-para.class
                "ack | disp-quote | speech | statement" >
    
        <!ENTITY % intable-para.class
                "disp-quote | speech | statement"       >

  • In your modules: Redefine those class(es) or mix(es) as an override and call yours first!

        <!ENTITY % rest-of-para.class
                "ack | disp-quote | speech | statement | taxonomic-data" >
    
        <!ENTITY % intable-para.class
                "disp-quote | speech | statement | taxonomic-data" >

This example shows two replacement Parameter Entities, each of which should contain the new element.
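
Adding the name to the classes only makes <taxonomic-data> allowed in those contexts; the customization must also declare the element itself. The sketch below follows the naming pattern used elsewhere in this appendix (-model and -atts Parameter Entities). The content model and attribute list are assumptions made for illustration; nothing here says what <taxonomic-data> actually contains.

    <!-- Hypothetical content model: an optional title, then paragraphs -->
    <!ENTITY % taxonomic-data-model
                    "(title?, (%just-para.class;)+)"                >

    <!-- Hypothetical attribute list: only the JATS common attributes -->
    <!ENTITY % taxonomic-data-atts
                    "%jats-common-atts;"                            >

    <!ELEMENT  taxonomic-data %taxonomic-data-model;                >
    <!ATTLIST  taxonomic-data %taxonomic-data-atts;                 >

These declarations need to appear after the modules that define %just-para.class; and %jats-common-atts; have been called, because a Parameter Entity must be declared before it is referenced.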

How to Add a New Inline Element

(Add <underline> to the emphasis elements)

  • Determine where the new inline element is to be used (e.g., anywhere bold is used)

  • Find the class(es) or mix(es) describing that usage

  • In your modules: Redefine that class or mix as an override

  • Report-DTD Parameter Entity defined first

        <!ENTITY % emphasis.class
                 "bold | italic | monospace | underline">                        
    

  • JATS default Parameter Entity called second

        <!ENTITY % emphasis.class
                 "bold | italic | monospace">                       
    

How to Constrain an Attribute Value

(Case 1: Attribute already has value list)

  • Each attribute list is defined by Parameter Entity (Inside that Parameter Entity, each attribute and its values are defined)

  • The attribute values are also defined in a Parameter Entity

    <!ENTITY % person-group-types     
           "author | compiler | curator | director | 
            editor | inventor"                             >
    
     ... used later in the attribute lists:
    
         person-group-type   (%person-group-types;)     #IMPLIED
    

  • Redefine the value-defining Parameter Entity and call it (before the default)

    <!ENTITY % person-group-types     
           "author | compiler | curator | director | 
            editor |  illustrator | inventor"                             >
     
     ... used later in the attribute lists:
    
         person-group-type   (%person-group-types;)     #IMPLIED
    

  • Notice no attribute list needs to change
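
The override above lengthens the value list. The same mechanism can shorten it, which is the Subsetting case from the compatibility rules in Appendix A. A sketch, with the choice of surviving values being an assumption for illustration:

    <!-- Report-DTD override (defined first): only two values survive -->
    <!ENTITY % person-group-types
               "author | editor"                                  >

With this override in place, a validating parser reports person-group-type="inventor" as invalid, and the attribute list declaration itself is still untouched.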

How to Constrain an Attribute Value

(Case 2: Attribute is a type like CDATA, not a defined list)

  • Each attribute list defined by Parameter Entity. (Inside that Parameter Entity, each attribute and its values are defined)

    <!ENTITY % person-group-atts
                "%jats-common-atts;                                       
                 person-group-type  CDATA  #IMPLIED"             >
    

  • Name the attribute values you want in a Parameter Entity

    <!ENTITY % person-group-types   
                  "author | compiler | curator | editor | illustrator"  >
    

  • In your modules: Redefine the original attribute list(s) Parameter Entities using your values

    <!ENTITY % person-group-atts
                "%jats-common-atts;                                       
                 person-group-type  (%person-group-types;)   #IMPLIED"   >

How to Constrain the Data Type of an Element

  • DTDs can’t do that!

  • Constrain the data type of an element using Schematron (Using XSLT 2.0+)

        <rule context="event-desc[uri]">
          <assert test="uri castable as xs:anyURI">Element &lt;uri&gt; is not 
                           of type xs:anyURI</assert>
        </rule>

  • Constrain the data type of an attribute using Schematron (Using XSLT 2.0+)

       <rule context="pub-date[@iso-8601-date]">
         <!-- Both checks share one rule: within a pattern, a node is
              tested only by the first rule whose context matches it. -->
         <assert 
            test="normalize-space(@iso-8601-date)">Empty @iso-8601-date
                     attribute</assert>
         <assert 
            test="@iso-8601-date castable as xs:date">The attribute 
               @iso-8601-date is not in ISO date format</assert>
       </rule>
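
Rules like those above run only inside a complete schema. A minimal wrapper might look like the sketch below, assuming ISO Schematron with an XSLT 2.0 query binding; the pattern id is a made-up name. The xs prefix used in the "castable as" tests is declared with an <ns> element.

    <schema xmlns="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
      <!-- Make the xs: prefix available to the XPath tests -->
      <ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>
      <pattern id="date-checks">
        <rule context="pub-date[@iso-8601-date]">
          <assert test="@iso-8601-date castable as xs:date">The attribute
             @iso-8601-date is not in ISO date format</assert>
        </rule>
      </pattern>
    </schema>

The Schematron layer is applied in addition to DTD validation: the DTD checks structure, and the schema above checks the values the DTD cannot type.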
    

How to Constrain the Content of a Block Element

Block element models come in three types:

  • Choice groups of other elements

  • Sequences of other elements (include ordered choices and elements)

  • Data characters, with or without intermixed elements

How to Constrain the Content of a Block Element (sequence)

  • Parameter Entity for content-model named %element-name-model;

  • Models mostly made of element classes

    <!ENTITY % ref-model    
                "(label?, (%citation.class; | %note.class;)+ )"  >

  • Write your own override Parameter Entity (called first)

    <!ENTITY % ref-model    
                "(label?, 
                  (%citation.class; | %note.class; | %normative-note.class;)+ )" >

  • To take selected elements out, edit the class Parameter Entities or the -model Parameter Entity

How to Constrain the Content of a Block Element (choice)

For bags of elements

  • Parameter Entity for content-model named %element-name-model;

  • Model entirely made up of element classes

    <!ENTITY % annotation-model "((%just-para.class;)+)"            >

  • Write your own override Parameter Entity (called first)

    <!ENTITY % annotation-model
                               "((%just-para.class; | %note.class;)+)"  >

  • To take selected elements out, edit the class Parameter Entities or the -model Parameter Entity

How to Constrain the Content of a Block Element (mixed-content)

For data content, with or without intermixed elements

  • Parameter Entity for mixed elements named %element-name-elements; is mixed with data characters (#PCDATA)

  • Content model is #PCDATA plus any classes to name mixed elements

    <!ENTITY % chapter-title-elements
                     "| %emphasis.class; | %inline-display.class; 
                      | %simple-link.class;"                      >
    
    <!ELEMENT  chapter-title
                      (#PCDATA %chapter-title-elements;)*         >
    

  • Write your own override Parameter Entity (called first)

    <!ENTITY % chapter-title-elements
                     "| %emphasis.class; | %inline-display.class;
                      | %simple-link.class; | %subsup.class;"     >

  • To take elements out, edit the class Parameter Entities or the -elements Parameter Entity

How to Constrain the Content of an Inline Element

  • Most inline elements are data characters or mixed content

  • Parameter Entity for mixed elements named %element-name-elements;

  • %element-name-elements; may be empty ("")

  • Model is #PCDATA plus any classes naming the mixed-in elements

    <!--      Within a citation, the title of a
              cited data source such as a dataset or spreadsheet. -->
    <!ENTITY % data-title-elements
                    "| %address-link.class; | %emphasis.class; | 
                     %phrase-content.class;"  >
    
    <!ELEMENT  data-title
                     (#PCDATA %data-title-elements;)*     >

  • In your modules, write an override Parameter Entity

    <!ENTITY % data-title-elements
                    "| %address-link.class; | %emphasis.class; | 
                     %phrase-content.class; | %subsup.class;"  >

  • To take elements out, edit the class Parameter Entities or the -elements Parameter Entity
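
Taking elements out follows the same pattern in reverse. The sketch below shows two alternative overrides for the -elements Parameter Entity. A customization would use one or the other, and which classes to keep is an assumption made for illustration.

    <!-- Alternative A: keep only the emphasis elements -->
    <!ENTITY % data-title-elements
                    "| %emphasis.class;"                 >

    <!-- Alternative B: allow text only; the model becomes (#PCDATA)* -->
    <!ENTITY % data-title-elements
                    ""                                   >

Either override leaves the <!ELEMENT data-title ...> declaration in the JATS module unchanged; only the entity it references differs.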

How to Define a New Top-level Document Type

A DTD is the definition of ONE document type

  • Make a new DTD module

  • Define the new element type

  • Call in any JATS modules you need

  • Original Book definition in a BITS Book DTD

    <!ENTITY % book-model   "(collection-meta*, book-meta?,
                              front-matter?, book-body?, book-back?)"    >

  • Second top-level element, a Book-part Wrapper DTD to hold one chapter

    <!ENTITY % book-part-wrapper-model
                            "(collection-meta*, book-meta,
                              (%book-parts-mix;) )"                      >
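
A content-model Parameter Entity takes effect only once an element declaration uses it. A sketch of the remaining pieces for the wrapper document type follows. The attribute-list entity and the DTD file name are assumptions for illustration, not taken from the BITS distribution.

    <!-- In the new DTD module: declare the second top-level element -->
    <!ENTITY % book-part-wrapper-atts
                            "%jats-common-atts;"                         >
    <!ELEMENT  book-part-wrapper %book-part-wrapper-model;               >
    <!ATTLIST  book-part-wrapper %book-part-wrapper-atts;                >

A document then selects this document type by naming the new element in its document type declaration:

    <!DOCTYPE book-part-wrapper SYSTEM "book-part-wrapper.dtd">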

References

American National Standards Institute/National Information Standards Organization (ANSI/NISO). ANSI/NISO Z39.96-2019. JATS: Journal Article Tag Suite, version 1.2. 2019. Baltimore: National Information Standards Organization. https://groups.niso.org/apps/group_public/download.php/21030/ANSI-NISO-Z39.96-2019.pdf.

JATS4R (JATS for Reuse) Best Practice Recommendations. https://jats4r.org/.

Lapeyre, Debbie. Introduction to BITS (Book Interchange Tag Suite). XML.com. January 18, 2019. https://www.xml.com/articles/2019/01/18/introduction-bits-book-interchange-tag-suite/.

Lapeyre, Debbie. Introduction to JATS (Journal Article Tag Suite). XML.com. October 12, 2018. https://www.xml.com/articles/2018/10/12/introduction-jats/.

Lapeyre, Deborah A. How JATS Empowers Scholarly Communication. Presented at SciELO 20 Years: Conference, São Paulo, Brazil, September 26-28, 2018. https://pt.slideshare.net/scielo/deborah-118692580/scielo/deborah-118692580.

Lapeyre, Deborah A. XML — Why And How: JATS. Presented at SciELO 20 Years: SciELO Network Meeting, São Paulo, Brazil, September 24-25, 2018. https://www.slideshare.net/scielo/deborah-aleyne-lapeyre-xml-why-and-how-jats.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Article Authoring Tag Library, NISO JATS Version 1.2 (ANSI/NISO Z39.96-2019). February 2019. https://jats.nlm.nih.gov/articleauthoring/tag-library/1.2/.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Archiving and Interchange Tag Library, NISO JATS Version 1.2 (ANSI/NISO Z39.96-2019). February 2019. https://jats.nlm.nih.gov/archiving/tag-library/1.2/.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Article Tag Suite. https://jats.nlm.nih.gov/. Splash page.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Publishing Tag Library, NISO JATS Version 1.2 (ANSI/NISO Z39.96-2019). February 2019. https://jats.nlm.nih.gov/publishing/tag-library/1.2/.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), PubMed Central (PMC). PubMed Central Tagging Guidelines. https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/style.html.

Organization for the Advancement of Structured Information Standards (OASIS). std-entity-xml-catalogs-1.1. XML Catalogs, v1.1. 7 October 2005. https://www.oasis-open.org/committees/download.php/14810/xml-catalogs.pdf.

Rosenblum, Bruce, and Irina Golfman. E-Journal Archival DTD Feasibility Study. Prepared for the Harvard University Library Office for Information Systems E-Journal Archiving Project. Newton, MA: Inera Incorporated. December 5, 2001. https://old.diglib.org/preserve/hadtdfs.pdf.

Schwarzman, Alexander B. JATS Subset and Schematron: Achieving the Right Balance. Presented at JATS-Con 2017, Bethesda, MD, April 25-26, 2017. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2017. Bethesda (MD): National Center for Biotechnology Information (US), 2017. https://www.ncbi.nlm.nih.gov/books/NBK425543/.

Schwarzman, Alexander B. Superset Me—Not: Why the Journal Publishing Tag Set Is Sufficient if You Use Appropriate Layer Validation. Presented at JATS-Con 2010, Bethesda, MD, November 1-2, 2010. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010. Bethesda (MD): National Center for Biotechnology Information (US), 2010. https://www.ncbi.nlm.nih.gov/books/NBK47084/.

Usdin, B. Tommie, and Deborah Aleyne Lapeyre. JATS/BITS/NISO STS. Presented at Symposium on Markup Vocabulary Ecosystems, Washington, DC, July 30, 2018. In Proceedings of the Symposium on Markup Vocabulary Ecosystems. Balisage Series on Markup Technologies, vol. 22 (2018). https://doi.org/10.4242/BalisageVol22.Usdin01.

Usdin, B. Tommie, Deborah Aleyne Lapeyre, Laura Randall, and Jeffrey Beck. JATS Compatibility Meta-Model Description, Draft 0.7. July 12, 2016. https://groups.niso.org/apps/group_public/download.php/16764/JATS-Compatibility-Model-v0-7.pdf.

Usdin, B. Tommie. What is NISO STS? XML.com. January 6, 2018. https://www.xml.com/articles/2018/01/06/what-niso-sts/.

Wikipedia contributors. AAP DTD. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=AAP_DTD&oldid=897327071. Includes discussion on the evolution of the AAP DTD into the ISO 12083 standard.

Author's keywords for this paper: JATS; Journal Article Tag Suite; Journal Archiving and Interchange Tag Set; Journal Publishing Tag Set; Journal Authoring Tag Set; BITS; Book Interchange Tag Suite; BITS; NISO STS; NISO Standards Tag Suite; Parameter Entities; modular DTD; Customization modules; classes and mixes customization, JATS; ANSI NISO Z39.96-2015 Journal Article Tag Suite; ANSI/NISO Z39.102-2017