I know, let’s start a journal
Creating a new academic journal is not something to
undertake lightly [swan2012]. Quite apart from
the need to find articles of the quality needed for publication,
and the problems of dealing with authors, there is usually a
significant administrative workload [shap2005]. In a traditional paper-based journal,
you must deal with the typesetter and printer, and then — often
independently — deal with getting the issue onto the web. In an
academic environment, it is assumed that the editors do the work
free time, without payment; and that
graduate students can be co-opted onto the team for the
experience it provides them. Even if the editor’s institution
provides hosting and connectivity for a web site, the
prohibiting factor for paper-based publication is the cost of
printing and distribution.
It has therefore been common for a long time for many new journals to be web-only, with PDFs and eBooks as download options for readers requiring print. This is now so commonplace that new journals are started all the time — a recent report from the International Association of Scientific, Technical, and Medical Publishers (STM) gives the active numbers listed in Ulrich’s Directory (a respected resource) as growing from 23,000 in 2002 to 28,000 in 2014, with the strongest growth in 2008–2010 [ware2015] — nearly one a day, and that just represents the journals that have succeeded. (The report also reproduces an interesting graph showing an ongoing growth of 3.5% a year in the number of journals since 1700 [mabe2003].) Unfortunately many journals also disappear: some older-established journals who failed to make the transition to the web, and some initially enthusiastic web-only journals which simply stopped publishing [beal2015]. The numbers are probably unknowable
In my group — responsible in the university for electronic publishing, including the web — we are often asked for advice about setting up a new journal, but the number of suggested journals which actually get established is small, as the administrative or organisational commitment is quite large, print or no print. In 2006 we were approached by a member of the Department of German, asking about creating a journal on the use of drama in second-language education. He had already organised an editorial team, discussed articles with potential authors, decided on a twice-yearly publication cycle, and, crucially, made the decision that it was to be online-only and Open Access. As the journal was to be bilingual in German and English, a facility for two versions of the same article, one in each language, was a necessity (in the circumstances a common request because of the need in other areas to satisfy the requirements of the country’s dual-language legislation as it applies to institutions accepting public funding).
In the discussions that followed, it became clear that the technical process of putting the material on the web would be relatively straightforward, provided that the entire business of pointy brackets could be kept completely out of the picture as far as authors and editors were concerned. In this respect we suggested that, as Microsoft Word had started using XML as their default file save format, authors and editors should continue using Word (with OpenOffice XML as an alternative). We were already running a Cocoon server for several other projects, so the underlying technology for taking in XML and serving HTML and PDF was an established path.
The alternatives included the Open Journal System (OJS), then fast growing in popularity, the Public Library of Science’s system (now Ambra, but then an early beta of Topaz), and several other PHP and Java systems. However, at the time they all presupposed a large and well-funded team of people with ample time, all well-versed in web and markup technologies, and with sufficient administrative support to handle a complex workflow. The new editor’s plans were that he and the other editor would handle submissions by email from authors, and do all the editorial work in Word on their own desktop systems; so all they felt they needed was to be able to drag and drop accepted articles into a folder, from where they would automatically appear at the relevant URI for preview and publication. The publicly-available systems, by contrast, were written for the submission and editorial workflow to be mediated by the software, sometimes involving circular conversion between the publishable HTML and the authors’ and editors’ source formats, and they also included powerful but complex control panels. At the time of initial evaluation, a modular approach was not seen as a viable alternative.
Implementation and technical limitations
A number of restrictions were immediately apparent. While drag-and-drop to a Windows share was perfectly possible on-campus, the SMB protocol was not exposed outside the firewall, meaning work from home or while travelling would not be possible without the added complexity of VPN access. A static drop-point like a Windows share would also have meant a implementing some form of periodic trigger mechanism such as a cron (1) script to execute at very frequent intervals to pick up recently-deposited documents and push them into the server workflow.
The direct use of Word 2003
.xml files was
attractive because they could be operated upon immediately by
Cocoon. However, they implemented images in Base64-encoded CDATA
sections, and no suitable decoder was available for Cocoon that
would also save the resulting binary image file into a specified
directory (decoding and serving inline in real time, each time
the article was accessed, would have slowed the process
significantly). Later, the zipped
.docx file format
would store images in their binary format outside the XML file,
but in the early stages it was clear that a separate process
would be needed to extract and save them, along with the main
document and its ancillary files in XML, so a control panel
Having decided that a control panel was going to be needed, it therefore made sense to embed the functionality of uploading in it, so that the control panel would be aware of uploads — no trigger mechanism needed. An Open Source utility (osFileManager) was installed to handle uploads. As it turned out, the frequency of re-uploading re-edited documents, and sometimes removing them altogether, was much higher initially than had been expected, so having the functionality in the control panel rather than as a separate drag-and-drop facility proved to be more efficient.
The control panel itself was written in XSLT2, executed
within Cocoon, running against a simple XML configuration file
for the journal (title, ISBN, details of editorial board, and
expected frequency of issues). After a file upload, the issue
display gets updated via a small CGI web script on the same
server, which emits XML, so that it can be called silently
from within the XSLT2 code. The script performs the disk
housekeeping functions like creating directories for new
issues, unzipping files, stripping unwanted namespaces (!),
moving their contents into the directories where the
publishing mechanism would expect them, and deleting unwanted
articles; functions which were not directly available within
Cocoon. This represented an unavoidable departure from the
philosophy of staying within XML at all times, but the script
uses XML utilities such as those from the LTXML2 suite (from
the Edinburgh Language Technology Group) for extracting
metadata (author, title, etc) from each
file; which in turn makes it easier to incorporate that in the
script’s XML output.
As can be seen from Figure 1, the control
panel exposes the issues as buttons across the top, with the
articles for the selected issue listed underneath. The author
name is a button linking to the preview or published form of
the article in HTML (in a pop-up window). A rigid upload file
naming convention identifying sequence, author, volume, issue,
and language (
the editors to order the articles and keep multiple articles
by the same author distinct (00 and 99 are reserved for the
Foreword and Author Details).
Three buttons under the
Type column in the
control panel allow a) the pop-up of the
Word XML source for error tracking; b) a
list of the named styles used; and c) a
download of the
.odt file (required in case a later version has
been uploaded by a colleague). The final
column has drop-down menus to allow an article to be set as
Published; set as Republished after correction (as shown);
reverted to Editing status (and removed from public view); or
Perhaps the most obvious technical requirement, however, was that the Word files had to use rigorously-named styles in order to be convertible in any meaningful fashion. Apart from tables, a Word file is basically a sequence of paragraphs distinguished only by their embedded typographic variation (font changes, bigger and bolder type for headings, prefixed by bullets or numbers for list items, indented for quotations, etc). Adding a named style to each paragraph allows Word to format it consistently without the need for manual intervention. But it also means the editors can distribute a stylesheet to authors, and it allows authors to play around with the visual appearance of the styles on their own system. Provided the names of the styles remain unchanged, the XSLT2 code can use them to identify the type of paragraph, ignore any the authorial modifications to the typographic design, and apply the house style. The output can therefore be relatively plain XHTML (HTML5) with almost all the formatting done with CSS.
Using named styles with XSLT2 in this way is now commonplace, but it was an experiment in this project because the use of styles in Word is rarely taught or even mentioned, not even when the author has undergone some training in how to use Word, which in itself is unusual. Although most users who write formal documents, from business reports to technical documentation to academic papers, will freely admit to the importance of consistency and structure, the use of styles remains at a surprisingly low level.
The original journal has now been publishing successfully
with this method for eight years, and has been joined by four
more, with another five under development. The
competing publicly-available software for the
same purpose has also developed, much further than our limited
resources permit with the in-house solution. While there is
little pressure at the moment to change, Wordpress and similar
systems with themes and plugins to perform largely the same task
are attractive to newcomers, and widely available both in-house
free and commercial platforms.
Universities can also sign up for limited use of publication
systems run by the larger publishers (eg Elsevier), although
these tend to be developmental rather than for production. In
the current economic and staffing environment of a state-funded
university, our (IT division) requirements are for systems
needing the smallest amount of support, and a move to one of the
other systems available is unlikely to occur in the short
Simplicity is the hardest thing to achieve
In removing all evidence of pointy brackets, and having a single control panel, the business of detecting and handling errors falls on the XSLT2 code which drives the control panel, and on the CGI script which deals with uploaded files. A considerable amount of effort was necessary during the first two years of operation to modify both programs to handle unexpected requirements from both authors and editors, and to deal with the unconventional and sometimes obtuse implementation of markup which is OOXML in order to maintain this simplicity.
Most of our editors are experienced academics, well-used to the demands of their own publishers both for consistency and adherence to a known style. But this cannot prepare them for the business of dealing with inconsistent demands from authors — even experience as a guest editor on a larger journal, for example, usually involves the editor being shielded from a large amount of technical detail by a production team. A new journal also means that some of the unforeseen requirements have to be resolved on the spot, rather than by a planning committee working in the background.
Authors’ reluctance to understand styles
As mentioned above, authors are generally unfamiliar with the concept of named styles, so careful documentation was needed to accompany the Word stylesheet (template) distributed to them. However, subsequent calls for support showed that this was either ignored or not understood, especially the recommendation to set up Word to show the styles in a separate margin (which was carefully documented in step-by-step instructions). As this is key to any proper use of named styles, failing to use it creates a significant additional workload for the editors.
Authors have also grown accustomed to the liberality with
which Word allows text to be treated, and are therefore
surprised to encounter restrictions such as a house style; in
the author sees it as his own
job to invent the schema [piez2007, my emphasis]. This is in direct
conflict with the assumption that academics’ experience with
their own publishers would lead them to expect such rules, and
is an unresolved problem.
The initial document structure assumed the following styles:
Series (eg “Review”), optional
Author, Affiliation (repeatable in pairs)
Heading1, Heading2, Heading3 (no more)
Paragraph (or null style)
NumberedList (1, 2, 3,…; with a, b, c,… as a second level)
BulletedList (• then ·, no custom icons)
Figure with Caption
Table with Caption
Inline markup is limited to bold, italics, and hypertext links (who knew that Microsoft would come up with three separate and mutually-incompatible ways to mark up and implement links?).
Additions and changes
It was recognised that authors would want (and should be
permitted) freedom to express their views as best they
could, but not at the expense of making each article into a
personal typographic experiment or sandbox. The intention
keep the mean between the two extremes, of too
much stiffness in refusing, and of too much easiness in
admitting any variation (BCP).
As the subject matter concerned the use of drama in language teaching, styles for Speech and Speaker were early additions.
An Epigraph is allowed, both at the start (between title block and abstract), and under a Heading1.
A Subtitle must now be styled as such, even though the formatted result may place it inline to the Title, separated by a colon.
Tables in articles in the Humanities are more often used as a way to arrange related blocks of text side-by-side, than to tabulate numeric quantities. The editors have to intervene when an author’s cells contain headings and figures; virtually miniature articles in their own right.
An unexpected hiatus was caused by finding that instances of Word configured for different languages insert style names in those languages. The style names used for footnotes appear to be language-dependent, so the XPath selectors now cater for these.
The house style is for numbers or large bullets for the first level of lists, and lowercase letters or small bullets for a second level (maximum). Editorial understanding and tact is needed to maintain consistency, which seems particularly hard for authors to accept in the case of lists, where their own institution, journal, or personal preference may be for the reverse, or for more decorative fleurons as bullets.
There are plans to add a style to identify an Author by ORCID, the internationally-agreed unique author identification code. This will materially assist in the accurate citation counting which institutions now rely on as a measure of output.
Enhancements which did not involve changes to the markup included the addition of COinS metadata (ContextObject in SPAN), which allows users of bibliographic mining and retrieval software (eg Zotero, Mendeley, etc) to extract a reference with a single click.
Despite the best intentions to shield editors from pointy brackets, they do need some basic IT knowledge, and exposure to conventions and best-practices which are neither taught nor mentioned in most training courses. These include the use of the filetype (extension) which is hidden from most Windows users; the avoidance of spaces and non-ASCII or non-alphanumeric characters in filenames and directory names (excepting dot, hyphen, and underscore); the distinction between unidirectional and typographic quotation marks; and the need for that meticulous exactness which one normally finds only in a proofreader. An understanding of markup is also useful, as it enables queries and explanations between academics and technical support staff to be expressed more succinctly.
Where the original journal uses separate XSLT2 templates for each named style, the later ones use a lookup on the style name into a XML style configuration file, which handles most cases of mapping from source document styles to result tree element types.
The only two major exceptions are the handling of
repeated Author/Affiliation pairs for multiple authors (a
relative rarity in the Humanities), and the exceptionally
dense XPath statements required to deal with Word lists,
especially when list items may have embedded tables or
figures, block quotations, or sub-lists, or may need
restarting where a previous list left off several
sections (heading levels) earlier. Editors
try to discourage unnecessary structural complexity of this
type by using subsubsection levels instead.
The journal configuration originally specified two issues
per year. This was stored in the configuration file to allow
the control panel to work out what directory naming and menu
structure to implement when the
button was pressed. As new journals were added to the fold,
other values were implemented (annual, quarterly, etc, with
both year-number and volume-number numbering). There is
currently no provision for a journal to change frequency or to
skip an issue or bring out a special issue.
The question of allowing editing after publication has arisen a few times. Ethically, this is widely deprecated, and the current informal practice is that it should only be undertaken when there is a legal question over material, or when a serious factual error is discovered (a wrongly-dated reference, or an incorrect URI). The contrary view is that digital publication is implicitly mutable, and that ongoing updates to an article are in fact desirable. From the reader’s point of view, however, this is very undesirable, as it makes citation unreliable.
In a university, there is a reasonable guarantee of
continuity; that is, no-one is going to wipe the disk because
of specious claims about
core business or
affecting our reputation. The university
repository offers both preservation and data exposure, and
arrangements are under way to lodge the issues there, once the
current work on implementing DOIs is completed (section “Missing features”).
The issue of footnote representation and positioning arose during discussions with editors on the web layout. In an earlier unrelated project (using TEI XML), we had used pop-up windows for footnotes, migrating these to lightboxes for HTML5, but the journal editors advised that a more conventional format would sit better with their readership, especially if it could more easily be printed from the HTML pages (as distinct from the PDFs). It was clear that the print-oriented demands of the user community, and their other publishers in particular, would mean a pop-up, hover, or lightbox rendering for footnotes would be inappropriate.
The current solution is that points of reference are signalled in the standard way with superscripted digits, hyperlinked to the texts, but the texts themselves are dealt with at the start of the XSLT2 template for each of the Heading styles. This tests all preceding paragraphs that lie in the domain of the preceding heading, and formats them (using HTML list markup) into a footnote block which is placed before the new heading starts (see Figure 2). Each footnote has a backlink to the point of reference, as many users appeared to be unaware the the Back button — exercised after traversal of a link within a document — would bring them back to where they were.
The effect is that the footnote texts are kept relatively close to their points of reference, often already visible as the reader scrolls down; given the pageless model of the web, this is a close substitute for the footnotes visible in a conventionally paged print document. Multiple references to the same footnote simply create an additional link: the direction and extent of the required scroll does not appear to be a concern for readers. Footnotes in tables and figures are kept within the table or figure, and are output lettered.
It cannot be emphasised too much that all intending journal owners should get a designer to design the web and print layout at an early stage — unless the institution wishes to impose a standard format for all its journals (certainly easier to implement!). In reality, as noted earlier, many academic journals operate on a best-effort basis, with no subscription income and no other funding, so the resources to pay for design work are non-existent. Unpaid student help is often used, and the work of design students is frequently excellent, but enthusiasm and good intentions are no substitute for talent.
In the absence of a separate design input, a largely generic (and amateur) HTML page style has been adopted for some of the journals, and some have opted for using the skeleton of the university’s main web page style. As we plan to move all the designs to a responsive, mobile-aware pattern, we aim to make use of a number of open source page layouts which have generously been made available by several designers.
Over the course of seven years, a number of additional features have been added to the TODO list. At the top are the implementation of DOIs and the provision of EPUBs for whole issues. Both are in train, the former awaiting agreement with the editors, the second dependent on a more scarce resource: time.
Digital Object Identifiers (DOIs) are unique numbers
assigned by a publisher under licence from an official
Registration Agency (RA) of the International DOI Foundation,
guaranteeing a permanently resolvable reference linking to the
original published location: conceptually they sit somewhere
permalink URI and an ISBN or ISSN.
Several RAs are available, such as CrossRef, and they sell
blocks of numbers at a rate per article or other digital
object. To supply feedback and contribute to their link
ranking, they also require that in each article published, any
bibliographic references which themselves have DOIs granted by
the same RA, must be clickable links; and they provide a
database of all their DOI’d articles for editors to check
(Simple Text Query). They also require indemnification against
legal challenges over copyright and plagiarism, and provide or
sell detection software for the purpose of checking articles
before publication. Implementing these is an additional
editorial task which must be agreed with all editors before
signing up for DOIs, as they require a significant amount of
time to undertake.
EPUB generation from rigorously-defined XML is complex but commonplace (we already provide this for the TEI project mentioned earlier). Generating EPUBs from the less well defined morass that is a Word document — even with named styles — is rather more onerous, and tests have shown that some remedial post-editing of obsolescent or experimental styles in some early articles will be needed to resolve some of the conflicts currently handled by unnecessarily complex XPath statements in the existing XSLT2 templates.
In the journals created to date, all but one have come
from the existing academic community (the exception is a
postgraduate journal of very short articles summarising each
authors' PhD research, and does not contribute to citation
count). The editors, who were in all cases the prime movers,
are in effect the
owners of their
journal. They were all very open to discussion of the various
alternative ways of undertaking publication, but they were very
conscious of the prevailing ethos of their disciplines (we have
mentioned some of these constraints already).
Paper-based publication is still the norm; web publication is seen as a supplementary mode of access;
Web texts needed to resemble, rather than differ from, paper formats (this refers to the actual article text itself; having it surrounded by other material such as menus and headers was perfectly acceptable);
Referencing methods in most disciplines required the existence of a page number: a URI and section number was not only unacceptable, but there was no consensus about how such pointers would be incorporated into the prevailing reference formats;
Authors were universally considered to be too scarce a resource to risk alienating by requiring anything other than Word documents. The need to use named styles was accepted, but has been a significant contributor to the support and editorial workload;
Most editors are in an environment where little time can be spared from teaching, research, and administration, and where departmental pressures (including small-p politics) can sometimes lead to the work of journal editing being seen as discardable. The administrative workload of styling, uploading, testing, and releasing had to be kept to an absolute minimum.
Despite these constraints, most editors strongly support the idea that the publication of articles in journals must move into an environment where formal (ie peer-reviewed) electronic publication is evaluated and valued on a par with paper publication. In particular, they were very receptive to the ideas of mobile-enablement, XHTML-first, cyclical or ongoing publication (as distinct from a strict periodic schedule), and non-traditional modes of presentation (eg graphic-novel styling, mixed-media, or reader-determinable). However, they felt that their readers would not all yet be ready for these as defaults.
Planning is nevertheless under way for an alternative, mobile-first HTML5 design, and the generation of eBooks for each issue. We hope to move these to production in 2016 along with the creation of DOIs.
The objective was to provide the editors with as simple an interface as possible, and this appears to have been achieved. It is far from elegant — we have no designers on the staff — but it performs the tasks required with the minimum effort.
The penalty for this to ourselves (IT support) is that far more time was spent on the initial discovery of the pitfalls of taking in user-formatted Word than was expected, but this has repaid itself in later years with a very small need for support to keep the system running — the unexpected language variants of footnote style names were the only significant change in the last year (2014).
The system depends on Cocoon, which in turn depends on
Tomcat and Apache; and the CGI script is written in
bash and uses Unix-type utilities, which
presupposes a GNU/Linux platform, but none of these is regarded
as a major limitation. However, Cocoon has undergone significant
changes recently, and no longer ships with even a
minimally-usable deployment ready to use. This means that
upgrading the current system will require a very large
commitment of time, as the new modular construction of Cocoon is
poorly documented and exampled, and is aimed at
applications very different from the
straightforward serving of XML as HTML (and building Cocoon uses
Ant, which is notoriously difficult to work with).
The temptation is strong to switch to OJS or a similar system, as they have matured considerably since 2006, but investigation of the resources required for setup, configuration, and ongoing maintenance indicates that this is a task on a par with building a new version of Cocoon, leaving the decision moot for the moment. OJS remains attractive because of maturity and strong support, but we remain concerned about the complexity of the interface, the relative inflexibility of the configuration, and the uneven control of formatting.
Currently it’s XML most of the way, and the discipline of adhering to that standard has been a benefit throughout. Whether it is a strong enough benefit to outweigh the perceptions of simplicity of competing systems is a decision we have yet to make.
 The PDF articles are generated (via XSLT2) with LaTeX, which uses the standard print conventions automatically. Using LaTeX also means that the page numbering can be captured and reused in the web Table of Contents, which is essential for users in the Humanities, where some bibliographic citation formats still require a compulsory page number even for electronic resources.
 In the case of MLA, APA, and some others, this has now changed.