
The Kiln XML Publishing Framework

Paul Caton

Research Analyst, King's Digital Laboratory

King's College, London

Miguel Vieira

Software Engineer, King's Digital Laboratory

King's College, London

XML In, Web Out: International Symposium on sub rosa XML
August 1, 2016

Copyright © 2016 by the authors. Used with permission.

How to cite this paper

Caton, Paul, and Miguel Vieira. “The Kiln XML Publishing Framework.” Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington, DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). DOI: 10.4242/BalisageVol18.Caton01.

Abstract

Kiln, previously known as xMod, is an open source multi-platform framework for building and deploying complex websites whose source content is primarily in TEI/XML. It brings together various independent software components into an integrated whole that provides the infrastructure and base functionality for such sites. Separation of roles is central to Kiln's design, allowing people with different backgrounds, knowledge and skills to work simultaneously on the same project without interfering with one another’s work. Developed and maintained at King’s College London, it has been used to generate more than 50 websites for digital humanities research projects with very different source materials and customised functionality.

Table of Contents

Overview
Origins of Kiln
Kiln - the principal components
Kiln - as the user sees it
Kiln's templating system
Kiln introspection
RDF / Linked Open Data
Future for Kiln

Overview

Kiln[1], previously known as xMod,[2] is an open source multi-platform framework for building and deploying complex websites whose source content is primarily in TEI/XML. It brings together various independent software components into an integrated whole that provides the infrastructure and base functionality for such sites. Kiln has two competing design goals: to support the development of unique, complex web applications; and to provide an out-of-the-box system suitable for a single non-technical person to publish a TEI-based site. The former requires that every default component be customisable and that external components can be integrated as necessary; the latter requires a large amount of built-in behaviour that can be easily tweaked in isolation, together with excellent documentation. Kiln’s documentation includes a tutorial showing how to customise each of the major elements of a site beyond the provided defaults. Separation of roles is central to Kiln's design, allowing people with different backgrounds, knowledge and skills to work simultaneously on the same project without interfering with one another’s work.

Kiln is the latest iteration of work begun by a team at the Centre for Computing in the Humanities (CCH), later the Department of Digital Humanities (DDH), at King’s College, London (KCL). Further development and maintenance of Kiln is now the responsibility of the recently formed King's Digital Laboratory (KDL) at KCL. Over the years, and across several versions, Kiln has been used to generate more than 50 websites for digital humanities research projects[3] with very different source materials and customised functionality.

Origins of Kiln

Kiln originated at CCH around 2004 as a framework called xMod. CCH was collaborating with academic partners in text-based projects where primary sources were encoded using markup from the TEI Guidelines. From this work, three things became clear:

  1. Even for relatively simple, straightforward digital resources, academics needed to come to CCH/DDH to have the resource built.

  2. Multiple projects shared a core set of requirements.

  3. In most cases, as well as the core set of requirements, projects also had their own very particular and often quite complex needs.

And from them arose corresponding needs:
  1. The need to give non-technical people a way to set up a basic digital resource that displays XML-encoded source files as web pages.

  2. The need to set up for ourselves a quick, dependable, consistent way of getting the core requirements dealt with to maximize time available for the project-specific work.

  3. The need to be able to meet the particular requirements while still using the shared approach to basics.

Those needs pointed towards a framework that enables a phased approach. The initial phase involves quickly and easily setting up a basic digital resource which displays texts and offers basic browse and search functionality. Subsequent phases could involve any or all of the following: customizing the look and feel; expanding the browse and search capability; integrating with other systems to create a larger whole (e.g. adding a CMS front end). This approach avoids making users choose between "simple" and "complex" versions of a framework.

Kiln - the principal components

Figure 1 - Kiln architecture

Kiln has been developed around the concept of the separation of roles, allowing people with different backgrounds, knowledge and skills to work simultaneously on the same project without overriding each other’s work. The parts of the system used by developers, designers and content editors are distinct; further, the use of a version control system makes it simpler and safer for multiple people with the same role to work independently and cooperatively.

Given the needs described above, Apache Cocoon[4] is a natural choice to sit at the heart of Kiln because its sitemap and pipeline system is very flexible: a sitemap matches request URLs against patterns and routes each match through a pipeline of generation, transformation and serialization steps. At the basic level, it is easy to create default paths and behaviours that are available to users after a few simple steps, thus meeting needs (1) and (2). Then, if desired, we can set up processing sequences of increasing complexity and/or granularity that supplement the defaults rather than replacing them, thereby satisfying need (3).

The Solr search platform[5] is a good complement to Cocoon. At the basic level we can have a simple indexing pipeline that provides a free-text search facility (see the next section). As our needs become more complex (when, for example, we want to incorporate into the index data from non-primary sources such as authority files and bibliographies), we can use Cocoon's aggregation mechanism to bring the disparate sources together and channel them into a single indexing transformation. By using internal Cocoon URLs we can pre-process some of the secondary sources and channel the output into the aggregation. And because Solr has numerous faceting features built in, we can easily add faceted browsing to the resource; indeed this step is so straightforward that even a site administrator with relatively modest technical knowledge can implement it.
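As a hedged illustration of why faceting is cheap once content is indexed, the Python sketch below builds the kind of query a faceted browse page might send to Solr. It is not Kiln's own code (Kiln talks to Solr through Cocoon pipelines and XSLT). The endpoint URL and the field names author and date_century are assumptions; q, rows, wt, fq, facet, facet.field and facet.mincount are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real Solr URL and core depend on the installation.
SOLR_SELECT = "http://localhost:9999/solr/select"


def faceted_query_url(text, facet_fields, filters=None, rows=10):
    """Build a Solr select URL for a free-text search plus facet counts.

    facet_fields: index fields to count values for (one facet.field each).
    filters: {field: value} pairs the user has already clicked; each becomes
    a filter query (fq), which narrows the results without affecting scoring.
    """
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),           # ask for a JSON response
        ("facet", "true"),        # switch faceting on
        ("facet.mincount", "1"),  # omit facet values with no matches
    ]
    params += [("facet.field", field) for field in facet_fields]
    for field, value in (filters or {}).items():
        # Quote the value so multi-word terms are matched as a phrase.
        params.append(("fq", f'{field}:"{value}"'))
    return SOLR_SELECT + "?" + urlencode(params)
```

For example, faceted_query_url("charter", ["author", "date_century"], {"author": "Anonymous"}) asks for matches on "charter", restricted to one author, with value counts for both facet fields. Each selection on a browse page simply adds another fq; a production version would also need to escape quotes inside filter values.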

Earlier versions of Kiln (then named xMod) used XML databases such as eXist for storing and indexing structured textual data, and queried them with XQuery. Performance issues with XML databases led us to adopt Solr, and since then we have had no compelling use case for XQuery, so it is not included in the 'off-the-shelf' Kiln package.

Kiln comes bundled with the Jetty web application server[6], which makes Kiln a completely standalone application: beyond the bundle itself, the user needs only Java installed on their machine. For a larger-scale production environment it is also easy to deploy Kiln as a WAR file in a servlet container such as Apache Tomcat[7].

The last main component of Kiln is the Sesame RDF framework[8], whose use we describe below.

One small convenience of this component set is that the parts a user with limited technical knowledge might want to tinker with (i.e. Cocoon sitemaps, the Solr schema, and XSLT stylesheets) are all in XML, so the user is likely to be familiar and comfortable with their syntax and rules, assuming they are also responsible for the TEI content files.

Kiln - as the user sees it

After downloading Kiln or cloning it from GitHub, the user can start it from the command line with ./build.sh (there is a .bat version for Windows) to use the built-in Jetty server, or alternatively deploy it to an existing Tomcat server. The default port is 9999, and on visiting it the user will see the default home page (Figure 2). This page is intended to serve only as a placeholder until the user 'finds their feet' and feels confident enough to begin shaping the site themselves. To that end it offers suggestions about next steps and links to the online documentation, which includes a tutorial that walks users through initial setup and common tasks.

Users can see a bare-bones display of their XML texts simply by placing the files under ROOT/ as content/xml/tei/*.xml. The 'Texts' link already present in the navigation bar brings up an index of the available files, showing for each file some simple metadata extracted by the pipeline that processes the 'Texts' URL. Because this index is produced by the pipeline by default rather than maintained separately, it is always current without the user having to restart the server.
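The index page itself is generated by Kiln's own pipeline and XSLT. Purely to illustrate why such a list stays current without a restart, the Python sketch below re-reads the TEI directory on every call and pulls one piece of metadata from each file. Taking the title from teiHeader/fileDesc/titleStmt/title is an assumption about which metadata is shown, and the function name list_texts is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
TITLE_PATH = "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title"


def list_texts(tei_dir):
    """Return (filename, title) pairs for every .xml file in tei_dir.

    The directory is scanned afresh on each call, so a newly added file
    shows up the next time the list is requested, with no restart.
    """
    entries = []
    for name in sorted(os.listdir(tei_dir)):
        if not name.endswith(".xml"):
            continue  # ignore non-XML files in the content directory
        root = ET.parse(os.path.join(tei_dir, name)).getroot()
        title_el = root.find(TITLE_PATH, TEI_NS)
        # itertext() gathers text from nested markup such as <hi>.
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        entries.append((name, title or "[untitled]"))
    return entries
```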

Figure 2 - Kiln default home page

Kiln's default home page offers suggestions as to what the user should do next.

After being able to view their texts, users most commonly need search capability. To enable it, they go to the Admin page, where a button runs a Solr indexing process (Figure 3); once indexing is complete, they can perform simple text searches over the XML files.

Figure 3 - Kiln admin page

Kiln's admin page allows users to perform basic content processing operations.

Beyond this point a user wanting more advanced features such as faceted browsing will have to start editing application files. While the documentation provides guidance and the steps are not particularly complex, we do expect the user here to be at least comfortable with XML configuration files and XSLT stylesheets.

Kiln's templating system

Where TEI-encoded XML files constitute the main source content, an XSLT transformation remains the crucial gateway through which content must pass as it is fetched from the back end to be displayed on the front end. Our approach to this part of the site's workings is guided by the needs outlined earlier. Ideally:

  • It should be clear to non-technical users how displays are assembled.

  • There are commonly required types of displays, so these should be 'pre-assembled' and offered by default.

  • We also want to be able to adapt/add to/replace defaults to provide project-specific displays.

In addition to the desiderata just listed, we know that very often the person with the skills to write templates that find and handle parts of the source XML is not the person with the skills to organise the output into a functional, ergonomic, and aesthetically pleasing display, so we want our approach to allow for that. As far as possible, the XSLT specialist and the UI/UX specialist should be able to do their respective work concurrently and independently. Kiln addresses these concerns with a distinctive XSLT-based templating system. To show how this system works, we'll follow a request for an XML source file to be displayed as an HTML page.

  • A request for texts/**.html goes to ROOT/sitemaps/main.xmap, where it is matched by a template.

  • That template:

    • Firstly, creates an aggregate, which includes this: <map:part label="tei" src="cocoon://internal/tei/preprocess/{1}.xml" />

      • That preprocess call runs the XML through two stylesheets under ROOT/kiln/stylesheets/tei, which identify some known, potentially troublesome features in the source XML and 'prepare the ground' for the final display stylesheet to deal with them.

        • One merges the content of div elements that are linked via @next and @prev attributes. Given an input markup structure like this:

          <body>
            <div xml:id="div_1" next="#div_2">
              <p>content of div 1</p>
            </div>
            <div xml:id="incidental">
              <p>intervening unwanted content</p>
            </div>
            <div xml:id="div_2" prev="#div_1">
              <p>content of div 2 that continues div 1</p>
            </div>
          </body>

          the output structure would be:

          <body>
            <div xml:id="div_1" next="#div_2">
              <p>content of div 1</p>
              <anchor xml:id="div_2"/>
              <p>content of div 2 that continues div 1</p>
            </div>
            <div xml:id="incidental">
              <p>intervening unwanted content</p>
            </div>
          </body>

        • The other moves page-break markers that occur in certain structural contexts into a different structural context (to prevent what should display as a single block from being broken up), and adds some kiln-namespaced attributes:

          • To block-level elements, indicating whether or not they contain only inline material.

          • To link elements, indicating whether or not they are nested inside another link element.

          These attributes allow CSS classes to be assigned so that display formatting can be adjusted according to context.

    • Secondly, runs a transform on the aggregate with this call: <map:transform src="cocoon://_internal/template/tei.xsl" />. The important point here is that tei.xsl does not exist as an actual stylesheet. Instead, the Cocoon URL is matched in ROOT/kiln/sitemaps/main.xmap by the pattern "_internal/template/**.xsl", whose template:

      1. Looks for a template XML file which matches the wildcard value (in this case it would be "tei", so it looks for tei.xml).

      2. Runs <map:transform type="xinclude"/> on the template XML file to pull in anything referenced with an XInclude. For example, it resolves <xi:include href="base.xml"/> to bring in a template file that sets up the overall default HTML page framework.

      3. Then runs <map:transform src="../stylesheets/template/inherit-template.xsl" /> on the template XML file. Unlike tei.xsl, inherit-template.xsl is an actual stylesheet; its output is effectively a virtual stylesheet, and that output functions as 'tei.xsl': its templates are applied to the aggregate, completing the transformation call that originated with <map:transform src="cocoon://_internal/template/tei.xsl"/>.

  • Finally, the output from 'tei.xsl' is serialized as HTML by a <map:serialize/> instruction. (The default serializer type is set to HTML in sitemap.xmap, so no @type is specified.)
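Kiln performs the prev/next merge in XSLT under ROOT/kiln/stylesheets/tei. The Python sketch below is not that stylesheet; it is a hypothetical re-expression of the same idea using the standard library's ElementTree, assuming (as in the example markup) that @next and @prev hold same-document "#id" pointers. The names join_linked_divs and local_name are illustrative.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id


def local_name(tag):
    """Drop any '{namespace}' prefix so TEI with or without a namespace works."""
    return tag.rsplit("}", 1)[-1]


def join_linked_divs(root):
    """Merge each chain of divs linked by @next into the chain's first div.

    For every continuation div, an empty <anchor> carrying its xml:id is
    appended to the first div, followed by the continuation's children;
    the continuation div is then removed from the tree.
    """
    parent_of = {child: parent for parent in root.iter() for child in parent}
    divs = {d.get(XML_ID): d for d in root.iter()
            if local_name(d.tag) == "div" and d.get(XML_ID)}
    heads = [d for d in divs.values() if d.get("next") and not d.get("prev")]
    for head in heads:
        link = head.get("next")
        while link:
            target_id = link.lstrip("#")
            target = divs.get(target_id)
            # Stop on dangling pointers, cycles, or divs already merged.
            if target is None or target is head or target not in parent_of:
                break
            ns = target.tag[: -len(local_name(target.tag))]  # "{uri}" or ""
            head.append(ET.Element(ns + "anchor", {XML_ID: target_id}))
            moved = list(target)
            for child in moved:
                target.remove(child)
                parent_of[child] = head
            head.extend(moved)
            parent_of.pop(target).remove(target)
            link = target.get("next")  # follow longer chains
    return root
```

Applied to the input markup from the example, the sketch leaves div_1 holding the first paragraph, an anchor carrying the id div_2, and the continuation paragraph, with the incidental div untouched. Keeping the id on an anchor means existing references to #div_2 still resolve after the merge.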

The processing sequence outlined above allows a designer to shape the structure of output web pages by putting HTML directly into the template XML files. They don't need to know anything about writing XSLT templates, because they never need to edit an XSLT stylesheet. An inheritance system based on named blocks means that all parts of a page can be customized, at different levels of granularity. Each template file has <kiln:parent> as its first element, with an XInclude child that brings in the base.xml template. That template is a hierarchical structure of named <kiln:block> elements which by itself supplies all the necessary elements of a web page. The calling template (tei.xml in our example) declares named <kiln:block>s, each of which overrides part or all of the equivalent <kiln:block> in base.xml. If a named <kiln:block> in the calling template has <kiln:super> as its first child element, all the content and functionality of the corresponding named block in the parent template is imported at that point, and the user can then add further elements to extend it. With this mechanism, overriding can occur at any level, from a very granular block all the way up to the top-level <kiln:block name="html">. This means the user can easily create a page that looks different in almost every way from their other pages but is still a regular page as far as the framework is concerned.
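The block-inheritance rules can be modelled in a few lines. The Python sketch below is an illustration under stated assumptions, not inherit-template.xsl: it uses a made-up namespace URI for the kiln prefix, takes the parent template as a string instead of resolving an XInclude of base.xml, and ignores text placed directly inside blocks. A block overridden in the child template is replaced by the child's content; if that content starts with super, the parent's version of the block is kept and the child's elements are appended after it. The names render and inherit are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the kiln: prefix; Kiln's real URI may differ.
K = "{http://example.org/kiln-sketch}"


def render(node, overrides):
    """Return the output elements that `node` produces under `overrides`."""
    if node.tag == K + "block":
        # The parent's own version of this block, with nested overrides applied.
        inherited = [out for child in node for out in render(child, overrides)]
        override = overrides.get(node.get("name"))
        if override is None:
            return inherited  # not overridden: keep the parent's content
        kids = list(override)
        result = []
        if kids and kids[0].tag == K + "super":
            result.extend(inherited)  # <super/> first: keep parent content...
            kids = kids[1:]           # ...then append the child's elements
        for child in kids:
            result.extend(render(child, overrides))
        return result
    # Ordinary (HTML) element: copy it and render its children.
    clone = ET.Element(node.tag, node.attrib)
    clone.text, clone.tail = node.text, node.tail
    for child in node:
        clone.extend(render(child, overrides))
    return [clone]


def inherit(parent_xml, child_xml):
    """Apply the child template's named blocks to the parent template."""
    parent = ET.fromstring(parent_xml)
    child = ET.fromstring(child_xml)
    overrides = {b.get("name"): b for b in child.iter(K + "block")}
    return render(parent, overrides)
```

With a parent whose top-level block is named html, inherit returns a one-element list holding the finished html element; blocks the child template does not mention keep the parent's content unchanged.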

Kiln introspection

Another distinctive feature of Kiln is the ability to inspect its back-end workings via the front end. Browser tools such as Firebug can give a lot of information about the current HTML page but do not usually reveal how the page got that way. Non-technical site owners normally have only a limited grasp of the back-end workings, and on a complex site with multiple sitemap and pipeline files in play, even developers can find it tedious to identify the templates and stylesheets responsible for producing a particular page. As a development and debugging aid, Kiln lets users view relevant aspects of the processing mechanism via three different access routes:

  • Match for URL: in a search field the user enters a root-relative URL from the site (e.g. text/myfile.html), and the search returns the associated sitemap template.

  • Match by ID: the user is given a list of sitemap template identifier strings (e.g. "local-tei-display-html") and clicks on a name to see the template.

  • Templates by file name: the user is given a list of XML template files (e.g. tei.xml) and can click on a link to see (via view source) the corresponding XSLT stylesheet.
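Match for URL relies on the same first-match wildcard lookup that the sitemap performs when handling a request. The Python sketch below mimics it, assuming Cocoon's usual wildcard semantics ('*' matches within a single path segment, '**' also matches across '/'; escape sequences are not handled). The list of (id, pattern) pairs stands in for the map:match elements Kiln would read from its sitemaps; apart from local-tei-display-html, which appears in the text, the ids and function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a Cocoon-style wildcard pattern into a regular expression.

    '**' matches any run of characters including '/', '*' matches any run
    of characters except '/'; each wildcard becomes a numbered group, the
    counterpart of {1}, {2}... in a sitemap.
    """
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append("(.*)")
            i += 2
        elif pattern[i] == "*":
            parts.append("([^/]*)")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def match_for_url(url, matches):
    """Return (match_id, groups) for the first pattern that fits url, else None.

    Order matters: as in a sitemap, the first matching entry wins.
    """
    for match_id, pattern in matches:
        m = wildcard_to_regex(pattern).fullmatch(url)
        if m:
            return match_id, m.groups()
    return None
```

For instance, with a pattern list containing ("local-tei-display-html", "texts/**.html"), the URL texts/myfile.html yields that id with "myfile" as group 1, which is the value a sitemap would substitute for {1}.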

RDF / Linked Open Data

Kiln includes the Sesame[9] framework for processing and handling RDF data. The framework is composed of two web applications: a server (openrdf-sesame) that stores, parses and performs inference over RDF data, and a client (openrdf-workbench) for making queries over the data.
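Because the Sesame server exposes each repository over HTTP, a query can be sent by any HTTP client, not only by the workbench. The Python sketch below builds (but does not send) a SPARQL request following the Sesame HTTP protocol, in which a repository is addressed at /repositories/<id> and the query travels in the query parameter. The host, port and repository id "kiln" are assumptions about a local installation, and sparql_request is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed location of the Sesame server web application (openrdf-sesame).
SESAME = "http://localhost:9999/openrdf-sesame"


def sparql_request(repository, query):
    """Build, without sending, a GET request for a SPARQL query.

    The repository is addressed as /repositories/<id>, the query is passed
    in the 'query' parameter, and the Accept header selects the result format.
    """
    url = f"{SESAME}/repositories/{repository}?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Passing the request to urllib.request.urlopen against a running server would return SPARQL results as JSON. Kiln's own query pipelines go through the operations defined in kiln/sitemaps/sesame.xmap instead.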

Sesame is built into Kiln via a set of Cocoon pipelines. By default, the sitemap file ROOT/sitemaps/rdf.xmap contains pipelines for generating RDF and adding it to the Sesame store, and pipelines for making SPARQL queries; these use the basic operations (adding to, removing from, and querying the triple store) defined in the internal sitemap kiln/sitemaps/sesame.xmap. Because RDF requirements differ greatly between projects, the default XSLT for adding content to the triple store is essentially a placeholder, meant to be extended and customised as required. The Kiln tutorial[10] includes a sample XSLT for converting TEI documents into RDF statements, which can serve as a guide for further customisation work.
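To give a feel for what a customised TEI-to-RDF step produces, the Python sketch below emits N-Triples from a TEI header. It stands in for the kind of XSLT the tutorial describes and is not that sample: the base URI http://example.org/texts/ is a placeholder, and mapping titleStmt/title and titleStmt/author to the Dublin Core terms title and creator is just one plausible vocabulary choice. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
DCTERMS = "http://purl.org/dc/terms/"
BASE_URI = "http://example.org/texts/"  # placeholder: a project would use its own URIs

# (TEI path, Dublin Core property) pairs: an illustrative mapping only.
MAPPING = [
    ("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", "title"),
    ("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author", "creator"),
]


def nt_literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        value = value.replace(raw, esc)  # backslash first, so later escapes survive
    return f'"{value}"'


def tei_to_ntriples(doc_id, tei_xml):
    """Return N-Triples statements describing one TEI document."""
    root = ET.fromstring(tei_xml)
    subject = f"<{BASE_URI}{doc_id}>"
    lines = []
    for path, prop in MAPPING:
        for el in root.findall(path, TEI_NS):
            text = " ".join("".join(el.itertext()).split())  # normalise whitespace
            if text:
                lines.append(f"{subject} <{DCTERMS}{prop}> {nt_literal(text)} .")
    return "\n".join(lines)
```

The resulting text could be posted to a Sesame repository's statements resource; in Kiln that step is handled by the pipelines in ROOT/sitemaps/rdf.xmap.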

The main reasons for including an RDF framework in Kiln were to promote the publishing of linked data and to increase interoperability between projects implemented with Kiln.

Future for Kiln

When CCH staff first began to shape xMod, they did so to meet specific needs which they felt were not being met by any free, open-source XML-to-webpage application available at the time. For the technically competent willing to 'get their hands dirty' on the server side there was Apache's AxKit[11], but this mod_perl module was not designed with convenience for non-specialists in mind (see, for example, Eric Morgan's account of trying to use it in Morgan 2005). The most similar framework in the digital humanities field, the California Digital Library's eXtensible Text Framework (XTF)[12], was likewise in its infancy at the time. Other existing frameworks such as TUSTEP ("TUebingen System of Text Processing Programs")[13] were more specialist in focus, designed to help scholarly editors produce editions, and lacked the same concern with building a website from XML-encoded source files. Today the landscape is somewhat different, with more applications available that are designed and documented with the non-technical user in mind.[14]

Most of the development work that produced Kiln in its current form was done with grant funding that ended two years ago. However, KDL still allocates time to maintaining and further developing Kiln. Plans for the future include possibly replacing Cocoon, because of the lack of active development in Cocoon and the direction in which it is currently heading: its build process has become much more complicated in later versions, and it would not be possible to package a default version for use in Kiln without off-loading technical work to users. One possible step would be to use XProc to handle the XML pipeline operations currently performed by Cocoon, but this would require extensive changes to the codebase for which (in the immediate future at least) KDL does not have resources to spare.[15] We would also like to explore modular extensions that the user could easily 'make live' and that would orient Kiln's functionality towards a particular content type, for example source files encoded according to the EpiDoc[16] guidelines. Tighter out-of-the-box integration with CMS frameworks is also desirable, as most project websites include information pages, image galleries, etc. that are often more conveniently handled by such frameworks. Whatever the future holds, Kiln remains the most important tool that KDL uses to build XML-based online resources.

References

[Morgan 2005] Morgan, Eric Lease. "Creating and managing XML with open-source software." Library Hi Tech 23, no. 4 (2005): 526-540. doi:10.1108/07378830510636328.



[2] The name change reflected a major rewrite of the code. 'Kiln' was chosen to call to mind a container into which 'raw' source materials go and from which, after processing, 'finished' materials emerge.

[3] A list of some of these projects is available at http://kiln.readthedocs.org/en/latest/projects.html. Note that, due to the nature of humanities grant funding, a majority of these project sites remain as they were at the point of launch. KDL can undertake to keep sites running for an agreed period (usually five years from the end of the funding period) and to fix bugs caused by system updates or upgrades, but upgrading a project site to use a later version of Kiln usually depends on the project partners acquiring extra funding.

[15] Kiln is, however, open source and interested parties are welcome to contribute to the codebase.

Author's keywords for this paper: Cocoon pipelines; templating; XSLT

Paul Caton

Research Analyst, King's Digital Laboratory

King's College, London

Paul Caton has worked in digital humanities for two decades. Beginning as Electronic Publications Editor for the Women Writers Project, he went on to hold posts with the TEXTE Project at the National University of Ireland, Galway and with the INKE Project at the University of Victoria in British Columbia before joining the Centre for Computing in the Humanities at King's College, London in 2010. Now a Research Analyst in the recently formed King's Digital Laboratory, he works on multiple projects in both analytical and development roles. His research interests include the representation of text by formal models and by markup languages; ontologies of personal relations; and models of transcription.

Miguel Vieira

Software Engineer, King's Digital Laboratory

King's College, London

Miguel Vieira is the Kiln project manager and one of its developers. He has worked in the digital humanities as a developer and software engineer for more than ten years. He is currently a software engineer and technical coordinator at the recently formed King's Digital Laboratory, where he is responsible for the technical architecture of research projects and for managing the development team. His research interests include analysing and modelling humanities and unstructured data, natural language processing, machine learning, data visualisation, and linked data.