Web-Based Job Aids Using RESTXQ and JavaScript for Authors of XML Documents
Balisage: The Markup Conference 2016
August 2 - 5, 2016
Using XML as the source format for authoring technical publications creates
opportunities to develop tools that provide analysis, author guidance, and visualization.
This case study describes two web applications that take advantage of the XML source format
of documents. The applications provide a browser-based tool for technical writers and
editors in a 100-person documentation department of a software company. Compared to desktop
tools, the web applications are more convenient for users and less affected by
hard-to-predict inconsistencies among users' computers. One application analyzes file
dependencies and produces custom reports that facilitate reorganizing files. The other helps
authors visualize their network of topics in their documentation sets.
Both applications rely on the XQuery language and its RESTXQ web API. The visualization
application also uses JavaScript, including the powerful jQuery and D3 libraries. After
discussing what the applications do and why, this paper describes some architectural
highlights, including how the different technologies fit together and exchange data.
Amanda Galtman
Amanda Galtman develops XML markup designs, quality-checking tools, and other
XML-based tools for MathWorks software documentation. She integrated XQuery and RESTXQ
into the Documentation Department's tool chain. She was previously a technical writer at
MathWorks.
Documentation Tool Developer
MathWorks
Amanda.Galtman@mathworks.com
Copyright © 2016 The MathWorks, Inc.
Keywords: XQuery, RESTXQ, topic-based authoring, topic network, authoring job aids
Making Life Easier for Authors
Publishing organizations have known for years that structured markup languages can
facilitate processing of documents into final deliverables. Historically, a big challenge has
been making the markup palatable to authors. In two specific browser-based applications that
take advantage of the XML format, authors have come to see XML as an ally, and not just a
necessary evil.
The applications analyze dependencies among files and topics in potentially large sets of
documentation topics, and they deliver relevant information to authors using a web browser.
The technologies behind the applications are XQuery [], its RESTXQ
web API [], [], JavaScript, and popular
JavaScript libraries. The applications described here are used in a 100-person documentation
department and are not publicly available. However, the underlying technologies are available
to you. This paper shows you enough about possibilities and techniques to inspire you to
create your own web applications that combine powerful XML data processing with user-friendly,
robust, and potentially dynamic interfaces.
Application 1: Reporting on File Reorganization
The first application helps authors move XML files and associated graphics files within a
file system, without breaking dependencies and without violating business rules of the
company. It is more cost-effective and less frustrating to make the right set of changes once
than to iteratively break and fix things. In 2015, a change in business rules caused the
documentation department to move several hundred files that were connected to hundreds of
other files through file references. The volume of work and the potential complexity of
dependencies meant that having authors manually analyze dependencies would be inefficient and
error-prone.
At the other extreme, designing and implementing a push-button tool that analyzed
dependencies, moved files, and interacted with the software configuration management system
would have required more tool development time than was available. To address the core need
within a reasonable time period, we chose the middle ground and developed a web application
that analyzes dependencies and provides guidance. One manager said that her group found the
guidance helpful: "Quite a few people were a bit concerned about this work but, once they
followed the instructions, they got it done very quickly." Over 90% of the file moves were
completed in less than a year.
Web Form and Step-by-Step Procedure
To use this application, authors submit a form in a web browser that indicates the file
to move, the new location, and which of two computing clusters they use. In the browser, the
application displays a report describing a procedure for moving the file and addressing all
its dependencies. The following sample report indicates in step 4 that moving the XML file
necessitates moving its referenced graphics files to comply with business rules. In step 5,
the report indicates which references to the relocated graphics files also need to be
updated in other XML source files.
The web interface was more appealing than alternatives for two reasons:
Convenience for authors. They do not need to install anything new or look up a
command-line syntax.
Timing. When we developed this application, the department was planning to switch
XML editors. A tool that ran only in the old editor would have become obsolete quickly.
By contrast, a web application is independent of the XML editor.
The next three sections discuss the technical details of how RESTXQ turns XQuery
functions into web applications, ways to separate analysis code from formatting, and unit
testing techniques.
RESTXQ Usage in XQuery Module
The file reorganization application is a simple web application, with a single form that
leads to a static report. This application was a good way to start exploring the RESTXQ web
API, building on our group's prior experience with XQuery. RESTXQ is a good fit for the
technical requirements of this application, providing a way to make a web browser call
XQuery code when the author accesses a URL associated with the code. The XQuery code can
access XML files in a file system, or in a database compatible with the RESTXQ
implementation. In this case, there was already a BaseX database associated with the XML
source files.
Using the BaseX [] engine for the XQuery language, we developed a
single module file that defines a series of XQuery functions. Here is how the functions
correspond to the author's actions in the browser:
The author points the web browser at a specific URL that ends with move-xml. A RESTXQ
annotation, %rest:path("move-xml"), in the declaration of a particular XQuery function in
the module tells BaseX to invoke that function []. That function returns the HTML markup
for the form, and the browser renders it.
The author submits the form, causing BaseX to invoke a different XQuery function in
the module. This function returns the HTML markup for the report, including text content
tailored to the particular file that the author wants to move. In the course of
computing the correct HTML markup, the function calls other functions, as described
next.
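To make the mapping concrete, here is a minimal JavaScript analogue of what the RESTXQ annotations accomplish: a table from URL paths to functions that return HTML markup. This is not how BaseX implements RESTXQ. The path move-xml-report, the form field names (file, newLocation, cluster), and the cluster options are hypothetical.

```javascript
// A toy analogue of RESTXQ dispatch: each path maps to a function that
// returns HTML markup for the browser to render. The path "move-xml-report",
// the form field names, and the cluster options are hypothetical.

// Escape text before embedding it in HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const routes = {
  // Plays the role of the function annotated with %rest:path("move-xml").
  "move-xml": () =>
    '<form action="move-xml-report">' +
    '<input name="file"/><input name="newLocation"/>' +
    '<select name="cluster"><option>A</option><option>B</option></select>' +
    "<button>Submit</button></form>",
  // Plays the role of the function that builds the report from the form data.
  "move-xml-report": (params) =>
    "<h1>Moving " + escapeHtml(params.file) + " to " +
    escapeHtml(params.newLocation) + "</h1>",
};

// Look up the handler for a path, as the XQuery engine does for an annotated function.
function dispatch(path, params = {}) {
  const handler = routes[path];
  return handler ? handler(params) : null; // null stands in for an HTTP 404
}

console.log(dispatch("move-xml-report", { file: "a.xml", newLocation: "new/dir" }));
// prints: <h1>Moving a.xml to new/dir</h1>
```

In the real application, the declarative annotation replaces the explicit table: BaseX builds the mapping from the %rest:path annotations in the module.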
We use the same BaseX database to do bulk reporting on file dependencies, chiefly to
help identify files that are shared or unused. Instead of using a web application for the
task, the bulk reporting uses cron jobs that create Excel reports. We chose this approach
because analyzing the whole file set is time-consuming and because the requirements for bulk
reporting do not make a compelling argument for a web application.
Separation of Analysis and Formatting
If you work with XML documents or HTML5, you are familiar with the idea of separating
semantics or content from presentation or styling. By design, the XQuery module for this
application mostly separates the data analysis, such as looking for file dependencies in the
XML documents, from the formation of HTML markup for the browser. Separating these concerns
makes it easier to write unit tests for the analysis code and easier to change the
appearance or organization of the HTML report.
For example, one step in the report instructs authors to update XInclude references from
the XML file they are moving to other XML files. Behind this step are these XQuery
functions:
filemove:xincludes-within-file finds the XInclude references and returns their href
attribute values as a sequence of strings.
filemove:html-xincludes-within-file takes the sequence of strings as an input argument and
returns a sequence of HTML <p> and <ol> elements that the author ultimately sees in the
browser. The prefix html- in the function name is a naming convention throughout this
XQuery module.
The unit test module has these corresponding test functions:
test:xincludes-within-file tests that filemove:xincludes-within-file identifies the correct
XInclude references and returns the results in the expected data structure.
test:html-xincludes-within-file tests that filemove:html-xincludes-within-file produces the
expected HTML markup for different input patterns, including the case where there are no
XInclude references to report.
Alternate Approach Using XSLT
An alternate way to achieve the desired separation would be to make the XQuery module
produce not HTML markup, but rather intermediate XML markup that links to an associated
XSLT stylesheet. The author's web browser would receive the XML markup, apply the XSLT
stylesheet to it, and render the resulting HTML markup. This approach would include
additional components in the implementation: a design for the intermediate XML markup, the
XSLT stylesheet, and potentially an XSpec unit test file for the XSLT. Analysis functions
in the XQuery module would return data as XML markup fragments instead of as sequences or
maps.
Unit Testing for Analysis Functions
Multiple unit testing frameworks exist for XQuery, some of which are vendor-specific
[][][]. Because we use BaseX to develop and deploy the web
application, we use the BaseX unit testing module for testing []. We have unit tests for all the analysis functions and most other functions. The unit
tests reside in a separate XQuery module that imports the main XQuery module for the file
reorganization application.
For some functions, unit testing is straightforward: you call the function with
representative but simple input arguments and use unit:assert() or unit:assert-equals() to
check that the function returns an expected result.
For other functions, we use a set of XML files with known dependencies on other XML
files to check that the analysis functions find the correct set of dependencies. Basing tests
on production XML documents would make the tests unstable because the XML documents change
over time. Instead, we use a small set of test documents. The test module includes a setup
function that creates a BaseX database containing the test documents. In the declaration of
this setup function:
The %unit:before-module annotation makes BaseX call this function before executing the test
cases in the module.
The %updating annotation indicates that the function updates a database []. In this case,
the function creates a database using the BaseX function db:create [].
After the setup function executes, unit tests can use the testing database when calling
the functions under test.
Application 2: Visualizing a Network of Linked Topics
The second application for XML authors helps them visualize the topic network in their
DocBook documentation sets. A topic network is a collection of HTML topics and the hyperlinks
that interconnect those topics. Inter-topic links make it easy for customers to consume a
collection of related HTML topics. In Every Page Is Page One, Mark Baker uses the phrase
"link richly" to describe one of seven fundamental characteristics of effective web topics
[].
Given the importance of links, we wanted to visualize the links among topics in a way that
would help authors gain insight into the topic networks they have created. We already have
ways to create links, modify links, and navigate via links, but nothing visual. This diagram
uses boxes to represent topics and arrows to represent the links. Solid arrows connect
topics that are shown, while dashed arrows point to topics that are not shown.
One challenge is that links and topics in realistic topic networks are too numerous to
represent using typical directed-graph diagrams. We are only in the early stages of designing
effective, readable visualizations of topic networks. At the same time, we have learned useful
technical points that are worth sharing here. We describe a way to combine XQuery with
JavaScript in a web application that graphically depicts characteristics of XML documents and
supports iterative exploration.
Small Multiples and Iterative Behavior
The current prototype of this application uses the concept of small
multiples [] to depict author-coded links among topics
as geometric icons. Icons in the application vary in their characteristics: shape, color,
and fill. The icon characteristics depend on:
The information type of the link target
Whether the link goes from one set of product documentation to another
Whether the link is located in the topic body or in a designated XML link container,
typically labeled "See Also" in our HTML deliverables
The following legend shows the combinations of icon characteristics that can occur and
what they mean.
For example, the following row of icons shows that authors coded 25 links in a certain
topic.
The fill, shape, and color of the icons indicate that, of those 25 links:
Seven are located in the See Also section and point to reference topics in the same
product.
Two are located in the See Also section and point to task topics in the same
product.
Twelve are located in the topic body and point to reference topics in the same
product.
One is located in the topic body and points to a task topic in the same
product.
Three are located in the topic body and point to reference topics in other
products.
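The tally in this example can be reproduced mechanically. The JavaScript sketch below groups a topic's links by the three characteristics and counts each combination. The property names (inSeeAlso, targetType, crossProduct) and the assignment of characteristics to fill, shape, and color are assumptions for illustration; the legend defines the application's actual encoding.

```javascript
// Hypothetical icon encoding: fill from link location, shape from the
// information type of the target (non-reference types share one shape here),
// and color from whether the link crosses product boundaries.
function iconFor(link) {
  return {
    fill: link.inSeeAlso ? "solid" : "hollow",
    shape: link.targetType === "reference" ? "square" : "circle",
    color: link.crossProduct ? "orange" : "blue",
  };
}

// Count links by combination of icon characteristics.
function tally(links) {
  const counts = new Map();
  for (const link of links) {
    const key = JSON.stringify(iconFor(link));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Build the 25 links from the example: 7 + 2 + 12 + 1 + 3.
function repeat(n, link) {
  return Array.from({ length: n }, () => ({ ...link }));
}
const links = [
  ...repeat(7, { inSeeAlso: true, targetType: "reference", crossProduct: false }),
  ...repeat(2, { inSeeAlso: true, targetType: "task", crossProduct: false }),
  ...repeat(12, { inSeeAlso: false, targetType: "reference", crossProduct: false }),
  ...repeat(1, { inSeeAlso: false, targetType: "task", crossProduct: false }),
  ...repeat(3, { inSeeAlso: false, targetType: "reference", crossProduct: true }),
];

const counts = tally(links);
console.log(links.length, counts.size); // prints: 25 5
```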
The idea is that in the web application, authors can start with a topic of interest,
click Submit to display the set of icons that depict topic links, and use the depiction to
learn about topic connectivity in the network of topics. They can hover over any icon to see
a tooltip that indicates which topic the icon represents. They can click an icon to display
the corresponding set of icons for that topic, as a new row at the bottom. By iterating,
they can follow an end user's possible clickstream in the HTML deliverables or cover a
desired subset of their topics. Using their knowledge of the subject matter and relative
importance of each topic, they can assess whether the connectivity characteristics of a
given topic are appropriate. After iterating a few times, the results might look like
this:
This application traverses a large set of XML documents at the back end, while
supporting graphics and dynamic behavior at the front end. Using a web-based front end lets
us take advantage of third-party JavaScript libraries.
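The iterative exploration described above amounts to a simple piece of front-end state: a list of rows, where clicking an icon appends a row for that icon's target topic. The JavaScript sketch below models only that state. The network object, the topic IDs, and the functions startExploration and clickIcon are hypothetical; in the application, the link data for each new row comes from XQuery, as the next sections describe.

```javascript
// Hypothetical topic network: each topic ID maps to the IDs of topics it links to.
const network = {
  datastore: ["mapreduce", "tall"],
  mapreduce: ["datastore"],
  tall: [],
};

// Each row holds a topic and the links it contains.
function startExploration(topic, linksOf) {
  return [{ topic, links: linksOf(topic) }];
}

// Clicking an icon appends a new row, at the bottom, for that link's target.
function clickIcon(rows, rowIndex, iconIndex, linksOf) {
  const target = rows[rowIndex].links[iconIndex];
  return [...rows, { topic: target, links: linksOf(target) }];
}

const linksOf = (topic) => network[topic] || [];
let rows = startExploration("datastore", linksOf);
rows = clickIcon(rows, 0, 0, linksOf); // follow the link to "mapreduce"
rows = clickIcon(rows, 1, 0, linksOf); // then follow its link back to "datastore"
console.log(rows.map((row) => row.topic)); // prints: [ 'datastore', 'mapreduce', 'datastore' ]
```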
Boundaries Between XQuery and JavaScript
The application relies on XQuery to find links in the XML documentation and identify
their characteristics. We integrated the D3 library, a popular JavaScript library for
visualizing data and manipulating documents based on data [], with
XQuery in a single web application. We also simplified the code using the jQuery JavaScript
library []. In this application, JavaScript is the main driver, and
it interacts with XQuery at these discrete points:
When the author submits a form to indicate a starting topic of interest, the jQuery-based
callback function calls XQuery to gather data about the links in that topic.
After XQuery gathers data from the XML documents, JavaScript depicts the data
graphically using the capabilities of D3.
When the author clicks an icon that has a JavaScript callback function, XQuery
gathers new data about links. The callback for the clickable icons is similar to the
callback for the submission button in the form.
This diagram summarizes how JavaScript, XQuery, and the author's mouse operations
transfer control back and forth in this application.
Calling from JavaScript to XQuery
One way to call XQuery from JavaScript is to use the AJAX .load() method in jQuery []. When
this method accesses a URL associated with a RESTXQ path, the XQuery engine invokes the
XQuery function that maps to that URL. The XQuery function returns markup. The .load()
method regains control and inserts the returned markup into the page.
For example, suppose the HTML page contains the following markup.
<div class="xmlcontainer">
<!-- XQuery will populate this part of the page. -->
</div>
The following JavaScript code selects the <div> element using a jQuery selector and loads a
URL.
$(".xmlcontainer")
.load("my-restxq-path/mydocID/mytargetID");
Through the RESTXQ path "my-restxq-path/mydocID/mytargetID", the .load() method calls the
XQuery function associated with that path. The XQuery module contains a function similar to
the next excerpt, which receives two function arguments via the {$docID} and {$targetID}
path templates in the %rest:path annotation.
declare
%rest:path("my-restxq-path/{$docID}/{$targetID}")
function linkvis:linkinfo(
$docID as xs:string,
$targetID as xs:string)
as element()
{
(: Body of XQuery function goes here :)
};
When JavaScript regains control, it inserts the return value of the XQuery function into
the HTML <div> element.
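The binding of path segments to function arguments can be modeled in a few lines of JavaScript. This is a simplified sketch of the idea, not the RESTXQ matching algorithm: it handles only templates whose variables occupy whole path segments, and the function name matchPathTemplate is hypothetical.

```javascript
// Match a URL path against a RESTXQ-style template such as
// "my-restxq-path/{$docID}/{$targetID}". Returns an object that maps
// variable names to segment values, or null if the path does not match.
function matchPathTemplate(template, path) {
  const templateSegments = template.split("/");
  const pathSegments = path.split("/");
  if (templateSegments.length !== pathSegments.length) return null;

  const bindings = {};
  for (let i = 0; i < templateSegments.length; i++) {
    const variable = /^\{\$([A-Za-z_][\w.-]*)\}$/.exec(templateSegments[i]);
    if (variable) {
      // A template variable captures the whole segment, URL-decoded.
      bindings[variable[1]] = decodeURIComponent(pathSegments[i]);
    } else if (templateSegments[i] !== pathSegments[i]) {
      return null; // A literal segment must match exactly.
    }
  }
  return bindings;
}

console.log(matchPathTemplate(
  "my-restxq-path/{$docID}/{$targetID}",
  "my-restxq-path/mydocID/mytargetID"
)); // prints: { docID: 'mydocID', targetID: 'mytargetID' }
```

The resulting bindings correspond to the $docID and $targetID arguments that linkvis:linkinfo receives.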
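If further JavaScript processing is necessary after the XQuery function has finished,
the extra JavaScript code can go in a callback function that becomes the third argument of
.load(). The callback executes after the AJAX request completes and the returned markup has
been inserted into the page. Because the second argument in this example is an object,
jQuery sends the request with the POST method; if you pass the callback as the second
argument and omit the data, jQuery uses GET.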
$(".xmlcontainer")
.load("my-restxq-path/mydocID/mytargetID", {}, function() {
// JavaScript code to execute after XQuery function finishes
});
Passing Data from XQuery to JavaScript
In this case, the return value of the XQuery function is not the
HTML markup for the graphics we want the application to display. It is the job of D3 to
create that HTML markup. XQuery gathers data about links in documents and passes the data
back to JavaScript, enabling D3 to create HTML markup based on the data.
Script as an XQuery Return Value
The JavaScript portion of the application needs variables that contain data computed
by XQuery. To define these variables, the XQuery function invoked from JavaScript returns a
<script> element. This element contains inline JavaScript code that defines the desired
variables, as shown here. (The application uses additional properties, but for simplicity,
this code shows only three properties per array object.)
<script>
var mydata =
[
{
"docID":"import_export",
"targetID":"buh9wft",
"topictitle":"Getting Started with Datastore"
},
{
"docID":"import_export",
"targetID":"bhg4l7w-1",
"topictitle":"Getting Started with MapReduce"
}
];
</script>
XQuery code to produce this return value takes this form:
<script>{
"var mydata = " ||
serialize(arguments for serializing data...)
|| ";"
}</script>
The following larger XQuery fragment puts <script> in the context of the XQuery function
that the JavaScript .load() method invokes.
declare
%rest:path("my-restxq-path/{$docID}/{$targetID}")
function linkvis:linkinfo(
$docID as xs:string,
$targetID as xs:string)
as element()
{
(: let clauses with XQuery computations go here; the return clause below ends the FLWOR expression. :)
return
<script>{
"var mydata = " ||
serialize(arguments for serializing data...)
|| ";"
}</script>
};
Constructing JavaScript code by concatenating XQuery strings can be inconvenient and
error-prone. We minimized the amount of code we constructed in this way.
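One pitfall of this concatenation is that the serialized data lands inside an HTML <script> element, so a string such as a topic title containing </script> would end the element early. The JavaScript sketch below shows a general safeguard using JSON.stringify: escaping < as \u003c, which is still a valid JSON and JavaScript string escape. It is not the paper's code; in XQuery, the analogous step would be a string replacement on the serialized JSON. The function name scriptDefiningVariable and the second data record are hypothetical.

```javascript
// Build the text of a <script> element that defines a JavaScript variable
// holding JSON data. Escaping "<" as \u003c keeps a value such as
// "</script>" from closing the element; JavaScript decodes it back to "<".
function scriptDefiningVariable(name, data) {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return "<script>var " + name + " = " + json + ";</script>";
}

const markup = scriptDefiningVariable("mydata", [
  { docID: "import_export", targetID: "buh9wft", topictitle: "Getting Started with Datastore" },
  { docID: "demo", targetID: "t1", topictitle: "Ending </script> early" },
]);

// The only "</script>" left in the markup is the closing tag.
console.log(markup.indexOf("</script>") === markup.length - "</script>".length); // prints: true
```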
JSON Serialization
We used serialize(...) in the last program listing. JSON is a convenient notation for
JavaScript to receive, and XQuery can produce data in JSON format. The W3C XPath and XQuery
Functions and Operators 3.1 candidate recommendation outlines how to serialize data as JSON
[]. Although this recommendation is not final, some XQuery engine vendors support JSON
serialization as the candidate recommendation describes. In BaseX, XQuery code such as the
following produces a JSON array like the earlier sample.
serialize(
<json type="array">{
for $thislink in $linkseq (: Define $linkseq elsewhere :)
return
<_ type="object">
<docID>{ (: Query logic, such as $thislink/@docID :) }</docID>
<targetID>{ (: Query logic... :) }</targetID>
<topictitle>{ (: Query logic... :) }</topictitle>
</_>
}</json>,
<output:serialization-parameters
xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
<output:method value="json"/>
</output:serialization-parameters>
)
Data Processing After JavaScript Regains Control
The processing described so far works as follows:
The JavaScript .load() method loads a RESTXQ URL, which transfers control to an XQuery
function.
The XQuery function computes data and returns a <script> element whose inline JavaScript
code defines a variable that stores the data.
JavaScript regains control and executes the code inside <script>.
JavaScript invokes the callback function specified as the third argument in .load(). The
callback function has access to the variable mydata.
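The order of these steps can be simulated outside the browser. In the Node.js sketch below, a context from the vm module stands in for the page's global scope: the script text returned by the server runs first, and then the callback reads the variable it defined. The response string, the function simulateLoad, and the regular expression that extracts the script are assumptions; in the application, jQuery performs this work with its own script handling.

```javascript
const vm = require("vm");

// Simulate steps 3 and 4: run the inline script from the returned markup in
// a shared global context, then invoke the callback, which can read mydata.
function simulateLoad(responseMarkup, callback) {
  const page = vm.createContext({}); // stands in for the browser's window
  const match = /<script>([\s\S]*?)<\/script>/.exec(responseMarkup);
  if (match) {
    vm.runInContext(match[1], page); // a top-level var becomes a global property
  }
  callback(page);
}

// Step 2: markup that an XQuery function might return (hypothetical data).
const response =
  '<script>var mydata = [{"docID":"import_export","targetID":"buh9wft"}];</script>';

simulateLoad(response, (page) => {
  console.log(page.mydata.length, page.mydata[0].targetID); // prints: 1 buh9wft
});
```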
Going one step further, the callback function can use the D3 .data() method to bind the
data in the mydata variable to graphical objects. Here is a JavaScript code fragment that
creates an SVG section of the page and inserts an array of <g> grouping elements, each of
which is associated with an array element in the mydata variable.
// Assumes rowHeight, labelWidth, and symbolclick are defined elsewhere.
// Start a new row of data.
var svg = d3.select(".new").append("svg")
.attr("height",rowHeight)
.attr("width","100%");
var symdiv = svg.append("g").attr("class","symdiv");
// Insert <g> elements with data bound to them.
var symbolgroups = symdiv.selectAll("g").data(mydata).enter()
.append("g")
.attr("transform","translate(" + labelWidth + ",0)")
.attr("class","gsymbols")
.on("click",symbolclick);
// Further code inserts symbols, labels, and tooltips.
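As written, the fragment gives every <g> element the same translation, so the horizontal spacing between icons presumably comes from the code that inserts the symbols. Another option is to compute each group's offset from its index, because D3's .attr() accepts a function that D3 calls with the bound datum and its index. The helper below is a hypothetical sketch of that calculation; iconSpacing is an assumed constant, and labelWidth plays the same role as in the fragment.

```javascript
// Return a function suitable for D3's .attr("transform", fn): D3 calls it
// with each bound datum d and its index i. Icons are laid out left to right,
// starting after the row label. iconSpacing is a hypothetical constant.
function iconTransform(labelWidth, iconSpacing) {
  return function (d, i) {
    return "translate(" + (labelWidth + i * iconSpacing) + ",0)";
  };
}

const transform = iconTransform(200, 24);
console.log([0, 1, 2].map((i) => transform(null, i)));
// prints: [ 'translate(200,0)', 'translate(224,0)', 'translate(248,0)' ]
```

In the fragment, this would replace the string argument: .attr("transform", iconTransform(labelWidth, iconSpacing)).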
Challenges
While web APIs for XQuery are not that new, it took the tools group a while to be able to
use them. We needed to wait until the company's IT infrastructure supported Java servlets. The
alternative — running XQuery applications from an individual's local Tomcat server environment
— was not appealing because of the dependency on that person's machine.
An additional challenge has to do with individual skill sets. The documentation tool
developers in the group who use XQuery are not web technology experts, while the web
technology experts in the larger organization do not use XQuery. We are working on this
challenge through collaboration and learning.
Conclusion
Over the past several years, XQuery without the web interface has proven its usefulness to
documentation tool developers. We are just beginning to explore the benefits of its web
interface, and we have found it useful for applications such as the ones described in this
paper. We are considering the use of the XQuery Update Facility [] within web applications to modify XML documents, instead of
just reading and analyzing them.
At the same time, modern authoring tools for XML documents let authors execute XQuery code
directly within the authoring environment. Depending on requirements, that capability could be
more convenient than a browser-based application. We would do a case-by-case evaluation of
functional designs for a given project, and we appreciate having XQuery web applications as a
viable option.
Bibliography
BaseX. The XML Database.
Baker, Mark. Every Page Is Page One. Laguna Hills, California: XML Press, 2013.
Bostock, Mike. D3.js - Data-Driven Documents.
Database Module - BaseX Documentation.
The jQuery Team. jQuery.
The jQuery Team. .load() | jQuery API Documentation.
Kay, Michael, ed. XPath and XQuery Functions and Operators 3.1. W3C Candidate Recommendation 18 December 2014.
RESTXQ - BaseX Documentation. http://docs.basex.org/wiki/RESTXQ.
Retter, Adam. RESTful XQuery. XML Prague 2012, Conference Proceedings, pp. 91-124.
Retter, Adam and Christian Grün. RESTXQ 1.0: RESTful Annotations for XQuery. Unofficial Draft 23 February 2015.
Robie, Jonathan et al., eds. XQuery 3.0: An XML Query Language. W3C Recommendation 8 April 2014.
Snelson, John and Jim Melton. XQuery Update Facility 3.0. W3C Last Call Working Draft 19 February 2015.
Tufte, Edward. Envisioning Information. Cheshire, Connecticut: Graphics Press, 1990.
Unit Module - BaseX Documentation. http://docs.basex.org/wiki/Unit_Module.
Vatsendvik, Knut. Unit Testing Framework for XQuery. https://blogs.oracle.com/knutvatsendvik/entry/unit_testing_framework_for_xquery.
Whitby, Rob. xray. https://github.com/robwhitby/xray.
xquery-unit. https://developer.marklogic.com/code/xquery-unit.