functional and declarative tree-transformational language XSLT, furnishes many advantages in design efficiency, robustness and potential codebase reuse. Over the past 14 years the XSLT language has evolved through three major versions, from an initial simple pattern-matching transformation mechanism to a complete (homoiconic) language with XML trees as the principal data type, functional properties, a large suite of suitable datatypes and extensive function libraries. Anyone working in document engineering using XML over that period would have tracked the changes in XSLT and often modified code to take advantage of the evolving functionality.
DDF) over many years (
dialects – standards, such as XSLT, SVG and parts of XSL-FO, and elements and attributive properties describing specialist layout and document-structure aspects.
parent-children geometric constraints, along with a system of single-assignment presentational variables.
within-paragraph text layout, or image size lookup, but these aren't germane to this paper – all main layout was processed through XSLT code.
declarative nature of the code. Specific examples include:
xsl:iterate to define the operation of a large pagination processor, rather than use of a small set of mutually recursive templates. (This will not be described in detail in this paper, but involves some 15+ choices to be made for each component to be allocated, and a couple of accumulated variables, such as next-y position and current floating and kept components.)
presentational variables (already laid-out sections of content that can be reused in a similar fashion to XSLT variables) as a map{} rather than a pair of stack-frame tunnelled variables.
reversing automatically some of these syntactic and semantic changes to accommodate running code that has migrated to XSLT 3.0 on a 2.0 platform, for which the original target was a browser-based Saxon-CE
backtracking through the source document in common design problems. Some of those were defined at the XSLT instruction level, such as xsl:stream, xsl:iterate and xsl:accumulator; others were declarations associated with streaming-related properties, such as xsl:mode; yet others were added to the XPath repertoire, such as map{}, and others to the function library (e.g. snapshot()).
xsl:try/xsl:catch)
xsl:evaluate).
let $x := value return ....).
compile-time, i.e. static, evaluation of stylesheets, such as shadow attributes and static declarations.
'||' being a string concatenation operator.
xsl:iterate in the standard, it is noted that: "There are two main reasons for providing the xsl:iterate instruction. One is that many XSLT users find writing recursive functions to be a difficult skill, and this construct promises to be easier to learn."
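As a minimal sketch of the trade-off the standard describes, the stylesheet below shows a running-total iteration (in the comment) beside a hand-written XSLT 2.0 recursive template that is intended to produce the same result. The element and attribute names (items, item, @price, running, total), the template name iterate.1 and the parameter name iterate.sequence are illustrative assumptions, not the output of any particular converter.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- XSLT 3.0 form being emulated (hypothetical input: items/item[@price]):
       <xsl:iterate select="item">
         <xsl:param name="total" as="xs:double" select="0"/>
         <xsl:on-completion>
           <total><xsl:value-of select="$total"/></total>
         </xsl:on-completion>
         <running><xsl:value-of select="$total + @price"/></running>
         <xsl:next-iteration>
           <xsl:with-param name="total" select="$total + @price"/>
         </xsl:next-iteration>
       </xsl:iterate>
  -->

  <xsl:template match="items">
    <totals>
      <!-- The iterate instruction becomes a call on a named template -->
      <xsl:call-template name="iterate.1">
        <xsl:with-param name="iterate.sequence" select="item"/>
        <xsl:with-param name="total" select="0"/>
      </xsl:call-template>
    </totals>
  </xsl:template>

  <xsl:template name="iterate.1">
    <xsl:param name="iterate.sequence" as="item()*"/>
    <xsl:param name="total" as="xs:double"/>
    <xsl:choose>
      <xsl:when test="empty($iterate.sequence)">
        <!-- Body of xsl:on-completion -->
        <total><xsl:value-of select="$total"/></total>
      </xsl:when>
      <xsl:otherwise>
        <!-- for-each over a singleton sets the context item for the body -->
        <xsl:for-each select="$iterate.sequence[1]">
          <running><xsl:value-of select="$total + @price"/></running>
          <!-- xsl:next-iteration becomes the recursive call -->
          <xsl:call-template name="iterate.1">
            <xsl:with-param name="iterate.sequence"
                            select="subsequence($iterate.sequence, 2)"/>
            <xsl:with-param name="total" select="$total + @price"/>
          </xsl:call-template>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The recursive form is not a perfect substitute: position() and last() inside the body no longer reflect the original selection, and very long input sequences may recurse deeply unless the processor optimises such calls.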
reasonable semi-automatic transformer can act as a stimulus to both developers and implementors, providing a means to write test and exemplar programs within 3.0 that themselves can be tested before full XSLT 3.0 processor implementations are available.
parent-children layouts, such as all-children-in-a-circle, could be added merely by defining the XPath function required to work out the positions and place each component of a sequence – all encapsulation and child-recursive evaluation was handled in common. Similarly the addition of maps simplified a stack-frame implementation of single-assignment presentational variables.) As mentioned earlier, the author's variable document architecture was used principally to publish a set of instances of a variable data document bound to a sequence of data instances. As such it can be considered to be a server-side operation. In 2013 the release of Saxon-CE as a fully-compliant
$some.var), then various transformations can be performed using regular expression substitution, avoiding the need to carry out complete XPath expression parsing and modification of parse trees – a relatively expensive operation in a complete XSLT system.
@xsl:use-when() with version conditions can be exceptionally helpful to accommodate such changes. (For example the use of a higher-order architecture for layout processing described earlier was conditionally retained for true 3.0 conditions, leaving 2.0 parallel code for the 2.0 case.)
xsl:try), multi-thread declaration (xsl:fork, xsl:merge), detailed serialization control, and so forth.
flattening packages into resultant non-packaged stylesheets.
xsl:try/catch and compile/run source line numbers will likely be highly errant.
outside the stylesheet, such as unusual entry invocations (initial-function, initial-context-item) are ignored.
by hand, by the reader.
loop:for, loop:while, loop:do, loop:last, loop:update) can be added to an XSLT stylesheet to define iterative constructs, which are then compiled by an XSLT transform (some 500 lines, of which about half is concerned with validation) into an equivalent using recursive named templates and parameters. As an example this re-working of a number-summing example from Michael Kay's XSLT Programmer's Reference:
function() item is represented by two XSLT-defined functions (a zero-arity to return the identifying element and the true-arity version to do the work) and a template that matches the unique template reference element and invokes the working function. Higher-order functions (e.g. foldl()) are written to use this route-via-template technique, and a complete set of higher-order features has been implemented. Note that this approach requires all function items to be created in XSLT space – XPath local functions (function($args) {....}) are not supportable directly. The library developed during the work, FXSL, is
unparsed-text(), etc.) and the additional function types in XSLT3.0. This means that in general it will
reversal in certain conditions. Those I have attempted so to engineer include:
xsl:mode
streamable, but also gave a very useful means to declare the default behaviour of that mode, i.e. what should happen in the absence of any other template matching a given node in push mode. (The default behaviour of text-only was too restrictive and many coding errors arose because of missing identity transforms for a named mode.)
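As a hedged illustration of what such a declaration saves: in XSLT 3.0 a single `<xsl:mode name="tidy" on-no-match="shallow-copy"/>` declares identity behaviour for the mode, whereas XSLT 2.0 needs an explicit fallback template in that mode. In the sketch below the mode name tidy, the example override for comments and the priority value -10 are assumptions chosen for illustration only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Stands in for: <xsl:mode name="tidy" on-no-match="shallow-copy"/>
       The low explicit priority keeps this rule beneath ordinary rules. -->
  <xsl:template match="@*|node()" mode="tidy" priority="-10">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="tidy"/>
    </xsl:copy>
  </xsl:template>

  <!-- An ordinary rule in the same mode: its default priority (-0.5)
       is still higher than -10, so it overrides the fallback. -->
  <xsl:template match="comment()" mode="tidy"/>

  <!-- Entry point: push the whole document through the mode -->
  <xsl:template match="/">
    <xsl:apply-templates mode="tidy"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to any document, this sketch should copy everything except comments. Omitting the fallback template would leave the 2.0 built-in rules in force for the mode, which output only text.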
built-in rules (text-only-copy) of XSLT 2.0. In practice with stylesheet importation this substitution needs to be placed below the lowest level of importation precedence as well, which suggests a specialist stylesheet, containing
xsl:apply-imports (rare in my experience) would complicate matters further. (Supporting the @on-multiple-match property is probably less useful and more dependent upon the XSLT implementation being employed.)
xsl:iterate
xsl:next-iteration, xsl:break and xsl:on-completion provide means to invoke controlled continuation, early exit and completion postlude and tidying.
xsl:iterate with none of these directives behaves as xsl:for-each, save that the latter could execute for all selected nodes in parallel.
xsl:number, but a simple example helps to explain the equivalence.)
ancestor::*[@expand-text] defining the tag to a true value (yes|1|true):
xsl:evaluate
xsl:evaluate instruction that takes a string defining an XPath expression, a possible context item, a set of possible parameters that can be referred to within the expression and some optional contextual information such as namespace bindings. The error-free result is a sequence of items resulting from evaluating that given expression on the context.
saxon:evaluate() can support much of the functionality, with suitable mappings for parameters and minor modifications of the presented XPath expression string
last-ditch possibility is to implement an XPath interpreter in XSLT 2.0. Using a full XPath parser, implemented in XPath 2.0 (see later) we can generate a full parse tree. A simple recursive interpreter matching the XPath grammar productions can generate a sequence of result items by working its way through the tree. Requests for built-in functions (e.g. FunctionCall name="sum") are mapped to calls to the appropriate built-in, having pre-evaluated the argument subtrees. This mechanism can work, but is unsurprisingly
let
let $var := expr0 return expr1 where $var is now in-scope in expr1. (The for directive will be similar for singleton sequences, but let preserves sequence values.) In simple cases, where the lets are nested from the outside, it may be possible to convert to an equivalent set of XSLT variables. For example:
let tree into a series of local bindings in the XSLT space. When we can place these variable bindings within an entirely local context (i.e. the sequence constructor of the xsl:value-of) then we will not risk name clashes. When the let is buried below other constructs (e.g. a for) then we cannot use this technique, as XSLT instructions cannot appear within XPath expressions, and we must look to further expansion of the XPath tree into the XSLT space. This is described further in
map
map{} is to associate a series of keys to values that are sequences of items (i.e. item()*). For example in the author's work on document layout a tunnelled variable $lay:variables as map(xs:string,item()*) contained bindings to named sections of laid-out components, that could be reused or examined (e.g. $lay:variables('background') might give all the items in the page background).
<binding/> elements are reserved forms and clearly distinguishable from normal content. On the left
$elements.
updating, whilst still preserving locality of scope. (As all variable bindings in XSLT/XPath are single-assignment with variable name overriding strictly constrained to following-sibling::*/descendant-or-self::* situations, any local addition to a map is effectively copy-and-add which
map-end marker, it might be possible to represent singleton values of map(*) type within such maps.
$some.name( ) by a call to the (stylesheet-defined) function map:get($some-name, )
(item()*,xs:anyAtomicType) from, the standard map:get(map(*),xs:anyAtomicType) and shamelessly exploits the fact that in XSLT2.0 the namespace associated with maps is
$some-name for the given name (hence preserving locality of scope) and then returns the appropriate subsequence of the stack from immediately behind the binding element
@length property of the binding may appear redundant (search to the next binding...) but the length of the sequence is known when the entry is created and it makes retrieval of the key's associated value more efficient.
map(*)
is) but a sequence. This means it cannot be treated as
count(), is and instance of
isMapItem() which can differentiate between map entries and non-map entries in such sequences. This is used in some of the processing of presentational variables described in
empty() will be unable to distinguish between an empty map and no map (i.e. exists($map) and map:size($map) = 0 would be indistinguishable from empty($map)). Equally well this scheme cannot represent a sequence of maps. In the author's limited experience most conventional use of maps rarely involves such existential manipulation of arbitrary maps. Perhaps most simply, this scheme only works for interpolations of keyed-values from maps.
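The stack-of-bindings representation discussed above can be sketched in XSLT 2.0 roughly as follows. This is an illustrative reconstruction under stated assumptions, not the author's library: the attribute names key and length on map:binding, the template name main and the sample data are all hypothetical, and the sketch relies on the processor being a genuine 2.0 one that does not reserve the map function namespace.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs map">

  <!-- One entry: a binding marker element followed by its value items -->
  <xsl:function name="map:entry" as="item()*">
    <xsl:param name="key" as="xs:anyAtomicType"/>
    <xsl:param name="value" as="item()*"/>
    <map:binding key="{$key}" length="{count($value)}"/>
    <xsl:sequence select="$value"/>
  </xsl:function>

  <!-- Lookup: the LAST binding for the key wins, so appending an
       entry to the sequence behaves as copy-and-add -->
  <xsl:function name="map:get" as="item()*">
    <xsl:param name="map" as="item()*"/>
    <xsl:param name="key" as="xs:anyAtomicType"/>
    <!-- Positions of binding markers whose key matches -->
    <xsl:variable name="hits" as="xs:integer*"
      select="for $i in 1 to count($map) return
              if ($map[$i] instance of element(map:binding))
              then (if ($map[$i]/@key eq string($key)) then $i else ())
              else ()"/>
    <xsl:variable name="last" as="xs:integer?" select="$hits[last()]"/>
    <!-- The value is the run of items immediately behind the marker -->
    <xsl:sequence
      select="if (exists($last))
              then subsequence($map, $last + 1,
                               xs:integer($map[$last]/@length))
              else ()"/>
  </xsl:function>

  <!-- Hypothetical entry point exercising the two functions -->
  <xsl:template name="main">
    <xsl:variable name="m" as="item()*"
      select="map:entry('a', (1, 2, 3)),
              map:entry('b', 'x'),
              map:entry('a', 'new')"/>
    <result a="{map:get($m, 'a')}"
            b="{map:get($m, 'b')}"
            c="{count(map:get($m, 'c'))}"/>
  </xsl:template>
</xsl:stylesheet>
```

Invoked at the main template, the sketch should report a="new", b="x" and c="0". That is, the later binding for 'a' shadows the earlier one, and a missing key yields an empty sequence, which is exactly why an absent key and an empty value cannot be told apart in this scheme.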
xsl:variable name="
"
as="map(*)
→ xsl:variable
name="
" as="item()*
,
xsl:variable name="
"
as="element(map:binding)*
. Now the bindings contain an offset and length into the stack. This approach can be more efficient in lookup, but is significantly more restricted in the situations in which it can be deployed.
map{ key : value,...}) := to :.
map:entry(), map:merge()) or XSLT instructions (xsl:map, xsl:map-entry). How, and to what extent, these features might be simulated in 2.0 is discussed later.
array:get() etc.). Restrictions on existential manipulation of these emulations are similar to those for map().
head()
and tail() – reducing a sequence. This can of course be replaced by a number of equivalent XPath forms, but the most coherent is to use equivalent XSLT-defined functions:
f: is bound to some reserved namespace.
innermost() and outermost() – producing lowest and highest ancestry nodes. Again these are most simply supported with equivalent XSLT-defined functions:
string-join($seq) which defaults the second (joiner) argument to a zero-length string. Again such invocations (if detected) can be linked to a simple currying function:
fn: is bound to the normal XPath function namespace.
||
(string concatenation)
$a || $b is equivalent to fn:concat($a,$b). Thus if the associativity can be analysed such operations can be replaced by calls to concat().
! (simple map)
/ with sequences of nodes. Thus (1 to 5)!(. + 10) is almost equivalent to for $i in (1 to 5) return ($i + 10), though technically the context focus for the right hand operand of ! is set to each of the left in turn; this would have effect on context-defaulting functions such as name() which would need to have default arguments added.
@static
property being supported on global variables and parameters. For many of the situations envisaged, the 2.0 program will be projected from the master 3.0 version under conditions which
xsl:evaluate instruction and is discussed further in
quoting problems can be tricky. However, a significant proportion of an XSLT program's functionality is described in XPath expressions, which at the XML syntax levels are merely string values of attributes. For some simple cases some dependencies can be analysed through clever use of regular expressions, an example being finding names of normal variable references when they don't appear within string literals. However in general a parse tree for XPath expressions is needed. That requires an extension library, but such things are possible, especially with tools such as Gunther Rademacher's REx.
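The simple regular-expression case mentioned above, finding variable references that do not sit inside string literals, might be sketched in XSLT 2.0 as below. The function name f:variable-refs, its namespace URI and the sample expression are hypothetical. The approach is deliberately naive: it blanks out quoted literals first and ignores XPath comments, so it is no substitute for a real parse.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- Returns the distinct variable names referenced in an XPath string -->
  <xsl:function name="f:variable-refs" as="xs:string*">
    <xsl:param name="xpath" as="xs:string"/>
    <!-- Blank out "..." and '...' literals so a $ inside them is ignored -->
    <xsl:variable name="code" as="xs:string"
      select="replace($xpath,
                      '&quot;[^&quot;]*&quot;|''[^'']*''', ' ')"/>
    <!-- \i\c* approximates a (possibly prefixed) name after the $ sign -->
    <xsl:variable name="names" as="xs:string*">
      <xsl:analyze-string select="$code" regex="\$(\i\c*)">
        <xsl:matching-substring>
          <xsl:sequence select="regex-group(1)"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="distinct-values($names)"/>
  </xsl:function>

  <!-- Hypothetical entry point: $no sits inside a literal and is skipped -->
  <xsl:template name="main">
    <refs>
      <xsl:value-of separator=","
        select="f:variable-refs(
                 '$a + count($lay:items) + string-length(''$no'')')"/>
    </refs>
  </xsl:template>
</xsl:stylesheet>
```

For the sample expression the names found should be a and lay:items. Anything more intricate, such as nested quoting or references inside value templates, is where a full XPath parse becomes necessary.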
xsl:mode and xsl:iterate), text value templates, an XPath construct (map{}) and static evaluation and conditional compilation of the XSLT program.
xsl:mode
xsl:mode instruction is relatively straightforward. A template matching an xsl:mode defining no-match behaviour has conditional choices of content, derived effectively from XSLT3.0 Built-in Template Rules, chosen dependent upon the @on-no-match attribute value:
X: is bound to an aliased namespace that will become xsl: in the output. The priority is chosen, as remarked earlier, to be lower than any priority found for that mode within the entire expanded stylesheet, but higher than the (text-only-copy) default rules. (It should be possible to find a reasonable minimum priority through min(xsl:template/@priority) - 1.) Text-only-copy behaviour is the built-in default, so a request for that requires no substituted templates. (More strictly, these defaults should be placed in the
xsl:iterate
xsl:iterate instruction into the recursive template declaration and xsl:call-template instruction pair shown earlier is rather more complex and needs to be a recursive process as of course the bodies of iterations can themselves contain further iterations.
xsl:iterate instruction that must be considered. Firstly, the only result(s) of the instruction arise from the evaluation of the
xsl:next-iteration directives which potentially modify transmitted (state) information through parameters – these will be converted into suitably parameterised xsl:call-template instructions. Thirdly, extraordinary exit and postlude processing needs to be supported. Finally, the whole xsl:iterate instruction operates within a local scope within which variables can exist and whose values can be interpolated. Any solution must preserve such bindings, including local bindings within nested iteration constructs.
xsl:iterate. For each of these generate a uniquely named template corresponding to each found iteration (which may be nested) as an additional top-level declaration.
ancestor-or-self::*/preceding-sibling::*[self::xsl:variable or self::xsl:param] (Global variables can be determined separately and allocated to $global.variables).
Then determine which variables are actually used in the parameter lists and body of the iteration. As all references can only be through XPath expressions, these will
@select or @test or attribute and text value templates ({...}). These attributes and value templates can be found with a suitable XPath lookup.
$Qname...') then we can find the names of referenced variables through regular expressions: \$(\i\c*). We can then reduce the set of in-scope variables to just those needed and use this set to both generate the list of parameters to be added to the named template, and the parameter bindings to be set on the call
@as) on the generated parameters, which increases robustness and possible performance.
tunnelled to support pass-through access during recursion.
$iterate.sequence)
xsl:next-iteration instructions which will have been transformed to
:
xsl:for-each to support the default behaviour. In the absence of an on-completion instruction, the choose can of course be replaced just by the for-each.
xsl:break instructions terminate the closest surrounding iteration leaving a possible completion component. In this case the break is replaced by either its sequence constructor or an interpolation of its @select expression.
@[xsl:]expand-text
attributes attached to ancestor elements. Processing these is straightforward, with a pre-emptive template setting a boolean state, and text nodes which contain such text value templates being processed by regular expression analysis:
\$\i\c*\((\i\c*\)) can recognise such map interpolations.
$some-name( ) → map:get($some-name, ). One implication of this approach is that a map cannot be a value in another map, i.e. we cannot support anonymous maps – they must all have (variable) names within XSLT/XPath scope.
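A regular-expression rewrite of this kind could be sketched as a small XSLT 2.0 transform over a stylesheet's XPath-carrying attributes. The pattern used below is an assumption for illustration, not the author's expression: it only accepts keys without nested parentheses, it would also rewrite a dynamic function call written as $f(x), and it takes no account of string literals.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity copy for everything that is not rewritten -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rewrite $name(key) as map:get($name,key).
       In the replacement string \$ is a literal dollar sign,
       while $1 and $2 are the captured name and key. -->
  <xsl:template match="@select | @test">
    <xsl:attribute name="{name()}"
      select="replace(., '\$(\i\c*)\(([^()]*)\)', 'map:get(\$$1,$2)')"/>
  </xsl:template>
</xsl:stylesheet>
```

Under these assumptions select="$lay:variables('background')" would become select="map:get($lay:variables,'background')". The converted stylesheet must still declare the map prefix and supply a stylesheet-defined map:get.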
map:entry( ), map:merge( ) and so forth.
map:entry()) then by providing suitable emulation functions in a stylesheet-defined library, XPath expressions need not be touched. Some of the emulation functions are as follows:
map:merge() just passes through). Using these functions as interfaces decouples the exact details of the representation from the actual use of maps.
map{ }, we need to modify the XPath parse tree. Ignoring tokens, the tree for map{ 1 := ('fred',3+4), 2 := 'bert'} is shown on the left:
[Figure: XPath parse trees for the map constructor, original (left) and converted (right)]
map:entry() and replace the MapExpr element with Expr, we get the tree on the right that would have been parsed from (map:entry(1,('fred',3+4)), map:entry(2,'bert')), which is the desired representation for the initialised map. Back-conversion of the parse tree into text correctly modifies the XPath expression. (If the representation of the map is something other than a sequence of map:entry() results, such as using map-start...map-end markers, then surrounding this construct with a function call to map:merge() will be adequate.)
@use-when directive or within shadow attributes), we need two activities: firstly to collect and determine the values of all the (global) static variables and secondly to evaluate the XPath expression of the @use-when or attribute value templates within shadow attributes, using those bindings. It is possible for the values of static variables to be interpolated within @use-when directives (which may be attached to static variable declarations) or shadow attributes, so these two processes must be handled concurrently.
@select attributes (i.e. XPath expressions, no XSLT instructions can influence) and reference between static variables (which must all be top-level children of stylesheets and have unique names) is only permitted in a reverse direction. Hence an iteration across the top-level children of a stylesheet, evaluating any static variables with possible variable bindings already accumulated into a map, evaluating the effect of static variables on the top-level trees and determining @use-when effects, will produce the statically evaluated top-level stylesheet children:
X:select() converts atomic items into suitable XPath expression values, such as surrounding xs:string with quotes.) Top-level children are processed with matching templates for @use-when and shadow attributes
expanded stylesheet importation/inclusion tree, but the situation for a single stylesheet is described for simplicity.
xsl:*/@*[starts-with(name(.),'_')] and any value templates processed using string analysis – a properly named attribute with value is then substituted. All these actions can take place in an early 'static processing' phase (which is still effectively in the 3.0 space) before subsequent code substitutions are made.
static information that will be true in the execution context of the transformed stylesheet. For example the built-in function system-property( ) can yield information about the implementation, such as the version of XSLT supported. This is often used within conditional code, such as use-when="system-property('xsl:version') = '3.0'". In this case we anticipate that the transformed stylesheet will operate under a regime where system-property('xsl:version') has the value '2.0', so if we replace such a function call with its expected value, the conditionality can be projected during the transformation.
programmer oracle powers are useful.) In the more general and robust cases XPath expressions must be parsed fully to recover necessary information or generate correctly modified expressions.
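The version projection described above, replacing the function call with its expected value, can be sketched as a small XSLT 2.0 pass over @use-when attributes. This is a simplified assumption-laden illustration: the regular expression only recognises the single-quoted form system-property('xsl:version'), and actually evaluating the resulting condition is left to a later static phase.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity copy for everything that is not rewritten -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Substitute the version that the converted stylesheet is expected
       to run under. In the replacement, '''2.0''' denotes the quoted
       string '2.0'. -->
  <xsl:template match="xsl:*/@use-when | @xsl:use-when">
    <xsl:attribute name="{name()}" namespace="{namespace-uri()}"
      select="replace(.,
                      'system-property\(\s*''xsl:version''\s*\)',
                      '''2.0''')"/>
  </xsl:template>
</xsl:stylesheet>
```

With this pass, use-when="system-property('xsl:version') = '3.0'" would become use-when="'2.0' = '3.0'", which a subsequent static evaluation can resolve to false and so drop the 3.0-only declaration.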
conversion
map{}) as described earlier, the following template is sufficient:
||) where StringConcatExpr → FunctionCall name="concat".
let at an outermost level, we permit adding those XSLT instructions into the parse tree, in this case placing variable bindings as the sequence constructor of a suitably named XSLT variable:
@select). When XSLT instructions are present the result will be a mixed sequence, placed as the sequence constructor of the enclosing element, within which string items (representing XPath expressions) will be placed in the @select attributes of xsl:sequence instructions, and the XSLT instructions will stand in place
@select or a child sequence constructor, such as xsl:variable or xsl:sort. Other situations involving XPath expressions such as xsl:if test="expr", where any element sequence constructors carry result trees, would require closely-preceding temporary variables to be set to an appropriate value and referenced from the test.
let
in the for-each selection means that the variable assignment must be performed within the XSLT space in 2.0, whereas of course in 3.0 it is defined in XPath context.
let performs the
too-big element) will be evaluated for each selected item collectively yielding the sequence value of the for-each construct.
let with a sequence of an XSLT binding of the local variable (avg), followed by an interpolation of that value within the predicate. As this is encapsulated in a sequence constructor, name locality is preserved. But of course this is not correctly executing XSLT - this new item should behave in the for-each. In the third tree we have lifted this above the for-each, binding its value to a unique XSLT variable (which is typed item()*) and interpolating its value in the for-each selector. Now what we have is legal – there is no XSLT embedded within XPath. Thus flattening the XPath trees back into carrying attributes (@select, @test) will yield a correct XSLT2.0 construction.
xsl:apply-templates, xsl:choose/xsl:when, xsl:if and attribute value templates. The main test is the presence of a buried xsl:variable within an XPath expression tree after first stage modification.
let
is buried within an XPath for expression effectively in the if test as shown at stage 2 of
ForExpr) which is acting in the xsl:for-each.
ForExpr from XPath to XSLT space, and as before binding it into a unique variable, whose declaration is promoted in front of the surrounding use. The resulting final XSLT is:
xsl:include and xsl:import redirection instructions. For most of this work, these are
legal XSLT 2.0, then the bodies of such stylesheets are not processed, but passed through unchanged.
source code using the original XSLT 3.0 version of the layout library (comprising some 30 different files), and its 2.0 modified version. The document not only shows various forms of layout, for which their original support code requires both XSLT 3.0 (xsl:iterate, xsl:evaluate) and XPath 3.0 (map{}...) facilities, but also self-examines the library and reports on XSLT versions and features found.
@select etc.) which contain one or more of that type of operator.) Having processed the layout library through the XSLT 3 to 2 converter, we can again process the source document using the modified (XSLT2.0) library and get the following output.
inspection element reports that all stylesheets used were declared to be in XSLT2.0 and that no XSLT3.0 features were found in that set. The extra five stylesheets are accounted for by the support library, added by the converter, which provide various emulation functions and XPath parsing and evaluation support.
traffic lights layout is very expensive indeed here, involving full parsing and emulation of document-borne XPath expressions entirely by XSLT2.0 code. (If a built-in evaluator is available, such as saxon:evaluate(), almost two orders of magnitude improvement in speed can be expected.)
oracle control information, these tools can be used to automate considerable sections of such conversion. The remainder can be accommodated by alternative conditional code sections controlled by @use-when directives, which can be retained or removed by the converters during static evaluation. (Recall that the converter knows that the target xsl:version will be 2.0, so it can evaluate accordingly during conversion.)
self-processing example discussed below.
self-processing, where innocuous exclude-result-prefixes="xs math" declarations raised many problems.
||
) reasonably heavily, mainly to reduce clutter in forming expressions.
{...}) to reduce code size.
map{} to hold bindings of static variables to support processing of static features such as @use-when.
xsl:mode for every one of the half-dozen modes employed.
xsl:iterate to handle and track variable bindings, mostly for static resolution and conditional compilation. (Interestingly, conversion of xsl:iterate itself does not at present involve use of iteration, though perhaps the view of variable scoping is over-generous).
xsl:evaluate to evaluate static variables for substitution and processing of @use-when directives.
concat() is probably overdue – actually they aren't used in very complex situations.) The use of map{} is comparatively simple, but does involve functional constructors. The most thorny issue is use of xsl:evaluate. In (commercial) Saxon 2.0 implementations there is a saxon:evaluate() function that can be exploited, with suitable variable bindings as discussed earlier. Unfortunately this isn't an option in the current open-source Saxon-CE
saxon:evaluate().
xsl:namespace-alias
declaration helps solve this issue, in this case by declaring the X: prefixed namespace to map into the XSLT namespace in the final result. However if the compiler is processing itself, then input elements marked with that prefix need to be preserved. This requires a remapping and re-aliasing.
Convert3 is the converting stylesheet, then
Hard pounding this, gentlemen.
This activity exercises problems of quoting, namespace preservation and URI relativity more acutely than most other XSLT work.
unstreamable would still operate. Simple packaging based on static analysis, flattening, visibility projection and overriding replacement is a distinct possibility: provided all stylesheet packages are available, the package processing is effectively a static operation
standard higher-order functions have implementations within the library, and static definitions of function items can be determined from XPath parses, so the issue is whether these can be converted to appropriate
The XSLT Loop Compiler. [online] http://www2.informatik.hu-berlin.de/~obecker/XSLT/loop-compiler/
XQ2XML: XML syntaxes for XQuery. [online] http://monet.nag.co.uk/xq2xml/
Multi-user interaction using client-side XSLT. [online] XML Prague 2013 proceedings, pp1–22. http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf
A Framework for Structure, Layout & Function in Documents. Proceedings of the 2005 ACM symposium on Document engineering. doi:
Functional, Extensible, SVG-based variable documents. Proceedings of the 2013 ACM symposium on Document engineering, pp 131-140. doi:
Analysing XSLT Streamability. doi:
Documents as Functions. University of Nottingham, PhD Thesis. June 2012. [online] http://etheses.nottingham.ac.uk/2631/
Higher-Order Functional Programming with XSLT 2.0 and FXSL. Proceedings of Extreme Markup Languages, Montreal 2006. [online] http://conferences.idealliance.org/extreme/html/2006/Novatchev01/EML2006Novatchev01.html
REx Parser Generator. [online] http://www.bottlecaps.de/rex/
XML Path Language (XPath) 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-30/
XQuery and XPath Functions and Operators 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-functions-30/
XSL Transformations (XSLT) Version 2.0 (Second Edition). World Wide Web Consortium, 23 January 2007. [online] http://www.w3.org/TR/xslt20/
XSL Transformations (XSLT) Version 3.0. World Wide Web Consortium, 2 October 2014. [online] http://www.w3.org/TR/xslt-30/