In pursuit of streamable stylesheet functions in XSLT 3.0

Copyright © 2014 by the author. Used with permission.

expand Abstract

expand Abel Braaksma

Balisage logo

Proceedings

expand How to cite this paper

In pursuit of streamable stylesheet functions in XSLT 3.0

Balisage: The Markup Conference 2014
August 5 - 8, 2014

Notes

Updates

This paper discusses very recent changes to the XSLT 3.0 specification that are still under discussion in the related BugZilla bug entries. When the specification is updated, rules laid out in this paper need updating as well. I will publish those updates at http://exselt.net/papers in DocBook and PDF formats.

Prerequisites

To read and understand this paper, a basic understanding of XSLT 3.0 and streaming is desirable. For an introduction on streaming, you can refer to two earlier papers in this series, Braaksma 2014a, about the Ten rules of thumb of streaming and Braaksma 2014b, about Streaming XSLT design patterns.

Disclaimer

This paper relies on information that can be found in the public XSLT Last Call WD at the time of this writing, and in some of the public XSLT 3.0 BugZilla bugs (see W3C BugZilla). While I am an invited expert for the XSL Working Group, I do not speak at there behalf, and any thoughts I lay out in this paper are my own and are not necessarily the thoughts of the XSL Working Group.

Changes to the Last Call Working Draft will be discussed through the same BugZilla and will ultimately result in the publication of a new version of the specification in XSLT Latest Version. Where this paper refers to XPath, XPath functions and operators or the XDM, it uses XPath 3.0, XPath and XQuery F&O 3.0 and XQuery and XPath Data Model 3.0.

Since the XSLT 3.0 specification is not final yet, it is possible that syntax or semantics of constructs used in this paper change in the future or are dropped in their entirety. Large parts of this paper rely on discussions and conclusions of functionality reported in the public section of BugZilla. Where the text of this paper deviates from the public Working Draft and where these changes are publicly available through BugZilla, I will state so by using footnotes.

Introduction

Streamability analysis is a complex subject, yet the basics are, as is often the case, relatively trivial: just write your stylesheet in such a way that it only uses the child axis, plus an occasional escape to node properties such as attributes on the ancestor or self axes, and you have essentially written a streamable stylesheet.

But in practice, it is not so trivial. Rules in the specification are complex and even hard-core spec-readers have trouble following them. They are meant for implementors and not necessarily for programmers or users, which is why these rules have to take care of every possible corner case and yet make sure that for general use, the rules work with the least possible surprises.

This paper tries to fill in the gap between spec-prose and tutorial. While there currently are no tutorials on the subject of streamable stylesheet functions, this paper aims to explain the basics and several rather advanced concepts such as recursive streamable functions, in a way that it becomes understandable for a larger public.

A streamable stylesheet function is a function that can take a streamed node from a streaming input document, while still passing the tests for being guaranteed streamable. Being able to write streamable stylesheet functions is import for package designers, so that they can create functions that work alike for streaming and non-streaming scenarios.

This paper first introduces you to part of the terminology used in the specification and at several sections in this paper, see section “Streaming terminology”. It then explains typical use-cases for streamable functions in section “A case for streamable functions”, after which it explains the challenges we face when dealing with streamability analysis and streamable functions in section “The challenge of streamable stylesheet functions”.

Streamability of standard, non-recursive functions is explained in section “In pursuit of flexible rules for guaranteed streamable stylesheet functions” and section “Multiple arguments that can take nodes”; and the rather complex subject of streamability for recursive functions is dealt with in depth in section “Recursive streamable functions” and its sub-sections. Futher sections deal with packages and function inheritance with respect to streamability (see section “Non-final streamable functions in packages”) and a proposal for a general improvement to the analysis of the posture in section “Streaming terminology”. And in section “Miscelleneous” several related subjects are covered on inline functions, partial function application, named function refverences and dynamic function calls.

The current state of the specification is discussed in section “Streamability of stylesheet functions according to the specification” and the current state of processors with regard to streamability analysis of stylesheet functions in section “Status of current processors with respect to streamable stylesheet functions”.

Streaming terminology

This section will briefly discussed terminology used throughout this paper. This overview does not try to be complete.

  • Guaranteed streamable: determines whether a particular construct, instruction or declaration is streamable according to the streamability rules in section 19 of the XSLT 3.0 specification. If it is, any processor that is conformant with the streamability feature will be able to process your input using streaming, that is, by reading the input in forward-only mode without keeping the whole document in memory.

  • Posture: the posture of a construct determines the state of a streamed node in the output of that construct. Often, this will be the same as the input posture. It is used in streamability analysis to determine whether a construct returns streamed nodes and in what way. Many constructs are capable of returning streamed nodes, but if all rules are followed, the usages of these constructs are limited in such a way that they can only return a subset that is compatible with streaming. The different postures are:

    • Grounded: the construct does not return a reference to a streamed node. An expression or instruction operating on a grounded posture is allowed to be free-ranging. Examples: fn:currentDateTime, fn:copy-of, xsl:copy-of, a variable reference.

    • Climbing: the construct can return nodes that refer climb the tree through the ancestor or attribute axis. Examples: parent::foo, author/@name, ...

    • Striding: the construct can return non-overlapping nodes, typically only on the child axis. Examples: any child-select expression, any match pattern, fn:zero-or-one, fn:outermost.

    • Crawling: the construct can return overlapping nodes on a downward axis. Examples: foo/descendant-or-self::bar, ./b//c.

    • Roaming: the construct can return nodes that can be anywhere in the input tree. Examples: following::node(), ancestor::foo/bar, preceding-sibling::price.

  • Sweep: the sweep of a construct determines how the current read position of the input stream is changed as a result of executing the construct. The different sweeps are:

    • Motionless: the read position does not change. This is true for expressions that only operate on grounded nodes or data, and for expressions that request a property of a node or the ancestor axis[1]. Examples: fn:has-children(.), @name, if(./name()) then ..., ancestor-or-self::para/@xml:lang, any constant and any expression not operating on a streamed node.

    • Consuming: the read position changes in a forward-only direction. Any expression that requires read-ahead is considered a consuming expression, unless the current node is a childless node, such as text- and comment nodes. Consuming constructs form the heart of streamability analysis. Any construct may have at most one consuming construct. Examples: fn:count(foo), x/y, <xsl:value-of select="name" />.

    • Free-ranging: the construct cannot be evaluated by using forward-only movement of the read pointer of the input stream, for instance when evaluation of a filter expression requires look-ahead. Typically, a free-ranging construct is also roaming. Examples: foo[bar], parent::author/book, any xsl:sort on streamed nodes, @* | foo, if(a) then b else c, fn:reverse(index).

  • Usage: the usage determines what an operand of a construct does when it receives a streamed node as argument. The following usages are defined:

    • Inspection: if the argument is a streamed node, it will inspect the node, but not consume the node, resulting in a motionless sweep (assuming no other part of the expression or construct consumes the input). An inspection operand is the only operand that can take a climbing, motionless expression as its argument.

    • Absorption: if the argument is a streamed node, it will consume the node, resulting in a consuming sweep. Examples are arguments of fn:data, fn:string, constructor functions, xsl:value-of/@select, xsl:apply-templates/@select and text value and attribute value templates.

    • Transmission: if the argument is a streamed node, the result will also be a streamed node. Depending on the function or construct, this can result in any kind of sweep. Examples are arguments of fn:outermost, fn:remove, fn:subsequence, xsl:sequence/@select.

    • Navigation: if the argument is a streamed node, there is no guarantee that calling the construct will be streamable. Typically, a navigational operand will result in free-ranging and roaming streamability analysis results. Examples are arguments of xsl:call-template (the implicit context item, making it impossible to use this instruction with streamed nodes), first argument of fn:fold-left, last argument of fn:key, fn:innermost, any untyped argument to a stylesheet function not specifically marked streamable.

The combination of posture, sweep and usage determines whether a certain construct is guaranteed streamable or not. Constructs can be nested, and constructs have operands. What exactly constitutes a construct and operands is outside the scope of this paper[2], except for stylesheet functions, which will be explained in the following sections.

A case for streamable functions

Functions are a nice machinery in XSLT 2.0 that have surplaced most of the xsl:call-template scenarios from XSLT 1.0. The benefit of being able to create a function and call it directly inside an XPath expression or in a pattern filter has shown its use-cases. More often than not, functions tend to operate on atomic types, such as strings and integers, because the natural habitat for nodes processing is done declaratively by template matching. This is often called The XSLT way of doing things, see Welker 2008, Gerstbach 2006 and Lenz 2005, and even today, this is still often advocated on the XSL Mailing List[3] and on discussion fora such as StackOverflow[4] and Experts-Exchange.com. However, even advocates of using templates over (recursive) functions, such as Dimitre Novatchev, sometimes choose a function over a template based scenario because it is clearer or because it is simply a better tool for that particular job[5], or in the words of Jeni Tennison in Tennison 2001:

“If the result follows the structure of the source, then a push method is more natural — the source drives the process. If the result has a substantially different structure from the source, then a pull method is more natural — the result drives the process.”

It should be said: in XSLT 3.0, the natural way of doing things with nodes from an input tree is to use template declarations. This is no other than previous XSLT versions. Perhaps it is even stronger now that it is possible to apply templates on atomic values such as sequences of strings as well[6]. But this does not help with complex patterns, which can only be replaced by functions. Also, it does not help in scenarios where reusing a result in an expression is important, which is where functions come into play: they can be used inside expressions and call-template nor apply-template can do that (unless wrapped inside a function, that is).

The current working draft, which is in Last Call, does not have a provision for streamable functions. That is, it is impossible in any which way to pass a streamed node to a function. Dimitre Novatchev's example from the StackOverflow question would therefore not be possible in a streaming scenario. Now that more and more people have gotten used to write functions, and with the advent of (precompiled) XSLT packages containing lots and lots of library functions, it seems unfair that they cannot continue doing so in streamable scenarios, nor does it seem fair that library vendors will not have any means to create library packages with functions that are streaming-aware.

The XSL Working Group has considered this and as the solution of Bug 25679 shows, the proposal was considered and adopted into the internal working draft. The following sections will explain my personal analysis on the situation and section “Streamability of stylesheet functions according to the specification” will summarize which parts of this analysis match the specification. Since the discussion on this bug report is not final yet, anything in that section is very preliminary and may change without prior notice.

Stylesheet functions in streaming scenarios before they were allowed to be streamable

In the current XSL Transformations 3.0 Last Call Working Draft, calls on stylesheet functions are analysed based on the declared type of the arguments. If an argument is typed as an atomizing type, such as xs:string or xs:integer, the usage of that argument will be absorption, in other words, it will absorb a streamed node, which in most cases results in a consuming sweep. The result of such function calls is defined as grounded. If the argument is untyped or is a type that can take nodes, it is disallowed to pass a streamed node to the function call, because there is no way of knowing statically what will happen with that node inside the function. I call this the safe bet: disallowing streamed nodes as arguments makes streamability analysis easier. Example:

Code listing: calculate square

<xsl:function name="f:square">
    <xsl:param name="i" as="xs:integer" />
    <xsl:value-of select="$i * $i" />
</xsl:function>

<xsl:template match="number">
    <xsl:sequence select="f:square(.)" />
</xsl:template>

In this example, the function f:square takes an argument $i that is of type xs:integer. Upon calling the function, the processor will have to atomize the value or node. To do so, it must process all its children, which is why such a usage is considered consuming. In this case, the element number will be atomized.

This is a typical way of writing functions and it works perfectly well with streaming. However, if you want the argument to be a node and get some properties of a node or process the node's children, it gets trickier, since passing a node to a function is disallowed with streaming. Consider the following:

Code listing: get attribute of ancestor

<xsl:function name="f:hasname">
    <xsl:param name="n" as="item()" />
    <xsl:sequence select="
        if($n/ancestor::node()[@name]) 
        then true() 
        else false()" />
</xsl:function>

<xsl:template match="author[f:hasname(.)]">
    <xsl:text>Has name attribute in ancestor!</xsl:text>
</xsl:template>

In this example, the call to the function is not streamable. We pass on a node to the function and the rules in the current Working Draft state that it has usage navigation[7], which has the effect that the whole expression, in this case the match pattern author[f:hasname(.)], will be roaming and free-ranging, in other words, it is not guaranteed streamable.

It is possible to call the function in some situations, by using the fn:copy-of or fn:snapshot functions. But that means that users of your function must have enough knowledge to know what atomizing or grounding function to use before passing on an argument. In this particular case, it would make little sense to use either of these functions, because a match pattern must be motionless, so it is not possible to use the function in a match pattern predicate expression. This is essentially true for any function that does not consume the input tree. With the current rules in place, even if a function does not consume the input tree, you still need to create a copy of the tree to call that function. Hardly efficient and in many cases impossible with streaming, because the input node may simply be too big to be copied.

Were you to use it in another context, you still have to choose between fn:copy-of and fn:snapshot. This function would not work with fn:copy-of, because that function does not copy the ancestor axis. In other words, you must know the internals of the function to find out that you need to call it with a call to fn:snapshot. And even then, if the function requires information from the original document, such as the base-uri or other properties that are not copied with either of those functions, you are out of luck.

In essence, these rules disallow stylesheet authors to write a function that take nodes as arguments, and in the case of library package authors, they will not be able to write functions that take nodes that work the same way in a streaming scenario and a non-streaming one. They would have to educate library function users how to use their functions in a streaming scenario and many functions will never work in streaming scenarios because their usage would always be consuming even though the function body does not actually consume the input node. In other words, it makes writing functions for use in streamable stylesheets next to useless.

The challenge of streamable stylesheet functions

It is often trivial to see at a glance that a function is motionless and that it would not hurt using it on a streamed input node. Take, for instance, Code listing: get attribute of ancestor from the previous section. It only uses the ancestor axis and tests whether one exists with an attribute @name. Since walking the ancestor axis is motionless (the processor is required to keep a stack of the ancestor nodes and their properties) and the attribute axis is too, any call on that function would be motionless with respect to its argument.

This shows us one thing: it is possible to write functions that can be statically analyzed to be motionless. But this function has special properties: the body of the function is motionless and the result of the function is grounded. That means, it can never return any nodes, it can either return true() or false(). But can we always statically determine that that is the case? Let us write a slightly different function, this time we return the attribute node, instead of testing for it:

Code listing: return attribute node

<xsl:function name="f:hasname">
   <xsl:param name="n" as="item()" />
   <xsl:sequence select="$n/ancestor::node()/@name" />
</xsl:function>

<!-- guaranteed streamable: -->
<xsl:template match="author[f:hasname(.)]">
   <xsl:text>Has name attribute in ancestor!</xsl:text>
</xsl:template>

<!-- potentially guaranteed streamable: -->
<xsl:template match="*">
   <xsl:apply-templates select="author/f:hasname(.)" />
</xsl:template>

The function still takes a node as its argument, but this time it returns a reference to an attribute of the ancestor axis of that node. The attribute axis has climbing posture, which is limited in that you cannot navigate downwards again from it. In the first usage of this function, in the predicate of the pattern, the input is the current node, but a predicate only has to be true (nodes are there) or false (no nodes). After that, there is no navigation away from the climbing posture, so this could be considered streamable.

If we take a look at the second example, we see there an xsl:apply-templates. By definition, assuming the rest of your stylesheet is guaranteed streamable, any apply-templates will be an atomizing construct: templates must be grounded and therefore, passing on nodes to a template will atomize those nodes. Just as we saw before, atomizing means that the node is consumed. In streamability terms this means it has usage absorption. But to consume a node, its children must be visited, which is a downward movement. The argument has a climbing posture because it ends with f:hasname(.), which we know returns attribute nodes, which are climbing.

There is a caveat, however: if the climbing expression returns childless nodes, consuming such a node will not harm the streaming process: there are no children to consume. The General Streamability Rules[8] in the specification have a special provision for this: childless nodes with a climbing posture are allowed in an absorption context.

So it can be made streamable, right? Wrong! For a construct to be streamable it must be statically determinable to be so. Here, we have a function with an unspecified return type. If we had written as="attribute()", the processor would be able to detemine that the returned nodes would always be childless and it could be used in an absorption context.

But this is only one example. What happens when the stylesheet function returns a climbing node and we pass it a crawling expression? What happens if the function returns crawling (overlapping) nodes and the input is climbing? And how about recursive functions, or within packages, with abstract or overriden functions?

Let us look at some possible approaches to tackle these issues.

Approach 1: analyzing stylesheet functions statically on their own

Arguably the easiest approach from both the user's point of view and for implementors alike is to try to find a method to analyze the streamability of stylesheet functions statically, without taking into account any possible context the stylesheet function is called in.

Suppose you have a stylesheet function my:for-each which traverses all nodes in a node set and returns a certain aggregate result. By its definition, traversing the node set, it will consume the input tree, just like xsl:for-each would. Writing such a function that it works equally well for different kinds of input postures is hard, perhaps even impossible. A function call like my:for-each(head//section) may have to traverse over overlapping nodes, the expression in the argument is a crawling expression and looping over a crawling node set is not allowed in streaming because certain buffering is required to cache the overlapping nodes. If the loop is not going to consume the individual nodes, this is alright[9], but otherwise, it is not guaranteed streamable.

For such a function to be statically analyzable without knowing what context it is used in, it is allowed to traverse the input tree, but it is not allowed to consume the individual nodes of the streamed input tree. A function that can potentially work with any input posture, albeit climbing, striding or crawling might look like the following:

Code listing: streamable my:for-each working with any argument posture

<xsl:function name="my:for-each1">
    <xsl:param name="$node-set" as="node()*" />
    <xsl:for-each select="$node-set">
        <xsl:sequence select="./@name" />
    </xsl:for-each>
</xsl:function>

This example loops over all the nodes, but does not consume the individual nodes, it only requests a property of those nodes using the attribute axis, which is motionless and its result is climbing. And a motionless expression on either a climbing, striding or crawling context is always allowed[10].

Even though this example is exemplary for showing the possibility of writing a function that can take any posture, it limits our possibilities greatly. Suppose we want to write an implementation of my:for-each that does consume the individual nodes in the streamed node set, for instance as in the following example:

Code listing: streamable my:for-each failing with argument postures climbing and crawling

<xsl:function name="my:for-each2">
    <xsl:param name="$node-set" as="node()*" />
    <xsl:for-each select="$node-set">
        <xsl:value-of select="child::*[self::author | self::author-name]" />
    </xsl:for-each>
</xsl:function>

Whether or not this example would use the child axis within the xsl:for-each or directly without the loop, for the streamability analysis this does not matter. The child axis, if applied on a climbing or crawling axis, is always roaming and has a sweep of free-ranging[11]. This means that the expression my:for-each2(child::book) is allowed, but the expressions my:for-each2(listing//book) and my:for-each(title/ancestor::book) are not, because the latter two have arguments with crawling and climbing postures respectively and can therefor not be consumed.

This leaves us with essentially two options for static analysis of streamable stylesheet functions, without having to take the function calls into account:

  • Option 1: force stylesheet authors to only write functions that are capable of dealing with any input posture. This means that the body of functions must be motionless, or if they are not, they must use a higher-order operand[12] such as the shown xsl:for-each construct in the example.

  • Option 2: limit the allowed posture to only one posture, so that caller and callee are guaranteed to have the same input posture, always. Most likely candidate is the striding posture, because it is the most common posture when processing an input document. This would mean that the argument that can take nodes always has its context posture set to striding and that the only allowed argument posture when calling the function must also be striding.

Both options have severe drawbacks, but they have the advantage of being very clear-cut and easy to communicate or explain. As it turns out, the current approach taken by the XSL Working Group, as explained in XSL Bug 25679, takes the approach of Option 2: the result posture of a function must be striding or grounded and the argument to a function must also be striding or grounded. To accomplish this, the parameter that can take nodes, which can at most be one parameter, will have a context posture of striding and is by itself motionless.

Approach 2: analyzing stylesheet functions from the function call

Radically different from the previous approach is to analyze the function's body only when we actually know what the context is of the argument that takes a streamed node. For this approach, we simply ignore analyzing the declaration of the function on itself.

If we take the example from Code listing: streamable my:for-each failing with argument postures climbing and crawling and call it with a striding argument, for instance my:for-each2(*/books/book), the processor, upon encountering such function call, takes the posture of the argument and sets the context posture of $node-set to this posture, in this case: striding. With this posture, the processor can now analyze the function body and return a result of that analysis, not surprisingly the same as in the previous section: a consuming sweep and a grounded result posture (because the function does not return any nodes, the result is grounded).

If user were to call the function with a crawling argument, say my:for-each2(*/books//book) (mark the extra slash), the parameter $node-set is set to a context posture of crawling and the resulting analysis of the whole function body will be roaming and free-ranging. Exactly as expected. And with a climbing argument, or even a roaming argument, the result would be the same.

Doing the same analysis for the first example, Code listing: streamable my:for-each working with any argument posture, the result would be different each time, showing the advantage of this approach:

Table I: Posture and sweep of my:for-each1 function, depending on argument

Argument postureFunction posture my:for-each1Resulting posture for function-callResulting sweep for function call
GroundedGroundedGroundedMotionless
ClimbingClimbingClimbingMotionless
StridingClimbingClimbingMotionless or consuming
CrawlingClimbingClimbingMotionless or consuming
RoamingRoamingRoamingFree-ranging

The advantage is clear, by postponing analysis of the function body until it is called, the programmer has more flexibility in creating the function and the user will get an error if he uses the function with an invalid argument. This is also the disadvantage: without a clear rule how to call the function, the user may have trouble understanding why one argument works and the other does not. The previous table showed a function, my:for-each1, that works with any posture as argument input (except roaming, but that posture will always result in an unstreamable result, whether it is a user-defined function, a build-in function or any other construct). The following table shows the drawback of this approach, where only upon calling the function it becomes apparent that certain postures are disallowed:

Table II: Posture and sweep of my:for-each2 function, depending on argument

Argument postureFunction posture my:for-each2Resulting posture for function-callResulting sweep for function call
GroundedGroundedGroundedMotionless
ClimbingRoamingRoamingFree-ranging
StridingStridingStridingConsuming
CrawlingRoamingRoamingFree-ranging
RoamingRoamingRoamingFree-ranging

As we can see, only one argument posture (bar grounded, which is always allowed), striding, is allowed with the function my:for-each2, and its sweep is consuming. This is similar to our expectations, as in the previous section, we already saw that this function only worked with a striding argument posture. But in the previous approach, for Option 2, we considered this the only allowed posture for function arguments to stylesheet functions, which made it pretty clear for end users. Here, the user does not know beforehand what the argument posture can be, in fact, he needs intricate knowledge of the function body to find out what posture is allowed.

While this approach allows more flexibility, it also introduces more room for error, at least at the static development stage. Of course, the documentation of a function could contain information of how it should be used, but who reads documentation anyway?

Another advantage, though, is the potential that many existing functions might "just work". In fact, quite some functions that can be found online are often short and meant for a certain given input. As it turns out, if you take, for instance, a look at the functions on www.xsltfunctions.com that can take nodes as their arguments, quite some of them are streamable right out of the box if we would allow this approach.

Approach 3: analyzing stylesheet functions based on static posture assessments

If we try to combine the previous two approaches into one, and try to achieve both static analysis of function bodies and clear information for the users of the functions, we achieve that by decorating the functions with an extra attribute that can take one or more postures. Let us call the attribute argument-posture and add it to xsl:param, with at most one parameter allowed to have that attribute (the challenge of multiple arguments that can take nodes is discussed in section “Multiple arguments that can take nodes”). The value for the attribute is set to be one or more of grounded climbing striding crawling roaming. The postures grounded and roaming are redundant, because the former is always allowed and results in a motionless sweep and the latter is never allowed, resulting in a free-ranging sweep.

If we take the previous example, my:for-each1 and rewrite it with this additional decorating attribute, it looks like this:

Code listing: streamable my:for-each with decorated parameter

<xsl:function name="my:for-each1" as="attribute()*">
    <xsl:param name="$node-set"
        as="node()*"
        argument-posture="climbing striding crawling" />

    <xsl:for-each select="$node-set">
        <xsl:sequence select="./@name" />
    </xsl:for-each>
</xsl:function>

This tells the processor that the programmer declares the function to be streamable with each and every one of these postures. The processor can then statically assess the function body with each posture one by one and report back whether or not this assessment is correct or not. In this case, it would compile successfully, because all postures yield correct results.

Still, you might argue, the different input postures do not necessarily mean that the sweeps are all the same. True, but the sweep is something that can never be statically determined for a declaration, as it is dependent on the context of the caller. For instance, if the function is called with a motionless but striding argument, the resulting sweep is also motionless, because the xsl:for-each with a motionless argument and its body containing a motionless sequence constructor all together yield a motionless call to the function, even though it is striding. This is true, for instance, for the following call:

Code listing: calling function with motionless striding argument

<xsl:template match="books/book[my:for-each1(.) = 'Swift')]">
    <h1>Books from Swift!</h1>
    <xsl:apply-templates />
</xsl:template>

This example shows the power of the current rules in the XSL Working Draft. The existing rules allow a function call inside a predicate in a pattern. Because the context item expression, (.) is motionless and takes the context posture as its result posture, which in this case is striding, the result of calling the function with this argument is also motionless and has a result posture of climbing (which follows from our table above). Finally, the result must be atomized to be able to compare it to a string. For a climbing expression that returns an attribute, atomization is allowed and motionless.

As we can see, this third approach gives us more flexibility, but the drawback is that it requires programmers to understand what a posture is. One of the design goals of the streamability rules is that they can be applied without knowing all the internals, and bringing one of those internals, the posture of a construct, to the surface, may not be desirable.

In the next sections we dive deeper in the analysis of stylesheet functions and try to come up with a set of rules that works for different inputs, not only striding, and that works also with recursion or non-final packaged functions, subjects that have not yet been touched in this section.

In pursuit of flexible rules for guaranteed streamable stylesheet functions

The previous section introduced three possible approaches. In the following sections we will research these approaches further. In fact, the focus will be on Approach 2 above.

In the specification, under the rules on path expressions[13] and axis steps[14], a table shows what happens if the output posture of one part of the path expression is the input posture of the other part of it. Stylesheet functions are not much difference. If we take the lessons learned in that section and we reproduce that table here and adjust it slightly for stylesheet functions, we can deduct that the following rules apply:

Table III: Posture and sweep of stylesheet functions

Argument postureFunction postureResulting postureAllowed sweepa

a) This is the highest allowed resulting sweep of the function body resulting in the given result posture, if the sweep is higher then the sweep of the function body is effectively free-ranging.

b) The term "direct-transitional" is not a posture, it applies to constructs such as "$n/self::foo" or "$n", where $n is the streamed parameter. Such expressions are motionless and take the posture of their input as their output posture.

c) The current rules on streamability do not allow a striding or crawling posture after a climbing or crawling posture, the result of such analysis is always roaming and free-ranging.

GroundedAnyGroundedMotionless
AnyGroundedGroundedSweep of function body
AnyDirect-transitionalbargument postureMotionless
ClimbingClimbingClimbingMotionless
ClimbingStridingcRoamingFree-ranging
ClimbingCrawlingcRoamingFree-ranging
StridingClimbingClimbingMotionless
StridingStridingStridingConsuming
StridingCrawlingCrawlingConsuming
CrawlingClimbingClimbingMotionless
CrawlingStridingcRoamingFree-ranging
CrawlingCrawlingcRoamingFree-ranging
RoaminganyRoamingFree-ranging

To determine the posture of a function body, it is easiest to consider the argument that can take nodes similar to the context item expression ".". This expression is motionless on itself, but if used with usage absorption it may become consuming depending on the input posture of the expression or instruction it appears in, called the context posture. For functions that means, it takes the posture of the argument to that function, in other words, the context posture of the parameter that can take streamed nodes is the posture of the argument to the function. Example:

Code listing: striding streamable function that can take streamed nodes

<xsl:function name="f:thirdchild" as="element()">
    <xsl:param name="n" as="node()" />
    <xsl:sequence select="$n/child::*[3]" />
</xsl:function>

<!-- guaranteed streamable: -->
<xsl:template match="author">
    <xsl:apply-templates select="f:thirdchild(.)" />
</xsl:template>

<!-- not guaranteed streamable: -->
<xsl:template match="author">
    <xsl:apply-templates select="f:thirdchild(.//book)" />
</xsl:template>

The body of the function f:thirdchild has a posture striding because of the child axis step. The usage of the function is transitional, which means it can return nodes. Using the table above, we can lookup the result posture and sweep of any call to the function. The first use in this example is in xsl:apply-templates, which we have seen before takes a usage of absorption, meaning that overlapping nodes are not allowed[15]. The operand here is a call to our function with the current node as argument. Inside any template, by definition the context posture, and thus the posture of the context item expression, is striding. The table then gives us for argument is striding, function body is striding, then result posture is also striding and the sweep is consuming.

Looking at the second use of the function we see another xsl:apply-templates, this time with a crawling path expression (the descendant axis is returns a crawling posture, it can have overlapping nodes) as argument to the function. From the table this gives us for argument is crawling, function body striding the result posture of roaming and a sweep of free-ranging. Even without the table we can see from the argument that it returns overlapping nodes and that a processor will have to potentially look back and forward to find the third child of each overlapping node. If there is no overlap, this problem would not arise, but statically we do not know that, and the processor has to assume the worst.

Multiple arguments that can take nodes

With streaming, it is not possible to have one construct take multiple downward select expressions, or more generally, multiple consuming expressions. Suppose we were to allow a function call to take multiple arguments that are streamed nodes, this rule would backfire. An function call such as f:equals(foo, bar) has two downward select expressions, foo and bar. Because there is no way the processor can know in what order these appear in the input document, it is possible that foo appears before bar, or the other way around, which means that the processor may have to look back to find the other child. This is the general reason why multiple downward selects are not allowed in any single construct.

We could argue that one argument can be a grounded node and another a streamed node. But that does not help us much further, because it hits the same problems as with the non-streamable functions we showed earlier: how can we tell the user of the function to choose between fn:snapshot and fn:copy-of and how can he determine what argument can take a streamed node and what argument cannot?

Another argument to allow multiple arguments that can take nodes is that one argument can be motionless in the function body and the other is not. Together they would be consuming, or if both are motionless, together they would be motionless. But this still does not deal with the fact that to get a node into the function, that node first needs to be consumed, unless it is the context node or an ancestor of the context node. These restrictions would be very limiting and also very hard to define in spec language.

The resolution is to keep things simple. Instead of allowing multiple arguments that take nodes, streamable functions can have a maximum of one parameter that can take nodes. The other arguments must either be non-node types, or absent.

Recursive streamable functions

In Table III: Posture and sweep of stylesheet functions in section “In pursuit of flexible rules for guaranteed streamable stylesheet functions” we have seen that it is possible to write functions with any kind of result posture. The result posture is then dependent on the argument passed to the function. This works for the trivial case where the function body is simple and can be analyzed. But what happens if we try to apply those rules with a function body that is recursive?

For this section I consider four kinds of recursion[16]:

  • Direct recursion: function calls itself inside the body of the function.

  • Indirect recursion: function calls itself indirectly through another function it calls.

  • Indeterminate recursion: function applies templates and a template at some level calls the function.

  • Dynamic recursion: a dynamic function call resolves, or can resolve to a function which is currently executed.

Direct recursion

To understand the challenges when dealing with recursive functions it is enough to look at a most trivial example, which is clearly streamable and uses infinite direct recursion:

Code listing: trivial direct recursion

<xsl:function name="f:recur">
    <xsl:param name="n" />
    <xsl:sequence select="f:recur($n)" />
</xsl:function>

<xsl:template match="x">
    <xsl:apply-templates select="f:recur(y)" />
</xsl:template>

This function does not have an exit-clause for the recursion and will run indefinitely, likely causing a stackoverflow error. But it is legal XSLT and it is streamable in the sense that it no movement occurs on the streamed argument.

If we analyze this function using Table III: Posture and sweep of stylesheet functions, we run into a problem:

  • First we need to determine the posture of the argument to the function, here it is striding.

  • This sets the posture of $n to striding as well. The next step is to determine the posture and sweep of the function body.

  • To determine that, we need to determine the posture and sweep of the contained sequence constructor, which is a single xsl:sequence with here one operand, the select-expression.

  • The posture and sweep of xsl:sequence is equal to the posture and sweep of its select expression[17], here f:recur($n).

  • We know the posture of $n, but to determine the posture of the function call f:recur we need to analyze the function body again, which we already did, but without having a conclusive sweep or posture yet. We've reached an eternal analysis loop, the function is non-analyzable.

It is not trivial to resolve this dead end situation. One way forward could be to split the function body artificially into two operands and analyze each individually: one for the part without the recursion point[18], setting the recursive function to the empty sequence (), and one for the argument to the recursion point. This will help somewhat, but feels wrong and is hard to proof right. What would happen if a function returns streamed nodes and gets a climbing input and the output is of a different posture?

Looking back at Table III: Posture and sweep of stylesheet functions we can deduct that any function call with a climbing or crawling argument must yield a climbing or crawling result. The only argument that is allowed to return a different result posture is an input with a striding posture. This makes sense, because the natural process for streaming is to go through the input document depth-first, which is the striding posture and once you change direction upwards, you cannot go down again: it would make a construct roaming. To summarize, we can consider that:

  • A climbing input argument must remain climbing to be streamable, so if the input is climbing, the recursive argument must also be climbing and the body cannot contain an expression in another direction.

  • A striding input argument can change to climbing but not to crawling in recursion, because that would validate the next point. If it changes to climbing the whole function body must be motionless, because you cannot both climb and consume, and if it stays striding the whole function body must be striding and can be either motionless or consuming.

  • A crawling input argument can only take motionless expressions, it cannot turn magically back into striding[19] nor is it allowed to have a crawling expression operate on another crawling expression.

  • Anything roaming will always stay roaming.

The following table shows what this means for different argument postures of calls to a function, what the recursive argument (the one at the recursion point) can be and the result posture of the recursive call is and what the posture of the function body must be (it must be the same) and what the sweep is of the recursive call. The last column stresses the maximum sweep of the function body with the given arguments. Note that all these analyses are in the context of the initial function call, not the recursive call, which makes it possible to allow one function body to take different postures as input:

Table IV: Posture and sweep of recursive stylesheet function calls

Argument postureRecursive argument postureRecursive function body & call postureaRecursive function call sweepMaximum function body sweep

a) The function body must match the recursive call posture or be grounded, otherwise the posture is roaming.

b) In fact, if the argument is grounded, the recursive argument will also be grounded.

c) Because on a next iteration, the argument will be crawling, not striding anymore, it would be a crawling expr. on a crawling expr. which is not guaranteed streamable.

GroundedAny (grounded)bGroundedbMotionlessMotionless
anyDirect-transitionalPosture of argumentMotionlessConsuming
ClimbingClimbingClimbingMotionlessMotionless
ClimbingNon-climbingRoamingFree-rangingn/a
StridingClimbingClimbingMotionlessMotionless
StridingStridingStridingConsumingConsuming
StridingCrawlingRoamingcFree-rangingcn/a
CrawlingClimbingClimbingMotionlessMotionless
CrawlingNon-climbingRoamingFree-rangingn/a
RoaminganyRoamingFree-rangingn/a

For this table to work in practice, we can allow at most one recursion point, or each recursion point must have the same posture and at most one can be consuming. Once you have found the posture and sweeps of the recursive function call, you can continue the analysis of the function body and then apply the previous Table III: Posture and sweep of stylesheet functions.

The rules in this table are laid out in such a way that it is not possible to have a climbing expression as the argument to a recursive function at a recursion point and somewhere else in the same function body a consuming construct or a non-climbing result. This is necessary, because if we would allow the combination of a climbing recursion point and a striding function body, then on either the initial call (with a climbing argument) or the recursive call (if the initial call would have been striding for instance) would call the function with an incompatible climbing posture, which would then be consumed. And a climbing posture can never be consumed. More specifically, it prevents a function like the following to be considered streamable:

Code listing: non-streamable climbing and consuming

<xsl:function name="f:climbstri">
    <xsl:param name="n" />
    <xsl:choose>
        <xsl:when test="f:climbstri($n/ancestor::foo)/contains(., 'bar')">
            <xsl:text>Found it</xsl:text>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$n" />
        </xsl:otherwise>
    </xsl:choose>
</xsl:function>

Glancing over the example might already give a hint as to why it cannot possibly be streamable: the xsl:value-of can consume a streamed node and that streamed node is likely to be at a climbing axis because of the climbing posture of the recursive argument. If we take the item in the table into account and consider calling this function with a striding argument, as with f:climbstr(a/b) then we find in the entry for striding and climbing that the body of the function must at most be motionless. It is consuming, because of the xsl:value-of so the analysis fails and the function is roaming and free-ranging.

If we go back to our original example (see Code listing: trivial direct recursion) and apply the rules of this table, we have one recursion point, f:recur($n), the argument is direct-transitional, that is, it is motionless and takes the posture of its caller, in this case the caller of the function. According to the table above, we can use any posture as input, which will then become the result posture, and the sweep of the function body follows as motionless and grounded. In our case, the call has a child-select expression in f:recur(y), which has posture striding and sweep consuming. The result posture and sweep of the whole template is then grounded and consuming.

Indirect recursion

With indirect recursion there are extra layers of calls before the recursion takes place, as in A calls B calls C calls A again. The question that rises is whether the rules in section “Direct recursion” also hold if a function calls itself indirectly. Indirect recursion is a type of recursion that can also be determined statically. Let us look at an example:

Code listing: indirect recursion

<xsl:function name="f:filterattr">
    <xsl:param name="n" />
    <xsl:param name="filter" />
    <xsl:sequence select="$n/@*[name() = $filter]" />
    <!-- indirect recursion -->
    <xsl:sequence select="f:take-ancestor($n, $filter)" />
</xsl:function>

<xsl:function name="f:take-ancestor">
    <xsl:param name="n" />
    <xsl:param name="filter" />
    <xsl:sequence select="
        if($n) 
        (: indirect recursion :)
        then f:filterattr($n/ancestor::element()[1], $filter) 
        else ()" />
</xsl:function>

The example is a very verbose and clumsy way of getting all attributes of all ancestor-or-self nodes that have a certain name. For instance, with the following input and stylesheet, the output will be 3.14 1.64:

Code listing: input document

<foo version="1.64" name="john">
    <zed>
        <bar version="3.14" id="1234">
    </zed>
</foo>

Code listing: indirect recursive function call example

<xsl:mode on-no-match="shallow-skip" />

<xsl:template match="bar">
    <xsl:value-of select="f:filterattr(., 'version')" />
</xsl:template>

But whether it is clumsy or not is not the matter. It serves as a basic example of indirect recursion and the function is clearly streamable because it only traverses the ancestor and attribute axes, which are motionless. However, the question is, of course, whether this example is streamable by following the rules from Table IV: Posture and sweep of recursive stylesheet function calls.

If we try that a first time, we quickly find that it is not immediately obvious how to apply it, because the functions above are not directly recursive. Recall that we said in section “Direct recursion” we wrote we can allow at most one recursion point, or each recursion point must have the same posture and at most one can be consuming . There are two recursion points here, the one inside f:filterattr that calls f:take-ancestor and the one inside f:take-ancestor that calls f:filterattr. The posture of the first one takes the input posture of the argument, it is Direct-transitional, the second one traverses the ancestor axis, which gives it a climbing posture. Finally, the argument inside the template example in Code listing: indirect recursive function call example has itself a posture of striding, because the context item expression takes the context posture which in the case of xsl:template is striding.

In short, if we start at the template, we start with posture striding, then striding again (because it is direct-transitional), then climbing and on the next recursive iteration this repeats, but now from climbing as input to f:filter-attr, then climbing again (because direct-transitional takes over the posture of the argument) and so it continues climbing until all recursive iterations have finished.

This breaks the rule we set to ourselves: the posture changed from striding to climbing. Yet, we also established that the example functions were streamable. The conflict arises from the argument $n being direct-transitional, which works like a cameleon and changes the posture along the way. One resolution to this specific situation is to allow direct-transitional, provided the rest of the body of the function is motionless, in which case it does no harm. Another, perhaps easier solution is to force the same posture in, same posture out rule and rewrite the argument to this function as follows:

Code listing: fixed indirect recursive function call

<xsl:mode on-no-match="shallow-skip" />

<xsl:template match="bar">
    <xsl:value-of select="f:filterattr(./ancestor-or-self::*[1], 'version')" />
</xsl:template>

The change to using the ancestor-or-self axis also changes the initial posture to climbing, which in turn changes the posture of $n to be climbing as well, making all postures in all recursion points climbing.

While this relatively trivial example seemed to show an edge-case because it used a recursive argument posture of direct-transitional, it taught us two things: one, even a trivial example can quickly appear hard once we try to apply streamability rules on them and two, to keep things simple, it is best to stick a set of ground rules, even though that potentially means that we slightly limit certain use-cases.

In practice, indirect recursion is found pretty rarely in XSLT programming, even though XSLT is itself a functional language, it is much more common to use direct recursion. But if a scenario requires indirect recursion, it is possible, even with streaming.

Indeterminate recursion

Another challenge comes from indeterminate recursion as in the following example:

Code listing: trivial indeterminate recursion

<xsl:function name="f:indeter">
    <xsl:param name="n" />
    <xsl:apply-templates select="$n" mode="streamable" />
</xsl:function>

<xsl:template match="*" mode="streamable">
    <xsl:sequence select="f:indeter(.)" />
</xsl:template>

In this situation, the analysis of the function declaration and the template declaration can be done independently, without ever finding the recursion point. However, when this code is run, the recursion kicks in (assuming the argument is an element). There is, however, one big advantage to this type of recursion: the rules on templates define that the result of a template declaration must be grounded[20]. Therefore, by definition, the result of any xsl:apply-templates is also grounded, which gives the function body above a grounded posture as well. In turn, the call to f:indeter results in a grounded posture, which concludes the circle. In addition, any select-expression in xsl:apply-templates must be atomizable, which in streamability terms means that it cannot be climbing or crawling[21]. This leads us to a temporary conclusion: if the arguments to a function are striding they can be used in indeterminate recursive functions. If the arguments are not striding, the analysis fails and makes the call on the function roaming and free-ranging.

Following Table III: Posture and sweep of stylesheet functions and taking the same approach that the body of the function should be analyzed during the call to the function, we can conclude that it is impossible for a function body to apply templates on something else then a striding posture. The one exception that is allowed in the General Streamability Rules[22]is for a climbing posture of childless nodes, which only applies to the attribute axis. This means that we can consider the following example streamable as well, even though we pass a climbing, or even a crawling expression to the function:

Code listing: climbing result accepting crawling posture

<xsl:function name="f:apply-attribs">
    <xsl:param name="n" />
    <xsl:apply-templates select="$n/@*" />
</xsl:function>

Whether or not that function will be called indeterminate-recursively or not is irrelevant, because on the second iteration, the selection will be empty, because attribute nodes do not have children.

In short: indeterminate recursion does not have influence on the streamability analysis, nor does it break it. The existing rules suffice.

Dynamic recursion

Dynamic recursion looks a bit like indeterminate recursion in the sense that whether or not recursion takes place cannot be determined statically. In XPath 3.0, it is possible to bind a variable to a function[23], this is commonly referred to as higher order functions. The concept is not new, already in 2001, Dimitre Novatchev showed in his paper Novatchev 2001 that writing higher order functions is possible in XSLT 1.0 by means of what he calls template references[24], a concept also applied to the functional coding patterns in Mangano 2005, which was further popularized as the Novatchev technique in Kay 2008. It took until XSLT and XPath 3.0 that function items became first-class citizens, see my own paper Braaksma 2013 for an extended coverage of that subject for XSLT and Grust 2013 for a coverage for XQuery and databases.

For the rest of this section, we will assume you have a basic understanding of higher order functions in XSLT and XPath. The closure of higher order functions cannot contain the context item, so for our analysis we should only worry about arguments that can be streamed nodes, just like with regular functions.

The nature of a dynamic function call is that the function it refers to is only known at runtime, when the variable is bound to a function item. As a result, there is not much we can say of the posture and sweep of the function it refers to until it is actually invoked. Following the rules on streamability, the result of binding a variable to a function item is always grounded. This makes sense, because at the moment of binding, a function item is assigned to a variable and no streamed nodes are possibly involved in the result.

Once a function item is evaluated, its arguments are known. Provided that the arguments themselves are not roaming and free-ranging, the analysis will thenceforth look at the bound function. If this function is streamable, we can do the analysis from the previous sections as if it was a direct function call. The biggest difference lies in the fact that the analysis is done during execution. This means that an error for whether a call is not streamable will be a dynamic error.

This is no different for dynamic recursion. If dynamic evaluation leads to a recursive call, the posture of the argument, the recursive call and the function body must be matched to Table IV: Posture and sweep of recursive stylesheet function calls. If the result is roaming and free-ranging, the recursive call is not guaranteed streamable.

While it is theoretically possible to allow such dynamic streamability rules, the rules in the specification are meant to guarantee streamability statically. As a result, any potential dynamic streamability is prohibited. Processors may allow dynamic streamability as a vendor extension, but the specification will consider passing a node to a dynamic function call not guaranteed streamable, dynamic recursion, as suggested in this section, is therefor not possible with the current set of rules on static guaranteed streamability.

Analyzable and non-analyzable functions

Analyzable stylesheet functions are functions that can be analyzed by themselves, as we saw in . Non-analyzable stylesheet functions are those that can only be analyzed from the function call, because the body of the function on itself cannot be analyzed without knowing what the argument is. Most of the previous sections was about non-analyzable function calls.

Spec-wise, the term non-analyzable refers to the set of functions that cannot be analyzed statically and are given very conservative assumptions on streamability. This is true for recursive stylesheet functions and non-final streamable functions (see section “Non-final streamable functions in packages”). The term non-analyzable does not necessarily mean that the function is not analyzable, however, the general rule is that it requires a call, or knowledge of the context of a call to the function to find out whether or not the function is streamable. We have seen in the previous sections that it is possible to allow quite a variety of postures, even for non-analyzable functions, but in the specification such functions are limited to a posture of striding and are always considered consuming even if in fact they are not.

Non-final streamable functions in packages

In XSLT 3.0, a new feature, packages allows programmers to create a collection of modes, functions, named and matching templates, accumulators etc and pack them in a, potentially pre-compilable, package, which can then be included in other packages or stylesheets with xsl:use-package[25]. Components in that package can have a visibility attribute that can be set to public, private, final or abstract. If the visibility is set to private it can only be used in the local package, and if it is set to final the declaration cannot be overridden, but can be used also in other packages. Visibility abstract means there is no implementation yet, and it must be implemented by a using package (one that includes the package). And public, the default, means that a declaration can be overridden by using xsl:override, which will then take precedence over the original implementation. This can be considered similar to virtual methods in object-oriented languages like C++ and C#.

The result for determining whether a function can be guaranteed streamable or not cannot be determined statically at the moment that a package is compiled, because it is possible that it will later be overridden and any calls to that function can be impacted if the implementation, and the streamability, changes.

As briefly explained in the previous section, if a function is defined with visibility abstract or public, in other words, if they are non-final and non-private, then they are considered non-analyzable by the specification. This means that the processor cannot, and does not have to, try to analyze function calls too deeply: it only expects a striding posture and it will not try to determine whether or not the function is consuming or motionless when you call the function: it assumes the worst and will consider the call consuming.

A proposal from me in the bug entry [26]awaits reaction from the working group. It shows a way that processors can use to statically detemine the posture of non-final streamable functions. However, if that proposal were excepted, it will pose limitations on overriding non-final streamable stylesheet functions. I think, however, that not being able to override a function, and not being able to write non-final functions, is too severe a limitation. Even if a function is non-final, if you override it to make it final, you will have to rewrite the whole function body, you cannot use xsl:original(), because that would call into the non-final function again, which takes too much of conservative approach.

An alternative approach is the approach taken for the recursive functions in the previous sections. For all analysis there, we always considered the function call itself to get the whole picture. For functions that are non-final, such an approach can also work. Because the processor only knows what function to call when it encounters the function call itself, it should include the body of that function at that moment into the analysis. This can be done statically, there is no need to first run the stylesheet.

A potential problem, however, still arises if someone writes a package function that works correctly with a climbing result posture, but cannot be used if that is changed into a function override with a striding result posture. Even more, a motionless function could be changed into a consuming function, which will greatly impede existing calls to that function. One way to remedy that is to disallow overriding functions with a different result posture or with a different sweep. That means that any given call will need to be analyzed against the new function and the xsl:original() function call, as if it still existed in scope, otherwise it will not be possible for the processor to determine whether there is a change in posture or sweep.

Whether or not this turns out to be a feasible option in practice remains to be seen. After all, there is always a trade-off between usability of a feature and the complexity of the rules or the complexity for implementations. If the rules become too complex for anybody to understand, then there is littly chance it will be used in practice, and there will be little incentive for existing implementors to implement such feature, let alone write specification-tests for it.

An improvement: type-determined posture

In the past few sections we have seen that analyzing a function that can return nodes can be quite hard and intricate. However, if you know what type your function will return, you can use the as-attribute on the xsl:function declaration. This will tell the processor that the result must be converted into the type defined in that attribute.

In cases where the type is not a node, the processor can go much further with static analysis than in situations where the type is not known, because if the type is not known, the processor must assume the worst. Suppose you declare the function with as="xs:integer", there is no way your function can return nodes. For recursion this is a big help, because the processor now knows the result of the recursion will not contain nodes. Whether or not there is overlap in the argument expression is now less relevant and the posture of any function call, even the recursive ones, will always be grounded. This makes the requirement to match the argument with the result posture go away and simplifies both writing and analysis.

Let us look at an example, adopted from an example by Michael Sperberg-McQueen:

Code listing: type determined posture

<!-- mimicking what can also be written as
     string-join(('', reverse(ancestor-or-self::*)/name()), '/') -->
<xsl:function name="f:fqgi" as="xs:string">
    <xsl:param name="n" as="node()"/>
    <xsl:choose>
        <xsl:when test="count($n/ancestor::*) = 0">
            <xsl:value-of select="'/' || name($n)"/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="concat(
                name($n), 
                '/', 
                f:fqgi($n/parent::*))"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:function>

The example takes a node, processes its parents recursively and returns the names in reverse order (deepest first) separated by slashes. Because the function is declared to return a grounded result, namely xs:string, there is no need to assume the function can return nodes, because it cannot. Through Table IV: Posture and sweep of recursive stylesheet function calls we find that if the argument posture is striding and the recursive argument posture is climbing, as is the case here, the recursive function call is considered climbing as well and the function body must then result in either a climbing or grounded posture.

In this case, the function call itself does not have to be set to be climbing, because we know from the xs:string result type that the result will always be grounded. This simplifies the analysis in that we can now consider the function argument to be of either absorption or inspection usage. If the function body as a whole consumes the node referenced by $n it will be absorption and the analysis fails, because a climbing expression with absorption usage is not allowed. Because the only other operands on the streamed argument are fn:count and fn:name, both having inspection operand usage and both being motionless, the result of the analysis as a whole is that the function is guaranteed streamable, does not consume the streamed argument which we can summarize as having usage inspection, and while it is direct-recursive we know it can be called with any argument, albeit climbing, striding or crawling.

We could expand the type-determined analysis further by including whether or not the streamable argument has a cardinality of zero-or-one or has a cardinality of zero-or-more. In the case of zero-or-one there is no way that overlapping nodes can be passed on to the function, which can simplify analysis of a function call with a crawling argument, because it will fail dynamically if the crawling argument returns more than one node. This special-case scenario has meanwhile been tackled and been generalized for any situation and cardinality[27], which can also by applied to stylesheet function call arguments.

Miscelleneous

Inline functions and streamability

Inline functions are a new XPath 3.0 capability[28] that allows you to write an anonymous function inside an XPath expression. The definition of inline function expressions prohibits referencing the context item, which makes it not directly possible to refer to the context. However, the closure of an inline function contains all local variable bindings, which makes it possible to refer to a variable outside of the inline function, which itself could be bound to a streamed node. The streamability rules for xsl:variable and xsl:param prohibit referencing a streamed node, but one exception is inside a streamable stylesheet function, where it is possible to have a parameter bound to a streamed node.

There are two solutions possible. One is to come up with complex rules for inline functions referencing the streamed parameter, the other is to simply prohibit using a streamed parameter inside an inline function body. This in itself would not be consistent with the definition in XPath, which states that all local variable bindings are available in the dynamic context of the inline function, which leads us to a third option: simply disallow inline function expressions within the body of a streamable stylesheet function.

This third option would be in line with the current streamability rules for inline function declarations[29], which states that they are, by definition, grounded and motionless, simply because it is not possible to pass a streamed node into the closure of the inline function body.

Partial function application and streamability

Partial function application is also a new XPath 3.0 capability by writing down a question mark as a placeholder for an argument[30], and apply the other arguments already. The result is a function item with less arguments than the original and some arguments already filled in. An example is the expression index-of(?, ?, "http://my-collation"), which returns a function item that presets the collation argument to http://my-collation, resulting in a two-argument function that always uses the same collation.

The current rules are under the rules on function calls[31] state that, unless the function is focus-dependent, that the general streamability rules apply, which means that it is treated as a normal function call, but the placeholder arguments are ignored.

If we take into consideration that binding a streamed node to a variable and passing it around is prohibited, a similar rule should be applied to partial function application: if the function is a streamable stylesheet function, the argument that can take a streamed node, must be the placeholder, or a grounded node must be supplied, otherwise it would become a roaming and free-ranging expression.

In other words, partial function application works with user-defined streamable stylesheet functions, as long as you do not try to bind a streaming node from the current context to it. You should do that at a later stage, when you actually call the function.

Named function references and streamability

Named function references are another XPath 3.0 feature[32]: you can create a function item of an existing function by using the syntax FunctionName#ArgCount, where ArgCount is a literal giving the arity of the function. For instance, count#0 will give a reference to the zero-argument function of count and my:filter#3 will give a reference to a three-argument function my:filter.

The function item returned by such an expression can be called as a normal function by applying parentheses and arguments, which in turn will call the function you referred to to begin with. In practice, this syntax is useful for binding variables to existing named functions.

Creating a named function reference is an atomic action and does not involve references to nodes. As a result, a named function reference itself is always grounded. There is one exception, if you try to create a reference to a function that is focus-dependent. In such cases, just like in previous rules, the result is roaming and free-ranging. This is in fact the current rule for streamability of named function references in the specification[33].

Since stylesheet functions by definition cannot be focus-dependent, this exception does not apply to named function references that refer to a (streamable) stylesheet function.

Dynamic function calls and streamability

A dynamic function call, also a new XPath 3.0 feature[34], is a call to a function item that is bound to a variable. Suppose you have <xsl:variable name="fref" select=" 'name#1' " />, which binds the one-argument version of the name-function to $fref, then you can call that function by adding parentheses and arguments, the same way you would have done if the variable were the actual function: $fref(child::*[1]).

Since it is not possible to know at runtime what function the variable is bound to, analysis can only take place once the function is actually called. The current rules[35] state that all arguments have operand usage navigation, which means that you can only call a dynamic function when you actually create a grounded copy of a streamed node. In case you wanted to apply it to a streamed node you are out of luck.

The rules make an exception in case the signature of the function item is known, in which case type-determined usage, as in part explained in section “An improvement: type-determined posture”, can be used. That means that, if an argument is declared as a non-node type, the usage typically becomes absorption.

Streamability of stylesheet functions according to the specification

The current approach taken by the rules in Bug 25679 is a pessimistic one. For stylesheet functions to return nodes, they must be striding and are allowed to be either motionless or consuming. As a further limitation, which actually follows from forcing the striding posture on function bodies, is that the argument to the function must itself also be striding. In fact, it is not possible, by the current definition, to have a streamed node with crawling or climbing posture act as an argument to a streamable stylesheet function.

By limiting the result posture of the stylesheet function to one allowed posture, it is easier to write rules for both analyzable and non-analyzable stylesheet functions, see section “Analyzable and non-analyzable functions”. However, because the bug entry is quite fresh and the discussion is still ongoing, there is little conclusive I can tell about the rules that will eventually make it into the next public version of the specification[36]. For instance, in Comment#1 of the same public bug entry I have proposed some of the rules from Table III: Posture and sweep of stylesheet functions. Whether or not any or all of these suggestions will make it into the specification will remain to be seen.

Even if the specification will only allow relatively pessimistic stylesheet functions, vendors are still allowed to use wider rules on streamability. In the case of functions, Exselt will allow any function that is streamable, as explained in the previous section, by using an optimistic operational mode at user option.

The streamable attribute on xsl:function

In none of the previous sections in this paper have I mentioned the streamable attribute on an xsl:function declaration. Since most of the paper discussed streamability of a stylesheet function in the context of the function call, whether or not a function is streamable then depends on its implementation and there is no need for the processor to know beforehand what functions to analyze and what not.

The streamability rules in the specification are made such that the processor must statically determine whether or not a function declaration is streamable or not. Since it is likely that any given stylesheet will have both streamable and non-streamable stylesheet functions, it is necessary to tell the processor which are and which are not streamable. For that, the xsl:function declaration gets a new attribute, similar to the attribute of xsl:accumulator and xsl:mode, which determines whether or not a function should be analyzed for streamability.

The attribute takes a value of yes or no. If the value is yes, then the function must be guaranteed streamable and the function can be called with a streamable node as an argument.

I think that such an attribute is not necessary if we take the approach in this paper and analyze the entire function only upon the actual function call. If that function call is inside a streamable context and a streamable node is passed as an argument, the body of the function is analyzed with the context posture of the streamable argument set as the posture of the streamable parameter reference.

Status of current processors with respect to streamable stylesheet functions

Since streamable stylesheet functions are a relative recent addition to the specification and lots of it is not yet publicly available, there are currently no processors available that fully support streamable stylesheet functions.

At the time of this writing, of the two streaming processors that I know of, Exselt and Saxon, the former currently has a full streamability analysis for all constructs but a limited ability to analyze stylesheet functions for streamability. However, the available set of tests is still growing and the rules on streamability may change between now and the actual presentation of this paper, or even afterwards, since the specification is not final yet and work on streamable stylesheet functions may change the current rules. Apart from analyzing a function for streamability, Exselt chooses an optimistic streaming approach: if you write your stylesheet in a correct way with forward expressions only, your stylesheet will process streaming input in a streamable way.

For Saxon, I do not know the actual current status on streamable stylesheet functions, but I do know that they plan to do it in the not-so-far future.

If you are interested in streaming, or more specifically in streamable stylesheet functions, keep an eye out on the website of Exselt and Saxon as it is likely that in the near future, both processors will support this feature.

Conclusion

The previous sections have shown that writing streamable stylesheet functions is not that hard, as long as you do not require recursion, or want to write overridable non-final functions. But if you do require recursion, the simplest rule to remember is same posture in, same posture out. As long as you stick to that rule, and forget about the exceptional cases, you are in safe streaming waters.

We have also seen that the current resolution in the specification taken to allow streamable stylesheet functions is far more limiting, allowing only a posture of striding for the input arguments and in the case of recursive functions or non-final functions, always concludes that a call to such a function is consuming, even if it is not. This conservative approach has a valid reason, though: it tries to keep things simple and the rules understandable and implementable. But even with this limitations in mind, it gives stylesheet authors quite a wide range of possibilities to write streamable stylesheet functions.

This paper has shown how a relative small change to the specification opens up the way for stylesheet and library package writers to write streamable stylesheet functions, which can be used in both streamable and non-streamable scenarios alike[37]. This change was an important and vital one and has brought us very close to making the streamability rules as mature as they can be and ready for widespread use in stylesheets and packages. Hopefully this paper has given potential library writers a strong hint that they should take the extra step to write their library packages with streamability in mind to be useful for both the streaming and non-streaming markets.

References

[Braaksma 2013] Braaksma, Abel. Efficient XML processing with XSLT 3.0 and higher order functions: pp 23-40. Presented at XML Prague 2013, a conference on markup languages and data on the web, Prague, Czechia, Feb 8-10, 2013. In XML Prague Proceedings 2013. http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf. Web.

[Braaksma 2014a] Braaksma, Abel. Streaming for the masses, an introduction to streaming with XSLT: pp 29-80. Presented at XML Prague 2014, a conference on markup languages and data on the web, Prague, Czechia, Feb 14-16, 2014. In XML Prague Proceedings 2014. http://archive.xmlprague.cz/2014/files/xmlprague-2014-proceedings.pdf. Web.

[Braaksma 2014b] Braaksma, Abel. Streaming Design Patterns or: How I Learned to Stop Worrying and Love the Stream: pp 24-53. Presented at XML London 2014, UK XML Conference, London, United Kingdom, June 7-8, 2014. In XML London Proceedings 2014. http://xmllondon.com/2014/xmllondon-2014-proceedings.pdf. Web. doi:10.14337/xmllondon14.braaksma01

[Grust 2013] Grust, Torsten and Ulrich, Alexander. First-Class Functions for First-Order Database Engines Presented at International Symposium on Database Programming Languages. In Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), Trento, Italy. arXiv:1308.0158: http://arxiv.org/pdf/1308.0158v1.

[XPath and XQuery F&O 3.0] Kay, Michael. XPath and XQuery Functions and Operators 3.0 Recommendation 8 April 2014. W3 Consortium. W3C Recommendation. http://www.w3.org/TR/xpath-functions-30/. Web.

[Gerstbach 2006] Gerstbach, Peter. Generating Structured Documents to Create Reports by Integrating Data from CMS/DMS and EAI Systems: pp 27-28. Master thesis. At: Softwaretechnik und Interaktive Systeme der Technischen Universit¨at Wien. May 2006. http://www.gerstbach.at/2006/thesis/. Web.

[Kay 2008] Kay, Michael. XSLT 2.0 and XPath 2.0 Programmer's Reference 2nd edition, 5 May 2008: pp 251+. Published by O'Reilly Media. ISBN: 0470192747. Print.

[Lenz 2005] Lenz, Evan. XSLT 1.0 Pocket Reference (Pocket Reference). 1st edition, 19 August 2005: pp 23-40. Published by O'Reilly Media. ISBN: 0596100086. Chapter 3 (pages 23-40) is available online: http://lenzconsulting.com/how-xslt-works/. Print.

[Mangano 2005] Mangano, San. XSLT Cookbook, 2nd Edition, Solutions and Examples for XML and XSLT Developers 2nd edition, 21 December 2005: pp 686-732. Published by O'Reilly Media. ISBN: 0596009747. Print.

[Novatchev 2001] Novatchev, Dimitre. The Functional Programming Language XSLT — A proof through examples November 2001. Seen at: Sommer Semester 2010 course material University of Magdeburg 2010 http://edu.cs.uni-magdeburg.de/EC/lehre/sommersemester-2010/funktionale-programmierung/uebungen/gruppe-2/aufgabenblatt-12/XSLTasFP.pdf. Online reference: http://fxsl.sourceforge.net/articles/FuncProg/Functional%20Programming.html. Print.

[Tennison 2001] Tennison, Jeni. Rescuing XSLT from Niche Status.. 17 February 2001. MulberryTech XSL Mailing List Archive. http://www.biglist.com/lists/xsl-list/archives/200102/msg01143.html. Web.

[W3C BugZilla] Various contributors. Bugzilla - Public W3C Bug / Issue tracking system 1997 - 2014. W3 Consortium. https://www.w3.org/Bugs/Public/. Web.

[Welker 2008] Welker, Eddie. Advantages of push-style XSLT over pull-style 25 November 2008. http://www.eddiewelker.com/2008/11/25/push-style-xslt-vs-pull-style/. Web.

[XQuery and XPath Data Model 3.0] Walsh, Norman; Berglund, Anders; Snelson, John. XQuery and XPath Data Model 3.0 Recommendation 8 April 2014. W3 Consortium. W3C Recommendation. http://www.w3.org/TR/xpath-datamodel-30/. Web.

[XPath 3.0] Robie, Jonathan; Chamberlin, Don; Dyck, Michael; Snelson, John. XML Path Language (XPath) 3.0 Recommendation 8 April 2014. W3 Consortium. W3C Recommendation. http://www.w3.org/TR/xpath-30/. Web.

[XProc] Walsh, Norman; Milowski, Alex; Thompson, Henry S. XProc: An XML Pipeline Language 11 May 2010. W3 Consortium. W3C Recommendation. http://www.w3.org/TR/xproc/. Web.

[XSLT Latest Version] Kay, Michael. XSL Transformations (XSLT) Version 3.0, Latest Version Undated. W3 Consortium. W3C Working Draft / Last Call Working Draft / Candidate Recommendation / Proposed Recommendation / Recommendation. http://www.w3.org/TR/xslt-30/. Web.

[XSLT Last Call WD] Kay, Michael. XSL Transformations (XSLT) Version 3.0 W3C Last Call Working Draft 12 December 2013. W3 Consortium. Last Call Working Draft. http://www.w3.org/TR/2013/WD-xslt-30-20130201/. Web.



[1] The ancestor-or-self axis is available during streaming and requesting properties on nodes on that axis is allowed. However, it is not possible to navigate away from that axis, doing so would result in a free-ranging and roaming expression.

[2] See section 19 of the XSL 3.0 specification.

[5] See, for instance, his answer in question 16631213 at StackOverflow, which uses recursion on element nodes, it's an example where functions are arguably a better choice than template matching.

[6] See section 5.6 Patterns, specifically the part on predicate patterns in the XSLT 3.0 Working Draft.

[7] Before user-defined stylesheet functions were allowed to be streamable and take on streamed nodes, the rules were as in the current Working Draft: For a call to a stylesheet function, the general streamability rules apply. There is one operand role for each argument in the function signature, and its operand usage is the type-determined usage based on the declared type of that argument., see section 19.8.7.12 Streamability of Function Calls in XSLT 3.0 Working Draft. Please note that this and other references to the WD will change when the next version of the specification comes out.

[8] See section 19.8.1. in XSL Transformations 3.0.

[9] See section 19.8.4.17 Streamability of xsl:for-each in the XSLT 3.0 Working Draft, specifically the rule The posture of the instruction is the posture of the contained sequence constructor, assessed with the context posture and context item type set to the posture and type of the select expression..

[10] That the attribute axis in any context results in a posture of climbing and a sweep of motionless follows from the rules in section 19.8.1 General Rules for Streamability, rule 2.e and the rules on the axis steps itself, which can be found in 19.8.7.7 Streamability of Axis Steps.

[11] The rule that a child axis is not guaranteed streamable if applied on an axis other than an axis with striding posture follows from 19.8.7.7 Streamability of Axis Steps in the XSLT Working Draft, in the table under item 6 in the list.

[12] A higher-order operand is an operand that can change focus, such as xsl:for-each or xsl:for-each-group, see the definition of higher-order operand in the XSLT 3.0 Working Draft.

[13] See 19.8.7.6 Streamability of Axis Steps in XSL Transformations 3.0.

[14] See 19.8.7.7 Streamability of Axis Steps in XSL Transformations 3.0.

[15] A public bug entry Bug 25185 tries to resolve this by allowing limited buffering of crawling postures in a absorption contexts such as these, if that bug is accepted and resolved, overlapping nodes as with crawling posture is allowed with usage absorption, at least in the general case.

[16] The XSL Transformations 3.0 Working Draft does not use this terminology, but it helps differentiate different situations for this discussion.

[18] I use the term recursion point for the place in the code where the recursion is invoked, but this is not a term that is used in the specification.

[20] See section 6.6.3, Streamable Templates in XSL Transformations 3.0, second bullet-point: The sequence constructor contained in the body of the xsl:template element is grounded.

[21] One exception is allowed for selections that only select childless nodes, such as attribute nodes. These are climbing but do not contain children and are henceforth allowed.

[23] See section 3.1.5.1 Evaluating Static and Dynamic Function Calls in the XPath 3.0 Recommendation.

[24] See page 4 of his paper Novatchev 2001.

[25] See section 3.6.1 Dependencies between Packages in XSLT 3.0 Working Draft.

[26] See Bug 25679 in W3C's BugZilla, which discusses the implications of allowing streamable stylesheet functions.

[27] See the resolution to Public XSLT Bug 25185 on how crawling posture is now allowed in atomizing contexts.

[28] See section 3.1.7 Inline Function Expressions in the XPath 3.0 Recommendation.

[29] See section 19.8.7.14 Streamability of Inline Function Declarations in the XSLT 3.0 Working Draft.

[31] See section 19.8.7.12 Streamability of Function Calls in the XSLT 3.0 Working Draft.

[32] See section 3.1.6 Named Function References in the XPath 3.0 Recommendation.

[33] See section 19.8.7.13 Streamability of Named Function References in the XSLT 3.0 Working Draft.

[34] See section 3.2.2 Dynamic Function Call in the XPath 3.0 Recommendation.

[35] See section 19.8.7.9 Streamability of Dynamic Function Calls in the XSLT 3.0 Working Draft.

[36] The next version will probably be a Candidate Recommendation, considering that the current status is Last Call Working Draft, meaning that the specification is ready for implementors and only bugs found that come from implementation issues will be fixed. No new features will be added.

[37] Any streamable construct will work exactly the same in a non-streaming scenario. Functions or modes written with streaming in mind can be used with both streaming and non-streaming input without alterations.

Author's keywords for this paper: XML; XSLT 3.0; XPath 3.0; Streaming; Streamable-stylesheet-functions; Guaranteed-streamability; Stylesheet-functions; User-defined-functions; Packages; Recursive-functions