How to cite this paper

Lenz, Evan. “Carrot: An appetizing hybrid of XQuery and XSLT.” Presented at Balisage: The Markup Conference 2011, Montréal, Canada, August 2 - 5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). https://doi.org/10.4242/BalisageVol7.Lenz01.

Balisage: The Markup Conference 2011
August 2 - 5, 2011

Balisage Paper: Carrot

An appetizing hybrid of XQuery and XSLT

Evan Lenz

Software Developer, Community

MarkLogic Corporation

Evan Lenz has been a specialist in XML technologies since 1999, having served on the W3C XSL Working Group, written XML-related books and articles, and spoken at numerous conferences. He is currently working for MarkLogic Corporation.

Copyright © 2011 MarkLogic Corporation

Abstract

On the surface, XQuery and XSLT are very different languages. Users tend to prefer one language or the other. XSLT users are loath to give up the power of template rules; on the other hand, users of XQuery prefer its concise, composable syntax and perhaps wouldn't dream of writing code in XML. There are good historical reasons why they are not the same language. For one thing, XSLT came first, and XQuery was designed more with SQL users in mind. However, the two languages share the same data model and a large syntactic subset (XPath 2.0), which raises the question: Is there a way to get the unique benefits of both languages without having to continually decide between the two? The answer is yes. Carrot, an appetizing hybrid of XQuery and XSLT, lets you have your cake and eat it too.

Table of Contents

Introduction
Background and influences
Introduction by example
Carrot definitions
Global variables
Functions
Rules
Carrot expressions
Ruleset invocations
Shallow copy constructors
Text node literals
Expression semantics
What about xsl:for-each, xsl:for-each-group, etc.?
Implementation strategy
Future directions
Simple mapping operator
Mode merging
Underlying language development

Introduction

Carrot combines the best that XQuery and XSLT have to offer:

  • the friendly syntax and composability of XQuery expressions, plus

  • the power and flexibility of template rules in XSLT.

Carrot can also be (loosely) thought of as an alternative, more composable syntax for XSLT.

Background and influences

Carrot is not the first XSLT-inspired project to provide a shorter syntax than XSLT itself. Syntax shorthands have included Paul Tchistopolskii's XSLScript, Sam Wilmott's RXSLT, and another project called XSLTXT. Although none of these projects provided direct inspiration for Carrot, they all address one of the same desires that Carrot addresses: being able to program in XSLT more concisely. However, unlike these projects, Carrot addresses more than XSLT's verbosity. It also addresses XSLT's limited composability. For example, in XSLT you can't include an element constructor in a path expression (like you can in XQuery and Carrot) or apply templates inside a path expression (which you can uniquely do in Carrot).

A more direct inspiration was James Clark's proposal for Unifying XSLT and XQuery element construction. Written during the early days of the W3C activity on XQuery, that proposal suggested that XQuery and XSLT language constructs could be used interchangeably if XQuery used an XML-based syntax (via a simple document element wrapper). As we now know, things didn't turn out that way. Carrot takes essentially the opposite approach. Rather than make XQuery use an XML-based syntax like XSLT's, make XSLT (Carrot, actually) use a non-XML-based syntax like XQuery's.

Carrot is also inspired by Haskell's syntax, which defines functions using pattern-matching and an equation-like syntax.

Introduction by example

Carrot is best understood by example. Here's an example of XSLT's syntax for a template rule (henceforth "rule"):

<xsl:template match="para">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

In Carrot, you'd write the above rule like this:

^(para) := <p>{^()}</p>;

There are a few things to note about the above. To define a rule in Carrot, you use the same operator that XQuery uses for binding variables (:=). Everything on the right-hand side up to the semicolon is an expression in Carrot. An expression in Carrot is simply an XQuery expression, plus some extensions. In this case, the expression is using the extended syntax for invoking rules:

^()

which is short for:

^(node())

just as:

<xsl:apply-templates/>

is short for:

<xsl:apply-templates select="node()"/>

All rules belong to a ruleset (equivalent to a "mode" in XSLT). The above examples use the unnamed ruleset (there's just one of these). Here's an example that belongs to a ruleset named "toc":

^toc(section) := <li>{ ^toc() }</li>;

The above is short for:

<xsl:template match="section" mode="toc">
  <li>
    <xsl:apply-templates mode="toc"/>
  </li>
</xsl:template>

Here's the identity transform in Carrot:

^(@*|node()) := copy{ ^(@*|node()) };

This recursively copies the input to the output, one node at a time.

Here's a Carrot script that creates an HTML document with dynamic content for its title and body, converting <para> elements in the input to <p> elements in the output:

^(/) :=
 <html>
   <head>
     { /doc/title }
   </head>
   <body>
     { ^(/doc/para) }
   </body>
 </html>;

^(para) := <p>{ ^() }</p>;

As a comparison, here's what you'd have to write if you were using regular XSLT:

<xsl:transform version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <head>
        <xsl:copy-of select="/doc/title"/>
      </head>
      <body>
        <xsl:apply-templates select="/doc/para"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="para">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:transform>

Just as in XSLT, rules in Carrot can be associated with more than one mode. In XSLT, this template rule belongs to two modes:

<xsl:template mode="foo bar" match="bang"/>

Here's the equivalent rule in Carrot, belonging to two rulesets:

^foo|bar(bang) := ();

Carrot definitions

A Carrot module consists of a set of unordered definitions. Unlike XQuery, there is no distinction between main modules and library modules. Likewise, a Carrot module has no "body." Instead, there are only definitions. Carrot is more like XSLT in this regard. Also unlike XQuery, Carrot modules need not be associated with a namespace.

There are three kinds of definitions in Carrot:

  • global variables,

  • functions, and

  • rules.

Global variables

A global variable definition is very similar to a variable declaration in XQuery, except that you don't need the "declare variable" verbiage. Whereas in XQuery you would write:

declare variable $foo := "a string value";

In Carrot you would instead write:

$foo := "a string value";

Functions

A function definition is just like a function declaration in XQuery except that you don't need the "declare function" verbiage and, instead of curly braces, you use the same binding operator (:=) as a variable definition. For example, whereas in XQuery, you would declare functions like this:

declare function my:foo() { "return value" };
declare function my:bar($str as xs:string) as xs:string { upper-case($str) };

In Carrot, you would instead write:

my:foo() := "return value";
my:bar($str as xs:string) as xs:string := upper-case($str);

Why not just use the regular XQuery syntax? Two reasons: conciseness (higher signal-to-noise ratio) and consistency (with the other two types of definitions).

Rules

The third type of definition is a rule. This corresponds to a template rule in XSLT. For example, this rule matches any element node (*):

^foo(*) := "return value";

Unlike a function definition, the "argument" of a rule definition ("*" in the above case) is not an (optional) formal parameter list; instead it is a required pattern (as XSLT defines a pattern). Thus, it's illegal to have an empty set of parentheses in a rule definition:

^foo() := "return value"; (: NOT LEGAL :)

Note the asymmetry with ruleset invocations, where it is legal to call ^foo(), which is short for ^foo(node()).

Of course, rules can also have parameters (just as template rules can have parameters in XSLT). The syntax for declaring these is very similar to an XQuery function parameter list, except that it comes after the pattern and is separated from the pattern by a semicolon:

^foo(* ; $str as xs:string) := concat($str, .);

Carrot also supports tunnel parameters, as in XSLT. To indicate a tunnel parameter, you add the keyword "tunnel" before the parameter:

^foo(* ; tunnel $str as xs:string) := concat($str, .);

Unlike XQuery functions, parameters in a rule are identified by name, not position. Thus the syntax for passing them looks very similar to how they are declared, and the order of parameters is insignificant. The following expression applies the "foo" ruleset to the context node, passing the tunnel parameter $str with the value "Hello":

^foo(. ; tunnel $str := "Hello")
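For comparison, the rule definition and the invocation above correspond roughly to the following XSLT 2.0 stylesheet. This is a sketch rather than actual compiler output; the rule matching /* that triggers the invocation is an assumption, added only to make the stylesheet self-contained.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Hypothetical entry point; in Carrot: ^(/*) := ^foo(. ; tunnel $str := "Hello"); -->
  <xsl:template match="/*">
    <xsl:apply-templates select="." mode="foo">
      <xsl:with-param name="str" select="'Hello'" tunnel="yes"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Carrot: ^foo(* ; tunnel $str as xs:string) := concat($str, .); -->
  <xsl:template match="*" mode="foo">
    <xsl:param name="str" as="xs:string" tunnel="yes"/>
    <xsl:sequence select="concat($str, .)"/>
  </xsl:template>

</xsl:stylesheet>
```

Because parameters are matched by name in both languages, adding further xsl:with-param elements in any order would not change how $str is bound.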

What about conflict resolution among multiple matching rules? Carrot behaves the same as XSLT: rules with higher import precedence win, followed by rules with higher priority. Default priority is based on the syntax of the pattern, just as in XSLT. You can also specify the priority explicitly (right before the binding operator :=), as in the first rule of this example, which explicitly sets the priority to 1:

^author-listing( author[1]      ) 1 :=           ^();
^author-listing( author         )   := ", "    , ^();
^author-listing( author[last()] )   := " and " , ^();
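The explicit priority matters because of XSLT's default priorities: the pattern author gets 0, while author[1] and author[last()] both get 0.5. A lone author matches both predicate patterns, so without the explicit priority the two rules would conflict. The same three rules can be sketched in XSLT 2.0 as follows; the rule for a hypothetical authors wrapper element is an assumption, added only to make the example self-contained.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical entry point: list every author child of <authors> -->
  <xsl:template match="authors">
    <xsl:apply-templates select="author" mode="author-listing"/>
  </xsl:template>

  <!-- First author: no separator. The explicit priority 1 beats the
       default 0.5 of author[last()] when there is only one author. -->
  <xsl:template match="author[1]" mode="author-listing" priority="1">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Middle authors: default priority 0 -->
  <xsl:template match="author" mode="author-listing">
    <xsl:sequence select="', '"/>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Last author: default priority 0.5 -->
  <xsl:template match="author[last()]" mode="author-listing">
    <xsl:sequence select="' and '"/>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Applied to an authors element whose three author children contain A, B, and C, this should produce the text "A, B and C".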

Carrot expressions

The right-hand side of a Carrot definition, whether it be a variable, function, or rule, is a Carrot expression. The context for the expression evaluation is the same as it is for sequence constructors within a template rule in XSLT. For example, the context node is the node matched by the rule's pattern.

A Carrot expression is an XQuery expression with some extensions:

  • ruleset invocations — ^mode(nodes)

  • shallow copy{…} constructors

  • text node literals — `my text node`

Let's look at each of these extensions in turn and the rationale behind each one.

Ruleset invocations

Ruleset invocations (i.e., "apply-templates" in XSLT) are largely Carrot's raison d'être. They are not possible in XQuery; thus, the extension is required. Not only that, but XSLT can't invoke rules (apply templates) in an expression either. In Carrot, all definitions are bound to an expression, so the only way to "do" anything is to write an expression. (Unlike XSLT, Carrot does not make a distinction between "instructions" and "expressions"; everything is an expression.)

Shallow copy constructors

Shallow copy constructors are possible in XSLT but not XQuery. The difference between a copy constructor and using an XQuery element constructor is that, in the latter case, the namespace context comes from the query rather than the source document. XQuery allows you to perform deep element copies from the source document, but not shallow copies. Without this ability, modified identity transforms are impractical in XQuery. The semantics of Carrot's copy constructor are essentially the same as XSLT's <xsl:copy> instruction. For example, when the context node is a leaf node such as an attribute or text node, it behaves the same as if a deep copy were being performed.
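As an illustration of why shallow copying matters, here is a minimal sketch of a modified identity transform in XSLT 2.0. The attribute name "obsolete" is hypothetical. In Carrot, the two rules would presumably read ^(@*|node()) := copy{ ^(@*|node()) }; and ^(@obsolete) := ();

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: shallow-copy each node, then process its
       attributes and children recursively -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Modification: drop every (hypothetical) "obsolete" attribute -->
  <xsl:template match="@obsolete"/>

</xsl:stylesheet>
```

The second rule wins for matching attributes because the default priority of the pattern @obsolete (0) is higher than that of @* (-0.5).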

Note

XSLT 2.1/3.0 promises to add a "select" attribute to <xsl:copy> to make it convenient to perform a shallow copy of a node other than the context node. This is largely unnecessary in Carrot, since copy constructors can be easily composed within an expression, making it convenient to write, for example, foo/copy{…}.

Text node literals

Carrot also adds text node literals, using the back-tick (`) for the delimiter. This extension may at first seem to be of minimal value, since XQuery already allows you to construct text nodes using text{…}, and strings using quotes (or apostrophes). However, in practice, text node literals will often be the preferred syntax, as the following examples should make clear. Consider the following template rules in XSLT:

<xsl:template mode="file-name" match="doc">doc</xsl:template>
<xsl:template mode="file-ext" match="doc">.xml</xsl:template>
 
<xsl:template match="/doc">
  <result>
    <xsl:apply-templates mode="file-name" select="."/>
    <xsl:apply-templates mode="file-ext" select="."/>
  </result>
</xsl:template>

In Carrot, you might naturally rewrite the above as follows:

^file-name(doc) := "doc";
^file-ext (doc) := ".xml";
^(/doc)         := <result>{ ^file-name(.), ^file-ext(.) }</result>;

The problem is that this will produce an undesired result:

<result>doc .xml</result>

The extra space results because of the way in which sequences of atomic values are combined to make a text node in XQuery. Contiguous sequences of text nodes, on the other hand, are merged together without any intervening spaces, so you could fix things by using explicit text node constructors:

^file-name(doc) := text{"doc"};
^file-ext (doc) := text{".xml"};

The problem here is that guarding against what may be an edge case carries a large syntactic cost (six extra characters for every text node). If in 90% of cases, using a string will result in the exact same behavior as if you had used a text node, you will be strongly tempted as a user to use quotes instead of text{…} everywhere. However, you will get bugs in the remaining 10% of your code because of the way sequences of strings are concatenated to make a text node in XQuery.

Whereas it's more verbose in XQuery to construct a text node (using text{…}) than it is to return a string (using quotes), it's more verbose in XSLT to return a string (using <xsl:sequence>) than it is to return a text node (using a literal text node in the stylesheet). Text node literals in Carrot address this imbalance by making it equally convenient to create text nodes and strings. Thus, we naturally rewrite our Carrot definitions to get the desired result, without having to think about whether this is an edge case or not:

^file-name(doc) := `doc`;
^file-ext (doc) := `.xml`;

The existence of text node literals makes it easy to follow a simple rule: use text node literals when you are constructing part of a result document; use string literals when you know you want to return a string.
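The difference between strings and text nodes can also be observed in plain XSLT 2.0, which follows the same rule when constructing element content. The sketch below is only an illustration; the element names results, strings, and text-nodes are hypothetical.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <results>
      <!-- Two adjacent strings: joined with a single space -->
      <strings><xsl:sequence select="'doc', '.xml'"/></strings>
      <!-- Two adjacent text nodes: merged with no separator -->
      <text-nodes><xsl:text>doc</xsl:text><xsl:text>.xml</xsl:text></text-nodes>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

The strings element should contain "doc .xml", while the text-nodes element should contain "doc.xml".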

Expression semantics

Expressions in Carrot, unless otherwise noted here, are assumed to have the same semantics as in XQuery. Carrot operates on exactly the same data model as XQuery 1.0 and XPath 2.0.

One exception is that namespace attribute declarations on element constructors in Carrot do not affect the default element namespace for XPath expressions. Carrot is more like XSLT in this regard, in that it makes a distinction between the default namespace for input documents ("xpath-default-namespace" in XSLT) and the default namespace for output documents, thereby correcting what is arguably a design bug in XQuery.
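The XSLT behavior that Carrot follows here can be sketched as below. The input namespace URI is hypothetical; the point is only that the default namespace of the output (XHTML) and the default namespace used to match input elements are declared independently.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xpath-default-namespace="http://example.com/input">

  <!-- "doc" and "para" are looked up in the (hypothetical) input namespace;
       html, body, and p are created in the XHTML namespace -->
  <xsl:template match="/doc">
    <html>
      <body>
        <xsl:apply-templates select="para"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

In XQuery, by contrast, a default namespace declaration on the html element constructor would also change how the unprefixed name para inside it is resolved.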

What about xsl:for-each, xsl:for-each-group, etc.?

Given that XQuery expressions cannot do everything that is possible in an XSLT template rule, the question arises: What do all the XSLT instructions get mapped to in Carrot? In many cases, Carrot simply does not have an analogue. In some cases, that's because XQuery already provides a different way to achieve the same use case. For example, <xsl:for-each> does not have a direct analogue in Carrot. For iteration over a sequence, you can use "for" expressions, or even just "/" when applicable. The following Carrot (and XQuery) expression constructs a new <bar> element for each <foo> element, rendering <xsl:for-each> unnecessary for this case: foo/<bar/>. Similarly, Carrot does not support <xsl:sort>. For sorting sequences in Carrot, you would instead use "order by", as in XQuery. Local variables are defined using "let" expressions. And so on.
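For reference, this is the kind of XSLT the paragraph above has in mind; the element and attribute names (list, foo, name, bar) are hypothetical. In Carrot, the body of the rule would presumably collapse to a single expression such as for $f in foo order by $f/@name return <bar name="{$f/@name}"/>.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="list">
    <!-- Iterate over the foo children, sorted by their name attribute -->
    <xsl:for-each select="foo">
      <xsl:sort select="@name"/>
      <!-- Inside xsl:for-each, the context node is the current foo -->
      <bar name="{@name}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```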

The biggest area not currently addressed by Carrot—and which remains an open question—is how to perform grouping. There are a few answers to this question, not all mutually exclusive:

  1. Extend Carrot to support grouping.

  2. Import an XSLT 2.0 stylesheet when you need grouping.

  3. Wait for grouping to be added to XQuery 3.0 expressions and use those.

At this stage, the operative answers to this question are #2 and #3.

Designing support for multiple output documents (corresponding to <xsl:result-document> in XSLT) and how it interacts with document{} node constructors is on my TODO list. (If you have ideas, I'd be happy to hear them.)

Implementation strategy

Carrot is being implemented by compilation to XSLT 2.0. Several things are worth noting about this:

  • Each Carrot module compiles to an XSLT 2.0 module.

  • Carrot can include and import other Carrot modules or XSLT modules.

  • Carrot can also import XQuery modules, but since this is not supported directly in XSLT 2.0, the semantics depend on your target XSLT processor (e.g., <saxon:import-query> in Saxon and <xdmp:import-module> in MarkLogic Server).

Carrot is still in the process of being defined more formally. The current strategy for defining and implementing Carrot is as follows:

  1. Create a BNF grammar for Carrot

    1. Hand-convert the EBNF grammar for XQuery expressions to BNF

    2. Extend the resulting BNF to support Carrot definitions and expressions

  2. Use yapp-xslt to generate the Carrot parser from the Carrot BNF

  3. Write a compiler in XSLT 2.0 to convert parsed Carrot modules to XSLT 2.0 modules
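As a rough illustration of what step 3 might emit, the global variable and function definitions shown earlier could plausibly compile to the XSLT 2.0 module below. This is an assumption about the output, not taken from the actual compiler, and the namespace URI bound to the my prefix is hypothetical.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/my"
  exclude-result-prefixes="xs my">

  <!-- Carrot: $foo := "a string value"; -->
  <xsl:variable name="foo" select="'a string value'"/>

  <!-- Carrot: my:foo() := "return value"; -->
  <xsl:function name="my:foo">
    <xsl:sequence select="'return value'"/>
  </xsl:function>

  <!-- Carrot: my:bar($str as xs:string) as xs:string := upper-case($str); -->
  <xsl:function name="my:bar" as="xs:string">
    <xsl:param name="str" as="xs:string"/>
    <xsl:sequence select="upper-case($str)"/>
  </xsl:function>

</xsl:stylesheet>
```

Each Carrot definition maps to exactly one top-level XSLT declaration here, which is what makes a module-to-module compilation strategy attractive.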

The syntax for other top-level constructs, such as namespace declarations, serialization options, and parameter definitions, is still being worked out. Some mock-up examples can be found at the project's home page: http://github.com/evanlenz/carrot

Future directions

Carrot is both a practical tool and a research project. I'm trying to find the right balance between innovation and sticking to the syntax and/or semantics of XPath, XSLT, and XQuery. I'm excited by the future possibility of using XML-oriented scripting languages in the browser, as made possible by projects like Saxon-CE and XQIB. I'm convinced that XSLT's syntax is an obstacle to mainstream adoption as a browser scripting language. Carrot, or something like it, could help overcome such obstacles.

As a research project, the ideas at the heart of Carrot may influence the longer-term W3C work, as XQuery and XSLT continue to move closer to each other. I'm already quite satisfied by the composability that Carrot provides in contrast to XSLT. That said, I'm always itching for more features in the XPath/XQuery/XSLT triumvirate. As a sample, here are two.

Simple mapping operator

I think XPath needs a "simple mapping operator" that behaves similarly to "/" except without its restrictions and special behavior with regard to node sequences. This is one possible extension that could be added to Carrot, without having to wait for XSLT/XQuery 3.0 (if it's even being considered for inclusion).

Mode merging

Another more recent idea (which would be straightforward to implement in Carrot) would be "mode merging."

In XSLT, a single template rule can declare itself to be a part of more than one mode. However, a single call to apply-templates cannot invoke rules in more than one mode. The ability to merge modes would provide a static mode extension mechanism, the chief benefit of course being that you wouldn't have to go add a new mode to each template rule's list of modes (and in the case when it's in the default mode, go add mode="#default new-mode" to each rule).

In XSLT, a merged-mode invocation might be written like this (not legal in XSLT 2.0):

<xsl:apply-templates mode="foo bar"/>

In Carrot:

^foo|bar()

This would be especially handy in multi-stage transformations where each stage of processing makes an incremental change to its input, but some stages need to handle things slightly differently, for example, to avoid transforming an already-converted element more than once. Mode merging would allow you to invoke statically determined subsets and supersets of rules.
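To make the motivation concrete, here is a sketch of what XSLT 2.0 requires today to get the effect of merging two modes. The mode name foo-bar and the item/@kind vocabulary are hypothetical. Every rule that should take part has to list the extra mode explicitly, which is exactly the bookkeeping that mode merging would remove.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:apply-templates accepts only a single mode, so a third,
       combined mode has to be introduced -->
  <xsl:template match="/">
    <out>
      <xsl:apply-templates select="//item" mode="foo-bar"/>
    </out>
  </xsl:template>

  <!-- Originally only in mode "foo"; "foo-bar" added by hand -->
  <xsl:template match="item[@kind='a']" mode="foo foo-bar">
    <a/>
  </xsl:template>

  <!-- Originally only in mode "bar"; "foo-bar" added by hand -->
  <xsl:template match="item[@kind='b']" mode="bar foo-bar">
    <b/>
  </xsl:template>

</xsl:stylesheet>
```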

Underlying language development

Finally, Carrot is a project that can grow with the languages it is based on. As various features are added in XSLT/XQuery 3.0, such as JSON support or the ability to apply templates to sequences of atomic values, Carrot will (happily) be updated accordingly.