Using XProc, XSLT 2.0, and XSD 1.1 to Validate RESTful Services

Jorge Luis Williams and David Cramer

Rackspace

08 Aug 2012

Background

What is Rackspace?

  • A hosting company

  • The Open Cloud Company

  • Known for Fanatical Support®

What is OpenStack?

  • The open alternative to cloud lock-in

  • A highly engaged community of over 3,300 individuals and over 180 companies including Rackspace, Dell, HP, IBM, Red Hat, and many others

  • Android to AWS's iOS

What is REST?

Very briefly and over-simplified:

  • An acronym for “REpresentational State Transfer”

  • an architectural style for designing APIs

  • REST is not a technology or programming language

  • the architecture of the Web adapted as a platform for any kind of application

  • resources addressable by URIs

  • representations with media types

RESTful API documentation

  • RESTful APIs require a contract and documentation

  • Inherently a tedious and repetitive task

  • Documentation was doomed to get out of sync with the service

  • Not like documenting a Java API where you can just use Javadoc

  • We needed a domain specific language to describe the API

  • We needed an open source toolchain we could share with OpenStack

  • We looked at several options (Swagger, Mashery, Apigee, creating our own DSL, etc.), but also discovered WADL...

What is WADL?

An acronym for “Web Application Description Language”

An XML vocabulary designed to describe RESTful APIs.

Strengths:

  • Flexibility

  • Content Reuse

  • Assertions

  • Inline Documentation

  • Machine readable and, since it's XML, easy to integrate with our existing toolchain

Basic Structure of a WADL

    <?xml version="1.0" encoding="UTF-8"?>

    <application xmlns="http://wadl.dev.java.net/2009/02"
                 xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <resources base="http://localhost/">
            <resource path="path/to/record/{date}">
                <param style="template" type="xs:date" name="date"/>
                <method name="GET"/>
            </resource>
        </resources>
    </application>

Flexibility in WADL

Tree structure (can also be mixed path and tree):

    <application xmlns="http://wadl.dev.java.net/2009/02"
                 xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <resources base="http://localhost/">
            <resource path="path">
                <resource path="to">
                    <resource path="record">
                        <resource path="{date}">
                            <param style="template" type="xs:date" name="date"/>
                            <method name="GET"/>
                        </resource>
                    </resource>
                </resource>
            </resource>
        </resources>
    </application>
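
The path form and the tree form describe the same operation. As a rough illustration (a Python sketch using only the standard library, not the actual WADL normalizer), both can be flattened to the same list of full paths and methods. The function names `flatten` and `operations` are made up for this example.

```python
import xml.etree.ElementTree as ET

WADL_NS = "{http://wadl.dev.java.net/2009/02}"

# Tree form, as on this slide.
TREE_WADL = """
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <resources base="http://localhost/">
        <resource path="path">
            <resource path="to">
                <resource path="record">
                    <resource path="{date}">
                        <param style="template" type="xs:date" name="date"/>
                        <method name="GET"/>
                    </resource>
                </resource>
            </resource>
        </resource>
    </resources>
</application>
"""

# Path form, as on the "Basic Structure of a WADL" slide.
PATH_WADL = """
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <resources base="http://localhost/">
        <resource path="path/to/record/{date}">
            <param style="template" type="xs:date" name="date"/>
            <method name="GET"/>
        </resource>
    </resources>
</application>
"""


def flatten(resource, prefix=""):
    """Yield (full_path, method_name) for every method at or below a resource."""
    path = prefix + "/" + resource.get("path").strip("/")
    for method in resource.findall(WADL_NS + "method"):
        yield path.lstrip("/"), method.get("name")
    for child in resource.findall(WADL_NS + "resource"):
        yield from flatten(child, path)


def operations(wadl_text):
    """Return all (full_path, method_name) pairs declared in a WADL document."""
    root = ET.fromstring(wadl_text)
    ops = []
    for resources in root.findall(WADL_NS + "resources"):
        for resource in resources.findall(WADL_NS + "resource"):
            ops.extend(flatten(resource))
    return ops


TREE_OPS = operations(TREE_WADL)
PATH_OPS = operations(PATH_WADL)
```

Both documents flatten to the single operation GET on path/to/record/{date}, which is why a mixed path and tree layout is also allowed.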

Content Reuse in WADL

    <application xmlns="http://wadl.dev.java.net/2009/02"
                 xmlns:wapi="http://widget/api/v1">
        <grammars>
            <include href="xsd/widget.xsd"/>
        </grammars>
        <resources base="https://test.api.openstack.com">
            <resource path="widgets">
                <method href="#getMetadata"/>
            </resource>
            <resource path="gadgets">
                <method href="#getMetadata"/>
            </resource>
        </resources>
        <method name="GET" id="getMetadata">
            <response status="200 203">
                <representation mediaType="application/xml" element="wapi:metadata"/>
            </response>
        </method>
    </application>
  • Resources can also be of one or more resource_type where a resource_type is an element that contains resources, params, and methods.

  • Methods and resource_types can be referred to within WADLs and between WADLs.
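
To make the reuse concrete, here is a small Python sketch (standard library only) that resolves the same-document `href="#getMetadata"` references from the listing above. It is an illustration, not the normalizer itself: the function name `resolve_methods` is invented, and references into other WADL files are not handled.

```python
import xml.etree.ElementTree as ET

WADL_NS = "{http://wadl.dev.java.net/2009/02}"

WADL = """
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:wapi="http://widget/api/v1">
    <resources base="https://test.api.openstack.com">
        <resource path="widgets">
            <method href="#getMetadata"/>
        </resource>
        <resource path="gadgets">
            <method href="#getMetadata"/>
        </resource>
    </resources>
    <method name="GET" id="getMetadata">
        <response status="200 203">
            <representation mediaType="application/xml" element="wapi:metadata"/>
        </response>
    </method>
</application>
"""


def resolve_methods(wadl_text):
    """Map each resource path to the HTTP method names it ends up with."""
    root = ET.fromstring(wadl_text)
    # Shared method definitions are direct children of <application>.
    by_id = {m.get("id"): m
             for m in root.findall(WADL_NS + "method") if m.get("id")}
    resolved = {}
    for resource in root.iter(WADL_NS + "resource"):
        names = []
        for method in resource.findall(WADL_NS + "method"):
            href = method.get("href")
            # Only "#id" references within the same document are followed.
            target = by_id[href[1:]] if href and href.startswith("#") else method
            names.append(target.get("name"))
        resolved[resource.get("path")] = names
    return resolved


RESOLVED = resolve_methods(WADL)
```

Both widgets and gadgets resolve to the one shared GET definition.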

Assertions in WADL

    <method name="GET" id="versionDetails">
        <response status="200 203">
            <representation mediaType="application/xml" element="common:version">
                <param name="location" style="plain" type="xsd:anyURI"
                       required="true"
                       path="/common:version/atom:link[@rel='self']/@href">
                    <link resource_type="#VersionDetails" rel="self"/>
                </param>
            </representation>
        </response>
    </method>

The param asserts that, inside the representation of the response, the XPath in @path must select a value and that value must be of the declared type (xsd:anyURI). Note that required="true".
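
A simplified Python sketch of what such an assertion amounts to follows. The `common` namespace URI and both sample responses are invented for illustration, and the standard library's limited XPath support is used, so the `/@href` step is done with `get()`. The xsd:anyURI type check is omitted.

```python
import xml.etree.ElementTree as ET

NS = {
    "common": "http://example.com/common",  # hypothetical namespace URI
    "atom": "http://www.w3.org/2005/Atom",
}

GOOD = """
<version xmlns="http://example.com/common"
         xmlns:atom="http://www.w3.org/2005/Atom" id="v1">
    <atom:link rel="self" href="http://localhost/v1/"/>
</version>
"""

BAD = """
<version xmlns="http://example.com/common"
         xmlns:atom="http://www.w3.org/2005/Atom" id="v1">
    <atom:link rel="alternate" href="http://localhost/docs/"/>
</version>
"""


def has_required_location(response_xml):
    """Approximate /common:version/atom:link[@rel='self']/@href being present."""
    root = ET.fromstring(response_xml)
    if root.tag != "{%s}version" % NS["common"]:
        return False
    link = root.find("atom:link[@rel='self']", NS)
    return link is not None and bool(link.get("href"))


GOOD_OK = has_required_location(GOOD)
BAD_OK = has_required_location(BAD)
```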

Documentation in WADL

    ...
    <method name="GET" id="getMetadata">
        <doc xml:lang="EN" title="Get metadata">
            Ipsum lorem.
        </doc>
        <response status="200 203">
            <representation mediaType="application/xml" element="wapi:metadata"/>
        </response>
    </method>

Challenges

If you use all these indirection features:

  • WADL can be hard to get right

  • WADL can be hard to process

In any case, while WADL is as human readable as any XML, it's not documentation.

Tools to the rescue

  • oXygen framework for editing WADL

  • WADL normalizer

  • wadl2docbook

WADL in DocBook (RackBook)

    <section>
        <title>Volume Lists</title>
        <para>
            These operations provide a list of volumes associated
            with a particular tenant. Volumes contain a status
            attribute that can be...
        </para>
        <resources xmlns="http://wadl.dev.java.net/2009/02">
            <resource href="os-block-storage-1.0.wadl#Volumes">
                <method href="listVolumes"/>
            </resource>
        </resources>
    </section>

DocBook in WADL

    ...
    <method name="GET" id="getMetadata">
        <wadl:doc xml:lang="EN" title="Get metadata"
                  xmlns="http://docbook.org/ns/docbook">
            <para>Ipsum lorem:
                <itemizedlist>
                    <listitem><para>Ipsum</para></listitem>
                    <listitem><para>Lorem</para></listitem>
                </itemizedlist>
            </para>
        </wadl:doc>
        <response status="200 203">
            <representation mediaType="application/xml" element="wapi:metadata"/>
        </response>
    </method>

The XProc pipeline

  1. Processes the RackBook XML and collects a list of the WADLs used in the document (removing duplicates)

  2. Normalizes each WADL

  3. Resolves pointers to WADL in DocBook to pull in the actual WADL

  4. Converts WADL to DocBook

  5. Processes the DocBook to one or more output formats (PDF, webhelp HTML, new 'war' format).
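
Step 1 can be sketched in a few lines of Python. This is only an illustration of the idea, since the real pipeline is XProc and XSLT: the sample document, the second section, the file name other-service.wadl, and the function name `referenced_wadls` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

WADL_NS = "{http://wadl.dev.java.net/2009/02}"

# A cut-down RackBook-like document; the second section is hypothetical.
RACKBOOK = """
<chapter>
    <section>
        <title>Volume Lists</title>
        <resources xmlns="http://wadl.dev.java.net/2009/02">
            <resource href="os-block-storage-1.0.wadl#Volumes">
                <method href="listVolumes"/>
            </resource>
        </resources>
    </section>
    <section>
        <title>Snapshots</title>
        <resources xmlns="http://wadl.dev.java.net/2009/02">
            <resource href="os-block-storage-1.0.wadl#Snapshots"/>
            <resource href="other-service.wadl#Things"/>
        </resources>
    </section>
</chapter>
"""


def referenced_wadls(doc_text):
    """Return the WADL files referenced by wadl:resource/@href, without duplicates."""
    root = ET.fromstring(doc_text)
    seen = {}  # a dict keeps first-seen order while dropping duplicates
    for resource in root.iter(WADL_NS + "resource"):
        href = resource.get("href") or ""
        file_part = href.split("#", 1)[0]  # drop the fragment identifier
        if file_part:
            seen.setdefault(file_part, None)
    return list(seen)


WADLS = referenced_wadls(RACKBOOK)
```

Each file in the resulting list would then be normalized once (step 2), no matter how many sections point into it.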

WADL to API Reference

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Volume Methods -->
    <method name="GET" id="listVolumes">
        <doc xml:lang="EN" title="List Volumes">
            <db:para role="shortdesc">
                List all volumes (IDs, names, links).
            </db:para>
            <db:para>
                A list of volumes. Each volume contains IDs, names, and
                links -- other attributes are omitted.
            </db:para>
        </doc>
        <request>
            <param name="changes-since" style="query" required="false" type="xsd:dateTime"/>
            <param name="type" style="query" required="false" type="osapi:UUID"/>
            <param name="backup" style="query" required="false" type="osapi:UUID"/>
            <param name="name"   style="query" required="false" type="xsd:string"/>
            <param name="marker" style="query" required="false" type="osapi:UUID"/>
            <param name="limit"  style="query" required="false" type="xsd:int"/>
        </request>
        <response status="200 203">
            <representation mediaType="application/xml" element="bs:volumes">
                <doc xml:lang="EN">
                    <xsdxt:code href="samples/core/volumes-sparse.xml" />
                </doc>
            </representation>
            <representation mediaType="application/json">
                <doc xml:lang="EN">
                    <xsdxt:code href="samples/core/volumes-sparse.json" />
                </doc>
            </representation>
        </response>
        <!-- Common Faults -->
        <response>
            <representation mediaType="application/xml" element="bs:blockstorageFault"/>
            <representation mediaType="application/json"/>
        </response>
        <response status="503">
            <representation mediaType="application/xml" element="bs:serviceUnavailable"/>
            <representation mediaType="application/json"/>
        </response>
        <response status="401">
            <representation mediaType="application/xml" element="bs:unauthorized"/>
            <representation mediaType="application/json"/>
        </response>
        <response status="403">
            <representation mediaType="application/xml" element="bs:forbidden"/>
            <representation mediaType="application/json"/>
        </response>
    </method>

PDF Output

Output

The Validation Problem

  • Usually, a REST service starts as a document

  • An implementation (sometimes in the form of a mock) is created

  • The implementation and clients provide feedback, which leads to changes.

    • Docs and implementation must remain in sync throughout the process.

    • Challenging / Error prone.

  • Eventually, the implementation stabilizes and the document becomes a contract.

    • Used by clients and alternate implementations.

The Validation Problem

The implementation and its docs must remain stable and consistent in the face of:

  • Bug fixes

  • Updates

  • Enhancements

  • New Features / Extensions.

This too is challenging and error prone

The Validation Problem

It's important to test that the implementation and the docs that describe it conform to one another.

However…

  • QA/QE teams are not focused on document conformance. They consume the docs to write tests.

  • Focused almost exclusively on the functionality of the service.

  • In our experience, tests and implementation tend to drift together (inadvertently) unless there is detailed review.

  • We usually don't find errors until alternate implementations and new clients come on the scene.

    • Fixing errors at this point is hard!

Goals

  • Catch errors and discrepancies early

  • Introduce doc conformance testing into existing functional testing process in an automated way

  • Incorporate the process into existing services…without having to create a new test suite

Validation via Intermediary

Use our existing documentation pipeline to generate validation rules that a validator can check in a layer between functional tests and the service

Figure 1. A REST validator

Validator Requirements

  • A validator needs to be able to tell the difference between an HTTP message that

    • …meets all the criteria defined in the documentation

    • …violates the criteria defined in the docs

  • The validator should be able to categorize messages that are not valid based on the expected error code, so that the response code generated by the service can be verified

  • Accepting messages that meet some criteria is a common problem in Computer Science…

Validating HTTP with Automata

…one technique for solving the problem of accepting messages is to utilize an automaton

An automaton is a state machine that…

  • Transitions from a start state to other states based on current input

  • If, after reading the input, the machine is in an accept state, the message is accepted; otherwise the message is rejected

WADL to Automaton

The idea is to translate a WADL either…

  • stand alone

  • or from RackBook document

Example 1. Initial WADL
    <?xml version="1.0" encoding="UTF-8"?>

    <application xmlns="http://wadl.dev.java.net/2009/02"
                 xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <resources base="http://localhost/">
            <resource path="path/to/record/{date}">
                <param style="template" type="xs:date" name="date"/>
                <method name="GET"/>
            </resource>
        </resources>
    </application>

…to a representation of an automaton that can be used to validate messages between the functional tests and the REST service.

Figure 2. Resulting Automaton

Automaton Example

Accepts…

GET /path/to/record/{date}

…where {date} is an xs:date.

Automaton Example

Three accept states...

  1. SA: accepts HTTP messages that follow the constraints defined by the API

  2. d30U: accepts HTTP messages for which a 404 (Not Found) response should be expected

  3. d30M: accepts HTTP messages for which a 405 (Method Not Allowed) should be expected

Automaton Example

  • Match URI first, one path segment at a time

  • Only try to match against the method when there are no path segments

  • In this example, once the method is read, we're done.

Automaton Example

  • U(regex), U(QName)   Proceed to this state if the path segment matches the regex, or validates against the type defined by the QName.

  • U!(regex), U!(QName)   Proceed to this state if the path segment does not match the regex, or does not validate against the type defined by the QName.

Automaton Example

  • M(regex)   Proceed to this state if the HTTP method matches the regex.

  • M!(regex)   Proceed to this state if the HTTP method does not match the regex.

  • ε   Proceed to this state no matter what the input is.

Automaton Example

GET /path/to/record/2001-01-02
  • Expected Result  200 OK

  • States Traveled  S0, d18e4, d18e5, d18e6, d18e7, d18e9, SA

Automaton Example

GET /my/path/
  • Expected Result  404 Not Found

  • States Traveled  S0, d30U, d30U, d30U

Automaton Example

PUT /path/to/record/2001-01-02
  • Expected Result  405 Method Not Allowed

  • States Traveled  S0, d18e4, d18e5, d18e6, d18e7, d30M

Checker Format

Initially, the WADL is translated into checker format, an XML representation of the automaton.

    <?xml version="1.0" encoding="UTF-8"?>
    <checker xmlns="http://www.rackspace.com/repose/wadl/checker"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">
       <step id="S0" type="START" next="d18e4 SE1 d21e2u"/>
       <step type="URL_FAIL" id="d21e2u" notMatch="path"/>
       <step type="URL" id="d18e4" match="path" next="d18e5 SE1 d21e3u"/>
       <step type="URL_FAIL" id="d21e3u" notMatch="to"/>
       <step type="URL" id="d18e5" match="to" next="d18e6 SE1 d21e4u"/>
       <step type="URL_FAIL" id="d21e4u" notMatch="record"/>
       <step type="URL" id="d18e6" match="record" next="d18e7 SE1 d21e5u"/>
       <step type="URL_FAIL" id="d21e5u" notTypes="xs:date"/>
       <step type="URLXSD"
             id="d18e7"
             match="xs:date"
             label="date"
             next="d18e9 d21e6m SE0"/>
       <step type="METHOD_FAIL" id="d21e6m" notMatch="GET"/>
       <step type="METHOD" id="d18e9" match="GET" next="SA"/>
       <step id="SE0" type="URL_FAIL"/>
       <step id="SE1" type="METHOD_FAIL"/>
       <step id="SA" type="ACCEPT"/>
    </checker>
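
To show how such a document can drive validation, here is a minimal Python sketch that loads the checker above and walks it for one request. It is an interpretation of the step types under stated assumptions, not the real runtime. A URL step consumes one path segment, METHOD steps only apply once the path is exhausted, and the xs:date check is a simplified stand-in for a real XSD 1.1 validator. The function names are invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

NS = "{http://www.rackspace.com/repose/wadl/checker}"

# The checker document from the slide, embedded so the sketch is self-contained.
CHECKER_XML = """
<checker xmlns="http://www.rackspace.com/repose/wadl/checker"
         xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <step id="S0" type="START" next="d18e4 SE1 d21e2u"/>
   <step type="URL_FAIL" id="d21e2u" notMatch="path"/>
   <step type="URL" id="d18e4" match="path" next="d18e5 SE1 d21e3u"/>
   <step type="URL_FAIL" id="d21e3u" notMatch="to"/>
   <step type="URL" id="d18e5" match="to" next="d18e6 SE1 d21e4u"/>
   <step type="URL_FAIL" id="d21e4u" notMatch="record"/>
   <step type="URL" id="d18e6" match="record" next="d18e7 SE1 d21e5u"/>
   <step type="URL_FAIL" id="d21e5u" notTypes="xs:date"/>
   <step type="URLXSD" id="d18e7" match="xs:date" label="date"
         next="d18e9 d21e6m SE0"/>
   <step type="METHOD_FAIL" id="d21e6m" notMatch="GET"/>
   <step type="METHOD" id="d18e9" match="GET" next="SA"/>
   <step id="SE0" type="URL_FAIL"/>
   <step id="SE1" type="METHOD_FAIL"/>
   <step id="SA" type="ACCEPT"/>
</checker>
"""

# Terminal step types and the HTTP status each one implies.
RESULT = {"ACCEPT": 200, "URL_FAIL": 404, "METHOD_FAIL": 405}


def is_xs_date(text):
    """Simplified xs:date check: YYYY-MM-DD only, no time zone or negative years."""
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if not m:
        return False
    try:
        date(*map(int, m.groups()))
        return True
    except ValueError:
        return False


def load_steps(xml_text):
    """Index the step elements by their id attribute."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): s.attrib for s in root.iter(NS + "step")}


def applies(step, method, segments):
    """Decide whether a candidate next step accepts the current input."""
    kind = step["type"]
    seg = segments[0] if segments else None
    if kind == "URL":
        return seg is not None and re.fullmatch(step["match"], seg) is not None
    if kind == "URLXSD":  # only xs:date is handled in this sketch
        return seg is not None and is_xs_date(seg)
    if kind == "URL_FAIL":
        if seg is None:
            return False
        if "notMatch" in step:
            return re.fullmatch(step["notMatch"], seg) is None
        if "notTypes" in step:
            return not is_xs_date(seg)
        return True
    if kind == "METHOD":
        return seg is None and re.fullmatch(step["match"], method) is not None
    if kind == "METHOD_FAIL":
        if seg is not None:
            return False
        return "notMatch" not in step or re.fullmatch(step["notMatch"], method) is None
    return kind == "ACCEPT"


def check(steps, method, path):
    """Walk the machine; return (expected status, list of states visited)."""
    segments = [s for s in path.split("/") if s]
    state, visited = "S0", ["S0"]
    while steps[state]["type"] not in RESULT:
        for next_id in steps[state]["next"].split():
            if applies(steps[next_id], method, segments):
                break
        else:
            raise ValueError("no applicable step from " + state)
        if steps[next_id]["type"] in ("URL", "URLXSD"):
            segments = segments[1:]  # a matched URL step consumes one segment
        state = next_id
        visited.append(state)
    return RESULT[steps[state]["type"]], visited


STEPS = load_steps(CHECKER_XML)
```

Calling `check(STEPS, "GET", "/path/to/record/2001-01-02")` follows the accepting path through to SA, while a PUT on the same URI ends in the METHOD_FAIL step d21e6m. The state ids here are the ones from the checker document, which differ from the labels used in the figure.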

Checker Format

Each element maps to a state in the machine

Checker Format

All steps contain @id of type xs:ID

Checker Format

Steps are connected to their next steps via @next, of type xs:IDREFS

Checker Format

The format currently supports steps that check all aspects of the HTTP request, including …

  • URI Path Segment (Regex, XSD 1.1 simple type)

  • Required Headers (Regex, XSD 1.1 simple type)

  • Method (Regex)

  • Content Type

  • Well-formedness (XML and JSON)

  • XML content via W3C XML Schema 1.1 Impl (Xerces, Saxon)

  • XPath assertions for XML (XPath 2.0)

    • Root element at a URI

    • Required plain parameters

Checker Format

There is also a step that performs an XSL transformation

Checker Format

Transformation is useful when using plain parameters to extend or restrict a type based on context.

    <!-- WADL -->
    <resource path="widget"
              xmlns="http://wadl.dev.java.net/2009/02"
              xmlns:widget="http://rackspace.com/sample/widget">
        <method name="GET">
            <response>
                <representation mediaType="application/xml"
                                element="widget:widget"/>
            </response>
        </method>
        <method name="POST">
            <request>
                <representation mediaType="application/xml"
                                element="widget:widget">
                    <param name="widget" style="plain"
                           path="/widget:widget"
                           type="widget:WidgetForCreate"/>
                </representation>
            </request>
        </method>
    </resource>

Checker Format

Widget Request (before XSLT step)

    <?xml version="1.0" encoding="UTF-8"?>

    <widget xmlns="http://rackspace.com/sample/widget"
            name="MyWidget"/>

Checker Format

Widget Request (after XSLT step, before XSD step)

    <?xml version="1.0" encoding="UTF-8"?>

    <widget xmlns="http://rackspace.com/sample/widget"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            name="MyWidget" xsi:type="WidgetForCreate"/>
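
The effect of the transformation can be mimicked in a few lines of Python. In the pipeline this step is an XSLT; the sketch below only illustrates the before and after on an in-memory tree, and the function name `annotate_for_create` is made up.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
WIDGET = "{http://rackspace.com/sample/widget}widget"

REQUEST = '<widget xmlns="http://rackspace.com/sample/widget" name="MyWidget"/>'


def annotate_for_create(request_xml):
    """Add xsi:type to the root widget so a later XSD step checks the narrower type."""
    root = ET.fromstring(request_xml)
    if root.tag == WIDGET:
        root.set(XSI_TYPE, "WidgetForCreate")
    return root


ANNOTATED = annotate_for_create(REQUEST)
```

One caveat of this stand-in: the xsi:type value is a QName that depends on the in-scope default namespace, and ElementTree rewrites prefixes when it serializes, so the annotated tree should be handed on in memory rather than written out and reparsed.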

Optimizations

The automata created to validate REST services may be quite complex, often involving many states and connections.

OpenStack Compute API v2

Optimizations

  • The translation from WADL to checker format is not concerned with optimizing the number of steps; it is only concerned with the accuracy of the machine created.

  • Optimization stages are allowed.

  • An optimization stage is an XSLT that takes a checker document and converts it to one with fewer steps that accepts the same input

  • Optimization stages can be chained together

OpenStack Compute API v2

OpenStack Compute API v2, optimizations

Current Optimizations

  • Remove duplicate checks

  • Combine multiple XPath steps into a single XSLT step

  • Other optimizations are possible
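
As an illustration of the first optimization, the following Python sketch merges steps that have identical attributes (same type, match, and next list) and rewrites references to them, repeating until nothing changes. The real stages are XSLT. The tiny checker here, its step ids, and the function name `remove_duplicates` are hypothetical.

```python
# A hypothetical unoptimized checker: two resources, each with its own
# copies of the same METHOD and METHOD_FAIL steps.
CHECKER = {
    "S0": {"type": "START", "next": "u1 u2"},
    "u1": {"type": "URL", "match": "widgets", "next": "m1 f1"},
    "u2": {"type": "URL", "match": "gadgets", "next": "m2 f2"},
    "m1": {"type": "METHOD", "match": "GET", "next": "SA"},
    "m2": {"type": "METHOD", "match": "GET", "next": "SA"},
    "f1": {"type": "METHOD_FAIL", "notMatch": "GET"},
    "f2": {"type": "METHOD_FAIL", "notMatch": "GET"},
    "SA": {"type": "ACCEPT"},
}


def remove_duplicates(steps):
    """Merge steps with identical attributes until no duplicates remain."""
    steps = {sid: dict(attrs) for sid, attrs in steps.items()}  # work on a copy
    while True:
        seen, rename = {}, {}
        for sid, attrs in steps.items():
            key = tuple(sorted(attrs.items()))
            if key in seen:
                rename[sid] = seen[key]  # sid duplicates an earlier step
            else:
                seen[key] = sid
        if not rename:
            return steps
        for sid in rename:
            del steps[sid]
        for attrs in steps.values():
            if "next" in attrs:
                targets = (rename.get(n, n) for n in attrs["next"].split())
                # dict.fromkeys drops repeated targets but keeps their order
                attrs["next"] = " ".join(dict.fromkeys(targets))


OPTIMIZED = remove_duplicates(CHECKER)
```

The loop runs to a fixpoint because merging two steps can make the steps that point at them identical as well. The optimized machine has fewer steps but follows the same decisions for every input.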

The Validation Pipeline

  • The first three parts of the pipeline are shared with our general documentation pipeline

  • The stages utilize XProc, XSLT 2.0, and XSD 1.1 to generate the final checker document

  • The final stage creates an immutable data structure that holds the representation of the automaton used at runtime

The Validation Pipeline

  • The pipeline is executed as a preprocessing step

    • Only the immutable data structure is required at runtime

  • Because the runtime data structure is immutable, validation is thread-safe

    • Multiple HTTP requests can be validated simultaneously

The Validation Pipeline

Various aspects of the pipeline are configurable

  • The strictness of the validation, what kinds of steps are added to the checker

  • Specialized options for each individual step

    • XPath version (1.0/2.0)

    • XSD implementation to use (Saxon EE, Xerces)

    • XSLT implementation (Xalan, XSLTC, Saxon)

  • Which optimization stages to use, if any

Other Use Cases

  • Once an implementation was created, it became evident that the pipeline can be used to solve other problems

  • Some of the additional use cases became available because the validation process was more efficient than expected

Other Use Cases: Filtering and Error Reporting

  • Requests that would otherwise result in an error condition can be culled before they reach the backend service, potentially saving processing time

  • In this case, the validator can form much better error messages. For example:

    404 Not found /path/to/{widgets}, got widgets but expecting widget | gadget

Other Use Cases: Authorization

  • Assuming the validator can filter out bad requests, users with different privileges can be assigned different validators

  • For example, an Admin user can make calls not available to regular users.

Other Use Cases: API Coverage

  • Take an unoptimized automaton and track which states are visited while running tests…

  • …this should give you an indication of API coverage.
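
A back-of-the-envelope version of this idea in Python: given the full set of step ids and the states each test request traveled, coverage is the share of steps visited at least once. The ids are those of the sample checker document shown earlier, and the three traces are assumed to come from a test run like the GET and PUT examples. The function name `coverage` is invented for this sketch.

```python
# All step ids of the sample checker document.
ALL_STEPS = [
    "S0", "d21e2u", "d18e4", "d21e3u", "d18e5", "d21e4u", "d18e6",
    "d21e5u", "d18e7", "d21e6m", "d18e9", "SE0", "SE1", "SA",
]

# States traveled by three test requests (assumed test-run output).
TRACES = [
    ["S0", "d18e4", "d18e5", "d18e6", "d18e7", "d18e9", "SA"],  # valid GET
    ["S0", "d21e2u"],                                           # unknown path
    ["S0", "d18e4", "d18e5", "d18e6", "d18e7", "d21e6m"],       # PUT on the record
]


def coverage(all_steps, traces):
    """Return (fraction of steps visited, sorted list of steps never visited)."""
    visited = set()
    for trace in traces:
        visited.update(trace)
    unvisited = sorted(set(all_steps) - visited)
    return len(visited & set(all_steps)) / len(all_steps), unvisited


RATIO, UNVISITED = coverage(ALL_STEPS, TRACES)
```

The unvisited list points directly at the checks no test exercised, for example the failure steps for a bad second or third path segment.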

Challenges

XSD 1.1, early adoption

  • OpenStack APIs are extensible, so XSD 1.0 is not an option; we chose XSD 1.1

  • Since OpenStack is an open and free platform, we have the goal of ensuring that everything that we develop remains open

  • We make sure we support both Xerces (free, open source) and Saxon EE (proprietary)

  • During the development of the pipeline, we have encountered a number of errors with the Xerces implementation

  • To date, we've been unable to use Xerces successfully in production (in fairness, its XSD 1.1 support is still in beta and most bugs are resolved quickly, but we are still finding bugs and performance issues)

  • We've come to rely on XSD 1.1 features when there is not yet a full, production-ready, free and open source XSD 1.1 implementation; we are looking at Schematron and RelaxNG

XPaths and XSLT

  • WADL uses XPath (… and JSONPath): there is no easy way to handle these in XSLT, whereas QNames are easily handled

  • Keeping the namespace prefixes correct after various levels of transformation can be challenging

  • The current implementation is copying all relevant namespace nodes when it sees an XPath

  • There are still problems, especially when there is contention for the default namespace

  • We're looking at parsing XPath 2.0 in XSLT — maybe via an extension

Future Work

  • Our initial goals have not yet been met. Most request checks are done, but we still need …

    • checks on the HTTP response

    • better reporting

  • Better JSON support …

    • JSONPath

    • JSONSchema

  • Other validation languages in XML …

    • RelaxNG

    • Schematron

  • Plan to support extended use cases

    • Authorization

    • API coverage

Software

Our software is available on GitHub

Questions?
