Introduction

We are used to thinking about the file system as a tree of resources: folders and files. A folder is a named set of folders and files. A file is named content which in many cases (e.g. XML or JSON) can be represented by a tree of items. To this picture we add the RESTnet – the sum total of resources accessible via URI. As REST URIs usually contain a path, the logical structure of the RESTnet is a set of named file systems.

Using XQuery extension functions, not only XML but also other media types can be parsed into item trees – JSON, CSV, HTML. Leveraging ixml, many more file formats could be made available as item trees. With this in mind, we sketch an idealized picture of pervasive tree structure:

  • File system + RESTnet = a tree of resources

  • Resource = a tree of items

We call this tree structure a space of information – the infospace (see (6) for a precursor of this idea). In three-dimensional space, every point is identified by three coordinates, and two points can be connected by a vector composed of three coordinates. The points of the infospace are tree nodes, each representing a folder, a file or a file content item. The coordinates of the infospace are names – folder, file and item names. Vectors of the infospace are name paths. Nodes which are resources are identified by a path of folder and file names connecting them to the root of the space. Nodes which are file content items are identified by a path composed of two parts, one pointing to the resource, the other pointing from the resource to the item within it. A simple path syntax lets us address any point and connect any pair of points. Once XPath capabilities are added, the path syntax becomes a path language – a tool of navigation, selection and aggregation.

This perspective is based on the determination to regard resources as nodes within a tree structure, rather than as entities identified only by a URI. It makes tree thinking more pervasive, as it is no longer restricted to file contents: it applies to file system contents as well.

Of course we are used to thinking about resources as arranged in trees. It is our daily practice to navigate our file systems in a tree-minded way – opening and closing folders, going upwards and downwards, searching folder contents, etc. But technological reality is different. We have (e.g. when using XML technology) powerful means for navigating the tree structure of file content, but hardly anything for navigating the file system tree. As a consequence, the infospace idea may be perceived as an empty abstraction, without the promise of genuine usefulness. An infospace without a unified path language is not an infospace, as far as user experience is concerned.

This has significant consequences for the prospects of ixml. Its general approach to turning any text format into node trees is a fundamental idea. But if the node trees are only used for small and very specific tasks, the promise of practical use is severely limited. One might ask for another fundamental idea lifting the concept onto higher ground.

We notice the role which ixml may play in the unfolding of an infospace – making tree structure increasingly pervasive. The question arises whether the infospace must remain an empty abstraction, or can become an experience, awakening a desire to further unfold its technological reality.

This work reports our endeavour to crack the eggshell of the infospace abstraction. The Foxpath language (2, 3, 4) is a set of expressions turning the abstraction into an experience. We summarize the language briefly and present a few example uses intended to demonstrate the infospace at work. The examples are followed by an account of our effort to integrate ixml into Foxpath. Our vision is a unified space of information, created by a single, pervasive tree structure.

Part 1: A short introduction to the Foxpath language

1.1 Summary

Foxpath (2, 3, 4) is an extended version of XPath 3.0 (8). The main extensions are:

  • Foxpath expression – navigation of file system trees

  • Extension functions – simplified evaluation of navigation results

A syntactical change has been applied: the slash of XPath has been replaced with a backslash. In Foxpath, the slash is used as a separator between steps navigating the file system. In other words, a slash is followed by a step of file system navigation, and a backslash is followed by a step of file content navigation.

Taking the syntactical change into account, Foxpath is fully backward-compatible with XPath: every valid XPath 3.0 expression is a valid Foxpath expression and yields the same result. The only exceptions to this rule are cases in which XPath raises an error and Foxpath returns a value. The most important example concerns the left-hand operand of the path operator (backslash). In XPath, an atomic item in this position triggers an error, whereas in Foxpath it is interpreted as a document URI and parsed into a node tree. This change enables the mixing of file system navigation and file content navigation within a single navigation path. A second example is the effective boolean value of an operand consisting of several strings or URIs. In Foxpath, the value true is returned, whereas in XPath an error is raised. The change enables elegant filtering of folders by their content. A third example is the except operator, which in Foxpath can also be used for filtering URIs, not only nodes. This supports intuitive selections involving the explicit exclusion of certain resources.

1.2 The Foxpath expression

The Foxpath expression is very similar to the XPath expression, but navigates the file system tree, rather than file content. Navigation is based on relationships between URIs, not between XDM nodes (11). Note that Foxpath does not map folders and files to XDM nodes.

A Foxpath expression is a sequence of navigation steps, each one consisting of a navigation axis, a name test and optional predicates. The name test has glob expression syntax (5), supporting the wildcard characters * and ?. Steps are separated by slashes. The default axis is the child axis. Axis names have a tilde (~) appended in order to distinguish them from XPath axes (example step: descendant~::*.css). The shortcut syntax used by XPath is available (//, .., .).

First examples demonstrate how file system navigation is expressed as a sequence of navigation steps along navigation axes, which default to the child axis.

Table I

Examples 1 – steps along axes

Expression                                     Description
child~::frameworks/child~::*                   List frameworks
frameworks/*                                   As before, relying on the default axis
frameworks/tei/descendant~::*.xsl/parent~::*   Follow various navigation axes
frameworks/tei//*.xsl/..                       As before, using shorthand syntax
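
For readers who think in general-purpose code, the steps of Table I can be approximated in a few lines of Python. This is a minimal sketch of the navigation model only, not of the Foxpath implementation: the function names child_step, descendant_step and parent_step are hypothetical, Python's fnmatch stands in for the glob name test, and the demo tree is invented.

```python
import fnmatch
import tempfile
from pathlib import Path


def child_step(contexts, name_test="*"):
    """child~ axis: children of each context folder whose name matches the glob."""
    result = []
    for ctx in contexts:
        if ctx.is_dir():
            result += sorted(p for p in ctx.iterdir()
                             if fnmatch.fnmatchcase(p.name, name_test))
    return result


def descendant_step(contexts, name_test="*"):
    """descendant~ axis: all descendants whose name matches the glob."""
    result = []
    for ctx in contexts:
        if ctx.is_dir():
            result += sorted(p for p in ctx.rglob("*")
                             if fnmatch.fnmatchcase(p.name, name_test))
    return result


def parent_step(contexts):
    """parent~ axis: distinct parent folders, in order of first appearance."""
    return list(dict.fromkeys(p.parent for p in contexts))


# Invented demo tree, then the equivalent of: frameworks/tei//*.xsl/..
root = Path(tempfile.mkdtemp())
for rel in ("frameworks/tei/a/x.xsl", "frameworks/tei/a/y.xsl",
            "frameworks/tei/b/readme.txt", "frameworks/dita/z.xsl"):
    f = root / rel
    f.parent.mkdir(parents=True, exist_ok=True)
    f.write_text("")

tei = child_step(child_step([root], "frameworks"), "tei")
folders = parent_step(descendant_step(tei, "*.xsl"))
print([p.relative_to(root).as_posix() for p in folders])
```

Each step consumes the result of the previous one, which is the property that lets a path expression be read as a chain of axis traversals.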

Each navigation step may include one or several predicates, which are expressions evaluated to an effective boolean value (8). As in XPath, path steps are not necessarily navigation steps. In particular the last step is often used for representing or evaluating the result of the preceding steps.

Table II

Examples 2 – using predicates

Expression                                          Description
frameworks//*[is-dir()][not(*)]                     List empty folders
frameworks/*[.//*.xsl]                              Filter folders by content
frameworks/*[.//*.xsl]/(.||' /'||count(.//*.xsl))   As before, attaching additional information
frameworks//*.xsd[empty(ancestor~::*xsd*)]          List all XSDs not contained by a folder with a name matching *xsd*
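
The first three rows of Table II can be mimicked in Python as filters applied to the result of a step. The sketch below is an illustration under stated assumptions: the helper names is_empty_dir and count_descendants are hypothetical, the demo tree is invented, and the annotation uses the folder name where Foxpath would use the full URI.

```python
import tempfile
from pathlib import Path


def is_empty_dir(path):
    """Counterpart of [is-dir()][not(*)]: a folder without any children."""
    return path.is_dir() and not any(path.iterdir())


def count_descendants(folder, pattern):
    """Counterpart of count(.//PATTERN): number of matching descendants."""
    return sum(1 for _ in folder.rglob(pattern))


# Invented demo tree
root = Path(tempfile.mkdtemp())
for rel in ("frameworks/tei/x.xsl", "frameworks/tei/sub/y.xsl",
            "frameworks/css/main.css"):
    f = root / rel
    f.parent.mkdir(parents=True, exist_ok=True)
    f.write_text("")
(root / "frameworks" / "empty").mkdir()
frameworks = root / "frameworks"

# frameworks//*[is-dir()][not(*)]
empty_folders = sorted(p.name for p in frameworks.rglob("*") if is_empty_dir(p))

# frameworks/*[.//*.xsl]/(.||' /'||count(.//*.xsl))
annotated = [f"{p.name} /{count_descendants(p, '*.xsl')}"
             for p in sorted(frameworks.iterdir())
             if p.is_dir() and count_descendants(p, "*.xsl") > 0]

print(empty_folders)
print(annotated)
```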

File system navigation and file content navigation can be mixed within a single expression. Typically, navigation starts with steps traversing the file system and is continued by steps drilling into file contents. Another typical pattern is file system navigation with predicates checking file contents.

Table III

Examples 3 – mixing file system and file content navigation

Expression                                                    Description
frameworks/tei//filterNodes.xsl\*\xsl:param\@name             Retrieve an attribute value
frameworks/tei//*.xsl[\*\@version = '3.0']                    Filter files dependent on an attribute value
frameworks/tei//*.xsl[\*\@version = '3.0']/..                 As before, and continue navigation
frameworks/jats//*.xml\\@href\resolve-link()\*\clark-name()   Follow links and enter target content
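
To make the crossing of the resource boundary concrete, here is a rough Python counterpart of the second row of Table III: a file system traversal whose predicate parses each file and inspects its root element. It is a sketch under assumptions, not a description of how Foxpath works internally; root_attribute is a hypothetical helper and the three demo files are invented.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def root_attribute(path, name):
    """Parse the file as XML and return an attribute of its root element.

    Returns None if the file is not well-formed or the attribute is absent.
    """
    try:
        return ET.parse(str(path)).getroot().get(name)
    except ET.ParseError:
        return None


# Invented demo files
root = Path(tempfile.mkdtemp())
XSL = '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="{}"/>'
(root / "new.xsl").write_text(XSL.format("3.0"))
(root / "old.xsl").write_text(XSL.format("1.0"))
(root / "broken.xsl").write_text("<oops")

# Counterpart of: //*.xsl[\*\@version = '3.0']
selected = sorted(p.name for p in root.rglob("*.xsl")
                  if root_attribute(p, "version") == "3.0")
print(selected)
```

In the Python version the parsing call is explicit; in the Foxpath expression it is implied by the backslash.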

1.3 Extension functions

The term extension functions means Foxpath functions not included in the set of XPath standard functions (9). Many extension functions are available, simplifying the evaluation of navigation results, in particular data analysis and data aggregation.

Example 1 – function annotate() - annotate URIs (or other items) with additional information

List the frameworks containing XSLT stylesheets, annotated with the number of stylesheets contained.

> fox "frameworks/*[.//*.xsl]/annotate(count(.//*.xsl))"
  
>>>>
/programme/Oxygen XML Editor 25/frameworks/dita (716)
/programme/Oxygen XML Editor 25/frameworks/docbook (462)
/programme/Oxygen XML Editor 25/frameworks/extensions (4)
...

Example 2 – function table() - write value tuples into a table

Write a table displaying for each XSD of the TEI framework the file name and the target namespace.

> fox "frameworks/tei//*.xsd/tuple(file-name(), \*\@targetNamespace) 
       => table('File name, TNS', 'sort')"
  
>>>>
#----------------------------------------------------------------------------------------------#
| File name           | TNS                                                                    |
#----------------------------------------------------------------------------------------------#
| a.xsd               | http://relaxng.org/ns/compatibility/annotations/1.0                    |
| examples.xsd        | http://www.tei-c.org/ns/Examples                                       |
| isotei-lite.xsd     | http://www.tei-c.org/ns/1.0                                            |
| isotei-odd.xsd      | http://www.w3.org/1998/Math/MathML                                     |
| isotei.xsd          | http://www.w3.org/1998/Math/MathML                                     |
| main.xsd            | http://schemas.openxmlformats.org/wordprocessingml/2006/main           |
| math.xsd            | http://schemas.openxmlformats.org/officeDocument/2006/math             |
| mathml.xsd          | http://www.w3.org/1998/Math/MathML                                     |
| ...                 | ...                                                                    |
#----------------------------------------------------------------------------------------------#

Example 3 – function xwrap() - collect items into a document

Collect all union type definitions found in XSDs of the TEI framework into a document. The flags Pnp ensure pretty printing and annotation of the type definitions with file name and XPath:

> fox "frameworks/tei//*.xsd\\xs:union\ancestor::xs:simpleType 
       => xwrap('unionTypes', 'Pnp')"

>>>>
<unionTypes xmlns:fox="http://www.foxpath.org/ns" ... countItems="1047">
  <xs:simpleType fox:fileName="tei_all.xsd" 
                 fox:path="/xs:schema[1]/xs:attributeGroup[12]/xs:attribute[1]/xs:simpleType[1]">
    <xs:union memberTypes="xs:double xs:decimal">
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:pattern value="(\-?[\d]+/\-?[\d]+)"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
  ...
</unionTypes>

Part 2: The infospace experience – examples

The examples in this section were tested with the root folder of an Oxygen XML Editor installation as the working directory. When copying the example code into the command line, please remove the linefeeds, which have been added for the sake of readability.

See Appendix A for instructions on how to install Foxpath.

2.1 What is an infospace experience?

The term infospace experience has no precise meaning. It attempts to capture a sensation inspired by code which takes advantage of a pervasive tree structure encompassing the file system and file contents. It may be summarized as a dual seamlessness, as explained below.

Seamless navigation

Foxpath navigation has three important properties:

Uniform: A single navigation model is used on both sides of the resource boundaries – navigating between and within the resources. The model defines navigation as a sequence of steps, each one returning the filtered content of a navigation axis.

Cross boundary: Navigation on both sides of the resource boundaries can be combined into a single trajectory.

Ubiquitous: As a consequence, navigation can connect any point with any point, be it a folder, a file or an item of file content.

Navigation thus merges the outside and inside of files into a single, continuous whole. Such navigation may be experienced as seamless.

Seamless combination of navigation and evaluation

Navigation is an operation which returns resource URIs and resource contents as XDM values. The XDM value model (11) is very abstract and flexible: a value is a sequence of items, an item has one of five item types, each item type is defined by a fixed set of item properties. Being an extended version of XPath, Foxpath is a pure expression language and any Foxpath expression consumes and produces only XDM values. Processing can thus be experienced as a seamless combination of navigation and evaluation.

2.2 Seamless navigation

In section “1.2 The Foxpath expression” several examples have been given which demonstrate unified and seamless navigation. A further example involves navigation into file contents and from there back into the file system: locate XSLT stylesheets and collect all xsl:import elements which reference non-existing files:

> fox "frameworks/tei//*.xsl\*\xsl:import\@href[not(resolve-link())]\.. => xwrap('hrefs', 'Pb')"
  
>>>>
<hrefs ... countItems="9">
  <xsl:import xml:base=".../stylesheet/docx/from/dynamic/tests/xspec/test-toc-scenario.xsl" 
              href="file:/usr/local/bin/xspec-v0.1/generate-tests-utils.xsl"/>
  <xsl:import xml:base=".../stylesheet/profiles/agora/docx/to.xsl" 
              href="../../../docx/to/docxtotei.xsl"/>
  <xsl:import xml:base=".../stylesheet/profiles/agora/html/from.xsl" 
              href="../../../tools/html2tei.xsl"/>
  ...
</hrefs>

Another example returns all TEI stylesheets in which a template with a @mode attribute contains an xsl:apply-templates element without a @mode attribute. This may be intended, or it may be an oversight:

> fox "frameworks/tei//*.xsl[\*\xsl:template[@mode]\\xsl:apply-templates[not(@mode)]]"

2.3 Seamless combination of navigation and evaluation

The Foxpath language has a focus on interactive use and the power of concise expressions. Many extension functions have been added in order to enable meaningful evaluation of navigation results with few keystrokes. In this section we give a few examples.

2.3.1 Frequency distribution

The extension function freq() maps a sequence of values to a frequency distribution.

Which file name extensions are used in the TEI framework?

> fox "frameworks/tei//*[is-file()]/file-ext() => freq()"
  
>>>>
...
.sch ....... (4)
.tei ....... (21)
.ttf ....... (20)
.txt ....... (28)
.TXT ....... (1)
.xcu ....... (1)
.xml ....... (92)
.xsd ....... (57)
.xsl ....... (395)

Which root element names occur in the .xml files of the TEI framework?

> fox  "/programme/*oxy*/frameworks/tei//*.xml\*\clark-name() => freq()"
  
>>>>
...
plugin ......................................................................... (5)
project ........................................................................ (42)
Q{http://www.jenitennison.com/xslt/xspec}description ........................... (1)
Q{http://www.jenitennison.com/xslt/xspec}report ................................ (1)
Q{http://www.oxygenxml.com/ns/ccfilter/annotations}contentCompletionElementsMap  (1)
Q{http://www.tei-c.org/ns/1.0}elementList ...................................... (1)
Q{http://www.tei-c.org/ns/1.0}fLib ............................................. (1)
Q{http://www.tei-c.org/ns/1.0}TEI .............................................. (21)
Q{http://www.tei-c.org/ns/1.0}teiCorpus ........................................ (1)
Q{http://www.tei-c.org/ns/1.0}text ............................................. (1)
Q{urn:oasis:names:tc:entity:xmlns:xml:catalog}catalog .......................... (5)
structure-autocorrect .......................................................... (1)
styles ......................................................................... (1)
...
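
The idea behind freq() is easy to restate in Python: count the distinct values of a sequence and render one line per value. The sketch below is an approximation for illustration only. The sort order and the dotted layout are assumptions modelled on the listings above, and the file names are invented.

```python
from collections import Counter
from pathlib import PurePosixPath


def freq(values):
    """Map a sequence of values to (value, count) pairs, sorted case-insensitively."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda pair: pair[0].lower())


def render(pairs):
    """Render pairs as dotted lines of the form 'value ... (count)'."""
    width = max(len(value) for value, _ in pairs)
    return [f"{value} {'.' * (width - len(value) + 3)} ({count})"
            for value, count in pairs]


# Invented file names; the suffix plays the role of file-ext()
names = ["a.xsl", "b.xsl", "c.xml", "d.txt", "E.TXT"]
result = freq(PurePosixPath(name).suffix for name in names)
print("\n".join(render(result)))
```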

2.3.2 Path content

The extension function path-content() returns the path content of a node, that is, the paths leading from the node to its descendant nodes and their attributes.

What path content do title statements in ODD documents of the TEI framework have? In the result, we notice that title elements rarely have a @type attribute. Intrigued, we proceed to look at the attribute values and the containing file names.

> fox "frameworks/tei//*.odd\\tei:titleStmt => path-content-ec()"
  
>>>>
=== path-content ===============================
author ......... (23)
author/@id ..... (2)
author/email ... (2)
author/name .... (2)
editor ......... (3)
editor/@id ..... (3)
sponsor/orgName  (1)
title .......... (28)
title/@type .... (2)
================================================

> fox "frameworks/tei//*.odd\\tei:title\@type\annotate(bfname())"
 
>>>>
main (tei_tite.odd)
sub (tei_tite.odd)

2.3.3 Hierarchical list

The extension function hlist() groups a sequence of tuples in a hierarchical way – first by the first item, then by the second item, and so forth.

Which XML document types do the file name extensions stand for? A hierarchical list can tell us, for each file name extension, which XML root elements are encountered, and for each root name the actual file names.

> fox  "frameworks/tei//(* except (*.html, *.rng, *.xsl, *.xsd))[is-file()][is-xml()]
        /tuple(file-ext(), \*\clark-name(), file-name()) 
        => hlist('File extension, Root name, File name')"
        
>>>>      
==================================================================================
File extension
.  Root name
.  .  File name
==================================================================================

.framework
.  serialized
.  .  teip5.framework
.  .  teip5jtei.framework
.  .  teip5odd.framework
.isosch
.  Q{http://purl.oclc.org/dsdl/schematron}schema
.  .  p5odds.isosch
.mod
.  Q{http://relaxng.org/ns/structure/1.0}grammar
.  .  mathml2-qname-1.mod
.model
.  Q{http://www.w3.org/1999/XSL/Transform}strip-space
.  .  stripspace.xsl.model
...
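
Hierarchical grouping of tuples amounts to building a nested dictionary and printing it with indentation. The following Python sketch shows the principle. It is not the Foxpath implementation: the function names are hypothetical, sorting each level alphabetically is an assumption, and the tuples are taken from the listing above.

```python
def hlist(tuples):
    """Group tuples hierarchically: by the first item, then the second, and so on."""
    tree = {}
    for items in tuples:
        level = tree
        for item in items:
            level = level.setdefault(item, {})
    return tree


def render(tree, depth=0):
    """Render the nested dictionary, one line per key, indented with '.  '."""
    lines = []
    for key in sorted(tree):
        lines.append(".  " * depth + key)
        lines += render(tree[key], depth + 1)
    return lines


rows = [
    (".framework", "serialized", "teip5.framework"),
    (".isosch", "Q{http://purl.oclc.org/dsdl/schematron}schema", "p5odds.isosch"),
    (".framework", "serialized", "teip5odd.framework"),
]
print("\n".join(render(hlist(rows))))
```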

2.3.4 Filtering capabilities

Navigation and the selection of information are closely related to filtering. Foxpath introduces the concept of a unified string expression. It is a string which may be interpreted as a set of inclusive and/or exclusive patterns (glob or regex), or as a fulltext search. Exclusive patterns are marked by a preceding ~ character. By default, the expression is interpreted as a set of glob expressions. Options r and ft are used in order to have the expression interpreted as a set of regular expressions or as a fulltext expression. Options follow the expression, separated from it by the # character. Literal # characters are escaped by doubling.

Get the frequency distribution of @source attribute values, excluding values starting with http or the # character, or containing a colon.

> fox "frameworks/tei//*[\tei:*]\\@source[matches-pattern('~http* ~##* ~*:*')] => freq()"
  
>>>> 
mycompiledODD.xml .... (1)
quoteref7 ............ (1)
tei_bare.compiled.odd  (1)
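
The syntax rules just described can be turned into a small Python sketch. It covers glob and regex patterns only and rests on several assumptions of ours which the text does not settle: patterns and options are whitespace-separated, matching is case-sensitive, a value passes if it matches at least one inclusive pattern (or there is none) and no exclusive pattern, and regex patterns are applied as a search. The names parse_use and matches_use are hypothetical.

```python
import fnmatch
import re


def parse_use(expr):
    """Split a unified string expression into (include, exclude, options)."""
    placeholder = "\0"                       # protects doubled '#' while splitting
    text = expr.replace("##", placeholder)
    patterns, _, options = text.partition("#")
    include, exclude = [], []
    for token in patterns.split():
        token = token.replace(placeholder, "#")
        if token.startswith("~"):
            exclude.append(token[1:])        # exclusive pattern
        else:
            include.append(token)            # inclusive pattern
    return include, exclude, options.split()


def matches_use(value, expr):
    """Test a string against a unified string expression (glob or regex only)."""
    include, exclude, options = parse_use(expr)
    if "ft" in options:
        raise NotImplementedError("fulltext expressions are outside this sketch")
    if "r" in options:
        def test(pattern):
            return re.search(pattern, value) is not None
    else:
        def test(pattern):
            return fnmatch.fnmatchcase(value, pattern)
    included = not include or any(test(p) for p in include)
    return included and not any(test(p) for p in exclude)


# Invented sample values, filtered with the expression used above
values = ["http://x", "#ref", "tei:abc", "quoteref7", "mycompiledODD.xml"]
kept = [v for v in values if matches_use(v, "~http* ~##* ~*:*")]
print(kept)
```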

All Foxpath extension functions interpret arguments with the semantics of a string filter as unified string expressions. For example, function descendant($nameFilter) returns all descendant nodes of the context node, with a name matching the unified string expression $nameFilter. Here we inspect the @type attribute of TEI elements with a name consisting of div followed by digits:

> fox "frameworks/tei//*.xml/descendant('^div\d+#r')\@type => freq()"
  
>>>>
advert .... (2)
alinéa .... (2)
altRecipe . (2)
appendix .. (1)
article ... (1)
backmat ... (6)
book ...... (3)
chapitre .. (3)
chapter ... (5)
epistle ... (1)
livre ..... (2)
member .... (14)
part ...... (6)
recipe .... (7)
section ... (18)
subsection  (13)

The option ft marks the expression as a fulltext expression, for which a compact syntax has been defined. Find TEI ODD files containing all of these phrases in any order: TEI corpus, TEI customization, TEI conformance, TEI modules.

> fox "frameworks/tei//*.odd
       [\matches-pattern('tei corpus / tei customization / tei conformance / tei modules # ft')]"
      
>>>>
.../frameworks/tei/xml/tei/custom/odd/tei_corpus.odd

2.3.5 XML and not

Not only XML, but also some non-XML formats can be parsed into node trees. This expression returns the property names defined in a JSON schema:

> fox "frameworks/json//openAPIScenario.jschema/jdoc()\\properties\*\name() 
       => sort() => distinct-values()"
  
>>>>
apiKey
authorization
body
bodyContent
bodyType
description
expectedResponse
httpBasic
httpBearer
...

2.3.6 Bulk validation

The infospace encourages us to think of large and distributed sets of information items as simple building blocks from which to compose expressions. You may validate sets of documents against sets of XSDs in a bulk fashion. The validation function returns an aggregated report of validation results:

> fox -o validation-report.xml "frameworks/dita/(.//*.xml => xsd-validate-ec(.//*.xsd))"

2.3.7 Annotated file system trees

Function ftree-view() creates a tree representation of selected resources, optionally annotated with expression values.

Get a tree representation of the from.xsl and to.xsl stylesheets found in the TEI framework.

> fox "frameworks/tei//(from.xsl, to.xsl) => ftree-view()"
  
>>>>
<ftree context=".../frameworks/tei/xml/tei/stylesheet" countFo="134" countFi="128">
  <fo name="profiles">
    <fo name="acm">
      <fo name="latex">
        <fi name="to.xsl"/>
      </fo>
      <fo name="pdf">
        <fi name="to.xsl"/>
      </fo>
    </fo>
    <fo name="adams">
      <fo name="docx">
        <fi name="from.xsl"/>
      </fo>
    </fo>
    <fo name="agora">
      <fo name="docx">
        <fi name="from.xsl"/>
        <fi name="to.xsl"/>
      </fo>
      <fo name="html">
        <fi name="from.xsl"/>
        <fi name="to.xsl"/>
      </fo>
      ...
    </fo>
    ...
  </fo>
</ftree>

Now we proceed to add annotations to the tree: optional param elements representing the names of stylesheet parameters. Note the curly braces enclosing expressions passed to a function as arguments.

> fox "frameworks/tei//(from.xsl, to.xsl) 
       => ftree-view(('params/param?', {\*\xsl:param\@name\string() => sort()} ))"

>>>>
<ftree context=".../frameworks/tei/xml/tei/stylesheet" countFo="134" countFi="128">
  <fo name="profiles">
    <fo name="acm">
      <fo name="latex">
        <fi name="to.xsl">
          <params>
            <param>attLength</param>
            <param>classParameters</param>
            <param>documentclass</param>
            <param>longtables</param>
            <param>spaceCharacter</param>
          </params>
        </fi>
      </fo>
      <fo name="pdf">...      
    </fo>
    ...
    <fo name="adams">...    
    <fo name="agora">
      <fo name="docx">...
      <fo name="html">
        <fi name="from.xsl"/>
        <fi name="to.xsl">
          <params>
            <param>autoToc</param>
            <param>bottomNavigationPanel</param>
            <param>cssFile</param>
            ...
          </params>
        </fi>
      </fo>
    </fo>
    ...
  </fo>
  ...
</ftree>

2.3.8 Modified file system copies

Several functions enable the modification of files (deleting, renaming, replacing, inserting nodes). Combining the modification with a file tree copy, one achieves a bulk update resulting in an updated file tree. The following example creates a file tree containing copies of all TEI XSD documents, with any xs:annotation elements removed:

> fox "frameworks/tei//*.xsd/doc-resource()\delete-nodes({\\xs:annotation})\pretty-node() 
       => file-tree-copy('tmp')"

A second example creates a modified file tree with missing @attributeFormDefault attributes added:

> fox "frameworks/tei//*.xsd/doc-resource()\
       insert-nodes({\*[not(@attributeFormDefault)]},
                    {xatt-ec('unqualified', 'attributeFormDefault')}) 
       => file-tree-copy('tmp')"

Using Foxpath, checking the outcome is very convenient:

> fox "frameworks/tei//*.xsd => count()"
>>>>
57

> fox "frameworks/tei//*.xsd[\*\@attributeFormDefault] => count()"
>>>>
0

> fox "tmp//*.xsd[\*\@attributeFormDefault] => count()"
>>>>
57
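
For comparison, the same kind of bulk update written out in Python looks as follows. This is a sketch under assumptions rather than a rendering of what the Foxpath functions do: copy_with_default is a hypothetical helper, it preserves relative paths below the source folder, and the two schema files are invented.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

ATT = "attributeFormDefault"
ET.register_namespace("xs", "http://www.w3.org/2001/XMLSchema")  # keep the xs prefix


def copy_with_default(src_root, dst_root, value="unqualified"):
    """Copy every *.xsd below src_root to dst_root, adding ATT where it is missing."""
    copied = 0
    for src in sorted(src_root.rglob("*.xsd")):
        tree = ET.parse(str(src))
        schema = tree.getroot()
        if schema.get(ATT) is None:
            schema.set(ATT, value)
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        tree.write(str(dst), encoding="utf-8", xml_declaration=True)
        copied += 1
    return copied


# Invented demo schemas: one without the attribute, one with it
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp())
SCHEMA = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"{}/>'
(src / "sub").mkdir()
(src / "a.xsd").write_text(SCHEMA.format(""))
(src / "sub" / "b.xsd").write_text(SCHEMA.format(' attributeFormDefault="qualified"'))

copied = copy_with_default(src, dst)
values = {p.name: ET.parse(str(p)).getroot().get(ATT) for p in dst.rglob("*.xsd")}
print(copied, sorted(values.items()))
```

The source files are left untouched; only the copies carry the added attribute.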

Part 3: ixml in the infospace

3.1 Overview

One of the fundamental ideas of the infospace is the availability of file content trees. Ideally, this would be the case for any kind of file. But what does reality look like?

Depending on the underlying technology, some file formats are naturally represented as trees. Modern XPath and XQuery processors like BaseX (1) can process not only XML, but also JSON, CSV and HTML. Other formats with similar data models, like YAML, are not supported out of the box. Extracting structure from unsupported file formats can always be implemented by specialized parsers embedded as extension functions, but this is a rather heavy-handed approach and must be well justified.

Invisible XML (ixml for short) (7) was introduced as a light-weight, declarative alternative. This is an excellent fit if one wants to enlarge the set of formats that can be navigated: previously opaque text content is unlocked and reveals its inner structure. By sharing ixml grammars, this additional structure may immediately be used by a multitude of different systems and projects.

We understand Foxpath not only as a practical tool for day-to-day work, but also as a test bed for what a user experience of the infospace might feel like. We identified four possible extension points where ixml significantly enhances its expressive power:

  1. File formats are associated with ixml grammars, enabling the usual Foxpath navigation capabilities to extend into these formats.

  2. Inside of already tree-structured formats, there may be text nodes containing inherently structured data. Applying ixml the user can navigate into these structures.

  3. ixml grammars can be used as node tests, enabling complex quality control scenarios.

  4. Unified string expressions are extended to allow matching against an ixml grammar. The flexibility of USEs is thus further enhanced, even suggesting a usefulness beyond the confines of the Foxpath language.

3.2 Infospace definition document

As explained in the introduction, the heart of the infospace idea is the continuation of file system tree structure into file content tree structure. The more file formats can be parsed into node trees, the better, and we added a new extension function which supports the parsing of any file format for which an ixml grammar is available. But one should also remember that different file formats require different parsing approaches – different built-in parsing functions (e.g. doc, json-doc, html-doc, csv-doc) and different grammars. This variability disturbs the sensation of seamless navigation, as it distracts from the navigation logic itself. (Which parse function must I call or which grammar must I use in order to enter this particular file?)

The ideal of seamlessness is restored by hiding the operation of parsing completely: the processor should know when and how to parse a file. The user should have the illusion that files are trees, rather than that they can be parsed into trees. As various examples in the preceding section showed, the problem of when to parse has already been solved: any navigation into content is automatically preceded by parsing. But hitherto, automated parsing was always XML parsing. This limitation must be overcome in order to maximize the sensation of seamlessness. The processor must know how to parse, that is, which function or grammar to use. The goal is achieved by a new feature, the infospace definition document.

Differences in how to parse a file are captured by the notion of a resource type. A resource type is defined by a particular parsing function (e.g. XML parse function or JSON parse function) or a particular ixml grammar. The infospace definition specifies an automated mapping of URIs to resource types. The document has three main parts:

  • Element grammars – defines available grammars in terms of a file URI and a short name

  • Element rtypes – defines resource types in terms of a parse function or a grammar

  • Element rtypeUses – defines a mapping of resource URIs to a resource type and, possibly, options to be used when parsing

The infospace definition helps to create the illusion that tree structure is ubiquitous and pre-existing, with no need to create or extend it. The following table shows a few examples enabled by the current standard infospace definition.

Table IV

Examples of seamless navigation into various resource types.

Goal                                           Foxpath expression
Extract JSON field data                        frameworks/tei//*.json\\title => freq()
Filter MS Word documents by fulltext pattern   frameworks/tei//*.docx[\matches-pattern('relevant patent rights | royalty payments#ft')]
Inspect HTML links                             frameworks/tei//jteiHints.html\\*:a\@href => freq()
Count CSV rows                                 samples//sample.csv\\record => count()
List CSS properties                            frameworks/tei//tei.css\\property\name[matches-pattern('font-*')] => sort()

The following listing shows a snippet of the standard infospace definition:

<ispace>
    <!-- ========
         Grammars 
         ======== -->
    <grammars baseURI="../grammar">
      <grammar name="css" uri="css.ixml" type="ixml"/>
      <grammar name="isbn" uri="isbn.ixml" type="ixml"/>        
      <grammar name="iso8601" uri="iso8601.ixml" type="ixml"/>
      <grammar name="words" uri="words.ixml" type="ixml"/>
    </grammars>
          
    <!-- ============== 
         Resource types 
         ============== -->
    <rtypes>
      <rtype name="xml">
        <docFn>doc#1</docFn>
        <parseFn>parse-xml#1</parseFn>            
      </rtype>
      <rtype name="json">
        <docFn>json:doc#1</docFn>
        <parseFn>json:parse#1</parseFn>            
      </rtype>
      <rtype name="html">
        <docFn>html:doc#1</docFn>
        <parseFn>html:parse#1</parseFn>            
      </rtype>
      <rtype name="csv">
        <docFn>csv:doc#2</docFn>
        <parseFn>csv:parse#2</parseFn>            
      </rtype>
      <rtype name="docx">
        <docFn>docx:doc#1</docFn>
      </rtype>
      <rtype name="css">
        <grammar ref="css"/>
      </rtype>
      <rtype name="words.ixml">
        <grammar ref="words"/>
      </rtype>
    </rtypes>
    
    <!-- ==================
         Resource type uses 
         ================== -->
    <rtypeUses>
      <!-- .xml etc. -->
      <case>
        <condition>
          <file name="*.dita *.ditamap *.docbook *.nvdl *.odd *.tei 
                      *.xhtml *.xml *.xsd *.xsl *.xslt "/>
        </condition>
        <rtypeUse rtype="xml" final="yes"/>
      </case>
      <!-- *.html *.htm -->      
      <case>
        <condition>
          <file name="*.html *.htm"/>
        </condition>
          <rtypeUse rtype="xml"/>              <!-- try to parse as XML -->
          <rtypeUse rtype="html" final="yes"/> <!-- parse as HTML -->
      </case>      
      <!-- *.json -->
      <case>
        <condition>
          <file name="*.json *.jsonld *.jsonschema *.jschema"/>
        </condition>
        <rtypeUse rtype="json" final="yes"/>            
      </case>
      <!-- *.docx -->
      ...
      <!-- *.csv -->
      ...
      <!-- *.css -->
      ...
      <!-- *.words.txt -->
      ...
      <!-- Last attempt - parse as XML -->
      <case>
        <rtypeUse rtype="xml"/>
      </case>
    </rtypeUses>
</ispace>
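
One way to picture the lookup which such a definition implies is the Python sketch below. It encodes our reading of the listing and is an assumption, not a specification: matching cases are visited in document order, a case without a condition matches every file, the resource types of matching cases are collected as candidates to be tried in turn, and a use marked final="yes" ends the search. Only the name attribute of the file condition is handled, and the function names are hypothetical.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Abridged copy of the rtypeUses part of the listing above
ISPACE = """
<ispace>
  <rtypeUses>
    <case>
      <condition><file name="*.xml *.xsd *.xsl"/></condition>
      <rtypeUse rtype="xml" final="yes"/>
    </case>
    <case>
      <condition><file name="*.html *.htm"/></condition>
      <rtypeUse rtype="xml"/>
      <rtypeUse rtype="html" final="yes"/>
    </case>
    <case>
      <condition><file name="*.json *.jschema"/></condition>
      <rtypeUse rtype="json" final="yes"/>
    </case>
    <case>
      <rtypeUse rtype="xml"/>
    </case>
  </rtypeUses>
</ispace>
"""


def case_matches(case, file_name):
    """A case matches if it has no condition or one of its name globs matches."""
    files = case.findall("condition/file")
    if not files:
        return True
    return any(fnmatch.fnmatchcase(file_name, glob)
               for f in files for glob in f.get("name", "").split())


def candidate_rtypes(ispace_xml, file_name):
    """Return the resource types to try for file_name, in order."""
    candidates = []
    for case in ET.fromstring(ispace_xml).findall("rtypeUses/case"):
        if not case_matches(case, file_name):
            continue
        for use in case.findall("rtypeUse"):
            candidates.append(use.get("rtype"))
            if use.get("final") == "yes":    # assumed meaning: stop searching here
                return candidates
    return candidates


for name in ("index.html", "tei_all.xsd", "schema.jschema", "notes.txt"):
    print(name, candidate_rtypes(ISPACE, name))
```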

Users can provide their own infospace definition (option -s infospace-path), which may take advantage of project-specific information. Alternatively, they can extend the standard definition (option -x infospace-path) by supplying an additional definition whose grammars, resource types and resource type uses are merged into the standard definition. For example, an extension may map .csv files contained by a folder with a particular name to a parsing of CSV which uses semicolons as separator and expects a header line. The following example also adds a mapping of .jso files to the JSON resource type:

<ispace>
    <rtypeUses>
        <case>
            <condition>
                <file name="*.jso"/>
            </condition>
            <rtypeUse rtype="json"/>
        </case>
        <case>
            <condition>
                <file parentName="countries"/>
            </condition>
            <rtypeUse rtype="csv">
                <options>
                    <option name="header" value="yes"/> 
                    <option name="separator" value=";"/>
                </options>                        
            </rtypeUse>
        </case>
    </rtypeUses>
</ispace>
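The effect of the two CSV options can be illustrated with a small Python sketch. It is a stand-in built on Python's csv module, not the csv:parse function referenced in the infospace definition. The function name parse_csv and the shape of its result are assumptions made for illustration.

```python
import csv
import io


def parse_csv(text: str, separator: str = ",", header: bool = False):
    """Parse CSV text; with header=True the first row supplies the field names."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    rows = [row for row in reader if row]  # skip empty lines
    if not header:
        return rows  # list of lists, no field names
    if not rows:
        return []
    names, *records = rows
    # Each record becomes a mapping from header name to cell value.
    return [dict(zip(names, record)) for record in records]
```

With separator ";" and header enabled, a line such as France;Paris is returned as a record keyed by the names given in the first line, which corresponds to named items in the parsed tree.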

3.3 New extension functions

Sometimes a file must be parsed using a grammar which is not automatically associated with its URI. This is enabled by the new extension function idoc(). Other ixml-related extension functions support parsing a string, validating a string against a grammar and expanding text nodes into subtrees of items. In all cases, the grammar can be identified by the short name assigned in the infospace definition, or by URI.

Table V

Several ixml-related extension functions.

Function                              Description

idoc($uri, $grammar)                  Returns the document resulting from parsing the resource at $uri according to $grammar

iparse($text, $grammar)               Parses $text according to $grammar

ivalid($text, $grammar)               Tests if $text conforms to $grammar

iexpand-nodes($targetExpr, $grammar)  Replaces the text content of target nodes with the parse result according to $grammar

Transforming content trees by replacing node values with some dynamically computed value is already a feature of Foxpath (function replace-values($targetNodesExpr, $newValueExpr)). The new function iexpand-nodes() has a similar behaviour. It replaces text nodes selected by an expression with the result of parsing that text with an ixml grammar. This can be understood as revealing the full structure of the content that has previously been hidden in unanalyzed text nodes.
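The idea of expanding text nodes can be illustrated with a Python sketch. It does not use ixml: a regular expression with named groups stands in for a grammar, and the names PAGE_RANGE and expand_nodes are hypothetical. Leaving non-conforming text untouched is an assumption of this sketch, not a documented behaviour of iexpand-nodes().

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for a grammar: a page range such as "12-34".
PAGE_RANGE = re.compile(r"(?P<first>\d+)-(?P<last>\d+)")


def expand_nodes(root: ET.Element, target_path: str, pattern: re.Pattern) -> int:
    """Replace the text of selected elements with child elements, one per
    named group of the pattern. Returns the number of expanded elements."""
    expanded = 0
    for node in root.findall(target_path):
        match = pattern.fullmatch((node.text or "").strip())
        if match is None:
            continue  # assumption: non-conforming text is left as it is
        node.text = None
        for name, value in match.groupdict().items():
            child = ET.SubElement(node, name)
            child.text = value
        expanded += 1
    return expanded
```

After expansion, an element like <pages>12-34</pages> contains the children first and last, so the previously opaque text can be addressed by path expressions.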

3.4 Examples

The examples in this section have been tested with the root folder of an Oxygen XML Editor installation as working directory. When copying the example code to the command line, please remove the linefeeds, which have been added for the sake of readability. See Appendix A for instructions on how to install Foxpath.

Example: Navigate CSS contents

Report the values of CSS property border-color.

> fox "frameworks/tei//*.css\\property[name eq 'border-color']\value => freq()"
  
>>>>  
#000000 .. (3)
#3C3C3C .. (4)
#c5d8bb .. (5)
#ffe1ad .. (5)
#ffe7e8 .. (5)
Black .... (1)
blue ..... (2)
darkred .. (1)
green .... (1)
grey ..... (1)
LightBlue  (1)
lightgrey  (1)
red ...... (7)
white .... (2)
yellow ... (1)
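The aggregation step performed by freq() amounts to a frequency count. The following Python sketch models it under the assumption, suggested by the output above, that values are reported in case-insensitive order. It does not reproduce the dotted output format, and it is not the Foxpath implementation.

```python
from collections import Counter


def freq(values):
    """Return (value, count) pairs, ordered case-insensitively by value."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: item[0].lower())
```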

Example: Use ivalid() - report invalid ISBNs

Report invalid ISBNs grouped by file name.

> fox "frameworks/jats//*.xml
       \\*:isbn[not(ivalid('#isbn'))]
       \tuple(base-fname(.), .) 
       => hlist('File name, Invalid ISBN')"
       
>>>>  
================================
File name
.  Invalid ISBN
================================

bitso-book-of-parts-oasis.xml
.  1-234 567890-123
.  xxx-1
bitso-book-part1-oasis.xml
.  1-234 567890-123
.  xxx-1
bitso-samplesmall-book-oasis.xml
.  1-234 567890-123
.  xxx-1
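The combination of a validity test and grouping by file name can be sketched in Python. The grammar behind '#isbn' is not shown here, so the check below is only a stand-in: it accepts strings of digits and hyphens that contain 10 or 13 digits (the last character of a 10-digit form may be X) and does not verify checksums. The names looks_like_isbn and invalid_by_file are hypothetical.

```python
import re
from collections import defaultdict


def looks_like_isbn(text: str) -> bool:
    """Shape check only; a stand-in for validation against a grammar."""
    if not re.fullmatch(r"[0-9X-]+", text):
        return False  # spaces, letters other than X, etc. are rejected
    digits = text.replace("-", "")
    return re.fullmatch(r"\d{9}[\dX]|\d{13}", digits) is not None


def invalid_by_file(pairs, is_valid=looks_like_isbn):
    """Group invalid values by file name; pairs are (file name, value)."""
    groups = defaultdict(list)
    for file_name, value in pairs:
        if not is_valid(value):
            groups[file_name].append(value)
    return dict(groups)
```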

Example: Use iparse() - extract data from text nodes

Find the oldest document.

> fox "frameworks/tei//*.xml\\dc:date\iparse('#iso8601')\\year => min()"
  
>>>>
1667
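The same pipeline (parse each text value into named parts, select one part, aggregate) can be modelled in Python. A regular expression stands in for the grammar referenced as '#iso8601' and covers only the forms YYYY, YYYY-MM and YYYY-MM-DD, which is an assumption of this sketch. The function names are hypothetical.

```python
import re

# Stand-in for a date grammar: year, optional month, optional day.
DATE = re.compile(r"(?P<year>\d{4})(-(?P<month>\d{2})(-(?P<day>\d{2}))?)?")


def parse_date(text: str):
    """Return the named parts of a date as integers, or None if no match."""
    match = DATE.fullmatch(text.strip())
    if match is None:
        return None
    return {name: int(value)
            for name, value in match.groupdict().items()
            if value is not None}


def oldest_year(texts):
    """Smallest year among the texts which could be parsed, else None."""
    years = [parts["year"] for parts in map(parse_date, texts) if parts]
    return min(years) if years else None
```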

Part 4: Future work

Apart from continuously adding further extension functions, we plan to address the following topics:

  • Integration of external tools, using ixml to import results

  • Access to the structure of binary files

  • Plugin system for user-specific extension functions

  • Visualization of expression values implemented as HTML pages

  • Control over the handling of ambiguous parse results

Part 5: Final thoughts

Foxpath can be understood as a proof of concept of the infospace experience. We regard it as successful.

However, it should be remembered that Foxpath is a language with a strong focus on interactive use and the power of succinct expressions. After all, it is only an extended version of XPath. This limitation of the Foxpath language must not be mistaken for a feature of the infospace. Its true potential can only be realized by a full-fledged programming language. As XPath is a subset of XQuery and Foxpath is a backward-compatible extension of XPath, XQuery is an obvious candidate for a programming language which implements the infospace abstraction.

Appendix A. Installation of Foxpath

The Foxpath processor (3) is written in XQuery (10) and requires the installation of BaseX (1), version 11 or higher. In order to install Foxpath, proceed as follows:

  1. Download and install BaseX from here: basex.org/download

  2. Clone the Foxpath project from here: https://github.com/hrennau/foxpath

  3. Add the bin folder of the BaseX installation to the PATH environment variable.

  4. Add the bin folder of the Foxpath installation to the PATH environment variable.

  5. Done. You can now use Foxpath on the command line by calling the fox.bat or fox.sh shell script, depending on the operating system.

Note that the Foxpath expression passed to the script should be enclosed in single or double quotes: fox "expression".

In order to execute a Foxpath expression stored in a file, use option -f: fox -f prog-path.
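Whether the scripts can be found from the command line after the installation can be checked with a few lines of Python. The script names used here (basex for BaseX, fox and fox.sh for Foxpath) are assumptions based on the steps above. On Windows, shutil.which resolves a bare name such as fox to fox.bat via the PATHEXT setting.

```python
import shutil


def check_tools(names=("basex", "fox", "fox.sh")):
    """Map each script name to its resolved location, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in check_tools().items():
        print(f"{name}: {location or 'not found'}")
```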

References

[1] BaseX – an open source XML database. Homepage. http://basex.org

[2] Rennau, Hans-Jürgen. FOXpath – an expression language for selecting files and folders. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Rennau01. https://www.balisage.net/Proceedings/vol17/html/Rennau01/BalisageVol17-Rennau01.html

[3] Rennau, Hans-Jürgen. Foxpath. Github repository. https://github.com/hrennau/foxpath

[4] Rennau, Hans-Jürgen. FOXpath navigation of physical, virtual and literal file systems. Presented at xmlprague, February 9 - 11, 2017. In XML Prague 2017 Conference Proceedings. https://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf

[5] glob (programming). Wikipedia article. https://en.wikipedia.org/wiki/Glob_%28programming%29

[6] Rennau, Hans-Jürgen. The XML info space. Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:https://doi.org/10.4242/BalisageVol10.Rennau01

[7] Invisible XML. Homepage. https://invisiblexml.org

[8] Robie, Jonathan, et al., eds. XML Path Language (XPath), W3C Recommendation 08 April 2014. https://www.w3.org/TR/2014/REC-xpath-30-20140408/

[9] Kay, Michael, ed. XPath and XQuery Functions and Operators 3.0. W3C Recommendation 08 April 2014. http://www.w3.org/TR/xpath-functions-30/

[10] Robie, Jonathan, Michael Dyck, eds. XQuery 3.1: An XML Query Language. W3C Candidate Recommendation 18 December 2014. http://www.w3.org/TR/xquery-31/

[11] Walsh, Norman, et al., eds. XQuery and XPath Data Model 3.1. W3C Recommendation 21 March 2017. http://www.w3.org/TR/xpath-datamodel-31/