Introduction
We are used to thinking about the file system as a tree of resources: folders and files. A folder is a named set of folders and files. A file is named content which in many cases (e.g. XML or JSON) can be represented by a tree of items. To this picture we add the RESTnet – the sum total of resources accessible via URI. As REST URIs usually contain a path, the logical structure of the RESTnet is a set of named file systems.
Using XQuery extension functions, not only XML but also other media types can be parsed into item trees: JSON, CSV, HTML. Leveraging ixml, many more file formats could be made available as item trees. With this in mind, we sketch the idealized picture of pervasive tree structure:
- File system + RESTnet = a tree of resources
- Resource = a tree of items
We call this tree structure a space of information, the infospace (see 6 for a precursor of this idea). In three-dimensional space, every point is identified by three coordinates, and two points can be connected by a vector composed of three coordinates. The points of the infospace are tree nodes, representing a folder, a file or a file content item. The coordinates of the infospace are names: folder, file and item names. Vectors of the infospace are name paths. Nodes which are resources are identified by a path of folder and file names connecting them to the root of the space. Nodes which are file content items are identified by a path composed of two parts, one pointing to the resource, the other leading within the resource to the item. A simple path syntax lets us address any point and connect any pair of points. Adding XPath capabilities to the syntax turns it into a path language, a tool of navigation, selection and aggregation.
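The vector analogy can be made concrete with a small Python sketch. It assumes a toy representation in which a point of the infospace is a tuple of names read from the root of the space, and it deliberately ignores the boundary between resource path and item path, positional indexes and namespaces. The function computes the name path connecting two points via their nearest common ancestor, using ".." for an upward step; the folder, file and element names in the example are hypothetical.

```python
from typing import Tuple

# A point of the toy infospace: the names leading from the root to the node,
# i.e. folder names, then a file name, then element names inside the file.
NamePath = Tuple[str, ...]


def vector(src: NamePath, dst: NamePath) -> NamePath:
    """Return the relative name path leading from src to dst.

    '..' denotes one step upwards; the remaining names lead downwards
    from the nearest common ancestor of src and dst.
    """
    common = 0
    while common < min(len(src), len(dst)) and src[common] == dst[common]:
        common += 1
    return ("..",) * (len(src) - common) + dst[common:]


def follow(src: NamePath, vec: NamePath) -> NamePath:
    """Apply a vector to a point and return the coordinates of the target."""
    node = list(src)
    for name in vec:
        if name == "..":
            node.pop()
        else:
            node.append(name)
    return tuple(node)


# Hypothetical points: a folder, and an element inside a file of a sibling folder.
folder = ("frameworks", "tei", "css")
item = ("frameworks", "tei", "xml", "tei.xsd", "schema", "simpleType")
print(vector(folder, item))
```

Following the computed vector from the first point leads back to the second, which is the property the analogy with three-dimensional space relies on.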
This perspective is based on the determination to regard resources as nodes within a tree structure, rather than as entities only identified by a URI. It makes tree thinking more pervasive, as it is no longer restricted to file contents: it also applies to file system contents.
Of course we are used to thinking about resources as arranged in trees. It is our daily practice to navigate our file systems in a tree-minded way – opening and closing folders, going upwards and downwards, searching folder contents, etc. But technological reality is different. We have (e.g. when using XML technology) powerful means for navigating the tree structure of file content, but hardly anything for navigating the file system tree. As a consequence, the infospace idea may be perceived as an empty abstraction, without the promise of genuine usefulness. An infospace without a unified path language is not an infospace, as far as user experience is concerned.
This has significant consequences for the prospects of ixml. Its general approach to turning any text format into node trees is a fundamental idea. But if the node trees are only used for small and very specific tasks, the promise of practical use is severely limited. One might ask for another fundamental idea lifting the concept onto higher ground.
We notice the role which ixml may play in the unfolding of an infospace, making tree structure increasingly pervasive. The question arises whether the infospace must remain an empty abstraction or can become an experience, awakening a desire to further unfold its technological reality.
This work reports our endeavour to crack the eggshell of the infospace abstraction. The Foxpath language (2, 3, 4) is a set of expressions turning the abstraction into an experience. We summarize the language briefly and present a few example uses intended to demonstrate the infospace at work. The examples are followed by an account of our effort to integrate ixml into Foxpath. Our vision is a unified space of information, created by a single, pervasive tree structure.
Part 1: A short introduction to the Foxpath language
1.1 Summary
Foxpath (2, 3, 4) is an extended version of XPath 3.0 (8). The main extensions are:
- Foxpath expression – navigation of file system trees
- Extension functions – simplified evaluation of navigation results
A syntactical change has been applied: the slash of XPath has been replaced with a backslash. In Foxpath, the slash is used as a separator between steps navigating the file system. In other words, a slash is followed by a step of file system navigation, and a backslash is followed by a step of file content navigation.
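The division of labour between the two separators can be illustrated with a toy tokenizer. The Python sketch below is not the Foxpath parser, whose grammar is far richer; it assumes a path made only of steps and bracketed predicates, and it ignores separators inside predicates, parentheses and quoted strings. Each step is labelled with the kind of separator that introduced it, and a doubled separator (// or \\) yields an empty step, standing for the abbreviated descendant-or-self step.

```python
from typing import List, Optional, Tuple


def split_steps(expr: str) -> List[Tuple[str, str]]:
    """Split a simple Foxpath-like path into (kind, step) pairs.

    kind is 'fs' for a step introduced by '/' (file system navigation) and
    'content' for a step introduced by a backslash (file content navigation).
    The first step is treated as a file system step.
    """
    steps: List[Tuple[str, str]] = []
    kind = "fs"
    current: List[str] = []
    depth = 0                      # nesting depth of [...] and (...)
    quote: Optional[str] = None    # active quote character inside a string literal
    for ch in expr:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch in "[(":
            depth += 1
            current.append(ch)
        elif ch in "])":
            depth -= 1
            current.append(ch)
        elif ch in "/\\" and depth == 0:
            steps.append((kind, "".join(current)))
            kind = "fs" if ch == "/" else "content"
            current = []
        else:
            current.append(ch)
    steps.append((kind, "".join(current)))
    return steps


for kind, step in split_steps(r"frameworks/tei//*.xsd\\xs:union"):
    print(kind, repr(step))
```

Applied to the path used later in this paper, the sketch labels frameworks, tei and *.xsd as file system steps and xs:union as a content step, reached through an abbreviated descendant step.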
Taking the syntactical change into account, Foxpath is fully backward-compatible with XPath, that is, every valid XPath 3.0 expression is a valid Foxpath expression and yields the same result. The only exceptions to this rule are cases in which XPath raises an error and Foxpath returns a value. The most important example concerns the left-hand operand of the path operator (backslash). In XPath, an atomic item in this position triggers an error, whereas in Foxpath it is interpreted as a document URI and parsed into a node tree. This change enables the mixing of file system navigation and file content navigation within a single navigation path. A second example is the effective boolean value of an operand consisting of several strings or URIs. In Foxpath, the value true is returned; in XPath, an error is raised. The change enables elegant filtering of folders by their content. A third example is the except operator, which in Foxpath can also be used for filtering URIs, not only nodes. This supports intuitive selections involving the explicit exclusion of certain resources.
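To see what the second deviation amounts to, here is a Python model of the effective boolean value. It follows the XPath 3.0 rules in simplified form (xs:string, xs:anyURI and xs:untypedAtomic are all represented by Python strings, and other atomic types are omitted) and adds the Foxpath relaxation as a flag. Treating every all-string sequence of two or more items as true is our reading of the description above, not a specification of Foxpath.

```python
import math
from typing import Sequence, Union


class Node:
    """Stand-in for an XDM node; only its kind matters here."""


Item = Union[Node, bool, int, float, str]


class EBVError(Exception):
    """Corresponds to the XPath dynamic error err:FORG0006."""


def ebv(seq: Sequence[Item], foxpath: bool = False) -> bool:
    """Effective boolean value of a sequence, XPath rules simplified.

    With foxpath=True, a sequence of several strings (standing for URIs)
    yields true instead of an error.
    """
    if not seq:
        return False
    if isinstance(seq[0], Node):
        return True
    if len(seq) == 1:
        item = seq[0]
        if isinstance(item, bool):  # test bool first: bool is a subclass of int
            return item
        if isinstance(item, str):
            return item != ""
        if isinstance(item, (int, float)):
            return not (item == 0 or math.isnan(item))
    if foxpath and all(isinstance(i, str) for i in seq):
        return True
    raise EBVError("FORG0006: effective boolean value is not defined")


# A predicate such as [.//*.xsd] on a folder yields the URIs of the matching files.
uris = ["schemas/a.xsd", "schemas/b.xsd"]
print(ebv(uris, foxpath=True))
```

Called without the flag, the same input raises EBVError, which is exactly the XPath behaviour that would make content-based folder filters fail.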
1.2 The Foxpath expression
The Foxpath expression is very similar to the XPath expression, but navigates the file system tree, rather than file content. Navigation is based on relationships between URIs, not between XDM nodes (11). Note that Foxpath does not map folders and files to XDM nodes.
A Foxpath expression is a sequence of navigation steps, each one consisting of a navigation axis, a name test and optional predicates. The name test has glob expression syntax (5), supporting * and ? as wildcard characters. Steps are separated by slashes. The default axis is the child axis, and axis names have a tilde (~) appended, in order to be distinguished from XPath axes (example step: descendant~::*.css). Shortcut syntax as used by XPath is available (//, .., .).
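A step such as descendant~::*.css can be mimicked in Python with pathlib and fnmatch. The sketch below is an illustration under assumptions, not Foxpath's implementation: name tests are matched case-sensitively with fnmatch.fnmatchcase, which also accepts [...] character classes beyond the * and ? wildcards mentioned above, and results are deduplicated and sorted by path as an approximation of document order.

```python
import fnmatch
from pathlib import Path
from typing import Iterable, List


def child_step(contexts: Iterable[Path], name_test: str = "*") -> List[Path]:
    """child~::name_test - folders and files directly inside each context folder."""
    found = set()
    for ctx in contexts:
        if ctx.is_dir():
            found.update(p for p in ctx.iterdir() if fnmatch.fnmatchcase(p.name, name_test))
    return sorted(found)


def descendant_step(contexts: Iterable[Path], name_test: str = "*") -> List[Path]:
    """descendant~::name_test - all folders and files below each context folder."""
    found = set()
    for ctx in contexts:
        if ctx.is_dir():
            found.update(p for p in ctx.rglob("*") if fnmatch.fnmatchcase(p.name, name_test))
    return sorted(found)


def parent_step(contexts: Iterable[Path], name_test: str = "*") -> List[Path]:
    """parent~::name_test - the folder containing each context resource."""
    return sorted({p.parent for p in contexts if fnmatch.fnmatchcase(p.parent.name, name_test)})
```

Chained, these functions reproduce a path like frameworks/*//*.css: descendant_step(child_step(child_step([Path(".")], "frameworks")), "*.css").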
First examples demonstrate how file system navigation is expressed as a sequence of navigation steps along navigation axes, which default to the child axis.
Table I
Examples 1 – steps along axes
| Expression | Description |
|---|---|
| | List frameworks |
| | As before, relying on the default axis |
| | Follow various navigation axes |
| | As before, using shorthand syntax |
Each navigation step may include one or several predicates, which are expressions evaluated to an effective boolean value (8). As in XPath, path steps are not necessarily navigation steps. In particular the last step is often used for representing or evaluating the result of the preceding steps.
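In the same spirit, a predicate is simply a test applied to every item of a step result. The Python sketch below, with hypothetical helper names, filters folders in the two ways described by the first rows of Table II: keeping empty folders, and keeping folders that contain a descendant matching a glob pattern. The second test relies on the same rule as Foxpath's effective boolean value of several URIs: a non-empty result counts as true.

```python
from pathlib import Path
from typing import Callable, Iterable, List

Predicate = Callable[[Path], bool]


def filter_step(items: Iterable[Path], predicate: Predicate) -> List[Path]:
    """Keep the items for which the predicate holds, preserving their order."""
    return [p for p in items if predicate(p)]


def is_empty_folder(p: Path) -> bool:
    """Predicate: a folder without any folders or files."""
    return p.is_dir() and not any(p.iterdir())


def contains(pattern: str) -> Predicate:
    """Predicate factory: the folder has at least one descendant matching pattern."""
    return lambda p: p.is_dir() and any(True for _ in p.rglob(pattern))


def all_folders(root: Path) -> List[Path]:
    """All folders below root, sorted by path."""
    return sorted(p for p in root.rglob("*") if p.is_dir())
```

For example, filter_step(all_folders(root), contains("*.xsd")) keeps the folders that hold at least one schema somewhere below them.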
Table II
Examples 2 – using predicates
| Expression | Description |
|---|---|
| | List empty folders |
| | Filter folders by content |
| | As before, attaching additional information |
| | List all XSDs not contained by a folder with a name matching |
File system navigation and file content navigation can be mixed within a single expression. Typically, navigation starts with steps traversing the file system and is continued by steps drilling into file contents. Another typical pattern is file system navigation with predicates checking file contents.
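The second pattern can be approximated in Python to show how much a single expression replaces. The sketch below walks the file system to find *.xsd files and applies a predicate that drills into each file's content, keeping the schemas whose root element carries a given targetNamespace. It assumes the files are XML; files that are not well-formed are treated as having no namespace, a choice made for the sketch only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Optional


def target_namespace(xsd: Path) -> Optional[str]:
    """Content step: parse the file and read the root element's targetNamespace."""
    try:
        return ET.parse(xsd).getroot().get("targetNamespace")
    except ET.ParseError:
        return None  # not well-formed: no content tree to navigate


def xsds_with_tns(folder: Path, tns: str) -> List[Path]:
    """File system steps (all *.xsd below folder) filtered by a content predicate."""
    return sorted(p for p in folder.rglob("*.xsd")
                  if p.is_file() and target_namespace(p) == tns)
```

The explicit parse call in target_namespace is exactly what Foxpath makes implicit when a backslash step follows a file URI.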
Table III
Examples 3 – mixing file system and file content navigation
| Expression | Description |
|---|---|
| | Retrieve an attribute value |
| | Filter files dependent on an attribute value |
| | As before, and continue navigation |
| | Follow links and enter target content |
1.3 Extension functions
The term "extension functions" means Foxpath functions not included in the set of XPath standard functions (9). Many extension functions are available, simplifying the evaluation of navigation results, in particular data analysis and data aggregation.
Example 1 – function annotate() - annotate URIs (or other items) with additional information
List the frameworks containing XSLT stylesheets, annotated with the number of stylesheets contained.
> fox "frameworks/*[.//*.xsl]/annotate(count(.//*.xsl))"
>>>>
/programme/Oxygen XML Editor 25/frameworks/dita (716)
/programme/Oxygen XML Editor 25/frameworks/docbook (462)
/programme/Oxygen XML Editor 25/frameworks/extensions (4)
...
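For comparison, the effect of this one-liner can be spelled out in Python. The sketch is an approximation for illustration: it lists the child folders of a frameworks folder that have at least one descendant named *.xsl and appends the count in parentheses, mirroring the predicate [.//*.xsl] and the call annotate(count(.//*.xsl)). The output layout imitates the listing above and is otherwise an assumption.

```python
from pathlib import Path
from typing import List


def annotate_xsl_counts(frameworks: Path) -> List[str]:
    """Child folders containing *.xsl descendants, annotated with their number."""
    lines = []
    for folder in sorted(p for p in frameworks.iterdir() if p.is_dir()):
        count = sum(1 for _ in folder.rglob("*.xsl"))  # count(.//*.xsl)
        if count:                                      # predicate [.//*.xsl]
            lines.append(f"{folder} ({count})")        # annotate(...)
    return lines
```

Run from the Oxygen installation folder, print("\n".join(annotate_xsl_counts(Path("frameworks")))) would produce a listing of the same shape.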
Example 2 – function table() - write value tuples into a table
Write a table displaying for each XSD of the TEI framework the file name and the target namespace.
> fox "frameworks/tei//*.xsd/tuple(file-name(), \*\@targetNamespace)
=> table('File name, TNS', 'sort')"
>>>>
#----------------------------------------------------------------------------------------------#
| File name | TNS |
#----------------------------------------------------------------------------------------------#
| a.xsd | http://relaxng.org/ns/compatibility/annotations/1.0 |
| examples.xsd | http://www.tei-c.org/ns/Examples |
| isotei-lite.xsd | http://www.tei-c.org/ns/1.0 |
| isotei-odd.xsd | http://www.w3.org/1998/Math/MathML |
| isotei.xsd | http://www.w3.org/1998/Math/MathML |
| main.xsd | http://schemas.openxmlformats.org/wordprocessingml/2006/main |
| math.xsd | http://schemas.openxmlformats.org/officeDocument/2006/math |
| mathml.xsd | http://www.w3.org/1998/Math/MathML |
| ... | ... |
#----------------------------------------------------------------------------------------------#
Example 3 – function xwrap() - collect items into a document
Collect all union type definitions found in XSDs of the TEI framework into a document.
The flags Pnp ensure pretty printing and annotation of the type definitions with file name and XPath:
> fox "frameworks/tei//*.xsd\\xs:union\ancestor::xs:simpleType
=> xwrap('unionTypes', 'Pnp')"
>>>>
<unionTypes xmlns:fox="http://www.foxpath.org/ns" ... countItems="1047">
<xs:simpleType fox:fileName="tei_all.xsd"
fox:path="/xs:schema[1]/xs:attributeGroup[12]/xs:attribute[1]/xs:simpleType[1]">
<xs:union memberTypes="xs:double xs:decimal">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:pattern value="(\-?[\d]+/\-?[\d]+)"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
...
</unionTypes>
Part 2: The infospace experience – examples
The examples in this section have been tested using as working directory the root folder of an installation of the Oxygen XML editor. When copying the example code into the command line, please remove linefeeds, which have been added for the sake of readability.
See Appendix A for instructions on how to install Foxpath.
2.1 What is an infospace experience?
The term "infospace experience" has no precise meaning. It attempts to capture a sensation inspired by code which takes advantage of a pervasive tree structure encompassing the file system and file contents. It may be summarized as a dual seamlessness, as explained below.
Seamless navigation
Foxpath navigation has three important properties:
Uniform: A single navigation model is used on both sides of the resource boundaries – navigating between and within the resources. The model defines navigation as a sequence of steps, each one returning the filtered content of a navigation axis.
Cross boundary: Navigation on both sides of the resource boundaries can be combined into a single trajectory.
Ubiquitous: As a consequence, navigation can connect any point with any point, be it a folder, a file or an item of file content.
Navigation thus merges the "outside" and "inside" of files into a single, continuous whole. Such navigation may be experienced as seamless.
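The uniform and cross-boundary properties can be illustrated with a toy model in which one child axis serves folders, files and elements alike. In the Python sketch below, a folder's children are its folders and files, a file's only child is the root element of its parsed content (XML is assumed), and an element's children are its child elements. Placing the root element directly under the file is a simplification of ours; in XDM, a parsed file is a document node whose child is the root element.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Union

Point = Union[Path, ET.Element]


def children(point: Point) -> List[Point]:
    """Child axis for every kind of point in the toy infospace."""
    if isinstance(point, Path):
        if point.is_dir():
            return sorted(point.iterdir())        # folder -> folders and files
        return [ET.parse(point).getroot()]        # file -> root element (boundary crossed)
    return list(point)                            # element -> child elements


def name(point: Point) -> str:
    """Folder/file name or element tag ('{uri}local' for namespaced elements)."""
    return point.name if isinstance(point, Path) else point.tag


def navigate(start: Point, steps: List[str]) -> List[Point]:
    """Follow child steps by name; the same step logic serves both sides."""
    current = [start]
    for step in steps:
        current = [c for p in current for c in children(p) if name(c) == step]
    return current
```

A single call such as navigate(root, ["docs", "a.xml", "book", "title"]) then runs from a folder through a file into element content without any change of navigation model.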
Seamless combination of navigation and evaluation
Navigation is an operation which returns resource URIs and resource contents as XDM values. The XDM value model (11) is very abstract and flexible: a value is a sequence of items, an item has one of five item types, each item type is defined by a fixed set of item properties. Being an extended version of XPath, Foxpath is a pure expression language and any Foxpath expression consumes and produces only XDM values. Processing can thus be experienced as a seamless combination of navigation and evaluation.
Part 3: ixml in the infospace
3.1 Overview
One of the fundamental ideas of the infospace is the availability of file content trees. Ideally, this would be the case for any kind of file. But what does reality look like?
Depending on the underlying technology, some file formats are naturally represented as trees. Modern XPath and XQuery processors like BaseX (1) can process not only XML, but also JSON, CSV and HTML. Other formats with similar data models, like YAML, are not supported out of the box. Extracting structure from unsupported file formats can always be implemented by specialized parsers embedded as extension functions, but this is a rather heavy-handed approach and must be well justified.
Invisible XML (ixml for short) (7) was introduced as a lightweight, declarative alternative. This is an excellent fit if one wants to enlarge the set of formats that can be navigated: previously opaque text content is unlocked and reveals its inner structure. By sharing ixml grammars, this additional structure may immediately be used by a multitude of different systems and projects.
We understand Foxpath not only as a practical tool for day-to-day work, but also as a test bed for what a user experience of the infospace might feel like. We identified four possible extension points where ixml significantly enhances its expressive power:
- File formats are associated with ixml grammars, enabling the usual Foxpath navigation capabilities to extend into these formats.
- Inside already tree-structured formats, there may be text nodes containing inherently structured data. Applying ixml, the user can navigate into these structures.
- ixml grammars can be used as node tests, enabling complex quality control scenarios.
- Unified string expressions (USEs) are extended to allow matching against an ixml grammar. The flexibility of USEs is thus further enhanced, even suggesting a usefulness beyond the confines of the Foxpath language.
3.2 Infospace definition document
As explained in the introduction, the heart of the infospace idea is the continuation of file system tree structure into file content tree structure. The more file formats can be parsed into node trees, the better, and we added a new extension function which supports the parsing of any file format for which an ixml grammar is available. But one should also remember that different file formats require different parsing approaches: different built-in parsing functions (e.g. doc, json-doc, html-doc, csv-doc) and different grammars. This variability disturbs the sensation of seamless navigation, as it distracts from the navigation logic itself. (Which parse function must I call, or which grammar must I use, in order to enter this particular file?)
The ideal of seamlessness is restored by hiding the operation of parsing completely: the processor should know when and how to parse a file. The user should have the illusion that files are trees, rather than that they can be parsed into trees. As various examples in the preceding section showed, the problem of when to parse has already been solved: any navigation into content is automatically preceded by parsing. But hitherto, automated parsing was always XML parsing. This limitation must be overcome in order to maximize the sensation of seamlessness. The processor must know how to parse, that is, which function or grammar to use. The goal is achieved by a new feature, the infospace definition document.
Differences in how to parse a file are captured by the notion of a resource type. A resource type is defined by a particular parsing function (e.g. XML parse function or JSON parse function) or a particular ixml grammar. The infospace definition specifies an automated mapping of URIs to resource types. The document has three main parts:
- Element grammars – defines available grammars in terms of a file URI and a short name
- Element rtypes – defines resource types in terms of a parse function or a grammar
- Element rtypeUses – defines a mapping of resource URIs to a resource type and, possibly, options to be used when parsing
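How a processor might use such a mapping can be sketched in a few lines of Python. The model rests on assumptions and is not Foxpath's implementation: cases are tried in document order, a case matches if the file name matches one of its glob patterns (a case without patterns matches every file), the resource types of a matching case are tried in order, and a failed parse falls through to the next candidate unless the failing entry is marked final. Only XML and JSON are modelled, with Python's standard parsers standing in for the parse functions; HTML, CSV, DOCX and grammar-based resource types are left out.

```python
import fnmatch
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Any, Callable, Dict, List, Optional

# Stand-ins for the parse functions of two resource types.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "xml": ET.fromstring,
    "json": json.loads,
}


@dataclass
class RtypeUse:
    rtype: str
    final: bool = False


@dataclass
class Case:
    file_names: List[str] = field(default_factory=list)  # empty list: any file
    uses: List[RtypeUse] = field(default_factory=list)

    def matches(self, uri: str) -> bool:
        name = PurePosixPath(uri).name
        return not self.file_names or any(
            fnmatch.fnmatchcase(name, pattern) for pattern in self.file_names)


def parse_resource(uri: str, text: str, cases: List[Case]) -> Optional[Any]:
    """Parse text according to the first resource type that succeeds."""
    for case in cases:
        if not case.matches(uri):
            continue
        for use in case.uses:
            try:
                return PARSERS[use.rtype](text)
            except (ET.ParseError, ValueError):   # JSONDecodeError is a ValueError
                if use.final:
                    return None  # final: no further candidates are tried
    return None  # no resource type could parse the resource


# A cut-down mapping in the spirit of the standard definition.
CASES = [
    Case(["*.xml", "*.xsd", "*.xsl"], [RtypeUse("xml", final=True)]),
    Case(["*.json", "*.jsonld"], [RtypeUse("json", final=True)]),
    Case([], [RtypeUse("xml")]),  # last attempt: parse as XML
]
```

With these cases, a .json file that actually contains XML is rejected rather than re-tried as XML, because its JSON entry is final, while a file with an unknown extension falls through to the last-attempt case.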
The infospace definition helps to create the illusion that tree structure is ubiquitous and pre-existing, with no need to create or extend it. The following table shows a few examples enabled by the current standard infospace definition.
The following listing shows a snippet of the standard infospace definition:
<ispace>
<!-- ========
Grammars
======== -->
<grammars baseURI="../grammar">
<grammar name="css" uri="css.ixml" type="ixml"/>
<grammar name="isbn" uri="isbn.ixml" type="ixml"/>
<grammar name="iso8601" uri="iso8601.ixml" type="ixml"/>
<grammar name="words" uri="words.ixml" type="ixml"/>
</grammars>
<!-- ==============
Resource types
============== -->
<rtypes>
<rtype name="xml">
<docFn>doc#1</docFn>
<parseFn>parse-xml#1</parseFn>
</rtype>
<rtype name="json">
<docFn>json:doc#1</docFn>
<parseFn>json:parse#1</parseFn>
</rtype>
<rtype name="html">
<docFn>html:doc#1</docFn>
<parseFn>html:parse#1</parseFn>
</rtype>
<rtype name="csv">
<docFn>csv:doc#2</docFn>
<parseFn>csv:parse#2</parseFn>
</rtype>
<rtype name="docx">
<docFn>docx:doc#1</docFn>
</rtype>
<rtype name="css">
<grammar ref="css"/>
</rtype>
<rtype name="words.ixml">
<grammar ref="words"/>
</rtype>
</rtypes>
<!-- ==================
Resource type uses
================== -->
<rtypeUses>
<!-- .xml etc. -->
<case>
<condition>
<file name="*.dita *.ditamap *.docbook *.nvdl *.odd *.tei
*.xhtml *.xml *.xsd *.xsl *.xslt "/>
</condition>
<rtypeUse rtype="xml" final="yes"/>
</case>
<!-- *.html *.htm -->
<case>
<condition>
<file name="*.html *.htm"/>
</condition>
<rtypeUse rtype="xml"/> <!-- try to parse as XML -->
<rtypeUse rtype="html" final="yes"/> <!-- parse as HTML -->
</case>
<!-- *.json -->
<case>
<condition>
<file name="*.json *.jsonld *.jsonschema *.jschema"/>
</condition>
<rtypeUse rtype="json" final="yes"/>
</case>
<!-- *.docx -->
...
<!-- *.csv -->
...
<!-- *.css -->
...
<!-- *.words.txt -->
...
<!-- Last attempt - parse as XML -->
<case>
<rtypeUse rtype="xml"/>
</case>
</rtypeUses>
</ispace>
The user can provide a custom infospace definition (option -s infospace-path), which may take advantage of project-specific information. Alternatively, she can extend the definition (option -x infospace-path) by supplying an additional definition whose grammars, resource types and resource type uses are merged into the standard definition. For example, an extension may map .csv files contained by a folder with a particular name to a parsing of CSV which uses semicolons as separator and expects a header line. The following example also adds a mapping of .jso files to the JSON resource type:
<ispace>
<rtypeUses>
<case>
<condition>
<file name="*.jso"/>
</condition>
<rtypeUse rtype="json"/>
</case>
<case>
<condition>
<file parentName="countries"/>
</condition>
<rtypeUse rtype="csv">
<options>
<option name="header" value="yes"/>
<option name="separator" value=";"/>
</options>
</rtypeUse>
</case>
</rtypeUses>
</ispace>
3.3 New extension functions
Sometimes it is required to parse a file using a grammar which is not automatically associated with the URI. This is enabled by a new extension function, idoc(). Other ixml-related extension functions offer the parsing of a string, the validation of a string against a grammar and the expansion of text nodes into a subtree of items. In all cases, the grammar can be identified by its short name, as assigned in the infospace definition, or by URI.
Table V
Several ixml-related extension functions.
| Function | Description |
|---|---|
| idoc() | Returns the document resulting from parsing the resource at |
| iparse() | Parses |
| ivalid() | Tests if |
| iexpand-nodes() | Replaces the text content of target nodes with the parse result according to |
Transforming content trees by replacing node values with some dynamically computed value is already a feature of Foxpath (function replace-values($targetNodesExpr, $newValueExpr)). The new function iexpand-nodes() has a similar behaviour. It replaces text nodes selected by an expression with the result of parsing that text with an ixml grammar. This can be understood as revealing the full structure of the content that has previously been hidden in unanalyzed text nodes.
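The idea of expanding text nodes can be sketched with Python's ElementTree. Since the standard library has no ixml processor, the sketch substitutes a regular expression with named groups for the grammar, which is only adequate for regular formats such as simple ISO 8601 calendar dates; each named group becomes a child element replacing the original text. Leaving non-matching text unchanged is an assumption of the sketch, and the element names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Regular-expression stand-in for an ixml grammar of ISO 8601 calendar dates.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def expand_text(elem: ET.Element, pattern: "re.Pattern[str]") -> None:
    """Replace elem's text with one child element per named group of pattern."""
    match = pattern.fullmatch((elem.text or "").strip())
    if match is None:
        return  # unparsable text stays as it is
    elem.text = None
    for name, value in match.groupdict().items():
        ET.SubElement(elem, name).text = value


def expand_nodes(root: ET.Element, path: str, pattern: "re.Pattern[str]") -> ET.Element:
    """Expand the text of all elements selected by an ElementTree path."""
    for elem in list(root.iterfind(path)):  # materialize before modifying the tree
        expand_text(elem, pattern)
    return root


doc = ET.fromstring("<doc><date>1667-05-12</date><date>unknown</date></doc>")
expand_nodes(doc, ".//date", DATE)
print(ET.tostring(doc, encoding="unicode"))
```

Once the year is an element of its own, an aggregation such as the min() over years in the examples below becomes ordinary tree navigation.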
3.4 Examples
The examples in this section have been tested using as working directory the root folder of an installation of the Oxygen XML editor. When copying the example code into the command line, please remove linefeeds, which have been added for the sake of readability. See Appendix A for instructions on how to install Foxpath.
Example: Navigate CSS contents
Report the values of CSS property border-color.
> fox "frameworks/tei//*.css\\property[name eq 'border-color']\value => freq()"
>>>>
#000000 .. (3)
#3C3C3C .. (4)
#c5d8bb .. (5)
#ffe1ad .. (5)
#ffe7e8 .. (5)
Black .... (1)
blue ..... (2)
darkred .. (1)
green .... (1)
grey ..... (1)
LightBlue (1)
lightgrey (1)
red ...... (7)
white .... (2)
yellow ... (1)
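What freq() contributes can be modelled in a few lines of Python: count the distinct values and list them with their frequencies. The case-insensitive sort order and the dot padding below are inferred from the output shown above; they are assumptions about the presentation, not a description of Foxpath's implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def freq(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Distinct values with their frequencies, sorted case-insensitively."""
    return sorted(Counter(values).items(), key=lambda kv: (kv[0].lower(), kv[0]))


def render(pairs: List[Tuple[str, int]]) -> List[str]:
    """Pad each value with dots so that the counts line up."""
    width = max((len(value) for value, _ in pairs), default=0)
    lines = []
    for value, count in pairs:
        dots = "." * (width - len(value))
        lines.append(f"{value} {dots} ({count})" if dots else f"{value} ({count})")
    return lines


print("\n".join(render(freq(["red", "Black", "red", "blue", "red"]))))
```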
Example: Use ivalid() - report invalid ISBNs
Report invalid ISBNs grouped by file name.
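The check performed by ivalid('#isbn') is grammar-based. As a rough stand-in, the Python sketch below uses a regular expression that accepts 10- or 13-digit ISBNs with optional single hyphens between digits (an ISBN-10 may end in X) and groups the rejected values by file name, as hlist() does in the output that follows. The pattern is our assumption and checks only the surface syntax, not the check digit; what the isbn grammar of the standard infospace definition accepts may differ.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# 9 digits (each optionally followed by a hyphen) plus a digit or X, or
# 12 digits (each optionally followed by a hyphen) plus a final digit.
ISBN = re.compile(r"(?:\d-?){9}[\dX]|(?:\d-?){12}\d")


def is_valid_isbn(text: str) -> bool:
    return ISBN.fullmatch(text) is not None


def invalid_by_file(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the invalid ISBNs of (file name, ISBN) pairs by file name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for file_name, isbn in pairs:
        if not is_valid_isbn(isbn):
            groups[file_name].append(isbn)
    return dict(groups)
```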
> fox "frameworks/jats//*.xml
\\*:isbn[not(ivalid('#isbn'))]
\tuple(base-fname(.), .)
=> hlist('File name, Invalid ISBN')"
>>>>
================================
File name
. Invalid ISBN
================================
bitso-book-of-parts-oasis.xml
. 1-234 567890-123
. xxx-1
bitso-book-part1-oasis.xml
. 1-234 567890-123
. xxx-1
bitso-samplesmall-book-oasis.xml
. 1-234 567890-123
. xxx-1
Example: Use iparse() - extract data from text nodes
Find the oldest document.
> fox "frameworks/tei//*.xml\\dc:date\iparse('#iso8601')\\year => min()"
>>>>
1667
Part 4: Future work
Apart from continuously adding further extension functions, we plan to address the following topics:
- Integration of external tools, using ixml to import results
- Access to the structure of binary files
- Plugin system for user-specific extension functions
- Visualization of expression values implemented as HTML pages
- Control over the handling of ambiguous parse results
Part 5: Final thoughts
Foxpath can be understood as a proof of concept of the infospace experience. We regard it as successful.
However, it should be remembered that Foxpath is a language with a strong focus on interactive use and the power of succinct expressions. After all, it is only an extended version of XPath. This limitation of the Foxpath language must not be mistaken for a feature of the infospace. Its true potential can only be realized by a full-fledged programming language. As XPath is a subset of XQuery and Foxpath is a backward-compatible extension of XPath, XQuery is an obvious candidate for a programming language which implements the infospace abstraction.
Appendix A. Installation of Foxpath
The Foxpath processor (3) is written in XQuery (10) and requires the installation of BaseX (1), version 11 or higher. In order to install Foxpath, proceed as follows:
- Download and install BaseX from here: basex.org/download
- Clone the Foxpath project from here: https://github.com/hrennau/foxpath
- Add the bin folder of the BaseX installation to the system path (PATH environment variable).
- Add the bin folder of the Foxpath installation to the system path.
- Done – now you can use Foxpath on the command line by calling the fox.bat or fox.sh shell script, depending on the operating system.
Note that the Foxpath expression passed to the script should be enclosed in single or double quotes: fox "expression".
In order to execute a Foxpath expression stored in a file, use option -f: fox -f prog-path.
References
[1] BaseX – an open source XML database. Homepage. http://basex.org
[2] Rennau, Hans-Jürgen. FOXpath – an expression language for selecting files and folders. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Rennau01. https://www.balisage.net/Proceedings/vol17/html/Rennau01/BalisageVol17-Rennau01.html
[3] Rennau, Hans-Jürgen. Foxpath. Github repository. https://github.com/hrennau/foxpath
[4] Rennau, Hans-Jürgen. FOXpath navigation of physical, virtual and literal file systems. Presented at XML Prague, February 9 - 11, 2017. In XML Prague 2017 Conference Proceedings. https://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf
[5] glob (programming). Wikipedia article. https://en.wikipedia.org/wiki/Glob_%28programming%29
[6] Rennau, Hans-Jürgen. The XML info space. Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:https://doi.org/10.4242/BalisageVol10.Rennau01
[7] Invisible XML. Homepage. https://invisiblexml.org
[8] Robie, Jonathan, et al., eds. XML Path Language (XPath), W3C Recommendation 08 April 2014. https://www.w3.org/TR/2014/REC-xpath-30-20140408/
[9] Kay, Michael, ed. XPath and XQuery Functions and Operators 3.0. W3C Recommendation 08 April 2014. http://www.w3.org/TR/xpath-functions-30/
[10] Robie, Jonathan, Michael Dyck, eds. XQuery 3.1: An XML Query Language. W3C Candidate Recommendation 18 December 2014. http://www.w3.org/TR/xquery-31/
[11] Walsh, Norman, et al., eds. XQuery and XPath Data Model 3.1. W3C Recommendation 21 March 2017. http://www.w3.org/TR/xpath-datamodel-31/