How to cite this paper
Rennau, Hans-Jürgen, and Hauke Brandes. “The infospace, Foxpath, ixml.” Presented at Balisage: The Markup Conference 2025, Washington, DC, August 4 - 8, 2025. In Proceedings of Balisage: The Markup Conference 2025. Balisage Series on Markup Technologies, vol. 30 (2025). https://doi.org/10.4242/BalisageVol30.Rennau01.
Balisage: The Markup Conference 2025
August 4 - 8, 2025
Balisage Paper: The infospace, Foxpath, ixml
Hans-Jürgen Rennau
Hauke Brandes
Copyright 2025 by the authors. Used with permission.
Abstract
Foxpath, short for folder XPath, is an expression language that enables XPath-like
addressing of the files and folders in a file system. Both file systems and REST resources
addressable through URIs can be thought of as a tree of folders, and thus navigated
by path
expressions in Foxpath. Foxpath is a superset of XPath 3.0 with node tree navigation
retained but file system navigation added and a free combination of both functionalities
allowed within a single path expression. Foxpath is a language with a strong focus
on
interactive use and the power of succinct expressions. Invisible XML allows us to
extend
Foxpath navigation to more resources. A new configuration mechanism associates grammars
with
file name patterns, enhancing the experience of a pervasive tree structure which we
call an
infospace.
Table of Contents
- Introduction
- Part 1: A short introduction to the Foxpath language
  - 1.1 Summary
  - 1.2 The Foxpath expression
  - 1.3 Extension functions
- Part 2: The infospace experience – examples
  - 2.1 What is an infospace experience?
  - 2.2 Seamless navigation
  - 2.3 Seamless combination of navigation and evaluation
    - 2.3.1 Frequency distribution
    - 2.3.2 Path content
    - 2.3.3 Hierarchical list
    - 2.3.4 Filtering capabilities
    - 2.3.5 XML and not
    - 2.3.6 Bulk validation
    - 2.3.7 Annotated file system trees
    - 2.3.8 Modified file system copies
- Part 3: ixml in the infospace
  - 3.1 Overview
  - 3.2 Infospace definition document
  - 3.3 New extension functions
  - 3.4 Examples
- Part 4: Future work
- Part 5: Final thoughts
- Appendix A. Installation of Foxpath
Introduction
We are used to thinking about the file system as a tree of resources: folders and files.
A folder is a named set of folders and files. A file is named content which in many
cases (e.g. XML or JSON) can be represented by a tree of items. To this picture we add
the RESTnet – the sum total of resources accessible via URI. As REST URIs usually
contain a path, the logical structure of the RESTnet is a set of named file systems.
Using XQuery extension functions, not only XML but also other media types can be parsed
into item trees – JSON, CSV, HTML. Leveraging ixml, many more file formats could be made
available as item trees. With this in mind, we sketch the idealized picture of a pervasive
tree structure:
We call this tree structure a space of information – the infospace
(see (6) for a precursor of this idea). In three-dimensional space,
every point is identified by three coordinates, and two points can be connected by
a vector,
composed of three coordinates. The points
of the infospace are tree nodes,
representing a folder, a file or a file content item. The coordinates of the infospace
are
names – folder, file and item names. Vectors of the infospace are name paths. Nodes
which are
resources are identified by a path of folder and file names connecting them to the
root of the
space. Nodes which are file content items are identified by a path composed of two
parts, one
pointing to the resource, the other within the resource to the item. A simple path
syntax lets
us address any point and connect any pair of points. Adding XPath capabilities to the syntax
turns it into a path language – a tool of navigation, selection
and aggregation.
This perspective is based on the determination to regard resources as nodes within a tree
structure, rather than as entities identified only by a URI. It makes tree thinking
more pervasive, as it is no longer restricted to file contents: it also
applies to file system contents.
Of course we are used to thinking about resources as arranged in trees. It is our
daily
practice to navigate our file systems in a tree-minded way – opening and closing folders,
going upwards and downwards, searching folder contents, etc. But technological reality
is
different. We have (e.g. when using XML technology) powerful means for navigating
the tree
structure of file content, but hardly anything for navigating the file system tree.
As a
consequence, the infospace idea may be perceived as an empty abstraction, without
the promise
of genuine usefulness. An infospace without a unified path language is
not an infospace, as far as user experience is concerned.
This has significant consequences for the prospects of ixml. Its general approach of
turning any text format into node trees is a fundamental idea. But if the node trees are only
used for small and very specific tasks, the promise of practical use is severely limited.
One might ask for another fundamental idea lifting the concept onto higher ground.
We notice the role which ixml may play in the unfolding of an infospace – making tree
structure increasingly pervasive. The question arises whether the infospace
must remain an empty abstraction, or can become an experience,
awakening a desire for further unfolding its technological reality.
This work reports our endeavour to crack the eggshell of the infospace abstraction.
The
Foxpath language (2, 3, 4) is a set of expressions turning the abstraction into an
experience. We summarize the language briefly and present a few example uses intended to
demonstrate the infospace at work. The examples are followed by an account of our effort to
integrate ixml into Foxpath. Our vision is a unified space of information, created
by a
single, pervasive tree structure.
Part 1: A short introduction to the Foxpath language
1.1 Summary
Foxpath (2, 3, 4) is an extended version of XPath 3.0 (8). The main extensions are:
A syntactical change has been applied: the slash of XPath has been
replaced with a backslash. In Foxpath, the slash is used as a separator
between steps navigating the file system. In other words, a slash is followed by a
step of
file system navigation, and a backslash is followed by a step of
file content navigation.
Taking the syntactical change into account, Foxpath is fully backward-compatible with
XPath, that is, every valid XPath 3.0 expression is a valid Foxpath expression and
yields
the same result. The only exceptions to this rule are cases in which XPath raises an error
and Foxpath returns a value. The most important example concerns the left-hand side operand
of the path operator (backslash). In XPath, an atomic item triggers an error, whereas in
Foxpath it is interpreted as a document URI and parsed into a node tree. This change
enables
the mixing of file system navigation and file content navigation within a single navigation
path. A second example is the effective boolean value of an operand
consisting of several strings or URIs. In Foxpath, the value
true is returned, in XPath an error is raised. The change enables elegant
filtering of folders by their content. A third example is the except operator,
which in Foxpath can also be used for filtering URIs, not only nodes. This supports
intuitive selections involving the explicit exclusion of certain resources.
1.2 The Foxpath expression
The Foxpath expression is very similar to the XPath expression, but navigates the
file
system tree, rather than file content. Navigation is based on relationships between
URIs,
not between XDM nodes (11). Note that Foxpath does
not map folders and files to XDM nodes.
A Foxpath expression is a sequence of navigation steps, each one consisting of a
navigation axis, a name test and optional predicates. The name test has glob expression
syntax (5), supporting as wildcard characters * and
?. Steps are separated by slashes. The default axis is the child axis, and
axis names have a tilde (~) appended, in order to be distinguished from XPath
axes (example step: descendant~::*.css). Shortcut syntax as used by XPath is
available (//, .., .).
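The step model described above can be illustrated outside Foxpath. The following Python sketch is not Foxpath's implementation; it only mimics three axes with glob name tests. The function names `child_step`, `descendant_step` and `parent_step` are hypothetical, and name matching is assumed to be case-sensitive.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def child_step(contexts, name_glob):
    """Child axis: children of each context folder whose name matches the glob."""
    result = []
    for ctx in contexts:
        if ctx.is_dir():
            for child in sorted(ctx.iterdir()):
                if fnmatchcase(child.name, name_glob):
                    result.append(child)
    return result


def descendant_step(contexts, name_glob):
    """Descendant axis: files and folders at any depth below each context folder."""
    result = []
    for ctx in contexts:
        if ctx.is_dir():
            for desc in sorted(ctx.rglob("*")):
                if fnmatchcase(desc.name, name_glob):
                    result.append(desc)
    return result


def parent_step(contexts):
    """Parent axis: parent folders, deduplicated while preserving order."""
    seen = []
    for ctx in contexts:
        if ctx.parent not in seen:
            seen.append(ctx.parent)
    return seen
```

Under these assumptions, `parent_step(descendant_step(child_step([Path("frameworks")], "tei"), "*.xsl"))` corresponds roughly to the Foxpath expression `frameworks/tei//*.xsl/..`, which shows how much ceremony the path syntax removes.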
First examples demonstrate how file system navigation is expressed as a sequence of
navigation steps along navigation axes, which default to the child axis.
Table I
Examples 1 – steps along axes
| Expression | Description |
| child~::frameworks/child~::* | List frameworks |
| frameworks/* | As before, relying on the default axis |
| frameworks/tei/descendant~::*.xsl/parent~::* | Follow various navigation axes |
| frameworks/tei//*.xsl/.. | As before, using shorthand syntax |
Each navigation step may include one or several predicates, which are expressions
evaluated to an effective boolean value (8). As in XPath, path steps are not necessarily navigation steps. In particular the
last
step is often used for representing or evaluating the result of the preceding steps.
Table II
Examples 2 – using predicates
| Expression | Description |
| frameworks//*[is-dir()][not(*)] | List empty folders |
| frameworks/*[.//*.xsl] | Filter folders by content |
| frameworks/*[.//*.xsl]/(.||' /'||count(.//*.xsl)) | As before, attaching additional information |
| frameworks//*.xsd[empty(ancestor~::*xsd*)] | List all XSDs not contained by a folder with a name matching *xsd* |
File system navigation and file content navigation can be mixed within a single
expression. Typically, navigation starts with steps traversing the file system and
is
continued by steps drilling
into file contents. Another typical pattern is file system
navigation with predicates checking file contents.
Table III
Examples 3 – mixing file system and file content navigation
| Expression | Description |
| frameworks/tei//filterNodes.xsl\*\xsl:param\@name | Retrieve an attribute value |
| frameworks/tei//*.xsl[\*\@version = '3.0'] | Filter files dependent on an attribute value |
| frameworks/tei//*.xsl[\*\@version = '3.0']/.. | As before, and continue navigation |
| frameworks/jats//*.xml\\@href\resolve-link()\*\clark-name() | Follow links and enter target content |
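For comparison, here is a rough Python equivalent of the second expression in Table III, which filters files by an attribute of the root element. It is only a sketch: the function name `files_with_root_attribute` is hypothetical, and files that are not well-formed XML are assumed to count as non-matching.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def files_with_root_attribute(folder, name_glob, attr, value):
    """Return files below `folder` matching `name_glob` whose root element
    carries the attribute `attr` with the given `value`."""
    hits = []
    for path in sorted(Path(folder).rglob(name_glob)):
        if not path.is_file():
            continue
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # assumption: unparseable files simply do not match
        if root.get(attr) == value:
            hits.append(path)
    return hits
```

A call such as `files_with_root_attribute("frameworks/tei", "*.xsl", "version", "3.0")` selects the same kind of files as `frameworks/tei//*.xsl[\*\@version = '3.0']`. The file system step, the parsing step and the content test are separate concerns in the Python version, whereas the Foxpath expression folds them into one path.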
1.3 Extension functions
The term extension functions
means Foxpath functions not included in the set of XPath
standard functions (9). Many extension
functions are available, simplifying the evaluation of navigation results, in particular
data analysis and data aggregation.
Example 1 – function annotate() - annotate
URIs (or other items) with additional information
List the frameworks containing XSLT stylesheets, annotated with the number of
stylesheets contained.
> fox "frameworks/*[.//*.xsl]/annotate(count(.//*.xsl))"
>>>>
/programme/Oxygen XML Editor 25/frameworks/dita (716)
/programme/Oxygen XML Editor 25/frameworks/docbook (462)
/programme/Oxygen XML Editor 25/frameworks/extensions (4)
...
Example 2 – function table() - write value
tuples into a table
Write a table displaying for each XSD of the TEI framework the file name and the target
namespace.
> fox "frameworks/tei//*.xsd/tuple(file-name(), \*\@targetNamespace)
=> table('File name, TNS', 'sort')"
>>>>
#----------------------------------------------------------------------------------------------#
| File name | TNS |
#----------------------------------------------------------------------------------------------#
| a.xsd | http://relaxng.org/ns/compatibility/annotations/1.0 |
| examples.xsd | http://www.tei-c.org/ns/Examples |
| isotei-lite.xsd | http://www.tei-c.org/ns/1.0 |
| isotei-odd.xsd | http://www.w3.org/1998/Math/MathML |
| isotei.xsd | http://www.w3.org/1998/Math/MathML |
| main.xsd | http://schemas.openxmlformats.org/wordprocessingml/2006/main |
| math.xsd | http://schemas.openxmlformats.org/officeDocument/2006/math |
| mathml.xsd | http://www.w3.org/1998/Math/MathML |
| ... | ... |
#----------------------------------------------------------------------------------------------#
Example 3 – function xwrap() - collect
items into a document
Collect all union type definitions found in XSDs of the TEI framework into a document.
The flags Pnp ensure pretty printing and annotation of the type definitions
with file name and XPath:
> fox "frameworks/tei//*.xsd\\xs:union\ancestor::xs:simpleType
=> xwrap('unionTypes', 'Pnp')"
>>>>
<unionTypes xmlns:fox="http://www.foxpath.org/ns" ... countItems="1047">
<xs:simpleType fox:fileName="tei_all.xsd"
fox:path="/xs:schema[1]/xs:attributeGroup[12]/xs:attribute[1]/xs:simpleType[1]">
<xs:union memberTypes="xs:double xs:decimal">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:pattern value="(\-?[\d]+/\-?[\d]+)"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
...
</unionTypes>
Part 2: The infospace experience – examples
The examples in this section have been tested using as working directory the root
folder
of an installation of the Oxygen XML editor. When copying the example code into the
command
line, please remove linefeeds, which have been added for the sake of readability.
See Appendix A for instructions on how to install
Foxpath.
2.1 What is an infospace experience?
The term infospace experience
has no precise meaning. It attempts to capture a
sensation inspired by code which takes advantage of a pervasive tree structure encompassing
the file system and file contents. It may be summarized as a dual
seamlessness, as explained below.
Seamless navigation
Foxpath navigation has three important properties:
Uniform: A single navigation model is used on both sides of the
resource boundaries – navigating between and within the resources. The model defines
navigation as a sequence of steps, each one returning the filtered content of a navigation
axis.
Cross boundary: Navigation on both sides of the resource boundaries
can be combined into a single trajectory.
Ubiquitous: As a consequence, navigation can connect any point with
any point, be it a folder, a file or an item of file content.
Navigation thus merges the outside
and inside
of files into a single, continuous
whole. Such navigation may be experienced as seamless.
Seamless combination of navigation and evaluation
Navigation is an operation which returns resource URIs and resource contents as
XDM values. The XDM value model (11) is
very abstract and flexible: a value is a sequence of items, an item has one of five
item
types, each item type is defined by a fixed set of item properties. Being an extended
version of XPath, Foxpath is a pure expression language and any Foxpath expression
consumes
and produces only XDM values. Processing can thus be experienced as a seamless combination
of navigation and evaluation.
2.2 Seamless navigation
In section “1.2 The Foxpath expression” several examples have been given which
demonstrate unified and seamless navigation. A further example involves navigation
into file
contents and from there back into the file system: locate XSLT stylesheets and collect all
xsl:import elements which reference non-existing files:
> fox "frameworks/tei//*.xsl\*\xsl:import\@href[not(resolve-link())]\.. => xwrap('hrefs', 'Pb')"
>>>>
<href ... countItems="9">
<xsl:import xml:base=".../stylesheet/docx/from/dynamic/tests/xspec/test-toc-scenario.xsl"
href="file:/usr/local/bin/xspec-v0.1/generate-tests-utils.xsl"/>
<xsl:import xml:base=".../stylesheet/profiles/agora/docx/to.xsl"
href="../../../docx/to/docxtotei.xsl"/>
<xsl:import xml:base=".../stylesheet/profiles/agora/html/from.xsl"
href="../../../tools/html2tei.xsl"/>
...
</href>
Another example returns all TEI stylesheets in which a template with a
@mode attribute contains an apply-templates element without a
@mode attribute, which may or may not be intended:
> fox "frameworks/tei//*.xsl[\*\xsl:template[@mode]\\xsl:apply-templates[not(@mode)]]"
2.3 Seamless combination of navigation and evaluation
The Foxpath language has a focus on interactive use and the power of concise
expressions. Many extension functions have been added in order to enable meaningful
evaluation of navigation results with few keystrokes. In this section we give a few
examples.
2.3.1 Frequency distribution
The extension function freq() maps a sequence of values to a frequency
distribution.
Which file name extensions are used in the TEI framework?
> fox "frameworks/tei//*[is-file()]/file-ext() => freq()"
>>>>
...
.sch ....... (4)
.tei ....... (21)
.ttf ....... (20)
.txt ....... (28)
.TXT ....... (1)
.xcu ....... (1)
.xml ....... (92)
.xsd ....... (57)
.xsl ....... (395)
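What freq() does to a sequence of values can be sketched in a few lines of Python. This is an illustration only: the function below is a stand-in, not the Foxpath implementation, and both the sort order (case-insensitive by value) and the exact dot padding are assumptions that only approximate the output shown above.

```python
from collections import Counter


def freq(values):
    """Return one formatted line per distinct value, with its number of occurrences."""
    counts = Counter(values)
    keys = sorted(counts, key=lambda v: (v.lower(), v))
    width = max((len(k) for k in keys), default=0)
    lines = []
    for k in keys:
        dots = "." * (width - len(k) + 3)  # pad so that counts line up
        lines.append(f"{k} {dots} ({counts[k]})")
    return lines


for line in freq([".xsl", ".xml", ".xsl", ".xsd"]):
    print(line)
```

The point of the Foxpath version is that this aggregation is available as a single trailing step of a navigation expression.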
Which root element names do the .xml files in the TEI framework have?
> fox "/programme/*oxy*/frameworks/tei//*.xml\*\clark-name() => freq()"
>>>>
...
plugin ......................................................................... (5)
project ........................................................................ (42)
Q{http://www.jenitennison.com/xslt/xspec}description ........................... (1)
Q{http://www.jenitennison.com/xslt/xspec}report ................................ (1)
Q{http://www.oxygenxml.com/ns/ccfilter/annotations}contentCompletionElementsMap (1)
Q{http://www.tei-c.org/ns/1.0}elementList ...................................... (1)
Q{http://www.tei-c.org/ns/1.0}fLib ............................................. (1)
Q{http://www.tei-c.org/ns/1.0}TEI .............................................. (21)
Q{http://www.tei-c.org/ns/1.0}teiCorpus ........................................ (1)
Q{http://www.tei-c.org/ns/1.0}text ............................................. (1)
Q{urn:oasis:names:tc:entity:xmlns:xml:catalog}catalog .......................... (5)
structure-autocorrect .......................................................... (1)
styles ......................................................................... (1)
...
2.3.2 Path content
The extension function path-content() returns the path content
of a
node, that is, the paths leading from the node to its descendant nodes and their
attributes.
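The notion of path content can be made concrete with a small Python sketch. It is not the Foxpath implementation: the helper names are hypothetical, namespaces are simply stripped, and each path is assumed to be counted once per occurrence.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(name):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def path_content(node, prefix=""):
    """Count the name paths leading from `node` to its descendants and their attributes."""
    counts = Counter()
    for child in node:
        path = prefix + local(child.tag)
        counts[path] += 1
        for attr in child.attrib:
            counts[f"{path}/@{local(attr)}"] += 1
        counts.update(path_content(child, path + "/"))
    return counts


stmt = ET.fromstring(
    '<titleStmt><title type="main">A</title><title>B</title>'
    "<author><name>N</name></author></titleStmt>"
)
for path, n in sorted(path_content(stmt).items()):
    print(path, n)
```

Summing such counters over many nodes gives an aggregated overview of what a set of elements actually contains.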
What path content do title statements in ODD documents of the TEI framework have? In
the
result, we notice that title elements rarely have a @type
attribute. Intrigued, we proceed to look at the attribute values and the containing
file
names.
> fox "frameworks/tei//*.odd\\tei:titleStmt => path-content-ec()"
>>>>
=== path-content ===============================
author ......... (23)
author/@id ..... (2)
author/email ... (2)
author/name .... (2)
editor ......... (3)
editor/@id ..... (3)
sponsor/orgName (1)
title .......... (28)
title/@type .... (2)
================================================
> fox "frameworks/tei//*.odd\\tei:title\@type\annotate(bfname())"
>>>>
main (tei_tite.odd)
sub (tei_tite.odd)
2.3.3 Hierarchical list
The extension function hlist() groups a sequence of tuples in a
hierarchical way – first by the first item, then by the second item, and so
forth.
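The grouping can be pictured as building a nested dictionary from the tuples. The following Python sketch illustrates the idea; the function names `hlist` and `render` are stand-ins chosen for this illustration, and the rendering only approximates the Foxpath output format.

```python
def hlist(tuples):
    """Group tuples hierarchically: by the first item, then the second, and so on."""
    tree = {}
    for tup in tuples:
        level = tree
        for item in tup:
            level = level.setdefault(item, {})
    return tree


def render(tree, depth=0):
    """Render the nested grouping, prefixing one '. ' per nesting level."""
    lines = []
    for key in sorted(tree):
        lines.append(". " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    return lines


rows = [
    (".framework", "serialized", "teip5odd.framework"),
    (".framework", "serialized", "teip5.framework"),
    (".isosch", "schema", "p5odds.isosch"),
]
print("\n".join(render(hlist(rows))))
```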
For which XML document types do file name extensions stand? A hierarchical list can
tell us for each file name extension which XML root elements can be encountered, and
for
each root name the actual file names.
> fox "frameworks/tei//(* except (*.html, *.rng, *.xsl, *.xsd))[is-file()][is-xml()]
/tuple(file-ext(), \*\clark-name(), file-name())
=> hlist('File extension, Root name, File name')"
>>>>
==================================================================================
File extension
. Root name
. . File name
==================================================================================
.framework
. serialized
. . teip5.framework
. . teip5jtei.framework
. . teip5odd.framework
.isosch
. Q{http://purl.oclc.org/dsdl/schematron}schema
. . p5odds.isosch
.mod
. Q{http://relaxng.org/ns/structure/1.0}grammar
. . mathml2-qname-1.mod
.model
. Q{http://www.w3.org/1999/XSL/Transform}strip-space
. . stripspace.xsl.model
...
2.3.4 Filtering capabilities
Navigation and the selection of information are closely related to filtering. Foxpath
introduces the concept of a unified string expression. It is a string
which may be interpreted as a set of inclusive and/or exclusive patterns (glob or
regex),
or as a fulltext search. Exclusive patterns are marked by a preceding ~ character. By
default, the expression is interpreted as a set of glob expressions. The options r and ft
cause the expression to be interpreted as a set of regular expressions or as
a fulltext expression, respectively. Options follow the expression, separated from it
by the # character. Literal # characters are escaped by doubling.
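The rules above can be sketched in Python. This is a reading of the description, not the Foxpath implementation: the function names are hypothetical, patterns are assumed to be separated by whitespace and matched case-sensitively, a value is assumed to match if it matches no exclusive pattern and at least one inclusive pattern (or if there are no inclusive patterns), and the ft option is left out.

```python
import re
from fnmatch import fnmatchcase


def parse_use(expr):
    """Split a unified string expression into (patterns, options)."""
    marker = "\0"  # temporary placeholder for an escaped '#'
    head, _, opts = expr.replace("##", marker).partition("#")
    patterns = [p.replace(marker, "#") for p in head.split()]
    return patterns, opts.split()


def matches_pattern(value, expr):
    """Test `value` against the inclusive and exclusive patterns in `expr`."""
    patterns, options = parse_use(expr)
    if "ft" in options:
        raise NotImplementedError("fulltext expressions are outside this sketch")
    use_regex = "r" in options

    def test(pattern):
        if use_regex:
            return re.search(pattern, value) is not None
        return fnmatchcase(value, pattern)

    include = [p for p in patterns if not p.startswith("~")]
    exclude = [p[1:] for p in patterns if p.startswith("~")]
    if any(test(p) for p in exclude):
        return False
    return not include or any(test(p) for p in include)
```

With these assumptions, `matches_pattern("#anchor", "~http* ~##* ~*:*")` is false because the doubled `##` denotes a literal `#` at the start of an exclusive glob, while `matches_pattern("div12", r"^div\d+#r")` is true because the trailing `#r` switches to regular expression matching.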
Get the frequency distribution of @source attribute values, excluding
values starting with http
or the # character, or containing a colon.
> fox "frameworks/tei//*[\tei:*]\\@source[matches-pattern('~http* ~##* ~*:*')] => freq()"
>>>>
mycompiledODD.xml .... (1)
quoteref7 ............ (1)
tei_bare.compiled.odd (1)
All Foxpath extension functions interpret arguments with the semantics of a string
filter as unified string expressions. For example, function
descendant($nameFilter) returns all descendant nodes of the context node,
with a name matching the unified string expression $nameFilter. Here we
inspect the @type attribute of TEI elements with a name consisting of div
followed by digits:
> fox "frameworks/tei//*.xml/descendant('^div\d+#r')\@type => freq()"
>>>>
advert .... (2)
alinéa .... (2)
altRecipe . (2)
appendix .. (1)
article ... (1)
backmat ... (6)
book ...... (3)
chapitre .. (3)
chapter ... (5)
epistle ... (1)
livre ..... (2)
member .... (14)
part ...... (6)
recipe .... (7)
section ... (18)
subsection (13)
The option ft marks the expression as a fulltext expression, for which a compact
syntax has been defined. Find TEI ODD files containing all of these phrases in any
order: “TEI corpus”, “TEI customization”, “TEI conformance”, “TEI modules”.
> fox "frameworks/tei//*.odd
[\matches-pattern('tei corpus / tei customization / tei conformance / tei modules # ft')]"
>>>>
.../frameworks/tei/xml/tei/custom/odd/tei_corpus.odd
2.3.5 XML and not
Not only XML, but also some non-XML formats can be parsed into node trees. This
expression returns the property names defined in a JSON schema:
> fox "frameworks/json//openAPIScenario.jschema/jdoc()\\properties\*\name()
=> sort() => distinct-values()"
>>>>
apiKey
authorization
body
bodyContent
bodyType
description
expectedResponse
httpBasic
httpBearer
...
2.3.6 Bulk validation
The infospace encourages us to think of large and distributed sets of information items
as simple building blocks from which to compose expressions. You may validate sets of
documents against sets of XSDs in a bulk fashion. The function xsd-validate-ec() returns
an aggregated report of validation results:
> fox -o validation-report.xml "frameworks/dita/(.//*.xml => xsd-validate-ec(.//*.xsd))"
2.3.7 Annotated file system trees
Function ftree-view() creates a tree representation of selected
resources, optionally annotated with expression values.
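The core of such a tree view, turning a flat list of paths into nested folder and file elements, can be sketched in Python. The element names fo and fi follow the output shown in this section; everything else is an assumption of this sketch, which omits the context and count attributes as well as annotations and uses a hypothetical function name.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def ftree_view(paths):
    """Build an <ftree> of <fo> (folder) and <fi> (file) elements from file paths."""
    root = ET.Element("ftree")
    for path in sorted(paths):
        parts = PurePosixPath(path).parts
        node = root
        for folder in parts[:-1]:
            # Reuse the folder element if an earlier path already created it.
            found = next(
                (fo for fo in node.findall("fo") if fo.get("name") == folder), None
            )
            if found is None:
                found = ET.SubElement(node, "fo", name=folder)
            node = found
        ET.SubElement(node, "fi", name=parts[-1])
    return root


tree = ftree_view([
    "profiles/acm/latex/to.xsl",
    "profiles/acm/pdf/to.xsl",
    "profiles/adams/docx/from.xsl",
])
print(ET.tostring(tree, encoding="unicode"))
```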
Get a tree representation of the from.xsl and to.xsl
stylesheets found in the TEI framework.
> fox "frameworks/tei//(from.xsl, to.xsl) => ftree-view()"
>>>>
<ftree context=".../frameworks/tei/xml/tei/stylesheet" countFo="134" countFi="128">
<fo name="profiles">
<fo name="acm">
<fo name="latex">
<fi name="to.xsl"/>
</fo>
<fo name="pdf">
<fi name="to.xsl"/>
</fo>
</fo>
<fo name="adams">
<fo name="docx">
<fi name="from.xsl"/>
</fo>
</fo>
<fo name="agora">
<fo name="docx">
<fi name="from.xsl"/>
<fi name="to.xsl"/>
</fo>
<fo name="html">
<fi name="from.xsl"/>
<fi name="to.xsl"/>
</fo>
...
</fo>
...
</fo>
</ftree>
Now we proceed to add annotations to the tree: optional param elements
representing the names of stylesheet parameters. Note the curly braces enclosing
expressions passed to a function as arguments.
> fox "frameworks/tei//(from.xsl, to.xsl)
=> ftree-view(('params/param?', {\*\xsl:param\@name\string() => sort()} ))"
>>>>
<ftree context=".../frameworks/tei/xml/tei/stylesheet" countFo="134" countFi="128">
<fo name="profiles">
<fo name="acm">
<fo name="latex">
<fi name="to.xsl">
<params>
<param>attLength</param>
<param>classParameters</param>
<param>documentclass</param>
<param>longtables</param>
<param>spaceCharacter</param>
</params>
</fi>
</fo>
<fo name="pdf">...
</fo>
...
<fo name="adams">...
<fo name="agora">
<fo name="docx">...
<fo name="html">
<fi name="from.xsl"/>
<fi name="to.xsl">
<params>
<param>autoToc</param>
<param>bottomNavigationPanel</param>
<param>cssFile</param>
...
</params>
</fi>
</fo>
</fo>
...
</fo>
...
</ftree>
2.3.8 Modified file system copies
Several functions enable the modification of files (deleting, renaming, replacing,
inserting nodes). Combining the modification with a file tree copy, one achieves a
bulk
update resulting in an updated file tree. The following example creates a file tree
containing copies of all TEI XSD documents, with any xs:annotation elements
removed:
> fox "frameworks/tei//*.xsd/doc-resource()\delete-nodes({\\xs:annotation})\pretty-node()
=> file-tree-copy('tmp')"
A second example creates a modified file tree with missing
@attributeFormDefault attributes added:
> fox "frameworks//*.xsd/doc-resource()\
insert-nodes({\*[not(@attributeFormDefault)]},
{xatt-ec('unqualified', 'attributeFormDefault')})
=> file-tree-copy('tmp')"
Using Foxpath, checking the outcome is very convenient:
> fox "frameworks/tei//*.xsd => count()"
>>>>
57
> fox "frameworks/tei//*.xsd[\*\@attributeFormDefault] => count()"
>>>>
0
> fox "tmp//*.xsd[\*\@attributeFormDefault] => count()"
>>>>
57
Part 3: ixml in the infospace
3.1 Overview
One of the fundamental ideas of the infospace is the availability of file content
trees. Ideally, this would be the case for any kind of file. But what does reality
look like?
Depending on the underlying technology, some file formats are naturally represented
as
trees. Modern XPath and XQuery processors like BaseX (1) can process
not only XML, but also JSON, CSV and HTML. Other formats with similar data models
like YAML
are not supported out of the box. Extracting structure from unsupported file formats
can
always be implemented by specialized parsers embedded as extension functions, but
this is a
rather heavy-handed approach and must be well justified.
Invisible XML (ixml for short) (7) was introduced as a light-weight, declarative
alternative. This is an excellent fit if one wants to enlarge the set of formats
that can be navigated: previously opaque text content is unlocked and reveals
its inner structure. By sharing ixml grammars, this additional structure may immediately
be
used by a multitude of different systems and projects.
We understand Foxpath not only as a practical tool for day-to-day work, but also
as a
test bed for how a user experience of the infospace might feel. We identified
four
possible extension points where ixml significantly enhances its expressive power:
- File formats are associated with ixml grammars, enabling the usual Foxpath
  navigation capabilities to extend into these formats.
- Inside already tree-structured formats, there may be text nodes containing
  inherently structured data. By applying ixml, the user can navigate into these structures.
- ixml grammars can be used as node tests, enabling complex quality control
  scenarios.
- Unified string expressions (USEs) are extended to allow matching against an ixml grammar.
  The flexibility of USEs is thus further enhanced, even suggesting a usefulness beyond
  the confines of the Foxpath language.
3.2 Infospace definition document
As explained in the introduction, the heart of the infospace idea is the continuation
of
file system tree structure into file content tree structure. The more file formats
can be
parsed into node trees, the better, and we added a new extension function which supports
the
parsing of any file format for which an ixml grammar is available. But one should
also
remember that different file formats require different parsing approaches – different
built-in parsing functions (e.g. doc, json-doc,
html-doc, csv-doc) and different grammars. This variability
disturbs the sensation of seamless navigation, as it distracts from the navigation
logic
itself. (Which parse function must I call or which grammar must I use in order to
enter this
particular file?)
The ideal of seamlessness is restored by hiding the operation of parsing completely:
the processor should know when and how to parse a file. The user should have the
illusion that files are trees, rather than that they can be parsed into trees. As
various examples in the preceding section showed, the problem of when
to parse has already been solved: any navigation into content is automatically preceded
by parsing. But hitherto, automated parsing was always XML parsing. This limitation must
be overcome in order to maximize the sensation of seamlessness. The processor must know
how to parse, that is, which function or grammar to use. This goal is achieved
by a new feature, the infospace definition document.
Differences of how to parse a file are captured by the notion of a resource
type. A resource type is defined by a particular parsing function (e.g. XML
parse function or JSON parse function) or a particular ixml grammar. The infospace
definition
specifies an automated mapping of URIs to resource types. The document has three main
parts:
- Element grammars – defines available grammars in terms of a file URI
  and a short name
- Element rtypes – defines resource types in terms of a parse function or
  a grammar
- Element rtypeUses – defines a mapping of resource URIs to a resource
  type and, possibly, options to be used when parsing
The infospace definition helps to create the illusion that tree structure is ubiquitous
and pre-existing, with no need to create or extend it. The following table shows a few
examples enabled by the current standard infospace definition.
Table IV
Examples of seamless navigation into various resource types.
| Goal | Foxpath expression |
| Extract JSON field data | frameworks/tei//*.json\\title => freq() |
| Filter MS Word documents by fulltext pattern | frameworks/tei//*.docx[\matches-pattern('relevant patent rights | royalty payments#ft')] |
| Inspect HTML links | frameworks/tei//jteiHints.html\\*:a\@href => freq() |
| Count CSV rows | samples//sample.csv\\record => count() |
| List CSS properties | frameworks/tei//tei.css\\property\name[matches-pattern('font-*')] => sort() |
The following listing shows a snippet of the standard infospace definition:
<ispace>
  <!-- ========
       Grammars
       ======== -->
  <grammars baseURI="../grammar">
    <grammar name="css" uri="css.ixml" type="ixml"/>
    <grammar name="isbn" uri="isbn.ixml" type="ixml"/>
    <grammar name="iso8601" uri="iso8601.ixml" type="ixml"/>
    <grammar name="words" uri="words.ixml" type="ixml"/>
  </grammars>
  <!-- ==============
       Resource types
       ============== -->
  <rtypes>
    <rtype name="xml">
      <docFn>doc#1</docFn>
      <parseFn>parse-xml#1</parseFn>
    </rtype>
    <rtype name="json">
      <docFn>json:doc#1</docFn>
      <parseFn>json:parse#1</parseFn>
    </rtype>
    <rtype name="html">
      <docFn>html:doc#1</docFn>
      <parseFn>html:parse#1</parseFn>
    </rtype>
    <rtype name="csv">
      <docFn>csv:doc#2</docFn>
      <parseFn>csv:parse#2</parseFn>
    </rtype>
    <rtype name="docx">
      <docFn>docx:doc#1</docFn>
    </rtype>
    <rtype name="css">
      <grammar ref="css"/>
    </rtype>
    <rtype name="words.ixml">
      <grammar ref="words"/>
    </rtype>
  </rtypes>
  <!-- ==================
       Resource type uses
       ================== -->
  <rtypeUses>
    <!-- .xml etc. -->
    <case>
      <condition>
        <file name="*.dita *.ditamap *.docbook *.nvdl *.odd *.tei
                    *.xhtml *.xml *.xsd *.xsl *.xslt "/>
      </condition>
      <rtypeUse rtype="xml" final="yes"/>
    </case>
    <!-- *.html *.htm -->
    <case>
      <condition>
        <file name="*.html *.htm"/>
      </condition>
      <rtypeUse rtype="xml"/> <!-- try to parse as XML -->
      <rtypeUse rtype="html" final="yes"/> <!-- parse as HTML -->
    </case>
    <!-- *.json -->
    <case>
      <condition>
        <file name="*.json *.jsonld *.jsonschema *.jschema"/>
      </condition>
      <rtypeUse rtype="json" final="yes"/>
    </case>
    <!-- *.docx -->
    ...
    <!-- *.csv -->
    ...
    <!-- *.css -->
    ...
    <!-- *.words.txt -->
    ...
    <!-- Last attempt - parse as XML -->
    <case>
      <rtypeUse rtype="xml"/>
    </case>
  </rtypeUses>
</ispace>
The user can provide her own infospace definition (option -s
infospace-path), which may take advantage of project-specific information.
Alternatively, she can extend the definition (option -x
infospace-path) by supplying an additional definition whose
grammars, resource types and resource type uses are merged into the standard definition.
For example, an extension may map .csv files contained in a folder with a
particular name to a CSV parsing which uses semicolons as separators and expects a
header line. The following example also adds a mapping of .jso files to the JSON
resource type:
<ispace>
  <rtypeUses>
    <case>
      <condition>
        <file name="*.jso"/>
      </condition>
      <rtypeUse rtype="json"/>
    </case>
    <case>
      <condition>
        <file parentName="countries"/>
      </condition>
      <rtypeUse rtype="csv">
        <options>
          <option name="header" value="yes"/>
          <option name="separator" value=";"/>
        </options>
      </rtypeUse>
    </case>
  </rtypeUses>
</ispace>
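To make the role of the resource type uses more concrete, the following Python sketch derives, for a given file path, the ordered list of resource types that might be tried. It is not the Foxpath implementation, and the matching semantics are assumptions made for illustration: cases are consulted in document order, a case without a condition matches every file, name patterns are whitespace-separated globs, and an rtypeUse with final="yes" ends the search. The function names file_matches and candidate_rtypes are hypothetical.

```python
import fnmatch
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# A reduced definition, combining cases from the two listings above.
DEFINITION = """
<ispace>
  <rtypeUses>
    <case>
      <condition><file name="*.html *.htm"/></condition>
      <rtypeUse rtype="xml"/>
      <rtypeUse rtype="html" final="yes"/>
    </case>
    <case>
      <condition><file name="*.json *.jsonld"/></condition>
      <rtypeUse rtype="json" final="yes"/>
    </case>
    <case>
      <condition><file parentName="countries"/></condition>
      <rtypeUse rtype="csv"/>
    </case>
    <case>
      <rtypeUse rtype="xml"/>
    </case>
  </rtypeUses>
</ispace>
"""


def file_matches(file_elem, path):
    """Check one <file> condition against a path (assumed semantics)."""
    p = PurePosixPath(path)
    names = file_elem.get("name")
    if names is not None and not any(
        fnmatch.fnmatchcase(p.name, pattern) for pattern in names.split()
    ):
        return False
    parent = file_elem.get("parentName")
    if parent is not None and not fnmatch.fnmatchcase(p.parent.name, parent):
        return False
    return True


def candidate_rtypes(definition_xml, path):
    """Return the resource types to try for path, in order of preference."""
    root = ET.fromstring(definition_xml)
    candidates = []
    for case in root.iterfind("rtypeUses/case"):
        conditions = case.findall("condition/file")
        # A case without conditions matches every file (assumption).
        if not all(file_matches(f, path) for f in conditions):
            continue
        for use in case.findall("rtypeUse"):
            candidates.append(use.get("rtype"))
            if use.get("final") == "yes":
                return candidates  # assumed meaning of final="yes"
    return candidates


for example in ["docs/page.html", "data.json", "countries/list.csv", "notes.txt"]:
    print(example, candidate_rtypes(DEFINITION, example))
```

Under these assumptions, an HTML file is first tried as XML and then as HTML, whereas a file in a folder named countries is tried as CSV and, failing that, falls through to the final attempt to parse as XML.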
3.3 New extension functions
Sometimes a file must be parsed with a grammar that is not automatically associated with its URI. The new extension function idoc() makes this possible. Other ixml-related extension functions support parsing a string, validating a string against a grammar, and expanding text nodes into a subtree of items. In all cases, the grammar can be identified either by the short name assigned in the infospace definition or by URI.
Table V
Several ixml-related extension functions.
| Function | Description |
| --- | --- |
| idoc($uri, $grammar) | Returns the document resulting from parsing the resource at $uri according to $grammar |
| iparse($text, $grammar) | Parses $text according to $grammar |
| ivalid($text, $grammar) | Tests if $text conforms to $grammar |
| iexpand-nodes($targetExpr, $grammar) | Replaces the text content of target nodes with the parse result according to $grammar |
Transforming content trees by replacing node values with some dynamically computed
value is already a feature of Foxpath (function replace-values($targetNodesExpr,
$newValueExpr)). The new function iexpand-nodes() has a similar
behaviour. It replaces text nodes selected by an expression with the result of parsing
that
text with an ixml grammar. This can be understood as revealing the full structure
of the
content that has previously been hidden in unanalyzed text nodes.
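The idea of expanding text nodes can be illustrated with a small Python sketch. It is only an analogy, not the Foxpath implementation: a regular expression with named groups stands in for an ixml grammar, and the names DATE_PATTERN, parse_date and expand_nodes, as well as the sample document, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Toy stand-in for an ixml grammar (assumption): year-month-day dates only.
DATE_PATTERN = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def parse_date(text):
    """Return a <date> element with one child per named group, or None."""
    match = DATE_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    date = ET.Element("date")
    for name, value in match.groupdict().items():
        ET.SubElement(date, name).text = value
    return date


def expand_nodes(root, target_path, parser):
    """Replace the text of each target element with its parsed subtree."""
    for target in root.findall(target_path):
        subtree = parser(target.text or "")
        if subtree is not None:  # text that does not parse is left untouched
            target.text = None
            target.append(subtree)
    return root


doc = ET.fromstring(
    "<book><title>Example</title><published>1667-04-27</published></book>"
)
expand_nodes(doc, ".//published", parse_date)
print(ET.tostring(doc, encoding="unicode"))
```

After the call, the published element no longer holds an opaque string but a date subtree whose year, month and day children can be addressed by ordinary path steps.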
3.4 Examples
The examples in this section were tested with the root folder of an Oxygen XML Editor installation as the working directory. When copying the example code to the command line, please remove the linefeeds, which were added for readability. See Appendix A for instructions on how to install Foxpath.
Example: Navigate CSS contents
Report the frequency distribution of the values of the CSS property border-color.
> fox "frameworks/tei//*.css\\property[name eq 'border-color']\value => freq()"
>>>>
#000000 .. (3)
#3C3C3C .. (4)
#c5d8bb .. (5)
#ffe1ad .. (5)
#ffe7e8 .. (5)
Black .... (1)
blue ..... (2)
darkred .. (1)
green .... (1)
grey ..... (1)
LightBlue (1)
lightgrey (1)
red ...... (7)
white .... (2)
yellow ... (1)
Example: Use ivalid() - report invalid ISBNs
Report invalid ISBNs grouped by file name.
> fox "frameworks/jats//*.xml
\\*:isbn[not(ivalid('#isbn'))]
\tuple(base-fname(.), .)
=> hlist('File name, Invalid ISBN')"
>>>>
================================
File name
. Invalid ISBN
================================
bitso-book-of-parts-oasis.xml
. 1-234 567890-123
. xxx-1
bitso-book-part1-oasis.xml
. 1-234 567890-123
. xxx-1
bitso-samplesmall-book-oasis.xml
. 1-234 567890-123
. xxx-1
Example: Use iparse() - extract data from text nodes
Find the oldest document.
> fox "frameworks/tei//*.xml\\dc:date\iparse('#iso8601')\\year => min()"
>>>>
1667
Part 4: Future work
Apart from continuously adding further extension functions, we plan to address the following topics:
- Integration of external tools, using ixml to import results
- Access to the structure of binary files
- Plugin system for user-specific extension functions
- Visualization of expression values implemented as HTML pages
- Control over the handling of ambiguous parse results
Part 5: Final thoughts
Foxpath can be understood as a proof of concept of the infospace experience. We regard it as successful.
However, it should be remembered that Foxpath is a language with a strong focus on interactive use and the power of succinct expressions. After all, it is only an extended version of XPath. This limitation of the Foxpath language must not be mistaken for a feature of the infospace. Its true potential can only be realized by a full-fledged programming language. As XPath is a subset of XQuery and Foxpath is a backward-compatible extension of XPath, XQuery is an obvious candidate for a programming language which implements the infospace abstraction.
Appendix A. Installation of Foxpath
The Foxpath processor [3] is written in XQuery [10] and requires an installation of BaseX [1], version 11 or higher. To install Foxpath, proceed as follows:
- Download and install BaseX from here: basex.org/download
- Clone the Foxpath project from here: https://github.com/hrennau/foxpath
- Add the bin folder of the BaseX installation to the PATH environment variable.
- Add the bin folder of the Foxpath installation to the PATH environment variable.
- Done. You can now use Foxpath on the command line by calling the fox.bat or fox.sh script, depending on the operating system.
Note that the Foxpath expression passed to the script should be enclosed in single or double quotes: fox "expression".
To execute a Foxpath expression stored in a file, use option -f: fox -f prog-path.
References
[1] BaseX – an open source XML database. Homepage. http://basex.org
[2] Rennau, Hans-Jürgen. FOXpath – an expression language for selecting files and folders. Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). https://doi.org/10.4242/BalisageVol17.Rennau01. https://www.balisage.net/Proceedings/vol17/html/Rennau01/BalisageVol17-Rennau01.html
[3] Rennau, Hans-Jürgen. Foxpath. Github repository. https://github.com/hrennau/foxpath
[4] Rennau, Hans-Jürgen. FOXpath navigation of physical, virtual and literal file systems. Presented at XML Prague, February 9 - 11, 2017. In XML Prague 2017 Conference Proceedings. https://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf
[5] glob (programming). Wikipedia article. https://en.wikipedia.org/wiki/Glob_%28programming%29
[6] Rennau, Hans-Jürgen. The XML info space. Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). https://doi.org/10.4242/BalisageVol10.Rennau01
[7] Invisible XML. Homepage. https://invisiblexml.org
[8] Robie, Jonathan, et al., eds. XML Path Language (XPath) 3.0. W3C Recommendation 08 April 2014. https://www.w3.org/TR/2014/REC-xpath-30-20140408/
[9] Kay, Michael, ed. XPath and XQuery Functions and Operators 3.0. W3C Recommendation 08 April 2014. http://www.w3.org/TR/xpath-functions-30/
[10] Robie, Jonathan, Michael Dyck, eds. XQuery 3.1: An XML Query Language. W3C Candidate Recommendation 18 December 2014. http://www.w3.org/TR/xquery-31/
[11] Walsh, Norman, et al., eds. XQuery and XPath Data Model 3.1. W3C Recommendation 21 March 2017. http://www.w3.org/TR/xpath-datamodel-31/