foo.xml) and want to retrieve all countryCode elements, no matter where they occur
in the document. The following XPath expression does just that:

countryCode instances that do not contain a valid two-letter code according to
ISO 3166. As a code dictionary in XML format is available on the internet

uri element containing the document URI, which may be a file name or an HTTP
address. In order to apply the check to all documents found in the document list,
again we adapt the first step of our navigation:

uri elements containing a document URI, whereas all inner nodes have the purpose of
adding structure, implicitly creating groups of documents. All elements - inner
nodes and uri elements - may have attributes supplying metadata pertaining to the
document(s) referenced by the element itself or its descendants. Schematically, such
an inventory might look like this:

uri elements which match certain conditions, and finally the code dictionary – is
achieved by a single expression, without taking actions like opening a file and
without shifting information from data sources into intermediate variables.

$faultyCodes), we can later resume navigation, using those nodes as starting points.
If the documents contain a changeLog element somewhere, we can collect the change
logs of all faulty documents:

uri elements, crosses over into the referenced documents and continues its path down
into that document:

select commands are constrained to specify a primary key.

fn:doc function), for example due to the need of supplying authorization
information or an HTTP body: this is the fn:doc function, which resolves a URI to
the corresponding document. This is a lookup requiring prior knowledge of the key –
neither structural nor semantic conditions can be specified. Regarded as a means of
navigation, the fn:doc function produces “manifold motion making little speed”
(S. T. Coleridge, “The butterfly”), as it delivers only one result item per item of
knowledge. The fn:collection function offers multiple items for the price of a
single URI, but the collection is a pre-defined unit, which often will not match
the actual requirements.

xlink
axis with an “arc test” selecting arcs. The expression result could then be
collected from the ending resources of the selected arcs.

role property and arcs to have an optional arcrole property, where both properties
have a URI as value. Therefore a selection of arcs might be specified by two URIs, a
role selecting extended links and an arcrole selecting arcs within those links. If
we now deviate from the XLink standard and assume that arcs have an additional
property, an “arclabel” with an NCName as value, then arc selection might be
specified by a single QName, as a QName consists of a URI plus an NCName. The “arc
test” selecting an arc might now be specified by a QName, in analogy to the name
test, which is also specified by a QName but selects elements or attributes.
Consider the following expression:

role equal to the URI bound to prefix “x”

arclabel equal to “y”

xlink step is applied, or any ancestor of that “a” element.

xlink axis, XLink navigation might be supported by a new standard function,
fn:xlink. In this case, the set of extended links to be considered might
alternatively be taken from the static context or from linkbase documents supplied
as a parameter:

//country[@code eq 'FR']
have a sweeping nature which removes any structural assumptions apart from which
document the information is located in. But when the target might be located in
other documents, the lack of support may become an issue. Navigation between
documents requires knowledge of a URI. In the general case this will be the target
document URI, as support for named collections (the fn:collection function) is
implementation-defined and hence not portable. Without such support we simply cannot
“look into” other documents without first providing a document URI.

fn:collection function, we can access documents without knowledge of individual
document URIs. But collections are a problematic answer to the challenge of
data-driven navigation. The collection is not yet the destination of our navigation,
but a set of candidates from which to select the destination. In order to navigate
into one country description, two hundred country descriptions have to be parsed
(unless they are located in an XML database) and queried (unless we can resort to a
database index). The larger the collection, the greater the overhead incurred by
processing the collection in order to identify the one or few matching items. When
there is a large number of candidates – e.g. accumulated log messages – the approach
of constructing all
candidate nodes and then filtering them can quickly become infeasible.

xs:untypedAtomic. The value space of p-faces is accordingly constrained.

node URI | countryCode | neighbourCountries | rivers |
---|---|---|---|
file://countries/fr.xml | fr | at, be, ch, de, es, it, lu, mc | Rhein, Loire, Maas, Rhone, Seine, Garonne, Mosel, Marne, … |
file://countries/de.xml | de | at, be, ch, cz, dk, fr, lu, nl, pl | Donau, Elbe, Rhein, Oder, Weser, Main, Havel, Mosel, … |
file://countries/it.xml | it | at, ch, fr, sm, si, va | Drava, Po, Adige, Tevere, Adda, Oglio, Tanaro, Ticino, … |
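
The point of such a table can be sketched in Python (used here instead of XPath so that the example is runnable): properties are filtered first, and only the node URIs that pass the filter are turned into nodes. The names `PFACES`, `select_uris`, `build_node` and `nodes` are hypothetical, the river lists are cut to the first three items shown in the table, and the node builder is a stub that does no real parsing.

```python
# Hypothetical in-memory p-faced collection: node URI -> property name -> value items.
# Values are taken from the table above; the river lists are cut to three items.
PFACES = {
    "file://countries/fr.xml": {
        "countryCode": ["fr"],
        "neighbourCountries": ["at", "be", "ch", "de", "es", "it", "lu", "mc"],
        "rivers": ["Rhein", "Loire", "Maas"],
    },
    "file://countries/de.xml": {
        "countryCode": ["de"],
        "neighbourCountries": ["at", "be", "ch", "cz", "dk", "fr", "lu", "nl", "pl"],
        "rivers": ["Donau", "Elbe", "Rhein"],
    },
    "file://countries/it.xml": {
        "countryCode": ["it"],
        "neighbourCountries": ["at", "ch", "fr", "sm", "si", "va"],
        "rivers": ["Drava", "Po", "Adige"],
    },
}

BUILT = []  # records which URIs were actually turned into nodes


def select_uris(pfaces, prop, test_value):
    """Filter step: URIs whose property has at least one item equal to test_value."""
    return [uri for uri, props in pfaces.items() if test_value in props.get(prop, [])]


def build_node(uri):
    """Stub node builder; a real one would fetch and parse the resource at `uri`."""
    BUILT.append(uri)
    return {"uri": uri}


def nodes(pfaces, prop, test_value):
    """Filter on properties first, then build nodes only for the selected URIs."""
    return [build_node(uri) for uri in select_uris(pfaces, prop, test_value)]
```

Calling `nodes(PFACES, "countryCode", "it")` builds a single node, whereas filtering an ordinary collection would require constructing all three candidates first.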

fn:nodes, and one of its signatures might be:

fn:nodes function would be translated into a SQL query. It is also possible to store
the p-faced collection in a NoSQL object collection. The p-filter engine would then
be the NoSQL database system, and the p-filter would be translated into a NoSQL
query. Any p-filter engine can be combined with the same node builder, as the
mapping of a string to a node is in no way influenced by how the string was
obtained.

and, or, not) applied to the results of atomic and/or Boolean conditions. A p-filter
can therefore be viewed as a logical tree with leaves representing atomic conditions
and inner nodes representing Boolean conditions.

some or every) specifying whether all property value items or only at least one item
must pass the test.

filter and a namespace URI yet to be decided. (For the time being we assume the URI
http://www.w3.org/2013/xpath-structures). In the following text we use the prefix p
to denote this URI. The filter element is either empty or has element-only content.
It can have any number of child elements. Each
child element represents either a property condition or a Boolean condition.

Condition component | XML representation | Explanation |
---|---|---|
Property name | The local name of the element. | Note that property names are constrained to be NCNames. If this is changed, the property name will be represented by the value of a @name attribute. |
Test value | Either the single text child of the element, or the p:term child elements. | If there is only one test value item, it is represented by the text content of the element. If there are several items, use for each item a p:term child element with a single text child. Note that test value items are literal – they cannot be references to other property values. Also note that test value items cannot be accompanied by type information. |
Operator | Optional @op attribute; default value: =; valid values: =, !=, <, <=, >, >=, ~, %, #=, #!=, #<, #<=, #>, #>= | A leading # indicates a numeric comparison; ~ is a regex match governed by the XPath regex rules; % is a regex or pattern match with implementation-defined semantics. |
Quantifier | Optional @qua attribute; default value: some; valid values: some, every | If the value is every, the condition requires every item of the property value to meet the condition; otherwise, only at least one item must meet the condition. |
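
To make the table concrete, here is a minimal Python sketch that reads property conditions from a `p:filter` element and evaluates one of them against a list of property value items. Several things are assumptions rather than part of the proposal: the property condition elements are placed in no namespace, the filter contains no Boolean conditions, plain operators compare strings while `#` operators compare floats, Python's `re.search` stands in for the XPath regex rules (which differ in details), and the implementation-defined `%` operator is not handled. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

P_NS = "http://www.w3.org/2013/xpath-structures"  # provisional URI from the text

FILTER_XML = f"""
<p:filter xmlns:p="{P_NS}">
  <countryCode>fr</countryCode>
  <rivers op="~"><p:term>^Rh</p:term><p:term>^Do</p:term></rivers>
</p:filter>
"""


def parse_conditions(xml_text):
    """Turn each child of p:filter into a dict (property conditions only)."""
    conditions = []
    for child in ET.fromstring(xml_text):
        terms = [t.text or "" for t in child.findall(f"{{{P_NS}}}term")]
        conditions.append({
            "name": child.tag.rsplit("}", 1)[-1],      # local name = property name
            "op": child.get("op", "="),                # default operator
            "qua": child.get("qua", "some"),           # default quantifier
            "tests": terms or [(child.text or "").strip()],
        })
    return conditions


def compare(op, value, test):
    """Compare one property value item with one test value item."""
    if op == "~":
        return re.search(test, value) is not None
    if op.startswith("#"):                             # numeric comparison
        op, value, test = op[1:], float(value), float(test)
    return {
        "=": value == test, "!=": value != test,
        "<": value < test, "<=": value <= test,
        ">": value > test, ">=": value >= test,
    }[op]


def holds(cond, values):
    """Apply the quantifier: every item, or at least one item, must pass."""
    passed = [any(compare(cond["op"], v, t) for t in cond["tests"]) for v in values]
    return all(passed) if cond["qua"] == "every" else any(passed)
```

With the default quantifier, the second condition holds for the value `["Rhein", "Loire"]` because one item matches; with `qua="every"` it would fail, since `Loire` matches neither term.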

fn:matches.

and, or or not and the same namespace URI as the p:filter element. Examples:

every) is encoded by a prefix character preceding the operator string (e.g. $=).
Example of a p-filter rendered in
query syntax:

true

false if at least one child element evaluates to false

and condition evaluates to false if at least one child element evaluates to false

or condition evaluates to true if at least one child element evaluates to true

not condition evaluates to false if at least one child element evaluates to true

false

some evaluates to true if at least one property value item and at least one test
value item meet the comparison constraint

every evaluates to false if for at least one property value item there is no test
value item with which it meets the comparison constraint

xs:untypedAtomic

xs:double

fn:matches function, using the property value item as first argument and the regex
and options parts of the test value item as second and third arguments. If the test
value item contains a # character, the regex and options parts are the substrings
before and after the character; otherwise, the regex part is the complete item and
the options part is the empty string.

fn:nodes, three variants may be useful:

<p:collection> element with two attributes providing the collection URI and the (not
URI-encoded) p-filter. Example:

fn:doc function to a URI which is a concatenation of collection URI and
(URI-encoded) p-filter in query syntax. The URI is expected to yield a document
which represents the selected nodes as child elements of the document element (see

fn:nodes function might operate
as follows:

Collection feature | Description | Remark |
---|---|---|
{operators} | A list of supported operators. The list must contain the “=” operator; all other operators are optional. | A p-filter must not use an operator not contained in {operators}. |
{every-supported} | If false, a p-filter must not contain the every quantifier. | Every implementation must support the some quantifier. |
{multiple-test-supported} | If false, a p-filter must not specify test values with multiple items. | The use of multiple test values can be emulated using p:or. |
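
A minimal Python sketch of how a p-filter might be checked against these three features before it is handed to a collection. The dictionary keys, the representation of a condition as a dict with `op`, `qua` and `tests`, and the function name `feature_violations` are all assumptions made for illustration.

```python
def feature_violations(conditions, features):
    """Return the reasons why a list of property conditions is not allowed."""
    problems = []
    for cond in conditions:
        if cond["op"] not in features["operators"]:
            problems.append(f"operator {cond['op']} not supported")
        if cond["qua"] == "every" and not features["every-supported"]:
            problems.append("quantifier every not supported")
        if len(cond["tests"]) > 1 and not features["multiple-test-supported"]:
            problems.append("multiple test values not supported")
    return problems


# Hypothetical feature set of a very basic collection.
BASIC = {
    "operators": {"="},               # "=" is the only mandatory operator
    "every-supported": False,
    "multiple-test-supported": False,
}
```

An empty result means that the filter uses only what the collection declares; a filter with several test values could then be rewritten with p:or, as noted in the table.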

fn:doc. These functions might address:

fn:doc variant which delivers CSV data as an XML node tree based on the table
vocabulary. It enables us to select all invalid country codes by a simple expression:
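
The same idea can be sketched in Python rather than XPath: CSV text is first turned into a table/row/cell node tree, and the invalid codes are then selected against that tree. The CSV layout (code in the first column), the sample data and the function names are assumptions made for illustration; only the element names table, row and cell come from the text.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_table(csv_text):
    """Build a table/row/cell element tree from CSV text."""
    table = ET.Element("table")
    for record in csv.reader(io.StringIO(csv_text)):
        row = ET.SubElement(table, "row")
        for value in record:
            ET.SubElement(row, "cell").text = value
    return table


def invalid_codes(doc, table):
    """Return countryCode values in doc that are missing from the first table column."""
    valid = {row.find("cell").text for row in table.findall("row")}
    return [e.text for e in doc.iter("countryCode") if e.text not in valid]


# Hypothetical code dictionary and document.
CODES_CSV = "fr,France\nde,Germany\nit,Italy\n"
DOC = ET.fromstring(
    "<countries>"
    "<country><countryCode>fr</countryCode></country>"
    "<country><countryCode>xx</countryCode></country>"
    "</countries>"
)
```

Once the CSV data is exposed as nodes, the check itself no longer needs to know that the dictionary was not XML to begin with.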

R1(G) denotes the result of applying R1 to a format instance G and R2(N) denotes the
result of applying R2 to a node tree N.

model.

fn:doc which has an additional parameter specifying the source data format:

table element to have only row children and row elements to have only cell children.
An additional constraint might prescribe that every row element has the same number
of cell children:
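
As a rough illustration, the constraints just described can be written as a Python check over an element tree. This is a sketch only, and the function name `is_valid_table` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_valid_table(table):
    """Check the table/row/cell constraints, including equal row lengths."""
    if table.tag != "table":
        return False
    rows = list(table)
    if any(row.tag != "row" for row in rows):
        return False                      # table may have only row children
    if any(cell.tag != "cell" for row in rows for cell in row):
        return False                      # row may have only cell children
    # Additional constraint: every row has the same number of cell children.
    return len({len(row) for row in rows}) <= 1
```

An empty table passes the check, which is a choice of this sketch and not something the text prescribes.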

xml:type attributes. However, knowing in advance that the XML representation of
JSON-supplied data will produce XML documents which are studded with
pseudo-attributes - rendering them hardly readable - we are motivated to go a step
further and truly extend the XML syntax, introducing new syntax constructs. The goal
is to ensure readable XML representations also of such node trees as are constructed
from JSON instances. The following syntax extensions should therefore be considered:

udl:markup) which identifies the syntax used to encode the element content. As a
consequence, the following text would be a valid XML document:

getWeatherResponse element has three child elements, as the content of an element
with udl:markup='...' is defined to be the nodes resulting from parsing the text
found between start tag and end tag according to the parsing rules for format '...'.

fn:doc, fn:httpDoc, ...) which hide the actual data formats and expose the node
structure. Taken together, the node structure of all accessible resources provides a
uniform substrate for navigation and discovery.