Nowadays, the Extensible Markup Language (XML) is broadly accepted as a standardized serialization format and universal transfer syntax. According to the well-formedness constraints ("properly nested"), XML allows the markup of content only in strict monohierarchical structures – markup in non-hierarchical or multi-hierarchical structures is not inherently provided in the language: This deficit is sufficiently discussed in the literature and represents a problem for many application scenarios.
Unfortunately there is a large gap between the XML tree model and the models traditionally used in software engineering[...] Markup-based models revolve around order and hierarchy whereas the more popular graph-based data models revolve around linking relationships and roles. A linked data structure (directed graph) can more directly express sophisticated information than can a simple tree.
The descriptive markup language, FML, provides an unrestricted freestyle markup that addresses problems and deficiencies with monohierarchical structure constraints. The term "freestyle" has been chosen to reflect the languages requirement to allow an intuitive-freehand markup.
According to the language taxonomy from Coombs 1987, FML is a generalized descriptive markup language. It is not restricted to any particular application scenario; it is a metalanguage, like SGML/XML, but more powerful since it supports more structuring features.
[...] it is time to determine what additional features are required from markup systems to make the formal description of such non-hierarchical phenomena straightforward.
As a preparation 17 requirements for the project FML have been carefully identified and evaluated. The development of the new data-centric markup language demands the consideration of the following distinctive aspects:
For the syntax of the language a precise system of rules has to be defined. Only a clear grammar allows an automatic analysis of language instances. Each FML-document has to be syntactically constructed by deduction sequences and recognized as being wellformed by an automaton. In comparison with to the 107 production rules of XML we have to make the notation rules of FML clearer and more understandable, according to the following statement:
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race.
For the benefit of the requirements entropy, ergonomics and only restricted necessary XML-compatibility many obsolete SGML-constructs, redundancies and practical irrelevant relicts may be segregated. FML has to concentrate on the essential: polyhierarchical markup of texts and data.
For FML-documents a Chomsky-type-2-grammar consisting of a minimal set of production ruleshas to be provided in EBNF notation. Thus FML may be produced by a context-free grammar and recognized by a nondeterministic pushdown automaton.
Nowadays XML is the lingua franca of the Internet and EDI-systems and probably the most common serialization format and transfer syntax.
What started out as a simple standard for electronic publishing has matured into one of the most important and widely used paradigms in distributed computing. The impact and influence of the standard and the conceptual base are visible in every area of computing today.
This fact deserves special attention. First of all FML syntactically will not be compatible with XML since it introduces new concepts and structures that XML could not prepare. Additionally critical constructs like CDATA Sections [Walsh 2003] and non-transferrable concepts like DTD will be excluded. Past studies often attach great importance to compatibility. This leads occasionally to quite complicated results and crude workarounds. FML defies all concerns regarding strict compatibility. That approach of course has advantages and disadvantages. Moreover FML will adopt as many accepted and well-proven XML-syntaxes and interpretations as possible without interfering other requirements. Thus an FML-document might be a wellformed XML-document but does not have to be. Anyway, there will be a high recognition factor.
For supporting the compatibility issue two further steps are taken: Developing migration-guidelines XML→FML and setting up a XML representation of FML (milestone approach) (see section “XML Representation”).
SGML-compatibility and the established OHCO-view
[...] we can describe a text as an "ordered hierarchy of content objects"
lead to an inseparable bond of markup and monohierarchical structures. Generally FML follows the appraisal
[...]we now know that the breaking of strict hierarchies is the rule, rather than the exception
but still has to pay tribute to the broadly accepted monohierarchical understanding: FML needs a nested property for deciding structural compliance. Furthermore monohierarchies shall be marked up exactly the same way as in XML.
Examinations about complex relations far beyond monohierarchies are coltishly made in an experimental novel (with more information in footnote-annotations than in the main content):
These pieces won’t halt: the boundary of a book is less than air to them. These pieces wink at each other, they shnoogle sighingly, they meet to confer, they part, they wave adieu and zip toward different mental planet zones, they reproduce, they tease us with coherence, they grimace and coil about and finagle, they repeat one another, they flaunt, they taunt, they sail away. Maybe only a deity – if deities exist – explains (or is) these splinters’ unity.
Texts (and data in general as well) may not be interpreted as a hierarchy of content-components, as Schmidt 2009 concludes:
Indeed, it is now generally recognised that literary and linguistic texts frequently or even predominantly exhibit overlapping structures.
The challenge of interference structures (crossover-markup, concurring-markup, overlapping-markup) and XML´s inability to represent them natively
Markup under any SGML/XML convention is unable to easily represent arbitrary structures or to represent overlapping or concurrent structures.
FML is requested to handle this issue naturally and without applying any cumbersome workarounds like milestone, stand-off, fragmentation, virtual joins, or redundant encoding.
If the content of a sample document is the symbol-sequenceand gets augmented with the two typographic properties italic and bluewe will get the following document structure:FML must encode this simple issue not fragmented (as it would be done in HTML)but native and intuitive:
Start- and end-tags enclose the elements content. In XML corresponding start- and end-tags get identified by an equal tag-name. If more than one tag-pair with the same tag-name is involved, then identifying correspondence is ambiguous when allowing interference.
To illustrate the dilemma lets draw another typographic example and introduce the tag u with its semantic underline. The markup scenariomight be interpreted either as V1 or V2:If a document contains n start-tags Ts with the same tag-name (and n end-tags Te with that tag-name), then the possible document interpretation I ranges between 1 and n! = |Ts|! = | Te|!. Thus, self-overlapping-markup is a special case of interference and may not be clearly interpreted without introducing a new concept:
FML must provide an mechanism (similar to the co-indexing scheme introduced in Huitfeldt 2001) for assigning an unique ID to a tag-pair. With ID´s, one start-tag gets unambiguously joined to a corresponding end-tag with the same name. Each assignment of an unique ID to a tag-pair decrements I. Tag-ID´s shall be optional. Due to compatibility, the Matrjoschka-doll-principle applies where the ID-mechanism is not in use.
The principle of pluralism also applies to markup structures. In FML parallel markup must be possible. Parent-child-relationships between tags do not have to be enforced where there is no hierarchical relation.
Lets mark up content with the typographic properties bold (tag-name: b), italic (tag-name: i) and red (tag-name: r):Without using any workaround constructs XML may mark up this structure inherently only by one of the two alternativesBut that is an overspecification. Neither is b a child of i nor is b parent of i.The content is just b AND i. If an author would insert text between </i> and </b> then content would be produced that is b but not i. Insertion between </b> and <r> would produce detached content without any typographic properties. This lack of representing congruent markup may lead to inconsistencies during document maintenance. Also the original congruent structure may not be reconstructed from the markup representation.
Therefore FML has to provide a native tagging-mechanism for the treatment of congruent semantics and secure document maintenance.
Heterogeneous structures have to be marked up in one redundant-free document. One semantic interpretation may exist besides other ones in the same document. Implied interference does not have to be subject to any restrictive structure constraints. Each markup-component in an FML document is non-ambiguously assigned to exactly one perspective (respectively default-perspective). During the transformation of an FML document to an FML graph particular perspectives may be optionally included or excluded. For an FML graph an unlimited number of perspective-nodes - sharing the same document content-nodes - will directly follow the document-root-node.
This important topic has been discussed for a long time [Stührenberg 2006] [Witt 2007] and there are plenty of scenarios that would benefit from markup ability to handle independent structures. For instance, the American National Corpus would not have to redundantly administrate their linguistic inventory seven times stand-off [Ide 2006]. They could administrate, search, edit, and analyze securely in one consolidated document.
FML-elements may be composed via distributed ordered segments. By this means, the linear sequential ordering of content-symbols will be totally broken up. This is a rather terrific feature that takes advantage of the reusability-paradigm and facilitates arbitrary creations within a profane content.A new element may use content symbols from any position and in any order. For this purpose, segment-ID´s and order-ID´s have to be allocated to participating segment-tags.
For instance this feature would allow to compose all words of the German language (120.000) in one redundant-free document having only the german alphabet as its content:
Under the term Discontinuous Structures a comparable approach has been discussed in Sperberg-McQueen 2008.
Each fragment of a wellformed FML-document must as well be a wellformed document. The cut-off points may be anywhere in the content and outside of embedded markup-components. One implication of that will be, that tag-pairs might break apart. Therefore, a mechanism of completion has to apply during transformation. The location of missing start- or end-tags can be assumed to be before or after the document borders.
In the document prolog a document may optionally be declared as being a fragment. This way external processors get supported in identifying and composing distributed document-fragments. XInclude confirms the usefulness of this concept, that prepares further modularization and inclusion mechanisms:
Many programming languages provide an inclusion mechanism to facilitate modularity. Markup languages also often have need of such a mechanism.
Sometimes semantic objects, structural interruptions or relevant positions can be identified in a text- or data-sequence before adequate markup-tags are known for sure. FML shall offer a solution for those scenarios: A placeholder will be introduced. Embedding wildcards must be allowed everywhere in the content of a document. In due course the placeholder might be substituted for an empty-, start-, end-, multi-tag, processing instruction, comment, or content. The three states of validity (wellformed, nested, valid) are not influenced by wildcards.
The attributes of elements shall be passed to their children. True to the statement of J. W. v. Goethe:
That which you've inherited from your fathers, earn it in order to possess it.
A typographic scenario helps to illustrate this concept:If the properties italic an bold get marked upthen it is obvious, that the inner format element (row 6) implicitly owns the property .
For FML the following inheritance rules are set:
An elements attributes also apply for its child elements.
Child attributes dominate parent attributes.
Multiple inheritance leads to lossless consolidation in an ordered attribute-value list.
Queries proceed in the direction child→parent.
Therefore an element might have far more attributes than it declares itself.
Neither does this feature affect the syntax of an FML document (except introducing an ordered attribute-value list) nor the properties of an FML graph. But the concept must apply when attributes are retrieved within an implementation of accessors.
There are two representations of an FML-instance:
Each wellformed FML-document must have an unambiguously corresponding graph. The graph will interpret and visualize the document and is the basis for graph operations and for a data model within an implementation.
FML follows the fundamental statement
The computer representation of a document should reflect what really is[DeRose 1990]. The FML graph must be a consistent polyhierarchical graph following the evolution of data structures:
Long, long ago, people stored data in lists, because that was all that was available. Then, someone came up with the idea of storing data in tables. So relational databases came along and people moved up the ladder to tables. A few years ago, XML came along so data moved up again to trees. Can you guess what will happen next? The Semantic Web folks want us to move to using graphs. Should we move to graphs? Seems to be the next logical step in information evolution. What’s holding us back? Well, it’s probably too soon. The world is still in the tree phase.
Any derivation sequence of the context-free grammar of an FML document will lead to a monohierarchy. But the FML graph is polyhierarchical. Constructions have to apply, since document grammar is different to document interpretation. A system of rules for the bidirectional transformation has to be defined. For both translation directions it must be possible to include relevant or to exclude irrelevant perspectives.
An FML instance has to be unique. As well the interpretation of an FML document by an FML graph and the transformation between grammar and graph. In particular the following requirements have to be considered:
All nodes of an FML graph must relate to corresponding components.
All encapsulated informations in the nodes of an FML graph must relate to grammar-units.
The bidirectional transformation must be unambiguous.
No loss of information during transformation. Each change in one of the FML instance representation forms leads to a predictable change in the other one.
For SGML/XML that requirement gets discussed with the terms unambiguous markup and deterministic content model [Brüggemann 1993]. Unambiguousness guarantees integrity and consistency of FML instances, secure interpretation and reliability during information exchange with FML.
A markup meta-language has no influence on the structured content. FML may not regulate content redundancies, but controls the syntactic rules for markup. Intensive use of markup increases document volume. The real content may shrink to only a fraction of the markup document.
Compared to XML the markup coding expenses have to be reduced according to the principle of syntactical minimalism. XML deals very lavishly with markup characters. An example is the syntax for comments:
Why 'waste' 7 symbols? That style of encoding is not primarily subject to the requirement high entropy or minimal redundance.
The utility of a language as a tool of thought increases with the range of topics it can treat, but decreases with the amount of vocabulary and the complexity of grammatical rules which the user must keep in mind. Economy of notation is therefore important. Economy requires that a large number of ideas be expressible in terms of a relatively small vocabulary.
An FML document shall use only a minimal and necessary amount of information content for markup control characters. On the other hand notation minimalism must avoid an extensive overload of symbols. Therefore FML will use a psychological optimum considering the requirements compatibility and ergonomics.
Historically the most important processor of documents is the reading and writing human [Fischer 2004].
Several kinds of markup (punctuation, decorative, orienting, structuring, informative, commenting and print markup) apply to documents for supporting the natural reading process. For the data-centric approach
markup has nothing to do with publishing, Brian Reid, the developer of Scribe, pointed out in the conclusion of Reid 1998. Now FML is a generalized and not a typographic markup language and respects the data-centric approach. Still the established document-centric ergonomic demands are transferable to the human-FML interaction:
Convenience: providing necessary and minimize dispensable functionality,
Usability: high quality, capacity and performance,
Perceivability: technological transparency, clarity and unambiguousness,
Readability: comprehensible documents and models,
Learnability: minimal training period,
Documentation: covering all information needs required by users.
Additional criteria apply for an implementation.
Generally, FML documents may be much more complex (more difficult to read and to process) than XML-documents, since for a human mind they are not as easy to capture, as monohierarchies are. In spite of that, authors have a higher degree of freedom: Documents may be marked up natively and without any synthetic constructs. Structure breaks do not have to be restored. Insertion tasks are less critical. Document-interpretation with an FML graph is transparent.
Another ergonomic requirement is proposed: the freestyle-criteria. The term freestyle indicates that the performer of an activity may determine the way of technical execution to a certain extent by himself. Also the performer is largely independent from restricting rules. This general definition [Wikipedia 2009] shall also apply when the performer is the author of an FML document and the activity is modelling and writing a document.
FML must support multilingual content and identifiers and therefore strictly use the Unicode character encoding UTF-8. Secure interchange and global encoding potentials will benefit from that stringent document condition. This way FML follows the distinct encoding tendency and the general encoding recommendation of the W3C:
[. . . ]to provide an unambiguous encoding of the content of plain text, ultimately covering all languages in the world, but also major text-based notational systems for science, technology, music, and scholarship.
The secondary writing direction will be inalterable vertical, from top to bottom. The primary writing direction is horizontal, distinguishing dextrograde and sinistrograde writing. This way also semitic languages are supported.
Furthermore FML has to provide a stable reference implementation and a concise specification.
The technological architecture of FML is outlined, interaction of the function units is illustrated:
Everything should be made as easy as possible, but not easier.
— Albert Einstein
Production rule: fml.document
No special handling of formatting symbols (as in XML) applies in the grammar (
trim may apply at the implementation level): All symbols, also clear spacing, tabulators,
wrappings, etc. are integral part of the content sequence. Only the markup initiating symbol
< has to be escaped properly:
Production rule: fml.content
global document informations with the optional attributes
fml.name: name of the document
fml.uri: unique URI of the document
fml.description: document description
fml.version: FML version number (
fml.fragment: fragment identification
fml.schema: physical URI of a validating schema
fml.writing-direction="lr"|"rl": determination of the writing-direction for external content processors
global informations about possible perspectives within the document with the optional attributes
fml.perspective.uri: unique URI of the perspective
fml.perspective.description: perspective description
fml.perspective.schema: physical URI of a validating schema (overrides global schema-assignment for this perspective)
global informations about possible namespaces within the document with the optional attributes
fml.namespace.name: unique name of the namespace (may apply as a tag-name-prefix)
fml.namespace.uri: unique URI of the namespace
fml.namespace.description: namespace description
Production rule: fml.prolog
The start-tag and the empty-tag optionally contain any number of ordered attributes. Each of them contains an unlimited number of ordered values, a data-list. Attribute assignment may as well apply for start-tags and empty-tags in a multiple-tag. The name of an assigned attribute must be unique within one tag. Language inherent attributes are
fml.trim: instruction to a parser about the treatment of formatting symbols during document interpretation
fml.segment.pos: position within the segment (shortcut: tag-suffix
Each tag is unambiguously assigned to a namespace. The name of the namespace precedes the name of the tag as a prefix. If there is no namespace assigned, then the tag is in the default-namespace. Also each tag is unambiguously assigned to a perspective (or default-perspective).
ID may be assigned as a suffix to a start-tag or to an end-tag (see concept identification). In the same perspective
pi, in the same namespace
nsj and for the same tag-name
IDs must be unique in the set of start-tags
TS[pi][nsj][tnk] and they must be unique in the set of end-tags
Production rule: fml.tag
A comment is a simple markup-component for the representation of any documentations. It has absolutely no structuring motive and produces no semantic dependencies. A comment may not explicitly be assigned to a perspective, but is a direct or indirect child of all perspectives.
Production rule: fml.comment
PIs are targeted by external processors and neither have a structuring motive nor they are of any relevant semantic for the language FML. Actually processing instructions are a procedural relict within a declarative markup language. A processing instruction may explicitly be assigned to another perspective than the default-perspective.
A processing instruction encapsulates two literals:
target for identifying the external processor and
instruction as a command for that processor.
Production rule: fml.pi
A wildcard is a simple placeholder for later substitution into one component of type tag, comment or processing instruction. Before substitution no structuring motive has to be assumed. A placeholder is just a declaration of intent. A wildcard may explicitly be assigned to another perspective than the default-perspective.
Production rule: fml.wildcard
fml.document = fml.prolog? (fml.content | fml.tag | fml.comment | fml.pi | fml.wildcard)* ;
fml.prolog = fml.prolog.document fml.prolog.perspective* fml.prolog.namespace* ;
fml.prolog.document = '<@' ('fml.name="' fml.attribute.value '"') (space 'fml.uri="' fml.attribute.value '"')? (space 'fml.description="' fml.attribute.value '"')? (space 'fml.version="' fml.attribute.value '"')? (space 'fml.fragment="' fml.attribute.value '"')? (space 'fml.schema="' fml.attribute.value '"')? (space 'fml.trim="' ('true'|'false') '"')? (space 'fml.writing-direction="' ('lr'|'rl') '"')? '>' linewrap ;
fml.prolog.perspective = '<@' ('fml.perspective.name="' fml.attribute.value '"') (space 'fml.perspective.uri="' fml.attribute.value '"')? (space 'fml.perspective.description="' fml.attribute.value '"')? (space 'fml.perspective.schema="' fml.attribute.value '"')? '>' linewrap ;
fml.prolog.namespace = '<@' ('fml.namespace.name="' fml.attribute.value '"') (space 'fml.namespace.uri="' fml.attribute.value '"')? (space 'fml.namespace.description="' fml.attribute.value '"')? '>' linewrap ;
fml.content = ( (UTF-8-character - '<') | '\<' )* ;
fml.tag = '<' (fml.tag.start | fml.tag.end | fml.tag.empty | fml.tag.multiple) '>' ;
fml.tag.start = (fml.perspective.name '|')? (fml.namespace.name ':')? fml.tag.name (fml.tag.id)? (fml.tag.segment)? (space fml.attribute)* ;
fml.tag.end = (fml.perspective.name '|')? '/' (fml.namespace.name ':')? fml.tag.name (fml.tag.id)? ;
fml.tag.empty = (fml.perspective.name '|')? (fml.namespace.name ':')? fml.tag.name (fml.tag.segment)? (space fml.attribute)* '/' ;
fml.tag.multiple = (fml.tag.start | fml.tag.end | fml.tag.empty) (fml.tag.start | fml.tag.end | fml.tag.empty)+ ;
fml.tag.name = fml.name ;
fml.tag.id = '#' fml.name ;
fml.tag.segment = '%' fml.segment.id '%' fml.segment.pos ;
fml.segment.id = fml.name ;
fml.segment.pos = number ;
fml.perspective.name = fml.name ;
fml.namespace.name = fml.name ;
fml.attribute = fml.attribute.name '="' fml.attribute.value '"' (',"' fml.attribute.value '"')* ;
fml.attribute.name = fml.name ;
fml.attribute.value = ( (UTF-8-character - '"') | '\"' )* ;
fml.comment = '<!' fml.comment.content '!>' ;
fml.comment.content = | UTF-8-character | (UTF-8-character? ( ('!' (UTF-8-character - '>') ) | ( (UTF-8-character - '!') UTF-8-character) )+ );
fml.pi = '<?' (fml.perspective.name '|')? fml.pi.target space fml.pi.instruction '>' ;
fml.pi.target = fml.name ;
fml.pi.instruction = ( (UTF-8-character - '>') | '\>' )+ ;
fml.wildcard = '<' (fml.perspective.name '|')? '>' ;
fml.name = ( (UTF-8-character - escape-symbols.exclude) | escape-symbols.include )+ ;
UTF-8-character = [U+0000 - U+FFFF] ;
space = U+0020 ;
linewrap = U+000A ;
number = ( '0' | '1', '2', | '3' | '4' | '5' | '6' | '7' | '8' | '9' )+ ;
escape-symbols.exclude = ( '>', '<', '\', '@', '?', '!', '/', '|', ':', '#', '%', space ) ;
escape-symbols.include = ( '\>' | '\<', '\\', | '\@' | '\?' | '\!' | '\/' | '\|' | '\:' | '\#' | '\%' | '\' space ) ;
The grammar displays that there are not many constraints regarding character usage for names and identifiers. Actually, pretty much cryptic and delicate literals are constructable. But FML takes up the position of maximum freedom and justifiable demands minimal restrictions.
One additional rule applies: Names and identifiers must not begin with 'fml.'. This literal is generally reserved for language inherent constructs. A lexical parser has to check this constraint within postprocessing.
Three states of validity will be distinguished:
A FML-document conforming strictly to the grammar of the language definition (see section “Grammar”) will be in the state wellformed, and thus, may be transformed (see transformation) into a corresponding FML-graph (see FML graph).
If an FML-document is wellformed and its corresponding FML-graph has exactly one root-element-node and none of the other element-nodes have more than one parent (monohierarchical structure) it will be in the state nested.
The validity status nested responds to the question, whether the new monohierarchy-suspending freestyle features Interference (see concept interference), Congruence (see concept congruence), Independence (see concept independence), or Segmentation (see concept segmentation) have been used or whether structural XML-compatibility exists.
Now the graph gets defined, graph-theoretic characteristics are outlined.
fml.node-type = 'fml.node.document'
fml.trim? = 'true' | 'false'
fml.writing-direction? = 'lr' | 'rl'
(fml.perspective.name, fml.perspective.uri, fml.perspective.description, fml.perspective.schema)*
(fml.namespace.name, fml.namespace.uri, fml.namespace.description)*
Virtual node underneath the document-node for the subordination of all components that belong to a particular perspective. Not the perspectives declared in the prolog apply, only the perspectives actually in use. A special case is the
default-perspective, that applies for all components that are not assigned to any perspective (also if the optional perspective-concept
is not in use at all). No content may get excluded: The nodes of the type fml.node.perspective include all nodes of the type fml.node.content as their direct or indirect children.
fml.node-type = 'fml.node.perspective'
The UTF-8 character-sequence between two markup-components forms a node of type fml.node.content. The content may also be empty (∅) as an insert position for further editing, or filled only with formatting characters (~). So these nodes represent the real content and are always leafs in an FML graph.
fml.node-type = 'fml.node.content'
Elements are the primary instrument for structuring a polyhierarchical FML graph. Two corresponding tags will form a node of type fml.node.element. This node directly or indirectly subordinates exactly to one perspective, may be a child of several elements, and may itself be parent of other elements. Elements content may be enriched by attributes.
fml.node-type = 'fml.node.element'
fml.element.autocompleted? = 'start-tag' | 'end-tag'
fml.element.trim? = 'true' | 'false'
As a child an attribute enriches a node of type fml.node.element. It contains an ordered list of content.
fml.node-type = 'fml.node.attribute'
Comments represent any documentation an have no motive to structure anything.
fml.node-type = 'fml.node.comment'
Processing instructions are addressed by external document processors and have no structuring motive.
Components: processing instruction
fml.node-type = 'fml.node.pi'
fml.node-type = 'fml.node.wildcard'
The edge-set exclusively consists of directed edges between two nodes that are arranged in a hierarchical relation: . Except of fml.node.document each node possesses at least one incoming edge. fml.node.content, fml.node.comment, fml.node.pi and fml.node.wildcard will never have outgoing edges.
Each edge is labeled with . For all outgoing edges of a node, numbering starts with
0 and increments without any gaps. That value indicates the position of the corresponding
components within the linear sequence of an FML document.
Parent-child-relations between node-types are possible with directed edges and the following cardinalities:
An FML graph has the following characteristics:
No loop road exists.
Each edge represents a parent→child relation.
At most one edge between two different vertices.
A least one path exists from the root node to any other node.
Edges are labeled with numbers.
Nodes encapsulate attribute-value pairs.
There is exactly on node with children and without any parents.
Nodes without children exist.
A direct and an indirect path between a parent and a child may never exist simultaneously. Redundant paths between two nodes may exist, but exclude direct connections.
Nodes may have multiple parents. Each node (except root-node) can be reached by at least one directed path from the root. Polyhierarchies are self-similar: Each subgraph is as well a polyhierarchy.
The following graph exemplifies a polyhierarchy and is in accordance with the graph-theoretic characteristics:
With GXL , graphs and graph-schemata can be described and exchanged. At http://www.freestyle-markup.org/gxl/fml-graph.schema.gxl a GXL -schema is presented, that formally describes FML graphs and may validate particular graph-instances.
There are 9 fundamental FML-concepts, that consider all requirements and use the features of FML documents and FML graphs. These concepts mainly arise from new possibilities in an unrestricted polyhierarchical environment.
According to the two general characteristics for markup (
[. . . ] part of the text, yet distinguishable from it.
Raymond 1993), annotations are embedded in the content and separable from it. The position of markup-components (tag, comment, processing instruction, wildcard) in the content-sequence is very impotant: Changing it leads to a totally different document interpretation,
a different FML graph. Markup embedding is based upon simple rules:
Markup section start is indicated by the familiar symbol "<".
Markup section end is indicated by the familiar symbol ">".
The prolog-components are optional but have to be well ordered and the first annotations in the document.
Informations about the document itself, perspectives and namespaces may be declared at the very beginning of an FML document. These declarations are optional and a support for authors, editors and external processors for the evaluation of the document. The attributes of each declaration are self-describing and relevant for all use cases of a declarative markup language. For document interpretation all declared informations will be lossless transformed into the attributed root of an FML graph, a node of the type fml.node.document.
The following example contains all of the optional declarations with all possible attributes.
For FML the primary instrument for structuring document content are tags: empty-tags and start-tags with corresponding end-tags. A corresponding tag-pair forms an element in an FML graph interpretation and subordinates the data sequence in between. Tagging and interpretation is handled about the same as in XML/SGML, except that there are less restrictions and more features.
A root-element is possible, but not necessary. Isolated tags (tags with a missing corresponding start- or end-tag) are alright: Missing start or end is assumed beyond document borders (see concept fragmentation).
Attributes refine tags and enrich them with additional informations. This specialization is realized by declaring an ordered list of attributes within start-tags or empty-tags in an FML document. Within one tag each attribute name must be unique. Therefore elements may possess an unlimited number of children, nodes of type fml.node.attribute. Each attribute again contains a non-empty ordered value-list.
According to the object-oriented inheritance paradigm FML offers a mechanism for passing on attributes from parents to children: An element
e1 will inherit from element
e1 is a direct or indirect child of
e0. Neither this concept has syntactical implications for an FML document nor it influences the transformation into an FML graph. Rather the inheritance concept reflects on the level of implementation: With the
Attribut Element.getAttributes(boolean enableInheritance) attributes are queried with or without inheritance, depending on the parameter
The following scenario illustrates concept and usage of inheritance.
f*and the specializing attributes
ucase. Four markups along these properties will result in the following document structure:
f0.getAttributes(false): style="u", color="r" f0.getAttributes(true): style="u", color="r" f1.getAttributes(false): color="b" f1.getAttributes(true): color="b", style="u" f2.getAttributes(false): ucase="t" f2.getAttributes(true): ucase="t", style="i","u", color="r" f3.getAttributes(false): style="i" f3.getAttributes(true): style="i"
The markup of interfering structures is not subject to any restrictions and demands
no special attention. Start- and endpositions of semantic segments are marked up separately
and independently. Resulting interferences are absolutely permitted. A content-segment
that is located in the markup-area of
n corresponding tag-pairs will directly or indirectly be a child of
n elements in an FML graph.
The following scenario illustrates the handling of overlapping. For clearness, the semantic markup segments are typographic ones.
FML content: Marking up the content
i), bold (
b) and red (
The concept Identification serves for the secure assignment of corresponding tags. Generally a start-tag and an end-tag will correspond, if they exhibit the same name in the same perspective and in the
same namespace. But if a document contains several start-tags of the same name or several end-tags of the same name, then the identification of corresponding tags will be ambivalent for element-construction. This interpretation uncertainty will
be dissolved with the concept identification: The assignment (applying tag-name-suffix
#IDi) of an unique identification key
IDi to both parts of a tag-pair will weld them and assure correspondency.
This concept is convenient for scenarios where complex structures with interferring markup occur and where the available tag-set has a high degree of semantic overload.
The following typographic example illustrates the use of the concept identification:
FML content: For the markup of the content
f) is available. The specialization of
fgets effected with the attribute
styleand its value-range
bfor bold and
rfor red. The resulting document structure is the following:
If we now markup the interfering structures without applying the concept identification
<f>forms with which
</f>an element?" must be determinable (actually
2! + 3! = 8element arrangements are possible).
The right result can be achieved applying the simple and intuitive concept identification:
The markup of congruent structures gets realized using indecomposable multiple-tags. A multiple-tag unites any number of start-tags, end-tags and empty-tags in an irrelevant order, not allowing any content in between.
The concept congruence ensures structural security and consistency. Placing tags one after another in a sequence results in unintended hierarchical relations and the feasibility to break congruence by inserting content between the tags.
The following scenario illustrates the markup of congruent structures:
FML content: Marking up the content
i), bold (
b) and red (
FML admits a consolidated markup of heterogeneous perspectives into one redundance-free
document. One perspective represents an identified annotation layer. The components
tag, processing instruction and wildcard may optionally be assigned to exactly one perspective. The assignment to a perspective
Pi with the unique name
Piname will be effected by inserting the prefix
Piname| right after the markup delimiter
<. This assignment concept applies for components, not for attributes or for each tag in a multiple-tag.
All of the perspectives in use will share the documents content. Components that are not explicit assigned to a perspective will be subordinated to the default-perspective. Thus each FML document contains at least one perspective.
Perspectives may optionally be declared in the prolog. Neither declared perspectives have to be in use in the document nor the perspectives in use in the document have to be declared in the prolog. For a document interpretation within an FML graph only the perspectives that are really in use will be transformed to a node of type fml.node.perspective. During transformation optionally any perspectives may get included or excluded.
I1 + I2:
I1 + I2:
The powerful concept segmentation offers the composition of new content based on segments of the existing content. A virtual element encapsulates the content of participating tags and will be subordinated to the corresponding perspective-node (fml.node.perspective) in an FML graph as a direct child with the constant position
-1. Since beside position and run length also quantity and order of involved segments
is arbitrary via the language inherent tag-attributes
fml.segment.pos, boundless composition possibilities arise from the charachter pool of the linear
content sequence of a document.
An intensive use of this concept will decrease readability, but for pure mechanical operating fulminant potentials arise. For instance the treasury of words of a language could without redundancy be entirely marked up based on only the alphabet characters of that language.
In the following example
A fragment of a wellformed FML document is also a wellformed FML document. As a consequence of that, start-tags or end-tags may also be arranged outside of the document borders. If a start-tag can not find a corresponding end-tag, then it may be assumed that it marks up to the very end of the document. For the purpose of document interpretation the missing end-tag will be inserted at the end of the document. If a end-tag can not find a corresponding start-tag, then it may be assumed that it marks up from the very beginning of the document. For the purpose of document interpretation the missing start-tag will be inserted at the beginning of the document. If several tags have to be added at the beginning or at the end of the document, then those are subsumed into one multiple-tag.
That automatic completion takes place during the transformation of an FML document to an FML graph. For an FML graph-node of type fml.node.element the attribute
fml.element.autocompleted = "start-tag" | "end-tag" indicates that automatic completion applies. The re-transformation to an FML document will reconstruct the fragmented document with isolated tags.
The main benefits of the concept fragmentation are, that documents can be serialized and exchanged fragmented, and that - according to the requirement entropy - a sparingly encoding is generally possible.
In the following example FML document1 has been marked up with 4 tags. Each tag is isolated. Automatic completion (row
) will lead to FML document2, which then may be interpreted in an FML graph.
During transformation, an FML document will be converted to an FML graph. Markup has to be transformed into nodes. Parent-child-relations between the nodes have to be recognized. The result is a graph that visualizes the polyhierarchical relations between content and markup. The graph will be clear without ambiguity and may be transformed back into exactly the same original document.
Lexical analysis: The FML-document gets checked for coding- and grammar-consensus.
Parsing: Transforming the FML-document into a graph representing the document grammar (see section “Grammar”).
Identification: Identifying corresponding tag pairs.
Tag→Element: Corresponding tag pairs will be transformed. At this point the term element (representing the major FML-graph-node) gets introduced.
Reorganization: Realization of various reorganization actions for building up a consistent FML-graph compliant to the definition (see FML graph).
Segment node: According to the segmentation, virtual segment-nodes will be inserted.
A milestone-based XML-representation for FML is available:
The evaluation of the FML requirements has been accompanied by a survey about the freestyle-concepts
As expected, the acceptance correlates with the clearness of the document and graph visualisation, and lies between 13-79 percent. The primary reasons for refusal or scepticism were missing XML-conformance and a lack of imaginable application scenarios for the unfamiliar concepts. That is understandable. But we have to point out that the four discussed concepts are totally optional. If document structures are simple enough, there will be no need to care about features beyond monohierarchies. Therefore, each single positive voice justifies markup evolution in general.
The first timeline illustrates how FML arranges proportional among the most established comparable markup technologies:
The second timeline shows how FML proportionally integrates among these approaches:
The following overview evaluates how the listed approaches refer to the FML requirements. Does the technology discuss the requirement aspect in a solution oriented way (+), or is the aspect off its research (-) ?
Many markup syntaxes other than XML have existed. Also various approaches for representing markup structures in data models. Why FML? Why should FML be different? First of all FML generally addresses more requirements at once, therefore provides more features. And second of all it is more than a thought experiment or an academic mind game. In particular the data-centric approach and the need of data modelers and software engineers is addressed. The development of FML initially began with a carefull reference analysis and the identification of appropriate requirements and broadly desired markup features. The development will end with a consistent specification and reference implementation. FML strives for the goal to provide completed and verified results, an implementation and specification of a base technology, ready for immediate software-integration.
For the near future the following actions are planned:
setting up a reference implementation for FML,
launching an editor for authoring FML-documents and visualizing FML-graphs,
integrating expected feedback,
proofing sufficiency with concrete scenarios,
maintenance of the project platform http://www.freestyle-markup.org,
conceptualization for secondary technologies:
Working with markup implicates far more than only serializing and deserializing between FML-documents and corresponding FML-graphs. Markup documents will be validated, transformed, scanned, extracted, edited, enriched, visualized, compressed, transported, archived, and processed in various ways. The situation is comparable to other theoretical foundations like Codd’s RDBMS:
relational databases required hundreds of thousands algorithmic innovations to make it work well
Basically, processing a polyhierarchical structure will cost more efforts than processing XML, since the structure is less regular. Secondary technologies have to be discussed intensely. Future work continues here.
The future of markup is an incontrovertible matter [Adler 2010].
The new and innovative technology
Freestyle Markup Language
The basis of its constructive development was given by the dominant de facto standard XML, various deficit-discourses, discussions about alternative solutions, as well as personal developer experiences in the areas single source publishing, cross-media content production, content management systems, and data modelling in general.
The objective of FML is to overcome current markup restrictions and to encourage the evolution of markup. In particular, the terminologies "overlapping markup", "concurring markup", "crossover markup", "multiple annotated markup", "multi-dimensional markup", and "multi-hierarchical markup" are addressed. FML acts with its contrary freestyle-approach against the monohierarchical "properly nested"-idea of the restrictive language XML and encourages an almost unlimited and unrestricted use of markup without any root- or hierarchy-bondage. Even corresponding tags may be omitted: If a semantic unit starts and applies to the rest of the document, just and only place a start-tag, if a semantic unit ends and applies from the beginning of the document, just and only place an end-tag. Do not care about nesting! Concentrate on semantically marking your content up!
The identified and within a survey evaluated requirements for FML have been discussed, freestyle-concepts were pictured, a formal document-grammar and graph-representation were presented, a XML-milestone-representation was defined, and future work has been outlined.
Scenarios for the usage of FML would mostly arise from systems, where a
transparent basis technology for an unambiguous interchange of data is required,
various perspectives across the same redundance-free document and
semistructured data have to be marked up.
The following areas of application would be predestinated for a cooperation:
knowledge data bases,
typographic description languages,
CMP, CMS, EDMS-applications,
multiple authoring systems,
revision control systems,
communication protocols and
Finally, a typographic marked up scenario, illustrated as a word cloud, will visualize the primary terminologies relevant to this study:
[Adler 2006] Sharon Adler, Roberta Cochrane, John F. Morar Alfred Spector:
Technical context and cultural consequences of XML. IBM SYSTEMS JOURNAL ’Celebrating 10 Years of XML’, 45(2):207–223, Jun. 2006, http://www.research.ibm.com/journal/sj45-2.html.
[Adler 2010] Sharon C. Adler:
Why is Markup Still Important after 30 years and Will it Still be Important in Another
30. XML Prague 2010 - Conference Proceedings, Prague, Mar. 2010, http://www.xmlprague.cz/2010/presentations/Sharon_C_Adler_Why_is_Markup_Still_Important_after_30_years_and_Will_it_Still_be_Important_in_Another_30.pdf.
[Brüggemann 1993] Anne Brüggemann-Klein:
Formal Models in Document Processing. Habilitation, Institut für Informatik, Universität Freiburg, 1993, ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps.
[Cajori 1928] Florian Cajori:
Cajori's A History of Mathematical Notations: Notations in Elementary Mathematics. Open Court, 1928.
[Carletta 2003] Jean Carletta, Jonathan Kilgour, Timothy J. O’Donnell, Stefan Evert, Holger Voormann:
The NITE Object Model Library for Handling Structured Linguistic Annotation on Multimodal
Data Sets. In: Proceedings of the EACL Workshop on Language Technology and the Semantic Web
(3rd Workshop on NLP and XML, NLPXML-2003), Apr. 2003, http://www.ims.uni-stuttgart.de/projekte/corplex/paper/evert/CarlettaEtal2003.pdf.
[Chatti 2007] Noureddine Chatti, Souha Kaouk, Sylvie Calabretto, Jean-Marie Pinon:
MultiX: an XML-based formalism to encode multi-structured documents. In: Proceedings of Extreme Markup Languages, Montréal, Aug. 2007.
[Collins 1969] A. Collins, M. Quillian:
Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8(2):240–247, Apr. 1969, doi:https://doi.org/10.1016/S0022-5371(69)80069-1.
[Coombs 1987] James H. Coombs, Allen H. Renear und Steven J. DeRose:
Markup systems and the future of scholarly text processing. Commun. ACM, 30(11):933–947, 1987, http://www.fdi.ucm.es/profesor/jlsierra/e-learning/primera-sesion/MarkupSystems.pdf, doi:https://doi.org/10.1145/32206.32209.
[DeRose 1990] Steven J. DeRose, David G. Durand, Elli Mylonas, Allen H. Renear:
What is text, really?, Journal of Computing in Higher Education, 1(2):3–26, Dec. 1990, http://antonietta.philo.unibo.it/IUcorso2006-07/risorse/ohco.pdf, doi:https://doi.org/10.1007/BF02941632.
[Durand 1996] David G. Durand, Elli Mylonas, Steven J. Derose:
What Should Markup Really Be? Applying theories of text to the design of markup systems. 1996, http://xml.coverpages.org/DurandWhatShouldTextBe-ALLC1996.pdf.
[Durusau 2002] Patrick Durusau, Matthew Brook O’Donnell:
Coming down from the trees. Next step in the evolution of markup?. In: Proceedings of Extreme Markup Languages, Montréal, Aug. 2002.
[Ernst 2009] Daniel J. Ernst, Daniel E. Stevenson, Paul J. Wagner:
Hybrid and custom data structures: evolution of the data structures course. SIGCSE Bull, Vol. 41, Nr. 3, 2009, pages 213-217, http://www.cs.uwec.edu/~ernstdj/Papers/hybrid.pdf, doi:https://doi.org/10.1145/1595496.1562945.
[Firesmith1995] Donald Firesmith:
Inheritance Guidelines. JOOP - Journal of Object-Oriented Programming, 8(2):67–72, 1995.
[Fischer 2004] Steven Roger Fischer:
A History of Writing. Reaktion Books, Apr. 2004.
[Freytag 2007] Asmus Freytag, Martin Dürst:
Unicode in XML and other Markup Languages. W3C Note, Mai 2007, http://www.w3.org/TR/2007/NOTE-unicode-xml-20070516.
[Goldbarth 2003] Albert Goldbarth:
Pieces of Payne: A Novel. Graywolf Press, Apr. 2003.
[Guitton 2010] Pedro Guitton:
A homage to Typography. Gingko Press, Barcelona, Spain, Apr. 2010.
[Hilbert 2005] Mirco Hilbert, Andreas Witt, Oliver Schonefeld:
Making CONCUR work. In: Proceedings of Extreme Markup Languages, Montréal, Aug. 2005.
[Huitfeldt 2001] Claus Huitfeldt, C. M. Sperberg-McQueen:
TexMECS: An experimental markup meta-language for complex documents. Jan. 2001, http://decentius.aksis.uib.no/mlcd/2003/Papers/texmecs.html.
[Ide 2006] Nancy Ide und Keith Suderman:
Integrating linguistic resources: The american national corpus model. In: Proceedings of the 6th International Conference on Language Resources and Evaluation,
[Jagadish 2004] H. V. Jagadish, Divesh Srivastava, Laks V. S. Lakshmanan, Monica Scannapieco, Nuwee
Colorful XML: One hierarchy isnt enough.. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of
Data, Paris, Jun. 2004, pages 251–262, http://www.eecs.umich.edu/db/timber/files/mct.pdf, doi:https://doi.org/10.1145/1007568.1007598.
[Orlowski 2003] Andrew Orlowski:
Bruce Lindsay on Codd’s relational legacy. The Register, Apr. 2003, http://www.theregister.co.uk/2003/04/25/bruce_lindsay_on_codds_relational.
[Prescod 2000] Paul Prescod:
From markup to object model. The XML abstraction problem and XML property objects. XML EUROPE 2000, 6 2000, ISOGEN Inc., Paris, France, http://www.gca.org/papers/xmleurope2000/pdf/s12-02.pdf.
[Raymond 1993] Darrell R. Raymond, Frank W. Tompa:
Markup reconsidered. Technical Report 356, Department of Computer Science, The University of Western
Ontario, 1993. Presented at the First International Workshop on the Principles of
Document Processing, Washinton DC, Oct. 21-23, 1992, http://www.oasis-open.org/cover/raymmark.ps.
[Renear 1993] Allen Renear, Elli Mylonas und David Durand:
Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies. In: N. Ide und S. Hockey (publisher): Research in Humanities Computer, Oxford University
Press, 1993, http://www.ideals.illinois.edu/bitstream/handle/2142/9407/RefiningOurNotion.pdf.
[Schmidt 2009] Desmond Schmidt, Robert Colomb:
A data structure for representing multi-version texts online. Int. J. Hum.-Comput. Stud., 67(6):497–514, 2009, http://www.itee.uq.edu.au/~schmidt/_articles/elsevier.pdf, doi:https://doi.org/10.1016/j.ijhcs.2009.02.001.
[Sperberg-McQueen 2000] C. M. Sperberg-McQueen, Claus Huitfeldt:
GODDAG: A Data Structure for Overlapping Hierarchies. In: DDEP/PODDP, Springer, Sep. 2007, pages 213-217, http://decentius.aksis.uib.no/mlcd/2000/Papers/Goddag.pdf.
[Sperberg-McQueen 2007] C. M. Sperberg-McQueen:
Representation of overlapping structures. In: Proceedings of Extreme Markup Languages, Montréal, Aug. 2007, http://conferences.idealliance.org/extreme/html/2007/SperbergMcQueen01/EML2007SperbergMcQueen01.html.
[Sperberg-McQueen 2008] C. M. Sperberg-McQueen, Claus Huitfeldt:
Markup Discontinued: Discontinuity in TexMecs, Goddag structures, and rabbit/duck
grammars. In: Proceedings of Balisage: The Markup Conference 2008, Montréal, Aug. 2008, doi:https://doi.org/10.4242/BalisageVol1.Sperberg-McQueen01.
[Sperberg-McQueen 2009] C. M. Sperberg-McQueen, C. Michael, Lou Burnard:
TEI P5: Guidelines for Electronic Text Encoding and Interchange. The Text Encoding Initiative Consortium, Jul. 2009, http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf.
[Stührenberg 2006] Maik Stührenberg, Andreas Witt, Daniela Goecke, Dieter Metzing, Oliver Schonefeld:
Multidimensional Markup and Heterogeneous Linguistic Resources. In: Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional
Markup in Natural Language Processing., Trento, Italy, Apr. 4, 2006, http://www.aclweb.org/anthology/W/W06/W06-2715.pdf.
[Stührenberg 2009] Maik Stührenberg, Daniel Jettka:
A toolkit for multidimensional markup: The development of SGF to XStandoff. In: Proceedings of Balisage: The Markup Conference 2009, Montréal, Aug. 2009, doi:https://doi.org/10.4242/BalisageVol3.Stuhrenberg01.
[Tennison 2002] Jeni Tennison, Wendell Piez:
The Layered Markup and Annotation Language (LMNL). In: Extreme Markup Languages Conference, Montréal, Aug. 2002, http://xml.coverpages.org/LMNL-Abstract.html.
[Wikipedia 2009] Wikipedia:
Freistil — Wikipedia, Die freie Enzyklopädie. Online, 2009, http://de.wikipedia.org/w/index.php?title=Freistil&oldid=58624703 [online as at 5.7.2010].
[Winter 2002] Andreas Winter, Bernt Kullbach, Volker Riediger:
An Overview of the GXL Graph Exchange Language. Lecture Notes in Computer Science, 2269:324–337, 2002, http://www.gupro.de/GXL/Publications/winter+2002.pdf, doi:https://doi.org/10.1007/3-540-45875-1_25.
[Witt 2002] Andreas Witt:
Meaning and interpretation of concurrent markup. In: ALLC/ACH 2002, New directions in humanities computing. The 14th Joint International
Conference of the ALLC and ACH., pages 145–147. Tübingen, 2002, http://xml.coverpages.org/Witt-allc2002.html.
[Witt 2004] Andreas Witt:
Multiple hierarchies: new aspects of an old solution. In: Extreme Markup Languages Conference, Montréal, Aug. 2004, http://conferences.idealliance.org/extreme/html/2004/Witt01/EML2004Witt01.html.
[Witt 2007] Andreas Witt:
Guideline: Multiple Hierarchies. In: Digital Historical Corpora - Architecture, Annotation and Retrieval (Dagstuhl
Seminar Proceedings), 2007, http://drops.dagstuhl.de/volltexte/2007/1040/pdf/06491.WittAndreas.Paper.1040.pdf.
 UTF-8 is used by 55.3% of HTML/XHTML-websites whose character encoding is known [http://w3techs.com/technologies/overview/character_encoding as at 5.7.2010].
 Geometrical shape, foreground, background, font, text (labeling is optional and may be shortened arbitrarily by '...'-substitution).
 Notation: UML-multiplicity with
 Notation: UML-multiplicity with
 Backslash escaping instead of character entities has been choosen as a design choice of question. Character entities are cryptic and not necessary in an UTF-8-document (FML-documents must be UTF-8 encoded!). Developers commonly prefer the accepted escape character '\'.
 That is the only known german true pangram.
 This first palindromic sentence in English is credited to John Taylor (1580 - 1653). It uses an acceptable 17th century spelling of dwell.
 The tabular classification gives a basic overview, but only pretends an absolute assignment: Each markup approach has to be distinguish for itself individually.
 This binary classification is not a judgment, just a subjective appraisal in the in the authors discretion.