The Concrete Syntax of Documents: Purpose and Variety

Mary Holstege

Abstract

In the mid-eighties a group at Stanford built the MUIR language-development environment as a system for notation design with rendering and layout from the abstract syntax, parsing from concrete syntax, and semi-automated transformation between language variants. We developed models for representing documents at all levels and understanding how the levels relate to one another.

Presentation widgets have a purpose: to convey specific abstract syntax relationships. Having an account of what kinds of widgets there are, what kinds of abstract relationships there are, and how the two connect allows for an analysis of how the notation works as a whole. The concept of "notation" taken here is a broad one, encompassing programming or technical notations as well as the form of structured documents of various kinds.

Notation designers can apply such an analysis to improve their designs so that the structure is more clearly conveyed by the concrete syntax or so that humans can more readily use the notation without confusion. Software can render or parse instances of notations using rules that capture the concrete syntax, the abstract syntax, and the rules between them in a declarative form.

Introduction

The Muir system (Winograd87) was built in the mid-eighties as a language development environment: to support parsing and rendering of samples of a programming language under development, but to also support changes in language design over time, and the re-rendering of scores of examples given changes to that design. As such, traditional parsing technologies were not up to the task, as details about keywords and ordering are intertwined. In addition, the motivating language relied on the use of nested space and tabular presentations for many effects. Again, traditional parsing technologies were not up to the task, and considerations that were normally part of document layout design came into play.

What did we learn building this system?

Separate the abstract syntax from the concrete syntax (aka separate presentation from structure). For this system this was almost a prerequisite, as the concrete syntax changed constantly.
Run rules both ways: from form to structure via parsing and from structure to form via rendering.
Language versioning is a form of language translation, differing mainly in degree.
Language versioning entails transformations of examples and the application of new rules. Some inference or human guidance may be required for difficult cases (see Normark87).
Use an abstract syntax specification that distinguishes rules that would manifest in the structure from rules that allow better organization of the grammar itself. In BNF, organizational non-terminals end up in the parse tree, but when the language is changing, this becomes clutter that makes change and language-to-language transformation more difficult. The Muir system used a kind of operator-phylum grammar derived from similar grammars in Donzeau-Gouge80 and Notkin86.
Separate the type of abstract syntax unit from its (named) role within its parent construct. The extensions made to the operator/phylum grammar gave us this, and allowed us to target specific subcomponents with rules.
Separate concrete syntax into distinct mini-rules. For change, it is better to pin keywords to specific pieces of the abstract syntax in separable chunks.
Presentation order relates closely to layout across space, and for non-lists is an aspect of concrete syntax that can change from one language version to another.
Bootstrap from a self-describing meta-grammar (Holstege87). This allowed us to change the rules driving the system itself more easily, using the system itself.

Holstege89 grew out of work on the Muir system, and extended the ideas to provide an account for documents of various kinds. It extended the model with several key notions:

Presentation widgets have a purpose: to convey specific abstract syntax relationships. Knowing the purpose of presentation widgets allows for analysis of a notation as a whole.
The introduction of an abstract geometry as well as a concrete geometry, having a similar relationship to each other as abstract syntax and concrete syntax do. The abstract geometry describes an abstract partitioning of 2½D space and provides a target for certain relationships. Concrete geometry handles physical constraints.
Various presentational effects can be analyzed as the fracturing of an abstract geometric space due to the constraints of the concrete geometry and the rendering of the content within its constraints.
Extension of the operator/phylum grammar for the abstract syntax rules to allow for cross-classifications by more than one phylum, partial inheritance of structural components from phyla to operators, and the introduction of a special "unmarked" operator.
A taxonomy of concrete syntax functions. These relate directly to the fairly small set of basic abstract syntax relationships.
A taxonomy of concrete syntax mechanisms. There is a wide variety of concrete syntax widgets, more if non-textual media are considered. The taxonomy provides for some way of organizing the madness, and making reasoned decisions about how mechanisms and functions relate in notational systems.
Use of case frames and marking theory as an inspiration for some aspects of the model.

The overarching theme of the work is that notations are systems that can be understood in relation to other notations and conventions. Furthermore, "notation" can be taken very broadly indeed, to encompass not only programming languages, such as C++ or DL (the original target of the Muir system), but also layout conventions in natural language documents of various kinds such as restaurant menus, dictionaries, and research papers.

In this paper I will look at that model. Rather than attempt to recap an entire thesis in detail in one short paper, I will focus on the taxonomies and their application to understanding notational systems. A brief overview of the overall model comes first, to provide the necessary context.

Model Overview

The gist of the model is that there is content (the flow of text and such) which is structured hierarchically (with some amount of cross-referencing) according to the rules of the abstract syntax. On the other hand there is the abstract geometry, which partitions space into hierarchical regions according to a set of abstract geometry rules (very similar to abstract syntax rules). Concrete syntax rules define how the content is annotated and partitioned into the abstract space. Given a particular concrete geometry (e.g. a specific page size) and a particular concrete rendering of the content into the abstract spaces, layout rules come into play, as well as rules for fracturing content groups that won't fit into their allocated space.

Running the rules the other way is, of course, rather more difficult, and may involve some amount of inference, depending on how well the notational system is put together. Given a formatted text, and knowledge of the rules of space and annotations, one can recover the marked text from the formatted text and the structured text from the marked text, because that is the point of those presentational devices. The hard part in practice, when applied to human texts, is (a) knowing what the rules are, because they differ from document to document and (b) handling ambiguities that humans are more tolerant of or that humans are indifferent to.
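To make the idea of running one set of rules in both directions concrete, here is a minimal sketch in Python. It is not the Muir implementation: the rule format (an ordered list of role and keyword pairs) and the names CONDITIONAL_RULES, mark, and parse are assumptions made for illustration. The same declarative rules drive marking (structure to marked text) and parsing (marked text back to structure).

```python
import re

# Hypothetical concrete syntax rules for one construct: an ordered list of
# (role, keyword) pairs. Each keyword is a prefixing mark for its role.
CONDITIONAL_RULES = [
    ("condition", "if"),
    ("consequence", "then"),
    ("alternative", "else"),
]

def mark(parts, rules):
    """Structure -> marked text: emit each role's keyword, then its content."""
    return " ".join(f"{keyword} {parts[role]}" for role, keyword in rules)

def parse(text, rules):
    """Marked text -> structure: use the same keywords to split the roles out."""
    pattern = r"\s+".join(
        rf"{re.escape(keyword)}\s+(?P<{role}>.+?)" for role, keyword in rules
    ) + r"$"
    match = re.match(pattern, text)
    return match.groupdict() if match else None

parts = {"condition": "x > 0", "consequence": "y := 1", "alternative": "y := 2"}
text = mark(parts, CONDITIONAL_RULES)
recovered = parse(text, CONDITIONAL_RULES)
```

The round trip works here only because the keywords never occur inside the content. Nested constructs, or content that happens to contain a keyword, are exactly the kind of ambiguity that makes the reverse direction harder than the forward one.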

Abstract syntax rules define the kinds of logical structures there are: programs, statements, and expressions or articles, paragraphs, and figures. Cross-classifications may define secondary organizations as well. Abstract syntax rules define the composition of logical structures as either lists of the same kind of logical component, or as a group of named subunits of various kinds: condition:expression, consequence:statement, alternative:statement. Some of the subunits are identified as references to logical units of some kind: reference to section, location of figure.
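The following sketch shows one way such rules could be represented, assuming a much simplified operator/phylum grammar without lists, cross-classification, or references. The phyla, operators, and the names PHYLA, OPERATORS, Node, and is_valid are illustrative assumptions, not the grammar used in Muir. Each operator lists its named roles and the phylum required in each role, so the role (condition, consequence) stays separate from the type of the unit that fills it.

```python
from dataclasses import dataclass, field

# Hypothetical phyla: each names a class of interchangeable operators.
PHYLA = {
    "expression": {"variable", "comparison"},
    "statement": {"conditional", "assignment"},
}

# Hypothetical operators: each maps its named roles to the phylum required there.
OPERATORS = {
    "variable": {},
    "comparison": {"left": "expression", "right": "expression"},
    "assignment": {"target": "expression", "value": "expression"},
    "conditional": {
        "condition": "expression",
        "consequence": "statement",
        "alternative": "statement",
    },
}

@dataclass
class Node:
    operator: str
    parts: dict = field(default_factory=dict)  # role name -> Node
    text: str = ""

def is_valid(node):
    """Check that a tree uses known operators, fills exactly the declared
    roles, and puts an operator of the right phylum in each role."""
    roles = OPERATORS.get(node.operator)
    if roles is None or set(node.parts) != set(roles):
        return False
    return all(
        part.operator in PHYLA[roles[role]] and is_valid(part)
        for role, part in node.parts.items()
    )

x = Node("variable", text="x")
test = Node("comparison", {"left": x, "right": x})
assign = Node("assignment", {"target": x, "value": x})
good = Node("conditional", {"condition": test, "consequence": assign, "alternative": assign})
bad = Node("conditional", {"condition": assign, "consequence": assign, "alternative": assign})
```

Note that the phyla appear only in the grammar, never in the tree: a node records its operator and its role within the parent, and nothing else.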

Abstract geometric rules define the kinds of logical spaces there are: book, page, line. They define how those spaces are subdivided: a book consists of a list of pages, a page consists of a header, a footer, and a body.

Concrete syntax rules bind the two together and define what other presentational devices come into play.

Figure 1: Overview

      BASE TEXT (CONTENT) ⟲ rethink
         ⭣       ⭡
   formulate  interpret           ⟸ ABSTRACT SYNTAX ⟲ change structure
         ⭣       ⭡
     STRUCTURED TEXT ⟲ reformulate
         ⭣       ⭡
        mark  parse               ⟸ CONCRETE SYNTAX ⟲ change marking
         ⭣       ⭡
       MARKED TEXT ⟲ hand mark
         ⭣       ⭡
      format  tokenize            ⟸ ABSTRACT GEOMETRIC ⟲ change layout
         ⭣       ⭡
      FORMATTED TEXT ⟲ hand layout
         ⭣       ⭡
      render  decode              ⟸ MEDIA RULES ⟲ change media spec
         ⭣       ⭡
     CONCRETE DOCUMENT ⟲ redraw

Rules of various sorts mediate processing between layers. Editing at various levels circumvents or augments those rules.

Taxonomy of Marking Functions

Marking functions are broken into broad classes based on their significativity, which defines their standing with respect to logical units in the abstract syntax structure.

Identifying

An identifying mark is significative, standing for the logical unit itself. Such marks are used to either label or reference a specific logical unit. For example, "§3.4.1" references a particular subsection of a document from elsewhere within it.

Label

A label defines a unique logical unit.

Labels may be some name component of that logical unit, or may involve counters of some kind (e.g. "Table 22").

Cross-reference

A cross-reference captures a long-distance dependency. References are references to something of a specific kind. References may be normal references, location references (referring to the geometric space in which the referred component is placed), or indexed references (referring to some index count relative to a scoping logical unit).

Cross-references cut across the hierarchical organization and linear flows: "$definedVariable", "see section 12.1".

Structural

A structural mark is parasignificative, standing for some characteristic of the logical unit. Such marks are key to highlighting (or inferring, if you run the rules the other way) the logical structure of the document. Structural marks indicate the class of a logical unit or the case relationship between a logical unit and its parent. For example, "Education:" on a resume indicates the section of a resume containing information about degrees attained. The "else" in a conditional statement indicates the alternative part of the statement.

Type mark

A type mark identifies the logical type or class of a logical unit. An empty type mark is a special case. It is used for an empty or absent component.

The use of the word "Figure" to label a figure in a document is a type mark. The keyword "class" in C++ class declarations is a type mark.

Case mark

A case mark indicates composition of logical units into their subparts. An empty case mark is a special case. It is used for an empty or absent subcomponent.

Keywords such as "then" (C, etc.) or "where" (SQL, etc.) are case marks. They indicate not what kind of thing follows, but what its relation to the parent construct is.

Coherency

A coherency mark is non-significative, not standing for a logical unit at all. Such marks serve to bind or separate logical units, distinguishing them visually. Separation and binding are duals: binding one group necessarily separates it from other groups. For example, the lines around a table provide a boundary to contain the contents of the table and to distinguish it from its surroundings. Fracture marks are a special kind of binder, to handle situations where a group would be broken across concrete spaces due to limitations in the concrete geometry. Extra space at the bottom of a page to prevent a table from breaking is one example of a fracture mark. A discretionary hyphen at the end of the line to indicate a word break is another. So is the reduction in font size for an entry in a table that won't fit in the space allocated for a column.

Separator

A separator creates some visual break between the logical unit and others.

Whitespace is commonly used as a separator. In XQuery, commas separate items in a sequence.

Binder

A binder creates visual unity among a group of logical units. A fracture mark is a special case. It is used when concrete geometry limitations force a group to break into a new concrete space.

Boxes, lines, and changes in background colour often serve as binders in human documents. In programming notations, binders usually fall into the informal practice, although some cases exist in the formal notation: the use of "BEGIN" and "END" to mark statement blocks in Pascal, for example.

Affective

Affective marks have a non-functional role, serving to set a tone or style for the document as a whole, or relate it to other documents by its similarity or contrast with them. For example, the choice of particular font family for the text in a document usually serves only a stylistic purpose. It may be possible to analyze affective marks as having functional roles in relating different documents or different kinds of documents, but that is outside the scope of this model.

Style Mark

A style mark sets default choices, defining the standard baseline against which all other marks are set in contrast.
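The taxonomy of marking functions can be summarized as a small lookup table. The sketch below is only a restatement of the classes above in Python; the names MarkClass, MARKING_FUNCTIONS, EXAMPLES, and class_of are assumptions introduced for illustration, and the examples are the ones given in the text.

```python
from enum import Enum

class MarkClass(Enum):
    IDENTIFYING = "significative: stands for the logical unit itself"
    STRUCTURAL = "parasignificative: stands for a characteristic of the unit"
    COHERENCY = "non-significative: binds or separates logical units"
    AFFECTIVE = "non-functional: sets a tone or style"

# Each marking function belongs to exactly one broad class.
MARKING_FUNCTIONS = {
    "label": MarkClass.IDENTIFYING,
    "cross-reference": MarkClass.IDENTIFYING,
    "type mark": MarkClass.STRUCTURAL,
    "case mark": MarkClass.STRUCTURAL,
    "separator": MarkClass.COHERENCY,
    "binder": MarkClass.COHERENCY,
    "style mark": MarkClass.AFFECTIVE,
}

# Examples from the text, tagged with the function they serve.
EXAMPLES = {
    "Table 22": "label",
    "see section 12.1": "cross-reference",
    "the word Figure on a figure": "type mark",
    "the keyword then": "case mark",
    "whitespace between items": "separator",
    "BEGIN and END around a block": "binder",
    "the document font family": "style mark",
}

def class_of(example):
    """Look up the broad class of the function an example mark serves."""
    return MARKING_FUNCTIONS[EXAMPLES[example]]
```

For instance, class_of("the keyword then") yields MarkClass.STRUCTURAL, since a case mark signals the relation of a unit to its parent rather than standing for the unit itself.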

Taxonomy of Marking Mechanisms

Marking mechanisms (or, for brevity, marks) are broken into broad classes based on their scriptality, which defines their standing with respect to script elements, and their lexicality, which defines their standing with respect to the characters forming the base of the notation. Lexical marks consist of lexical units (characters) themselves. Paralexical marks are co-occurrent with the lexical units, but not themselves lexical. Non-lexical marks use relationships between lexical units or other means not involving lexical units: arrangement in space, for example.

These dimensions are not entirely independent: a non-scriptal mark must be non-lexical, under the assumption we are describing a lexically based visual notation. Similarly, a scriptal mark cannot be paralexical: it either introduces a lexical element or it does not.
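The dependency between the two dimensions can be checked by enumeration. The sketch below, with the assumed names SCRIPTALITY, LEXICALITY, allowed, and VALID, lists the combinations that survive the two constraints just stated. The six survivors correspond to the mechanism classes described in the subsections that follow: pure and symbolic punctive marks, the three kinds of prosodic mark, and relational marks.

```python
from itertools import product

SCRIPTALITY = ["scriptal", "parascriptal", "non-scriptal"]
LEXICALITY = ["lexical", "paralexical", "non-lexical"]

def allowed(scriptality, lexicality):
    """Apply the two constraints relating the dimensions."""
    if scriptality == "non-scriptal":
        # A non-scriptal mark must be non-lexical.
        return lexicality == "non-lexical"
    if scriptality == "scriptal":
        # A scriptal mark cannot be paralexical.
        return lexicality != "paralexical"
    return True

VALID = [(s, l) for s, l in product(SCRIPTALITY, LEXICALITY) if allowed(s, l)]
```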

Punctive

Punctive marks add scriptal elements to the marked logical unit. They may be pure or symbolic, depending on whether the added element is lexical or non-lexical. For example, a question mark is a pure punctive mark. A logo marking the bottom of every page would be a symbolic punctive mark. Punctive marks are generally what people talk about when they talk about concrete syntax rules: what are the keywords?

Insertion

An insertion marks a logical unit by introducing a concrete mark to stand in place of it.

The use of "null" to stand for an empty list is an insertion.

Adjoinment

An adjoinment marks a logical unit by adding a concrete mark next to it, in some direction. Direction may be absolute (e.g. left, down) or relative to the prevailing direction (pre, super). The adjoinment creates a new group, consisting of the marked element and the concrete mark. Different groups of this kind may have different strengths, which may be specified.

Keywords in programming language notations are frequently adjoinments. The angle brackets in XML are adjoinments. Starting each item in a numbered list with a counter is an example of a prefixing adjoinment.

Lining

Lining is the adjoinment of an extended or repeated mark. In the abstract geometry of the model, it is placed not within a box, but within the margin of the box. The stretch and shrink of the margin carries the mark with it. In addition, since the mark is in a margin, it is inseparably cohesive with the contents of the box and cannot be subject to fracturing.

Underlining the words in a title is an example of lining in the down direction.

Prosodic

Prosodic marks are parascriptal, altering the appearance of existing scriptal elements. These tend not to see much action in the context of programming language notations, but play a significant role in structuring human documents. Italics, indentation, uppercase letters: these kinds of marks see a great deal of use. That said, even in programming languages as practiced, "stylistic rules" involving the use of space and font apply. You doubt me?

Lexical	Lexical prosodic marks use character functions to map one lexical element to another. For example, rendering clickbait headlines in uppercase letters would be a lexical prosodic mark.
Intonational	Pure intonational marks are paralexical, changing the rendering of the marked logical unit by substituting different character glyphs. Many common font effects, such as size, boldness, or colour are pure intonations.
Positional	Positional intonational marks are non-lexical, using a local variation in the positioning of the marked logical unit relative to the normal positioning. A positional mark affects the attributes of the abstract geometric box into which the marked item is placed. There are several different kinds of positional intonations, depending on which attribute is being affected. (See below.)
Use Function	A function defined in terms of combinations of marks can be applied in a way that acts a lot like an intonation.

Boxes have a variety of properties that positional intonations may affect: their orientation, the direction of text flow, their internal and external alignments, their size, and the size, stretch, and shrink of their margins.

Reorientation	Reorientation changes the orientation of the text, for example, from horizontal to vertical. Reorientations are uncommon, although they may occur as fracture marks. The mathematical choice operator uses a reorientation to vertical, for example:
Redirection	Redirection changes the direction of the text flow, for example, from forwards to reverse. Boustrophedon writing can be analyzed as a redirection used as a fracture mark on the line, for example.
Realignment	Realignment may be internal, shifting the contents of a box with respect to its boundary, or external (reframing), shifting the contents of the box with respect to the box's neighbours. Realignments are defined by changes to the appropriate reference point. For example, matrix subscripts in "M_ij" are a lower reframing.
Repadding	Repadding is a change to the size or extensibility of the margins of the box containing the marked item. Such effects may be subtle if there is substantial stretch and shrink to make up for the difference. Indentation of the start of a paragraph is one example of repadding. The reduction in space between items in certain kinds of lists is another.
Reshaping	Reshaping is a change to the size and shape of the box containing the marked item. Expanding column sizes in a table to exactly fit the contents is a reshaping.

Relational

Relational marks are non-scriptal and therefore non-lexical. They work somewhat indirectly. For programming language notations, typically only ordering gets much use, although there are some that do rely on placement (with respect to lines).

Placement

Placement is the encapsulation of a logical group into a box in the abstract geometry. Placement can be into a simple box, into a named subbox, or into a box which has subboxes. The group will fill the box (or subbox), subject to fracture rules. Placement into a box with subboxes will fill each subbox in turn.

Placement is the basic binding to the abstract geometry, and is ubiquitous.

Ordering

Ordering defines the relative position of a logical unit with respect to its parent in the text flow. List operators have an intrinsic order, although in rare circumstances this may be perturbed.

The notational difference between a do loop and a while loop is, under this account, both a difference in keywords (adjoinments) and a difference in ordering (condition before statement vs statement before condition).

Rebinding

Groups have an inherent cohesiveness that comes into play when fracturing occurs. Adjoinments create groups. Each operator defines a group implicitly. Rebinding changes the relative cohesiveness of a group. Strengthening the binding of a group reduces the chance of it being broken due to constraints of the concrete geometry. Weakening the binding increases that chance.
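A rough sketch of how binding strength could guide fracturing follows. It assumes a one-dimensional case (text broken into lines of limited width) and a simple greedy strategy that always breaks at the weakest join first. The function name fracture, the numeric strengths, and the strategy itself are illustrative assumptions rather than the layout algorithm of the model.

```python
def fracture(units, strengths, width):
    """Split units into lines no wider than width, breaking the weakest join first.

    units: list of strings to lay out in order.
    strengths[i]: cohesion of the join between units[i] and units[i + 1].
    A single unit wider than width is left on a line of its own.
    """
    text = " ".join(units)
    if len(text) <= width or len(units) == 1:
        return [text]
    weakest = min(range(len(strengths)), key=strengths.__getitem__)
    left = fracture(units[: weakest + 1], strengths[:weakest], width)
    right = fracture(units[weakest + 1 :], strengths[weakest + 1 :], width)
    return left + right

units = ["if", "x > 0", "then", "y := 1", "else", "y := 2"]

# Keyword adjoinments bind tightly (9); joins between subparts are weak (1).
default_lines = fracture(units, [9, 1, 9, 1, 9], width=20)

# Rebinding: strengthen the join between the condition and the consequence.
rebound_lines = fracture(units, [9, 5, 9, 1, 9], width=20)
```

With the default strengths the conditional fractures into three lines, one per subpart. After rebinding, the condition and consequence stay together and only the alternative moves to a new line.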

Zeroing

Zeroing removes marks that would otherwise be present. This is an unexpected reversal of the norm, where marks are added to reflect a non-default situation. Zeroing therefore usually occurs in combination with the addition of some replacement mark.

Deletion

Deletion is the complete removal of a logical element from the presentation.

Full deletions typically involve specialized modes of presentation, for example, an outline mode where all but the section headers is deleted.

Cancellation

Cancellation suppresses some other mark.

For example, if all lists are rendered surrounded by parentheses, rendering an empty list as "null" requires the cancellation of these adjoinments. In XML, the empty element syntax "<i_am_empty/>" involves a cancellation of the normal start and end tag syntax.
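Both cancellations can be sketched in a few lines of Python. The function names render_list and render_element are assumptions for illustration. In each case the more specific rule for the empty case suppresses the marks that the general rule would otherwise add.

```python
def render_list(items):
    """Render a list with parenthesis adjoinments.

    For an empty list, the insertion "null" stands in place of the list
    and the parenthesis adjoinments are cancelled.
    """
    if not items:
        return "null"
    return "(" + " ".join(items) + ")"

def render_element(name, content=""):
    """Render an XML element; empty content cancels the start/end tag pair."""
    if content == "":
        return f"<{name}/>"
    return f"<{name}>{content}</{name}>"
```

Without the cancellation, an empty list would come out as "(null)" and an empty element as a start tag immediately followed by an end tag. Both are legal renderings, but they are less distinctive marks for the empty case.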

Since we last were here

Much has changed since the development of the model described above: Unicode, the entire XML stack (XML, XSD11.1, XQuery31, XSL1.1, XSLT2.0), HTML (HTML4.01), CSS (CSS). While the model certainly covers much of the same territory, it comes from a very different community with very different concerns: syntax-directed program editing. There is a difference in emphasis: where the XML stack puts more emphasis on rendering concrete documents from structured documents, the program editors have always been more concerned with parsing at least fragments of concrete documents to get to the structure. This is not to say that syntax-directed editing of XML has not been a concern: it has, and there have been a number of commercial tools that do it. On the other hand, they do not typically concern themselves much with parsing concrete renderings to produce XML, but more with using XML constraints to guide a WYSIWYG presentation. The Muir language development environment, in part due to the more layout-oriented features of the original target language, and in part due to the linguistic sensibilities of the participants, took a more expansive view of the scope of syntax-directed editing than other projects. It therefore has more to say about human documents in an XML context.

Lessons

Many of the lessons and insights from the model find their echo in these newer technologies and others could perhaps be applied to great benefit:

Separate the presentation from the structure.

As I type these words in an XML format in which font, layout, and indentation choices do not appear, it is clear this is not a novel observation. Even in the much more presentation-focused HTML world, a great deal of the presentation is usually separated into CSS rules. Setting aside ordering (and in some cases even that), a W3C XML Schema or Relax NG Schema can be seen as defining abstract syntax rules for a document. True, there is also a conventional concrete syntax for an unrendered structured document: the XML form. This makes the claim confusing. Is it not a concrete syntax specification, then? Where XML is the abstract syntax and the XML document is its rendering, yes. Where the formatted document of some specific kind is its rendering, no. The form of this document that would appear on the Balisage web site: this is the concrete syntax form of the document whose abstract syntax rules are defined by the Balisage tag set schema.
Run rules both ways.

Up-conversion from a concrete document to well-structured XML is a process of undoing the rules at all levels. In practice this also requires uncovering what those rules are in the first place. This is most of what makes it difficult. The other part that makes it difficult is notations that do not work well as a system, or that have a lot of ambiguity. Documents in these notations can be (and are) misinterpreted by human beings also, but human beings are more clever than programs, and more adept at bringing to bear common sense reasons to prefer one interpretation over another.

Nevertheless, I believe it is helpful to regard the problem of up-conversion as fundamentally a parsing problem combined (perhaps) with a rule discovery problem.
Language versioning is a form of language translation.

The entire vexed discussion of namespace versioning and XML schema versioning speaks agreement to this point. Try as you might to plan for it or minimize it, some changes to vocabularies are breaking changes, and must be treated in some ways as a new language.
Language versioning entails transformations and the application of new rules.

The same is true in the XML world, where (a concrete representation of) the structured form is primary. Putting that document in a new language version means transforming that document. XSLT suits this purpose admirably.
Use an abstract syntax specification that distinguishes rules that would manifest in the structure from rules that allow better organization of the grammar itself.

Such mechanisms as DTD parameter entities, XML Schema named groups, type inheritance, and substitution groups and the abstract elements that go with them accomplish some of the same goals.
Separate the type of abstract syntax unit from its (named) role within its parent construct.

One could write XML Schemas using only local elements and named types to get close to this. The case relation (the role) would be captured by the local element name, and the structural kind would be captured by the type name. Unfortunately, the type name is not manifest in the abstract syntax representation, and following this pattern interferes with the ability to use substitution groups to provide for organizational classes. The idea that each piece of an abstract syntax instance has a manifest named role distinct from the name of that non-terminal is absent.

Where the XML stack needs to target components of a parent, it relies on parent/child XPath match patterns, perhaps with position counters (in XSLT), or on CSS selectors (in CSS). Adding metadata through attributes can reclaim the distinction. CSS classes are often used to provide role information. Sometimes other attributes are used instead, or as well.

Should XML element names name kinds of things or roles of things within a larger entity? The debate rages, and vocabularies are inconsistent. Being able to consistently name both aspects would be helpful.
Separate concrete syntax into distinct mini-rules.

Both CSS and XSLT provide for the ability to define separate rules for pieces of a larger construct. This enables a great deal of their flexibility.
Presentation order relates closely to layout across space, and for non-lists is an aspect of concrete syntax.

In the XML stack, XSLT can be used to output in an order distinct from the underlying order in the abstract syntax, but ordering is clearly taken as part of the abstract syntax. Reordering is seen as a transformation effect, not a rendering effect.

Where order is fixed, a specific ordering conveys no information. Since the order is known in advance, additional presentation marks are not required to tell you which subcomponent is which. Where order is free, every specific order conveys specific information. Since the order could be anything, other presentational marks are required to keep subcomponents straight. As such, ordering is intrinsically bound up with other presentational devices, just as word order in natural language is intrinsically bound up with morphological devices.

Failing to treat it that way leads to adding pointless flexibility and complexity to content models, or pointless complexity and the need for transformations in order to render properly, to the detriment of all.
Bootstrap from a self-describing meta-grammar.

The W3C XML Schema for schemas is self-describing, but neither the stack as a whole nor other pieces of it are. While this is a handy property for getting systems off the ground, testing their efficacy, and for accommodating changes to them, it is by no means crucial.
Presentation widgets have a purpose: to convey specific abstract syntax relationships. Knowing the purpose of presentation widgets allows for analysis of a notation as a whole.

This is the key to designing better notations (by which I mean to include rendered documents in general) and to recovering the structure of such concrete rendered documents.
Separate the abstract geometry from concrete geometry.

Various presentational effects can be analyzed as the fracturing of an abstract geometric space due to the constraints of the concrete geometry and the rendering of the content within its constraints.

Neither CSS nor XSL-FO has a concept of abstract geometry, per se. Various common specific fracture situations need to be captured through specialized rules and properties. Given that these are common solutions to similar problems, it is by no means a bad thing that systems designed for humans with those problems take special note of them. Still, the concept of certain marks as responses to fractures can be a useful unifying principle for understanding concrete documents.
Extension of the operator/phylum grammar for the abstract syntax rules to allow for cross-classifications by more than one phylum, partial inheritance of structural components from phyla to operators, and the introduction of a special "unmarked" operator.

W3C XML Schema complex types and multiple substitution groups (in 1.1) capture many of the same instincts.
A taxonomy of concrete syntax functions.

Understanding CSS or XSLT stylesheet rules in terms of their functional role can serve to clarify how to organize them. In the context of developing a vocabulary and thinking about the rules for rendering it, considering marking functions can help in clarifying what the underlying vocabulary needs to distinguish, and what metadata may need to be added.
A taxonomy of concrete syntax mechanisms.

There sometimes seems to be an endless sea of possible presentation widgets in the world. A perusal of all the CSS or XSL-FO properties is mind-numbing. It can be helpful to see that in general a notation picks similar kinds of devices to convey parallel functions: if adjoinment is used to mark one case relation, it will be used to mark another. Thinking of a notation in such a holistic way, as a system of rules, can help in producing better and more consistent notations (or document renderings). It can also help guide rule inference. Knowing that larger scale components are marked with larger scale marks (more space, larger fonts, bolded text), one can infer something about the rules used to mark different levels of subdivision, and from this recover the structure of a document from its concrete form.
Use of case frames and marking theory as an inspiration for some aspects of the model.

Marking theory teaches us that marks indicating more specific or unusual entities will be more elaborate than marks indicating more common situations. It follows that mark cancellations apply in the same direction: marks for more specific functions cancel marks for more generic ones, in order that the mark we end up with is the more specific one. This principle finds an echo in the rules about template selection in XSLT and selector selection in CSS. Specificity wins. Case relations (parent/child) win over type relations (bare element names). Labels (ids) win over case relations. XSLT and CSS obviously have more elaborated sets of relations expressed in their selectors, however.

Applying the same perspective to document up-conversion allows us to make inferences about structural relationships. If a certain kind of mark (a bolded adjoinment with a colon separator, perhaps) indicates a case relation in one instance, it likely does in another as well. A very different kind of mark likely indicates that the component stands in a different logical relationship: part of a different parent group entirely.
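The "specificity wins" principle can be sketched as a rule selector. This is a deliberate simplification: it ranks only the three relations named above (type, case, label), whereas XSLT and CSS compute priorities over much richer selectors. The rule format, the example marks, and the names SPECIFICITY, matches, and select_mark are all assumptions made for illustration.

```python
# Hypothetical rule kinds ranked by specificity, following the text:
# type relation < case relation < label.
SPECIFICITY = {"type": 0, "case": 1, "label": 2}

def matches(rule, unit):
    """Does a rule apply to a logical unit?"""
    kind, key = rule["kind"], rule["match"]
    if kind == "type":
        return unit["type"] == key
    if kind == "case":
        return (unit["parent"], unit["role"]) == key
    return unit.get("id") == key  # label rules match on the unit's id

def select_mark(rules, unit):
    """Return the mark of the most specific matching rule, or None."""
    candidates = [rule for rule in rules if matches(rule, unit)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: SPECIFICITY[rule["kind"]])["mark"]

RULES = [
    {"kind": "type", "match": "paragraph", "mark": "plain"},
    {"kind": "case", "match": ("item", "description"), "mark": "italic"},
    {"kind": "label", "match": "special-offer", "mark": "boxed"},
]

store_text = {"type": "paragraph", "parent": "store-info", "role": "description"}
item_text = {"type": "paragraph", "parent": "item", "role": "description"}
offer_text = {"type": "paragraph", "parent": "item", "role": "description", "id": "special-offer"}
```

All three units are paragraphs, so the type rule matches each of them. The case rule cancels it for the item description, and the label rule cancels both for the one labelled unit.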

Designing a Notation: A Small Exercise

Let us conduct a small thought experiment: let us design a notation, say, the price sheet for Mary's House of Excellent Jams and Jellies, applying these insights.

Our task list looks something like this:

Define the abstract syntax: what are the types? the case frames? the unique and referenced entities?

I have a price sheet. It has information about the store and a collection of items for sale. The store information includes a name, a description, various kinds of contact information (physical address, email address, phone number). Those items for sale come in groups, where each group has a label and a description. The items have a name, a description, and a price.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:this="http://mathling.com/price-sheet"
           targetNamespace="http://mathling.com/price-sheet"
           elementFormDefault="qualified">

<xs:element name="price-sheet" type="this:price-sheet"/>

<xs:complexType name="price-sheet">
  <xs:sequence>
    <xs:element name="store-info" type="this:store-info"/>
    <xs:element name="groups" type="this:item-group-list"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="store-info">
  <xs:sequence>
    <xs:element name="name" type="this:main-label"/>
    <xs:element name="description" type="this:para-list"/>
    <xs:element name="contact-info" type="this:contact-info"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="item-group-list">
  <xs:sequence>
    <xs:element name="group" type="this:item-group" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="item-group">
  <xs:sequence>
    <xs:element name="title" type="this:section-label"/>
    <xs:element name="description" type="this:para-list"/>
    <xs:element name="items" type="this:item-list"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="item-list">
  <xs:sequence>
    <xs:element name="item" type="this:item" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="item">
  <xs:sequence>
    <xs:element name="name" type="this:short-label"/>
    <xs:element name="description" type="this:para-list"/>
    <xs:element name="price" type="this:price"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="contact-info">
  <xs:sequence>
    <xs:element name="address" type="xs:string"/>
    <xs:element name="email" type="xs:string"/>
    <xs:element name="phone" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="para-list">  
  <xs:sequence>
    <xs:element name="para" type="this:para" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:simpleType name="short-label">
  <xs:restriction base="xs:string"/>
</xs:simpleType>

<xs:simpleType name="main-label">
  <xs:restriction base="this:short-label"/>
</xs:simpleType>

<xs:simpleType name="section-label">
  <xs:restriction base="this:short-label"/>
</xs:simpleType>

<xs:simpleType name="para">
  <xs:restriction base="xs:string"/>
</xs:simpleType>

<xs:simpleType name="price">
  <xs:restriction base="xs:decimal"/>
</xs:simpleType>

</xs:schema>

Here I have followed a convention of using named types and local elements, with lists of x elements defined by an x-list type. Complex types are defined using xs:sequence, but this ordering matters not to my notation but to an XML parser processing an XML document that satisfies this schema. In fact, since ordering does not matter at the abstract syntax level, fixing the order at that level is fine.

<price-sheet xmlns="http://mathling.com/price-sheet">
<store-info>
  <name>Mary's House of Excellent Jams and Jellies</name>
  <description><para>We have been making artisanal jams and jellies from all-natural organic ingredients since 1998.</para></description>
  <contact-info>
    <address>111 Any Road, Some Town, USA</address>
    <email>nonesuch@example.com</email>
    <phone>408-555-1212</phone>
  </contact-info>
</store-info>
<groups>
  <group>
    <title>Jams</title>
    <description>
      <para>Jams are filled with delicious organic fruit.</para>
      <para>They retain more of the fruit pulp than jellies.</para>
    </description>
    <items>
      <item>
        <name>Golden Summer</name>
        <description><para>Yellow plums, habañero, meyer lemons. Excellent with brie.</para></description>
        <price>5.25</price>
      </item>
      <item>
        <name>Christmas Jam</name>
        <description><para>Cranberries, oranges, ginger, and cinnamon. Have it with your turkey!</para></description>
        <price>6.00</price>
      </item>
    </items>
  </group>
  <group>
    <title>Jellies</title>
    <description><para>Jellies are strained and no longer contain fruit pulp.</para></description>
    <items>
      <item>
        <name>Hot Quince</name>
        <description><para>Quince and ghost pepper. Sweet heat!</para></description>
        <price>6.25</price>
      </item>
      <item>
        <name>Purple Bliss</name>
        <description><para>Pomegranate and blackberry. Pure decadence!</para></description>
        <price>10.25</price>
      </item>
    </items>
  </group>
  <group>
    <title>Chutneys</title>
    <description><para>Chutneys balance sweet, spice, and savoury.</para></description>
    <items>
      <item>
        <name>Classic Indian Chutney</name>
        <description><para>An exciting blend of fruits and spices. Water chestnuts add crunch. You'll eat it with a spoon!</para></description>
        <price>5.25</price>
      </item>
    </items>
  </group>
</groups>
</price-sheet>

Define the abstract geometry: what are the spaces? how do they compose? what are their properties?

I have a sheet, which may have multiple pages, each of which has a header space, a body space, and a footer space. The body space has columns, which consist of lines. Expressing this in the context of the XML/HTML stack is a little tricky, because neither XSL FO nor HTML/CSS makes a distinction between abstract and concrete geometry, and various common document devices (headers, footers, columns, lines) are treated specially and asymmetrically. We could sketch this out as an HTML template:

<html>
<body>
  <div class="header">
  </div>
  <div class="body">
    <div class="column">
    </div>
  </div>
  <div class="footer">
  </div>
</body>
</html>

With some basic CSS:

/* No "page" object to specify: use @page */
@page {
  height: 11in;
  width: 8.5in;
}

.header {
  line-height: 15pt;
  height: 2in
}

.footer {
  line-height: 15pt;
  height: 1in
}

/* No "column" object to specify: use column properties */
/* Note: to get this working in real browsers need more here */
.body {
  column-count: 2;
  column-gap: 1in
}

/* No "line" object to specify: put properties as document default */
html {
  line-height: 15pt
}

Or as a skeleton XSL FO:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="sheet" 
        page-height="11in" page-width="8.5in">
      <fo:region-body column-count="2"/>
      <fo:region-before extent="2in"/>
      <fo:region-after extent="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="sheet">
    <fo:static-content flow-name="xsl-region-before">
      <fo:block line-height="15pt"/>
    </fo:static-content>

    <fo:static-content flow-name="xsl-region-after">
      <fo:block line-height="15pt"/>
    </fo:static-content>

    <fo:flow flow-name="xsl-region-body">
      <fo:block line-height="15pt"/>
    </fo:flow>
  </fo:page-sequence>
</fo:root>

We could capture the full abstract geometry model for design purposes in another XML Schema with extensions. It could then be used to generate HTML or XSL FO with the full concrete syntax:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:this="http://mathling.com/sheet/geometry"
           xmlns:ag="http://mathling.com/abstrct-geometry"
           targetNamespace="http://mathling.com/sheet/geometry"
           elementFormDefault="qualified">

<xs:element name="sheet" type="this:sheet"/>

<xs:complexType name="sheet">
  <xs:annotation>
    <xs:appinfo>
      <ag:orientation>depthwards</ag:orientation>
      <ag:direction>forwards</ag:direction>
      <ag:extent thickness="1layer+INF"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:sequence>
    <xs:element name="page" type="this:page" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="page">
  <xs:annotation>
    <xs:appinfo>
      <ag:orientation>vertical</ag:orientation>
      <ag:direction>forwards</ag:direction>
      <ag:extent width="8.5in" height="11in" thickness="1layer"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:sequence>
    <xs:element name="header" type="this:header"/>
    <xs:element name="body" type="this:body"/>
    <xs:element name="footer" type="this:footer"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="body">
  <xs:annotation>
    <xs:appinfo>
      <ag:orientation>horizontal</ag:orientation>
      <ag:direction>forwards</ag:direction>
    </xs:appinfo>
  </xs:annotation>
  <xs:sequence>
    <xs:element name="column" type="this:column" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="column">
  <xs:annotation>
    <xs:appinfo>
      <ag:orientation>vertical</ag:orientation>
      <ag:direction>reverse</ag:direction>
    </xs:appinfo>
  </xs:annotation>
  <xs:sequence>
    <xs:element name="line" type="this:line" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:simpleType name="header">
  <xs:annotation>
    <xs:appinfo>
      <ag:orientation>horizontal</ag:orientation>
      <ag:direction>forwards</ag:direction>
      <ag:extent height="2in"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:restriction base="xs:string"/>
</xs:simpleType>

<xs:simpleType name="footer">
  <xs:annotation>
    <xs:appinfo>
      <ag:orientation>horizontal</ag:orientation>
      <ag:direction>forwards</ag:direction>
      <ag:extent height="1in"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:restriction base="xs:string"/>
</xs:simpleType>

<xs:simpleType name="line">
  <xs:annotation>
    <xs:appinfo>
      <ag:orientation>horizontal</ag:orientation>
      <ag:direction>forwards</ag:direction>
      <ag:extent height="15pt"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:restriction base="xs:string"/>
</xs:simpleType>

</xs:schema>

Define the concrete syntax: how do we choose to mark the types? the case relations? the unique and referenced entities? how do we choose to bind the abstract components to space? how shall we handle breaks?

Start with the type marks. Think of this as "which entities do I wish to mark in a way that makes them distinct from other kinds of entities?" Let us say we want to indicate the store info with a surrounding box, descriptions with italics, and prices with a dollar sign.

Case marks are next. In technical notations or data tables literals are common. In human prose ordering comes into play. Here we decide that the email and phone components of the contact info should be indicated with some text marks, and we will fix the ordering of the heterogeneous children of all components.

Are there labels or cross-references? How should they be marked? The short labels (variously with the case 'name' or 'title') will function as labels and be marked with bold, centered text, in decreasing sizes depending on the scope of the label.

Finally, let us consider grouping and separators. Certainly we will use whitespace to separate items in lists (group from group, paragraph from paragraph, etc.) with larger scale units separated by larger amounts of space. We also decide to separate the groups with horizontal rules. For grouping we determine which components are bound to which boxes: the store info to the header, the groups to the body. Finally, the fracturing rules are conventional: filling up a line creates a new line, filling up a column creates a new column, filling up a page creates a new page and replicates the header.

Concrete syntax rules can be represented with a combination of XSLT and CSS. The XSLT maps the XML to something CSS can target and applies ordering rules.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ps="http://mathling.com/price-sheet"
                exclude-result-prefixes="xsl ps"
                version="2.0">

  <xsl:output method="html" indent="yes"/>

  <xsl:import-schema namespace="http://mathling.com/price-sheet" schema-location="jelly.xsd"/>

  <!-- Affective -->

  <xsl:template match="/">
    <html>
      <head>
        <link rel="stylesheet" type="text/css" href="jellyh.css"/>
      </head>
      <body>
        <xsl:apply-templates select="." mode="grouping"/>
      </body>
    </html>
  </xsl:template>

  <!-- Labels -->

  <xsl:template match="element(*,ps:short-label)" mode="labels"> 
    <span id="{generate-id()}"><xsl:apply-templates select="@*|node()" mode="grouping"/></span>
  </xsl:template>

  <xsl:template match="*" mode="labels"> 
   <xsl:apply-templates select="@*|node()" mode="grouping"/>
  </xsl:template>

  <!-- Types -->

  <xsl:template match="element(*,ps:price-sheet)" mode="type">
    <div class="price-sheet"><xsl:apply-templates select="." mode="jointcase"/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:store-info)" mode="type">
    <div class="store-info"><xsl:apply-templates select="." mode="jointcase"/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:item-group)" mode="type">
    <div class="group"><xsl:apply-templates select="." mode="jointcase"/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:item)" mode="type">
    <div class="item"><xsl:apply-templates select="." mode="jointcase"/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:contact-info)" mode="type">
    <div class="contact-info"><xsl:apply-templates select="." mode="jointcase"/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:para)" mode="type">
    <p class="para"><xsl:apply-templates select="." mode="jointcase"/></p>
  </xsl:template>

  <xsl:template match="element(*,ps:short-label)" mode="type">
    <div class="label"><xsl:apply-templates select="." mode="jointcase"/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:main-label)" mode="type" priority="3">
    <div class="main-label"><xsl:next-match/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:section-label)" mode="type" priority="3">
    <div class="section-label"><xsl:next-match/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:price)" mode="type">
    <div class="price"><xsl:apply-templates select="." mode="jointcase"/></div>
  </xsl:template>

  <xsl:template match="*" mode="type"> 
   <xsl:apply-templates select="." mode="jointcase"/>
  </xsl:template>

  <!-- Case -->

  <xsl:template match="element(*,ps:price-sheet)" mode="jointcase">
    <xsl:apply-templates select="ps:store-info" mode="grouping"/>
    <xsl:apply-templates select="ps:groups" mode="grouping"/>
  </xsl:template>

  <xsl:template match="element(*,ps:store-info)" mode="jointcase">
    <xsl:apply-templates select="ps:name" mode="grouping"/>
    <xsl:apply-templates select="ps:description" mode="grouping"/>
    <xsl:apply-templates select="ps:contact-info" mode="grouping"/>
  </xsl:template>

  <xsl:template match="element(*,ps:contact-info)" mode="jointcase">
    <xsl:apply-templates select="ps:address" mode="grouping"/>
    <xsl:apply-templates select="ps:email" mode="grouping"/>
    <xsl:apply-templates select="ps:phone" mode="grouping"/>
  </xsl:template>

  <xsl:template match="element(*,ps:item-group)" mode="jointcase">
    <xsl:apply-templates select="ps:title" mode="grouping"/>
    <xsl:apply-templates select="ps:description" mode="grouping"/>
    <xsl:apply-templates select="ps:items" mode="grouping"/>
  </xsl:template>

  <xsl:template match="element(*,ps:item)" mode="jointcase">
    <xsl:apply-templates select="ps:name" mode="grouping"/>
    <xsl:apply-templates select="ps:price" mode="grouping"/>
    <xsl:apply-templates select="ps:description" mode="grouping"/>
  </xsl:template>

  <xsl:template match="*" mode="jointcase">
    <xsl:apply-templates select="." mode="labels"/>
  </xsl:template>

  <xsl:template match="ps:address" mode="case">
    <span class="address"><xsl:apply-templates select="." mode="type"/></span>
  </xsl:template>

  <xsl:template match="ps:email" mode="case">
    <span class="email"><xsl:apply-templates select="." mode="type"/></span>
  </xsl:template>

  <xsl:template match="ps:phone" mode="case">
    <span class="phone"><xsl:apply-templates select="." mode="type"/></span>
  </xsl:template>

  <xsl:template match="ps:name" mode="case">
    <span class="name"><xsl:apply-templates select="." mode="type"/></span>
  </xsl:template>

  <xsl:template match="ps:price" mode="case">
    <span class="pricechild"><xsl:apply-templates select="." mode="type"/></span>
  </xsl:template>

  <xsl:template match="ps:description" mode="case">
    <span class="description"><xsl:apply-templates select="." mode="type"/></span>
  </xsl:template>

  <xsl:template match="*" mode="case">
    <xsl:apply-templates select="." mode="type"/>
  </xsl:template>

  <!-- Grouping -->


  <xsl:template match="element(*,ps:store-info)" mode="grouping">
    <div class="header"><xsl:apply-templates select="." mode="separators"/></div>
  </xsl:template>

  <xsl:template match="element(*,ps:item-group-list)" mode="grouping">
    <div class="body"><xsl:apply-templates select="." mode="separators"/></div>
  </xsl:template>

  <xsl:template match="*" mode="grouping">
    <xsl:apply-templates select="." mode="separators"/>
  </xsl:template>

  <!-- Separators -->

  <xsl:template match="element(*,ps:item-group-list)" mode="separators">
    <div class="grouplist"><xsl:apply-templates select="." mode="case"/></div>
  </xsl:template>

  <xsl:template match="*" mode="separators">
    <xsl:apply-templates select="." mode="case"/>
  </xsl:template>
</xsl:stylesheet>

The general strategy here is to use modes to properly order the different kinds of marks from the least to most specific: affective marks before coherency marks before structural marks before identifying marks. We order case marks before type marks to ensure that the children of a grouping element have class names with the case labels. Joint case marks (ordering, principally) are separated from other case marks. We also need to use <xsl:next-match/> to march up the type hierarchy and make sure we get type and subtype marks. Priority is necessary to ensure we match subtypes first. For type marks we use match patterns that target those types.
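
To see why that ordering matters to the CSS side, here is a sketch of selectors written against the nesting these templates appear to produce: case wrapper outermost, then subtype, then supertype. The declarations are illustrative only and are not part of the price sheet's design; the class names are those emitted by the stylesheet above.

```css
/* The case wrapper (span.name) is the direct child of the grouping
   element, so a child selector can tell an item's name from a
   store's name without knowing their types. */
.item > .name         { letter-spacing: 0.05em }  /* illustrative */
.store-info > .name   { letter-spacing: 0.1em }   /* illustrative */

/* The type wrappers sit inside the case wrapper. The subtype
   template has priority 3 and calls xsl:next-match, so the subtype
   div encloses the supertype div. */
.name > .main-label > .label { text-transform: uppercase }  /* illustrative */
```

Had type marks been applied before case marks, the grouping element's direct children would carry type classes instead, and selectors such as .item > .name would not match.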

Once we have the XSLT generating HTML with proper class attributes, the CSS applies the other marking mechanisms.

/* Affective */
.price-sheet              { font-family: "Comic Sans"; 
                            font-size: 12pt;
                            text-align: left }

/* Labels */

.label                    { font-weight: bold }
.main-label               { font-size: 20pt }
.section-label            { font-size: 16pt }

/* Types */

.store-info               { outline: 1px solid black }
.group                    { }
.item                     { }
.contact-info             { text-align: center }
.description              { font-style: italic }
.price:before             { content: "$" }

/* 
 * Here we see how CSS uses selectors both to target items in the abstract
 * syntax and (:before) as part of the definition of the concrete syntax
 * widget.
 */

/* Case */

.email:before             { content: "Email: " }
.phone:before             { content: "Tel.: " }

/* Separators */

.group + .group           { padding-bottom: 0.5in }
.item + .item             { padding-top: 0.25in }
.para + .para             { padding-top: 3px }

.store-info > .name       { display: block }
.store-info > .description  { display: block }
.store-info > .contact-info { display: block }
.group > *                { padding-right: 0.25in; display: block }
.grouplist > *            { border-bottom: 1px solid black }
.item > .name             {}
.item > .description      { display: block }
.item > .pricechild       { padding-left: 0.25in }
.contact-info > .address  { display: block }
.contact-info > .email    { display: block }
.contact-info > .phone    { display: block }

/* Grouping */

.store-info               { display: block }
.group                    { display: block }
.item                     { display: block }

/* Mechanics */

div                       { display: inline }

Define the concrete geometry: how big is the paper? how much space do we allocate for each part?

In an HTML plus CSS world, this comes down to @media rules for printed output, perhaps with exact positioning. This is the level XSL FO plays at, for the most part. It is also where this model gets tricky to apply directly, because the explicit separation of abstract and concrete geometry is not how rendering is conceptualized and the conventionalized rules (e.g. flowing from line to line and page to page) are not something we have direct control over, so there is nothing to add.

Iterate with actual or sample content until satisfied.

Let's take a moment to look at the design of this notation as a system: where are we using similar kinds of devices for marking and where are we using different kinds of devices? Where will these choices create a harmonious and easy to understand form, and where might they be confusing or awkward?

The first thing to note is that our type marks are all over the map: we have lexical intonations (contact-info and description), linings (store-info), adjoinments (price), and no marking at all (group and item). Not having any marking at all is not a problem per se: it tells us that these types are the "normal" or "expected" type against which other types contrast. Is that our understanding of our little price sheet? Yes.

The case marks are mostly ordering, with a couple of adjoinments. The fact that we also use adjoinment for just one of the type marks is suggestive: perhaps we are analyzing our notation incorrectly, or leading readers to analyze it incorrectly: perhaps the email and phone marks are type marks, not case marks. Or perhaps the price mark is a case mark, not a type mark. Either way, it is suspicious to have components that function similarly at one level treated differently at another.
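
One way to explore the second reading, in which the dollar sign is a case mark rather than a type mark, is to hang it on the case wrapper instead of the type wrapper. The sketch below is an alternative to the .price:before rule in the stylesheet above, not an addition to it; whether it is the better analysis is exactly the question raised here.

```css
/* Alternative reading: "$" marks the price case of an item,
   parallel to the "Email: " and "Tel.: " case marks.
   This would replace the .price:before type rule. */
.item > .pricechild:before { content: "$" }

/* The existing case marks, repeated here for comparison. */
.email:before              { content: "Email: " }
.phone:before              { content: "Tel.: " }
```

Under this reading all three adjoinments are selected through case classes, so the adjoinment mechanism is used for one marking function only.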

Moving on to the separators, we spot another anomaly: the use of lining as a separator for components of a group list. This is the only separator using this kind of mechanism, and the only other place such a mechanism is used is for the store information type mark. So again we ask the question: are we misanalyzing our notation, or leading readers to misinterpret it? Or creating ugliness?
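
If we decided the rule between groups was a misleading mark, one hedged option is to neutralise it and rely on the whitespace separator the stylesheet already defines for groups. This is a sketch of that choice, not a recommendation; it assumes the rules are added after the existing ones so that the later declaration takes effect.

```css
/* Already in the stylesheet: whitespace separates group from group. */
.group + .group      { padding-bottom: 0.5in }

/* The lining separator under discussion. Overriding it leaves
   whitespace as the only separator mechanism, and leaves lining
   in use only as the store-info type mark. */
.grouplist > *       { border-bottom: none }
```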

In this small example, these small inconsistencies are unlikely to cause any problems. Indeed, there are small inconsistencies in any notation and it is a fool's errand to try to purge them entirely. However, inconsistencies like this can cause real problems in real notations. Pascal suffers from the inconsistent use of semi-colon adjoinment, sometimes to indicate a separator and sometimes to bind a type, as well as the inconsistent use of semi-colon versus comma as a separator. Ripley78 shows that programmers make a disproportionate number of syntax errors in these areas.

Similarly with Java, which uses semicolons to terminate most (but not all) declarations and statements and sometimes as a separator, and which uses small syntactic differences to mark large semantic ones. One large-scale study of Java novices (Altadmri15) found that the most frequent syntax errors, after unbalanced parentheses, involve confusing doubled symbols with single symbols ("==" vs "=", "||" vs "|", etc.), adding extraneous semicolons where they don't belong (after the condition in a conditional statement, after the signature in a function declaration), and confusing function call syntax with function declaration or method use.

Summary

When presented with a concrete document, we use the various presentation devices and widgets to recover the deep structure of the document: the hierarchy of organization, the components and their relationships to one another within and across that hierarchy, their classifications. We recover the relationship of the document as a whole to a genre of similar documents. That is the fundamental purpose of those presentation devices. By having an account of their purpose, we can begin to see how they work together in a notational system and begin to understand how to recover deep structure from concrete presentation.

Going the other direction, having an account of the organization of notations at different levels can drive methodologies for developing notations, be they technical notations or more general styled documents.

References

[Altadmri15] Altadmri, Amjad and Brown, Neil C.C. (2015) 37 Million Compilations: Investigating Novice Programming Mistakes in Large-Scale Student Data. In: SIGCSE '15: The 46th SIGCSE technical symposium on Computer science education, 4th - 7th March 2015, Kansas City, Missouri. doi:https://doi.org/10.1145/2676723.2677258.

[CSS] W3C: Tab Atkins Jr., Elika J. Etemad, Florian Rivoal, editors. CSS Snapshot 2017 Working Group Note. W3C, 31 January 2017 http://www.w3.org/TR/css-2017/

[XSL1.1] W3C: Anders Berglund, editor. Extensible Stylesheet Language (XSL) Version 1.1 Recommendation. W3C, 05 December 2006. http://www.w3.org/TR/xsl11/

[XML] W3C: Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, editors. Extensible Markup Language (XML) 1.0 (Fifth Edition) Recommendation. W3C, 26 November 2008, http://www.w3.org/TR/xml/

[Donzeau-Gouge80] Veronique Donzeau-Gouge, Gerard Huet, Gilles Kahn, and Bernard Lang. Programming Environments based on Structured Editors: The Mentor Experience. Rapports de Recherche 26, INRIA, July 1980.

[XSD11.1] W3C: Shudi (Sandy) Gao 高殊镝, C.M. Sperberg-McQueen, and Henry S. Thompson, editors. W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures Recommendation. W3C, April 2012. http://www.w3.org/TR/xmlschema-11-1/

[Holstege89] Holstege, Mary. Marking and the Design of Notations, PhD thesis, Stanford University, Department of Computer Science, Stanford, CA 94305, June 1989. Report No. STAN-CS-89-1270.

[Holstege87] Holstege, Mary. The Meta Grammar for the Muir System. Informal note IN-CSLI-87-7, Center for the Study of Language and Information, March, 1987.

[XSLT2.0] W3C: Michael Kay, editor. XSL Transformations (XSLT) Version 2.0 Recommendation. W3C, 23 January 2007. http://www.w3.org/TR/xslt20/

[Normark87] Normark, Kurt. Transformation and Abstract Presentations in a Language Development Environment, PhD thesis, Aarhus University, 1987. Published also as informal note IN-CSLI-87-9, Center for the Study of Language and Information.

[Notkin86] Sharing and Modularization in Structure Editing Environments. In proceedings of the 19th Annual Hawaii International Conference on System Science, Volume II: Software, pages 567-575. 1986.

[HTML4.01] W3C: Dave Raggett, Arnaud Le Hors, Ian Jacobs, editors. HTML 4.01 Specification Recommendation. W3C, 24 December 1999. http://www.w3.org/TR/html4/

[Ripley78] G.D. Ripley and F.C. Druseikis. A Statistical Analysis of Syntax Errors. Computer Languages 3:227-240, 1978. doi:https://doi.org/10.1016/0096-0551(78)90041-3.

[XQuery31] W3C: Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, editors. XQuery 3.1: An XML Query Language Recommendation. W3C, 21 March 2017. http://www.w3.org/TR/xquery-31/

[Winograd87] Winograd, Terry. Muir: A Tool for Language Design. Technical Report CSLI-87-81, Center for the Study of Language and Information, March 1987.

Mary Holstege

Principal Engineer

MarkLogic Corporation

`<mary.holstege@marklogic.com>`

Mary Holstege is Principal Engineer at MarkLogic Corporation. She has over 25 years experience as a software engineer in and around markup technologies and information extraction. She holds a Ph.D. from Stanford University in Computer Science, for a thesis on document representation.

Balisage: The Markup Conference

Balisage Paper: The Concrete Syntax of Documents: Purpose and Variety

Balisage Series on Markup Technologies