How to cite this paper

White, David. “Smart Content for High-Value Communications.” Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). https://doi.org/10.4242/BalisageVol15.White01.

Balisage: The Markup Conference 2015
August 11 - 14, 2015

Balisage Paper: Smart Content for High-Value Communications

David White

CTO

Quark Software Inc.

Dave White has been with Quark Software Inc. for 7 years and is currently CTO. Dave has been in the XML (and SGML) authoring and publishing software business since 1994, including 13 years at Arbortext in a variety of sales, product management, and business development roles.

Copyright © 2015 Quark Software Inc.

Abstract

Seventeen years after the original XML specification reached recommendation status and more than 30 years since SGML solutions proved the rich value and significant return on investment for technical documentation, there is still a relatively low number of XML-based publishing system deployments for non-technical, high-value communications. Even though marketing departments, product managers, and enterprise publishing departments face challenges similar to those that documentation departments have addressed, the value of automated publishing from structured content has eluded these additional audiences. For these teams of non-technical subject matter experts and supporting communications departments, there continue to be too many roadblocks on the value path to an XML-based dynamic publishing solution. Quark's Smart Content methodology and RNG schema are meant to address the needs of non-technical communicators with a rethinking of the fundamental differences required to allow this new user base to join the dynamic publishing community.

Table of Contents

Introduction
Purpose
XML Authoring Usability: Restrictive Content Models for Use of Blocks and Inlines
XML Authoring Usability: Archetypes
XML Consistency: The use of Metadata beyond XML Attributes
Controlling Order and Occurrence Validation by Type Attribute Value
Authoring Usability: Component Reuse
An Overview of the Smart Content Schema
Formal Groups of Blocks: <section>
Blocks: <p>
Inlines: <tag>
Miscellaneous Elements: <table>, <image>, <video>, <xref>, lists, emphasis, etc.
Metadata: <meta>
Under Consideration: Block Combinations and Simple Block Combinations: <bodyDiv> and <simpleBodyDiv>
Summary
Smart Content Schema Sample

Introduction

There are two fundamental areas that must be addressed to attract a non-technical communications market to XML-based Dynamic Publishing: the usability of authoring in semantically-rich XML; and the ability of an XML publishing engine to address the needs of content types that do not easily fit into the design constraints of documentation and reports. Publishing engine constraints can only be solved by the software engineers who build these engines. The XML schema for document input is generally orthogonal to the types of formatting features a publishing engine can support. Therefore, Smart Content is primarily focused on addressing authoring usability and, similar to DITA, can simplify the implementation of a solution by providing a well-thought-out base from which to start an implementation.

There are many non-techdoc content types that could be well served by an XML-based publishing system. Technical documentation and these non-techdoc content types share many aspects, including but certainly not limited to the following:

  • content volume

  • publishing frequency of both new and revised documents

  • information sets with differences specific to a particular audience

  • language translation needs

  • opportunities to reuse or repurpose content components across different publications

We have seen successful implementations in Financial Services for investment research reports, fund fact sheets, ratings reports, and insurance guidelines; in government for the support and development of laws; in manufacturing for product marketing materials; and across many industries for standard operating procedures. Many of these content types also happen to be simpler than techpubs documents in the sense that they have fewer block and inline markup types and often have less need for complex and restrictive content models. However, they may have as many, more, or at least very different requirements for presentation style on output.

One of the strongest characteristics shared across all of these content types is that the content authors have a primary role in their company which is not authoring. They are subject matter experts; they are often not technical-minded (in a software technology sense); authoring is, at best, secondary to their business function; and the frequency with which they author may vary widely from once a year to a few hours a day.

These authors have traditionally used MS Word as their primary content creation tool. Some may have used on-line tools such as Google Docs. Common to all of these word processing applications is the freeform and style-driven nature of the user experience. Write a sentence, apply a named-style or build up a style from a formatting UI, add a table or graphic. There are almost no rules except the limitations of the features available within the tool. Want to apply a Chapter Title style to a sentence with an inline Table? Sure, why not. Hand-crafted use of style is easy and inviting. However, anyone in the XML-content community knows why this is a problem, and it's the same problem when one attempts to deploy automation to any hand-crafted process: Garbage-in, Garbage-out. There is no consistent and reliable way to deploy automation without control and structure of the inputs.

Authoring with structure and validation adds intelligence to content and enables powerful automation, typically in the form of multi-channel publishing. Publishing automation benefits include efficiencies associated with the “single source of truth”, re-use, and repurposing, all of which contribute to faster time-to-market. Automation also provides significant quality improvements like improved reading comprehension, style consistency, and higher message relevance to a particular audience.

XML has been widely deployed for a variety of content and document applications with different purposes. Even MS Word uses Office Open XML as the file format for a variety of MS Office applications. As anyone who tries to automate publishing from MS Word files can explain, XML isn't the complete answer. The answer for "how to automate content processes" has been and continues to be the deployment of a rules-driven, semantic-XML content process where order and occurrence as well as meaning and purpose are clearly defined within the content at a very fine-grained level, even down to a single character if required. Semantic XML enables a software program to automatically use, filter, index, or transform the enriched XML source to many outputs for multiple uses with high speed, high quality, high value, and low risk of failure.

If semantic, structured XML is the input, then authoring directly in XML is the most direct path to success. But converting an audience of non-technical, occasional authors from a free-form, style-based word processing tool to using a complex, rules-driven XML authoring tool is a very big challenge. The problem, defined through consistent feedback from authors who have tried to make the transition, is usability. XML rules constrict the author within new and fine-grained boundaries where these authors have never previously experienced limitations.

Purpose

The goal of presenting Quark's Smart Content model is to gauge the interest in our methodology and ultimately to see if there is community support to initiate a public standardization effort.

XML Authoring Usability: Restrictive Content Models for Use of Blocks and Inlines

If automation provides much of the value for content processes, and automation requires structured and validated input to be successful, then rules must be enforced at the content authoring stage. A close analogy is the value of validated HTML forms for data input which provide the user instant feedback while they type data into a field. However, writing long-form prose is quite different than entering data into a form, so the nature of how the rules are expressed has to be considerably different.

Most XML authoring usability issues are caused by some form of "hidden rule." We define a "hidden rule" as any restriction or boundary that the software enforces but that the author can understand (or even discover) only by reading documentation, manipulating one or more user interface widgets, and/or trying an action only to have the action fail. XML schemas for content authoring have many structures that create hidden rules. One very significant difference between XML and free-form word processing is the use of containment and nested hierarchy, which is quite foreign to non-technical authors. Most XML authoring tools minimize the impact of containment by deploying structure-tree UIs such as a table of contents for section/topic/chapter type divisions. Depending on the level of markup included in a structure-tree view (sections only, all markup, user configurable), a non-technical author may still find the usability challenging. Also, a structure-tree view may not solve the problem for all containment contexts, for example identifying a run of multiple paragraphs as a sidebar or callout.

As a result, even authoring in the best, long-standing XML-authoring applications is easiest if the "tags-on view" is used, because this gives immediate visual context to the structure of the markup used in the document and provides hints to the rules the user can/should expect, though that knowledge still requires repeated experience with the system to truly be internalized. Tags-on view only partially addresses the problem of hidden rules, and the extra visual distraction caused by the display of markup boundaries is too foreign and unpleasant. As soon as tags-on view is a requirement to improve usability, non-technical authors rebel. For additional insight into the challenges and opportunities for authoring usability improvement see Flynn, Peter. “Could authors really write in XML one day?” Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:10.4242/BalisageVol10.Flynn02.

Free-form word processing tools have very few hidden rules because nearly any content is allowed in nearly any location within a document. When an author hits one of the few hidden rules that do exist, the experience is unpleasant and frustrating. In MS Word for example, when placing the cursor at the end of a pre-existing run of text with a bold emphasis, the user is never quite sure whether newly typed characters will inherit the bold styling or not. There appear to be several hidden rules controlling emphasis application that depend on whether the entire paragraph is selected and how the cursor was placed (mouse click, arrow from within bold region, arrow from outside of bold region, directly next to non-space character, with space character between bold and cursor, etc.). MS Word has other hidden rules when using tab-space, multi-column flows, and nested and outline lists, to name a few.

Hidden rules in XML authoring are problematic for some very common and frequently used actions: moving and reordering content through cut/copy/paste and drag/drop, adding structural content such as section/title, and adding specific types of content such as lists, tables, and multi-media. The usability problems caused by hidden rules are often exacerbated by the over-specification of order and occurrence rules at the block and inline markup level in the XML schema definition.

One simple example of over-specification is the restriction of inline markup within Title elements. XML makes it easy to define a schema which limits the markup that can be used within a Title. Within the same XML document type, a schema may allow for the use of inlines within a paragraph such as keyword, bold, italic, underline, company, name, trademark, location, and possibly more. But often an XML schema developer may come to the conclusion that Title elements should not contain any of these elements because the output formatting will ignore them. By extension, a developer may choose to avoid tempting an author to use these elements in the first place by excluding them from the content model definition for Title.

Take the following simple example of a title and a paragraph:

	<title>How to Make</title>
	<para>Begin with the ingredients from the <keyword>Thanksgiving Recipe</keyword>.</para>
			

If the user selects and copies the phrase ‘the <keyword>Thanksgiving Recipe</keyword>.’ and pastes it after 'Make' in the <title>, then the authoring tool might block that paste, because the controlling schema doesn’t allow <keyword> inside a <title> element. That’s frustrating, and worse, the reason for the failed paste is often hidden from the user: they can’t figure out why it’s blocked, so they think the tool is broken.

Of course a trained, full-time technical author would have a good idea what happened, would turn on “show tags” (actually they probably started work with tags being displayed) in their tool of choice, and would select only the text they wanted, skipping the keyword tag. This is a simple example, but many similar use cases exist. It’s a problem the Quark team refers to as “gross-edits,” and it is a significant issue when it blocks a business user from authoring with the ease to which they are accustomed in a word processing tool. That ease of use, and even the openness to adopting a structured authoring tool, is predicated on NOT showing the XML structure in an XML way.

By limiting the content model of Title, the well-meaning XML schema developer has just created a new hidden rule, one discoverable by the author only while the cursor is in a Title element and the insert inline markup UI widget is in use. The author will also bump against this hidden rule when trying to paste text copied from a para that contains one or more forbidden inline markups, and problematically, many XML authoring tools will simply disallow the paste without providing any meaningful feedback on why the action was canceled [note: it is also of high value to have the authoring tools improve the amount of feedback provided for these types of conditions]. Alternatively, if the schema allows the inline markup within Title, the author may not get the results they expect when that inline markup is ignored during output transformations. This is certainly a trade-off and one that requires careful consideration. Our experience in these cases is that improving usability by reducing user steps has the highest value for authors, as the ultimate success of any tool lies in its adoption.

This is just one example. There are many other contexts where the restriction of content models in the XML schema creates new hidden rules which may seem logical and helpful, but which actually cause more authoring usability problems than the restrictions are worth. Modifying the authoring tool(s) could improve usability in these use cases to some degree, but to do so the tool would have to give the user more feedback about the underlying markup and content model restrictions, with choices to resolve the copy/paste actions when source and target content models do not match. While this may be acceptable and preferred in some content types and for some authors, for non-technical, occasional authors it would just move their usability frustration to the "extra resolution step" which they are not used to in free-form word processing. It makes sense to limit the number of hidden rules that are created in the first place. The system can offload the resulting content structure challenges downstream to an automated step, such as when creating output.

For this reason, the base architecture of the Smart Content schema only allows for the definition of which blocks and inlines can be used at a section archetype level. Importantly, there are no controls for order and occurrence of blocks and inlines within a section. You can use any block at any time as frequently as desired and you can place blocks in any order. The same is true with inline markup. This significantly reduces the opportunity for the XML schema developer to over-specify content models that will reduce authoring usability. And while this does not solve all usability issues, it significantly moves in the right direction.
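As an illustration only, the membership-style check described above could be sketched as follows. The configuration dictionary, the type names, and the simplified markup (plain mixed content, no sub-sections) are assumptions made for this sketch and are not taken from the Smart Content RNG.

```python
import xml.etree.ElementTree as ET

# Hypothetical archetype configuration: each Section type lists the block and
# inline types it allows. There is deliberately no order or occurrence rule.
SECTION_MODELS = {
    "Recipe": {"blocks": {"step", "note"}, "inlines": {"keyword", "company"}},
}

def disallowed_types(section):
    """Return (element, type) pairs that the section's model does not list."""
    model = SECTION_MODELS[section.get("type")]
    problems = []
    for block in section.iter("p"):
        if block.get("type") not in model["blocks"]:
            problems.append(("p", block.get("type")))
    for inline in section.iter("tag"):
        if inline.get("type") not in model["inlines"]:
            problems.append(("tag", inline.get("type")))
    return problems

SAMPLE = ET.fromstring(
    '<section type="Recipe">'
    '<title>How to Make the <tag type="keyword">Thanksgiving Recipe</tag></title>'
    '<body>'
    '<p type="note">Serves eight.</p>'
    '<p type="step">Begin with the ingredients.</p>'
    '<p type="step">Ask <tag type="person">the chef</tag> for help.</p>'
    '</body>'
    '</section>'
)
print(disallowed_types(SAMPLE))
```

In this sketch the keyword inside the title and the note placed before the steps are both accepted, because only membership in the section's lists is tested; the single reported problem is the inline type that the hypothetical "Recipe" model does not list.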

XML Authoring Usability: Archetypes

Developing an authoring system that supports and enforces arbitrary XML schema definitions with complex structure and varying content models, and that is both performant and highly usable on a wide variety of computing platforms, is a very big challenge. It is clear that contributing to the content process from mobile devices with varying computing power is a growing requirement. While it's relatively easy for a batch XML parser to validate an XML document instance against a specific XML schema and report errors, it is much more difficult to provide the same capabilities in real time during authoring. This is true due to both "arbitrary XML" support and the nature of XML parser processing expectations. To minimize some of this complexity, existing XML schemas such as DITA have utilized an extensible information archetype model. This has enabled many tools to offer DITA support tailored for a specific purpose without having to claim support for arbitrary XML schemas. The base DITA schema is highly complex, given its original target of solving recurring problems in technical publications, though there have been and continue to be efforts to offer a simplified version that would be appropriate for non-technical documents.

The main advantage of a system based on archetypes is that an application can apply default processing to any markup which has an assigned root class. In arbitrary XML schema authoring implementations, the software must provide an additional configuration file that describes the basic processing for each XML element, e.g. <myBlock> should have a hard return before and after. With an archetype-based system, system implementation work is reduced since there can be fewer configuration files to develop, test, and deploy. A challenge with archetype-based systems arises when the base archetypes of the system do not include an information type that an implementation needs as a starting point.

Like XHTML and DITA, the Smart Content schema starts with a set of common elements which nearly every document type will contain. These are section, block (<p>), inline (<tag>), table, image, cross reference, reference notes (e.g. footnote, endnote, etc.), and metadata. Other content types that are available by default include a variety of emphasis and lists. Currently being discussed are extensible models for bodyDiv, in which the content model is fixed (e.g. a figure), and simpleBodyDiv, where the content model is any block(s) from the parent section and which would enable simple semantic wrapping of a sequence of blocks.

The advantage of having a set of base elements as information archetypes is that the system can treat customizations of these base types with common processing. Some examples include:

  • all types of Section appear in a TOC

  • all types of Para get white space before and after

  • all types of cross-reference can utilize the same source and target selection user interface
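The first item in this list can be illustrated with a minimal sketch: because every custom section is still a <section>, a table of contents can be built without knowing any of the custom type values. The sample document and its type names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

def build_toc(section, depth=0):
    """List (depth, type, title) for a section and its sub-sections, whatever their type."""
    title = section.find("title")
    text = "".join(title.itertext()) if title is not None else ""
    entries = [(depth, section.get("type"), text)]
    for child in section.findall("section"):
        entries.extend(build_toc(child, depth + 1))
    return entries

# Hypothetical document: three different section types, one base element.
DOC = ET.fromstring(
    '<section type="FundFactSheet"><title>Global Equity Fund</title>'
    '<body><p type="summary">Overview text.</p></body>'
    '<section type="Purpose"><title>Purpose</title><body/></section>'
    '<section type="Risks"><title>Key Risks</title><body/></section>'
    '</section>'
)
for depth, kind, title in build_toc(DOC):
    print("  " * depth + f"{title} [{kind}]")
```

No per-type configuration is consulted here, which is the point of the archetype approach: a new section type added through configuration would appear in the listing without any change to this code.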

In Smart Content each of these base archetypes can be extended through RNG configuration for custom semantics. Similar to XHTML's "class" attribute, the persistent form of a custom semantic for Smart Content is attribute based: <section type="Purpose">. There are two reasons for this:

  • It's extremely friendly to HTML developers, is easily transformed to XHTML, and can support direct presentation in HTML browsers using CSS techniques similar to the XHTML class attribute.

  • Using a type attribute instead of changing the element name provides an implementation methodology that better supports the cut/copy/paste/drag/drop of elements between document contexts. This approach also avoids the traditional pain related to parser errors associated with an invalid move and thereby skips the common frustration encountered when an element is out of context in the new location. When the value of a Type attribute is used for order and occurrence control, it may be easier with available parsers to resolve the issue in a more user-friendly manner and with high performance, as the system does not have to validate all XML elements within the fragment at one time. If the base type is allowed, then it is a simple matter of assigning an available Type value. However, this also requires some unique features of RNG and would be difficult to implement in systems that support only DTD or XSD schema languages.
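To make the first reason concrete, the following is a rough sketch of how a type attribute could be carried over to an XHTML class attribute. The element mapping (for example, <title> to <h2>) is an assumption chosen for the illustration, not a mapping defined by Smart Content.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from Smart Content base elements to XHTML elements.
BASE_TO_XHTML = {"section": "div", "title": "h2", "body": "div", "p": "p", "tag": "span"}

def to_xhtml(element):
    """Recursively copy an element, renaming it and turning type into class."""
    out = ET.Element(BASE_TO_XHTML.get(element.tag, "span"))
    if element.get("type"):
        out.set("class", element.get("type"))
    out.text, out.tail = element.text, element.tail
    for child in element:
        out.append(to_xhtml(child))
    return out

source = ET.fromstring(
    '<section type="Purpose"><title>Purpose</title>'
    '<body><p type="lead">Read the <tag type="keyword">prospectus</tag> first.</p></body>'
    '</section>'
)
print(ET.tostring(to_xhtml(source), encoding="unicode"))
```

Because the custom semantic lives in an attribute, the transformation needs one rule per base element rather than one rule per custom type, and a CSS selector such as `div.Purpose` can then style the result.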

XML Consistency: The use of Metadata beyond XML Attributes

For many years, XML document systems, authoring tools, and schema developers have struggled with the limitations of using XML attributes to capture rich metadata. Even a simple multi-value attribute must be expressed in XML as a delimited text value, such as: <section security-audience="Employees; Partners; Customers">.

As a result, custom programming must be used to define the user experience that presents multi-value attributes to the author and constrains their values, to validate a multi-value attribute, and finally to process a multi-value attribute value, such as when publishing a document.

The problem is multiplied if the attribute values should have hierarchy such as when describing geo-based regions: <section geo-audience="North America:CN,US;EMEA:UK,IR;">

However, most of these use cases can be handled if treated as an XML Fragment rather than as XML attributes:

<section>
	<meta>
		<attribute type="security-audience">
			<value>Employees</value>
			<value>Partners</value>
		</attribute>
	</meta>
</section>
				

Smart Content assumes that sections and inlines (and, in the future, blocks) can have a <meta> element directly after the start tag and that, for any cut/paste, publish, or other process, that metadata should be treated in the same way that XML attributes are treated. The metadata elements are "children" of the element and apply to the element as a whole.

While this does express attributes in a verbose way, it enables the use of existing XML tools for editing, validating, and processing metadata while providing much richer expression, constraints, and validation without requiring custom processing. It does however require tools that process the XML to implement the "lock" of the <meta> fragment to the element that directly contains it, for example when copying and pasting text at the beginning of an element that has metadata.
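The difference in processing effort can be sketched as follows, using the two forms shown earlier in this section. The function names are hypothetical; the point is only that the delimited form needs a purpose-built parser, while the <meta> form can be read with generic XML navigation.

```python
import xml.etree.ElementTree as ET

def parse_geo_attribute(value):
    """Custom parser needed for the delimited form, e.g. 'North America:CN,US;EMEA:UK,IR;'."""
    regions = {}
    for group in filter(None, value.split(";")):
        region, _, countries = group.partition(":")
        regions[region.strip()] = [c.strip() for c in countries.split(",") if c.strip()]
    return regions

def read_meta(element):
    """Read the <meta> fragment using only generic XML navigation."""
    meta = element.find("meta")
    if meta is None:
        return {}
    return {
        attribute.get("type"): [value.text for value in attribute.findall("value")]
        for attribute in meta.findall("attribute")
    }

attr_form = ET.fromstring('<section geo-audience="North America:CN,US;EMEA:UK,IR;"/>')
meta_form = ET.fromstring(
    '<section><meta><attribute type="security-audience">'
    '<value>Employees</value><value>Partners</value>'
    '</attribute></meta></section>'
)
print(parse_geo_attribute(attr_form.get("geo-audience")))
print(read_meta(meta_form))
```

The delimiter conventions in `parse_geo_attribute` exist only in code and in documentation, whereas the structure read by `read_meta` is visible to, and can be constrained by, the schema itself.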

Controlling Order and Occurrence Validation by Type Attribute Value

XML parsers are built to validate the structure of the document using the element names of a document. Parsers also validate the value of attributes, but attribute validation is atomic: the test is only whether the value of the attribute is correct, regardless of where in the document structure, or even to which element, the attribute is attached.

The goal of maximizing authoring usability first and minimizing developer work second is important in this context. Nearly all XML document schemas are defined and customized by modifying the element names. This of course works great for the parser to validate and control the authoring experience, but it causes significant editing problems when copying and pasting fragments of XML. The problem is that more than one element of an XML fragment within a Paste buffer might not be valid at the target location.

The problem might be solved (and there have been many attempts) through authoring tool development. On a paste of an element into a new context, the authoring tool has to validate the entire structure of the paste-buffer fragment against the new target content location. The fragment root element might be invalid, or a child of the fragment root might be invalid, or the entire fragment structure might not be valid. Solving all of these cases programmatically ranges from difficult to nearly impossible. The simple answer then is to disallow the paste, thus placing the burden of resolution on the author while increasing their frustration. The next best and reasonably feasible programmatic answer is to shut off the real-time validation parser, allow the paste, and then hope that the author can figure out how to re-assign the element names with very little guidance. And of course, they would have to turn the tags view on to do this work.

But the frequency of these problems can be dramatically reduced by using archetype elements with Type attribute values to control order and occurrence. Then, assuming that the base elements are allowed nearly everywhere in the document structure, a paste can occur, the element structure validation parser is satisfied, and only the Type attribute values might be invalid. Providing a user interface for tracking invalid Type attribute values is relatively simple, though above and beyond normal XML parser processing.

As a reminder, Smart Content's methodology allows defined blocks and inlines anywhere within a given Section type. Assuming a Section has at least one block and one inline defined, copying and pasting across Sections with differing Section type definitions is always allowed, and a user interface for the Type attribute value can follow one of two behaviors: if only one type value is available, set it automatically; if multiple type values are allowed, alert the user through simple formatting and other user interface controls that action is required to choose a new type value from the list of available types.
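A minimal sketch of this paste behavior, under stated assumptions, might look like the following. The configuration table, the type names, and the three status strings are hypothetical; a real authoring tool would surface the last case through its own user interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: block types allowed in each Section type.
ALLOWED_BLOCKS = {
    "Summary": {"abstract"},
    "Procedure": {"step", "warning", "note"},
}

def paste_block(target_section, block):
    """Always accept the paste, then reconcile the block's type value.

    Returns 'kept', 'auto-assigned', or 'needs-attention'.
    """
    allowed = ALLOWED_BLOCKS[target_section.get("type")]
    target_section.find("body").append(block)  # the base element <p> is always valid
    if block.get("type") in allowed:
        return "kept"
    if len(allowed) == 1:
        block.set("type", next(iter(allowed)))  # only one choice, so assign it
        return "auto-assigned"
    return "needs-attention"  # the UI would ask the author to pick a type

summary = ET.fromstring('<section type="Summary"><title>Summary</title><body/></section>')
procedure = ET.fromstring('<section type="Procedure"><title>Steps</title><body/></section>')

moved = ET.fromstring('<p type="step">Open the valve.</p>')
status = paste_block(summary, moved)
print(status, moved.get("type"))
```

Note that the paste itself never fails in this sketch; the only thing left to resolve is a single attribute value, which is a far smaller problem than revalidating element names throughout a fragment.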

Authoring Usability: Component Reuse

One of the most heavily marketed features of an XML authoring and publishing system is the use of content components: the ability to reference an external asset of any type (xml fragment, image, etc.) and the system resolves that reference as if the target asset is "inline" with the master document. This "single-source reuse" (versus traditional copy/paste or re-authoring) has an extremely positive impact on the ROI of a solution. It reduces the time and costs of content maintenance, enables parallel authoring and review at a sub-document level, decreases cost of content language translation by reducing the amount of content sent to a translator, and generally improves the quality of the content by increased consistency through synchronization of component edits to all referring parent documents.

Componentization is also one of the most complex features of XML when it comes to system deployment, spawning a whole marketplace of Component Content Management Systems. The more fine-grained the content that can be targeted for reuse-by-reference (e.g. Paragraph versus Section), the more complex it is for authors to understand what impact their changes are going to have on the system. For this reason, and again with the filter of Smart Content's target market, Componentization is limited to Sections and various special object types such as images and tables; Block Combinations are under consideration.

Smart Content has adopted the componentization syntax of DITA, using "conref" attributes to define the target content, though implementation of XLink or another, system-specific syntax is under consideration.
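As a rough sketch of reuse-by-reference at the Section level, the following resolves conref attributes against an in-memory component store. The store, the form of the reference value ("risk-disclosure"), and the function name are assumptions for illustration only; they do not reproduce DITA's addressing syntax or any particular system's resolver, and the sketch does not guard against circular references.

```python
import xml.etree.ElementTree as ET

# Hypothetical component store keyed by reference value.
COMPONENTS = {
    "risk-disclosure": (
        '<section type="Disclosure"><title>Risk Disclosure</title>'
        '<body><p type="legal">Past performance does not guarantee future results.</p></body>'
        '</section>'
    ),
}

def resolve_conrefs(parent):
    """Replace each child <section conref="..."> with a copy of the referenced component."""
    for index, child in enumerate(list(parent)):
        if child.tag == "section" and child.get("conref") in COMPONENTS:
            resolved = ET.fromstring(COMPONENTS[child.get("conref")])
            resolved.tail = child.tail
            parent[index] = resolved
            child = resolved
        resolve_conrefs(child)  # components may themselves contain references
    return parent

report = ET.fromstring(
    '<section type="FundFactSheet"><title>Global Equity Fund</title><body/>'
    '<section conref="risk-disclosure"/></section>'
)
resolve_conrefs(report)
print([s.findtext("title") for s in report.findall("section")])
```

Every publication that references the same component picks up an edit to it the next time references are resolved, which is the synchronization benefit described above.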

An argument can be made that there are use cases in any document type for support of more granular component referencing. However, the added complexity for the non-technical, occasional author will likely not be worth the tradeoff unless the implementor develops use-case specific customizations. It's clear that XML techniques can solve almost any challenge, but not always in ways that allow for easy adoption, implementation, training, maintenance, and, most critically, authoring usability. The Smart Content schema does not currently define component boundaries, only the reference syntax, so an implementation could determine how and when to enable component references.

An Overview of the Smart Content Schema

The Smart Content schema, defined in the RNG schema language, is heavily influenced by information archetypes such as DITA, in that Smart Content provides a base vocabulary of information elements that can be extended through configuration. However, Smart Content differs significantly from DITA in how those custom types are defined and how they persist in XML syntax. In this regard, Smart Content is more like XHTML. A very positive benefit of being XHTML-like is that the implementation can be more easily understood and rapidly adopted by the large volume of web developers.

The Smart Content methodology expressed in the schema has the goal of guiding system implementers away from creating document structures that cause authoring usability problems while also attempting to solve some of the long-standing limitations of XML markup as applied to complex, authored documents. An example of the latter is the application of element-level metadata that is richer than simple XML attributes support.

The following is an overview of the significant base elements:

Formal Groups of Blocks: <section>

The base element for a formal (i.e. having a Title) group of blocks is <section>. The intent of Section is similar to HTML div or DITA topic. Section in the RNG is used to create a custom typed container for a group of blocks, the list of blocks and inlines that can be used within the Section, and the metadata elements for the Section. Typically Sections will also start with a title element to enable easy identification of the boundaries of the section as well as provide a handle for Section navigation such as a hyper-linked table of contents.

The use of "section" as the semantic for a group of blocks was chosen as the best compromise given that:

  • Division or <div> is heavily used in HTML for a wide variety of mostly programmatic purposes and heavily overloaded in the XHTML domain

  • Topic or <topic> might invite unnecessary confusion between DITA and Smart Content

  • Chapter, Article, Part all have some specific definition in a variety of contexts which may or may not overlap with Smart Content's usage

Sections have the common form of:

<section type="mySection">
	<meta>...my metadata fragment...</meta>
	<title>Title Text</title>
	<body>...run of blocks...</body>
	[...zero or more sub-sections...]
</section>
				

There is currently no distinguishing characteristic between Section as a component and Section as a document. One use of a Section may be as a root for a publication and the same Section may also be a component child of another Section for a different publication. Smart Content leaves the definition of which Sections can be considered a root for a publication up to system implementation.

Sections can be configured in the following ways:

  • Section type, persisted in XML documents as <section type="mySection">

  • List of Blocks allowed within the section

  • List of inline elements allowed within the section

  • Metadata that applies to the section as a whole, persisted as an XML fragment just after the start tag: <section type="mySection"><meta>...my metadata fragment...</meta><title>...</title></section>

Note that the definition of a Section does not allow for control over the order and occurrence of blocks or inlines. If a block or inline is defined in a Section model, it can be used in any order and with any frequency. Under consideration is an exception to this rule for <bodyDiv>, defined below.

Blocks: <p>

The base element for blocks is <p> (as in "paragraph"). Blocks are intended to hold runs of text and inline elements in any combination.

Blocks can be typed in the following ways:

  • Block type, persisted in the XML document as <p type="myBlock">

  • A future consideration is to allow Metadata that applies to the block as a whole, persisted as an XML fragment just after the start tag: <p type="myBlock"><meta>...my metadata fragment...</meta><t>...Paragraph text and inline content...</t></p>

Note that the definition of a block does not allow control over the order or occurrence of inlines. Inlines are defined at the section level and apply to all blocks within a defined section. A potential exception to this rule is <bodydiv>.

Note that the pattern of <element><meta></meta><t></t></element> is used consistently in Smart Content for mixed content models. It enables clear and consistent addressing of the element as a whole, the metadata for the element, and its mixed content.
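The benefit of this pattern for processing code can be sketched as follows: the metadata and the mixed content are addressed separately, so flattening the text never picks up metadata values. This is a minimal sketch under stated assumptions; `split_mixed` and `mixed_text` are hypothetical helper names, and the type values in the sample are invented.

```python
import xml.etree.ElementTree as ET


def mixed_text(element):
    """Flatten the readable text of an element, skipping any <meta> fragments."""
    t = element.find("t")
    if t is None:
        # Not a meta/t wrapper (for example an empty <image/>): take its text as-is.
        return "".join(element.itertext())
    parts = [t.text or ""]
    for child in t:
        parts.append(mixed_text(child))  # recurse into nested inlines
        parts.append(child.tail or "")   # text that follows the nested inline
    return "".join(parts)


def split_mixed(element):
    """Return (meta element or None, flattened text) for an element."""
    return element.find("meta"), mixed_text(element)


block = ET.fromstring(
    '<p type="myBlock">'
    '<t>Read the <tag type="term">'
    '<meta><attribute name="system"><value>disclosure</value></attribute></meta>'
    '<t>prospectus</t></tag> first.</t>'
    '</p>'
)

block_meta, block_text = split_mixed(block)
inline_meta, inline_text = split_mixed(block.find("t/tag"))
print(block_text)
print(inline_text)
```

The block has no metadata of its own here, so `block_meta` is `None`, while the nested inline yields both its <meta> fragment and its own text.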

Inlines: <tag>

The base element for Inlines is <tag>. Inlines are intended to hold runs of text and other inlines in any combination. They are used to call out unique semantics for a phrase within a block or inline.

Inlines can be typed in the following ways:

  • Inline type, persisted in the XML document as <tag type="myInLine">

  • Metadata that applies to the inline as a whole, persisted as an XML fragment just after the start tag: <tag type="myInLine"><meta></meta><t>...inline text and element content...</t></tag>

Note that the definition of an inline does not allow for the control over order and occurrence of nested inlines. Inlines are defined at the section level and apply to all blocks and inlines within a defined Section. An exception to this rule is the use of Block Combinations.
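To make the nesting of inlines concrete, the sketch below lists each <tag> in a block together with its nesting depth. The type values ("product", "trademark", "date") and the `inline_nesting` helper are hypothetical; the markup follows the <tag><t>...</t></tag> form described above.

```python
import xml.etree.ElementTree as ET


def inline_nesting(element, depth=0):
    """Yield (depth, type) for every <tag> inside an element's <t>, in document order."""
    t = element.find("t")
    if t is None:
        return
    for child in t:
        if child.tag == "tag":
            yield depth, child.get("type")
            # A <tag> may itself contain further inlines inside its own <t>.
            yield from inline_nesting(child, depth + 1)


block = ET.fromstring(
    '<p><t>The <tag type="product"><t>Quark '
    '<tag type="trademark"><t>XPress</t></tag></t></tag> release of '
    '<tag type="date"><t>2015</t></tag>.</t></p>'
)

nesting = list(inline_nesting(block))
print(nesting)
```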

Miscellaneous Elements: <table>, <image>, <video>, <xref>, lists, emphasis, etc.

These elements are common to most XML document markup languages. Smart Content mainly follows XHTML markup for these elements. The exception is that Table is currently a modestly modified version of the CALS Exchange table model. The table modification is to support the capture of additional styling information such as would be generated when converting an MS Excel table to CALS Exchange.

A few additional notes on objects:

  • Block objects (table, image, lists, etc.) can be used anywhere a block can be used within a Section

  • Inline elements (xref, emphasis) can be used anywhere an inline is allowed, that is, in a text run

  • <image> is treated as an inline, but can be expressed as a block by being the only child of a block

  • Lists are <ol> and <ul>; there is a future consideration to allow for types of lists and additional structures including multiple paragraphs per list item, or specific content models such as might be used for a definition list

  • A list can be a child of list item so that nested lists can be expressed
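The note on <image> suggests a simple test that a processing system might apply. The following sketch assumes the markup used in the Figure sample later in this paper, where the image sits inside the block's <t>; the `is_block_image` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_block_image(p):
    """True when a <p> holds exactly one <image> and no other text or elements."""
    t = p.find("t")
    if t is None or len(t) != 1 or t[0].tag != "image":
        return False
    image = t[0]
    # Any non-whitespace text before or after the image makes it an inline use.
    has_text = (t.text or "").strip() or (image.tail or "").strip()
    return not has_text


as_block = ET.fromstring('<p><t><image href="qpp://assets/110"/></t></p>')
as_inline = ET.fromstring('<p><t>See <image href="qpp://assets/110"/> here.</t></p>')

print(is_block_image(as_block))
print(is_block_image(as_inline))
```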

Metadata: <meta>

User-defined element metadata is defined using XML element structures. The <meta> element is allowed after the start tag of many base types including sections, inlines, and tables. Because metadata is captured as XML fragments, complex metadata structures can be captured and persisted without leaving XML processing tools. To further simplify implementation, <meta> has a content model that enables the automatic creation of a user interface to capture or view metadata.

Note: At this time, XML attributes are limited for use by the Smart Content processing system to support typing of base elements and other system metadata such as element ID. Under consideration is the use of additional system-level attributes for specific Smart Content purposes. One example being considered is a "level" attribute that would support the free-form use of increase/decrease indent of blocks. This could enable the creation of outline-like structures without requiring the overhead of additional nested Section definitions whose only difference is their position in an explicit or implicit hierarchy. The value of reducing containment structure overhead for authoring usability and performance may be significant.

Simple attributes are defined:

<meta><attribute name="system"><value>disclosure</value></attribute></meta>

Multi-value attributes:

<meta><attribute name="outputs"><value>print</value><value>web</value></attribute></meta>

A Collection element with Member elements can be used to build a repeating metadata structure:

<meta>
	<collection name="contributors">
		<member name="contributor">
			<attribute name="role">
				<value>Supervisor</value>
			</attribute>
			<attribute name="name">
				<value>Sam Markup</value>
			</attribute>
		</member>
		<member name="contributor">
			<attribute name="role">
				<value>Legal</value>
			</attribute>
			<attribute name="name">
				<value>Marcia Tag</value>
			</attribute>
		</member>
	</collection>
</meta>
				

A Group element can be used to allow for a presentation that highlights a set of related metadata such as drawing a box around them with a shaded background:

<meta>
	<group name="CompanyInfo">
		<attribute name="company"><value>Quark Software Inc.</value></attribute>
		<attribute name="phoneNumber"><value>+1 303-894-8000</value></attribute>
	</group>
</meta>
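Because the metadata model is regular, a generic reader can turn any <meta> fragment into a plain data structure, which is one way a user interface could be generated from it. The sketch below is illustrative only: `read_meta` and its helpers are hypothetical names, and combining an attribute, a group, and a collection in a single <meta> is an assumption made to keep the sample short.

```python
import xml.etree.ElementTree as ET


def read_container(container):
    """Map each <attribute> name in a container to its list of <value> strings."""
    return {
        attribute.get("name"): [value.text or "" for value in attribute.findall("value")]
        for attribute in container.findall("attribute")
    }


def read_meta(meta):
    """Convert a <meta> fragment into nested dicts and lists."""
    result = read_container(meta)  # simple and multi-value attributes
    for group in meta.findall("group"):
        result[group.get("name")] = read_container(group)
    for collection in meta.findall("collection"):
        result[collection.get("name")] = [
            read_container(member) for member in collection.findall("member")
        ]
    return result


meta = ET.fromstring("""
<meta>
  <attribute name="outputs"><value>print</value><value>web</value></attribute>
  <group name="CompanyInfo">
    <attribute name="company"><value>Quark Software Inc.</value></attribute>
    <attribute name="phoneNumber"><value>+1 303-894-8000</value></attribute>
  </group>
  <collection name="contributors">
    <member name="contributor">
      <attribute name="role"><value>Supervisor</value></attribute>
      <attribute name="name"><value>Sam Markup</value></attribute>
    </member>
    <member name="contributor">
      <attribute name="role"><value>Legal</value></attribute>
      <attribute name="name"><value>Marcia Tag</value></attribute>
    </member>
  </collection>
</meta>
""")

parsed = read_meta(meta)
print(parsed)
```

Every attribute is returned as a list, so single-value and multi-value attributes can be handled by the same code path.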
				

Under Consideration: Block Combinations and Simple Block Combinations: <bodydiv> and <simplebodydiv>

The base element for Block Combinations is <bodydiv>. Block Combinations, like Sections, define a group of blocks. Unlike Sections, they enable fixed content models of blocks and inlines with both order and occurrence control. Block Combinations would not follow the default processing of Sections and therefore would not be used to generate an overall document navigation structure like a TOC. Block Combinations might be used to generate a "list of typed Block Combinations."

An example Block Combination is a traditional formal "Figure" which has a title element, an image element, and a caption element. As a group, Figure should be considered one object such that add, delete, copy, and paste actions always include all three elements. Smart Content currently has a figure element:

<bodydiv type="figure">
	<p type="title"><t>Title Text</t></p>
	<p><t><image cx="80%" cy="" href="qpp://assets/110"/></t></p>
	<p type="desc"><t>Description Text</t></p>
</bodydiv>	
				

Under consideration is the generalization of this idea such that Block Combinations can be typed and the RNG allows for full content model control within a <bodydiv> structure.

While Figure is a simple example, Block Combinations could be extended to support a variety of other models including authoring slide-shows (list of images or figures with an expected output behavior), input forms, and more. It's a general mechanism for any processing system to understand that the block combination fragment is special and has special processing requirements.
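The difference from a Section can be sketched as a check that does care about order and occurrence. The example below assumes a fixed model for the figure shown above, expressed as the expected type of each child <p> in sequence; `FIGURE_MODEL` and `matches_model` are hypothetical and stand in for what the RNG would enforce.

```python
import xml.etree.ElementTree as ET

# Assumed fixed model for a figure: the type of each child <p>, in order.
# None stands for the untyped <p> that carries the image in the sample above.
FIGURE_MODEL = ["title", None, "desc"]


def matches_model(bodydiv, model):
    """True when the children are all <p> elements whose types match the model exactly."""
    children = list(bodydiv)
    return (
        all(child.tag == "p" for child in children)
        and [child.get("type") for child in children] == model
    )


figure = ET.fromstring("""
<bodydiv type="figure">
  <p type="title"><t>Title Text</t></p>
  <p><t><image cx="80%" cy="" href="qpp://assets/110"/></t></p>
  <p type="desc"><t>Description Text</t></p>
</bodydiv>
""")

reordered = ET.fromstring("""
<bodydiv type="figure">
  <p type="desc"><t>Description Text</t></p>
  <p><t><image cx="80%" cy="" href="qpp://assets/110"/></t></p>
  <p type="title"><t>Title Text</t></p>
</bodydiv>
""")

print(matches_model(figure, FIGURE_MODEL))
print(matches_model(reordered, FIGURE_MODEL))
```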

For an XML schema developer, the obvious question might be, "Why only allow control over block and inline order and occurrence in block combinations? In all other document schemas these are available for the entire model." The answer is twofold: Authoring usability is a primary and fundamental goal, so Smart Content "encourages" the limited use of restrictive content models by making the definition of such a model an exception rather than the norm; second, Block Combinations are a powerful concept for an entire class of content types that can trigger custom user experiences in authoring, publishing, and interactive consumption.

Simple Block Combinations are more like HTML Div elements: they would enable authors to "wrap" any existing collection of blocks and provide a semantic type to the collection. One example is a Sidebar where the content allowed in a Sidebar is any content that is allowed within the Section. So there would be no additional content model definitions within a simple block combination. The markup under consideration is:

<simplebodydiv type="sidebar">[any blocks and inlines allowed in section]</simplebodydiv>
				

Summary

The Smart Content Schema is both simple in its design and powerful in its flexibility. With a focus on the user, the model targets a broad adoption of XML tool sets and the simplification of adding intelligence into authored content. Successfully addressing the needs of non-technical, business users and occasional writers requires a significant rethinking of traditional XML content implementations. Some of the changes are difficult to accept for XML experts and purists [of which, this writer was one]. While there may be many ways to solve a set of specific problems, the Smart Content methodology is one opportunity to improve on traditional uses of XML with a new audience in mind. We invite you to share your feedback on the model as well as any interest in working with us toward standardization of the model.

Smart Content Schema Sample

The technical details of the Smart Content Schema are less important at this stage than the goals and methodology. However, codifying the methodology required an evaluation of multiple schema languages, and RNG was selected for its very flexible and powerful support of inheritance. RNG enables a natural implementation of the base elements and intended configuration types.

The root schema is currently modularized as Smart Content (base section model, base p block, and root of the schema), Smart Meta (for meta content model definitions of attribute, value, collection, member, group), and Smart inlines (for tag definitions). A sample RNG for a typed content model of a Section named "SOP" is defined:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
	xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
	datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

	
	<include href="../SOP Purpose/SOP Purpose.rng"/>	
	<include href="../SOP Scope/SOP Scope.rng"/>
	<include href="../Procedure/Procedure.rng"/>
	<include href="../SOP Legal/SOP Legal.rng"/>
	<include href="../SOP Background/SOP Background.rng"/>

	<start combine="choice" >
		<ref name="sop"/>
	</start>

	<define name="sop">
		<grammar>
			<include href="Smart-Section.rng">
				<define name="section-type">
					<value>sop</value>
				</define>
				<define name="para-types">
					<parentRef name="sop.para-types"/>
				</define>
				<define name="tag-types">
					<parentRef name="all.tag-types"/>
				</define>
				<define name="section-tags">
					<parentRef name="all.section-tags"/>
				</define>
				<define name="content-model">
					<parentRef name="purpose"/>
					<parentRef name="bginfo"/>
					<parentRef name="scope"/>
					<oneOrMore>
						<parentRef name="procedure"/>
					</oneOrMore>
					<oneOrMore>
						<parentRef name="legalnotice"/>
					</oneOrMore>
				</define>
				<define name="section-meta">
					<parentRef name="lang"/>
					<parentRef name="audience"/>
					<parentRef name="keywords"/>
					<parentRef name="contribs"/>
					<parentRef name="dates"/>
					<parentRef name="organization"/>
					<parentRef name="permissions"/>
				</define>
				<define name="para-meta">
					<parentRef name="keywords"/>
				</define>
			</include>
		</grammar>
	</define>
	<!-- to be created in a common file with choice option as in meta tags file-->
	<define name="sop.para-types">
		<choice>
			<value>heading</value>
			<value>note</value>
			<value>lq</value>
		</choice>
	</define>
</grammar>
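Since the configuration lives in ordinary RNG, tools can read it directly. The sketch below pulls the allowed block types out of a `sop.para-types` definition using only the standard RELAX NG namespace; the `choice_values` helper is hypothetical, and the fragment repeats just the relevant define from the sample above so that it can be parsed without the included files.

```python
import xml.etree.ElementTree as ET

RNG_NS = {"rng": "http://relaxng.org/ns/structure/1.0"}

# Reduced copy of the sample schema: only the para-types definition is kept.
FRAGMENT = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="sop.para-types">
    <choice>
      <value>heading</value>
      <value>note</value>
      <value>lq</value>
    </choice>
  </define>
</grammar>"""


def choice_values(grammar, define_name):
    """Return the <value> strings listed in the <choice> of the named <define>."""
    for define in grammar.findall("rng:define", RNG_NS):
        if define.get("name") == define_name:
            return [v.text for v in define.findall("rng:choice/rng:value", RNG_NS)]
    return []


para_types = choice_values(ET.fromstring(FRAGMENT), "sop.para-types")
print(para_types)
```

An authoring tool could use a list like this to populate a block-type menu for the section being edited.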