XML Out - reducing the clutter.
- Lots of pressure to force all XML implementations to correct errors
- Allowing error recovery is probably the same as forcing implementations to accept malformed documents
- Web browser implementors tell me they need this
- Presumably they often encounter non well-formed “XML”
Why?
- Where are these corruptions entering the world?
Whence 1
- People hand-authoring transcriptions of Mediæval Slavonic manuscripts...
- Must be! ...
- It's those TEI people at it again...
- Tens of millions of them, causing pressure on the Web browser impl...er....
Whence 2
- OK, but the TEI people don’t tend to share their
transcriptions in XML without checking them, because
they want to use them....
- So, not the TEI people...
- Probably not anyone hand-authoring transcriptions.
Whence 3
- If not the TEI crowd, who else would be so evil and corrupt?...
- Probably not technical documenters, for the same reasons...
- So who?
Who 1?
- The request mostly comes to me from people in the Web browser world...
- Is it XHTML? ...
- Who cares? XHTML is mostly served up as text/html, which
browsers already auto-correct....
- ...and when it's served as XML the browsers choke on
errors today.
Who 2
- Wait. ...
- I know! It's...
- RSS!
RSS
- RSS feeds are usually generated automatically. ...
- As are configuration files, SOAP replies...
- Surely computer programmers would be careful to test their work thoroughly?
Error
- Here's a real example from PHP code that writes a
configuration file:
write($config, " <reg reg.1.displayName=\"$locmark\" reg.1.address=\"$stationid\" reg.1.label=\"$stationid\" reg.1.type=\"private\" reg.1.lcs=\"\" reg.1.csta=\"\" ...
Random Quote
The author represents, in string colours, the causes of error, arising
from the disorders of the imagination and passions, the
abuse of liberty, and an implicit confidence in the senses.
Problems
- Code is error-prone; ...
- It's hard to read and messy; ...
- XML quoting isn’t obvious to programmers.
What can we do?
- It’s easy to use printf/echo/print, the
programmer doesn’t have to think...
- but they need to think ...
- So, they should use a print routine that automates XML escaping!
- but there often isn’t one.
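A minimal sketch of what such an escaping routine might look like in C. The name xml_escape is hypothetical and is not part of XMLout; it only illustrates the quoting step that print statements skip:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of XMLout: returns a malloc'd copy of
 * `text` with the five XML special characters replaced by entities.
 * Returns NULL if memory runs out; the caller frees the result. */
char *xml_escape(const char *text)
{
    assert(text != NULL);

    /* Worst case: every byte becomes a six-byte entity such as &quot; */
    size_t len = strlen(text);
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const char *s = text; *s != '\0'; s++) {
        const char *entity = NULL;
        switch (*s) {
        case '&':  entity = "&amp;";  break;
        case '<':  entity = "&lt;";   break;
        case '>':  entity = "&gt;";   break;
        case '"':  entity = "&quot;"; break;
        case '\'': entity = "&apos;"; break;
        }
        if (entity != NULL) {
            size_t n = strlen(entity);
            memcpy(p, entity, n);   /* copy the entity, no terminator yet */
            p += n;
        } else {
            *p++ = *s;              /* ordinary byte: copy unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

Escaping all five characters everywhere is more than strictly needed in element content, but it keeps one routine safe for both text and attribute values.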
XMLout - a family of libraries
- Designed to be more attractive to programmers than using print statements...
- Multiple libraries/modules because multiple languages...
- Initial target languages: PHP, Perl, C, C++, JavaScript...
Multiple designs (goals)
- To be designed to feel natural in each language;...
- Multiple versions in some cases to support different programming styles;...
- Lightweight, not “bloated” ...
- Does not build a tree, so low memory usage...
- Reasonably fast (compared to multiple I/O calls)
Status
- Examples exist...
- JavaScript and C closest to useable...
- PHP, Perl in progress...
- Strategy is to understand major issues across languages before
releasing first versions, to get the versions as similar as possible.
Library functionality
- Automatic XML-quoting (escaping) of strings;...
- Automatic element tag balancing;...
- Can override automatic XML-quoting;...
- Can double-escape for RSS descriptions...
- No tree manipulation supported; it's a string!
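One way automatic tag balancing can work without building a tree is to keep a small stack of open element names. The sketch below is an illustration under that assumption; XmlWriter and the xw_ functions are hypothetical names, not the XMLout API, and text passed to xw_raw is assumed to be escaped already:

```c
#include <assert.h>
#include <string.h>

#define XW_MAX_DEPTH 32
#define XW_BUF_SIZE  1024

/* Hypothetical writer, not the XMLout API: remembers which elements are
 * open so that close tags are generated rather than typed by hand. */
typedef struct {
    char        buf[XW_BUF_SIZE];     /* output accumulated so far */
    size_t      len;
    const char *open[XW_MAX_DEPTH];   /* stack of open element names */
    int         depth;
    int         failed;               /* set on overflow or misuse */
} XmlWriter;

static void xw_init(XmlWriter *w)
{
    assert(w != NULL);
    w->buf[0] = '\0';
    w->len = 0;
    w->depth = 0;
    w->failed = 0;
}

/* Append text as-is, refusing (and flagging) anything that would overflow. */
static void xw_raw(XmlWriter *w, const char *s)
{
    size_t n = strlen(s);
    if (w->failed || w->len + n >= XW_BUF_SIZE) {
        w->failed = 1;
        return;
    }
    memcpy(w->buf + w->len, s, n + 1);
    w->len += n;
}

static void xw_open(XmlWriter *w, const char *name)
{
    if (w->depth >= XW_MAX_DEPTH) {
        w->failed = 1;
        return;
    }
    w->open[w->depth++] = name;       /* remember it for the close tag */
    xw_raw(w, "<");
    xw_raw(w, name);
    xw_raw(w, ">");
}

/* Close the innermost open element; the caller never names it. */
static void xw_close(XmlWriter *w)
{
    if (w->depth == 0) {
        w->failed = 1;
        return;
    }
    const char *name = w->open[--w->depth];
    xw_raw(w, "</");
    xw_raw(w, name);
    xw_raw(w, ">");
}

/* Close everything still open and return the text, or NULL on failure. */
static const char *xw_finish(XmlWriter *w)
{
    while (w->depth > 0 && !w->failed)
        xw_close(w);
    return w->failed ? NULL : w->buf;
}
```

Because the close tag is taken from the stack, a mistyped or forgotten end tag cannot occur; the cost is one pointer per nesting level rather than a tree.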
Example 1:
- <p class="hi">Hello</p>
- JavaScript/Java:
Xelem("p", "Hello").Xattr("class", "hi");
- PHP:
Xelem("p", array("class" => "hi"), "Hello");
- Perl:
Xelem("p", { "class" => "hi"}, "Hello");
Example 2: existing C
char *writeConfig1()
{
    char *buffer = malloc(LOTS);
    sprintf(buffer, "<config version=\"1.2\">\n");
    strcat(buffer, "<name>");
    strcat(buffer, "<first>");
    strcat(buffer, "Liam");
    strcat(buffer, "</first>");
    strcat(buffer, "<last>");
    strcat(buffer, "Quin");
    strcat(buffer, "</last>");
    strcat(buffer, "</name>\n");
    strcat(buffer, "</config>\n");
    return buffer;
}
Example: better C
char *writeConfig()
{
    XMLOut *stuff;
    /* tree-based */
    stuff = XOelem("config", NULL);
    Xattr(stuff, "version", "1.0");
    Xcontent(stuff,
        "\n",
        Xelem("name",
            XOelem("first", "Liam", NULL),
            XOelem("last", "Quin", NULL),
            NULL
        ),
        "\n",
        NULL
    );
    return XOtostring(stuff);
}
Some design issues
- C does not natively handle lists, hence trailing NULLs
- If XOelem() runs out of memory it returns a static error object,
to make the nested calls safe. Could register a callback.
- XOelem frees element objects passed as arguments, but not
plain strings.
- C can’t safely use the func().func().func() paradigm because your program
crashes if one returns NULL, so programmers are (rightly) wary of it.
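The static error object idea can be sketched as follows. Node, node_text and node_wrap are hypothetical stand-ins (the real XMLOut struct is not shown on these slides), and escaping is left out to keep the focus on error propagation and ownership:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type, standing in for the library's element object. */
typedef struct Node {
    int   is_error;
    char *text;
} Node;

/* One shared, statically allocated error object: never NULL, never freed. */
static Node node_error = { 1, NULL };

static void node_free(Node *n)
{
    if (n != NULL && n != &node_error) {
        free(n->text);
        free(n);
    }
}

/* Make a node holding a copy of s, or the error object if memory runs out. */
static Node *node_text(const char *s)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return &node_error;
    n->is_error = 0;
    n->text = malloc(strlen(s) + 1);
    if (n->text == NULL) {
        free(n);
        return &node_error;
    }
    strcpy(n->text, s);
    return n;
}

/* Build <name>child</name>. Takes ownership of child and frees it.
 * An error child, or a failed allocation, yields the error object,
 * so nested calls never hand NULL to their caller. */
static Node *node_wrap(const char *name, Node *child)
{
    assert(name != NULL && child != NULL);
    if (child->is_error)
        return &node_error;            /* the error propagates outwards */

    size_t nlen = strlen(name), clen = strlen(child->text);
    size_t size = 2 * nlen + clen + 6; /* "<>" + "</>" + terminator */
    char *buf = malloc(size);
    Node *n = malloc(sizeof *n);
    if (buf == NULL || n == NULL) {
        free(buf);
        free(n);
        node_free(child);
        return &node_error;
    }
    snprintf(buf, size, "<%s>%s</%s>", name, child->text, name);
    node_free(child);                  /* element arguments are consumed */
    n->is_error = 0;
    n->text = buf;
    return n;
}
```

The caller checks is_error once, at the outermost call, instead of testing every intermediate result for NULL.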
OO languages
- Perl, PHP and JavaScript all have runtime typing and can
deal with lists without a trailing NULL
- May add an OO API as well as the functional and procedural.
- $x = new XML::Out; $x->elem("p", { class => "happy" });
- would be a separate module/library to avoid accusations of bloat.
Mixed Content and Escaping
- Xelem("p", "hello John & ", Xelem("q", "Susan"), ".");
- <p>hello John &amp; <q>Susan</q>.</p>
- In C or C++:
Xelem("p", "hello John & ", Xelem("q", "Susan", NULL), ".", NULL);
- Behind the scenes, uses __attribute__ ((sentinel)) in the C header file to require the trailing NULLs, and a magic value to distinguish strings from structs.
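A rough sketch of how the sentinel attribute and a magic value could fit together. The tag bytes, the Elem struct and join_children are all assumptions made for illustration; the slides do not say what the library actually uses:

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

#if defined(__GNUC__)
#define NEEDS_NULL __attribute__((sentinel))  /* warn if NULL is missing */
#else
#define NEEDS_NULL
#endif

/* Hypothetical tag bytes: none is NUL, and the first (0x01) is not a legal
 * XML 1.0 character, so ordinary text should never start this way. */
static const unsigned char MAGIC[4] = { 0x01, 'X', 'O', 0x02 };

typedef struct {
    unsigned char magic[4];   /* must be the first member */
    const char   *markup;     /* already-serialised element */
} Elem;

/* Compare byte by byte and stop at the first mismatch, so a short string
 * is never read past its terminating NUL. */
static int is_elem(const void *p)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < sizeof MAGIC; i++)
        if (b[i] != MAGIC[i])
            return 0;
    return 1;
}

static void join_children(char *out, size_t size, ...) NEEDS_NULL;

/* Concatenate children into out: Elem arguments contribute their markup,
 * anything else is treated as a plain string. Assumes all object pointers
 * share one representation, as on mainstream platforms. */
static void join_children(char *out, size_t size, ...)
{
    va_list ap;
    size_t used = 0;

    assert(out != NULL && size > 0);
    out[0] = '\0';
    va_start(ap, size);
    for (;;) {
        const void *arg = va_arg(ap, void *);
        if (arg == NULL)
            break;                 /* the trailing NULL ends the list */
        const char *s = is_elem(arg) ? ((const Elem *)arg)->markup
                                     : (const char *)arg;
        size_t n = strlen(s);
        if (used + n >= size)
            break;                 /* out of room: stop rather than overflow */
        memcpy(out + used, s, n + 1);
        used += n;
    }
    va_end(ap);
}
```

With GCC or Clang, a call that omits the final NULL then draws a compile-time warning instead of reading garbage from the stack at run time.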
Directions
- Library could construct a light-weight tree to support serialization options ...
- Perl (and PHP 5) have objects and classes, so an OO API is wanted too
- Flavours: Some people may prefer XMLOutElementAsString() to Xelem()
Next Steps
- Downloadable code (August)...
- Useability testing in each language...
- Portability testing...
- CPAN module for Perl...
- Negotiate with PHP to get into core...
- Volunteers welcome!
Random Quote
The bits of paper upon
which he had written these thoughts, were found, after his
death, filed upon different pieces of string, without any
order or connection; and being copied exactly as they
were written, they were afterwards arranged and published.