Automatic upconversion using XSLT 2.0 and XProc: A real-world example
Much of the data on the Web is published in unstructured, presentation-centric formats that are poorly suited to structured searching and retrieval. Upconverting such data to a more data-centric storage format opens it up to many new uses. Our starting point is a collection of HTML documents containing video game reviews. Our goal is to define a target XML format whose elements and attributes capture the information we consider valuable. The conversion itself should run automatically as an XProc pipeline that applies XSLT 2.0 transformations. We conclude by demonstrating typical benefits of the highly structured data produced by our conversions.