Balisage Paper: Comparing and diffing XML schemas
Priscilla Walmsley is a senior consultant and managing director at Datypic, specializing in electronic publishing, data architecture and information exchange. Priscilla is the author of Definitive XML Schema (Prentice Hall PTR, 2012) and XQuery (O'Reilly Media, 2007). In addition, she co-authored Web Service Contract Design and Versioning for SOA (Prentice Hall, 2008).
Schemas evolve over time, and it is useful to be able to automatically compare versions of a schema in order to provide detailed, accurate documentation to implementers. Automatically “diffing” schemas is also an effective quality control technique: it helps ensure that no inadvertent changes were made and that all changes are backward compatible (if that is a goal).
Because a content model can be expressed in a variety of ways, and because a schema may use advanced schema features, it is necessary to go beyond simple text diffing or even XML diffing. By first “canonicalizing” schemas to make them easier to compare, and then cataloging the differences between them, we can answer questions like “Is this schema backward compatible?” and “Is this schema a subset or superset of another schema?”
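As a rough illustration of the canonicalize-then-compare idea, the Python sketch below reduces a schema to a simple map of element declarations with default occurrence values filled in, and then catalogs the differences between two such maps. It is not the method developed in the paper: the function names `canonicalize` and `diff` and the sample schemas are hypothetical, and the sketch assumes that element names are unique within a schema, that both schemas use the same namespace prefixes in type names, and that only named `xs:element` declarations matter.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def canonicalize(xsd_text):
    """Reduce a schema to {element name: (type, minOccurs, maxOccurs)}.

    Assumption for this sketch: only named xs:element declarations are
    considered, and names are unique across the whole schema.
    """
    canon = {}
    for el in ET.fromstring(xsd_text).iter(XS + "element"):
        name = el.get("name")
        if name is None:
            continue  # element references (ref="...") are ignored here
        # Fill in the XSD defaults (1 and 1) so that explicit and
        # implicit occurrence constraints compare as equal.
        min_occurs = int(el.get("minOccurs", "1"))
        max_raw = el.get("maxOccurs", "1")
        max_occurs = float("inf") if max_raw == "unbounded" else int(max_raw)
        canon[name] = (el.get("type"), min_occurs, max_occurs)
    return canon


def diff(old, new):
    """Catalog added, removed, and changed declarations, sorted by name."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append(("removed", name))
        elif name not in old:
            changes.append(("added", name))
        elif old[name] != new[name]:
            changes.append(("changed", name))
    return changes


def schema(*element_decls):
    """Wrap element declarations in a minimal hypothetical 'order' schema."""
    return (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        '<xs:element name="order"><xs:complexType><xs:sequence>'
        + "".join(element_decls)
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>"
    )


V1 = schema(
    '<xs:element name="id" type="xs:string"/>',
    '<xs:element name="note" type="xs:string" minOccurs="0"/>',
)
# Same content model as V1, written differently (explicit defaults,
# different attribute order); a text diff would report changes here.
V1_REWRITTEN = schema(
    '<xs:element maxOccurs="1" minOccurs="1" type="xs:string" name="id"/>',
    '<xs:element minOccurs="0" name="note" type="xs:string"/>',
)
V2 = schema(
    '<xs:element name="id" type="xs:string"/>',
    '<xs:element name="note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>',
    '<xs:element name="date" type="xs:date" minOccurs="0"/>',
)

print(diff(canonicalize(V1), canonicalize(V1_REWRITTEN)))
print(diff(canonicalize(V1), canonicalize(V2)))
```

The first comparison finds no differences even though the two schema texts differ, while the second reports `date` as added and `note` as changed. A realistic canonical form would also need to resolve named types, groups, references, and type derivation, none of which this sketch attempts.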