Balisage Paper: IPSA RE: A New Model of Data/Document Management, Defined by Identity, Provenance, Structure, Aptitude, Revision and Events
Walter E. Perry
Walter Perry, PhD, is Managing Director of Fiduciary Automation in New York, where he has 27 years' experience building distributed systems for processing transnational financial transactions, their settlements, and associated regulatory and accounting compliance and reporting. He has spoken widely over the past dozen years on XML-defined transaction processing, XML database issues, and the elaboration of semantics from XML syntax. He was the founder and leader of the XML Special Interest Group of New York.
In the world of private (not publicly traded) investment fund dealing, a very substantial portion of the data which should result from daily transactions is, and historically has always been, unavailable, misstated, or flat-out in error. Transactions follow a tortuous path of one-at-a-time interactions between each of the entities and one other: the "high net worth" individuals who put up money for investment; the named investors who aggregate that money; the nominee banks where those monies are lodged (and aggregated); the marketers who solicit the named investors on behalf of particular investment Funds; the managers who allocate assets of those Funds to particular investments; the Prime Brokers who execute the transactions to realize those managers' asset allocations; the custodians who hold securities in the name of those Funds; and the administrators hired to oversee and account for the business done by Funds and managers. In each interaction each of the two parties is, by definition of his role, transacting business at a different granularity than his counter-party, and in most cases with a materially different understanding of the substance of the transaction, as that substance might be formally defined with the basic semantic operators IsA and HasA. It is usual that managers (and, further up the chain of documentation and accounting, administrators) do not know accurately whose money has gone into, or come out of, a given transaction nor, from the other point of view, in which particular transaction an investor's stake in a Fund was secured, and at what basis. Historically these problems have been considered intractable.
However, investor skepticism in the wake of recent losses and scandals, and government insistence on regulation, will not allow these problems to remain unsolved. The particulars of regulation are grounded in knowledge and transparency about whose money, and through what chain of provenance, is deployed in what exact amounts in which transactions for which investment assets at what basis.
Databases and document stores depend on static, or at least general, definitions of structure or linkage: the database schema which defines a table as the particular structure of attribute columns, or the document type definition or other schematic representation of a document by the structure of its sub-entities. In any case, in databases and document stores the structural definition is not itself the instance data stored in each record nor, more exactly, an instance aggregation of instance linkages into a unique record. Yet by redefining the substance of the "data" record as just such an instance aggregation of linkages we can ensure an instance record transactable across gross differences of granularity separating the parties to the transaction, and across widely different understandings of the IsA and HasA semantics of the instance transaction. As a matter of implementation, the design of Google BigTable and the API for Google App Engine applications built atop BigTable are far more hospitable than either a relational database or a document store model to the "linksbase" design. An enthusiastic application of Ockham's Razor to linksbase implementation practices leaves me with six fundamental entity types for recording each unique linksbase instance: Identity, Provenance, Structure, Aptitude, Revision and Events. Any recorded linksbase instance may be understood as an extended arc, on the path of which may lie any number of instances of any of these entity types, each separately influencing the aggregate vector weight of the resultant arc. In other words, most specifically in contrast to the database and document store models, the cardinality of any particular attribute on an instance record is unlimited, while the permissible (even assumed) uniqueness in the structure of any instance "data" record means that the presumed cardinality of any given record type is 1.
As it turns out, this "backwards" thinking about record types and the attributes upon them is particularly facilitated by the design of Google BigTable, and the implementation of an IPSA RE linksbase seems well-suited to Google App Engine.
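The record shape described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the implementation discussed in the paper: the class names `EntityType`, `Link`, and `LinksbaseRecord`, the method names, and all sample values are hypothetical. The sketch shows only that a record is an ordered aggregation of its own entity instances, with no schema limiting how many instances of each type it carries. The "aggregate vector weight" of the arc is deliberately left out, since its computation is not specified here.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class EntityType(Enum):
    """The six entity types named in the abstract."""
    IDENTITY = "Identity"
    PROVENANCE = "Provenance"
    STRUCTURE = "Structure"
    APTITUDE = "Aptitude"
    REVISION = "Revision"
    EVENT = "Event"


@dataclass(frozen=True)
class Link:
    """One entity instance lying on a record's arc (hypothetical shape)."""
    kind: EntityType
    value: Any


@dataclass
class LinksbaseRecord:
    """A record that is nothing more than the ordered links it aggregates."""
    links: List[Link] = field(default_factory=list)

    def add(self, kind: EntityType, value: Any) -> "LinksbaseRecord":
        # No schema is consulted: any number of links of any kind is allowed.
        self.links.append(Link(kind, value))
        return self

    def cardinality(self) -> Counter:
        # Count how many instances of each entity type this record carries.
        return Counter(link.kind for link in self.links)

    def of_kind(self, kind: EntityType) -> List[Any]:
        # Values of one entity type, in the order they were added.
        return [link.value for link in self.links if link.kind is kind]


# Illustrative values only; they are not taken from any real transaction.
record = (
    LinksbaseRecord()
    .add(EntityType.IDENTITY, "named-investor-A")
    .add(EntityType.IDENTITY, "nominee-bank-B")
    .add(EntityType.PROVENANCE, "subscription lodged at nominee-bank-B")
    .add(EntityType.EVENT, "allocation executed")
)
```

In this sketch two records need not share any structure, which is the sense in which each record type is presumed to have a cardinality of 1, while `record.cardinality()` may report any count for any entity type.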