Agile Business Objects Management Application for Electronic Records Archive Transfer Process

Quyen L. Nguyen

National Archives and Records Administration, Systems Engineering Division of ERA Program Management Office

Betty Harvey

Electronic Commerce Connection, Inc.

Balisage: The Markup Conference 2009
August 11 - 14, 2009

Copyright © 2009 by the authors. Used with permission.

How to cite this paper

Nguyen, Quyen L., and Betty Harvey. “Agile Business Objects Management Application for Electronic Records Archive Transfer Process.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:10.4242/BalisageVol3.Harvey01.

Abstract

In order to continue to fulfill its mission in the information technology age, the National Archives and Records Administration (NARA) has made the decision to develop the Electronic Records Archives (ERA) system. One of the goals is to provide to the archivists a modernized system with automatic workflow that can streamline the digital archive business process.

For an archival system, Ingest is one of the core components. As part of the ingest process, this component would allow the record Producer to negotiate submission agreement before transferring digital materials into the system. Within the framework of a service-oriented architecture with business process management, the ERA system uses XML to represent business objects and metadata. In this paper, we will show how the synergetic combination of XForms and Genericode makes the system agile and responsive to business user requirements. Furthermore, the approach fits well with ERA's design principle to use international and industry standards, and facilitates the integration of XML business objects and the electronic records metadata. We believe that the standard-based approach of XForms+Genericode exposed in this paper can be generalized to develop any e-Forms system with a set of control values and vocabularies.

Table of Contents

Introduction
Archival Business Objects Management
Archival Business Objects
Authority Lists
Functional Requirements
XForms
Overview
Advantages
Genericode
Overview
Why is Genericode Valuable to NARA Enterprise
Advantages
XForms and Genericode Together
Advantages
Development Process
Implementation
Populating Fields from Code List Lookups
Business Rules
Content Governance
Asset Catalog Entry
Archival Management Taxonomy
XForms and Business Process Management (BPM)
Conclusion
Appendix A. Acronyms

Introduction

In response to the growing usage of information technology for conducting business by federal agencies, NARA has made the decision to build the ERA system. Its main goal is "to preserve electronic records independent of the hardware and software that created them" [1]. For the ERA system to store, preserve, and provide access to electronic records, it has to cope with the following challenges:

  • Scope. By mandate, records will come from the entire federal government.

  • Variety. Since the federal government deals with many application domains, such as health care, education, defense, space exploration, energy, and environmental protection, records will contain various types of knowledge. Moreover, they may be represented in different formats, such as Microsoft Office documents, relational database files, or GIS artifacts.

  • Obsolescence. Added to the variety of domains and formats is the constantly changing technology and application software used to create the records. By the time records are ingested into ERA, these applications will most likely be obsolete or superseded by newer versions.

  • Volume. It is estimated that the total volume of incoming records will be enormous, and will continue to grow over the years. Petabyte and exabyte range of data is not unimaginable.

In the face of these functional challenges, the ERA architecture is designed to satisfy the following system qualities:

  • Extensibility. New record types, data types, and services could be added to the system without extensive redesign.

  • Evolvability. New technologies in software and hardware could be inserted using standard APIs and interfaces.

  • Scalability. The system must have the ability to adapt to the growth of record volume and user community.

Besides these qualities, the ERA system and its components have to be user-friendly, secure, and highly available to protect the assets and serve the public.

This paper is organized as follows. In section 2, we give an overview of the authority lists and the business objects that are used to manage and govern the archival system. Sections 3 and 4 discuss the benefits of XForms and Genericode, both in general and as applied to ERA. Then, we describe our approach of combining XForms and Genericode, which fits into our overall system architecture, and report on the implementation process with concrete examples. In section 6, we show how business objects contribute to the Archival Management Taxonomy that can help govern the content. We summarize our design approach in the conclusion.

Archival Business Objects Management

Archival Business Objects

Before the transfer of any set of records, the Producer (usually a government agency) has to submit two business documents to NARA, namely the Record Schedule and the Transfer Request. The Record Schedule contains instructions for the general disposition and maintenance of various types of records, such as permanent versus temporary records and the retention period. Over the lifecycle of a Record Schedule, Transfer Requests are created for every physical transfer to specify:

  • Transfer mode with detailed information

  • Access restrictions.

Transfer mode can be electronic transfer or physical media such as audio cassette, microfilm, CD, DVD, video cassette, parchment, or photographic print. Each business object goes through multiple business processes and a negotiation of disposition between NARA and the Producer before records are transferred from the contributing entity to NARA. Access restrictions may be applied to records that contain privacy data.

Within the ERA system, the Record Schedule, the Transfer Request, and other archival business objects (ABOs) are implemented as e-Forms. Currently, ERA e-Forms are encoded as JSP (JavaServer Pages) pages. One main advantage of e-Forms over traditional paper documents is that e-Forms facilitate deterministic processing by the system thanks to:

  • Elimination of free text form.

  • Structured fields that conform to a pre-defined data model.

  • System validation of input data. Such validation can be performed based on the data model with respect to data type specification, or required/optional characteristics.

  • Elaborate validation based on embedded business rules during creating/editing business objects. For instance, fields may be dependent on each other, and only values from authority lists are allowed.

The ERA users manage these ABOs via the Archival Business Object Management Application defined by:

  • Functional requirements for CRUD (create, retrieve, update, and delete) operations on ABOs.

  • Non-functional requirements of flexibility, extensibility, user-friendliness, open standards, performance and security.

With respect to flexibility and extensibility, the design should allow a field to be added to or modified in an ABO form at low development cost. Moreover, changes to the business rules that govern the forms, or the fields within a form, should be easily accommodated without requiring much coding effort.

Authority Lists

Authority Lists, also known as code lists, consist of values used to normalize certain key fields in a business object. The goal of Authority Lists is to institute a controlled vocabulary for use in e-Forms and in archival descriptions, which are part of the NARA metadata. Examples of authority lists are:

50 states with 2-letter abbreviations.

Table I

Abbreviation  State
CA            California
MD            Maryland
VA            Virginia
...           ...

Federal agency names.

Table II

Abbreviation  Agency name
NARA          National Archives and Records Administration
USPTO         United States Patent and Trademark Office
NOAA          National Oceanic and Atmospheric Administration
...           ...

On the access side, Authority Lists play an important role in building queries from various forms of search terms. For instance, with the use of Authority Lists, searching for NARA is equivalent to searching for National Archives and Records Administration. Because of the importance and occasional fluidity of this data, the ERA system is required to give privileged business users the ability to manage the Authority Lists. Any update to the Authority Lists should take effect immediately and be made available to future creation and updates of business objects and forms.
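The search equivalence described above can be sketched as a lookup that maps either form of a term to one canonical code key. This is only an illustrative sketch, not the ERA implementation: the dictionary holds the three rows of Table II, and the function name `normalize_agency` is hypothetical.

```python
from typing import Optional

# Illustrative authority list (code key -> display value), rows from Table II.
AGENCY_AUTHORITY_LIST = {
    "NARA": "National Archives and Records Administration",
    "USPTO": "United States Patent and Trademark Office",
    "NOAA": "National Oceanic and Atmospheric Administration",
}


def normalize_agency(term: str) -> Optional[str]:
    """Map an abbreviation or a full agency name to its canonical code key.

    Hypothetical helper: comparison ignores case and repeated whitespace.
    Returns None when the term is not in the authority list.
    """
    cleaned = " ".join(term.split()).casefold()
    for key, name in AGENCY_AUTHORITY_LIST.items():
        if cleaned in (key.casefold(), name.casefold()):
            return key
    return None


# Both search terms resolve to the same key, so they build the same query.
short_form = normalize_agency("NARA")
long_form = normalize_agency("National Archives and Records Administration")
```

Because the query is built from the code key rather than from the text the user typed, an update to a display value in the list would not change which records match.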

Functional Requirements

Notably, the design for the application should exhibit low cost for development and maintenance. Moreover, since ERA is a Service Oriented Architecture (SOA) system with Business Process Management (BPM) and an Enterprise Service Bus (ESB), the design should integrate easily with the workflow. The functional requirements for ABOs can be summarized as follows:

  • Presentation. ABO should be viewed and browsed via W3C standard browsers. Moreover, the system should provide a friendly format rendering and printing capability of ABOs.

  • Management. CRUDVS (Create, Retrieve, Update, Delete, Versioning, Search) operations will be supported. The creation of ABOs can also be based on predefined and pre-configured templates.

  • Workflow. According to the requirements, "The system shall provide the capability to integrate forms into workflows" [18]. The system will implement a workflow for ABOs, which consists of a simple Draft-View-Approve cycle. In terms of governance, the ABOs will play an essential role in managing the lifecycle of electronic records to be ingested into the system. Therefore, the management of ABOs should be integrated with BPM and BPEL-based system orchestrations [9].

  • XML Format. In order to persist the ABOs for a long term and in a fashion that is independent of specific software and hardware environment, the ABOs will be stored as XML documents.
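The Draft-View-Approve cycle mentioned in the Workflow requirement can be pictured as a small state machine. The sketch below is an assumption made for illustration: the paper names only the cycle itself, so the action names (`submit`, `return`, `approve`) and the exact set of transitions are hypothetical, and in ERA the orchestration would be handled by the BPM engine rather than by application code like this.

```python
from enum import Enum


class State(Enum):
    DRAFT = "Draft"
    VIEW = "View"
    APPROVED = "Approved"


# Assumed transitions; only the Draft-View-Approve cycle comes from the paper.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.VIEW,      # author submits the draft for review
    (State.VIEW, "return"): State.DRAFT,      # reviewer sends it back for changes
    (State.VIEW, "approve"): State.APPROVED,  # reviewer approves the ABO
}


def advance(state: State, action: str) -> State:
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed in state {state.value}") from None


# One pass through the cycle, including a round trip back to Draft.
state = State.DRAFT
for action in ("submit", "return", "submit", "approve"):
    state = advance(state, action)
```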

The following diagram depicts the components and services of the ABO Application from the SOA layer pattern perspective. The applications for Business Object Management, Business Object Review, and Business Object Approval can be linked together by a Business Process, which can be expressed in Business Process Modeling Notation (BPMN) [10] and executed by a BPM engine.

Figure 1

Archival Business Object Management

XForms

Overview

XForms [2] is a W3C specification for implementing XML-based Web forms. In some sense, XForms can be viewed as the next generation of HTML Forms. While HTML Forms mix presentation markup with form data, XForms separates data from presentation. Data in XForms can conform to an XML schema, which makes validating the data against the schema straightforward. With Ajax-based server-side XForms implementations [4], real-time validation provides immediate feedback when user input is in error. Therefore, the user does not have to finish a lengthy form to learn that an error has occurred. Note that in the case of an ABO, a form may span more than one scrolled page. Unlike an HTML form, the output of an XForms form is an XML instance ready to be stored in an XML-aware repository. In addition, XForms has constructs to control user input events, and these controls do not depend on a particular presentation modality. Therefore, it is possible to create a form that takes input from various input devices, such as keyboard and voice. In the paper Multimodal Interaction with XForms [6], Mikko Honkala et al. proposed XForms Multi-modality (XFormsMM) to facilitate concurrent support of multiple modalities using XForms. The XFormsMM architecture comprises the following components:

  • Separate abstract user interface (UI) controls from modality specific elements such as style sheets

  • Interaction manager to switch and coordinate different modalities

  • Set of modality renderers for CSS style sheets

Multi-modality enables a web form to support various UI devices such as desktops, wireless handsets, and IVR applications. In our application, this feature might be exploited to develop a user interface that is accessible to all users, impaired and non-impaired.

Advantages

The advantages of XForms have been identified in the literature and past conferences [4] [5]:

  • Data integrity. Since XForms is associated with an XML schema, data integrity is preserved due to the compliance of the forms with data constraints specified by the schema.

  • Data exchange. The output of an XForms is an XML document itself, which can readily carry data between SOA components and services. In the case of WSDL (Web Services Description Language), the data will be embedded in SOAP messages. If RESTful Web services are used, then the URL will contain the reference to the XML instance.

  • Performance. Response time and user experience are greatly enhanced as latency is reduced thanks to Ajax-based implementations.

  • Consistency. XForms specifies a construct to handle XML errors, thus facilitating uniform and consistent error handling and error messages.

  • Modularity and reuse. An XForms document can be composed of sections. Consequently, parallel development can be planned and organized. Building up a library of reusable XForms sections will therefore be possible. For example, in the case of ABO, we have a section for Personal Contact, and another one for Organization Information. Moreover, XForms constructs to control user input events will definitely save development time and cost.

  • Low-cost system requirement. Server-side XForms processing does not impose any requirements on end-user browsers. Note that ABO Forms are to be used by NARA archivists and agencies' record managers, and we don't want to levy any configuration requirements for using the application.

  • Standard support. Being based on XML itself, XForms can be easily integrated with other XML-related open standards such as XML Schema, XSLT, XSL-FO, XHTML, XPath, and XQuery.

Genericode

Overview

Genericode [3] defines a standardized model for managing code lists using a defined XML schema. Essentially, the idea is for a code to have a code key and multiple code values. Every XML project has controlled lists that need to be supported in the XML application. There are two schools of thought for controlling code lists. The Universal Business Language (UBL) [19] is an OASIS-Open standards effort to describe business documents (e.g., invoices and purchase orders) in XML. In version 1.0 of the standard, all the code lists, such as country codes and currency codes, were embedded in the XML schemas. As individual countries started adopting UBL, it became apparent that placing large code sets in the schemas was problematic. Some of the codes change rapidly, and during implementation this required modifying the "standard" schemas, which theoretically meant they were no longer UBL compliant. The definitions of the codes were in the documentation portion of the schema and not readily available to the application. Modifying schemas to support changing codes became a configuration problem.

Members of the UBL technical committee developed the Genericode concept, and it was adopted by UBL and other organizations. Genericode now has its own OASIS-Open technical committee [20], and is currently at version 1.0.

Why is Genericode Valuable to NARA Enterprise

Besides the codes that are located in the business objects and metadata descriptions of records, NARA receives records from agencies in the U.S. Federal government, as well as some private collections. Every record set has its own set of codes. The same meaning can be represented by different codes across record sets. For example, in the ship manifest records from the Irish Potato Famine, a value as simple as the age of a person can be represented by several codes for the age and/or event, depending on the individual ship manifest.

The logical expectation is that a person's age is pretty straightforward. However, the table below, which contains actual code values from the passenger list maintained by NARA [21], makes it readily apparent that a person's age in a record isn't as cut and dried as some would expect. For example, you wouldn't expect to see a value of 900 representing a person's age.

Table III

Typical Codelist Representation in NARA Records

Code  Meaning
900   Born at Sea
901   Infant in months: 01
909   Infant in months: 09
800   Unknown
1     age 01
001   age 01
2     age 02
002   age 02
003   age 03
3     age 03
03    age 03

The table below shows a representation of how the Genericode is organized. The advantage of the Genericode approach is that multiple codes can represent a single concept.

Table IV

Genericode Table Representation

Meaning               Ship1  Ship2  Ship3
Born at Sea           900    888    766
Infant in months: 01  901    .1     500
Infant in months: 09  909    909    909
Unknown               800    800    800
age 01                1      001    1
age 02                2      002    002
age 03                3      003    3

The Genericode standard has 3 major sections:

  1. Identification: Identification and location information (metadata).

  2. ColumnSet: Description for each column in the Genericode list.

  3. SimpleCodeList: The container for the actual code list.

For Table IV above, the Genericode representation would be:

<CodeList xmlns="http://docs.oasis-open.org/codelist/ns/genericode/1.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Identification>
        <ShortName xml:lang="en">Age Code List</ShortName>
        <LongName xml:lang="en">Age Code List Irish Famine, period 1/12/1846 - 12/31/1851</LongName>
        <Version>0.3</Version>
        <CanonicalUri/>
        <Agency>
            <LongName xml:lang="en">
                United States National Archives and Records Administration
            </LongName>
            <Identifier>1</Identifier>
        </Agency>
    </Identification>
    <ColumnSet>
        <Column Id="Description" Use="Required">
            <ShortName>Description</ShortName>
            <Data Type="xsd:string" xml:lang="en"/>
        </Column>
        <Column Id="Ship1" Use="Optional">
            <ShortName>Ship 1</ShortName>
            <Data Type="xsd:string" xml:lang="en"/>
        </Column>
        <Column Id="Ship2" Use="Optional">
            <ShortName>Ship 2</ShortName>
            <Data Type="xsd:string" xml:lang="en"/>
        </Column>
        <Column Id="Ship3" Use="Optional">
            <ShortName>Ship 3</ShortName>
            <Data Type="xsd:string" xml:lang="en"/>
        </Column>
    </ColumnSet>
    <SimpleCodeList>
        . . .
        <Row>
            <Value ColumnRef="Description">
                <SimpleValue>age 03</SimpleValue>
            </Value>
            <Value ColumnRef="Ship1">
                <SimpleValue>3</SimpleValue>
            </Value>
            <Value ColumnRef="Ship2">
                <SimpleValue>003</SimpleValue>
            </Value>
            <Value ColumnRef="Ship3">
                <SimpleValue>3</SimpleValue>
            </Value>
        </Row>
        . . .
    </SimpleCodeList>
</CodeList>
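To illustrate how an application might consume such a list, the following sketch resolves a ship-specific code to its shared meaning using only the Python standard library. It is a minimal sketch under stated assumptions: the embedded document is an abbreviated version of the listing above (two rows taken from Table IV, without the Identification and ColumnSet sections), and the helper names `load_rows` and `meaning_of` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the listing above.
GC_NS = {"gc": "http://docs.oasis-open.org/codelist/ns/genericode/1.0/"}

# Abbreviated code list: two rows from Table IV, other sections omitted.
CODE_LIST = """\
<CodeList xmlns="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="Description"><SimpleValue>Born at Sea</SimpleValue></Value>
      <Value ColumnRef="Ship1"><SimpleValue>900</SimpleValue></Value>
      <Value ColumnRef="Ship2"><SimpleValue>888</SimpleValue></Value>
      <Value ColumnRef="Ship3"><SimpleValue>766</SimpleValue></Value>
    </Row>
    <Row>
      <Value ColumnRef="Description"><SimpleValue>age 03</SimpleValue></Value>
      <Value ColumnRef="Ship1"><SimpleValue>3</SimpleValue></Value>
      <Value ColumnRef="Ship2"><SimpleValue>003</SimpleValue></Value>
      <Value ColumnRef="Ship3"><SimpleValue>3</SimpleValue></Value>
    </Row>
  </SimpleCodeList>
</CodeList>
"""


def load_rows(xml_text):
    """Return one dict per Row, mapping each ColumnRef to its SimpleValue text."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iterfind(".//gc:Row", GC_NS):
        rows.append({
            value.get("ColumnRef"): value.findtext(
                "gc:SimpleValue", default="", namespaces=GC_NS).strip()
            for value in row.iterfind("gc:Value", GC_NS)
        })
    return rows


def meaning_of(rows, ship_column, code):
    """Look up the Description for a code as recorded by one ship's manifest."""
    for row in rows:
        if row.get(ship_column) == code:
            return row["Description"]
    return None


rows = load_rows(CODE_LIST)
# Different codes from different manifests resolve to the same concept.
from_ship1 = meaning_of(rows, "Ship1", "900")
from_ship2 = meaning_of(rows, "Ship2", "888")
```

Codes are compared as strings on purpose: in Table IV, "3" and "003" are distinct codes that happen to share a meaning, so converting them to numbers would lose information.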

Advantages

As stated above, enumerated data can be problematic in XML documents. One approach is to have allowed values for a data element be enumerated in an XML schema, so that associated XML documents and their enumerated values can easily be validated by a standard XML parser. However, if there is a need to add, remove, or change values to the enumeration lists, then the schema has to be modified. In [12], the author proposed different solutions to extend the enumeration lists in an XML schema, by using XML mechanisms such as <xsd:union>, <xsd:pattern>, or <xsd:annotation>. The author argued that some of these solutions are advantageous because they require only one pass validation, thus avoiding performance penalty.

By contrast, Genericode offers an approach that allows the enumerations to be managed independently of the XML schema. Although this implies a separate validation pass for the code list values, it does have some advantages [13]:

  • Genericode is a flexible scheme to manage the code lists for applications where business logic parsing performance is not a critical requirement. This is the case for the ABO Management application.

  • If a change is confined to the display value of a code, then any application data using the code key will not be affected.

  • Adding or removing a code from a list can be done directly in the XML code list. All forms using that code list will change simultaneously without requiring any programming, unlike the approach based on JSP and enumerations in the schema. In our system, we can build a simple application to allow NARA policy makers to manage the code lists that govern the critical values in the forms for ABOs.

Another key advantage for NARA to maintain all their code lists in a "generic" common format is the ability to create a standard NARA-wide authoring environment for developing and maintaining code lists across the enterprise. NARA can ultimately have thousands of code lists when preserving and describing electronic records for the entire federal government.
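The separate validation pass and the two maintenance scenarios listed above can be sketched in a few lines. This is an illustrative sketch only: the code list is modeled as a plain dictionary with rows from Table I, and the function name `invalid_codes` is hypothetical.

```python
def invalid_codes(values, code_list):
    """Second-pass check: return the submitted values that are not code keys."""
    return [value for value in values if value not in code_list]


# Illustrative state code list (code key -> display value), as in Table I.
states = {"CA": "California", "MD": "Maryland", "VA": "Virginia"}
submitted = ["MD", "VA", "DC"]

before = invalid_codes(submitted, states)     # "DC" is not yet in the list
states["DC"] = "District of Columbia"         # a policy maker adds a code
after = invalid_codes(submitted, states)      # same check, no schema change

states["MD"] = "State of Maryland"            # only the display value changes
still_valid = invalid_codes(["MD"], states)   # data storing key "MD" is unaffected
```

The check never touches the XML schema, which is the point of the approach: the list can change between two validations without any redeployment of the schema or the form.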

XForms and Genericode Together

Advantages

By combining XForms and Genericode, our approach benefits from the advantages of both. Indeed, the introduction of Genericode into XForms provides a separation of concerns in software and data development:

  • Evolvability. From the data management perspective, the XML schemas associated with the ABOs and the controlled enumerations can evolve independently of each other. Thus, their maintenance will be more efficient. In business practice, the code lists change more often than the schema. Moreover, with respect to software maintenance, it is not desirable to change XML schemas too frequently.

  • Modularity. From the software engineering perspective, we can easily design and develop two separate modules: one to process the XForms, and the other to manage the code lists. Due to the separation of data, changes to the code lists would not affect the XForms processing module.

  • Separation of Control. The modularity of the software fits the business organization of NARA, where the group responsible for controlling the code lists is different from the one managing the ABOs. Access to each of these modules can be implemented using RBAC (Role-Based Access Control). Note that the potential users of the ABOs include NARA archivists as well as record managers from all federal agencies.

Development Process

The development process for XForms-based approach consists of the following steps:

  • Develop the XForms model based on an XML Schema that conforms to the ERA conceptual data model.

  • Develop XForms input control for the data elements in the XForms model.

  • Develop XForms data validation rules based on the business rules provided by the record managers and processors. The implementation makes use of XForms binds derived from the XML schema constraints.

  • Develop error handling and error messages that are consistent, and therefore user-friendly and easy to maintain.

  • Develop CSS (Cascading Style Sheets) for each form. This phase would involve interactions with end-users in order to get their feedback and suggestions.

  • Define SOAP messages used in Web Services that implement business workflow to process XForms instance upon XForms submission.

Implementation

Our implementation of XForms/Genericode uses the Apache Tomcat application server as the infrastructure. The Orbeon XForms Server is used as a platform for managing the forms. We chose Orbeon over other XForms applications for the following reasons:

  • A large user base

  • Capability for support in the future

  • Well deployed

  • Ability to easily integrate with XML repository

  • XInclude support

We chose to use a standard XML database as the repository for storing XML components. Initially we used the eXist open source repository, then moved to MarkLogic (commercial software) for the prototype that integrates XForms and BPM (see section “XForms and Business Process Management (BPM)”). The repository stores:

  • Reusable XForm components

  • Genericode code lists

  • Converted code lists used for consumption in the form

  • XML business objects saved from form

The figure below shows the interaction of the forms to the Genericode.

Figure 2

XForms+Genericode Architecture

Maintaining the code list external to the form provides the ability for the use of the same code list in multiple forms or multiple applications. When a code list is updated, all the forms that consume the code list will be automatically updated without any recompilation of code.

Populating Fields from Code List Lookups

Certain fields can be automatically populated based on the selection of a single code. There are several places where this becomes important for NARA. A good example is the "Record Group". NARA classifies all records by a number which represents the title of the record group. A record group number can represent an agency or a collection of records. For example, record group 21 represents "Records of District Courts of the United States". Once a record group is assigned, it never changes. Every federal agency is associated with one or more record groups (usually just one).

In an ABO form, the user selecting his/her agency will only be presented with the record group(s) associated with his/her agency. Once they select the record group, the record group title is automatically populated in the XML. Below is the XForms code which provides this functionality:

    <xforms:variable
      name="AgencyName"
      select="../era:OrganizationInformation/era:AgencyInformation/era:AgencyName"/>
    <xforms:select1
      ref="instance('TransferRequest-instance')//era:RecordGroupNumber"
      appearance="full">
        <xforms:itemset
          nodeset="instance('AgencyRecordGroup-instance')/Row[AgencyName
          = $AgencyName]/RecordGroup">
            <xforms:label ref="."/>
            <xforms:value ref="."/>
        </xforms:itemset>
    </xforms:select1>
</xforms:group>

The example above shows that the pull-down list is being populated from the AgencyRecordGroup-instance XML codelist. This codelist is set in the XForms model section using the <xforms:instance> element shown below.

<xforms:instance
  id="AgencyRecordGroup-instance"
  src="http://localhost:8080/exist/rest/db/home/era/CodeLists/Agency-RecordGroup-Form.xml"/>

The codelist is being pulled from an eXist XML repository. A major advantage of maintaining the codelist in an external XML file is that if an agency name changes (and they do quite often), the form does not require modification. Once the code list is updated, all forms using the code list are automatically updated. The record group title is populated by an <xforms:bind> element using an XPath expression. The calculate attribute is used to set the value.

<xforms:bind
  nodeset="instance('TransferRequest-instance')/era:IdentificationInformation/era:RecordGroupTitle"
  calculate="instance('RecordGroupTitle-instance')/RecordGroupInfo[RecordGroupNumber
  = $RecordGroupNumber]/RecordGroupTitle"/>

The XPath expression in the calculate attribute above retrieves the record group title from the code list instance called "RecordGroupTitle-instance" where the <RecordGroupNumber> in the list matches the $RecordGroupNumber variable, which has been set previously in the form.
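For readers less familiar with XPath predicates, the two lookups performed by the form can be mirrored outside of XForms. The sketch below is not the form's implementation: the element names inside each code list follow the XPath expressions above, but the root element names and the agency entries are illustrative assumptions (only the title of record group 21 is taken from the text), and the functions `record_groups_for` and `record_group_title` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the AgencyRecordGroup-instance code list (illustrative rows).
AGENCY_RECORD_GROUPS = """\
<AgencyRecordGroups>
  <Row><AgencyName>U.S. District Courts</AgencyName><RecordGroup>21</RecordGroup></Row>
  <Row><AgencyName>National Archives and Records Administration</AgencyName><RecordGroup>64</RecordGroup></Row>
</AgencyRecordGroups>
"""

# Assumed shape of the RecordGroupTitle-instance code list.
RECORD_GROUP_TITLES = """\
<RecordGroupTitles>
  <RecordGroupInfo>
    <RecordGroupNumber>21</RecordGroupNumber>
    <RecordGroupTitle>Records of District Courts of the United States</RecordGroupTitle>
  </RecordGroupInfo>
  <RecordGroupInfo>
    <RecordGroupNumber>64</RecordGroupNumber>
    <RecordGroupTitle>Records of the National Archives and Records Administration</RecordGroupTitle>
  </RecordGroupInfo>
</RecordGroupTitles>
"""


def record_groups_for(agency_name):
    """Mirror of Row[AgencyName = $AgencyName]/RecordGroup in the itemset."""
    root = ET.fromstring(AGENCY_RECORD_GROUPS)
    return [row.findtext("RecordGroup")
            for row in root.iterfind("Row")
            if row.findtext("AgencyName") == agency_name]


def record_group_title(number):
    """Mirror of the calculate expression that fills in the record group title."""
    root = ET.fromstring(RECORD_GROUP_TITLES)
    for info in root.iterfind("RecordGroupInfo"):
        if info.findtext("RecordGroupNumber") == number:
            return info.findtext("RecordGroupTitle")
    return None


# The user picks an agency, is offered only its record groups, and the
# title follows from the selected number.
choices = record_groups_for("U.S. District Courts")
title = record_group_title(choices[0])
```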

Business Rules

The ABO forms must follow business rules that dictate the dependencies between the fields within a form, and also between the fields in one ABO to another. For example, if the "Required" indicator of a group of fields is checked, then valid values must be supplied to all fields within that group.
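The "Required" rule in the example above amounts to a conditional completeness check. The following sketch states it procedurally for illustration; in the application itself such rules are expressed declaratively in XForms. The data layout (a dictionary with a `required` flag and a `fields` mapping), the field names, and the function name are all hypothetical.

```python
def missing_required_fields(group):
    """Return the names of empty fields when the group's Required indicator is set.

    Hypothetical layout: {"required": bool, "fields": {name: value}}.
    An unchecked indicator means no field in the group is mandatory.
    """
    if not group.get("required"):
        return []
    return [name for name, value in group.get("fields", {}).items()
            if value is None or not str(value).strip()]


# Illustrative contact group with one field left blank.
contact = {"required": True, "fields": {"name": "J. Smith", "phone": "  "}}
problems = missing_required_fields(contact)

# The same blank field is acceptable once the indicator is unchecked.
optional_contact = {"required": False, "fields": {"name": "", "phone": ""}}
no_problems = missing_required_fields(optional_contact)
```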

Some of the rules for the interaction of code lists in forms can be quite complex. The use of XForms and XML code lists allows these rules to be defined using XPath expressions.

Figure 3 shows a pull-down selection bar for access restrictions. The form components change based on the selection in the pull-down list. The values of the pull-down list are populated by a code list for access restrictions. There are business rules associated with what information needs to be completed based on the value of the access restriction. For instance, if the user selects "Presidential Records Act (p)(3) Statute", the user must select a statutory citation (see Figure 4).

Figure 3

Example Pull-down List

Figure 4

Statutory Citation pops up.

The XForms construct that controls whether the field is displayed based on the selection is below:

<xforms:group ref="era:AccessRestriction[contains(., 'FOIA(b)(3) Statute')] |
  era:AccessRestriction[contains(., 'Presidential Records Act (p)(1)')]">
. . .
</xforms:group>

The XPath above states that if the access restriction contains "FOIA(b)(3) Statute" or "Presidential Records Act (p)(1)", then the field is displayed.
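The logic of that expression can be restated as a simple predicate: the group is shown when the selected restriction contains either trigger phrase. This is an illustrative restatement rather than code from the application; the trigger strings are copied from the XForms listing above, and the function name is hypothetical.

```python
# Trigger phrases copied from the contains() tests in the listing above.
CITATION_TRIGGERS = (
    "FOIA(b)(3) Statute",
    "Presidential Records Act (p)(1)",
)


def citation_field_displayed(access_restriction):
    """Return True when the statutory citation group should be displayed."""
    return any(trigger in access_restriction for trigger in CITATION_TRIGGERS)


# A matching selection shows the group; any other selection hides it.
shown = citation_field_displayed("FOIA(b)(3) Statute")
hidden = citation_field_displayed("PRMPA- Personal Privacy (D)")
```

Like the XPath contains() function, the `in` operator performs a case-sensitive substring match, so the trigger phrases must match the code list values exactly, including spacing.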

Figure 5 shows the user selecting "PRMPA- Personal Privacy (D)". Notice that in Figure 6, no additional field is presented to the user.

Figure 5

PRMPA - Personal Privacy (D) Selection

Figure 6

PRMPA- Personal Privacy (D) Selection Result Screen

Content Governance

As mentioned earlier, the ABOs serve to administer the transfer of archival records into the open archival system. Therefore, the ABOs will provide provenance and management metadata for the digital objects ingested and stored in the system, according to the Open Archival Information System (OAIS) information reference model [14]. In ERA, the metadata of a digital object is embodied in an Asset Catalog Entry (ACE).

Asset Catalog Entry

An ACE is represented by an XML document that conforms to a well-defined XML schema. At the high level, the structure of an ACE is compliant with the various kinds of metadata as described in the OAIS information reference model [14]. The following aspects have been considered in the design of the ACE structure:

  • Information type. From the OAIS model, an ACE should contain information about its associated digital object in terms of:

      • Reference for uniquely identifying the digital object. Usually, this identifier is location and protocol independent.

      • Provenance for preserving the history and chain of custody of the object.

      • Context for recording the circumstances of the object's creation.

      • Fixity for storing authenticity mechanisms of records, such as digital signatures and checksums.

      • Descriptive information used for object search and discovery.

  • Standard integration. Given the diversity of data, business and knowledge domains as well as types, one standard alone cannot cover all the information types of an ACE. Therefore, the structure of an ACE should facilitate the integration of different XML standards. For instance, the ACE's schema should incorporate the PREMIS (Preservation Metadata Maintenance Activity) standard [16] for preservation metadata. If the digital object is a still image, then the MIX standard [17] will be included to describe the technical metadata of the image object. NARA has its own standard, the Lifecycle Data Requirements Guide [15], that archivists have used to convey descriptive information about an object.

  • Metadata aggregation. The processing of an archived digital object can be performed by different archivist groups using varied archival processing systems and technologies. Associated metadata will be collected at each stage and finally accumulated in the ACE stored in the ERA system. Within this environment, the ACE structure should be designed to facilitate easy and efficient import of the metadata generated by the processing applications. To achieve this, the ACE can be divided into slots. Each slot is reserved for one archival processing system and has data elements disjoint from those of the other slots, which avoids duplication and complex crosswalks.

  • Technical aspect. A final challenge is that some metadata, such as file type and size, remains unchanged from the time of ingest into ERA, while other metadata of the same object, such as description and access rights, will certainly change. Two types of data management systems therefore have to be combined to accommodate the immutable and mutable parts of the metadata.
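The design aspects listed above can be illustrated with a minimal Python sketch. It is not the ERA schema: every element name used here (ace, immutable, mutable, slot, and so on), the identifier, and the checksum value are hypothetical, chosen only to show how OAIS information types, per-system slots, and the mutable/immutable split might coexist in a single XML document.

```python
import xml.etree.ElementTree as ET

# All element and attribute names below are hypothetical; the actual ACE
# schema is not reproduced in this paper.


def build_ace(object_id, checksum, description):
    """Build an illustrative ACE with an immutable and a mutable part."""
    ace = ET.Element("ace")

    # Immutable part: values fixed at ingest time (identifier, checksum).
    immutable = ET.SubElement(ace, "immutable")
    ET.SubElement(immutable, "reference").text = object_id
    ET.SubElement(immutable, "fixity", algorithm="SHA-256").text = checksum

    # Mutable part: values expected to change over time (description).
    mutable = ET.SubElement(ace, "mutable")
    ET.SubElement(mutable, "descriptive").text = description
    return ace


def add_slot(ace, system_name, metadata):
    """Add the slot reserved for one processing system.

    A second slot for the same system is rejected, so each system's
    metadata stays in exactly one place.
    """
    if ace.find(f"./slot[@system='{system_name}']") is not None:
        raise ValueError(f"slot for {system_name!r} already exists")
    slot = ET.SubElement(ace, "slot", system=system_name)
    for name, value in metadata.items():
        ET.SubElement(slot, name).text = value
    return slot


ace = build_ace("urn:example:object-1", "ab12cd", "Sample still image")
add_slot(ace, "image-processing", {"width": "1024", "height": "768"})
xml_text = ET.tostring(ace, encoding="unicode")
```

In a real system the slot for a still image would more plausibly embed a MIX fragment, and the immutable and mutable parts would be persisted in different data stores; the sketch keeps everything in one tree purely for readability.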

Archival Management Taxonomy

The design of the ACE structure should allow the classification of digital objects in ERA according to different taxonomies. The Archival Management Taxonomy is a special taxonomy used mostly by archivists and by records managers from federal agencies. A public user would most likely be interested in other taxonomies associated with the domain of the data's content. To support the Archival Management Taxonomy, an ACE must include information extracted from the ABOs (Record Schedule and Transfer Request) that were created before the digital object represented by this ACE was ingested. With this organization, we can develop an application that allows business users to browse all digital objects grouped under a set of related ABOs.
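As a rough illustration of such browsing, the following Python sketch groups objects by the ABOs recorded in their ACEs. The record layout, the field names, and the identifiers (RS-001, TR-010, and so on) are assumptions made for the example, not the ERA data model.

```python
from collections import defaultdict

# Hypothetical, simplified ACE records: each carries the identifiers of the
# Record Schedule and Transfer Request under which the object was ingested.
aces = [
    {"object": "obj-1", "record_schedule": "RS-001", "transfer_request": "TR-010"},
    {"object": "obj-2", "record_schedule": "RS-001", "transfer_request": "TR-010"},
    {"object": "obj-3", "record_schedule": "RS-001", "transfer_request": "TR-011"},
]


def group_by_abo(entries):
    """Group object identifiers by (Record Schedule, Transfer Request)."""
    groups = defaultdict(list)
    for entry in entries:
        key = (entry["record_schedule"], entry["transfer_request"])
        groups[key].append(entry["object"])
    return dict(groups)


browse_index = group_by_abo(aces)
```

Here `browse_index` maps each pair of related ABOs to the objects ingested under it, which is the view a browsing application for archivists would present.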

XForms and Business Process Management (BPM)

Recently, the NARA ERA Systems Engineering Division developed a prototype to determine the feasibility and challenges of introducing new standards-based technologies into the current system. The ERA system requires that workflow processing be flexible and that new workflows be incorporated into the system in a timely fashion, which means avoiding hard-wired business orchestrations with high maintenance costs. The goals of the prototype were:

  1. Find the best-of-breed BPM tool that supports modeling human interactions with the business process in compliance with the BPMN (Business Process Modeling Notation) standard.

  2. Ensure that the BPM workflow integrates seamlessly with our Forms Management (XForms) solution.

  3. Find the best-of-breed system orchestration tool compliant with BPEL (Business Process Execution Language) [r22]. BPEL orchestrations running on the ESB should integrate seamlessly with the BPM workflow above.

One of the challenges was the current lack of XForms support among BPM vendors. All BPM vendors have their own forms management system built into their software. Since the business objects (forms) are an integral part of the ABO Application capability, a plug-and-play architecture is desirable: it would be difficult to replace one BPM product with another if the forms were tied up in a proprietary format. Therefore, we considered only products that support abstracting the form from the application layer. The XForms/BPM prototype simulated the business process capability of XForms interacting with the various candidate BPM software packages through a web service call to the BPM. The same web service was used to interact with each of these packages for testing and evaluation. In the current system, LDAP was used to authenticate users and their roles in the portal.
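The decoupling described above can be sketched in Python as a thin adapter interface between the forms layer and the BPM product. This is only an illustration under stated assumptions: the class names, the process name "transfer-request", and the in-memory stand-in are hypothetical, and no real vendor API or the prototype's actual web service is represented.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class BpmAdapter(ABC):
    """Hypothetical vendor-neutral interface that the forms layer talks to."""

    @abstractmethod
    def start_process(self, process_name: str, instance_xml: str) -> str:
        """Start a workflow with the submitted form instance; return a process id."""


class InMemoryBpm(BpmAdapter):
    """Stand-in for a real BPM product, used here only for illustration."""

    def __init__(self):
        self.started = []

    def start_process(self, process_name, instance_xml):
        # Reject instance data that is not well-formed XML.
        ET.fromstring(instance_xml)
        self.started.append((process_name, instance_xml))
        return f"{process_name}-{len(self.started)}"


def submit_form(bpm: BpmAdapter, instance_xml: str) -> str:
    """The submission handler depends only on the adapter interface."""
    return bpm.start_process("transfer-request", instance_xml)


process_id = submit_form(
    InMemoryBpm(),
    "<transferRequest><producer>Agency X</producer></transferRequest>",
)
```

Because `submit_form` sees only `BpmAdapter`, swapping one BPM product for another would mean writing a new adapter rather than touching the forms, which is the plug-and-play property the prototype was looking for.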

The graphic below demonstrates the flow and interaction of the various components.

Figure 7

image ../../../vol3/graphics/Harvey01/Harvey01-007.jpg

For illustration, the following diagram shows a BPMN workflow for the creation of a Transfer Request (TR). A TR is normally required every time a Producer wants to transfer electronic materials to the system. This business object must be created and approved before any actual transfer can occur. Note that the workflow integrates the key technologies presented in this paper: XForms, Genericode, BPM, BPEL, and the XML database. Thanks to the component and service architecture exhibited in the diagram, the design is very flexible and keeps maintenance costs low should we need to modify the workflow. Moreover, the implementation and evolution of the components and services involved in the workflow should not affect one another as long as their interfaces are maintained.

Figure 8

image ../../../vol3/graphics/Harvey01/Harvey01-008.jpg

BPMN Workflow for a Transfer Request Business Object
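The rule that a TR must be created and approved before any transfer can occur can be expressed as a small state machine. The Python sketch below is illustrative only: the state names and the allowed transitions are assumptions, and the authoritative workflow is the one modeled in the BPMN diagram.

```python
from enum import Enum


class TrState(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical transitions; the actual workflow is defined by the BPMN model.
ALLOWED = {
    TrState.DRAFT: {TrState.SUBMITTED},
    TrState.SUBMITTED: {TrState.APPROVED, TrState.REJECTED},
    TrState.REJECTED: {TrState.DRAFT},
    TrState.APPROVED: set(),
}


class TransferRequest:
    """Tracks the approval state of one Transfer Request."""

    def __init__(self):
        self.state = TrState.DRAFT

    def move_to(self, new_state):
        """Apply a transition, refusing any that the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state

    def can_transfer(self):
        """A transfer may start only once the request has been approved."""
        return self.state is TrState.APPROVED


tr = TransferRequest()
tr.move_to(TrState.SUBMITTED)
tr.move_to(TrState.APPROVED)
```

In the architecture described here, such rules would live in the BPM workflow definition rather than in application code, so that the workflow can change without redeploying the forms or the services.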

Conclusion

In this paper, we have described an agile approach to managing Archival Business Objects that is integrated into our XML-based stack, from presentation through business logic to the persistence store.

Figure 9

image ../../../vol3/graphics/Harvey01/Harvey01-009.jpg

XML-based Stack for ABO Management.

Our approach leverages the synergy of various XML standards and technologies at multiple layers of the software architecture:

  • Presentation layer with XHTML, XSL-FO

  • Form layer with XForms

  • Data layer with XSD and Genericode

  • Persistence storage layer with XML database.

We have shown that the combined application of XForms+Genericode provides enough flexibility for the software process to adapt easily to changing and evolving business requirements at NARA. Although the paper is based on our experience in developing ERA, we believe that the scheme described herein can be generalized to other applications that need flexible management of business objects via web forms with code lists. Furthermore, the integration with BPM and BPEL greatly enhances the flexibility of the whole design.

Appendix A. Acronyms

Table V

ABO Archival Business Objects

ACE Asset Catalog Entry
BPEL Business Process Execution Language
BPM Business Process Management
BPMN Business Process Modeling Notation
CRUD Create, Retrieve, Update, Delete
CRUDVS Create, Retrieve, Update, Delete, Versioning, Search
ESB Enterprise Service Bus
ERA Electronic Records Archive
IVR Interactive Voice Response
PREMIS PREservation Metadata: Implementation Strategies
SOA Service Oriented Architecture
SOAP Simple Object Access Protocol
UI User Interface
WSDL Web Services Description Language

References

[1] An Electronic Records Archives (ERA) Update. Available: http://www.diglib.org/preserve/ERA2004.htm.

[2] W3C. XForms 1.1. Available: http://www.w3.org/TR/xforms11/. 2007.

[3] OASIS. Code List Representation (Genericode), Version 1.0, Committee Specification 01, 28 December 2007. Available: http://docs.oasis-open.org/codelist/cs-genericode-1.0/doc/oasis-code-list-representation-genericode.pdf

[4] Eric Bruchez. XForms: an Alternative to Ajax? XTech 2006, Amsterdam, The Netherlands.

[5] Richard Cardone, Danny Soroker, and Alpana Tiwari. Using XForms to Simplify Web Programming. WWW 2005, May 10-14, 2005, Chiba, Japan.

[6] Mikko Honkala and Mikko Pohja. Multimodal Interaction with XForms. ICWE '06, July 11-14, 2006, Palo Alto, California.

[7] R. Bourret. XML and Databases. Available: http://www.rpbourret.com/xml/XMLDBLinks.htm

[8] Orbeon. Available: http://www.orbeon.org

[9] OASIS. Web Services Business Process Execution Language Version 2.0. Available: http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf.

[10] Object Management Group/Business Process Management Initiative. http://www.bpmn.org/.

[11] eXist, Open Source XML Database. Available: http://exist-db.org/

[12] W. Paul Kiel. Extend enumerated lists in XML schema – Explore options for your extension solution. Available: http://www.ibm.com/developerworks/library/x-extenum/.

[13] G. Ken Holman. Introduction to Code Lists in XML (Using Controlled Vocabularies in XML Documents). Available: http://www.xmlprague.cz/2009/presentations/G-Ken-Holman-Introduction-to-Code-List-Implementation.pdf.

[14] The Consultative Committee for Space Data Systems. “Reference Model for an Open Archival Information System (OAIS)”, 2002. Available: http://public.ccsds.org/publications/archive/650x0b1.pdf [Feb. 16, 2009].

[15] National Archives and Records Administration. Lifecycle Data Requirements Guide. Available: http://www.archives.gov/research/arc/about-arc.html#descriptions.

[16] PREMIS. Available: http://www.loc.gov/standards/premis/.

[17] NISO Metadata for Images in XML Schema. Available: http://www.loc.gov/standards/mix/.

[18] ERA Requirements. Available: http://www.archives.gov/era/about/documentation.html#requirements.

[19] OASIS Universal Business Language (UBL), http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl

[r20] OASIS Code List Representation, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=codelist

[r21] Famine Irish Passenger Record Data File (FIPAS), 1/12/1846 - 12/31/1851. URL: http://aad.archives.gov/aad/display-partial-records.jsp?f=640&mtch=11&q=Irish&cat=all&dt=180&tf=F#

[r22] OASIS Web Services Business Process Execution Language (WSBPEL) TC, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsbpel

Quyen L. Nguyen

National Archives and Records Administration, Systems Engineering Division of ERA Program Management Office

Quyen Nguyen is currently working in the Systems Engineering Division of the ERA Program Management Office at the U.S. National Archives and Records Administration. Before joining the National Archives, he worked for telecommunications software companies. His experience is in developing software systems for large-scale deployment. He has a BS in Computer and Information Science and Applied Mathematics from the University of Delaware and an MS in Computer Science from the University of California at Berkeley.

Betty Harvey

As President of Electronic Commerce Connection, Inc. since 1995, Ms. Harvey has led many federal government and commercial enterprises in planning and executing their migration to the use of structured information for their critical functions. Over the past 14 years she has helped develop strategic XML solutions for her clients. Ms. Harvey has been instrumental in developing industry XML standards. Ms. Harvey is a member of OASIS Open and is currently an active participant in the Universal Business Language initiative. Previously she was a member of the Core Components subcommittee of the ebXML initiative. She is the co-author of "Professional ebXML Foundations" published by Wrox. Ms. Harvey founded the Washington, DC Area SGML/XML Users Group in 1995. She still coordinates the group, which is the longest-standing XML users group. Ms. Harvey is also a member of "The XML Guild" and recently co-authored the book "Advanced XML Applications From the Experts at The XML Guild" published by Thomson. Currently, Ms. Harvey is working with the National Archives and Records Administration (NARA) on the future evolution of the Electronic Records Archive (ERA) system.