Mapping the JFK Library Archive onto OAIS

This is my third post about the digital archiving effort at the JFK Library. The effort is less than two years old, and already a large number of JFK’s papers and photos have been digitally archived.

The software and hardware being used to process and store the information was donated by EMC, and the process of preserving the artifacts was designed by the archivists working at the JFK Library.

Part of their design process was a consideration of OAIS: an Open Archival Information System. OAIS is a standard that defines the creation of an archive. OAIS provides common terms and comparison points with other, similar archival systems. The team also considered a reference document known as “Trusted Digital Repositories” (see the comments from JFK archivist James Roth below).

The attention given by JFK’s archivists to OAIS allows me to write this article using terms that are familiar to archivists and digital curators throughout the world. As I’ve learned about OAIS, I’ve found myself focusing on what seems to be the acronym of primary importance: AIP.

The Archival Information Package.

One of the functions supported by an OAIS is the actual “preservation” of the information. The archival information package (AIP) models everything needed to preserve information over time. There are other types of packages that model submissions into the archive (SIPs) and dissemination out of the archive (DIPs). For this particular post I’ll focus on the AIP. The diagram below breaks the AIP into two fundamental pieces:

[Diagram: the Archival Information Package (AIP)]

At a high level I view “Content Information” as the information being preserved, and “Preservation Description Information” as meta-data added by the archivists.

For a detailed description of the diagram please refer to the Reference Model for OAIS.

Given this very brief overview it’s an interesting exercise to describe the AIP implementation chosen by the JFK Library archivists. A review of my previous post on the JFK folder numbering scheme would be helpful.

AIP = Folder

The JFK Library chose to associate an Archival Information Package with each individual folder holding JFK’s documents, pictures, and other items. All the content from the folder labeled “JFKPOF-001-001” is stored on EMC’s infrastructure as an archival information package. This fact allows us to ultimately understand the JFK archiving process and how it maps onto the software and hardware provided by EMC.

The diagram breaks down the archival information package into “content information” and “preservation description information”.

Content Information

How is the JFK Library implementing the “content information” aspect of an AIP?  Content information is made up of two things: data objects and Representation Information.  The data objects are the “digital scans” that come from each physical object in the JFK folders.  Think of data objects as the physical bits resulting from the scan.  The Representation Information describes how to interpret (e.g. view) that information. For example, the representation information for JFK’s scanned documents would be something along the lines of “image/tiff; 24 bits, uncompressed, 600 ppi”.
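The pairing of a data object with its Representation Information can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names are hypothetical and do not come from OAIS or from the JFK Library's software, and the data object is a four-byte placeholder rather than a real scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInformation:
    """How to interpret the bits (hypothetical field names)."""
    media_type: str
    bit_depth: int
    compression: str
    resolution_ppi: int

    def describe(self) -> str:
        return (f"{self.media_type}; {self.bit_depth} bits, "
                f"{self.compression}, {self.resolution_ppi} ppi")


@dataclass(frozen=True)
class ContentInformation:
    """A data object plus the information needed to view it."""
    data_object: bytes
    representation: RepresentationInformation


# Values taken from the example in the text; the bytes are a placeholder.
tiff_scan = RepresentationInformation("image/tiff", 24, "uncompressed", 600)
content = ContentInformation(data_object=b"II*\x00", representation=tiff_scan)

print(content.representation.describe())
```

The point of keeping the two parts together is that the bits alone are not enough: without the representation information, a future reader would not know how to turn them back into a viewable page.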

Preservation Description Information

The second part of the Archival Information Package is the PDI. Think of the PDI as extra meta-data that is added by the archivists.  The JFK Library implementation of PDI is as follows:

  • Reference Information: this field needs to unambiguously define a persistent identifier for the AIP. In the case of the JFK library this is the JFK reference number (e.g. JFKPOF-001-001).
  • Provenance Information: this field documents the history of the AIP.  Evelyn Lincoln (JFK’s personal secretary), for example, could be listed in the provenance information for JFK’s presidential office files. Other provenance information could include the reference number itself, the person who conducted the scan, and the scanning equipment and software used to digitize the information (e.g. Fujitsu Image Scanner, Documentum ApplicationXtender, etc.).
  • Context Information: this information allows the archivist to add additional information about documents related to this AIP. The JFK Library uses collections (e.g. the President’s office files or National Security documents) and series (e.g. speeches) to provide additional context.
  • Fixity Information: this information is used to authenticate the information. The JFK Library uses the JFK reference number as a cross-reference to validate the authenticity of scanned documents.
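The four PDI fields above can be pictured as a simple record. The sketch below is a hedged illustration, not the JFK Library's implementation: the class, the field names, and the `cross_reference_ok` helper are all assumptions, the example values are borrowed from the text and combined only for illustration, and "JFKPOF-001-002" is a made-up number used to show a mismatch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PreservationDescriptionInformation:
    reference: str                                          # persistent identifier
    provenance: List[str] = field(default_factory=list)     # history of the AIP
    context: Dict[str, str] = field(default_factory=dict)   # collection, series
    fixity: str = ""                                        # value used to cross-check scans


def cross_reference_ok(pdi: PreservationDescriptionInformation,
                       number_on_scan: str) -> bool:
    """Hypothetical check: a scan's reference number must match the AIP's."""
    return number_on_scan == pdi.fixity


pdi = PreservationDescriptionInformation(
    reference="JFKPOF-001-001",
    provenance=["Evelyn Lincoln", "Fujitsu Image Scanner",
                "Documentum ApplicationXtender"],
    context={"collection": "President's office files", "series": "speeches"},
    fixity="JFKPOF-001-001",
)

print(cross_reference_ok(pdi, "JFKPOF-001-001"))
print(cross_reference_ok(pdi, "JFKPOF-001-002"))
```

Here the first check succeeds and the second fails, which is the whole idea of a cross-reference: a scan that claims to belong to a folder must carry that folder's number.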

Best Effort & Dublin Core

I’ve already stated that the “Trusted Digital Repository” is not a standard, but the JFK archivists certainly gave it due consideration.  TDR recommendations were implemented when they were realistic and attainable given the level of resources (people and equipment) available to do the archiving.

Apart from OAIS, the team also chose to use the Dublin Core meta-data standard.  Dublin Core is an international standard for describing digitized cultural resources.
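To give a feel for what a Dublin Core description looks like, here is a small Python sketch. The fifteen element names are those of the Dublin Core Metadata Element Set; everything else is an assumption for illustration, including the `unknown_elements` helper and the choice of which elements a folder record would fill in. It is not the record format the JFK Library actually uses.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict) -> list:
    """Return the keys of a record that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Hypothetical record for one folder, using values mentioned in the post.
folder_record = {
    "identifier": "JFKPOF-001-001",
    "format": "image/tiff",
    "relation": "President's office files",
}

print(unknown_elements(folder_record))
print(unknown_elements({"identifier": "JFKPOF-001-001", "folder": "001"}))
```

The first call reports no problems, while the second flags "folder" because it is not one of the fifteen elements. A shared, fixed vocabulary like this is what lets descriptions from different archives be compared.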

I haven’t come close to describing the entirety of the OAIS standard, but I’ve covered just enough to map the AIP onto the EMC hardware and software at the JFK Library.  I will catalogue this infrastructure in an upcoming post.

For those of you familiar with OAIS, I’d love to receive your comments.

Steve

Many thanks again to the JFK archivists for their time….

3 Comments

  1. Ken Knowles

    Just to clarify one point, the OAIS actually is an ISO standard–ISO 14721:2003. I only mention it because I believe that one of the most important keys to success for long-term preservation is the pervasive use of standards.
    I think your point was that the reference model doesn’t specify how to map information into the model. But, it does say what information is required and where to put it in the model. Would you agree with that?
    Also, section 3.1 specifies mandatory responsibilities and some of these have an impact on the IT functions of the archive. So, that would be another place where the OAIS has to be treated like a standard, not merely a strong recommendation.

  2. James Roth

    We completely agree with Ken’s comments that OAIS is a standard, one which we are indeed following. We are also following DACS (Describing Archives: A Content Standard http://www.archivists.org/governance/standards/dacs.asp), an output-neutral set of rules for describing archives, personal papers, and manuscript collections that can be applied to all material types. It is the U.S. implementation of international standards (i.e., ISAD(G) and ISAAR(CPF)) for the description of archival materials and their creators.
    We are also following Dublin Core (ISO Standard 15836-2003 of February 2003 [ISO15836], ANSI/NISO Standard Z39.85-2007 of May 2007 [NISOZ3985], and IETF RFC 5013 of August 2007 [RFC5013]). The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description. The fifteen element “Dublin Core” described in this standard is part of a larger set of metadata vocabularies and technical specifications maintained by the Dublin Core Metadata Initiative (DCMI). The full set of vocabularies, DCMI Metadata Terms [DCMI-TERMS], also includes sets of resource classes (including the DCMI Type Vocabulary [DCMI-TYPE]), vocabulary encoding schemes, and syntax encoding schemes. The terms in DCMI vocabularies are intended to be used in combination with terms from other, compatible vocabularies in the context of application profiles and on the basis of the DCMI Abstract Model [DCAM] http://dublincore.org/documents/dces/.
    What we were describing as a reference model was the “Trusted Digital Repositories” document. From the document: “In 2002, RLG and OCLC jointly published “Trusted Digital Repositories: Attributes and Responsibilities” (TDR), which…articulated a framework of attributes and responsibilities for trusted, reliable, sustainable digital repositories capable of handling the range of materials held by large and small cultural heritage and research institutions….In 2003, RLG and the National Archives and Records Administration created a joint task force to specifically address digital repository certification. The goal of this task force has been to develop criteria to identify digital repositories capable of reliably storing, migrating, and providing access to digital collections. The challenge has been to produce certification criteria and delineate a process for certification applicable to a range of digital repositories and archives, from academic institutional preservation repositories to large data archives and from national libraries to third-party digital archiving services.”
    As part of the National Archives and Records Administration, the John F. Kennedy Presidential Library and Museum staff believes we should be adopting the Trusted Digital Repositories report as our guideline. However, we may not be able to adopt every criterion, particularly because this document is based upon born digital objects, while we are creating digital surrogates. We are making every effort to adhere to the guidelines in order to meet the certification criteria to become a Trusted Digital Repository.

  3. Ken,
    Your clarification was indeed on the mark and I updated my post to more accurately reflect both OAIS and TDR. Thanks for the contribution.
    Steve
