JFK Digital Archive Solution

Today, January 13, 2010, the John F. Kennedy Library and Museum announced the availability of a new digital archive that allows public access to a portion of the museum’s assets.

I have some personal thoughts about the archive; I’ll share them in a separate post. This post focuses on the architecture that allows the general public to view and search the archive.

While the internal digital archive was assembled using EMC products, the full public solution was a partnership between EMC, Iron Mountain, AT&T, and Raytheon.  The diagram below, produced by Raytheon as part of their project leadership role, describes how AT&T and Iron Mountain augment the archive itself by providing public web access and disaster recovery capabilities.

[Diagram: JFK Digital Archive architecture, produced by Raytheon]

  1. The primary archive is based on EMC’s Centera and Celerra network storage systems and is accessed using EMC’s Documentum Enterprise Content Management software. Documents enter the archive via scanners; audio and video are ingested (via Documentum) after being produced by Iron Mountain Digital Services.
  2. The public experience of the digital archive is provided by the AT&T Synaptic Hosting environment. Web-ready renditions of documents and photographs are generated by the Documentum system at the JFK Library at the time the full resolution digitized files are produced and ingested.
  3. The Disaster Recovery archive, a backup also built on the EMC Centera storage system, is located in an Iron Mountain secure underground operations facility. If data in the Primary Archive is lost or compromised, the Disaster Recovery archive provides the capability to readily restore the full data holdings, including all associated metadata.
  4. The AT&T secure network is the inner backbone of the Digital Archive, linking the Primary Archive at the JFK Library with the Disaster Recovery Archive at Iron Mountain and with the Web hosting site at the AT&T data center. This secure network employs AT&T’s Enhanced Virtual Private Network (EVPN) Multi-Protocol Label Switching (MPLS) technology.
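The relationship between the Primary Archive and the Disaster Recovery archive in items 1 and 3 can be illustrated with a toy model. This is only a sketch under loud assumptions: the archives are plain Python dictionaries, and `replicate` and `restore` are hypothetical helpers invented for illustration. It does not represent how Centera replication or Documentum actually work; it only shows the idea that content and its metadata travel together and can be restored together.

```python
import copy
from typing import Dict, List

# Hypothetical record shape: {"content": bytes, "metadata": dict}
Archive = Dict[str, dict]


def replicate(primary: Archive, replica: Archive) -> None:
    """Copy every record (content plus metadata) to the offsite replica."""
    for key, record in primary.items():
        replica[key] = copy.deepcopy(record)


def restore(primary: Archive, replica: Archive) -> List[str]:
    """Repair the primary from the replica; return the keys that were restored."""
    restored = []
    for key, record in replica.items():
        # A record is restored if it is missing or differs from the replica.
        if primary.get(key) != record:
            primary[key] = copy.deepcopy(record)
            restored.append(key)
    return restored


primary = {
    "doc-001": {"content": b"scan-1", "metadata": {"box": "A"}},
    "doc-002": {"content": b"scan-2", "metadata": {"box": "B"}},
}
replica: Archive = {}
replicate(primary, replica)

del primary["doc-001"]                      # simulated loss
primary["doc-002"]["content"] = b"garbage"  # simulated corruption
restored_keys = restore(primary, replica)
```

After the simulated loss and corruption, `restore` brings the primary back in line with the replica, metadata included.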

According to the archivists I spoke with at the JFK Library, the new archive goes far beyond any existing archive that NARA has ever assembled. Several state-of-the-art characteristics distinguish it from previous systems:

  • each artifact is scanned only once, minimizing the possible damage to fragile documents, speeding the ingest process, and …
  • the EMC Documentum workflow and rendition capability preserves the initial high-definition scan while automatically generating smaller files for fast internet access and for viewing over slower links
  • the automated scanning and classification software is part of an integrated package with the high-speed data storage and retrieval systems
  • the replication technology enables seamless offsite storage of all high-definition copies
  • historical archives the size of the JFK Library’s would previously have defaulted to a tape-based design; a disk-based archive (e.g. Centera) makes renditions available with much faster access
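The "scan once, render many" idea in the list above can be sketched in a few lines. This is a minimal illustration, not the Documentum rendition API: the `Scan` class, the `PROFILES` table, and the pixel sizes are all assumptions made up for the example. The point is only that the full-resolution master is kept untouched while smaller web-ready versions are derived from it at ingest time.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Scan:
    artifact_id: str
    width: int   # pixels
    height: int  # pixels


# Hypothetical rendition profiles: name -> maximum long edge in pixels.
PROFILES = {"web": 1024, "thumbnail": 200}


def make_rendition(master: Scan, max_edge: int) -> Scan:
    """Derive a smaller version of the master, preserving aspect ratio."""
    long_edge = max(master.width, master.height)
    if long_edge <= max_edge:
        return master  # already small enough; never upscale
    scale = max_edge / long_edge
    return Scan(master.artifact_id,
                round(master.width * scale),
                round(master.height * scale))


def ingest(master: Scan) -> Dict[str, Scan]:
    """Keep the master as-is and generate every rendition in one pass."""
    renditions = {"master": master}
    for name, max_edge in PROFILES.items():
        renditions[name] = make_rendition(master, max_edge)
    return renditions


master = Scan("doc-001", 6000, 4000)
renditions = ingest(master)
```

Because every rendition is derived from the stored master, the fragile original never needs to go back on the scanner when a new size is required.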

The majority of the museum’s assets are still “boxed up” in the back room. A small (and growing) percentage of those assets are now available to every student, historian, and person on the planet. That’s powerful. The rest of the documents show up as “empty boxes” when the archive is publicly accessed. These boxes (known as cabinets in Documentum) contain metadata about what lies inside. The hope is that when a JFK researcher looks into a given box and sees that it is empty (i.e. the archivists have not scanned the documents from a given box), they will be able to request that scanning occur. This feature, when implemented, would guide the archivists in their scanning activity going forward.
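To make the "empty box" idea concrete, here is a small sketch of how scan requests might be tallied to guide the archivists. Since the feature is described as not yet implemented, everything here is hypothetical: the `Box` and `ScanRequests` classes and their fields are invented for illustration and are not part of Documentum.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    box_id: str
    description: str  # metadata that is visible even when nothing is scanned
    scanned_documents: List[str] = field(default_factory=list)

    @property
    def is_empty(self) -> bool:
        return not self.scanned_documents


class ScanRequests:
    """Tally researcher requests so the most-wanted boxes surface first."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def request(self, box: Box) -> bool:
        # Only empty boxes can be requested; scanned ones are already online.
        if not box.is_empty:
            return False
        self._counts[box.box_id] += 1
        return True

    def priorities(self) -> List[str]:
        # Box IDs ordered from most requested to least requested.
        return [box_id for box_id, _ in self._counts.most_common()]


box_a = Box("box-a", "Correspondence, 1961")
box_b = Box("box-b", "Press briefings, 1962")
box_c = Box("box-c", "Speeches, 1963", ["speech-001.tif"])

requests = ScanRequests()
accepted = [requests.request(b) for b in (box_a, box_b, box_b, box_c)]
queue = requests.priorities()
```

In this toy run `box-b` is requested twice and `box-a` once, so `box-b` heads the scanning queue, while the request for the already-scanned `box-c` is rejected.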

Steve

Information Playground

Twitter: @SteveTodd

EMC Intrapreneur

1 Comment

  1. Ryan Gadsby

    “The rest of the documents show up as “empty boxes” when the archive is publicly accessed. These boxes (known as cabinets in Documentum) contain metadata about what lies inside. The hope is that when a JFK researcher looks into a given box and sees that it is empty (i.e. the archivists have not scanned the documents from a given box), they will be able to request that scanning occur. This feature, when implemented, would guide the archivists in their scanning activity going forward.”
    That’s a cool feature. The producer-consumer relationship at its finest.
