This week I had the chance to visit the John F. Kennedy Presidential Library and Museum. It’s been nearly two years since EMC and the JFK Library announced a joint effort to digitally preserve a massive portion of JFK’s legacy.
The storage being used at the JFK Library includes CLARiiON and Centera (two products I’ve helped develop). I wanted to understand how these two products have been combined with other EMC products to form a digital preservation infrastructure.
I also wanted to meet the team performing this digital conversion, and to understand their process and how it maps onto this infrastructure.
In other words, my purpose was to understand the process and the infrastructure for a very ambitious digital preservation effort.
I was not disappointed.
On the drive back to EMC my thoughts fell into several different categories.
#1: They Filled It Up
I found out that EMC’s initial Centera donation was 19 terabytes. The JFK Library hired interns in June 2007 to start the scanning process, and by December 2007 the Centera was full. During that period the team scanned nearly 70,000 pages and 72 photographs. Earlier this year EMC installed another 15 TB. You know what this tells me? Their process is working.
Which makes you wonder: what is their process?
#2: They Looked Before They Leaped
You’ll notice that a year elapsed between the June ’06 announcement and the start of scanning in June ’07. Much of that time was surely spent on the logistics of deploying the infrastructure. During this time the JFK Library also made some smart hires in the form of meta-data cataloguers. The team began building a preservation process long before any scanning took place.
The process had to consider the current form of JFK’s materials, the hardware and software from EMC, available staff and resources, and conformance to applicable standards from groups such as the Research Libraries Group (RLG). Which brings me to my next point.
#3: They Have Acronyms and Standards Bodies Too
Storage industry acronyms (like HBA) and standards bodies (like SNIA) are a fact of life for someone like me, and you really can’t have a conversation without them. Digital preservation is no different. In addition to RLG, I learned about AIP (Archival Information Package) and TDR (Trusted Digital Repository). I also learned about NARA, the National Archives and Records Administration. (These are the only acronyms I was actually able to retain!)
The bottom line is that standards and standards bodies all influenced the process that was put together.
#4: They Know More About Storage Than They Probably Want To
Software, hardware, and consulting have all been provided by EMC. Full-time, on-site resources to manage, monitor, and maintain that infrastructure were not! Herein lies one of the great challenges for institutions that wish to do large-scale digital archiving: managing the digital infrastructure.
They’re not following well-known and established storage infrastructure best practices for digital preservation. They’re creating them.
You may have noticed that I still haven’t described anything about the process (e.g. scanners, meta-data). I also haven’t talked about other products in the infrastructure (e.g. Documentum, Legato).
I couldn’t do either of those topics justice in one blog post, but I would like to in future posts. Know why?
Brush With Greatness
During my visit I was shown a folder of JFK documents. We opened the folder and there it was: a document that had crossed JFK’s desk. I was speechless looking at such an amazing piece of US history.
The JFK Library’s process and infrastructure for digital preservation deserve to be preserved as well. There’s a worldwide need for best practices. In future posts I hope to describe more fully what they’ve come up with.
More to come,
Steve

