Digitizing Dickens

In my local newspaper this morning there was an article that caught my eye. There’s a fairly significant digital archiving project occurring at the Worcester Polytechnic Institute to preserve some of the original works of English novelist Charles Dickens. This, of course, is interesting to me in the context of previous posts I’ve written about the digital preservation occurring at the JFK Presidential Library and Museum in South Boston.

The George C. Gordon Library at WPI has a significant collection of historical books, and it invests a large amount of time caring for and preserving the physical copies of old works.

So why go to the effort of digitally preserving Dickens’ works when certainly most of his novels can be found online already?

Well, the answer lies in the fact that the particular books in WPI’s library are the sequential installments in which each novel first appeared. I looked up this announcement (from 2008) of the WPI Dickens digitization project and learned that his novels were originally released as a series of printed installments. Once these serialized versions are digitized, it will be the first time they’ll be available on the web.

The importance of web-based access to the serial documents is highlighted in this quote by WPI professor Joel Brattin:

“Few scholars—and certainly very few readers—have previously had access to these rare serial parts, as they can be found in only a few collections worldwide,” Brattin says. “That’s unfortunate, as they reflect, more so than any other published text, the author’s original intentions as to wording, punctuation, and so on. Plus, by exploring these installments, readers are able to experience the novels—and a bit of Victorian London life—as their earliest readers did, complete with advertisements and with illustrations that are rarely reproduced in full in today’s paperback editions.”

This reminds me of the digitization project at the JFK Library. Copies of JFK’s speeches are widely available on the internet, but the original speeches that he hand-edited in his office are not. These originals clearly provide new insights to researchers.

The WPI effort uses scanners and OCR (optical character recognition) to combine each scanned image with metadata. Additional metadata is added by a WPI digital curator. The library is storing the documents as PDF/A, the archival form of PDF, which leaves out PDF features that are not suited to long-term archiving, such as encryption and fonts that are not embedded in the file. The stories are being preserved along with color illustrations and will be text-searchable.
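To make the idea of "image plus OCR text plus curator metadata" a little more concrete, here is a small Python sketch of how one installment might be modeled and searched. This is purely my own illustration and not a description of WPI's actual workflow or software: the class names, field names, file names, and sample text are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScannedPage:
    """One scanned page: the image file plus the text OCR recovered from it."""
    image_file: str   # hypothetical file name of the page scan
    ocr_text: str     # text produced by OCR for that page


@dataclass
class SerialPart:
    """One installment of a novel, with curator-supplied metadata.

    All field names here are assumptions made for illustration.
    """
    novel: str
    part_number: int
    pages: List[ScannedPage] = field(default_factory=list)
    curator_metadata: Dict[str, str] = field(default_factory=dict)

    def find(self, phrase: str) -> List[int]:
        """Return the 1-based page numbers whose OCR text contains the phrase."""
        needle = phrase.casefold()
        return [
            number
            for number, page in enumerate(self.pages, start=1)
            if needle in page.ocr_text.casefold()
        ]


# Placeholder data, not taken from any real installment.
example = SerialPart(
    novel="Example Novel",
    part_number=1,
    pages=[
        ScannedPage("part01_page01.tif", "An advertisement for tea"),
        ScannedPage("part01_page02.tif", "Chapter the First"),
    ],
    curator_metadata={"condition": "placeholder note added by a curator"},
)

print(example.find("chapter"))  # prints [2]
```

The point of keeping the OCR text next to each image is that a search can return the page to display, which is roughly what "text-searchable" means for a scanned document.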

The newspaper article went on to state that WPI owns all but four of the serialized novels, and hopes to borrow (and preserve) the rest from the Charles Dickens Museum in London.

Steve

http://stevetodd.typepad.com

Twitter: SteveTodd