As reported yesterday, EMC’s Centera development will be moving away from Belgium and the Mechelen development center will be closed.
I certainly found this news to be disappointing, as the software developers in Belgium have always presented me with some great opportunities to collaborate and build interesting products. In fact, we were just on the phone last week brainstorming about new directions. I have always found these discussions quite exciting. Best of luck to them.
I could make a very strong argument that the work done in Mechelen was some of the most disruptive and innovative storage software development of this decade.
In that spirit I’d like to highlight what I think is their main accomplishment.
They took a desktop application (Filepool) and turned it into a 5 9s storage system.
Centera has taken its spot right alongside Symmetrix, CLARiiON, and Celerra when it comes to quality. The latest release of CentraStar and Centera Gen4 hardware has resulted in a measurable and provable 5 9s quality rating. This is due in large part to the sweat equity of the Mechelen team and how they handled the challenges and obstacles that come with the introduction of any brand new storage technology.
Designed For Big, Large Files
The original Centera device was designed (and initially marketed) to store hundreds and thousands of large, fixed-content files (think X-rays and scanned check images). Customers certainly used it for this purpose (and they still do). However, many customer accounts began storing hundreds of millions of small files (think email archiving), and these accounts began to push Centera toward its object scalability limits.
When I started at Centera in 2003 I joined the Mechelen team in trying to understand why the system became slower as it began to fill up with millions of small objects. We discovered that the random nature of MD5 hash values didn’t play very well with certain database and file system configurations, not to mention buffer caches and disk drive layouts.
Thus began some cutting-edge research into optimizing the storage and retrieval of objects. Mechelen developers spent months experimenting with new naming formats, new file system layouts, distributed database techniques, and locality-of-reference optimizations. Successive releases of CentraStar continually broke through to higher and higher per-disk object capacities. The latest release supports 25 million objects per disk.
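To make the locality problem concrete, here is a minimal sketch in Python. It is not Centera's actual naming scheme or on-disk layout; the directory count, the `md5_bucket` and `arrival_bucket` helpers, and the sample object names are all assumptions made up for illustration. It only shows why names derived from an MD5 digest scatter objects that arrived together, while a name tied to arrival order keeps them together.

```python
import hashlib

BUCKETS = 256      # hypothetical number of directories on one disk
PER_BUCKET = 1000  # hypothetical number of objects per directory

def md5_bucket(content: bytes) -> int:
    """Pick a directory from the first byte of the content's MD5 digest."""
    return hashlib.md5(content).digest()[0] % BUCKETS

def arrival_bucket(arrival_index: int) -> int:
    """Pick a directory by arrival order, so objects written together stay together."""
    return (arrival_index // PER_BUCKET) % BUCKETS

def buckets_touched(bucket_ids) -> int:
    """Count how many distinct directories a batch of objects lands in."""
    return len(set(bucket_ids))

# 1000 small objects ingested back to back (think one batch of archived email).
messages = [f"archived-email-{i}".encode() for i in range(1000)]
hashed_layout = [md5_bucket(m) for m in messages]
arrival_layout = [arrival_bucket(i) for i in range(len(messages))]

print("directories touched, MD5 naming:    ", buckets_touched(hashed_layout))
print("directories touched, arrival naming:", buckets_touched(arrival_layout))
```

With hash-derived names the batch lands in nearly every directory, so reading it back touches many unrelated spots in the file system and the buffer cache. With arrival-ordered names the same batch lands in one directory.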
As EMC announced each successive release of CentraStar and detailed the growth in object count, the news did not generate much excitement in the press (object count, that’s the announcement?!!). It did, however, resonate with customers, who wanted to pack as many objects as possible into the smallest, self-managing system that they could buy.
Failure Evolution
Right alongside object count is the progress that the Mechelen developers made handling system failures. Think about it. Filepool was a Java desktop application. Long-time storage system developers, including myself, were skeptical about a system that ran “Java microcode”. The Mechelen desktop developers did not have experience with the myriad of failure permutations that are common to storage systems.
That didn’t stop them.
I have two data points that highlight their progress. When a disk containing 10 million small objects fails, Centera can regenerate and redistribute all of the objects in just about 14 hours (with no load). As the objects get larger (and thus the object count per disk gets smaller), a disk regeneration completes in roughly 10 hours, independent of ingest load!
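For a sense of scale, the first data point can be turned into a sustained rate. This is a back-of-the-envelope sketch in Python that takes the approximate figures above at face value (10 million objects, about 14 hours, no load); the `objects_per_second` helper is just a name chosen for illustration.

```python
SECONDS_PER_HOUR = 3600

def objects_per_second(object_count: int, hours: float) -> float:
    """Average regeneration rate implied by an object count and a duration."""
    return object_count / (hours * SECONDS_PER_HOUR)

# 10 million small objects regenerated in roughly 14 hours.
small_object_rate = objects_per_second(10_000_000, 14)
print(f"{small_object_rate:.0f} objects per second")
```

That works out to roughly 198 objects regenerated and redistributed every second, sustained for the whole 14 hours.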
That is incredible progress. I’ve written before about some of the techniques they discovered to make this work.
True Disruption
I haven’t written about all of the other innovative features developed by this team; I focused on the most difficult ones: scale and failure recovery.
But let’s not forget the Centera SDK, the hundreds of partner applications, provision-free capacity upgrade, virtual pools, retention and retention hold, and all of the other cool features that the team delivered in the Centera product.
Is Centera a disruptive technology? Absolutely. Need proof?
How about thousands of customers? How about hundreds of PB of capacity sold?
That’s a lot of business that didn’t go to somebody else. That qualifies as disruptive in my book.
Hospitals are one of the primary accounts for Centera. Finland’s health care system will be based on Centera. Health care providers are not going to purchase equipment that has quality issues. They are not going to buy equipment that requires constant re-provisioning as capacities grow.
Object count and quality continue to make a difference in these sales.
Nice job, Mechelen.
Steve


Agreed! Thanks for sharing the great work this team did for EMC, and the industry as a whole.
— Chuck
Are you sure of your numbers?
You say 3500 customers and 1.5 PB sold. That means on average 420 GB per customer? Am I wrong?
Ced
Ced,
Good catch, I’ll re-post the right numbers.
Steve
Just to get the numbers right: the latest release supports 25 million objects/disk.