In my last post I looked under the hood of the latest VNX release and provided some detail about the new engine (MCx). I think a fair analogy for the new MCx would be that of an engine swap, of which Wikipedia has this to say:
An engine swap is the process of removing a car’s engine and replacing it with another… usually one that is more powerful or more modern and maintainable.
Most engine swaps, however, don’t happen while the car is in motion. Chad puts it this way:
How do you change the engine of a car while barreling down the highway?
It’s one thing to play around in the lab and get great performance numbers. It’s another thing to replace the engine and then enter the Grand Prix of the customer’s data center. So while MCx’s performance increase is a substantial accomplishment, perhaps the greater achievement is the MCx qualification effort.
I spent some time with MCx technologist Marc Cassano, who highlighted the following:
The Disk Array Qualifier
The DAQ is a legendary testing framework which has been the foundational driver of VNX quality for close to 25 years. The VNX RAID algorithms function as a state machine that adroitly handles any number of failures coming from underlying hardware components. The DAQ testing framework sits outside the VNX system and tortures the disk array with disk failures, media read errors, power failures, processor reboots, torn writes, stripe spanning, rebuild checkpoint straddling, etc., etc., etc.
There are many RAID levels, but one of the most popular and oft-used is RAID-5. After more than two decades of testing, this state machine has stood the test of time as the longest-lasting and most-used implementation of RAID-5 in the industry.
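For readers less familiar with RAID-5, the property all of this torture testing protects is simple to state: each stripe carries a parity block equal to the XOR of its data blocks, so any single missing member can be rebuilt from the survivors. The Python sketch below is illustrative only, not VNX code. The function names are hypothetical, and it deliberately ignores the hard cases the DAQ targets (torn writes, interrupted rebuilds, multiple failures).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (their XOR)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Reconstruct the member at failed_index from the surviving members."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Hypothetical stripe: three data disks plus parity, two bytes per block.
stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
lost = stripe[1]                      # pretend disk 1 just failed
assert rebuild(stripe, 1) == lost     # its contents come back from the others
```

The same `rebuild` call recovers the parity block itself, which is why RAID-5 survives the loss of any one member, data or parity.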
And as I mentioned in the last post, these foundational state machines have been left alone in the MCx release. They function as they always have.
What has changed is that the state machines are now wrapped in the FBE architecture (described in my last post and re-emphasized below).
If anything were to break as a result of the new deployment, the DAQ would surely expose the problem.
Physical and Logical Package Error Insertion
The physical disk plumbing that sits below the MCx software (boards, ports, enclosures, drives, etc.) has been modeled as objects that pass messages to each other as I/O requests are routed down through the software stack and onto the physical infrastructure. The diagram below visualizes this approach.
Every piece of the physical infrastructure represented by this chain of objects can be simulated in software. Every object in the chain can be programmatically controlled via a common API. This allows test engineers to perform error insertion at any level of the physical infrastructure, including simulating bad drives, life-cycle injections, illegal hardware responses, etc.
Test engineers used this common API to perform scenario-based testing up and down the chain of objects, holding certain objects in one state while changing others. This made it possible to create timing scenarios in a simulated environment that were very hard to trigger in the real product.
The physical package architecture was tested inside VNX for years but did not run as live code until 2011, when it replaced the previous back-end infrastructure management software. Bringing the physical package into the product also brought increased internal programmatic testability: as new errors were inserted into the I/O path, code coverage of the error-handling paths steadily climbed.
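To make the object-chain idea concrete, here is a minimal Python sketch of simulated back-end objects that share one control API, both for error insertion and for holding an object in place while another one changes. The class and method names (`SimObject`, `inject_error`, `hold`, `release`) are hypothetical and are not the FBE API. The scenario at the bottom parks a write at the port, injects a media error on the drive behind it, and then lets the write proceed.

```python
from collections import deque
from typing import Optional


class SimObject:
    """Hypothetical simulated back-end object (board, port, enclosure, drive)."""

    def __init__(self, name: str, downstream: Optional["SimObject"] = None):
        self.name = name
        self.downstream = downstream          # next object down the chain
        self.pending_error: Optional[str] = None
        self.held = False
        self.parked = deque()                 # requests waiting while held

    # --- common control API (hypothetical names) ---
    def inject_error(self, error: str) -> None:
        """Arm a one-shot fault: the next I/O reaching this object fails."""
        self.pending_error = error

    def hold(self) -> None:
        """Freeze this object so arriving I/O is parked instead of forwarded."""
        self.held = True

    def release(self, results: list) -> None:
        """Unfreeze and push parked I/O onward in arrival order."""
        self.held = False
        while self.parked:
            self.submit(self.parked.popleft(), results)

    # --- I/O path ---
    def submit(self, request: str, results: list) -> None:
        """Route a request down the chain, recording (request, outcome)."""
        if self.held:
            self.parked.append(request)
            return
        if self.pending_error is not None:
            error, self.pending_error = self.pending_error, None
            results.append((request, f"{self.name}: {error}"))
            return
        if self.downstream is None:
            results.append((request, "ok"))   # reached the bottom of the chain
        else:
            self.downstream.submit(request, results)


# Scenario: change the drive while a write is parked at the port.
drive = SimObject("drive")
enclosure = SimObject("enclosure", drive)
port = SimObject("port", enclosure)
results = []
port.hold()
port.submit("write LBA 8", results)
drive.inject_error("media error")
port.release(results)
port.submit("write LBA 9", results)
print(results)
# [('write LBA 8', 'drive: media error'), ('write LBA 9', 'ok')]
```

Holding the port is what turns an ordinary injected error into a timing scenario: the fault lands on the drive while the write is already in flight, an ordering that is hard to force on real hardware.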
With the MCx release, a new logical package has been introduced (see the diagram below) that exposes the same error-insertion API. This allows the MCR (multi-core RAID) driver inside VNX to exercise a wide range of error-recovery code paths.
Top-to-Bottom Unit Test Framework
The physical and logical packages represent an architectural modeling approach known as FBE (Flare back-end). Most of this model exists at the MCR layered driver and below:

In addition to FBE error insertion, each layered driver has its own unit test framework. For example, when the cache (MCC) and RAID layers were separated from each other (they used to live together in a single layered driver), a set of cache driver unit tests was written to qualify specific areas of cache functionality, including prefetching, mirroring to the peer SP, de-staging, etc.
The same can be said of the flash tier (MCF) and RAID tier (MCR).
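As an illustration of what a layered-driver unit test can look like, the sketch below models a write cache that mirrors each page to the peer SP before acknowledging the host, then checks that behavior and de-staging with Python's `unittest`. `WriteCache` and its dictionaries are hypothetical stand-ins, not the MCC driver's interfaces.

```python
import unittest


class WriteCache:
    """Hypothetical write cache that mirrors to the peer SP before acking."""

    def __init__(self, backend):
        self.backend = backend  # dict standing in for the RAID layer below
        self.local = {}         # dirty pages held on this SP
        self.peer = {}          # mirrored copies held on the peer SP

    def write(self, lba, data):
        """Cache a write, mirror it to the peer, then acknowledge the host."""
        self.local[lba] = data
        self.peer[lba] = data
        return "ack"

    def destage(self):
        """Flush dirty pages to the backend, then drop both cached copies."""
        for lba, data in sorted(self.local.items()):
            self.backend[lba] = data
        self.local.clear()
        self.peer.clear()


class WriteCacheTest(unittest.TestCase):
    def test_write_is_mirrored_when_acknowledged(self):
        cache = WriteCache(backend={})
        self.assertEqual(cache.write(7, b"x"), "ack")
        self.assertEqual(cache.peer, {7: b"x"})

    def test_destage_flushes_and_cleans_both_sps(self):
        backend = {}
        cache = WriteCache(backend)
        cache.write(7, b"x")
        cache.write(3, b"y")
        cache.destage()
        self.assertEqual(backend, {3: b"y", 7: b"x"})
        self.assertEqual(cache.local, {})
        self.assertEqual(cache.peer, {})
```

Saved to a file, the tests run with `python -m unittest <file>.py`. A real cache driver would of course have to prove the mirror completed before acking; the dictionary assignment here only stands in for that step.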
Putting it all together, a single internal test harness can perform error insertion and validation across the FBE, MCC, MCF, and MCR layers. Together, these layers make up the new “MCx engine” that we’ve been talking about.
Multi-core Workstations, Multi-core VNX
The final piece of the quality puzzle is that each test engineer fully simulates the VNX on a multi-core workstation. Hundreds upon hundreds of internal unit tests make up the test framework. Running the full VNX internal test suite takes over four hours, with seven instances running in parallel. Given that the key approach for MCx is core affinity, many of the unit tests try to rupture this affinity.
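What "rupturing" affinity means is easier to see in miniature. The sketch below assumes a hypothetical dispatcher in which every LBA is owned by exactly one core (the LBA-modulo-core-count rule is an assumption, not the MCx policy). Its tests submit I/O from the wrong core on purpose and then verify that the work still lands only on the owning core's queue.

```python
import unittest


class CoreAffineDispatcher:
    """Hypothetical model: every LBA is owned by exactly one core."""

    def __init__(self, num_cores: int):
        self.num_cores = num_cores
        self.queues = [[] for _ in range(num_cores)]  # per-core work queues
        self.handoffs = 0                               # cross-core redirects

    def owner(self, lba: int) -> int:
        """Assumed static ownership rule: LBA modulo the core count."""
        return lba % self.num_cores

    def submit(self, lba: int, arriving_core: int) -> int:
        """Queue the I/O on its owning core, counting any cross-core handoff."""
        core = self.owner(lba)
        if core != arriving_core:
            self.handoffs += 1
        self.queues[core].append(lba)
        return core


class AffinityTest(unittest.TestCase):
    def test_io_on_wrong_core_is_handed_to_owner(self):
        d = CoreAffineDispatcher(num_cores=4)
        self.assertEqual(d.submit(lba=6, arriving_core=0), 2)  # 6 % 4 == 2
        self.assertEqual(d.handoffs, 1)
        self.assertEqual(d.queues[0], [])
        self.assertEqual(d.queues[2], [6])

    def test_each_core_only_sees_its_own_lbas(self):
        d = CoreAffineDispatcher(num_cores=4)
        for lba in range(64):
            d.submit(lba, arriving_core=lba % 3)  # arrival core often wrong
        for core, queue in enumerate(d.queues):
            self.assertTrue(all(d.owner(lba) == core for lba in queue))
```

The point of tests like these is the second one: no matter where a request arrives, no core should ever end up working on data it does not own.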
Once all the tests have passed, the actual customer build can be deployed onto the hardware, and two test frameworks hammer on MCx: the DAQ and the internal unit tests.
This post has focused primarily on the new frameworks that test the MCx components. Of course, there are hundreds, if not thousands, of other tests that stress and qualify the existing capabilities supported by VNX.
So enjoy the engine swap. Drive it as fast as you want. It’s safe.
But the speed will take a while to get used to.
Steve
EMC Fellow



