CLARiiON Parity Shedding

You never know where inspiration might strike.

Before the release of the first CLARiiON, the VP of Engineering held an impromptu data integrity meeting. We had put a lot of effort into rock-solid data integrity for our implementation of RAID5, but we had one sticky problem that we couldn’t figure out. So we sat around for an hour scratching our heads and came up with nothing. Meeting over.

I started my ten-minute drive home. My boss started his 90-minute drive home.

About halfway home he nearly drove off the road due to divine software inspiration.

Before describing the solution, some groundwork needs to be laid.

What Was the Problem?

The problem was “RAID5 interrupted writes”.  Writing customer data to a RAID5 group resulted in two separate operations: (1) a write of the customer data, and (2) a write of the parity information. If a power or system failure hit at the wrong time, then parity would need to be fixed.

I blogged about this recently.  CLARiiON NVRAM can accelerate the repair of parity in this situation. It does this by pointing directly to the parity data that might be out of sync.  The data on every disk drive in the affected stripe is read and checked against the parity information; if the parity is wrong, it gets fixed.
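To make the interrupted write concrete, here is a minimal Python sketch of a stripe with three data blocks and one parity block. The stripe layout, the block contents, and the helper names (xor_blocks, parity_is_consistent) are illustrative assumptions, not CLARiiON code. It only shows how a data write without its matching parity write leaves parity stale, and how parity can be recomputed as long as every data drive is readable.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def parity_is_consistent(data, parity) -> bool:
    """Parity is consistent when it equals the XOR of all data blocks."""
    return xor_blocks(*data) == parity

# Hypothetical stripe: three data drives plus one parity drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(*data)

# Operation (1): the customer data reaches the disk...
data[0] = b"ZZZZ"
# ...and the power fails before operation (2), the parity write.
parity_was_stale = not parity_is_consistent(data, parity)

# Repair with all drives present: read every data block, recompute parity.
parity = xor_blocks(*data)
```

The repair step is the part that needs every data drive, which is why a missing drive changes the picture.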

But what if one of the disk drives had failed?  Then parity couldn’t be fixed, and data might be lost. This scenario was the subject of our impromptu data integrity meeting.

Isn’t That a Double Failure?

One of the tempting cop-outs when bumping into particularly gnarly RAID5 data integrity problems is “Hey, it’s a double failure”.  For example, if two disks in a RAID5 configuration simultaneously get pulled out of the system, well, that’s a double failure. If the customer tries to read data that resides on one of those two disks, the RAID5 algorithms can’t retrieve the data.

So if a system suffers a drive failure, and then a power failure, is it reasonable to just say “Hey, it’s a double failure”?  In this case we felt the future of our product was at stake, because this particular double failure was highly likely.

Single Drive Failures

One of the promises of RAID technology was the ability to continue reading and writing customer data after a single disk drive failure.  Reading data located on a missing disk drive meant reconstructing the data with the help of the parity information.  Writing new data that mapped to a missing disk drive could be accomplished by reading adjacent disk drives, combining the adjacent data with the new data, and writing new parity. So in essence a disk drive could fail and the system still appeared healthy to the customer.
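Both degraded-mode operations can be sketched with plain XOR. This is a textbook illustration under assumed names (degraded_read, degraded_write_to_failed) and a made-up three-drive stripe, not the CLARiiON implementation.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def surviving_blocks(data, failed):
    """Blocks from the adjacent, healthy drives only."""
    return [blk for i, blk in enumerate(data) if i != failed]

def degraded_read(data, parity, failed) -> bytes:
    """Reconstruct the missing drive's block from parity plus the survivors."""
    return xor_blocks(parity, *surviving_blocks(data, failed))

def degraded_write_to_failed(data, new_block, failed) -> bytes:
    """Return new parity that encodes new_block for the missing drive."""
    return xor_blocks(new_block, *surviving_blocks(data, failed))

# Hypothetical stripe; drive 1 is the failed one and is never touched below.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(*data)
failed = 1

recovered = degraded_read(data, parity, failed)
parity = degraded_write_to_failed(data, b"XXXX", failed)
reread = degraded_read(data, parity, failed)
```

After the degraded write, the only record of the new block is its contribution to parity, so a later degraded read recovers it the same way.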

The problem we were concerned about, however, would occur when writing new data to adjacent and healthy disks. Using the traditional RAID5 write algorithms would result in a read from a healthy disk drive followed by a write to that healthy disk. So far, so good, right? Time to write new parity information. Even though a disk drive is missing, the system is still able to perform RAID5 operations on the remaining healthy drives.

Until the lights go out. And parity never gets updated.

You know that data you had on your failed drive?  Hope you had backups.
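Here is a small sketch of that failure, again with an assumed three-drive stripe and hypothetical helper names. The failed drive is modeled as None, and the interrupted write is modeled by updating a healthy data block while leaving parity untouched.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct_missing(data, parity) -> bytes:
    """Rebuild the failed drive's block from parity and the healthy blocks."""
    survivors = [blk for blk in data if blk is not None]
    return xor_blocks(parity, *survivors)

# Parity was computed while all three drives were healthy.
parity = xor_blocks(b"AAAA", b"BBBB", b"CCCC")

# Drive 1 has since failed; its b"BBBB" now exists only implicitly in parity.
data = [b"AAAA", None, b"CCCC"]
before = reconstruct_missing(data, parity)

# A write to healthy drive 0 completes...
data[0] = b"ZZZZ"
# ...and the lights go out before parity is updated.
after = reconstruct_missing(data, parity)

lost = after != b"BBBB"
```

Nothing was ever written to the failed drive's address range, yet its data can no longer be reconstructed.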

Is This Likely?

When a disk fails in a CLARiiON, there’s no telling when a customer might replace it. If they configured hot spares, it’ll happen immediately. If they configured Navisphere for email alerts, they’ll find out soon. If they walk by the system in the lab, they’ll see the fault light. But for any number of reasons, that disk might not be repaired for quite a while. It could be days, weeks, or months.

Which means the next power failure could potentially result in data loss (if it happens at just the wrong time).

Which brings us back to our data integrity meeting.  RAID was a new technology. If we wanted customers to adopt our technology, we had to close this hole. The scenario could easily happen. And when it did, the “Hey, but it’s a double failure!” cop-out would likely result in a product with a reputation for data integrity issues.

Eureka

So what thought nearly caused my boss to drive off the road? Well, the sole purpose for updating parity in RAID5 was to survive a single drive failure. Which caused him to ask one simple question.

Why continue to update parity if a disk has failed?

It’s a good question. Turns out that it serves no purpose other than to put the customer’s data at risk!

So we made a decision to change the CLARiiON mode of operation when a disk failed. Instead of updating parity, we replaced parity.  Replaced it with what?  With the data from the failed disk.  We called it “parity shedding”. You know those extra 8 bytes at the end of a CLARiiON sector?  We flipped a bit indicating that “this ain’t parity no more”. Once the “at-risk” data had safely been stored on the parity disk, the customer data was written directly to disk.

No need to do the traditional RAID5 write operation.  Power or system failures can occur at any time, and no data is lost.  Great idea.
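A rough sketch of the degraded write path with parity shedding might look like the following. The Sector class, the is_shed flag (standing in for the bit in the extra 8 bytes), and the function names are all assumptions made for illustration. How a write aimed at the failed drive itself is handled after shedding is also my assumption: here it simply replaces the shed data.

```python
from dataclasses import dataclass
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

@dataclass
class Sector:
    payload: bytes
    is_shed: bool = False  # hypothetical stand-in for "this ain't parity no more"

def shed_parity(data, parity_sector: Sector) -> None:
    """Replace parity with the failed drive's reconstructed data, once."""
    if parity_sector.is_shed:
        return
    survivors = [blk for blk in data if blk is not None]
    parity_sector.payload = xor_blocks(parity_sector.payload, *survivors)
    parity_sector.is_shed = True

def degraded_write(data, parity_sector: Sector, index: int, new_block: bytes) -> None:
    # Step 1: store the at-risk data safely in the parity location.
    shed_parity(data, parity_sector)
    # Step 2: write the customer data directly; there is no parity to update.
    if data[index] is None:
        parity_sector.payload = new_block  # assumed: target is the failed drive
    else:
        data[index] = new_block

# Drive 1 has failed; parity still reflects its old contents, b"BBBB".
data = [b"AAAA", None, b"CCCC"]
parity_sector = Sector(xor_blocks(b"AAAA", b"BBBB", b"CCCC"))
degraded_write(data, parity_sector, 0, b"ZZZZ")
```

Because the failed drive's data is stored verbatim before any healthy block changes, a power failure between the two steps leaves either valid parity or valid shed data, never a stale mix.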

Now we just had to build it.

Build and Test

I’ve outlined how write operations to a failed drive work (data from the failed drive gets reconstructed and written to the parity location).  How about read operations to a failed drive?  Traditional RAID5 algorithms always rely on using parity information from the parity drive to reconstruct the data.  Parity shedding read algorithms still use this traditional technique. When reading the parity, however, the parity shedding algorithms might discover “shed data”. Reconstruction of the data halts; the proper data is returned to the customer.
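The read path can be sketched the same way. As before, Sector, is_shed, and read_failed_drive are hypothetical names, and the flag stands in for the bit kept in the sector's extra bytes.

```python
from dataclasses import dataclass
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

@dataclass
class Sector:
    payload: bytes
    is_shed: bool = False  # hypothetical marker for shed data

def read_failed_drive(data, parity_sector: Sector) -> bytes:
    """Return the failed drive's block for one stripe."""
    # The parity location is read first, exactly as in a traditional rebuild.
    if parity_sector.is_shed:
        # Shed data found: reconstruction halts, the payload is the answer.
        return parity_sector.payload
    # Real parity found: reconstruct from parity plus the healthy drives.
    survivors = [blk for blk in data if blk is not None]
    return xor_blocks(parity_sector.payload, *survivors)

data = [b"AAAA", None, b"CCCC"]
from_parity = read_failed_drive(data, Sector(xor_blocks(b"AAAA", b"BBBB", b"CCCC")))
from_shed = read_failed_drive(data, Sector(b"BBBB", is_shed=True))
```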

Rebuilding a disk in parity shedding mode uses a similar algorithm; parity or shed data could be discovered, and the algorithms rebuild the disk appropriately.
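A per-stripe rebuild could be sketched as below. The ordering in the shed case (copy the shed data to the replacement drive first, then recompute real parity and clear the flag) is my assumption about what "appropriately" means, and all names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

@dataclass
class Sector:
    payload: bytes
    is_shed: bool = False  # hypothetical marker for shed data

def rebuild_stripe(data, parity_sector: Sector, failed: int) -> None:
    """Restore one stripe onto the replacement drive at index `failed`."""
    if parity_sector.is_shed:
        # Shed data: copy it back, then regenerate real parity.
        data[failed] = parity_sector.payload
        parity_sector.payload = xor_blocks(*data)
        parity_sector.is_shed = False
    else:
        # Real parity: traditional reconstruction from the healthy drives.
        survivors = [blk for i, blk in enumerate(data) if i != failed]
        data[failed] = xor_blocks(parity_sector.payload, *survivors)

# One stripe that was shed while degraded, one that never was.
shed_data, shed_sector = [b"ZZZZ", None, b"CCCC"], Sector(b"BBBB", is_shed=True)
plain_data = [b"AAAA", None, b"CCCC"]
plain_sector = Sector(xor_blocks(b"AAAA", b"BBBB", b"CCCC"))

rebuild_stripe(shed_data, shed_sector, 1)
rebuild_stripe(plain_data, plain_sector, 1)
```

Either way, the stripe ends up with the failed drive's data restored and real parity in the parity location.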

All of the code was written as part of a state machine, and the disk array qualifier was updated to test all possible boundary and edge conditions that parity shedding introduced.

Sales Call

In the 90s I went on quite a few sales calls to customers who were evaluating different RAID vendors. I stepped through several power failure conditions, explained 520-byte sectors, explained parity shedding, and then asked the customer to step competitors through the same scenarios.

They usually ended up buying CLARiiON.

I’d recommend customers ask the same questions today. When considering a RAID implementation, don’t assume that RAID is “commodity”, and that data integrity is implemented with equal care by all RAID vendors.

Parity shedding is a great example of the lengths that CLARiiON went to in order to protect customers’ data, and one of the big reasons that the CLARiiON brand means “data integrity”.

Steve