[OpenIndiana-discuss] vdev reliability was: Recommendations for fast storage

Edward Ned Harvey (openindiana) openindiana at nedharvey.com
Thu Apr 18 13:57:12 UTC 2013


> From: Jim Klimov [mailto:jimklimov at cos.ru]
> 
> Well, thanks to checksums we can know which variant of userdata
> is correct, and thanks to parities we can verify which bytes are
> wrong in a particular block. If there's relatively few such bytes,
> it is theoretically possible to brute-force match values into the
> "wrong" bytes and recalculate checksums. So if a "broken" range
> is on the order of 30-40 bytes (which someone said is typical
> for a CRC error and HDD returning uncertain data) you have a
> chance of recovering the block in a few days if lucky ;)
> 
> This is a very compute-intensive task; I proposed this idea half
> a year ago on the zfs list (I had unrecoverable errors on raidz2
> made of 4 data disks and 2 parity disks, meaning corruptions on
> 3 or more drives, but not necessarily whole-sector corruptions)
> and tried to take known byte values from different components at
> known "bad" byte offsets and put them into the puzzle. Complexity
> (size of recursive iteration) grows very quickly even if we only
> have about 5 values to match (unlike 256 in full recovery above),
> and we estimated that for a 4096 byte block it would take Earth's
> compute resources longer than the lifetime of the universe to do
> the full search and recovery. So such approach is really limited
> to just a few dozen broken bytes. But it is possible :)

I think you're misplacing a decimal, confusing bits for bytes, or mixing up exponents, because you're way off.

With merely 70 unknown *bits*, that is, less than 10 bytes, you'll need a 3-letter government agency devoting all its computational resources to the problem for a few years.
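To put rough numbers on that, here is a minimal sketch of the arithmetic. The trial rates are purely illustrative assumptions (nobody in this thread measured them), and `brute_force_years` is a hypothetical helper name; one "trial" means filling in the unknown bits and recomputing the block's cksum.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def brute_force_years(unknown_bits: int, trials_per_second: float) -> float:
    """Years needed to try every possible value of `unknown_bits` unknown bits."""
    return 2 ** unknown_bits / trials_per_second / SECONDS_PER_YEAR

# Assumed trial rates, for illustration only.
for rate in (1e9, 1e12, 1e15):
    years = brute_force_years(70, rate)
    print(f"{rate:.0e} trials/s -> {years:,.2f} years for 70 unknown bits")
```

At an assumed 10^9 trials per second, 70 unknown bits come out to roughly 37,000 years, and every additional unknown bit doubles the figure.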

Furthermore, when you find a matching cksum, you haven't necessarily found the correct data yet.  You'll need to exhaustively search the entire space, which takes 2^70 operations, find all the matches (there will be a lot), and from those matches choose the one you think is right.

Even with merely 70 unknown bits and a 32-bit cksum (the default in zfs fletcher-4), you will have about 2^38 (that is, roughly 275 billion) results that produce the right cksum.  You'll have to rely on your knowledge of the jpg file or txt file or whatever to choose which one of those cksum-passing results is *actually* the right result.
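The counting argument can be checked on a toy scale. The sketch below is not zfs code: it uses SHA-256 truncated to a few bits as a stand-in for a short cksum (an assumption, chosen only because it is in the Python standard library), and the block contents, offset, and function names are hypothetical. It takes the 32-bit cksum width stated above as given; with a wider cksum the expected number of false matches shrinks accordingly.

```python
import hashlib

def toy_checksum(data: bytes, bits: int) -> int:
    """Truncated SHA-256, standing in for a short checksum (not fletcher-4)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest & ((1 << bits) - 1)

def count_matches(block: bytes, offset: int, unknown_bytes: int, bits: int) -> int:
    """Try every value for the unknown bytes; count candidates whose checksum matches."""
    target = toy_checksum(block, bits)
    matches = 0
    for value in range(256 ** unknown_bytes):
        guess = value.to_bytes(unknown_bytes, "big")
        candidate = block[:offset] + guess + block[offset + unknown_bytes:]
        if toy_checksum(candidate, bits) == target:
            matches += 1
    return matches

def expected_matches(unknown_bits: int, checksum_bits: int) -> float:
    """Rough expected number of candidates passing a well-mixed checksum."""
    return 2.0 ** (unknown_bits - checksum_bits)

block = b"example block of user data"
found = count_matches(block, offset=3, unknown_bytes=2, bits=8)
print(found, "candidates pass; expected about", expected_matches(16, 8))
print("70 unknown bits, 32-bit cksum:", expected_matches(70, 32), "candidates")
```

With 16 unknown bits and an 8-bit checksum, a couple of hundred candidates pass, and only one of them is the original data. The same ratio at 70 unknown bits and 32 checksum bits gives 2^38 passing candidates.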



