[OpenIndiana-discuss] vdev reliability was: Recommendations for fast storage

Jim Klimov jimklimov at cos.ru
Thu Apr 18 12:46:18 UTC 2013


On 2013-04-18 12:46, Sebastian Gabler wrote:
> I do not think that zfs will have better resilience against rot of
> parity data than conventional RAID. At best, block level checksums can
> help raise an error, so you know at least that something went wrong. But
> recovery of the data will probably not be possible. So, in my opinion
> BER is an issue under ZFS as anywhere else.

Well, thanks to checksums we can know which variant of userdata
is correct, and thanks to parities we can verify which bytes are
wrong in a particular block. If there are relatively few such bytes,
it is theoretically possible to brute-force candidate values into
the "wrong" bytes and recalculate the checksum until it matches.
So if a "broken" range is on the order of 30-40 bytes (which someone
said is typical for a CRC error with the HDD returning uncertain
data), you have a chance of recovering the block in a few days if
lucky ;)
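
To make the brute-force idea concrete, here is a toy sketch in Python.
It is only an illustration under my own assumptions, not anything ZFS
itself does: SHA-256 stands in for the block checksum, the suspect
byte offsets are assumed to be already known, and the function name
brute_force_repair is made up for this example.

```python
import hashlib
from itertools import product


def brute_force_repair(block, bad_offsets, expected_digest):
    """Try every byte value at each suspect offset until the checksum matches.

    Hypothetical helper: the cost is 256 ** len(bad_offsets) hash
    computations in the worst case, so it is only usable for a
    handful of suspect bytes.
    """
    trial = bytearray(block)
    for values in product(range(256), repeat=len(bad_offsets)):
        for offset, value in zip(bad_offsets, values):
            trial[offset] = value
        if hashlib.sha256(trial).digest() == expected_digest:
            return bytes(trial)
    return None  # no combination reproduced the stored checksum


# Demo: damage two bytes of a known-good block, then search for them.
good = bytes(range(64))
digest = hashlib.sha256(good).digest()  # the checksum we trust

damaged = bytearray(good)
damaged[10] ^= 0xFF
damaged[41] ^= 0x5A

repaired = brute_force_repair(bytes(damaged), [10, 41], digest)
print(repaired == good)
```

With two suspect bytes this is 65536 hashes at most; every extra
suspect byte multiplies the work by 256.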

This is a very compute-intensive task. I proposed this idea half
a year ago on the zfs list: I had unrecoverable errors on a raidz2
made of 4 data disks and 2 parity disks, meaning corruption on
3 or more drives, though not necessarily whole-sector corruption.
I tried to take the known byte values from the different components
at the known "bad" byte offsets and fit them into the puzzle. The
complexity (size of the recursive iteration) grows very quickly even
if we only have about 5 values to try per byte (unlike the 256 in
the full recovery above), and we estimated that for a 4096-byte
block it would take Earth's compute resources longer than the
lifetime of the universe to do the full search and recovery. So such
an approach is really limited to just a few dozen broken bytes. But
it is possible :)
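
A rough Python sketch of the restricted search, and of how fast the
search space grows. Again these are my own assumptions for the sake
of illustration: SHA-256 stands in for the block checksum, and the
names search_space and repair_from_candidates are hypothetical.

```python
import hashlib
from itertools import product


def search_space(values_per_byte, bad_bytes):
    """Number of combinations to try in the worst case."""
    return values_per_byte ** bad_bytes


def repair_from_candidates(block, candidates, expected_digest):
    """candidates maps a bad offset to the byte values seen at that
    offset on the different components; try every combination."""
    offsets = sorted(candidates)
    trial = bytearray(block)
    for values in product(*(candidates[o] for o in offsets)):
        for offset, value in zip(offsets, values):
            trial[offset] = value
        if hashlib.sha256(trial).digest() == expected_digest:
            return bytes(trial)
    return None


# Demo: three bad offsets with five candidate values each.
good = bytes(range(64))
digest = hashlib.sha256(good).digest()
damaged = bytearray(good)
for off in (5, 20, 33):
    damaged[off] = 0xEE

candidates = {
    5: [0xEE, 0x00, 5, 0x7F, 0xFF],
    20: [0xEE, 20, 0x01, 0x02, 0x03],
    33: [0x10, 0x20, 0xEE, 0x30, 33],
}
fixed = repair_from_candidates(bytes(damaged), candidates, digest)
print(fixed == good, search_space(5, 3))

# Growth of the search space with the number of bad bytes.
for n in (3, 30, 4096):
    print(n, "bad bytes:", len(str(search_space(5, n))), "decimal digits")
```

Three bad bytes is 125 combinations. At 4096 bad bytes the count is
a number with thousands of digits, which is why the full-block
search is hopeless.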

//Jim




