[OpenIndiana-discuss] unrecoverable corruptions on raidz2 WAS hardware specs for illumos storage/virtualization server

Jim Klimov jimklimov at cos.ru
Sun Nov 18 11:13:23 UTC 2012


First things first, sorry for going off-topic from the OP's question :)
I renamed the thread...

On 2012-11-18 08:45, Jason Matthews wrote:
> On Nov 17, 2012, at 1:06 PM, Jim Klimov <jimklimov at cos.ru> wrote:
>> My box did have uncorrected raidz2 errors and couldn't even point
>> to a failed drive - it's as if they all scratched the same region
>> at once...
>> which may have been caused by power jerks or something,
>> with all drives accessing same areas in unison.
>
> western digital drives on an adaptec controller by chance?

No, my story can't confirm yours - my home NAS used the then-recent
2TB Samsungs on the onboard SATA of an Asus P5B-Deluxe motherboard.
The box kept running while I was (and still am) out of the country,
so anything could have happened on the hardware side and gone
unnoticed.

On the hardware side:

It has been off for the past several months, because we did manage
to confirm over the phone that the CPU water-cooler had dried out,
causing overheating and quick hangs of the box, and I can't get a
friend blessed with golden engineering hands to come by my house
at a time when someone is there to refill it ;)

So my pool's problems could indeed be due to CPU overheating and
some momentary craziness, or to surges (or blackouts) from the power
supply (I'm not sure it is as good as it was when it was younger -
back then it wasn't close to overload, but it could have gotten
dustier and so on since then).

On the ZFS side:

My attempts at manual inspection of these blocks were incomplete,
but it seems that about 4KB worth of data went bad in some 128KB
logical ZFS blocks, leading to checksum mismatches. These files had
been fine for months (including weekly scrubs) before that, so
something happened to the on-disk data in such a way that raidz2
over 6 disks could not recover it.
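
Just to put numbers on that guess (back-of-the-envelope arithmetic
only, not the real raidz allocator, which works in sectors and adds
column skew): a 128KB block on a 6-wide raidz2 spreads over 4 data
columns plus 2 parity columns, roughly 32KB per data disk, and
raidz2 can only repair up to 2 damaged columns per stripe:

# Simplified, hypothetical arithmetic for a 128KB block on 6-disk raidz2.
# The real raidz layout differs (sector-sized columns, skew, etc.).
BLOCK_SIZE   = 128 * 1024
DISKS        = 6
PARITY       = 2                      # raidz2
DATA_COLUMNS = DISKS - PARITY

print(BLOCK_SIZE // DATA_COLUMNS // 1024, "KB of data per data column")

# raidz2 tolerates at most PARITY bad columns in a stripe;
# matching damage on 3+ disks at the same offsets is unrecoverable.
damaged_columns = 3
print("recoverable" if damaged_columns <= PARITY else "unrecoverable")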

I can only guess that identical offsets on each drive were scratched
or otherwise damaged by a disk head during some bad power event -
which is plausible, since the disks in a raidz access the same
offsets most of the time and their heads are likely to be flying
over the same positions.

On another side note: it might be a useful tunable precaution to
have ZFS access the individual disks in a TLVDEV with a small delay
between each other, trading latency for safety against this kind of
failure. The illumos-gate folks might reasonably turn the idea down,
because a well-built machine of the target market shouldn't suffer
this problem, and the added code complexity and artificially slower
IOs would likely be seen as a drawback. Also, command queuing in the
drive firmware might render the precaution worthless. Then again,
developer enthusiasts are free to try implementing this failsafe IO
mode for the lesser boxes out there, and I wouldn't mind having it
enabled on mine, at least for added peace of mind. The box is slow
enough already with random IOs (5-10MB/s was not uncommon), so I
won't be bothered by even more lag as long as it keeps the data
safe. I think I'll post an RFE lest the idea be lost.
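
To make the idea a bit more concrete, here is a rough userland
sketch of what I mean by staggered access (purely hypothetical code,
nothing like the real zio/vdev_raidz pipeline in illumos-gate):
issue the per-disk IOs of a stripe with a small configurable delay
between them, so all the heads are not over the same offset at the
same instant:

import threading
import time

# Hypothetical knob: pause between issuing IO to each child of a TLVDEV.
STAGGER_DELAY_SEC = 0.005

def write_column(disk_id, offset, payload):
    # Stand-in for the real per-disk write; here we only log it.
    print("disk %d: writing %d bytes at offset %d"
          % (disk_id, len(payload), offset))

def staggered_stripe_write(columns, offset):
    """Issue one stripe's column writes with a delay between disks."""
    threads = []
    for disk_id, payload in enumerate(columns):
        t = threading.Thread(target=write_column,
                             args=(disk_id, offset, payload))
        t.start()
        threads.append(t)
        time.sleep(STAGGER_DELAY_SEC)    # the proposed safety delay
    for t in threads:
        t.join()

# Example: a 6-column stripe (4 data + 2 parity), 4KB per column.
staggered_stripe_write([bytes(4096)] * 6, offset=1 << 20)

In the real pipeline such a delay would presumably have to sit in
the vdev child-IO issue path and be exposed as a tunable, which is
exactly the extra complexity I expect would be objected to.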

I am also not sure whether ZFS did indeed try its best to recover
the blocks by trying all of the possible myriads of sector
permutations (i.e. in case different offsets were broken on the
component disks) rather than just a few trivial combinations, but a
recent reply on the zfs-discuss list stated that ZFS does try to
find the correct solution by trying everything it can. If the box
comes back up, I still hope to permute the block from raw sectors
manually, if I can get ZDB to extract them :)
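
In case it helps anyone picture what I mean by "trying the
permutations", here is a toy sketch of the combinatorial approach
(my own simplified illustration, not ZFS code): assume some column
is bad, rebuild it from the remaining columns and parity, and accept
the combination whose reassembled block matches the known checksum.
I only do the XOR (single-parity) case below; real raidz2 carries a
second Reed-Solomon-style parity as well, so it can rebuild up to
two assumed-bad columns per stripe:

import hashlib
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def checksum(block):
    # Stand-in for the block checksum stored in the block pointer.
    return hashlib.sha256(block).digest()

def try_reconstruct(data_cols, parity, expected_sum):
    """Assume each data column bad in turn, rebuild it from XOR parity,
    and keep the combination whose result matches the checksum."""
    for bad in range(len(data_cols)):
        others = [c for i, c in enumerate(data_cols) if i != bad]
        rebuilt = reduce(xor, others, parity)       # P = d0^d1^...^dn
        candidate = data_cols[:bad] + [rebuilt] + data_cols[bad + 1:]
        if checksum(b"".join(candidate)) == expected_sum:
            return bad, candidate                   # consistent fix found
    return None, None

# Tiny demo: 4 data columns + XOR parity; column 2 silently corrupted.
cols = [bytes([i]) * 4096 for i in range(4)]
parity = reduce(xor, cols)
good_sum = checksum(b"".join(cols))
cols[2] = b"\xff" * 4096                            # simulate the damage

bad, fixed = try_reconstruct(cols, parity, good_sum)
print("reconstructed column:", bad)

The manual exercise I have in mind is essentially this loop, fed
with the raw sectors ZDB can dump, only with more combinations to
walk through.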

Thanks for listening,
//Jim Klimov


