[OpenIndiana-discuss] System disk corruption

Mon Feb 20 16:54:39 UTC 2012

On 2012-02-20 17:05, Richard Elling wrote:
> On Feb 20, 2012, at 6:38 AM, Robin Axelsson wrote:
>> Maybe the iostat "behavior" depends on the controller it monitors. Some controllers such as the AMD SB950 in my case may not be as transparent with errors as the LSI 1068e operating in IT mode.
>>
>> Still, I find this to be too much of a coincidence. It is evident that ZFS is not very good to use without disk redundancy.
> Eh? Other file systems will blissfully deliver corrupted data. Silent data
> corruption is a much worse fate!
>
>> I'll try to add a mirror to the system pools as soon as possible. It would be great if there were some kind of software that could be set up to generate .par2 files (with x% data redundancy) on-the-fly to protect files on hard drives without disk redundancy (RAID=0).
> Not needed. ZFS has a copies parameter where you can set the number of
> redundant copies on a per-dataset basis. For example, you can set copies=2
> for important data, and copies=1 (the default) for data stored on other media
> (eg .iso files)
>
> OTOH, par2 is a completely different architecture that is designed for transferring
> files reliably. par2 is not well suited for direct access to data.
>
>> I couldn't recover the image file with cp but I learned in the process that it is possible with dd. 'dd if=infile of=outfile conv=noerror,sync' could do it.
> Correct, cp will exit on a failed read.
That is all fine, but I had expected cp to have some kind of 
force/recover/salvage option for recovering corrupted files.
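
For reference, a minimal sketch of the kind of salvage behaviour I mean, 
modelled loosely on what dd does with conv=noerror,sync: keep going after 
a failed read and pad the unreadable block with zeros. The read_block 
callback, the block size and the simulated failure are all hypothetical, 
made up only so the example can run without a damaged disk.

```python
from typing import Callable, List, Tuple


def salvage(read_block: Callable[[int], bytes],
            n_blocks: int,
            block_size: int) -> Tuple[bytes, List[int]]:
    """Copy n_blocks blocks, replacing unreadable ones with zeros."""
    out = bytearray()
    bad_blocks = []
    for index in range(n_blocks):
        try:
            chunk = read_block(index)
        except OSError:
            # "noerror": do not abort on a failed read, just remember it.
            chunk = b""
            bad_blocks.append(index)
        # "sync": pad every short or failed block to the full block size.
        out += chunk.ljust(block_size, b"\0")
    return bytes(out), bad_blocks


# Hypothetical source with one unreadable block (block 1).
source = b"aaaabbbbcccc"


def flaky_reader(index: int) -> bytes:
    if index == 1:
        raise OSError("simulated read error")
    return source[index * 4:(index + 1) * 4]


image, bad = salvage(flaky_reader, n_blocks=3, block_size=4)
print(image, bad)
```

Note that padding also applies to a short final block, so the output can 
end up slightly larger than the original file.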
>> Then I discovered ddrescue which did *exactly* what I expected cp to do. I just entered:
>>
>> # ddrescue /path/to/corrupted/file /path/to/recovered/file /path/to/logfile.log
> Good idea.
>
>> All paths were even in the same vdev. In the process the vdev became 'DEGRADED' even though no additional corruption occurred. So I ran a scrub and then cleared the errors with 'zpool clear'. I also did an fmadm repair to tell fma about it. Perhaps I should fmadm reset zfs-diagnosis and zfs-retire as well.
> Once you've recovered the data, why are you so interested in eliminating the history of
> the corruption?
I'm not, I just want things to return to normal.

>> Neither par2 nor ddrescue is included with OpenIndiana; I downloaded and installed them manually from the opencsw.org repository. I would strongly recommend including such tools with OI.
> par2 seems to have little traction. ddrescue can be useful, but is only applicable in rare cases.
>   -- richard
The copies=n (n > 1) parameter and the so-called ditto blocks seem to be 
an interesting idea. I think I may try that until I get a 
mirror drive.
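
As I understand the idea, each block is stored more than once together 
with a checksum, and a read falls back to another copy when one copy 
fails verification. A toy sketch of that principle follows; the 
DittoStore class, its in-memory layout and the use of SHA-256 are 
assumptions for illustration only and say nothing about how ZFS actually 
lays out ditto blocks on disk.

```python
import hashlib


class DittoStore:
    """Toy block store that keeps several copies of every block."""

    def __init__(self, copies: int = 2):
        self.copies = copies
        self.blocks = {}  # key -> (checksum, list of copies)

    def write(self, key: str, data: bytes) -> None:
        checksum = hashlib.sha256(data).digest()
        replicas = [bytearray(data) for _ in range(self.copies)]
        self.blocks[key] = (checksum, replicas)

    def read(self, key: str) -> bytes:
        checksum, replicas = self.blocks[key]
        # Return the first copy whose checksum still matches.
        for replica in replicas:
            if hashlib.sha256(replica).digest() == checksum:
                return bytes(replica)
        raise OSError("all copies failed the checksum")


store = DittoStore(copies=2)
store.write("blk0", b"important data")

# Simulate corruption of the first copy by flipping bits in one byte.
store.blocks["blk0"][1][0][0] ^= 0xFF

recovered = store.read("blk0")
print(recovered)
```

With only one copy the same corruption would make the read fail, which 
is roughly the situation I ran into.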

I think par2 is kind of useful. Par2 can generate recovery data with any 
user-defined amount of redundancy between 0 and 100%. If one 
assumes that 0.1% of the data written gets corrupted 
(which is really bad), then even 1% redundancy will protect against 
such corruption (provided the par2 data is updated on every write and the 
number of damaged blocks stays below the number of recovery blocks). This 
holds even if the corruption occurs in the par2 data itself.
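
To put rough numbers on this, here is a back-of-the-envelope sketch. It 
assumes the file is split into equally sized blocks that are each 
corrupted independently with the same probability (a simplification, 
since real corruption tends to be clustered), and that a par2 set can be 
repaired as long as the total number of damaged blocks, source or 
recovery, does not exceed the number of recovery blocks. The block 
counts and the function name are made up for illustration.

```python
def recovery_probability(source_blocks: int,
                         redundancy: float,
                         p_block: float) -> float:
    """Probability that at most `recovery_blocks` of all blocks are damaged.

    Sums the binomial distribution term by term, which avoids huge
    binomial coefficients.
    """
    recovery_blocks = round(source_blocks * redundancy)
    total = source_blocks + recovery_blocks
    pmf = (1.0 - p_block) ** total  # P(exactly 0 damaged blocks)
    cumulative = 0.0
    for damaged in range(recovery_blocks + 1):
        cumulative += pmf
        # Step from P(damaged) to P(damaged + 1).
        pmf *= (total - damaged) / (damaged + 1) * p_block / (1.0 - p_block)
    return cumulative


# 0.1% corruption per block, with and without 1% redundancy.
for blocks in (100, 2000):
    for redundancy in (0.0, 0.01):
        prob = recovery_probability(blocks, redundancy, 0.001)
        print(blocks, redundancy, prob)
```

Under these assumptions the margin depends a lot on the block count: 
with few blocks, 1% redundancy amounts to a single recovery block, so 
two damaged blocks are already too many.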

Of course, if an entire drive goes down it won't be sufficient (nor 
would ditto blocks), but it could offer a leaner trade-off between 
redundancy and storage space than ditto blocks do. I guess the price to 
be paid is I/O performance and CPU time.

If I understand it correctly, par2 works on principles similar to 
raidz1/2/3: it uses Reed-Solomon codes to compute its recovery data.
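
The simplest case of that principle is plain XOR parity, which as far as 
I know is what single-parity schemes boil down to: one extra block lets 
you rebuild any one missing block. The sketch below shows only that 
case. Surviving several lost blocks, as par2 and double or triple parity 
do, needs Galois-field arithmetic that is not implemented here, and the 
block contents are made up.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend block 1 is unreadable; rebuild it from the rest plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

The rebuild works because XOR-ing a block with itself cancels it out, so 
everything except the missing block disappears from the result.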

The problem with using par2 at the "file" level is that when an error has 
occurred in a pool, zfs won't be very forthcoming with the affected data 
(reads of it simply fail), even though the error may be fixable with par2.

> --
> DTrace Conference, April 3, 2012, http://wiki.smartos.org/display/DOC/dtrace.conf
> ZFS Performance and Training
> Richard.Elling at RichardElling.com
> +1-760-896-4422