[OpenIndiana-discuss] ECC question

Mike Gerdts mgerdts at gmail.com
Sun Feb 20 15:51:20 UTC 2011


On Sun, Feb 20, 2011 at 9:09 AM, taemun <taemun at gmail.com> wrote:
> Note that ZFS isn't intrinsically any more likely to die from a bad bit in
> RAM than any given file system, but it is going to be able to *tell you when
> it occurs*.

Actually, ZFS is far more likely to have problems due to a bad bit.
Most file systems blindly pass bad data from disk to the application.
ZFS verifies the integrity of what it reads before sending it to the
application.  It does this by comparing the checksum of a block (a
Fletcher checksum by default, or a cryptographic hash such as SHA-256
if configured) with the checksum that was stored on disk when the
block was written.  If it detects corrupt data (via a checksum
mismatch), it looks for other copies of the data and tries to
self-correct.  This is great.
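To make the read path concrete, here is a toy Python simulation of "verify on read, then repair from a good copy".  It is a minimal sketch, not ZFS code: SHA-256 stands in for whatever checksum the pool uses, and MirroredBlock, write_block and read_block are hypothetical names for a block stored on a simple mirror.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> bytes:
    # SHA-256 is a stand-in for the pool's configured checksum (assumption).
    return hashlib.sha256(data).digest()


@dataclass
class MirroredBlock:
    # Hypothetical layout: one bytearray per mirror side, plus the
    # checksum recorded when the block was written.
    copies: list[bytearray]
    stored_checksum: bytes


def write_block(data: bytes, n_copies: int = 2) -> MirroredBlock:
    # The checksum is computed once and stored alongside the copies.
    return MirroredBlock([bytearray(data) for _ in range(n_copies)], checksum(data))


def read_block(blk: MirroredBlock) -> bytes:
    """Return verified data, rewriting any copy that fails verification."""
    good = None
    for copy in blk.copies:
        if checksum(bytes(copy)) == blk.stored_checksum:
            good = bytes(copy)
            break
    if good is None:
        # No copy matches the stored checksum: report an error
        # instead of handing bad data to the application.
        raise OSError("checksum mismatch on every copy: unrecoverable")
    # Self-heal: overwrite any copy that differs from the verified data.
    for i, copy in enumerate(blk.copies):
        if bytes(copy) != good:
            blk.copies[i] = bytearray(good)
    return good


blk = write_block(b"hello, zfs")
blk.copies[0][0] ^= 0x01             # simulate on-disk corruption of one side
print(read_block(blk))               # b'hello, zfs'
print(blk.copies[0] == blk.copies[1])  # True: the bad side was repaired
```

The point of the sketch is the ordering: data reaches the caller only after it matches the stored checksum, which is exactly why a wrong stored checksum turns into an error rather than silent corruption.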

However, suppose that before the data is written, a single-bit error
occurs in the memory holding the checksum, either during or after the
checksum computation.  An invalid checksum is then written along with
the data.  The next time this data and checksum are read from disk
(e.g. after the next reboot), the stored checksum will not match the
checksum computed from the data on disk.  Nor can ZFS correct the
error: every redundant copy of the data is checked against the same
bad checksum, so none of them verifies.  The same problem occurs if
the memory holding file data is corrupted after the checksum is
computed but before the data is sent to disk, because every copy is
written from that one corrupted buffer.
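The failure mode above can be reproduced with a small Python simulation.  This is a hedged sketch under simplifying assumptions, not how ZFS is implemented: SHA-256 stands in for the pool's checksum, every mirror copy is assumed to be written from one in-memory buffer, and flip_bit, write_with_ram_error and verify are hypothetical helpers.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 is a stand-in for the pool's configured checksum (assumption).
    return hashlib.sha256(data).digest()


def flip_bit(buf: bytes, bit: int) -> bytes:
    """Return a copy of buf with one bit inverted (a simulated RAM error)."""
    b = bytearray(buf)
    b[bit // 8] ^= 1 << (bit % 8)
    return bytes(b)


def write_with_ram_error(data: bytes, where: str, n_copies: int = 2):
    """Simulate a write where one bit flips after the checksum is computed.

    where="checksum": the bad bit lands in the checksum buffer.
    where="data":     the bad bit lands in the file data buffer.
    where="none":     no error (control case).
    Returns (on-disk copies, on-disk checksum).
    """
    cksum = checksum(data)  # computed from the correct data
    if where == "checksum":
        cksum = flip_bit(cksum, 0)
    elif where == "data":
        data = flip_bit(data, 0)
    elif where != "none":
        raise ValueError(f"unknown error location: {where}")
    # Assumption: every mirror copy is written from the same buffer,
    # so a flip in that buffer ends up in all copies.
    return [data] * n_copies, cksum


def verify(copies, stored_cksum) -> bool:
    """True if any copy matches the stored checksum, i.e. repair is possible."""
    return any(checksum(c) == stored_cksum for c in copies)


for where in ("checksum", "data", "none"):
    copies, cksum = write_with_ram_error(b"important file contents", where)
    print(where, "recoverable:", verify(copies, cksum))
# checksum recoverable: False
# data recoverable: False
# none recoverable: True
```

Redundancy does not help in either error case because the corruption happened upstream of the point where the copies diverge; only memory that detects or corrects the flip (ECC) catches it.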

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


