[OpenIndiana-discuss] Zfs stability "Scrubs"

Jan Owoc jsowoc at gmail.com
Fri Oct 12 22:06:16 UTC 2012


On Fri, Oct 12, 2012 at 3:07 PM, Michael Stapleton
<michael.stapleton at techsologic.com> wrote:
> It is easy to understand that ZFS scrubs can be useful. But how often
> do we scrub, or do the equivalent, on any other file system? UFS?
> VXFS? NTFS? ...

If your data has checksums, it is "standard practice" to periodically
verify them and correct any errors found. ECC memory also does a
"scrub" every once in a while :-). The filesystems you named don't
have checksums, so a scrub would have nothing to verify against.
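
To make "verify your checksums and correct if necessary" concrete,
here is a minimal sketch of the idea in Python. It is a toy model, not
how ZFS implements a scrub: the scrub() helper, the two-way "mirror"
list and the sample block are all made up for illustration.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest of a block of bytes."""
    return hashlib.sha256(bytes(data)).hexdigest()


def scrub(copies, expected):
    """Toy scrub (hypothetical helper, not a ZFS API).

    Checks every copy of a block against the stored checksum and
    overwrites bad copies from the first good one. Returns the number
    of copies repaired; raises if no copy matches the checksum.
    """
    good = next((bytes(c) for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("no intact copy left; block is unrecoverable")
    repaired = 0
    for c in copies:
        if checksum(c) != expected:
            c[:] = good          # rewrite the damaged copy in place
            repaired += 1
    return repaired


block = b"important data"
stored_sum = checksum(block)                   # kept alongside the block
mirror = [bytearray(block), bytearray(block)]  # two copies, as in a mirror
mirror[1][0] ^= 0x01                           # simulate silent bit rot

repaired = scrub(mirror, stored_sum)
print(repaired, bytes(mirror[1]) == block)     # prints: 1 True
```

Without the stored checksum there is no way to tell which of the two
copies is the damaged one, which is the point about the other
filesystems.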


> For example, data deduplication uses digests on data to detect
> duplication. Most dedup systems assume that if the digest is the same
> for two pieces of data, then the data must be the same.
> This assumption is not actually true. Two differing pieces of data can
> have the same digest, but the chance of this happening is so low that
> the risk is accepted.

"So low" is an understatement. Have you ever taken 2 to the power of
256? (ZFS currently requires sha256 checksums if you want to do
dedup.) The chance of a block being different but having the same
sha256 is 1 in 2^256, i.e. 1 in
115792089237316195423570985008687907853269984665640564039457584007913129639936.

Just for fun, let's see what those odds give you. Say you wrote all
human information ever produced (2.56e+20 bytes) [1] to one ZFS
filesystem (with a 1-byte blocksize), and you wrote that much data
every second for the age of the known universe (4.3e+17 s). That is
about 1.1e+38 blocks. The odds of any one given block falsely matching
one of them are 2^256 divided by that block count, or about 1 in
1e+39.

[1] http://www.wired.co.uk/news/archive/2011-02/14/256-exabytes-of-human-information
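
The arithmetic above can be reproduced with Python's arbitrary-size
integers. This is only a sketch: it treats sha256 output as uniformly
random (an assumption), and the variable names are mine. The last
lines add the standard birthday-problem approximation for a collision
between any two of the blocks, which is a different question from the
per-block figure.

```python
import math

HASH_SPACE = 2 ** 256              # number of possible sha256 values
BYTES_PER_SECOND = 256 * 10**18    # 2.56e+20 bytes, figure from [1]
SECONDS = 43 * 10**16              # 4.3e+17 s, age of the known universe

# With a 1-byte blocksize, every byte written is one block.
blocks = BYTES_PER_SECOND * SECONDS

# Odds that one given block has the same digest as some stored block.
one_in = HASH_SPACE // blocks

print(f"2^256          = {HASH_SPACE}")
print(f"blocks written = {blocks:.2e}")     # 1.10e+38
print(f"per-block odds = 1 in {one_in:.2e}")  # 1 in 1.05e+39

# Birthday approximation: p ~= 1 - exp(-n^2 / (2N)) for a collision
# between ANY two of the n blocks (assumes uniformly random digests).
any_pair = -math.expm1(-(blocks * blocks) / (2 * HASH_SPACE))
print(f"any-pair collision probability ~= {any_pair:.3f}")
```

The any-pair figure comes out at roughly 5% for this deliberately
absurd amount of data. For any pool you could actually build, the
block count is so many orders of magnitude smaller that both numbers
are negligible.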


> I'm only writing this because I get the feeling some people think scrubs
> are a need. Maybe people associate doing scrubs with something like
> doing NTFS defrags?

All scrubbing does is put stress on the drives and verify that the
data can still be read from them. If a hard drive ever fails on you
and you need to replace it (how often does that happen?), then you
know "hey, just last week all the other hard drives were able to read
their data under stress, so they are less likely to fail on me".


Jan


