[oi-dev] Webrev: Issue 2024 - ZFS dedup=on should not panic the system when two blocks with same checksum exist in DDT

Thu Apr 19 23:36:04 UTC 2012

2012-04-19 11:23, Joshua M. Clulow wrote:
 > Assuming you have keys in redmine already (so that I don't have to
 > re-run the sync process) you can try out this test version:
 >    webrev -t rsync://webrev@alpha.sysmgr.org:some_webrev_name -O -U

Heh, it works and is quite simple - must be magic...
gotta get this into Wiki when the solution is polished ;)

So here goes my first webrev attempt:
   http://alpha.sysmgr.org/~webrev/jimklimov/issue2024/

I've reported on the issue in detail on the zfs-discuss list
this winter, and in illumos bugtracker as issue #2024
   ZFS dedup=on should not panic the system when two blocks
   with same checksum exist in DDT
   https://www.illumos.org/issues/2024

In short, I had some corruption on a pool with deduped data.
DDT entries had a checksum, same as in BP, but the actual
data block was corrupted and mismatched the checksum.

If the block identical to original (same checksum) is written
with dedup=verify, then ZFS detects that the newly written
block is different from ondisk data, and makes another DDT
entry. If the original block is later rewritten with dedup=on
(no verification) ZFS panics the kernel.

Stacktrace research led to some routines which can validly
return a NULL pointer during DDT lookups, but other code
assumes that a pointer is always valid - thus my kernel got
NULL pointer dereferences and panicked.

This panic also happens soon after pool reimport (after
reboot), I guess some processing like deferred-free kicks in.

The proposed fix adds some non-NULLness checks to the routine
in question and some routines near it in the source code file.
I added kernel logging in case the lookup returns NULL; it may
be correct to invoke the message only if the kernel is DEBUG...
or maybe the users should always know there is something foul
with their pools?

It is possible that some of these checks may be not needed,
or that I missed some locations where similar checks are due.
It is also possible that the solution may lead to block leaks.
As an interim solution, I prefered to potentially lose a few
MBs of disk space rather than not have access to the whole pool.

ZFS code gurus are kindly asked to review the solution here:
   http://alpha.sysmgr.org/~webrev/jimklimov/issue2024/
   https://www.illumos.org/issues/2024

Thanks,
//Jim Klimov