[OpenIndiana-discuss] System Panics after deleting zvol (Jim Klimov)

Jim Klimov jimklimov at cos.ru
Fri Nov 9 08:17:46 UTC 2012


On 2012-11-09 01:17, Matthew Savage wrote:
>
>>> Need some help trying to recover data from my zpool (version 28).
>>>
>>> You can read all the gory details here:
>>> http://www.nexentastor.org/boards/2/topics/8502
>>>
>>> I am unable to import the pool into OpenIndiana live (or any other
>>> distro) and am experiencing the same kernel panic whenever I try to
>>> import my zpool after I deleted an old zvol... long story short, I have
>>> tried every import switch I could find and even ran zdb -e -bcsvL, which
>>> took almost 4 days (screen shots in the thread listed above).
>> Well, I had something that smells similar, so I can share my own
>> experience and expectations ;\
>>
>> To recap, your system uses dedup and has 12 GB of RAM, and
>> apparently the removal of the zvol requires walking the DDT to
>> find whether your zvol's blocks are referenced there. If so, the
>> reference counter is decreased, and when it reaches zero the data
>> block is released. This kind of process needs lots of memory to
>> hold the DDT while walking it (and even if your L2ARC cache helps,
>> it's not by much, because typical DDT entries are about twice as
>> big as the pointers from RAM to L2ARC).
>>
>> It also seemed (in my case, with OI versions 147-148) that some
>> of these operations are implemented in the kernel in a way that
>> keeps large portions of the pool's metadata in RAM - and being
>> kernel memory, it cannot even be swapped out.
>>
>> My runs of zdb (with flags similar to yours above) consumed about
>> 35Gb of virtual memory (the userspace process can swap out) so
>> that's my estimate of the memory needs to process my pool's meta
>> data at that time. The box had only 8Gb (max supported) and thus
>> the kernel couldn't allocate enough and hung with the system going
>> down in "scanrate hell" (I am not sure it panicked, though, except
>> for other errors; I did not know about "deadman watchdog timer"
>> which seems to be an in-kernel way of inducing a proper panic
>> under these or similar conditions).
>>
>> A symptom of this scanrate hell is that my OS was doing up to
>> millions of "page scans" per second, and ultimately this was
>> all it did until reset. If you start "vmstat 1" on a console,
>> you can see that once free RAM drops below ~90-128 MB, the
>> "sr" field increases - this is the kernel walking through pages
>> in RAM looking for some to evict into swap. There are none
>> (when all memory is held by the kernel walking ZFS metadata),
>> so there is nothing to release, leading to an intense infinite
>> cycle.
>>
>> Other posters reported that temporarily increasing RAM on
>> systems in such loops with inaccessible pools did solve the
>> problem by avoiding that bottleneck - but this is not always
>> possible. For others the problem "dissolved over time" -
>> after a week or so of panics and reboots, the pool just
>> got imported and working.
>>
>> Another thing I was pointed to along the way was that the zdb
>> walk reports the "deferred free" counter - I believe this is the
>> number of blocks already marked for release but not yet processed
>> through all the metadata (such as the DDT walks). For me, this
>> counter went down across reboots, and the problem did dissolve in
>> about two weeks of the storage test box doing nothing but imports
>> and panics.
>>
>> HTH,
>> //Jim Klimov
>>
>>
>
> Jim,
>
>     I was reading that in order to get the "deferred free" counter you
> have to run "zdb -bsvL -e pool"; I'm going to try that and see how much
> the counter changes... In the article I was reading, the author scripted
> a watchdog to soft-reset his system so he could run the import over and
> over again until it cleared up.  Hey, it's worth a try...
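
That counter check might look like the following sketch; "tank" is a
placeholder pool name, and the grep pattern is an assumption about the
label zdb prints, not confirmed wording:

```shell
# Re-run the block traversal without leak checking (this can take days
# on a large dedup pool) and pull out the deferred-free accounting.
# The pattern is a guess at zdb's label; if it matches nothing, read
# through the full statistics output instead.
zdb -e -bsvL tank | grep -i 'deferred'
```

Repeating this before and after a reboot cycle shows whether the counter
is shrinking, i.e. whether each failed import attempt is still making
forward progress.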

If you're referring to my older posts on this matter, yes, I think they
should help you. The latest version of that watchdog is available at
http://81.5.113.5/~jim/freeram-watchdog-20110610-v0.11.tgz

If I got into this situation today, however, I'd first take a look at
setting up the deadman watchdog timer (in addition to, or instead of,
my own watchdog). Being a kstat-based userspace program, my watchdog
gets new data once a second, and sometimes one second is all it takes
for the system to go from "ok" to hung. In about one out of every 5-10
hangs, the watchdog could not catch the condition in time and reboot
the OS.
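
The core of such a freeram watchdog can be sketched in shell. This is a
hypothetical simplification, not the actual tool linked above: the kstat
statistic name, the 8 KB page size, the 128 MB threshold, and the uadmin
reboot call are all platform assumptions for Solaris/illumos:

```shell
#!/bin/sh
# Hypothetical freeram watchdog sketch: poll free memory once per second
# and force a reboot before the scan-rate spiral makes the box
# unresponsive. All constants here are illustrative.
THRESHOLD_KB=131072   # ~128 MB, near where "sr" starts climbing

free_kb() {
    # kstat -p prints "module:instance:name:statistic<TAB>value";
    # freemem is in pages, assumed 8192 bytes each here.
    kstat -p unix:0:system_pages:freemem | awk '{ print $2 * 8192 / 1024 }'
}

while :; do
    if [ "$(free_kb)" -lt "$THRESHOLD_KB" ]; then
        uadmin 2 1   # force an immediate reboot (Solaris)
    fi
    sleep 1
done
```

Running "vmstat 1" alongside it shows the "sr" column climbing as free
RAM approaches the threshold, which is the condition the loop tries to
preempt.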

//Jim




