[OpenIndiana-discuss] System Panics after deleting zvol (Jim Klimov)

Matthew Savage matt at focusedmobile.com
Fri Nov 9 00:17:36 UTC 2012


>> Need some help trying to recover data from my zpool (version 28).
>>
>> You can read all the gory details here:
>> http://www.nexentastor.org/boards/2/topics/8502
>>
>> I am unable to import the pool into OpenIndiana live (or any other
>> distro) and am experiencing the same kernel panic whenever I try to
>> import my zpool after I deleted an old zvol...  Long story short, I
>> have tried every import switch I could find and even ran
>> zdb -e -bcsvL, which took almost 4 days (screenshots in the thread
>> listed above).
>>      
> Well, I had something that smells similar, so I can share my own
> experience and expectations ;\
>
> To recap, your system uses dedup and has 12 GB of RAM, and apparently
> the removal of the zvol requires walking the DDT to find whether each
> of the zvol's blocks is referenced there. If so, the reference count
> is decremented, and when it reaches zero the data block is released.
> This kind of process needs lots of memory to hold the DDT while
> walking it (and even if your L2ARC cache helps, it's not by very
> much, because typical DDT entries are about twice as big as the
> pointers from RAM to L2ARC).
>
> It also seemed (in my case, with OI builds 147-148) that some of
> these operations are implemented in the kernel in a way that keeps
> large portions of the pool metadata in RAM - and being kernel
> memory, it cannot even be swapped out.
>
> My runs of zdb (with similar flags to yours above) consumed about
> 35 GB of virtual memory (the userspace process can swap out), so
> that's my estimate of the memory needed to process my pool's
> metadata at that time. The box had only 8 GB (the maximum it
> supports), and thus the kernel couldn't allocate enough and hung,
> with the system going down in "scanrate hell" (I am not sure it
> panicked, though, except for other errors; I did not know about the
> "deadman watchdog timer", which seems to be an in-kernel way of
> inducing a proper panic under these or similar conditions).
>
> A symptom of the scanrate hell is that my OS was doing up to
> millions of "page scans" per second, and ultimately this was all it
> did until reset. If you start "vmstat 1" on a console, you can see
> that once free RAM drops below roughly 90-128 MB, the "sr" field
> increases - this is the kernel walking through pages in RAM looking
> for some to evict to swap. There are none (all the memory is held by
> the kernel walking the ZFS metadata), so there is nothing to
> release, leading to an intense infinite cycle.
>
> Other posters reported that temporarily increasing the RAM in
> systems stuck in such loops with inaccessible pools did solve the
> problem by avoiding that bottleneck - but this is not always
> possible. For others the problem "dissolved over time" - after a
> week or so of panics and reboots, the pool just imported and worked.
>
> Another thing I was pointed to along the way was that the zdb walk
> lets me determine the "deferred free" counter - I believe this is
> the number of blocks already marked for release but not yet
> processed through all the metadata updates (such as the DDT walks).
> For me, this counter went down across reboots, and the problem did
> dissolve in about two weeks of the storage test box doing nothing
> but imports and panics.
>
> HTH,
> //Jim Klimov
>
>
>    

Jim,

    I was reading that in order to get the "deferred free" counter you
have to run zdb -bsvL -e <pool>; I'm going to try that and see how much
the counter changes...
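
Roughly what I have in mind (the pool name "tank" is a placeholder for
my real pool, and saving and grepping the output is just my own
untested idea for pulling out that counter):

  # Walk the block statistics of the exported pool without the leak
  # check, keep a copy of the output, then look for the deferred-free
  # line in it.
  zdb -bsvL -e tank | tee /var/tmp/zdb-bsvL.out
  grep -i "deferred" /var/tmp/zdb-bsvL.out
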
In the article I was reading, the author scripted a watchdog to
soft-reset his system so he could run the import over and over again
until it cleared up.  Hey, it's worth a try...
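
If I go that route, I'm picturing something like the sketch below, run
from an rc script so it starts again after every reboot or panic.  The
pool name, timeout and the rest are placeholders/assumptions on my
part, and none of it is tested:

  #!/bin/sh
  # Untested watchdog sketch: keep retrying the pool import and force
  # a soft reset if one attempt does not finish within the timeout.
  POOL=tank          # placeholder pool name
  TIMEOUT=7200       # seconds to allow per import attempt

  while :; do
      zpool import "$POOL" &
      pid=$!
      waited=0
      while kill -0 "$pid" 2>/dev/null; do
          sleep 60
          waited=$((waited + 60))
          [ "$waited" -ge "$TIMEOUT" ] && reboot   # soft reset
      done
      # The import returned; stop once the pool is actually imported,
      # otherwise wait a bit and try again.
      zpool list "$POOL" >/dev/null 2>&1 && break
      sleep 30
  done

That way a hung import only costs a reboot instead of a trip to the
machine to hit the reset button.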

Thanks..

Matt


