[OpenIndiana-discuss] Kernel panic on hung zpool accessed via lofi

Watson, Dan Dan.Watson at bcferries.com
Mon Sep 14 18:23:18 UTC 2015


>-----Original Message-----
>From: Jim Klimov [mailto:jimklimov at cos.ru] 
>Sent: September 12, 2015 10:31 AM
>To: Discussion list for OpenIndiana; Watson, Dan; openindiana-discuss at openindiana.org
>Subject: Re: [OpenIndiana-discuss] Kernel panic on hung zpool accessed via lofi
>
>On 11 September 2015 at 20:57:46 CEST, "Watson, Dan" <Dan.Watson at bcferries.com> wrote:
>>Hi all,
>>
>>I've been enjoying OI for quite a while, but I'm running into a problem
>>accessing a zpool on disk image files (sitting on ZFS, accessed via
>>lofi) that I hope someone can give me a hint on.
<snip>
>>I have been able to reproduce this problem several times, although it
>>has managed to complete enough to rename the original zpool.
>>
>>Has anyone else encountered this issue with lofi mounted zpools?
>>I'm using mpt_sas with SATA drives, and I _DO_ have error counters
>>climbing for some of those drives; could that be the cause?
>>Any other ideas?
>>
>>I'd greatly appreciate any suggestions.
>>
>>Thanks!
>>Dan
>>
>>_______________________________________________
>>openindiana-discuss mailing list
>>openindiana-discuss at openindiana.org
>>http://openindiana.org/mailman/listinfo/openindiana-discuss
>
>From the zpool status I see it also refers to cache disks. Are those device names actually available (present and not used by another pool)? Can you remove them from the pool after you've imported it?
>
>Consider importing with '-N' to not automount (and autoshare) filesystems from this pool, and '-R /a' or some other empty/absent altroot path to ensure a lack of conflicts when you do mount (this also keeps the pool out of the zpool.cache file, so it won't be auto-imported later). At least, mounting and sharing is a (partially) kernel-side operation that might time out...
>
>Also, you might want to tune or disable the deadman timer and increase other acceptable latencies (see OI wiki or other resources).
>
>How much RAM does the box have? (You pay for ARC caching twice: once for oldtank and once for the pool which hosts the dd files.) Maybe tune down primary/secondary caching for the file store.
>
>How did you get into this recovery situation? Maybe oldtank is corrupted and is trying to recover during import? E.g. I once had a deduped pool where I deleted lots of data and the kernel wanted more RAM than I had to process the delete-queue of blocks; it took dozens of panic-reboots to complete (progress can be tracked with zdb).
>
>Alternatively, you can import the pool read-only to perhaps avoid these recoveries altogether, if you only want to retrieve the data.
>
>Jim
>
>--
>Typos courtesy of K-9 Mail on my Samsung Android

I can't remove the cache drives from the zpool, as all zpool commands seem to hang waiting for something. The cache devices are no longer available on this host, so I'm hoping they show up as absent/degraded.
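
For reference, once an import does go through, I believe kicking the cache device out is just a one-liner like the one below (the device name is a placeholder, since I don't have the real cache device IDs in front of me):

  # remove a cache device from the imported pool
  # (c1t5d0 is a placeholder, not my actual cache device)
  zpool remove oldtank c1t5d0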

I'll try -N and/or -R /a
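
If I'm reading the man page right, that would be something like this (using the oldtank pool name from earlier in the thread):

  # import without mounting/sharing anything, under an empty alternate root
  zpool import -N -R /a oldtank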

I'll read up on how to tune the deadman timer. I've been looking at https://smartos.org/bugview/OS-2415, which has lots of useful things to tune.
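
Assuming the tunable names from that bug apply to OI as well (I haven't verified them on this build yet), the /etc/system entries would presumably look roughly like this:

  * assumption: tunable names as described in OS-2415, not yet checked on OI
  * disable the ZFS deadman panic entirely
  set zfs:zfs_deadman_enabled = 0
  * or leave it enabled and just stretch the timeout (value in milliseconds)
  set zfs:zfs_deadman_synctime_ms = 3600000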

I ended up doing this because the original host of the zpool stopped being able to make it to multi-user while attached to the disk tray. With SATA disks in a SAS tray, that usually means (to me) that one of the disks is faulty and is sending resets to the controller, causing the whole disk tray to reset. I tried to identify the faulty disk, but tested individually all of the disks worked fine. So I decided to copy the disk images to the alternate host to try to recover the data. Further oddities have since cropped up on the original host, so I'm going to try connecting the original disk tray to an alternate host.

I'll try read-only first. I was unaware there was a way to do this. I obviously need a ZFS refresher.
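
For my own notes, the read-only import would presumably combine with the options above as:

  # import read-only, without mounting, under an alternate root
  zpool import -o readonly=on -N -R /a oldtank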

Thanks Jim!

Dan

