[OpenIndiana-discuss] summary: in desperate need of fsck.zfs

Wed Jul 25 15:16:56 UTC 2012

On 07/25/2012 10:42 AM, Gregory Youngblood wrote:
> Assuming the faulted drive or pool is not rpool, containing required
> files for the system, why should a faulty drive or pool hang the
> entire box? Why can't the system return an error and continue?
>
> Thanks,
> Greg
>
> On Jul 25, 2012, at 6:35 AM, Ray Arachelian <ray at arachelian.com> wrote:

Not sure. With the previous version I had on there, which I only
upgraded a couple of days ago, it kernel-panicked. I think that was
151a; now it's running 151a5 and hasn't panicked when it hits the bad
files.

Now when it loses access to the JBOD (and it just did), it doesn't hang
the entire machine, but any zpool commands lock up and can't be killed.
So when I get home tonight, I'm sure I can remove the ZFS cache file,
power-cycle the machine, add the current bad file to the exclude list,
and kick off rsync again.
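A minimal Python sketch of that resume step (append the offending file to an exclude list, then rerun rsync), assuming the exclude list is a plain file passed via rsync's `--exclude-from` option. The helper names and paths are hypothetical. Exclude patterns are matched relative to the transfer root; a leading `/` anchors a pattern there.

```python
from __future__ import annotations

from pathlib import Path


def add_exclude(exclude_file: Path, bad_path: str) -> bool:
    """Append bad_path to the exclude file; return False if already listed."""
    text = exclude_file.read_text() if exclude_file.exists() else ""
    if bad_path in text.splitlines():
        return False
    # Make sure the new pattern starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    exclude_file.write_text(text + bad_path + "\n")
    return True


def rsync_command(src: str, dst: str, exclude_file: Path) -> list[str]:
    """Build (but do not run) an archive-mode rsync that skips listed paths."""
    return ["rsync", "-a", f"--exclude-from={exclude_file}", src, dst]
```

For example, after `add_exclude(Path("rsync-excludes.txt"), "/photos/bad.jpg")`, the list returned by `rsync_command("/oldpool/data/", "/newpool/data/", Path("rsync-excludes.txt"))` can be handed to `subprocess.run(..., check=True)` (pool and file names hypothetical).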

I'd reboot it now, but the target zpool I'm trying to copy the data to
is on the same JBOD and has failmode set to wait instead of continue.
I'll fix this when I get home. I wish this box had a DRAC or an iLO;
it would have made life easier. :) Come to think of it, it would
probably be a better idea to move the target zpool to another JBOD, or
to internal disks, so I don't corrupt it with all these USB disconnects.

Note that for USB drives it's probably safer to leave failmode set to
wait rather than continue, but in this case wait means wait forever:
the pool can't come back, because the cache has marked it as unsafe.
It might be OK for internally attached SATA drives.
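The failmode change discussed above is a pool property set with `zpool set failmode=<value> <pool>`, where the value is one of wait (the default), continue, or panic. A small Python sketch that builds the set and get commands and rejects any other value; the pool name in the example is hypothetical, and the commands are only built here, not executed.

```python
from __future__ import annotations

# Values the ZFS failmode pool property accepts.
VALID_FAILMODES = ("wait", "continue", "panic")


def set_failmode_cmd(pool: str, mode: str) -> list[str]:
    """Build a `zpool set failmode=...` command, validating the mode first."""
    if mode not in VALID_FAILMODES:
        raise ValueError(f"unknown failmode: {mode!r}")
    return ["zpool", "set", f"failmode={mode}", pool]


def get_failmode_cmd(pool: str) -> list[str]:
    """Build a command that reports the pool's current failmode."""
    return ["zpool", "get", "failmode", pool]
```

For a pool named `tank`, `set_failmode_cmd("tank", "continue")` yields `["zpool", "set", "failmode=continue", "tank"]`, which could be passed to `subprocess.run(..., check=True)` once the pool is reachable again.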

Would be nice if I could clear the zpool cache without rebooting... but
it is what it is.