[OpenIndiana-discuss] ZFS help

Jeff Woolsey jlw at jlw.com
Mon Mar 13 00:27:17 UTC 2017


TL;DR: I'd already done most of that, and wouldn't do most of what you
counsel against.

It's clear I left out details of the saga that preceded this.  This is a
generic x86 box (PC) with 4 SATA ports, 3.5GB memory, and one
hyperthreaded 3.2GHz CPU.  One of my mirrored 2TB disks flaked out, and
while waiting to get another, I replaced it with an apparently working
spare 1.5TB.  The way the pools are laid out is historical; this system
started out on a pair of 500s.  Anyway, the 1.5TB was also flaky.  Most
of the time I could pacify things by power-cycling the individual
recalcitrant disk, after which the pool would usually recover with just
a scrub; otherwise a resilver.
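
The usual pacification, for the record, ran something like this (a
sketch rather than a pasted transcript; the device argument varied from
incident to incident):

# zpool clear cloaking c5d0s6
# zpool scrub cloaking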

As for this disk, the reason it looks like I have mirrored two slices on
the same disk is that there were a number of reboots and devfsadm -C
runs in between.  ZFS managed to deal with that most of the time.  Note
also that it _was_ /dev/dsk/c5d0s3.  It isn't any more.  I just want the
pool to forget about that disk entirely (since it's not there (UNAVAIL),
just what is the pool resilvering _from_???).
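
(Presumably that stale GUID is still recorded in the vdev labels on the
surviving slice; something like

# zdb -l /dev/dsk/c5d0s6

should show it, though I haven't pasted that output here.)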

24 hours later the resilvering has not advanced _at all_.

ECC is not an option on this sort of PC hardware; the systems I do have
with ECC are ten times slower, SCSI-only, and SPARC (for which
OI-current is not available).

I'll try the dd thing, and try to import the image that results.  I
suspect it may have the same problem.
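
Roughly this, I figure; the image path and block size below are just
placeholders, nothing magic about them:

# dd if=/dev/rdsk/c5d0s6 of=/spare/c5d0s6.img bs=1048576 conv=noerror,sync
# lofiadm -a /spare/c5d0s6.img
# zpool import -d /dev/lofi -o readonly=on cloaking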


On 3/11/17 10:54 PM, Nikola M wrote:
> On 03/11/17 11:25 PM, Jeff Woolsey wrote:
>> # uname -a
>> SunOS bombast 5.11 illumos-2816291 i86pc i386 i86pc
>> # cat /etc/release
>>               OpenIndiana Hipster 2016.10 (powered by illumos)
>>          OpenIndiana Project, part of The Illumos Foundation (C) 2010-2016
>>                          Use is subject to license terms.
>>                             Assembled 30 October 2016
>> # zpool status cloaking
>>    pool: cloaking
>>   state: ONLINE
>> status: One or more devices is currently being resilvered.  The pool will
>>          continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>    scan: resilver in progress since Sat Mar 11 11:42:37 2017
>>      8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)
>>      5.31M resilvered, 0.00% done
>> config:
>>
>>          NAME                     STATE     READ WRITE CKSUM
>>          cloaking                 ONLINE      97     0     0
>>            mirror-0               ONLINE     582     0     0
>>              c5d0s6               ONLINE       0     0   582  (resilvering)
>>              8647373200783277078  UNAVAIL      0     0     0  was /dev/dsk/c5d0s3
>>
>> errors: 120 data errors, use '-v' for a list
>
> Not an expert (you should really ask the OpenZFS people what to do next),
> but it seems like you have a disk that died/is unavailable, and even
> after a reboot it would continue the operation that was started.
> "120 data errors" looks like you DO have some data errors from the disk
> itself, and you also have 582 checksum errors in transfers from/to the disk.
>
> I hope you have backups elsewhere, I hope you are not using SATA disks
> on SAS-to-SATA expanders (unreliable), I hope you are not using SATA
> disks on a SAS controller (not recommended), and I hope you are using
> ECC RAM (a must-have if you value your data).
> Also, it seems you have done something weird: adding 2 slices on the
> SAME disk to a mirror.
> What is the point of that, when you can always 'zfs set copies=2' on
> any dataset to get duplicate copies of the data in the same pool anyway?
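>
> For example (dataset name here is hypothetical):
>
> # zfs set copies=2 cloaking/export
> # zfs get copies cloaking/export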
>
>> 8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)
>
> When I start a zpool scrub, it starts slowly but speeds up later.
> I would recommend turning the machine off, booting from some live
> USB/DVD media, and dumping _everything_ on that disk/working
> partition/slice elsewhere with dd (to an image file or another device)
> for safekeeping, in case the other disk dies too.
>
>> # zpool reopen cloaking
>> cannot reopen 'cloaking': pool I/O is currently suspended
>> # zpool detach cloaking /dev/dsk/c5d0s3
>> cannot detach /dev/dsk/c5d0s3: pool I/O is currently suspended
>> # zpool detach cloaking 8647373200783277078
>> cannot detach 8647373200783277078: pool I/O is currently suspended
>> # zpool detach cloaking randomtrash
>> cannot detach randomtrash: no such device in pool
>> #
>>
>> How can I get rid of the UNAVAIL disk slice so that this pool doesn't
>> try to resilver (from what, pray tell?) all the time?  I don't know
>> where that ugly number came from--this system only has SATA disks.  I
>> have a new mirror slice just waiting for it as soon as it stops doing
>> this.  zpool clear just hangs.  Meanwhile, despite its assertions of
>> ONLINE,
>>
>> # zfs list -r cloaking
>> cannot open 'cloaking': pool I/O is currently suspended
>> # zpool remove cloaking 8647373200783277078
>> cannot remove 8647373200783277078: only inactive hot spares, cache,
>> top-level, or log devices can be removed
>> # zpool offline cloaking 8647373200783277078
>> cannot offline 8647373200783277078: pool I/O is currently suspended
>> #
>>
>> I'm of the opinion that the data is mostly intact (unless zpool has been
>> tricked into resilvering a data disk from a blank one (horrors)).
>>
>> # zpool export cloaking
>>
>> hangs.
>>
>
>


-- 
Jeff Woolsey {{woolsey,jlw}@jlw,first.last@{gmail,jlw}}.com
Nature abhors straight antennas, clean lenses, and empty storage.
"Delete! Delete! OK!" -Dr. Bronner on disk space management
Card-sorting, Joel.  -Crow on solitaire



