[OpenIndiana-discuss] ZFS help
Jeff Woolsey
jlw at jlw.com
Mon Mar 13 00:27:17 UTC 2017
TL;DR: I'd already done most of that, and wouldn't do most of what you
counsel against.
It's clear I left out details of the saga that preceded this. This is a
generic x86 box (PC) with 4 SATA ports, 3.5GB memory, and one
hyperthreaded 3.2GHz CPU. One of my mirrored 2TB disks flaked out, and
pending getting another, I replaced it with an apparently-working spare
1.5TB. The way the pools are laid out is historical; this system
started out on a pair of 500GB disks. Anyway, the 1.5TB was also flaky. Most
of the time I could pacify things by power-cycling the individual
recalcitrant disk, which would then recover the pool with just a scrub,
usually; a resilver otherwise.
As for this disk, the reason it looks like I have mirrored two slices on
the same disk is that there were a number of reboots and devfsadm -C runs
in between. ZFS managed to deal with that most of the time. Note also
that it _was_ /dev/dsk/c5d0s3. It isn't any more. I just want the pool to
forget about that disk entirely. Since it's not there (UNAVAIL), just
what is the pool resilvering _from_?
24 hours later the resilvering has not advanced _at all_.
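For scale, here is a back-of-the-envelope sketch of what the reported rate would imply if it ever held steady. It assumes the "962/s" in the quoted status output means bytes per second and that "358G" means 358 * 2^30 bytes; both are a reading of the output, not something zpool spells out. The function name is made up for illustration.

```shell
# Rough time-to-finish for a resilver at a constant rate.
# Assumptions: size is given in GiB, rate in bytes per second.
resilver_eta_years() {
    awk -v total_gib="$1" -v rate="$2" 'BEGIN {
        total = total_gib * 1024 * 1024 * 1024   # bytes left to scan
        printf "%.1f\n", total / rate / 86400 / 365
    }'
}

resilver_eta_years 358 962
```

Under those assumptions the answer is on the order of a dozen years, so "slow" is not really the issue; the resilver is effectively stalled.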
ECC is not available in PC architecture; the systems I do have with ECC
are ten times slower, SCSI-only, and SPARC (for which OI-current is not
available).
I'll try the dd thing, and try to import the image that results. I
suspect it may have the same problem.
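A minimal sketch of that plan, meant to be run from live media rather than the installed system. The source slice and pool name come from the status output quoted below; the /backup scratch directory, the variable names, and the dry-run wrapper are assumptions for illustration only.

```shell
# Sketch only: image the surviving slice, then try a read-only import of the copy.
SRC=${SRC-/dev/rdsk/c5d0s6}     # raw device for the slice still marked ONLINE
DEST_DIR=${DEST_DIR-/backup}    # hypothetical scratch directory with enough space
POOL=${POOL-cloaking}
RUN=${RUN-echo}                 # default is a dry run that only prints the commands

image_and_import() {
    # noerror keeps going past read errors; sync pads short reads so offsets stay aligned
    $RUN dd if="$SRC" of="$DEST_DIR/$POOL.img" bs=1024k conv=noerror,sync
    # -d looks for pool members in DEST_DIR; readonly=on avoids writing to the copy;
    # -N skips mounting any file systems
    $RUN zpool import -d "$DEST_DIR" -o readonly=on -N "$POOL"
}

image_and_import
```

As written this only prints the two commands. Setting RUN to the empty string makes them execute for real. The import may also need -f, since the pool was never exported cleanly.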
On 3/11/17 10:54 PM, Nikola M wrote:
> On 03/11/17 11:25 PM, Jeff Woolsey wrote:
>> # uname -a
>> SunOS bombast 5.11 illumos-2816291 i86pc i386 i86pc
>> # cat /etc/release
>> OpenIndiana Hipster 2016.10 (powered by illumos)
>> OpenIndiana Project, part of The Illumos Foundation (C)
>> 2010-2016
>> Use is subject to license terms.
>> Assembled 30 October 2016
>> # zpool status cloaking
>>   pool: cloaking
>>  state: ONLINE
>> status: One or more devices is currently being resilvered.  The pool will
>>         continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>   scan: resilver in progress since Sat Mar 11 11:42:37 2017
>>     8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)
>>     5.31M resilvered, 0.00% done
>> config:
>>
>>         NAME                     STATE     READ WRITE CKSUM
>>         cloaking                 ONLINE      97     0     0
>>           mirror-0               ONLINE     582     0     0
>>             c5d0s6               ONLINE       0     0   582  (resilvering)
>>             8647373200783277078  UNAVAIL      0     0     0  was /dev/dsk/c5d0s3
>>
>> errors: 120 data errors, use '-v' for a list
>
> Not an expert (you should really ask the OpenZFS people what to do next),
> but it seems one disk has died or is unavailable, and even after a
> reboot the pool would continue the operation that it had started.
> "120 data errors" looks like you DO have some data errors from the disk
> itself, and you also have 582 checksum errors in transfer from/to disk.
>
> I hope you have backups elsewhere, I hope you are not using SATA disks
> on SAS-to-SATA expanders (unreliable), I hope you are not using SATA
> disks on a SAS controller (not recommended), and I hope you are using
> ECC RAM (a must-have if you value your data).
> Also, it seems you have done something weird: adding 2 disk slices
> on the SAME disk to a mirror.
> What is the point of that, when you can always use 'zfs set copies=2'
> on any dataset to keep duplicate copies of its data on the same pool
> anyway?
>
>> 8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)
>
> When I start a zpool scrub, it starts slowly but later it does speed up.
> I would recommend turning the machine off, booting from some live
> USB/DVD media, and dumping _everything_ on that disk/working
> partition/slice with dd (disk dump) to somewhere else (an image file or
> another device) for safekeeping, in case the other disk dies too.
>
>> # zpool reopen cloaking
>> cannot reopen 'cloaking': pool I/O is currently suspended
>> # zpool detach cloaking /dev/dsk/c5d0s3
>> cannot detach /dev/dsk/c5d0s3: pool I/O is currently suspended
>> # zpool detach cloaking 8647373200783277078
>> cannot detach 8647373200783277078: pool I/O is currently suspended
>> # zpool detach cloaking randomtrash
>> cannot detach randomtrash: no such device in pool
>> #
>>
>> How can I get rid of the UNAVAIL disk slice so that this pool doesn't
>> try to resilver (from what, pray tell?) all the time? I don't know
>> where that ugly number came from; this system only has SATA disks. I
>> have a new mirror slice just waiting for it as soon as it stops doing
>> this. zpool clear just hangs. Meanwhile, despite its assertions of
>> ONLINE,
>>
>> # zfs list -r cloaking
>> cannot open 'cloaking': pool I/O is currently suspended
>> # zpool remove cloaking 8647373200783277078
>> cannot remove 8647373200783277078: only inactive hot spares, cache,
>> top-level, or log devices can be removed
>> # zpool offline cloaking 8647373200783277078
>> cannot offline 8647373200783277078: pool I/O is currently suspended
>> #
>>
>> I'm of the opinion that the data is mostly intact (unless zpool has been
>> tricked into resilvering a data disk from a blank one (horrors)).
>>
>> # zpool export cloaking
>>
>> hangs.
>>
--
Jeff Woolsey {{woolsey,jlw}@jlw,first.last@{gmail,jlw}}.com
Nature abhors straight antennas, clean lenses, and empty storage.
"Delete! Delete! OK!" -Dr. Bronner on disk space management
Card-sorting, Joel. -Crow on solitaire