[OpenIndiana-discuss] ZFS help

Nikola M minikola at gmail.com
Sun Mar 12 06:54:50 UTC 2017


On 03/11/17 11:25 PM, Jeff Woolsey wrote:
> # uname -a
> SunOS bombast 5.11 illumos-2816291 i86pc i386 i86pc
> # cat /etc/release
>               OpenIndiana Hipster 2016.10 (powered by illumos)
>          OpenIndiana Project, part of The Illumos Foundation (C) 2010-2016
>                          Use is subject to license terms.
>                             Assembled 30 October 2016
> # # zpool status cloaking
>    pool: cloaking
>   state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>          continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>    scan: resilver in progress since Sat Mar 11 11:42:37 2017
>      8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)
>      5.31M resilvered, 0.00% done
> config:
>
>          NAME                     STATE     READ WRITE CKSUM
>          cloaking                 ONLINE      97     0     0
>            mirror-0               ONLINE     582     0     0
>              c5d0s6               ONLINE       0     0   582  (resilvering)
>              8647373200783277078  UNAVAIL      0     0     0  was /dev/dsk/c5d0s3
>
> errors: 120 data errors, use '-v' for a list

I'm not an expert (you should really ask the OpenZFS people what to do
next), but it looks like one disk has died or become unavailable, and
even after a reboot the pool will continue the operation it started.
"120 data errors" means you DO have data errors from the disk itself,
and you also have 582 checksum errors on transfers to/from the disk.

I hope you have backups elsewhere, that you are not using SATA disks
behind SAS-to-SATA expanders (unreliable), that you are not using SATA
disks on a SAS controller (not recommended), and that you are using ECC
RAM (a must if you value your data).
It also seems you have done something odd: adding two slices of the
SAME disk to a mirror.
What is the point of that, when you can always run 'zfs set copies=2'
on any dataset to get duplicate copies of the data on the same pool
anyway?
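For example (the dataset name here is only an illustration):

# zfs set copies=2 cloaking/important
# zfs get copies cloaking/important

Keep in mind that copies=2 only applies to blocks written after you set
it, and, just like two slices on one disk, it does not help you if the
whole disk dies.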

> 8.25M scanned out of 358G at 962/s, (scan is slow, no estimated time)

When I start a zpool scrub it starts slowly, but it does speed up later.
I would recommend turning the machine off, booting from some live
USB/DVD media, and using dd (disk dump) to copy _everything_ on that
disk/working partition/slice elsewhere (to an image file or another
device) for safekeeping, in case the other disk dies too.
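Something along these lines from the live environment (the target path
is only an example, point it at whatever free storage you have):

# dd if=/dev/rdsk/c5d0s6 of=/backup/c5d0s6.img bs=1024k conv=noerror,sync

conv=noerror,sync makes dd keep going past read errors and pad the bad
blocks, so you at least get an image of everything that is still
readable.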
Then I would wait for the pool to finish resilvering and attach another
device/slice so the data resilvers onto that as well (ZFS copies only
the used space, so it is faster than dd).
That way you can keep working, then remove the failed device from the
pool and attach one more new device, so that you again have a healthy
two-device mirror at a minimum.
To survive this you need at least 2 working disks/slices on separate
physical disk devices.
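A rough sketch of those steps once the resilver is done (c6d0s6 is a
made-up name for the new device, use your real one):

# zpool attach cloaking c5d0s6 c6d0s6
# zpool status cloaking
# zpool detach cloaking 8647373200783277078

attach adds the new device alongside c5d0s6 in the mirror, status lets
you watch that resilver, and detach drops the dead member by its GUID
once the new one is healthy. None of this will work while the pool says
"pool I/O is currently suspended"; that state has to clear first.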

For crucial data, you can also think about a 3-way mirror in the future
(or raidz2 for better space efficiency), which would keep you afloat
even if 2 disks out of 3 die.
Also check that machine's hardware (ECC, SAS/SATA, expanders, using
whole disks instead of slices, etc.), and definitely do replicate (zfs
send) the data elsewhere, but don't mistake replication for offline
backups.
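A minimal sketch of such replication (the dataset, snapshot name, and
receiving host/pool are placeholders only):

# zfs snapshot cloaking/data@backup-2017-03-12
# zfs send cloaking/data@backup-2017-03-12 | ssh backuphost zfs receive backuppool/cloaking-data

Later runs can use 'zfs send -i' against the previous snapshot so only
the changes since then are transferred.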

> # zpool reopen cloaking
> cannot reopen 'cloaking': pool I/O is currently suspended
> # zpool detach cloaking /dev/dsk/c5d0s3
> cannot detach /dev/dsk/c5d0s3: pool I/O is currently suspended
> # zpool detach cloaking 8647373200783277078
> cannot detach 8647373200783277078: pool I/O is currently suspended
> # zpool detach cloaking randomtrash
> cannot detach randomtrash: no such device in pool
> #
>
> How can I get rid of the UNAVAIL disk slice so that this pool doesn't
> try to resilver (From what, pray tell?) all the time.  I don't know
> where that ugly number came from--this system only has SATA disks.  I
> have a new mirror slice just waiting for it as soon as it stops doing
> this.  zpool clear  just hangs. Meanwhile, despite its assertions of
> ONLINE,
>
> # zfs list -r cloaking
> cannot open 'cloaking': pool I/O is currently suspended
> # zpool remove cloaking 8647373200783277078
> cannot remove 8647373200783277078: only inactive hot spares, cache, top-level, or log devices can be removed
> # zpool offline cloaking 8647373200783277078
> cannot offline 8647373200783277078: pool I/O is currently suspended
> #
>
> I'm of the opinion that the data is mostly intact (unless zpool has been
> tricked into resilvering a data disk from a blank one (horrors)).
>
> # zpool export cloaking
>
> hangs.
>



