[OpenIndiana-discuss] ZFS help
Jeff Woolsey
jlw at jlw.com
Fri Mar 17 02:48:12 UTC 2017
On 3/13/17 1:00 AM, Udo Grabowski (IMK) wrote:
> Hi,
>
> (see inline hints)
>
>
> This pool is in a wait state for the missing device and will not
> recover without a reboot (we have had this situation a couple of
> times). The only way out is a reboot, and the second disk should
> either not be in the slot, or be a fresh disk with nothing on it,
> so that the system declares it corrupt on reboot.
That may be difficult, as ZFS thinks that each submirror is a different
slice on the same disk. About all I can do there is fill the "missing"
slice with garbage (or zeros). (As it happens, the pool that
temporarily is not living there is called "missing"...) Well, that
didn't work: zdb shows that device with a different label than the one
in the pool.
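For the record, the fill-and-check sequence looks roughly like this
(destructive, so only on the slice that's already considered gone;
c5d0s3 is the submirror slice from the transcript below):

  # dd if=/dev/zero of=/dev/rdsk/c5d0s3 bs=1M   # overwrite the "missing" submirror slice
  # zdb -l /dev/dsk/c5d0s3                      # check which label, if any, ZFS still sees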
> Unfortunately, the other disk seems to have serious data errors, so
> the pool will show severe data loss (see zpool status -v after repair).
> The weird number you see seems to be the GUID of the disk; the label
> is corrupt in some way.
That other disk has been replaced, and is now working fine for the other
pools.
>
> Look at 'iostat -exn', it should tell you more about the errors,
> as well as a 'smartctl -a -d sat,12 /dev/rdsk/c5d0s2' that shows
> the specific errors recorded by the disk controller.
The BIOS was complaining of SMART reporting imminent failure on the
temporary replacement disk (1500GB). That disk is also no longer in the
system.
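(For anyone following along, the checks suggested above look like this;
the -e columns in iostat are the soft/hard/transport error counters:)

  # iostat -exn                             # s/w, h/w, trn, tot columns count errors per device
  # smartctl -a -d sat,12 /dev/rdsk/c5d0s2  # SMART attributes and error log from the disk itself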
>
> After the reboot, you also very probably will have to send a 'repair'
> to the fmadm event ids before it will start to resilver the pool. If
> that all fails, you can move the /etc/zfs/zpool.cache file aside,
> reboot, and try to import the pool by hand, but I would try that only
> as a last resort; you may lose the complete pool. Also note that the
> device name is recorded on the pool itself (see zdb -l
> /dev/dsk/c5d0s6), and that name will only be corrected after a reboot
> or reimport if it is wrong. 'touch /reconfigure' before the reboot
> may also be advisable. If it fails to import, there are more tricky
> options to roll back the pool on import, but I hope that this is not
> necessary.
>
Looks like it's time for heavier artillery.
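As I read it, the heavy artillery is roughly this sequence (a sketch
only; the UUID is whatever fmadm faulty reports, and moving zpool.cache
aside is the last-resort step warned about above):

  # fmadm faulty                          # list outstanding fault events and their UUIDs
  # fmadm repair <uuid>                   # mark the event repaired so resilver can start
  # zdb -l /dev/dsk/c5d0s6                # verify the device path recorded in the label
  # touch /reconfigure                    # force device reconfiguration on the next boot
  # mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak   # last resort: forget cached pools
  # reboot
  # zpool import cloaking                 # reimport by hand; -F would roll back transactions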
> And finally, dying capacitors
> in your AC-adapter or on the mainboard can drive the whole machine
> crazy;
I've replaced those before on an earlier incarnation of this system
(micro-ATX Socket 754 Athlon64 3000+). Unlikely to be the cause here,
as the symptom of that in the past was catatonia (i.e. dead, no
response at all). But I'll keep it in mind.
>>
>> I'll try the dd thing, and try to import the image that results. I
>> suspect it may have the same problem.
>>
So far I've been unable to convince zpool to import from /dev/lofi/1.
I'm guessing it's because there is no fdisk label there, or I haven't
figured out how to slice a lofi "disk".
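For the record, the attempt went roughly like this (the image path is
made up for the example, and the slice stands in for whichever
submirror survives):

  # dd if=/dev/rdsk/c5d0s6 of=/var/tmp/cloaking.img bs=1M  # image the surviving slice
  # lofiadm -a /var/tmp/cloaking.img                       # attach it, e.g. as /dev/lofi/1
  # zpool import -d /dev/lofi                              # scan that directory for pool labels

zpool import -d takes a directory to scan, so in theory it should find
any labels on the lofi device; here it didn't.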
Meanwhile, the list of things I can't do remains:
>>>
>>>> # zpool reopen cloaking
>>>> cannot reopen 'cloaking': pool I/O is currently suspended
>>>> # zpool detach cloaking /dev/dsk/c5d0s3
>>>> cannot detach /dev/dsk/c5d0s3: pool I/O is currently suspended
>>>> # zpool detach cloaking 8647373200783277078
>>>> cannot detach 8647373200783277078: pool I/O is currently suspended
>>>> # zpool clear cloaking
>>>>
>>>> just hangs. Meanwhile, despite its assertions of ONLINE,
>>>>
>>>> # zfs list -r cloaking
>>>> cannot open 'cloaking': pool I/O is currently suspended
>>>> # zpool remove cloaking 8647373200783277078
>>>> cannot remove 8647373200783277078: only inactive hot spares, cache,
>>>> top-level, or log devices can be removed
>>>> # zpool offline cloaking 8647373200783277078
>>>> cannot offline 8647373200783277078: pool I/O is currently suspended
>>>> #
>>>>
>>>> I'm of the opinion that the data is mostly intact (unless zpool has
>>>> been
>>>> tricked into resilvering a data disk from a blank one (horrors)).
>>>>
>>>> # zpool export cloaking
>>>>
>>>> hangs.
>
--
Jeff Woolsey {{woolsey,jlw}@jlw,first.last@{gmail,jlw}}.com
Nature abhors straight antennas, clean lenses, and empty storage.
"Delete! Delete! OK!" -Dr. Bronner on disk space management
Card-sorting, Joel. -Crow on solitaire