[OpenIndiana-discuss] Help debugging and replacing failed (maybe) hard drive

Mike La Spina mike.laspina at laspina.ca
Tue Jul 3 23:25:56 UTC 2012


Some things to consider.

Generally, a RAID controller in virtual RAID0 presentation mode can hide
a problem drive from ZFS, e.g. by simply taking the virtual drive
offline.

You need to make sure it's really dead.

Without access to the PERC RAID management interface/app, you may be out
of luck when it comes to placing the virtual drive back online, with or
without a replacement disk. The controller tracks what state and ID are
presented to the server.
Running the PERC BIOS-based management/configuration app may be required
to correct the current behavior. (One of the many reasons ZFS should
drive a JBOD array.)

Check your device messages

Use pfexec dmesg, or check /var/adm/messages and the rotated
/var/adm/messages.N copies.
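
For example (using the c3t0d0 name from the zpool status output below),
you could scan the logs for the affected device with something like:

pfexec dmesg | grep -i c3t0d0
pfexec grep -i c3t0d0 /var/adm/messages*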

Check the device configuration state

Use pfexec cfgadm
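
For example, cfgadm can list the attachment points so you can confirm
whether the controller is still presenting the disk (the exact Ap_Ids
will vary with your controller):

pfexec cfgadm -al | grep -i disk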

Once the disk device state is corrected, and presuming c3t0d0 was
replaced and remains presented as c3t0d0, you can simply issue:

pfexec zpool replace pool01 c3t0d0

ZFS will resilver the newly inserted disk, since it does not carry the
missing device's vdev label/signature.
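
A minimal follow-up sketch, assuming the resilver completes cleanly and
you want the hot spare c3t14d0 returned to the spares list:

pfexec zpool status -v pool01       # watch the resilver progress
pfexec zpool detach pool01 c3t14d0  # release the in-use spare afterwards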

Regards,
Mike

http://blog.laspina.ca  


-----Original Message-----
From: Wood Peter [mailto:peterwood.sd at gmail.com] 
Sent: Tuesday, July 03, 2012 1:54 PM
To: OpenIndiana Discuss
Subject: [OpenIndiana-discuss] Help debugging and replacing failed
(maybe) hard drive

Hi,

I have OI-151a running on a Dell PE R710 with a 15-drive MD1000 DAS
connected via a PERC6/E controller. The controller is configured with 15
single-drive RAID0 virtual devices, one per physical drive, so the OS
sees 15 drives. I know it's not ideal, but Dell doesn't have a controller
that gives direct access to the drives and works with the MD1000.

Anyway, here is what I found this morning:

root@tzstor14:~# zpool status -v pool01
  pool: pool01
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 71.9G in 0h36m with 0 errors on Sun Jun 24 05:37:43 2012
config:

        NAME           STATE     READ WRITE CKSUM
        pool01         DEGRADED     0     0     0
          raidz1-0     DEGRADED     0     0     0
            spare-0    REMOVED      0     0     0
              c3t0d0   REMOVED      0     0     0
              c3t14d0  ONLINE       0     0     0
            c3t1d0     ONLINE       0     0     0
            c3t2d0     ONLINE       0     0     0
            c3t3d0     ONLINE       0     0     0
            c3t4d0     ONLINE       0     0     0
            c3t5d0     ONLINE       0     0     0
            c3t6d0     ONLINE       0     0     0
          raidz1-1     ONLINE       0     0     0
            c3t7d0     ONLINE       0     0     0
            c3t8d0     ONLINE       0     0     0
            c3t9d0     ONLINE       0     0     0
            c3t10d0    ONLINE       0     0     0
            c3t11d0    ONLINE       0     0     0
            c3t12d0    ONLINE       0     0     0
            c3t13d0    ONLINE       0     0     0
        logs
          mirror-2     ONLINE       0     0     0
            c2t4d0     ONLINE       0     0     0
            c2t5d0     ONLINE       0     0     0
        cache
          c2t2d0       ONLINE       0     0     0
          c2t3d0       ONLINE       0     0     0
        spares
          c3t14d0      INUSE     currently in use

errors: No known data errors
root@tzstor14:~#

There is a ton of information on the Internet about zpool failures, but
it's mostly outdated and in some cases contradictory. I couldn't find
anything that applies to my case.

- What does the "REMOVED" state mean, and why was the drive put in this
state? "iostat -En" shows no errors for this drive, and I can't find any
indication that the drive is bad.

- If the drive has to be replaced, could somebody please confirm that
the following steps are sufficient (see the sketch after this list):

* zpool offline pool01 c3t0d0
* zpool detach pool01 c3t0d0
* Physically replace the drive with a new one
* zpool add pool01 spare c3t0d0
* zpool scrub pool01
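
A minimal sketch of that sequence as actual commands, assuming the
replacement disk comes back under the same c3t0d0 name. Note that
detaching c3t0d0 from the spare-0 vdev promotes the in-use spare c3t14d0
to a permanent member of raidz1-0:

pfexec zpool offline pool01 c3t0d0
pfexec zpool detach pool01 c3t0d0    # c3t14d0 becomes the permanent raidz1-0 member
# physically swap the drive, then add the new disk back as the hot spare
pfexec zpool add pool01 spare c3t0d0
pfexec zpool scrub pool01            # verify data integrity after the swap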

Thank you for any hints and pointers.

-- Peter
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss at openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss


