[OpenIndiana-discuss] Help debugging and replacing failed (maybe) hard drive

Wood Peter peterwood.sd at gmail.com
Tue Jul 3 18:54:22 UTC 2012


Hi,

I have OI-151a running on a Dell PE R710 with a 15-drive MD1000 DAS connected
via a PERC6/E controller. The controller is configured with 15 single-drive
RAID0 virtual devices, one per hard drive, so the OS sees 15 drives. I know
it's not ideal, but Dell doesn't have a controller that will give direct
access to the drives and work with the MD1000.

Anyway, here is what I found this morning:

root@tzstor14:~# zpool status -v pool01
  pool: pool01
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 71.9G in 0h36m with 0 errors on Sun Jun 24 05:37:43 2012
config:

        NAME           STATE     READ WRITE CKSUM
        pool01         DEGRADED     0     0     0
          raidz1-0     DEGRADED     0     0     0
            spare-0    REMOVED      0     0     0
              c3t0d0   REMOVED      0     0     0
              c3t14d0  ONLINE       0     0     0
            c3t1d0     ONLINE       0     0     0
            c3t2d0     ONLINE       0     0     0
            c3t3d0     ONLINE       0     0     0
            c3t4d0     ONLINE       0     0     0
            c3t5d0     ONLINE       0     0     0
            c3t6d0     ONLINE       0     0     0
          raidz1-1     ONLINE       0     0     0
            c3t7d0     ONLINE       0     0     0
            c3t8d0     ONLINE       0     0     0
            c3t9d0     ONLINE       0     0     0
            c3t10d0    ONLINE       0     0     0
            c3t11d0    ONLINE       0     0     0
            c3t12d0    ONLINE       0     0     0
            c3t13d0    ONLINE       0     0     0
        logs
          mirror-2     ONLINE       0     0     0
            c2t4d0     ONLINE       0     0     0
            c2t5d0     ONLINE       0     0     0
        cache
          c2t2d0       ONLINE       0     0     0
          c2t3d0       ONLINE       0     0     0
        spares
          c3t14d0      INUSE     currently in use

errors: No known data errors
root@tzstor14:~#

There is a ton of information on the Internet about zpool failures, but it's
mostly outdated and in some cases contradictory. I couldn't find anything
that applies to my case.

- What does the state "REMOVED" mean, and why was the drive put in this state?
  "iostat -En" shows no errors for this drive, and I can't find any
  indication that the drive is bad.

- If the drive has to be replaced, could somebody please confirm that the
following steps are sufficient:

* zpool offline pool01 c3t0d0
* zpool detach pool01 c3t0d0
* Physically replace the drive with a new one
* zpool add pool01 spare c3t0d0
* zpool scrub pool01
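
For reference, the steps above could be scripted roughly as follows. This is
only a dry-run sketch of the sequence I am proposing, not a confirmed
procedure. The function names and the DRY_RUN switch are made up for
illustration, and it assumes the new drive shows up again as c3t0d0 (on this
controller that presumably means recreating its RAID0 virtual device first).

```shell
#!/bin/sh
# Dry-run sketch of the proposed replacement steps.
# run, before_swap, after_swap and DRY_RUN are illustrative names,
# not part of any ZFS tooling.

DRY_RUN=${DRY_RUN:-1}

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

before_swap() {
    # Steps to run before pulling the old drive.
    pool=$1
    disk=$2
    run zpool offline "$pool" "$disk"
    run zpool detach "$pool" "$disk"
}

after_swap() {
    # Steps to run once the new drive is visible to the OS.
    # Assumption: it appears under the same device name as the old one.
    pool=$1
    disk=$2
    run zpool add "$pool" spare "$disk"
    run zpool scrub "$pool"
}
```

With the default DRY_RUN=1, "before_swap pool01 c3t0d0" followed by
"after_swap pool01 c3t0d0" only prints the four zpool commands, so the
sequence can be reviewed before anything touches the pool.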

Thank you for any hints and pointers.

-- Peter

