[OpenIndiana-discuss] Help debugging and replacing failed (maybe) hard drive

Rich rercola at acm.jhu.edu
Wed Jul 4 01:03:24 UTC 2012


MegaCli under Solaris will, in my experience, talk to PERC controllers.

Just a useful tidbit that saves having to boot into the on-card BIOS
or another OS and interrogate the card that way.
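
For example, something like this (the MegaCli binary name and install
path vary by package version, so treat it as a sketch):

  # adapter / virtual drive / physical drive summaries
  MegaCli -AdpAllInfo -aALL
  MegaCli -LDInfo -Lall -aALL
  MegaCli -PDList -aALL

The -PDList output shows each drive's firmware state (Online, Failed,
Unconfigured(bad), ...), which is usually enough to tell whether the
controller itself pulled the device offline.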

- Rich

On Tue, Jul 3, 2012 at 7:25 PM, Mike La Spina <mike.laspina at laspina.ca> wrote:
> Some things to consider.
>
> Generally, a RAID controller in virtual RAID0 presentation mode can
> hide a problem drive from ZFS, e.g. by simply taking the virtual drive
> offline.
>
> You need to make sure it's really dead.
>
> Without access to the PERC RAID management interface/app you may be out
> of luck when it comes to placing the virtual drive back online, with or
> without a replacement disk. The controller tracks what state and ID are
> presented to the server.
> Running the PERC BIOS-based management/configuration app may be required
> to correct the current behavior. (One of the many reasons ZFS should
> drive a JBOD array.)
>
> Check your device messages
>
> Use pfexec dmesg, or check /var/adm/messages (and the rotated
> /var/adm/messages.? files).
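>
> For example (assuming the suspect device is c3t0d0, as below):
>
>   pfexec dmesg | grep -i c3t0d0
>   pfexec egrep -i 'sd|scsi' /var/adm/messages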
>
> Check the device configuration state
>
> Use pfexec cfgadm
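>
> For example, list the attachment points and, if the slot shows up
> unconfigured after a swap, bring it back (the c3::dsk/c3t0d0 attachment
> point name is just a guess; use whatever cfgadm -al actually reports):
>
>   pfexec cfgadm -al
>   pfexec cfgadm -c configure c3::dsk/c3t0d0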
>
> Once the disk device state is corrected, and presuming c3t0d0 was
> replaced and remains presented as c3t0d0, you can simply issue:
>
> pfexec zpool replace pool01 c3t0d0
>
> ZFS will resilver the newly inserted disk, since it does not carry the
> missing device's vdev signature.
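>
> You can then watch the resilver progress with:
>
>   pfexec zpool status pool01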
>
> Regards,
> Mike
>
> http://blog.laspina.ca
>
>
> -----Original Message-----
> From: Wood Peter [mailto:peterwood.sd at gmail.com]
> Sent: Tuesday, July 03, 2012 1:54 PM
> To: OpenIndiana Discuss
> Subject: [OpenIndiana-discuss] Help debugging and replacing failed
> (maybe) hard drive
>
> Hi,
>
> I have OI-151a running on a Dell PE R710 with a 15-drive MD1000 DAS
> connected via a PERC 6/E controller. The controller is configured with
> 15 RAID0 virtual devices, one per hard drive, so the OS sees 15 drives.
> I know it's not ideal, but Dell doesn't have a controller that gives
> direct access to the drives and works with the MD1000.
>
> Anyway, here is what I found this morning:
>
> root@tzstor14:~# zpool status -v pool01
>   pool: pool01
>  state: DEGRADED
> status: One or more devices has been removed by the administrator.
>         Sufficient replicas exist for the pool to continue functioning
>         in a degraded state.
> action: Online the device using 'zpool online' or replace the device
>         with 'zpool replace'.
>   scan: resilvered 71.9G in 0h36m with 0 errors on Sun Jun 24 05:37:43 2012
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         pool01         DEGRADED     0     0     0
>           raidz1-0     DEGRADED     0     0     0
>             spare-0    REMOVED      0     0     0
>               c3t0d0   REMOVED      0     0     0
>               c3t14d0  ONLINE       0     0     0
>             c3t1d0     ONLINE       0     0     0
>             c3t2d0     ONLINE       0     0     0
>             c3t3d0     ONLINE       0     0     0
>             c3t4d0     ONLINE       0     0     0
>             c3t5d0     ONLINE       0     0     0
>             c3t6d0     ONLINE       0     0     0
>           raidz1-1     ONLINE       0     0     0
>             c3t7d0     ONLINE       0     0     0
>             c3t8d0     ONLINE       0     0     0
>             c3t9d0     ONLINE       0     0     0
>             c3t10d0    ONLINE       0     0     0
>             c3t11d0    ONLINE       0     0     0
>             c3t12d0    ONLINE       0     0     0
>             c3t13d0    ONLINE       0     0     0
>         logs
>           mirror-2     ONLINE       0     0     0
>             c2t4d0     ONLINE       0     0     0
>             c2t5d0     ONLINE       0     0     0
>         cache
>           c2t2d0       ONLINE       0     0     0
>           c2t3d0       ONLINE       0     0     0
>         spares
>           c3t14d0      INUSE     currently in use
>
> errors: No known data errors
> root@tzstor14:~#
>
> There is a ton of information on the Internet about zpool failures, but
> it's mostly outdated and in some cases contradictory. I couldn't find
> anything that applies to my case.
>
> - What does the "REMOVED" state mean, and why was the drive put in this
> state? "iostat -En" shows no errors for this drive. I can't find any
> indication that the drive is bad.
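>   Is there anywhere else worth looking? I assume the FMA logs might
> show something, e.g.:
>
>   pfexec fmadm faulty
>   pfexec fmdump -eV | tail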
>
> - If the drive has to be replaced, could somebody please confirm that
> the following steps are sufficient:
>
> * zpool offline pool01 c3t0d0
> * zpool detach pool01 c3t0d0
> * Physically replace the drive with a new one
> * zpool add pool01 spare c3t0d0
> * zpool scrub pool01
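>
> (And afterwards, I assume, something like:
>
>   pfexec zpool status -v pool01
>
> to confirm the spare went back to AVAIL and the pool is ONLINE.)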
>
> Thank you for any hints and pointers.
>
> -- Peter
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss


