[OpenIndiana-discuss] Help debugging and replacing failed (maybe) hard drive
Mike La Spina
mike.laspina at laspina.ca
Tue Jul 3 23:25:56 UTC 2012
Some things to consider.
Generally, a RAID controller in virtual RAID0 presentation mode can
hide a problem drive from ZFS, e.g. by simply taking the virtual drive
offline. You need to make sure the drive is really dead.
Without access to the PERC RAID management interface/app you may be out
of luck when it comes to placing the virtual drive back online, with or
without a replacement disk. The controller tracks what state and ID are
presented to the server.
Running the PERC BIOS-based management/configuration app may be required
to correct the current behavior. (One of the many reasons ZFS should
drive a JBOD array.)
Check your device messages:
Use pfexec dmesg, or check /var/adm/messages (and the rotated
/var/adm/messages.? files).
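For example, something along these lines (a sketch, substituting the
device name from your zpool status, here c3t0d0) should surface any
transport or driver events logged for that disk:

  pfexec grep -i c3t0d0 /var/adm/messages /var/adm/messages.?   # current + rotated logs
  pfexec dmesg | egrep -i 'c3t0d0|offline|retryable'            # recent kernel messages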
Check the device configuration state:
Use pfexec cfgadm.
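For example (again a sketch; whether the PERC virtual drives appear as
manageable attachment points depends on the controller driver):

  pfexec cfgadm -al                               # list attachment points and their states

If the removed disk's attachment point shows up unconfigured and the
controller re-presents the device, something like the following should
bring it back (the attachment point name is a placeholder):

  pfexec cfgadm -c configure <attachment_point>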
Once the disk device state is corrected, and presuming c3t0d0 was
replaced and is still presented as c3t0d0, you can simply issue:
pfexec zpool replace pool01 c3t0d0
ZFS will resilver the newly inserted disk, since it does not carry the
missing device's vdev label.
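Once the replace is issued you can watch the resilver and, if needed,
return the hot spare to the spares list afterwards (a sketch, using your
pool and device names):

  pfexec zpool status -v pool01         # watch resilver progress
  pfexec zpool detach pool01 c3t14d0    # only if the spare does not go back to AVAIL on its own

The spare normally detaches itself once the resilver of the replaced
disk completes; the manual detach is just a fallback.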
Regards,
Mike
http://blog.laspina.ca
-----Original Message-----
From: Wood Peter [mailto:peterwood.sd at gmail.com]
Sent: Tuesday, July 03, 2012 1:54 PM
To: OpenIndiana Discuss
Subject: [OpenIndiana-discuss] Help debugging and replacing failed
(maybe) hard drive
Hi,
I have OI-151a running on a Dell PE R710 with a 15-drive MD1000 DAS
connected via a PERC6/E controller. The controller is configured with a
RAID0 virtual device for each hard drive, so the OS sees 15 drives. I
know it's not ideal, but Dell doesn't have a controller that gives
direct access to the drives and works with the MD1000.
Anyway, here is what I found this morning:
root@tzstor14:~# zpool status -v pool01
  pool: pool01
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 71.9G in 0h36m with 0 errors on Sun Jun 24 05:37:43 2012
config:

        NAME           STATE     READ WRITE CKSUM
        pool01         DEGRADED     0     0     0
          raidz1-0     DEGRADED     0     0     0
            spare-0    REMOVED      0     0     0
              c3t0d0   REMOVED      0     0     0
              c3t14d0  ONLINE       0     0     0
            c3t1d0     ONLINE       0     0     0
            c3t2d0     ONLINE       0     0     0
            c3t3d0     ONLINE       0     0     0
            c3t4d0     ONLINE       0     0     0
            c3t5d0     ONLINE       0     0     0
            c3t6d0     ONLINE       0     0     0
          raidz1-1     ONLINE       0     0     0
            c3t7d0     ONLINE       0     0     0
            c3t8d0     ONLINE       0     0     0
            c3t9d0     ONLINE       0     0     0
            c3t10d0    ONLINE       0     0     0
            c3t11d0    ONLINE       0     0     0
            c3t12d0    ONLINE       0     0     0
            c3t13d0    ONLINE       0     0     0
        logs
          mirror-2     ONLINE       0     0     0
            c2t4d0     ONLINE       0     0     0
            c2t5d0     ONLINE       0     0     0
        cache
          c2t2d0       ONLINE       0     0     0
          c2t3d0       ONLINE       0     0     0
        spares
          c3t14d0      INUSE     currently in use

errors: No known data errors
root@tzstor14:~#
There is a ton of information on the Internet about zpool failures, but
it's mostly outdated and in some cases contradictory. I couldn't find
anything that applies to my case.
- What does the "REMOVED" state mean, and why was the drive put in this
state? "iostat -En" shows no errors for this drive. I can't find any
indication that the drive is bad.
- If the drive has to be replaced, could somebody please confirm that
the following steps are sufficient:
* zpool offline pool01 c3t0d0
* zpool detach pool01 c3t0d0
* Physically replace the drive with a new one
* zpool add pool01 spare c3t0d0
* zpool scrub pool01
Thank you for any hints and pointers.
-- Peter
_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss at openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss