[OpenIndiana-discuss] Help debugging and replacing failed (maybe) hard drive
Jan Owoc
jsowoc at gmail.com
Tue Jul 3 19:23:48 UTC 2012
On Tue, Jul 3, 2012 at 12:54 PM, Wood Peter <peterwood.sd at gmail.com> wrote:
> root at tzstor14:~# zpool status -v pool01
> pool: pool01
> state: DEGRADED
> status: One or more devices has been removed by the administrator.
> Sufficient replicas exist for the pool to continue functioning in a
> degraded state.
Am I correct in assuming that (to your knowledge) neither you nor any
other administrator has detached or removed any drives on or about
June 24th?
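If not, the fault manager and system logs may show what the kernel saw
around that date. Something along these lines is a reasonable first look
(standard illumos/Solaris tools; c3t0d0 is your removed disk):

  fmdump                             # fault events diagnosed by FMA, with timestamps
  fmdump -e                          # lower-level error telemetry (can be verbose)
  grep c3t0d0 /var/adm/messages*     # anything the kernel logged about that disk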
> action: Online the device using 'zpool online' or replace the device with
> 'zpool replace'.
> scan: resilvered 71.9G in 0h36m with 0 errors on Sun Jun 24 05:37:43 2012
> config:
>
> NAME           STATE     READ WRITE CKSUM
> pool01         DEGRADED     0     0     0
>   raidz1-0     DEGRADED     0     0     0
>     spare-0    REMOVED      0     0     0
>       c3t0d0   REMOVED      0     0     0
>       c3t14d0  ONLINE       0     0     0
>     c3t1d0     ONLINE       0     0     0
>     c3t2d0     ONLINE       0     0     0
>     c3t3d0     ONLINE       0     0     0
>     c3t4d0     ONLINE       0     0     0
>     c3t5d0     ONLINE       0     0     0
>     c3t6d0     ONLINE       0     0     0
>   raidz1-1     ONLINE       0     0     0
>     c3t7d0     ONLINE       0     0     0
>     c3t8d0     ONLINE       0     0     0
>     c3t9d0     ONLINE       0     0     0
>     c3t10d0    ONLINE       0     0     0
>     c3t11d0    ONLINE       0     0     0
>     c3t12d0    ONLINE       0     0     0
>     c3t13d0    ONLINE       0     0     0
> logs
>   mirror-2     ONLINE       0     0     0
>     c2t4d0     ONLINE       0     0     0
>     c2t5d0     ONLINE       0     0     0
> cache
>   c2t2d0       ONLINE       0     0     0
>   c2t3d0       ONLINE       0     0     0
> spares
>   c3t14d0      INUSE     currently in use
>
> errors: No known data errors
> There is a ton of information on the Internet about zpool failures, but
> it's mostly outdated and in some cases contradictory. I couldn't find
> anything that applies to my case.
I personally use the ZFS Administration Guide (I think the version for
OpenSolaris is most similar to what OI has). The document should be
relatively comprehensive and (hopefully) not internally contradictory:
http://docs.oracle.com/cd/E19120-01/open.solaris/index.html
> - What state "REMOVED" means and why the drive was put in this state?
> "iostat -En" shows no errors for this drive. I can't find any indication
> that the drive is bad.
Page 106 of the PDF of the ZFS Administration Guide:
"The device was physically removed while the system was running.
Device removal detection is hardware-dependent and might not be
supported on all platforms."
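So REMOVED usually means the drive dropped off the bus at some point,
even if it looks clean in "iostat -En" now. To check whether the OS
currently sees the disk at all, the usual suspects are (exact output
depends on your controller):

  cfgadm -al       # attachment points; the disk should show as connected/configured
  fmadm faulty     # anything FMA has actually diagnosed as faulty
  format           # just check the disk appears in the list, then quit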
> - If the drive has to be replaced could somebody please confirm that the
> following steps are sufficient:
>
> * zpool offline pool01 c3t0d0
> * zpool detach pool01 c3t0d0
> * Physically replace the drive with a new one
> * zpool add pool01 spare c3t0d0
> * zpool scrub pool01
I would do this a bit differently (the full sequence is sketched after
the list):
* zpool offline pool01 c3t0d0
** IF you believe the hard drive is faulty (vs. random single error),
physically replace
* zpool replace pool01 c3t0d0
** wait for automagic resilver (~36 minutes)
* zpool detach pool01 c3t14d0
* zpool add pool01 spare c3t14d0
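Put together, it would look roughly like this at the shell (same device
names as above; this assumes the replacement disk goes into c3t0d0's
physical slot, so the one-argument form of "zpool replace" is enough):

  zpool offline pool01 c3t0d0
  # ...physically swap the disk here if you believe it is bad...
  zpool replace pool01 c3t0d0       # resilver starts automatically
  zpool status pool01               # watch progress; ~36 minutes last time
  zpool detach pool01 c3t14d0       # release the hot spare
  zpool add pool01 spare c3t14d0    # and return it to spare duty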
I like "my" way better because the drives/spares stay in the same
locations (you might have them labelled a certain way) but it does
require an additional copying of the data back (if you left checksums
on, this shouldn't be a problem). No additional scrub should be
necessary, but if you are doubting the drive, you can re-scrub just in
case (you should be scrubbing weekly or monthly anyway, right?).
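A weekly scrub is easy to automate, e.g. with a root crontab entry
along these lines (Sunday 3 a.m. picked arbitrarily):

  # crontab -e (as root), then add:
  0 3 * * 0 /usr/sbin/zpool scrub pool01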
Jan