[OpenIndiana-discuss] disconnected drives, how to avoid in the future?

Maurilio Longo maurilio.longo at libero.it
Wed Jan 11 08:04:26 UTC 2012


Martin,

You can set a timeout with lsiutil, but I've found that it makes no
difference: if a consumer-grade disk starts trying to read a failing
sector, it can block a pool indefinitely.
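
To make the difference between "wait for the drive" and "fail it out
quickly" concrete, here is a toy model in Python. It is only a sketch of
the reasoning, not how ZFS or the mpt driver actually behaves: the Disk
class, the mirrored_read function and the device names are all made up
for illustration, and a response time of None stands for a disk that
never answers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Disk:
    name: str                    # hypothetical device name
    response_s: Optional[float]  # None models a disk that never answers


def mirrored_read(
    disks: List[Disk], timeout_s: Optional[float]
) -> Tuple[Optional[str], List[str], float]:
    """Try each side of a mirror in order.

    Returns (name of the disk that served the read, or None if no disk did,
             names of disks given up on along the way,
             seconds spent waiting).
    timeout_s=None models "wait for the drive, however long it takes".
    """
    failed: List[str] = []
    waited = 0.0
    for disk in disks:
        answers_in_time = disk.response_s is not None and (
            timeout_s is None or disk.response_s <= timeout_s
        )
        if answers_in_time:
            return disk.name, failed, waited + disk.response_s
        if timeout_s is None:
            # No deadline: the caller is stuck behind this disk forever.
            return None, failed, float("inf")
        # Deadline hit: give up on this disk and try the next one.
        failed.append(disk.name)
        waited += timeout_s
    return None, failed, waited


mirror = [Disk("c1t0d0", None), Disk("c1t1d0", 0.25)]
print(mirrored_read(mirror, None))  # no deadline
print(mirrored_read(mirror, 5.0))   # 5 second deadline per disk
```

With no deadline the read never completes even though the second disk
is healthy; with a deadline the first disk is dropped and the second
one serves the read. What I see in practice looks like the first case,
whatever timeout I set.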

Best regards.

Maurilio.

Martin Frost wrote:
>  > From: Jason Matthews <jason at broken.net>
>  > Date: Tue, 10 Jan 2012 08:26:08 -0800
>  > 
>  > 
>  > You can adjust the disk timeouts in Solaris.
> 
> Here's an article on how to do that, although it ends with the author
> adding this comment "However in testing with failing harddrives (on
> mpt_sas anyway), we see that the sd timeouts are completely ignored so
> my entire post above is moot!"
> 
>   http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
> 
> I haven't tested this, so does it work or not (in OpenIndiana)?
> 
> Martin
> 
>  > there are two schools of thought here:
>  > 
>  > 1) accommodate the extremely long timeouts of consumer drives and
>  > let the drive decide whether to report an error back (fail itself
>  > out)
>  > 
>  > 2) set the timeouts very narrowly and be aggressive in letting ZFS
>  > fail out disks.
>  > 
>  > I generally go with option 2.
>  > 
>  > Sent from Jasons' hand held
>  > 
>  > On Jan 10, 2012, at 7:13 AM, Maurilio Longo <maurilio.longo at libero.it> wrote:
>  > 
>  > > Geoff,
>  > > 
>  > > I've hit this problem several times in the past, with OpenSolaris
>  > > and then with OpenIndiana.
>  > > 
>  > > There are, to my knowledge, no available solutions; it is so by
>  > > design!
>  > > 
>  > > If a disk stops responding, the pool waits until it responds
>  > > again (sometimes pulling it out of its slot and then reinserting
>  > > it causes a reset of the link and it starts working again).
>  > > 
>  > > I was not able to assess what happens if I set failmode to continue.
>  > > 
>  > > I think it could be no better since you still cannot write to the pool.
>  > > 
>  > > This is IMHO the biggest problem of ZFS, in that I cannot
>  > > instruct it to stop using a failed device if it has some level of
>  > > redundancy still available.
>  > > 
>  > > Wait is OK only if an entire vdev stops responding, not if a disk
>  > > in a vdev with redundancy has problems either fatal or
>  > > transitory.
>  > > 
>  > > Best regards.
>  > > 
>  > > Maurilio.
>  > > 
>  > > 
>  > > PS. Using server grade disks (those with TLER) makes it possible
>  > > to overcome this problem for transitory errors.
>  > > 
>  > > 
>  > > Geoff Nordli wrote:
>  > > 
>  > >> Part of my concern is why one disk would have completely brought
>  > >> down the system.  I have seen this come up on the list before,
>  > >> but I don't remember any resolutions to fixing it.
>  > >> 
>  > >> Anyone have any clues to try to prevent this from happening in
>  > >> the future?
>  > >> 
>  > >> thanks,
>  > >> 
>  > >> Geoff
> 
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss
> 

-- 
 __________
|  |  | |__| Maurilio Longo
|_|_|_|____| farmaconsult s.r.l.




