[OpenIndiana-discuss] disconnected drives, how to avoid in the future?

Karl Rossing karl.rossing at barobinson.com
Thu Apr 12 17:30:13 UTC 2012


I'm running into this issue with disconnected drives on snv_134.

Would upgrading to oi_151a2 give me the updated mpt_sas driver, as noted on

http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ 

"Update (New): These timeouts don’t do squat because mpt_sas doesn’t 
honour the timeouts. This was recently uncovered by Nexenta and a patch 
to fix it is about to hit Illumos shortly. I’ll post when it does. 
Another patch is in progress which will further improve how mpt_sas 
handles failed drives. Thanks to Albert Lee for his work on them - you, 
sir, rock!"

Karl

On 01/10/2012 10:48 AM, Martin Frost wrote:
>   >  From: Jason Matthews<jason at broken.net>
>   >  Date: Tue, 10 Jan 2012 08:26:08 -0800
>   >
>   >
>   >  you can adjust the disk timeouts in solaris.
>
> Here's an article on how to do that, although it ends with the author
> adding this comment "However in testing with failing harddrives (on
> mpt_sas anyway), we see that the sd timeouts are completely ignored so
> my entire post above is moot!"
>
>    http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>
> I haven't tested this, so does it work or not (in OpenIndiana)?
>
> Martin
>
>   >  there are two schools of thought here:
>   >
>   >  1) accommodate the extremely long timeouts of consumer drives and
>   >  let the drive decide whether to report an error back (fail itself
>   >  out)
>   >
>   >  2) set the timeouts very narrowly and be aggressive in letting zfs
>   >  fail out disks.
>   >
>   >  i generally go with option 2.
>   >
>   >  Sent from Jasons' hand held
>   >
>   >  On Jan 10, 2012, at 7:13 AM, Maurilio Longo<maurilio.longo at libero.it>  wrote:
>   >
>   >  >  Geoff,
>   >  >
>   >  >  I've hit this problem several times in the past, with OpenSolaris
>   >  >  and then with OpenIndiana.
>   >  >
>   >  >  There are, to my knowledge, no available solutions; it is so by
>   >  >  design!
>   >  >
>   >  >  If a disk stops responding, the pool waits until it responds
>   >  >  again (sometimes pulling it out of its slot and then reinserting
>   >  >  the disk causes a reset of the link and it starts working again).
>   >  >
>   >  >  I was not able to assess what happens if I set failmode to continue.
>   >  >
>   >  >  I think it could be no better since you still cannot write to the pool.
>   >  >
>   >  >  This is IMHO the biggest problem of ZFS, in that I cannot
>   >  >  instruct it to stop using a failed device if it has some level of
>   >  >  redundancy still available.
>   >  >
>   >  >  Wait is OK only if an entire vdev stops responding, not if a disk
>   >  >  in a vdev with redundancy has problems either fatal or
>   >  >  transitory.
>   >  >
>   >  >  Best regards.
>   >  >
>   >  >  Maurilio.
>   >  >
>   >  >
>   >  >  PS. Using server grade disks (those with TLER) makes it possible
>   >  >  to overcome this problem for transitory errors.
>   >  >
>   >  >
>   >  >  Geoff Nordli wrote:
>   >  >
>   >  >>  Part of my concern is why one disk would have completely brought
>   >  >>  down the system.  I have seen this come up on the list before,
>   >  >>  but I don't remember any resolutions to fixing it.
>   >  >>
>   >  >>  Anyone have any clues to try to prevent this from happening in
>   >  >>  the future?
>   >  >>
>   >  >>  thanks,
>   >  >>
>   >  >>  Geoff
>





