[OpenIndiana-discuss] disconnected drives, how to avoid in the future?

Rich rercola at acm.jhu.edu
Thu Apr 12 21:42:38 UTC 2012


As of when I looked today, those patches aren't yet in the OI/illumos mainline.

Regarding when they'll be usable, either in mainline or by fetching
them yourself...

17:33 < PMT> ping Triskelios - I don't suppose you have your pending
             patches to mpt_sas (per
             http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/)
             laying around somewhere easily grabbable?
17:34 <@Triskelios> not at the moment, should land on our public repo
             on bitbucket sometime soon

- Rich

On Thu, Apr 12, 2012 at 1:30 PM, Karl Rossing
<karl.rossing at barobinson.com> wrote:
> I'm running into this issue with disconnected drives on snv_134.
>
> Would upgrading to oi_151a2 give me the updated mpt_sas driver noted on
>
> http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
> "Update (New): These timeouts don’t do squat because mpt_sas doesn’t honour
> the timeouts. This was recently uncovered by Nexenta and a patch to fix it
> is about to hit Illumos shortly. I’ll post when it does. Another patch is in
> progress which will further improve how mpt_sas handles failed drives.
> Thanks to Albert Lee for his work on them - you, sir, rock!"
>
> Karl
>
>
> On 01/10/2012 10:48 AM, Martin Frost wrote:
>>
>>  >  From: Jason Matthews <jason at broken.net>
>>  >  Date: Tue, 10 Jan 2012 08:26:08 -0800
>>  >
>>  >
>>  >  You can adjust the disk timeouts in Solaris.
>>
>> Here's an article on how to do that, although it ends with the author
>> adding this comment "However in testing with failing harddrives (on
>> mpt_sas anyway), we see that the sd timeouts are completely ignored so
>> my entire post above is moot!"
>>
>>
>> http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>>
>> I haven't tested this. Does it actually work in OpenIndiana?
>>
>> Martin
>>
>>  >  There are two schools of thought here:
>>  >
>>  >  1) Accommodate the extremely long timeouts of consumer drives and
>>  >  let the drive decide whether to report an error back (fail itself
>>  >  out).
>>  >
>>  >  2) Set the timeouts very narrowly and be aggressive in letting ZFS
>>  >  fail out disks.
>>  >
>>  >  I generally go with option 2.
>>  >
>>  >
>>  >  On Jan 10, 2012, at 7:13 AM, Maurilio Longo <maurilio.longo at libero.it>
>>  >  wrote:
>>  >
>>  >  >  Geoff,
>>  >  >
>>  >  >  I've hit this problem several times in the past, with OpenSolaris
>>  >  >  and then with OpenIndiana.
>>  >  >
>>  >  >  To my knowledge, there is no available solution; it is this way
>>  >  >  by design!
>>  >  >
>>  >  >  If a disk stops responding, the pool waits until it responds
>>  >  >  again (sometimes pulling the disk out of its slot and reinserting
>>  >  >  it resets the link, and it starts working again).
>>  >  >
>>  >  >  I was not able to assess what happens if I set failmode to
>>  >  >  continue.
>>  >  >
>>  >  >  I suspect it would be no better, since you still could not write
>>  >  >  to the pool.
>>  >  >
>>  >  >  This is, IMHO, the biggest problem with ZFS: I cannot instruct it
>>  >  >  to stop using a failed device while some level of redundancy is
>>  >  >  still available.
>>  >  >
>>  >  >  Waiting is OK only if an entire vdev stops responding, not if a
>>  >  >  disk in a vdev with redundancy has problems, whether fatal or
>>  >  >  transitory.
>>  >  >
>>  >  >  Best regards.
>>  >  >
>>  >  >  Maurilio.
>>  >  >
>>  >  >
>>  >  >  PS. Using server-grade disks (those with TLER, Time-Limited Error
>>  >  >  Recovery) makes it possible to overcome this problem for
>>  >  >  transitory errors.
>>  >  >
>>  >  >
>>  >  >  Geoff Nordli wrote:
>>  >  >
>>  >  >>  Part of my concern is why one disk would bring down the entire
>>  >  >>  system.  I have seen this come up on the list before, but I
>>  >  >>  don't remember it ever being resolved.
>>  >  >>
>>  >  >>  Does anyone have any ideas on how to prevent this from happening
>>  >  >>  in the future?
>>  >  >>
>>  >  >>  thanks,
>>  >  >>
>>  >  >>  Geoff
>>
>> _______________________________________________
>> OpenIndiana-discuss mailing list
>> OpenIndiana-discuss at openindiana.org
>> http://openindiana.org/mailman/listinfo/openindiana-discuss
>
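
Jason's two schools of thought come down to one question: how long can a single unresponsive disk stall a command before anything reports an error? The Python sketch below works through that back-of-envelope arithmetic under simple, assumed conditions. Each command is retried a fixed number of times, and every attempt waits out the full per-command timeout. The timeout and retry values and the names `TimeoutPolicy` and `worst_case_stall` are hypothetical illustrations. They are not measured sd or mpt_sas defaults. As the quoted blog update notes, mpt_sas may not have honoured the sd timeouts at all at the time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutPolicy:
    """Hypothetical per-command timeout policy (illustrative values only)."""
    name: str
    io_timeout_s: int  # seconds the host waits for one attempt of a command
    retries: int       # extra attempts made after the first one times out


def worst_case_stall(policy: TimeoutPolicy) -> int:
    """Seconds before the host gives up on a command that never completes.

    Assumes every attempt (the first plus each retry) runs to the full
    timeout, and that nothing else (drive-side error recovery, bus
    resets) shortens or lengthens the wait.
    """
    return policy.io_timeout_s * (1 + policy.retries)


# Option 1: generous timeouts, letting a consumer drive finish its own
# (possibly very long) internal error recovery and report back.
lenient = TimeoutPolicy("option 1: accommodate long drive recovery", 60, 3)

# Option 2: narrow timeouts, so a misbehaving disk is failed out quickly
# and ZFS can fall back on the pool's redundancy.
aggressive = TimeoutPolicy("option 2: narrow timeouts", 10, 1)

for p in (lenient, aggressive):
    print(f"{p.name}: up to {worst_case_stall(p)} s stalled per command")
```

With these made-up numbers, a command that never completes blocks for up to 240 s under option 1 and up to 20 s under option 2. The price of option 2 is that a disk that would eventually have recovered gets faulted, which is only safe when the vdev still has redundancy.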

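Maurilio objects that ZFS keeps waiting on a stalled disk even when the vdev could keep serving I/O without it. The policy he asks for can be sketched as a simple rule. Fail out the unresponsive disks only if the vdev's remaining redundancy covers them. Otherwise, wait. This is a hypothetical toy model in Python. It is not how ZFS or the sd/mpt_sas drivers actually decide, and the names `Vdev`, `tolerance` and `should_fail_out` are illustrative. The tolerance figures themselves are standard: an n-way mirror survives n - 1 missing disks, and raidz1, raidz2 and raidz3 survive 1, 2 and 3 missing disks respectively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vdev:
    """Toy description of a redundant vdev (hypothetical model, not ZFS code)."""
    kind: str         # "mirror", "raidz1", "raidz2" or "raidz3"
    width: int        # number of disks in the vdev
    faulted: int = 0  # disks already out of service


def tolerance(vdev: Vdev) -> int:
    """How many missing disks the vdev can survive and still serve data."""
    if vdev.kind == "mirror":
        return vdev.width - 1
    parity = {"raidz1": 1, "raidz2": 2, "raidz3": 3}
    if vdev.kind not in parity:
        raise ValueError(f"unknown vdev kind: {vdev.kind}")
    return parity[vdev.kind]


def should_fail_out(vdev: Vdev, unresponsive: int) -> bool:
    """Proposed rule: drop stalled disks only if the vdev can do without them.

    Returns True if the vdev could still serve data after faulting the
    unresponsive disks, and False if faulting them would take the whole
    vdev offline, so that waiting is the only safe option.
    """
    return vdev.faulted + unresponsive <= tolerance(vdev)


print(should_fail_out(Vdev("raidz1", 5), 1))             # True
print(should_fail_out(Vdev("raidz1", 5, faulted=1), 1))  # False
print(should_fail_out(Vdev("mirror", 3), 2))             # True
```

The second call is the case where waiting remains justified. A raidz1 that has already lost one disk has no redundancy left, so faulting a second stalled disk would make the entire vdev unavailable.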

