[OpenIndiana-discuss] disconnected drives, how to avoid in the future?
Maurilio Longo
maurilio.longo at libero.it
Wed Jan 11 08:04:26 UTC 2012
Martin,
you can set a timeout with lsiutil, but I've found that it makes no
difference: if a consumer-grade disk starts trying to read a failing sector, it
can block a pool indefinitely.
Best regards.
Maurilio.
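Editorial sketch (not part of the original message): the trade-off between
the two timeout approaches quoted below can be put into rough numbers. This
assumes a simplified model in which a command gets one initial attempt plus a
fixed number of retries, and each attempt may run to the full timeout. The
function name and the sample values are hypothetical and are not defaults
taken from this thread; as noted in the thread, the sd timeouts may be ignored
entirely on mpt_sas, in which case this bound does not apply.

```python
def worst_case_stall_seconds(io_timeout_s: int, retries: int) -> int:
    """Upper bound on how long one command can stall under the simplified
    model: the initial attempt plus every retry runs to the full timeout."""
    if io_timeout_s < 0 or retries < 0:
        raise ValueError("timeout and retries must be non-negative")
    return io_timeout_s * (retries + 1)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not measured or documented here).
    scenarios = [
        ("long timeouts, drive decides", 60, 5),
        ("narrow timeouts, ZFS fails the disk out", 5, 1),
    ]
    for label, timeout_s, retries in scenarios:
        stall = worst_case_stall_seconds(timeout_s, retries)
        print(f"{label}: up to {stall} s per stalled command")
```

Under these assumed values the first setting can hold a command for minutes,
while the second gives up within seconds, which is the reasoning behind
preferring narrow timeouts when the pool still has redundancy.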
Martin Frost wrote:
> > From: Jason Matthews <jason at broken.net>
> > Date: Tue, 10 Jan 2012 08:26:08 -0800
> >
> >
> > you can adjust the disk timeouts in solaris.
>
> Here's an article on how to do that, although it ends with the author
> adding this comment "However in testing with failing harddrives (on
> mpt_sas anyway), we see that the sd timeouts are completely ignored so
> my entire post above is moot!"
>
> http://blogs.everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>
> I haven't tested this, so does it work or not (in OpenIndiana)?
>
> Martin
>
> > there are two schools of thought here:
> >
> > 1) accommodate the extremely long timeouts of consumer drives and
> > let the drive decide whether to report an error back (fail itself
> > out)
> >
> > 2) set the timeouts very narrowly and be aggressive in letting zfs
> > fail out disks.
> >
> > i generally go with option 2.
> >
> > Sent from Jasons' hand held
> >
> > On Jan 10, 2012, at 7:13 AM, Maurilio Longo <maurilio.longo at libero.it> wrote:
> >
> > > Geoff,
> > >
> > > I've hit this problem several times in the past, with OpenSolaris
> > > and then with OpenIndiana.
> > >
> > > There are, to my knowledge, no available solutions; it is so by
> > > design!
> > >
> > > If a disk stops responding the pool waits until after it responds
> > > again (sometimes pulling it out of its slot and then reinserting
> > > the disk causes a reset of the link and it starts working again).
> > >
> > > I was not able to assess what happens if I set failmode to continue.
> > >
> > > I think it could be no better since you still cannot write to the pool.
> > >
> > > This is IMHO the biggest problem of ZFS, in that I cannot
> > > instruct it to stop using a failed device if it has some level of
> > > redundancy still available.
> > >
> > > Wait is OK only if an entire vdev stops responding, not if a disk
> > > in a vdev with redundancy has problems either fatal or
> > > transitory.
> > >
> > > Best regards.
> > >
> > > Maurilio.
> > >
> > >
> > > PS. Using server grade disks (those with TLER) makes it possible
> > > to overcome this problem for transitory errors.
> > >
> > >
> > > Geoff Nordli wrote:
> > >
> > >> Part of my concern is why one disk would have completely brought
> > >> down the system. I have seen this come up on the list before,
> > >> but I don't remember any resolutions to fixing it.
> > >>
> > >> Anyone have any clues to try to prevent this from happening in
> > >> the future?
> > >>
> > >> thanks,
> > >>
> > >> Geoff
--
__________
| | | |__| Maurilio Longo
|_|_|_|____| farmaconsult s.r.l.