[OpenIndiana-discuss] Problems with zpool stalling

Richard Elling richard.elling at richardelling.com
Mon Mar 12 15:52:09 UTC 2012


On Mar 12, 2012, at 2:36 AM, Hans Joergensen wrote:

> Hello,
> 
> I've been having this problem with several storage servers serving
> mostly NFS clients (ESXi), and I've seen it on both Nexenta and
> OI.
> 
> I/O seems to freeze/pause when write IOPS are high.

There are many possible reasons for this. A full analysis can be time-consuming.

> 
> Today I noticed it while running zilstat.ksh and iostat:
> 
> zilstat.ksh
>   N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate ops  <=4kB 4-32kB >=32kB
>  56030032   56030032   56030032  100241408  100241408  100241408 820      0     30    790
>   2645352    2645352    2645352    5058560    5058560    5058560  50      0     11     39
>     68792      68792      68792     131072     131072     131072   1      0      0      1
>         0          0          0          0          0          0   0      0      0      0
>         0          0          0          0          0          0   0      0      0      0
>   6648152    6648152    6648152   12996608   12996608   12996608 125      0     19    106
>     10576      10576      10576     708608     708608     708608  11      0      3      8
>    663648     663648     663648    2539520    2539520    2539520  30      0      3     27
>  58724208   58724208   58732968  109330432  109330432  109330432 916      5     16    896
> 
> iostat -xnz 5 showed everything, including busy (%b), drop to around
> zero at the same time.
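
A minimal Python sketch of pulling the stall out of zilstat output like the sample above. It assumes the ten-column layout shown in the post and treats any interval with zero ZIL ops as a stall; the function names and the `min_ops` threshold are illustrative, not part of zilstat.ksh.

```python
"""Flag zilstat.ksh intervals where ZIL activity stalls (a sketch)."""

from typing import Dict, List

# Column order as printed in the zilstat.ksh output quoted above.
COLUMNS = ["N-Bytes", "N-Bytes/s", "N-Max-Rate",
           "B-Bytes", "B-Bytes/s", "B-Max-Rate",
           "ops", "<=4kB", "4-32kB", ">=32kB"]

# The samples from the post, one row per interval.
SAMPLE = """\
  N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate ops  <=4kB 4-32kB >=32kB
 56030032   56030032   56030032  100241408  100241408  100241408 820      0     30    790
  2645352    2645352    2645352    5058560    5058560    5058560  50      0     11     39
    68792      68792      68792     131072     131072     131072   1      0      0      1
        0          0          0          0          0          0   0      0      0      0
        0          0          0          0          0          0   0      0      0      0
  6648152    6648152    6648152   12996608   12996608   12996608 125      0     19    106
    10576      10576      10576     708608     708608     708608  11      0      3      8
   663648     663648     663648    2539520    2539520    2539520  30      0      3     27
 58724208   58724208   58732968  109330432  109330432  109330432 916      5     16    896
"""


def parse_zilstat(text: str) -> List[Dict[str, int]]:
    """Convert zilstat.ksh text into one {column: value} dict per interval."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows hold one integer per column; the header row is skipped.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(COLUMNS, map(int, fields))))
    return samples


def stalled_intervals(samples: List[Dict[str, int]], min_ops: int = 1) -> List[int]:
    """Return the indexes of intervals with fewer than min_ops ZIL ops."""
    return [i for i, s in enumerate(samples) if s["ops"] < min_ops]


if __name__ == "__main__":
    print(stalled_intervals(parse_zilstat(SAMPLE)))  # [3, 4]
```

Against the posted samples it reports intervals 3 and 4 (counting from 0), the two all-zero rows. That only pins down when the pause happened, not why.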

This shows you the symptom, but not the cause. To get to the cause, you
will need to correlate potential resource contention across all parts of the
system (including switches) with all error reports.
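
One way to start that correlation, sketched in Python under the assumption that every tool has been sampled on the same interval clock: flag the metrics that fall far below their own median in each interval, and separately flag any cumulative error counter that increases. The metric names, the values, and the 10%-of-median threshold are all hypothetical.

```python
"""Line up per-interval metrics from different tools to see what drops together."""

from statistics import median
from typing import Dict, List


def low_metrics(series: Dict[str, List[float]], frac: float = 0.1) -> Dict[int, List[str]]:
    """Map interval index -> metrics below `frac` of their own median in that interval."""
    n = min(len(vals) for vals in series.values())
    baseline = {name: median(vals[:n]) for name, vals in series.items()}
    result: Dict[int, List[str]] = {}
    for i in range(n):
        low = sorted(name for name, vals in series.items()
                     if vals[i] < frac * baseline[name])
        if low:
            result[i] = low
    return result


def counter_increases(counters: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Map interval index -> cumulative error counters that rose since the previous sample."""
    n = min(len(vals) for vals in counters.values())
    result: Dict[int, List[str]] = {}
    for i in range(1, n):
        up = sorted(name for name, vals in counters.items() if vals[i] > vals[i - 1])
        if up:
            result[i] = up
    return result


# Hypothetical samples, one value per interval from each tool (same clock assumed).
METRICS = {
    "zil_ops": [820, 50, 1, 0, 0, 125],
    "disk_busy_pct": [85, 40, 3, 0, 0, 60],
    "nfs_ops": [3000, 900, 20, 0, 0, 1500],
}
ERRORS = {"sas_transport_errors": [4, 4, 4, 5, 5, 5]}

if __name__ == "__main__":
    print(low_metrics(METRICS))
    print(counter_increases(ERRORS))
```

If every metric drops in the same intervals, as reported above for iostat, that is the symptom; a counter that rises in or just before those intervals is a more useful lead toward the cause.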

> 
> I can't really figure out where to go from here. It seems odd for this
> to be a ZIL problem, since the ZIL devices are not busy when the
> problem occurs.
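
A Python sketch for checking that from one `iostat -xnz` sample, assuming the standard `-xn` column layout (r/s through %b, then the device name). The device names and numbers are made up, and "c3t0d0" merely stands in for whatever `zpool status` lists under "logs".

```python
"""Check whether the log (ZIL) devices were busy in one `iostat -xn` sample."""

from typing import Dict, Iterable, List

# Column order of `iostat -xn`; the device name is the final field.
FIELDS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
          "wsvc_t", "asvc_t", "%w", "%b"]

# Hypothetical sample; device names and values are made up for illustration.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2  310.5    1.6 9120.4  0.0  9.8    0.0   31.6   0  99 c2t1d0
    0.0  305.1    0.0 8990.7  0.0  9.6    0.0   31.4   0  98 c2t2d0
    0.0   42.0    0.0 1210.0  0.0  0.1    0.0    0.4   0   2 c3t0d0
"""


def parse_iostat_xn(text: str) -> Dict[str, Dict[str, float]]:
    """Map device name -> {column: value} for each data row."""
    stats: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(FIELDS) + 1:
            continue  # title line or blank
        try:
            values = [float(f) for f in fields[:-1]]
        except ValueError:
            continue  # the column-header row
        stats[fields[-1]] = dict(zip(FIELDS, values))
    return stats


def idle_devices(stats: Dict[str, Dict[str, float]], devices: Iterable[str],
                 max_busy: float = 5.0) -> List[str]:
    """Return the given devices whose %b is at most max_busy.

    With `iostat -xnz`, a device whose statistics are all zero is not printed,
    so a device missing from `stats` is counted as idle.
    """
    return [d for d in devices if d not in stats or stats[d]["%b"] <= max_busy]


if __name__ == "__main__":
    stats = parse_iostat_xn(SAMPLE)
    # "c3t0d0" is a hypothetical log device name.
    print(idle_devices(stats, ["c3t0d0"]))
```

Comparing the log devices with the data disks from the same stalled interval shows whether only the log devices went quiet or the whole pool did.
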
> 
> Also I've seen this on multiple machines connected to different
> switches.
> 
> Any ideas? Anyone seen something like this before?
> 
> BTW, dmesg shows nothing, and the number of SAS errors is
> minimal.

Begin with the SAS errors. If you have any errors in the disk subsystem, then I/O
can be impacted. A well-managed system will show zero errors.
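
Assuming the SAS error counts come from `iostat -En`, which reports per-device soft, hard, and transport error counters, a Python sketch like the following lists the devices that are not at zero and, given two snapshots taken around a stall, the ones whose counts grew. The regular expression encodes an assumed output layout and the sample text is hypothetical.

```python
"""Summarize per-device error counters from `iostat -En` output (a sketch)."""

import re
from typing import Dict

Counts = Dict[str, Dict[str, int]]
KINDS = ("soft", "hard", "transport")

# Assumed layout of the first line of each device block in illumos
# `iostat -En` output: "<device> Soft Errors: N Hard Errors: N Transport Errors: N".
ERROR_LINE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)

# Hypothetical snapshots taken before and after a stall.
BEFORE = """\
c2t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
c2t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 3
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
"""
AFTER = BEFORE.replace("Transport Errors: 3", "Transport Errors: 5")


def error_counts(text: str) -> Counts:
    """Map device name -> {"soft": n, "hard": n, "transport": n}."""
    counts: Counts = {}
    for line in text.splitlines():
        m = ERROR_LINE.match(line)
        if m:
            counts[m.group("dev")] = {k: int(m.group(k)) for k in KINDS}
    return counts


def devices_with_errors(counts: Counts) -> Counts:
    """Devices with any nonzero counter; a clean system should return {}."""
    return {dev: c for dev, c in counts.items() if any(c.values())}


def new_errors(before: Counts, after: Counts) -> Counts:
    """The counters are cumulative, so diff two snapshots to see fresh errors."""
    delta: Counts = {}
    for dev, now in after.items():
        prev = before.get(dev, dict.fromkeys(KINDS, 0))
        diff = {k: now[k] - prev[k] for k in KINDS}
        if any(v > 0 for v in diff.values()):
            delta[dev] = diff
    return delta


if __name__ == "__main__":
    before, after = error_counts(BEFORE), error_counts(AFTER)
    print(devices_with_errors(after))
    print(new_errors(before, after))
```
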
 -- richard

> 
> Thanks,
> 
> Hans Jørgensen
> Denmark.
> 
> _______________________________________________
> OpenIndiana-discuss mailing list
> OpenIndiana-discuss at openindiana.org
> http://openindiana.org/mailman/listinfo/openindiana-discuss

-- 

ZFS storage and performance consulting at http://www.RichardElling.com