[OpenIndiana-discuss] Sudden ZFS performance issue

Saso Kiselkov skiselkov.ml at gmail.com
Fri Jul 5 18:00:24 UTC 2013


On 05/07/2013 17:08, wim at vandenberge.us wrote:
> Good morning,
> 
> I have a weird problem with two of the 15+ OpenSolaris storage servers in our
> environment. All the Nearline servers are essentially the same. Supermicro
> X9DR3-F based server, Dual E5-2609's, 64GB memory, Dual 10Gb SFP+ NICs, LSI
> 9200-8e HBA, Supermicro CSE-826E26-R1200LPB storage arrays and Seagate
> enterprise 2TB SATA or SAS drives (not mixed within a server). Root, L2ARC and
> ZIL are all on Intel SSDs (SLC series 313 for ZIL, MLC 520 for L2ARC and MLC 330
> for boot).
> 
> The volumes are built out of 9-drive Z1 groups, ashift is set to 9 (which is
> supposed to be appropriate for the enterprise Seagates). The pools are large
> (120-130TB) but are only between 27 and 32% full. Each server serves an iSCSI
> (Comstar) and a CIFS (in-kernel server) volume from the same pool. I realize this
> is not optimal from a recovery/resilver/rebuild standpoint but the servers are
> replicated and the data is easily rebuildable.
> 
> Initially these servers did great for several months. While certainly no speed
> demons, 300+ MB/sec for sequential reads/writes was not a problem. Several weeks
> ago, literally overnight, replication times went through the roof for one
> server. Simple testing showed that reading from the pool would no longer go over
> 25MB/s. Even a scrub that used to run at 400+ MB/sec is now crawling along at
> below 40MB/s.
> 
> Sometime yesterday the second server started to exhibit the exact same
> behaviour. This one is used even less (it's our D2D2T server) and data is
> written to it at night and read during the day to be written to tape.
> 
> I've exhausted all I know and I'm at a loss. Does anyone have any ideas of what
> to look at, or do any obvious reasons for this behaviour jump out from the
> configuration above?

Isn't iostat -Exn reporting some transport errors? Smells like a drive
gone bad and forcing retries, which would cause about a 10x decrease in
performance. Just a guess, though.
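
To make that check quicker across a shelf of drives, here is a minimal sketch in Python that scans the per-device error summary and lists any device with a nonzero hard or transport error count. It assumes the summary lines look like the ones `iostat -En` prints on illumos/Solaris ("<device> Soft Errors: N Hard Errors: N Transport Errors: N"); the function name `suspect_devices` and the sample text are made up for illustration and are not output from the servers discussed here.

```python
import re

# Assumed shape of an `iostat -En` error summary line:
#   c1t0d0   Soft Errors: 0 Hard Errors: 3 Transport Errors: 12
ERROR_LINE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)"
)


def suspect_devices(iostat_text):
    """Return {device: (soft, hard, transport)} for devices that
    report at least one hard or transport error."""
    suspects = {}
    for line in iostat_text.splitlines():
        match = ERROR_LINE.match(line)
        if match is None:
            continue  # vendor/size detail lines and blanks are skipped
        soft = int(match.group("soft"))
        hard = int(match.group("hard"))
        transport = int(match.group("transport"))
        if hard or transport:
            suspects[match.group("dev")] = (soft, hard, transport)
    return suspects


# Invented sample text, for illustration only.
SAMPLE = """\
c1t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
c1t1d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
c1t2d0           Soft Errors: 0 Hard Errors: 3 Transport Errors: 41
"""

for dev, counts in suspect_devices(SAMPLE).items():
    print(dev, "soft/hard/transport =", counts)
```

Feed it the captured text of `iostat -En` from the slow server. Soft errors alone are deliberately not flagged in this sketch, since the suspicion here is about transport errors forcing retries.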

Cheers,
-- 
Saso


