[OpenIndiana-discuss] Crossbow performance & bit rot

Jason Matthews jason at broken.net
Tue Jan 17 19:01:23 UTC 2012



James Carlson wrote:

> I don't know what is causing the problem you're seeing, but I do 
> have some guesses.
>
> Assuming it affects all packets, and not just those of a particular
> type, the slow packet delivery sounds a bit like either a driver problem
> or a problem with the (Crossbow-introduced) "polling" mode that drivers
> can use during periods of high traffic.

Yes, it impacts all types of packets.

> Another fairly likely possibility would be the hardware offload 
> features that some drivers can use -- checksum and LSO.  If the 
> driver you're using can turn off these features, I strongly 
> recommend trying that. I'd quickly run out of appendages before 
> I could count the number of times that "hardware acceleration"
> has thrown performance in the toilet.

The igb driver doesn't have any tunables for disabling hardware
acceleration.

I did however issue 'echo dohwcksum/W 0 | mdb -kw'

I assume that turned off hardware checksums. Flipping that bit made no
difference, however.

I added a Broadcom 5709 dual port card which uses the bnx driver. It
experiences the same degradation as the Intel chips. The Broadcom driver is
configurable for turning off hardware checksums and so I did that in
bnx.conf. It made no difference; network performance still deteriorates.

I could find no options to disable LSO on either the Broadcom or Intel
cards. Is there an object to tweak with mdb?

> I suggest:
>
> - using dtrace to find out what the code paths are when traffic is
> working fine and when it's slow; the difference may reveal something,
>

Tracking this with dtrace is currently out of my league, but I take
direction well.

> - installing a network interface card that uses a different driver
> (e.g., if you have Intel, install Broadcom) and see if that interface
> behaves differently.

This had no impact.

My current workaround is that I expanded the number of zones running on the
four web servers from 8 zones total to 28 zones total. This has dramatically
slowed the degradation. I would have gone to 32 zones but the current IP
subnet won't support it. Before expanding to 28 zones, I was forced to delete
the vnics and reboot the zones each day.


From time to time, I see corrections in the response time graphs. For
instance, one or more zones/vnics on a given system experience a sudden
drop in response times, say from 5 ms to less than 1 ms. When this happens,
occasionally a cycle starts where every one to two hours the performance
resets back under 1 ms (I view this as a good thing), and then begins to rise
again. Eventually, some large spike comes around in response time, the cycle
breaks, and the system resumes the slow climb to certain death.
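
A minimal sketch, in Python, of how such corrections could be picked out of
a series of response-time samples and how the spacing between them could be
measured. Everything here is an assumption for illustration: the function
names, the 50% drop threshold, the 15-minute sample period, and the sample
values are hypothetical and do not come from the actual graphs.

```python
from typing import List


def find_corrections(samples_ms: List[float], drop_ratio: float = 0.5) -> List[int]:
    """Return the indices of samples that look like a sudden correction.

    A sample counts as a correction when it is at most drop_ratio times
    the sample before it (for example, 5 ms falling to under 1 ms).
    The 0.5 default is an arbitrary, hypothetical threshold.
    """
    corrections = []
    for i in range(1, len(samples_ms)):
        prev, cur = samples_ms[i - 1], samples_ms[i]
        if prev > 0 and cur <= prev * drop_ratio:
            corrections.append(i)
    return corrections


def correction_intervals(indices: List[int], sample_period_s: int) -> List[int]:
    """Return the time in seconds between consecutive corrections."""
    return [(b - a) * sample_period_s for a, b in zip(indices, indices[1:])]


if __name__ == "__main__":
    # Made-up response times in ms, one sample every 15 minutes (assumed).
    samples = [0.8, 1.5, 2.5, 4.0, 5.0, 0.9, 1.6, 2.8, 4.2, 0.8, 1.4]
    resets = find_corrections(samples)
    print("corrections at sample indices:", resets)
    print("seconds between corrections:", correction_intervals(resets, 900))
```

Run against per-zone samples, something along these lines would show
whether the resets really recur every one to two hours and whether they line
up across vnics on the same system.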

I could use some more ideas. I am considering putting a pre-crossbow S10,
oi148, and/or S11 on the front line here to see if the problem goes away. I
am starting to scrape the bottom of the barrel for ideas.



j.

