[OpenIndiana-discuss] Crossbow performance & bit rot

Lou Picciano loupicciano at comcast.net
Fri Jan 20 13:53:12 UTC 2012


Yes, Jason, now that James has mentioned the 'Interrupt Storm' matter... 

You've piqued my interest. Any chance you're having all these problems on a SandyBridge system? Would love to hear more. In our case, we'd seen interrupt faults rising into the _multiple millions_ over a 5-second interval before all hell started breaking loose. Though our (latest series of) problems were not network-related, the interrupt storms may be responsible for problems with ZFS. 

Lou Picciano 

----- Original Message -----
From: "James Carlson" <carlsonj at workingcode.com> 
To: "Jason Matthews" <jason at broken.net> 
Cc: openindiana-discuss at openindiana.org 
Sent: Thursday, January 19, 2012 12:26:30 PM 
Subject: Re: [OpenIndiana-discuss] Crossbow performance & bit rot 

On 01/17/12 14:01, Jason Matthews wrote: 
>> Another fairly likely possibility would be the hardware offload 
>> features that some drivers can use -- checksum and LSO. If the 
>> driver you're using can turn off these features, I strongly 
>> recommend trying that. I'd quickly run out of appendages before 
>> I could count the number of times that "hardware acceleration" 
>> has thrown performance in the toilet. 
> 
> The igb driver doesn't have any tunables for disabling hardware 
> acceleration. 

It has a lot of tunables. First, there are the ones documented in 
/kernel/drv/igb.conf. Of those, perhaps the most interesting is 
flow_control. It looks like the hardware acceleration ones here are the 
queuing-related ones, but they appear to be disabled by default. 
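
As a concrete example (a sketch only -- check the comments in the igb.conf
that ships with your build for the exact values your driver accepts),
flow control is tuned like this:

    # /kernel/drv/igb.conf
    # 0 = no flow control, 1 = receive pause only,
    # 2 = transmit pause only, 3 = both (typically the default)
    flow_control = 0;

A change like that takes effect after the driver is reloaded or the system
is rebooted.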

Then there are ones that can be controlled via igb.conf but that aren't 
advertised in the default file that ships. These include: 

intr_force 
tx_hcksum_enable 
rx_hcksum_enable 
lso_enable 
tx_head_wb_enable 
tx_copy_threshold 
tx_recycle_threshold 
tx_overload_threshold 
tx_resched_threshold 
rx_copy_threshold 
rx_limit_per_intr 
intr_throttling 
mcast_max_num 

Several of those control hardware acceleration features. 
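
If you want to try turning the offload features off, a minimal sketch of
what that might look like in /kernel/drv/igb.conf (assuming your igb build
honors these properties -- worth verifying against the driver source for
your release):

    # Disable TX/RX checksum offload and large segment offload
    tx_hcksum_enable = 0;
    rx_hcksum_enable = 0;
    lso_enable = 0;

followed by a reboot (or an unload/reload of the driver) so the new values
are picked up.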

Then there are the private link properties that you can access via dladm 
show-linkprop (but that don't show up unless specifically requested): 

_tx_copy_thresh 
_tx_recycle_thresh 
_tx_overload_thresh 
_tx_resched_thresh 
_rx_copy_thresh 
_rx_limit_per_intr 
_intr_throttling 
_adv_pause_cap 
_adv_asym_pause_cap 

All of those look interesting, but would likely require some 
driver-level information to use properly. 
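
As a starting point, you can at least inspect them by naming them
explicitly, for example:

    # Private properties are hidden unless requested by name
    dladm show-linkprop -p _intr_throttling,_rx_limit_per_intr,_tx_copy_thresh igb0

Whether each one can be changed with dladm set-linkprop varies per
property; without driver-level documentation I'd treat any change as an
experiment.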

> I did, however, issue 'echo dohwcksum/W 0 | mdb -kw' 
> 
> I assume that turned off hardware checksums. Flipping that bit made no 
> difference, however. 
> 
> I added a Broadcom 5709 dual-port card which uses the bnx driver. It 
> experiences the same degradation as the Intel chips. The Broadcom driver is 
> configurable for turning off hardware checksums, so I did that in 
> bnx.conf. It made no difference; the network performance still deteriorates. 

That's interesting. 

Have you tried isolating other components? Is the behavior the same on 
all switch ports? Does it differ if you're connected to a different 
brand of switch? 

Is there anything special going on here? For example, do you have 
standard IEEE link negotiation turned off -- forcing duplex or link speed? 
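
One quick way to check is to look at the negotiated state and the
advertised autonegotiation capability directly, e.g.:

    # Shows negotiated speed/duplex and whether autonegotiation is advertised
    dladm show-linkprop -p speed,duplex,adv_autoneg_cap igb0

A forced-speed or forced-duplex mismatch between the NIC and the switch is
a classic cause of mysterious, creeping throughput problems.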

> I could find no options to disable LSO on either the Broadcom or Intel 
> cards. Is there an object to tweak with mdb? 

It's a driver property; see above. (Tweaking with mdb is at least 
theoretically possible, but is actually fairly hard to do.) 

>> I suggest: 
>> 
>> - using dtrace to find out what the code paths are when traffic is 
>> working fine and when it's slow; the difference may reveal something, 
>> 
> 
> Tracking this with dtrace is currently out of my league, but I take 
> direction well. 

It's an interactive exercise, unfortunately, and I don't think I have 
the time to walk through it here. 
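
If you want a rough starting point, though, something as simple as counting
which igb kernel functions fire during a "fast" window versus a "slow"
window can be revealing (a sketch, assuming the fbt provider is available
and the driver module is named igb):

    # Count igb function entries for 10 seconds, then exit
    dtrace -n 'fbt:igb::entry { @[probefunc] = count(); } tick-10s { exit(0); }'

Diffing the two sets of counts often points at the code path that changes
when performance falls off.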

> My current workaround is that I expanded the number of zones running on the 
> four web servers from 8 zones total to 28 zones total. This has dramatically 
> slowed the degradation. I would have gone to 32 zones, but the current IP 
> subnet won't support it. Before expanding to 28 zones, I was forced to delete 
> the vnics and reboot the zones each day. 
> 
> 
> From time to time, I see corrections in the response time graphs. For 
> instance, one or more zones/vnics on a given system experience a sudden 
> drop in response times, say from 5 ms to less than 1 ms. When this happens, 
> occasionally a cycle starts where every one to two hours the performance 
> resets back under 1 ms (I view this as a good thing) and then begins to rise 
> again. Eventually, some large spike in response time comes around, the cycle 
> breaks, and the system resumes the slow climb to certain death. 
> 
> I could use some more ideas. I am considering putting a pre-crossbow S10, 
> oi148, and/or S11 on the front line here to see if the problem goes away. I'm 
> starting to scrape the bottom of the barrel for ideas. 

Those symptoms seem to point in the direction of something much broader; 
a system-wide issue with interrupts, perhaps. 

Have you seen the postings about APIC issues? Could those apply here? 

set apix:apic_timer_preferred_mode = 0x0 
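
(That line would go in /etc/system and takes effect after a reboot. If you
want to see what a running system is using first, something along these
lines should show it -- assuming the apix module, rather than the older
pcplusmp, is the one loaded:

    # Read the current value of the apix tunable on a live kernel
    echo 'apix`apic_timer_preferred_mode/X' | mdb -k

)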

-- 
James Carlson 42.703N 71.076W <carlsonj at workingcode.com> 
