[OpenIndiana-discuss] Crossbow performance & bit rot

Jason Matthews jason at broken.net
Mon Jan 23 22:42:32 UTC 2012



-----Original Message-----
From: James Carlson [mailto:carlsonj at workingcode.com] 


> It has a lot of tunables.  First, there are the ones documented in
> /kernel/drv/igb.conf.  Of those, perhaps the most interesting is
> flow_control.  It looks like the hardware acceleration ones here are the
> queuing-related ones, but they appear to be disabled by default.

> Then there are ones that can be controlled via igb.conf but that aren't
> advertised in the default file that ships.  These include:

  
Looks like I was duped by Sun's documentation:
http://docs.oracle.com/cd/E19082-01/819-2724/giozx/index.html

It only outlines mr_enable and intr_force.
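
For anyone else digging through this, the extra properties all go in
/kernel/drv/igb.conf, either globally or per device instance. A rough sketch
of the global form, with illustrative values only:

# /kernel/drv/igb.conf -- global form, applies to all igb instances
# flow_control: 0 disables it, 3 (rx+tx) is the shipped default per the stock file
flow_control = 3;
# the two knobs the Sun doc above covers
mr_enable = 1;
intr_force = 0;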


> Have you tried isolating other components?  Is the behavior the same on
> all switch ports?  Does it differ if you're connected to a different
> brand of switch?

So I have four servers performing this particular role. Three of them slowly
and quietly march to their death. One also marches to its death but
periodically gets a 'reset', where HTTP response time suddenly drops back down
and the "clock" on the march to death starts over.

Yesterday, the 'resets' started happening on a second box.

The current switch is a Juniper EX4200 stack configured as a Virtual Chassis,
with a reasonably sized stack of member switches acting as line cards. It
seems to work just dandy for the other systems connected to it. I haven't
tried another switch yet, but there are no link-related issues being reported
on either side. I have used this configuration before with e1000-based NICs
on OSOL 2009.06 and OI 148.
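
The "no link-related issues" bit is based on the per-port counters on both
ends; roughly these checks, for anyone who wants to repeat them (link and
port names are this box's):

# OI side: cumulative per-link statistics, including IERRORS/OERRORS
dladm show-link -s igb0
dladm show-link -s igb1

# Juniper side (operational mode): errors, drops and MAC pause counters
show interfaces ge-0/0/2 extensive
show interfaces ge-1/0/2 extensive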


> Is there anything special going on here?  For example, do you have
> standard IEEE link negotiation turned off -- forcing duplex or link speed?

Nothing particularly special. The gigE spec says to leave those in autoneg,
and they are in autoneg. Autoneg appears to work as advertised.
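
The quick sanity check on the OI side, for the record:

# both igb ports should show STATE up, SPEED 1000, DUPLEX full
dladm show-phys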

The switch ports are configured for jumbo frames (9014 bytes), but the web
servers are not; the web servers use standard 1500-byte MTUs. I have tested
both jumbo and standard frames, and it makes no difference to the problem.

The servers are connected via an LACP aggregation with two members in the
bundle. I have tested both with LACP and with straight-up Ethernet switching;
it makes no difference.

Here is the switch config; it is pretty plain vanilla.

root@brdr0.sf0# show ge-0/0/2
description "www004 igb1";
ether-options {
    auto-negotiation;
    802.3ad ae4;
}

root@brdr0.sf0# show ge-1/0/2
description "www004 igb0";
ether-options {
    auto-negotiation;
    802.3ad ae4;
}

root@brdr0.sf0# show ae4
description "www004 aggr0";
mtu 9014;
aggregated-ether-options {
    lacp {
        active;
    }
}
unit 0 {
    family ethernet-switching {
        vlan {
            members 300;
        }
    }
}
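
The OI side of the bundle is the usual dladm aggregation; it would have been
created with something along these lines (a sketch, reconstructed from the
link names above):

dladm create-aggr -L active -l igb0 -l igb1 aggr0
# confirm LACP is negotiating on both ports
dladm show-aggr -L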



> Those symptoms seem to point in the direction of something much broader;
> a system-wide issue with interrupts, perhaps.

While I haven't instrumented interrupts, the normal range is between 16k and
30k per second. This doesn't strike me as unusual. It was actually the very
first thing I looked at before I engaged the list.
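
If anyone wants to poke at the interrupt side themselves, the quick checks
are something like:

# per-device interrupt rates, sampled every 5 seconds
intrstat 5
# per-CPU interrupt counts (intr/ithr columns)
mpstat 5
# which interrupt vectors are bound to which CPUs
echo ::interrupts | mdb -k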

> Have you seen the postings about APIC issues?  Could those apply here?
>
>  set apix:apic_timer_preferred_mode = 0x0

I haven't seen the APIC postings, but I have been turning the APIC knobs. I
gave apic_timer_preferred_mode a whirl; it made no impact. I also tried

setprop acpi-user-options=0x8 and setprop acpi-user-options=0x2

individually. These also brought me no joy.
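
For reference, both of those are settable in /etc/system (reboot required);
this is roughly the form, assuming the /etc/system route rather than a
boot-time -B property:

* /etc/system
set apix:apic_timer_preferred_mode = 0x0
set acpi-user-options = 0x8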

I gave disabling hw acceleration a whirl.

# disable transmit/receive checksum offload and large send offload
tx_hcksum_enable = 0;
rx_hcksum_enable = 0;
lso_enable = 0;

Disabling the hardware acceleration earned me about a 40% jump in CPU
utilization but no relief from the original problem.

I also gave adding software ring buffers a go. That had little or no effect
as well.

And of course, I gave hw ring buffers a go...
name = "pci8086,a03c" parent = "/pci at 0,0/pci8086,a03c at 3" unit-address =
"0,1"  rx_ring_size = 4096;
name = "pci8086,a03c" parent = "/pci at 0,0/pci8086,a03c at 3" unit-address =
"0,1"  tx_ring_size = 4096;
name = "pci8086,a03c" parent = "/pci at 0,0/pci8086,a03c at 3" unit-address =
"0,1"  mr_enable = 1;
name = "pci8086,a03c" parent = "/pci at 0,0/pci8086,a03c at 3" unit-address =
"0,1"  rx_group_number = 4;


Last night, I disabled flow control on one box. That had no effect.
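
For anyone trying the same experiment, flow control can also be flipped per
link at runtime with dladm, no reboot needed (a sketch):

# current and possible values: bi, tx, rx, no
dladm show-linkprop -p flowctrl igb0
dladm set-linkprop -p flowctrl=no igb0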

I haven't tried connecting them to another switch, but I see no link related
issues. No discards, errors, etc.

I did notice one interesting item. The server that has had the 'resets',
where performance returns to normal before the march of death begins again,
has a link state of 'unknown' on its two VNICs.

All the other zones, which live on other physical systems, have a link state
of 'up'. What this might mean, and how or whether it relates, I have no idea.
But that is the only outlier I can find in the batch. I would normally
categorize the zones with the link state of 'unknown' as somehow broken, but
they are the best-performing zones in the bunch.
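
The state itself is easy to see with dladm from the global zone (or inside
the zone for its own links); something like:

dladm show-vnic
# STATE column reads up / down / unknown
dladm show-link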

Nothing I have tried so far makes any difference, except that if I offload
enough traffic from the system, the problem does appear to dissipate.



j.


