[OpenIndiana-discuss] Crossbow performance & bit rot

Jason Matthews jason at broken.net
Sat Dec 31 06:27:16 UTC 2011




As a follow-up, I have determined that crossbow is not at fault for the
degradation of network stack performance. As it turns out, the shared IP
stack fails much faster than multiple zones with exclusive-ip.

To determine this, I stripped crossbow from the zones and used a shared IP
stack. I found that the native IP stack suffers from the same degradation,
and over a shorter period of time. The degradation does not heal itself when
I take the system out of the web pool and remove the onslaught of traffic
from it.

The symptoms are slow outbound connection times, high ping times between
systems on a local subnet (1ms to 500ms on gigE), and general network
flakiness.

In other words, something causes the network stack to get into an
unrecoverable funk. For crossbow interfaces, destroying the vnics and
re-creating them seems to fix it. On physical interfaces, rebooting
certainly fixes it, but I haven't tried reloading the driver. As these are
production servers, I have moved back to crossbow, where I can tear down the
zones and rebuild the vnics in seconds rather than the minutes a reboot
takes. This, however, is a band-aid.

Attached is a graphic of HTTP response times of nginx serving robots.txt.
You see that over time, the performance degrades until the vnic is recreated
or the system is rebooted. When that happens, performance is restored for
several hours or days. In either case, I have four physical systems that
exhibit this problem.
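
For anyone who wants to watch for the same pattern, here is a minimal
sketch of how the slow drift in response times could be flagged
automatically. It is not the tooling behind the attached graph. The
function names, the window sizes, and the 3x threshold are all
hypothetical choices for illustration, and the URL you probe would be
whatever small static file your own server exposes.

```python
import statistics
import time
import urllib.request


def time_fetch(url, timeout=5.0):
    """Return the wall-clock seconds taken to fetch `url` once."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


def is_degraded(samples, baseline_count=20, recent_count=20, factor=3.0):
    """Compare the median of the newest samples with the oldest ones.

    `samples` is a list of response times in seconds, oldest first.
    The window sizes and `factor` are assumed values, not measured ones.
    """
    if len(samples) < baseline_count + recent_count:
        return False  # not enough data to judge yet
    baseline = statistics.median(samples[:baseline_count])
    recent = statistics.median(samples[-recent_count:])
    return recent > factor * baseline
```

Collecting one time_fetch() sample per minute right after a vnic rebuild
and passing the growing list to is_degraded() would give a crude signal
for when the next rebuild is due. Medians are used so that a single slow
request does not trip it.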

Where can I file a bug on this?

For reference, each server sees about 500 connections/second in HTTP
traffic, and between them they handle 15k to 60k packets/second. Here are my
changes to the network stack defaults:

root@web008:~# /root/bin/get-network-tune.sh
tcp_fin_wait_2_flush_interval
15000
tcp_time_wait_interval
current value: 10000
tcp_conn_req_max_q - default 128
current value: 4096
tcp_conn_req_max_q0 - default 1024
current value: 98304
tcp_keepalive_interval - default 7200000
current value: 600000
tcp_ip_abort_cinterval - default 180000
current value: 60000

This dump comes about 10 minutes after rebooting and re-creating the vnic...

root@web008:/tmp# netstat -sP tcp

TCP     tcpRtoAlgorithm     =     4     tcpRtoMin           =   400
        tcpRtoMax           =240000     tcpMaxConn          =    -1
        tcpActiveOpens      =533035     tcpPassiveOpens     =588486
        tcpAttemptFails     =  3788     tcpEstabResets      =   723
        tcpCurrEstab        =   126     tcpOutSegs          =2374366
        tcpOutDataSegs      =1619765    tcpOutDataBytes     =459442156
        tcpRetransSegs      =101134     tcpRetransBytes     =10121463
        tcpOutAck           =2762952    tcpOutAckDelayed    =    24
        tcpOutUrg           =     0     tcpOutWinUpdate     =     0
        tcpOutWinProbe      =     0     tcpOutControl       =2327673
        tcpOutRsts          = 23430     tcpOutFastRetrans   =    84
        tcpInSegs           =3003425
        tcpInAckSegs        =     0     tcpInAckBytes       =458650724
        tcpInDupAck         =373734     tcpInAckUnsent      =   868
        tcpInInorderSegs    =1606849    tcpInInorderBytes   =423500583
        tcpInUnorderSegs    =    72     tcpInUnorderBytes   = 21585
        tcpInDupSegs        = 69640     tcpInDupBytes       =2615379
        tcpInPartDupSegs    =     1     tcpInPartDupBytes   =   668
        tcpInPastWinSegs    =    42     tcpInPastWinBytes   =754601871
        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
        tcpInClosed         =   192     tcpRttNoUpdate      =   457
        tcpRttUpdate        =1533952    tcpTimRetrans       = 51415
        tcpTimRetransDrop   =  5483     tcpTimKeepalive     =     0
        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =    29
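
One rough way to read a dump like this is to turn the raw counters into a
retransmission ratio. The sketch below parses `netstat -sP tcp` style
output and divides tcpRetransSegs by tcpOutDataSegs. That particular ratio
is an assumption about what is worth tracking, not an official health
metric, and the function names are made up for illustration. With the
numbers above it comes out at roughly 6%.

```python
import re


def parse_netstat_counters(text):
    """Return {counter_name: int} for every `name = value` pair in text."""
    return {name: int(value)
            for name, value in re.findall(r"(\w+)\s*=\s*(-?\d+)", text)}


def retransmit_ratio(counters):
    """Retransmitted segments as a fraction of data segments sent."""
    return counters["tcpRetransSegs"] / counters["tcpOutDataSegs"]


# Three lines copied from the dump above; the full output parses the same way.
sample = """
        tcpOutDataSegs      =1619765    tcpOutDataBytes     =459442156
        tcpRetransSegs      =101134     tcpRetransBytes     =10121463
        tcpRttUpdate        =1533952    tcpTimRetrans       = 51415
"""

counters = parse_netstat_counters(sample)
print(f"retransmitted segments: {retransmit_ratio(counters):.2%}")
```

The counters are cumulative since boot, so comparing two dumps taken some
minutes apart would say more about the current state than a single one.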

I would welcome any suggestions on getting better results out of the OS.


Thanks,
j.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: web-response-time.png
Type: image/png
Size: 50754 bytes
Desc: not available
URL: <http://openindiana.org/pipermail/openindiana-discuss/attachments/20111230/1ea70443/attachment-0001.png>

