[OpenIndiana-discuss] Crossbow performance & bit rot
Jason Matthews
jason at broken.net
Sat Dec 31 06:27:16 UTC 2011
As a follow-up, I have determined that crossbow is not at fault for the
degradation of the network stack performance. As it turns out, the shared
stack fails much faster than multiple zones with exclusive-ip.
To determine this, I stripped crossbow out of the zones and used a shared IP
stack. I found that the native IP stack suffers from the same degradation,
only over a shorter period of time. The degradation does not heal itself when
I remove the server from the web pool and take the onslaught of traffic off
the system.
The symptoms are slow outbound connection times, high ping times between
systems on a local subnet (1ms to 500ms on gigE), and general network
flakiness.
In other words, something causes the network stack to get into an
unrecoverable funk. For crossbow interfaces, destroying the vnic and
re-creating it seems to fix it. On physical interfaces, rebooting
certainly fixes it, but I haven't tried reloading the driver. As these are
production servers, I have moved back to crossbow, where I can tear down the
zones and rebuild the vnic in seconds rather than minutes for a reboot. This,
however, is a band-aid.
Attached is a graphic of HTTP response times of nginx serving robots.txt.
You see that over time, the performance degrades until the vnic is recreated
or the system is rebooted. When that happens, performance is restored for
several hours or days. In either case, I have four physical systems that
exhibit this problem.
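For anyone who wants to reproduce this kind of response-time measurement, here is a minimal Python sketch of a probe that times fetches of a small static file and summarizes the samples. It is not the tool that produced the attached graph; the function names (`time_fetch`, `summarize`) and the 5-second timeout are assumptions made for illustration.

```python
import http.client
import statistics
import time
import urllib.request


def time_fetch(url, timeout=5.0):
    """Return the seconds taken to fetch url, or None if the request fails."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, connection errors and socket timeouts are all OSError.
        return None
    return time.perf_counter() - start


def summarize(samples):
    """Median and worst-case latency over the successful samples."""
    ok = [s for s in samples if s is not None]
    if not ok:
        return {"ok": 0, "failed": len(samples), "median": None, "max": None}
    return {
        "ok": len(ok),
        "failed": len(samples) - len(ok),
        "median": statistics.median(ok),
        "max": max(ok),
    }
```

Calling `summarize([time_fetch(url) for _ in range(20)])` at regular intervals, with `url` pointing at robots.txt on the server under test, gives one data point per interval. Logging those with a timestamp should make it possible to see whether latency creeps up and whether it drops back after the vnic is re-created.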
Where can I file a bug on this?
For reference, each server sees about 500 connections/second in HTTP
traffic, and they divvy up between 15k and 60k packets/second. Here are my
modifiers to the network stack defaults:
root@web008:~# /root/bin/get-network-tune.sh
tcp_fin_wait_2_flush_interval
15000
tcp_time_wait_interval
current value: 10000
tcp_conn_req_max_q - default 128
current value: 4096
tcp_conn_req_max_q0 - default 1024
current value: 98304
tcp_keepalive_interval - default 7200000
current value: 600000
tcp_ip_abort_cinterval - default 180000
current value: 60000
This dump comes about 10 minutes after rebooting and re-creating the vnic...
root@web008:/tmp# netstat -sP tcp
TCP tcpRtoAlgorithm      =          4    tcpRtoMin            =        400
    tcpRtoMax            =     240000    tcpMaxConn           =         -1
    tcpActiveOpens       =     533035    tcpPassiveOpens      =     588486
    tcpAttemptFails      =       3788    tcpEstabResets       =        723
    tcpCurrEstab         =        126    tcpOutSegs           =    2374366
    tcpOutDataSegs       =    1619765    tcpOutDataBytes      =  459442156
    tcpRetransSegs       =     101134    tcpRetransBytes      =   10121463
    tcpOutAck            =    2762952    tcpOutAckDelayed     =         24
    tcpOutUrg            =          0    tcpOutWinUpdate      =          0
    tcpOutWinProbe       =          0    tcpOutControl        =    2327673
    tcpOutRsts           =      23430    tcpOutFastRetrans    =         84
    tcpInSegs            =    3003425
    tcpInAckSegs         =          0    tcpInAckBytes        =  458650724
    tcpInDupAck          =     373734    tcpInAckUnsent       =        868
    tcpInInorderSegs     =    1606849    tcpInInorderBytes    =  423500583
    tcpInUnorderSegs     =         72    tcpInUnorderBytes    =      21585
    tcpInDupSegs         =      69640    tcpInDupBytes        =    2615379
    tcpInPartDupSegs     =          1    tcpInPartDupBytes    =        668
    tcpInPastWinSegs     =         42    tcpInPastWinBytes    =  754601871
    tcpInWinProbe        =          0    tcpInWinUpdate       =          0
    tcpInClosed          =        192    tcpRttNoUpdate       =        457
    tcpRttUpdate         =    1533952    tcpTimRetrans        =      51415
    tcpTimRetransDrop    =       5483    tcpTimKeepalive      =          0
    tcpTimKeepaliveProbe =          0    tcpTimKeepaliveDrop  =          0
    tcpListenDrop        =          0    tcpListenDropQ0      =          0
    tcpHalfOpenDrop      =          0    tcpOutSackRetrans    =         29
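As a rough sanity check on the dump above, the share of retransmitted and duplicate segments can be computed straight from the counters. The Python sketch below parses `name = value` pairs out of pasted `netstat -sP tcp` text. The helper names (`parse_counters`, `ratio`) and the choice of denominators are assumptions for illustration, and the counters are cumulative since boot, so this is only an average over the roughly 10 minutes of uptime.

```python
import re

# A few lines copied from the `netstat -sP tcp` dump above.
SAMPLE = """\
tcpOutDataSegs =1619765 tcpOutDataBytes =459442156
tcpRetransSegs =101134 tcpRetransBytes =10121463
tcpInSegs =3003425
tcpInDupSegs = 69640 tcpInDupBytes =2615379
"""


def parse_counters(text):
    """Return {name: int} for every `name = value` pair found in text."""
    pairs = re.findall(r"(\w+)\s*=\s*(-?\d+)", text)
    return {name: int(value) for name, value in pairs}


def ratio(counters, numerator, denominator):
    """Share of one counter relative to another (0.0 if the denominator is 0)."""
    denom = counters[denominator]
    return counters[numerator] / denom if denom else 0.0


counters = parse_counters(SAMPLE)
retrans = ratio(counters, "tcpRetransSegs", "tcpOutDataSegs")
dup_in = ratio(counters, "tcpInDupSegs", "tcpInSegs")
print(f"retransmitted data segments: {retrans:.1%}")
print(f"duplicate inbound segments:  {dup_in:.1%}")
```

For this dump that works out to roughly 6% of outbound data segments being retransmissions and roughly 2% of inbound segments being duplicates. Taking the same ratios from a second dump once the system has degraded would show whether retransmissions climb along with the response times.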
I will welcome any suggestions on getting better results out of the OS.
Thanks,
j.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: web-response-time.png
Type: image/png
Size: 50754 bytes
Desc: not available
URL: <http://openindiana.org/pipermail/openindiana-discuss/attachments/20111230/1ea70443/attachment-0001.png>