[OpenIndiana-discuss] TCP Reset Packet Problem

Patrick Yu ipaq3870 at gmail.com
Tue Aug 7 04:31:55 UTC 2012


I was actually trying to say "strange problem of TCP reset packet (or
the lack of)". :-)

Anyway, after some more hours of digging around, I found some leads:

# ndd tcp tcp_rst_sent_rate_enabled
1
# ndd tcp tcp_rst_sent_rate
40
# kstat tcp 1 1 | egrep '[Rr]st'
        outRsts                         874
        tcp_rst_unsent                  3644
# telnet 127.0.0.1 12345
Trying 127.0.0.1...
^C
# kstat tcp 1 1 | egrep '[Rr]st'
        outRsts                         875
        tcp_rst_unsent                  3648

The RST rate limit of 40 per second does not seem to be honored, even
though no reset packets are generated on the system other than by the
test run. I did some more tests: when increasing tcp_rst_sent_rate, it
takes a value of 800+ to make reset packets work again, and the value
needs to be raised further as more reset packets are sent. It seems
like the per-second counter for reset packets never gets zeroed.
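
To illustrate why a counter that never gets zeroed would produce these
symptoms, here is a minimal Python model of a per-second RST limiter.
It is a sketch under assumptions, not the illumos code: the class
RstLimiter, its fields, and the "stuck" scenario (an interval start
that lies ahead of the current time) are hypothetical and chosen only
to show the effect.

```python
class RstLimiter:
    """Toy model of a 'at most N resets per window' limiter (hypothetical)."""

    def __init__(self, rate=40, window_ms=1000):
        self.rate = rate              # allowed resets per window
        self.window_ms = window_ms    # window length in milliseconds
        self.last_interval_ms = 0     # start of the current window
        self.count = 0                # resets counted in the current window
        self.unsent = 0               # resets suppressed by the limiter

    def allow(self, now_ms):
        """Return True if a reset may be sent at time now_ms."""
        if now_ms - self.last_interval_ms > self.window_ms:
            # Window expired: start a new one and zero the counter.
            self.last_interval_ms = now_ms
            self.count = 1
            return True
        self.count += 1
        if self.count > self.rate:
            self.unsent += 1
            return False
        return True


if __name__ == "__main__":
    # Healthy case: one reset every 5 seconds, the window always expires.
    healthy = RstLimiter()
    print(all(healthy.allow(t) for t in range(5000, 500000, 5000)))

    # Assumed failure mode: the window start is ahead of "now", so the
    # window never expires and the counter only ever grows.
    stuck = RstLimiter()
    stuck.last_interval_ms = 10**12
    results = [stuck.allow(5000 * i) for i in range(1, 101)]
    print(results.count(True), results.count(False), stuck.unsent)

    # Raising the limit above the accumulated count helps only until
    # the count catches up again.
    stuck.rate = 200
    print(stuck.allow(600000))
```

In this model, once the window stops expiring, the first 40 resets
still go out and every later one is dropped. Raising the limit above
the accumulated count restores resets only temporarily, which is
consistent with needing a value of 800+ that has to keep growing.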

Looks like a real bug to me. But I am still not sure how to trigger
it: the system runs fine for the first day or two before exhibiting
this strange behavior. I even ran the "stress" test from
https://blogs.oracle.com/clive/entry/tcp_reset_delay against a freshly
rebooted system, but failed to reproduce the erroneous condition. I am
sure it will come back when the system is left running for another
day, though.

I suspect it could be a timekeeping accuracy problem caused by running
as a vbox VM. I looked at tcp_output.c at
https://hg.openindiana.org/upstream/illumos/illumos-gate/file/adffc698eaf5/usr/src/uts/common/inet/tcp/tcp_output.c#l3279
and tried changing the clock backwards and forwards, but still could
not reproduce it.

For now, my temporary workaround is to set tcp_rst_sent_rate_enabled
to 0, which in effect completely disables the TCP reset DoS
protection. Hope this helps someone with a similar case.
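
One way to check whether resets are being sent again after such a
change is to probe a closed port and see whether the connection is
refused promptly or merely times out. The following Python sketch does
that; the function name probe and the 3-second timeout are
illustrative assumptions, and port 12345 is simply the closed port
used in the telnet test.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused' or 'timeout'."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"        # something is listening on the port
    except ConnectionRefusedError:
        return "refused"     # an RST came back, as expected for a closed port
    except socket.timeout:
        return "timeout"     # no answer at all, e.g. the RST was never sent
    finally:
        s.close()


if __name__ == "__main__":
    print(probe("127.0.0.1", 12345))
```

On a closed port, "refused" would indicate that the RST arrived, while
"timeout" would correspond to the hanging telnet session.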

Best regards,
Patrick

On Mon, Aug 6, 2012 at 3:09 PM, Patrick Yu <ipaq3870 at gmail.com> wrote:
> Hi,
>
> I am experiencing a very strange TCP reset problem (or the lack of
> resets) with my new oi_151a5 install. The machine ran fine for the
> first day or two after a fresh reboot; after that, SSH connections
> broke down and hung mysteriously during SSL handshake, and no
> connections could be made either from outside or from inside using
> loopback lo0.
>
> It took me a while to track it down to this bug -
> https://www.illumos.org/issues/1983 - where the workaround posted
> solved my SSH problem. But upon closer examination I found that the
> source of the problem is actually something else in my particular
> case. It turns out that a TCP connection to a closed port (one that
> nothing is listening on) does not generate a TCP reset packet from
> the networking core. Any client connecting to such a port hangs
> through lengthy retries instead of failing immediately.
>
> I initially thought it was due to ipfilter, but even after I cleared
> the rule table, RSTs were still not sent, no matter which interface
> was involved (lo0, e1000g0). Connections and RST packets come back
> after a reboot, but the problem recurs after a few days, even with
> low/no load, as this is a test installation running as a VM.
>
> Things like X didn't start properly while TCP RSTs were missing. I
> didn't have time to look into it, but I presume it's related to this
> problem too. Worth noting is that ports that are being listened on
> exhibit no problems whatsoever - I can even run iperf across the
> network with very good results.
>
> I could do some silly thing like the ipf.conf snippet below to
> "force" RST packets to be sent. But then if there's any pass
> statement at the end, like "pass in quick on lo0", the RSTs
> disappear again!
> set intercept_loopback true;
> block return-rst in
>
> Does anyone have an idea what could be the cause? A misconfiguration
> or a bug? Any pointers would be greatly appreciated. I still keep a
> snapshot of the problematic VM and am ready to do some more
> experiments with it. Below is what the problematic session looks
> like, followed by a normal snoop after reboot.
>
> # telnet 127.0.0.1 12345
> Trying 127.0.0.1...
> ^C
>
> # snoop -I lo0 -tr -r
> Using device ipnet/lo0 (promiscuous mode)
>   0.00000    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
>   1.13752    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
>   3.40631    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
>   7.92479    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
>  16.93940    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> ^C#
>
> # ifconfig lo0
> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>         inet 127.0.0.1 netmask ff000000
> #
> # netstat -r -n | grep lo0
> 127.0.0.1            127.0.0.1            UH        5       9638 lo0
> ::1                         ::1                         UH      7    1612 lo0
> #
> # ipf -Fa
> #
> # ipfstat -io
> empty list for ipfilter(out)
> empty list for ipfilter(in)
> #
> # netstat -anv | grep 12345
> #
> # svccfg -s ipfilter:default listprop |grep firewall_config
> firewall_config_default                       com.sun,fw_configuration
> firewall_config_default/value_authorization   astring
> solaris.smf.value.firewall.config
> firewall_config_default/version               count    1
> firewall_config_default/apply_to              astring
> firewall_config_default/exceptions            astring
> firewall_config_default/policy                astring  custom
> firewall_config_default/custom_policy_file    astring  /etc/ipf/ipf.conf
> firewall_config_default/open_ports            astring
> firewall_config_override                      com.sun,fw_configuration
> firewall_config_override/apply_to             astring
> firewall_config_override/value_authorization  astring
> solaris.smf.value.firewall.config
> firewall_config_override/policy               astring  none
> #
> # reboot
> #
> # telnet 127.0.0.1 12345
> Trying 127.0.0.1...
> telnet: Unable to connect to remote host: Connection refused
> #
> # snoop -I lo0 -tr -r
> Using device ipnet/lo0 (promiscuous mode)
>   0.00000    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=53940 Syn Seq=1084268217 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 6061 0,nop,wscale 2>
>   0.00005    127.0.0.1 -> 127.0.0.1    TCP D=53940 S=12345 Rst Ack=1084268218 Win=0
> ^C#
>
> Thanks.
>
> Best regards,
> Patrick


