[OpenIndiana-discuss] TCP Reset Packet Problem

Michael Stapleton michael.stapleton at techsologic.com
Tue Aug 7 04:55:56 UTC 2012


Do you have the VBox client utilities installed? I have seen strange
clock problems when the agent is not installed.

Mike

On Tue, 2012-08-07 at 12:31 +0800, Patrick Yu wrote:

> I was actually trying to say "strange problem of TCP reset packet (or
> the lack of)". :-)
> 
> Anyway, after some more hours of digging around, I found some leads:
> 
> # ndd tcp tcp_rst_sent_rate_enabled
> 1
> # ndd tcp tcp_rst_sent_rate
> 40
> # kstat tcp 1 1 | egrep '[Rr]st'
>         outRsts                         874
>         tcp_rst_unsent                  3644
> # telnet 127.0.0.1 12345
> Trying 127.0.0.1...
> ^C
> # kstat tcp 1 1 | egrep '[Rr]st'
>         outRsts                         875
>         tcp_rst_unsent                  3648
> 
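
The two kstat snapshots above can be compared mechanically. The following is a minimal sketch, assuming the output is pasted in as text; the helper names parse_kstat and kstat_delta are made up for illustration and are not part of any kstat tooling.

```python
import re


def parse_kstat(text):
    """Parse 'name value' lines of pasted kstat output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S+)\s+(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def kstat_delta(before, after):
    """Return the per-counter difference between two snapshots."""
    b, a = parse_kstat(before), parse_kstat(after)
    return {name: a[name] - b[name] for name in a if name in b}


# Values copied from the transcript above.
BEFORE = """
        outRsts                         874
        tcp_rst_unsent                  3644
"""
AFTER = """
        outRsts                         875
        tcp_rst_unsent                  3648
"""

if __name__ == "__main__":
    for name, diff in kstat_delta(BEFORE, AFTER).items():
        print(f"{name}: +{diff}")
```

For the test run above this gives +1 for outRsts and +4 for tcp_rst_unsent, so most of the resets around the telnet attempt were counted as unsent rather than sent.
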
> The RST sent rate of 40 per second does not seem to be observed, even
> though no reset packets are generated on the system except for the test
> run. I did some more tests: when increasing tcp_rst_sent_rate, it takes
> a value of 800+ to make reset packets work, and the value needs to be
> raised further as more reset packets are sent. It seems like the
> counter for reset packets per second never gets zeroed.
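
The "counter never gets zeroed" symptom can be illustrated with a toy model. This is only a sketch of a generic fixed-window rate limiter, not the illumos code in tcp_output.c. The class RstLimiter, its fields, and the idea that the window start ends up ahead of the current time (for example after a clock jump in a VM) are all assumptions made for illustration.

```python
class RstLimiter:
    """Toy fixed-window limiter: at most `rate` RSTs per 1000 ms window."""

    def __init__(self, rate=40):
        self.rate = rate
        self.window_start_ms = 0  # start of the current one-second window
        self.count = 0            # RSTs counted in the current window

    def allow(self, now_ms):
        """Return True if an RST may be sent at time now_ms."""
        if now_ms - self.window_start_ms > 1000:
            # More than a second has passed: open a new window.
            self.window_start_ms = now_ms
            self.count = 1
            return True
        # Same window: count this RST and compare against the limit.
        self.count += 1
        return self.count <= self.rate


def simulate(limiter, attempts, spacing_ms):
    """Offer `attempts` RSTs, spacing_ms apart; return how many are allowed."""
    allowed = 0
    now_ms = 0
    for _ in range(attempts):
        now_ms += spacing_ms
        if limiter.allow(now_ms):
            allowed += 1
    return allowed


if __name__ == "__main__":
    healthy = RstLimiter(rate=40)
    print("healthy:", simulate(healthy, 100, 5000))

    # Hypothetical failure: the window start is far in the future, so the
    # elapsed time is never above one second and the count never resets.
    stuck = RstLimiter(rate=40)
    stuck.window_start_ms = 10**12
    print("stuck:", simulate(stuck, 100, 5000), "count:", stuck.count)
```

In the stuck case only the first 40 resets pass, even at one every five seconds, and the count keeps growing. Raising the limit above the current count lets resets through again until the count catches up, which would match the need to keep increasing tcp_rst_sent_rate. Whether this is what actually happens in the kernel is not confirmed here.
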
> 
> Looks like a real bug to me. But I am still not sure how to trigger
> this - it runs fine for the first day or two before exhibiting this
> strange behavior. I even ran some "stress" tests from
> https://blogs.oracle.com/clive/entry/tcp_reset_delay against a freshly
> rebooted system in failed attempts to reproduce the erroneous
> condition. But I am sure it will come back when it's left there for
> another day.
> 
> I suspect it could be a time accuracy problem due to it being a VBox
> VM. I looked at tcp_output.c from
> https://hg.openindiana.org/upstream/illumos/illumos-gate/file/adffc698eaf5/usr/src/uts/common/inet/tcp/tcp_output.c#l3279
> and tried to change the clock backwards and forwards, but still could
> not reproduce it.
> 
> For now, my temporary workaround is to set tcp_rst_sent_rate_enabled to
> 0, which in effect totally disables any TCP reset DoS protection. I hope
> this helps someone with a similar case.
> 
> Best regards,
> Patrick
> 
> On Mon, Aug 6, 2012 at 3:09 PM, Patrick Yu <ipaq3870 at gmail.com> wrote:
> > Hi,
> >
> > I am experiencing a very strange TCP problem (the lack of) with my new
> > oi_151a5 install. The machine ran fine for the first day or two after a
> > fresh reboot, and after that SSH connections broke down and hung
> > mysteriously during SSL handshake; no connections could be made
> > either from outside or even from inside using loopback lo0.
> >
> > It took me a while to track it down to this bug -
> > https://www.illumos.org/issues/1983 - where the workaround posted solved
> > my SSH problem. But upon closer examination I found the source of the
> > problem is actually something else in my particular case. It turns out
> > that a TCP connection to a closed port (one that nothing is listening
> > on) does not generate a TCP reset packet from the networking core. Any
> > client connecting to such a port hangs there through lengthy
> > retries.
> >
> > I initially thought it was due to ipfilter, but even after I cleared
> > the table, RST was still not being sent no matter what interface was
> > involved (lo0, e1000g0). Connections and RST packets come back
> > after a reboot, and the problem recurs after a few days even with
> > low/no load, as this is a testing installation running as a VM.
> >
> > Things like X didn't start properly when TCP RSTs were missing. I
> > didn't have time to look into it, but I presume it's related to this
> > problem too. Worth noting is that ports that are being listened on
> > exhibited no problems whatsoever - I can even run an iperf across the
> > network with very good results.
> >
> > I could do something silly like the ipf.conf snippet below to "force"
> > RST packets to be sent. But then if there's any pass statement at the
> > end like "pass in quick on lo0", RST would disappear again!
> > set intercept_loopback true;
> > block return-rst in
> >
> > Does anyone have an idea what could be the cause? A misconfiguration or
> > a bug? Any pointers would be greatly appreciated. I still keep a snapshot
> > of the problematic VM and am ready to do some more experiments with
> > it. Below is what the problematic session looks like, and a normal
> > snoop after reboot.
> >
> > # telnet 127.0.0.1 12345
> > Trying 127.0.0.1...
> > ^C
> >
> > # snoop -I lo0 -tr -r
> > Using device ipnet/lo0 (promiscuous mode)
> >   0.00000    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> >   1.13752    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> >   3.40631    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> >   7.92479    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> >  16.93940    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 71716119 0,nop,wscale 2>
> > ^C#
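
The timestamps in the snoop trace above show the client retrying on its own rather than receiving any answer. A short sketch, using only the relative times from that trace, computes the gaps between the repeated SYNs; the function names gaps and ratios are made up for illustration.

```python
# Relative timestamps (seconds) of the five identical SYNs in the trace.
SYN_TIMES = [0.00000, 1.13752, 3.40631, 7.92479, 16.93940]


def gaps(times):
    """Time between each pair of consecutive packets."""
    return [b - a for a, b in zip(times, times[1:])]


def ratios(values):
    """Ratio of each value to the one before it."""
    return [b / a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    g = gaps(SYN_TIMES)
    print("gaps:  ", [f"{x:.5f}" for x in g])
    print("ratios:", [f"{x:.3f}" for x in ratios(g)])
```

Each gap is roughly twice the previous one, which is the usual exponential backoff of SYN retransmissions. That is consistent with the client never seeing an RST (or anything else) in reply.
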
> >
> > # ifconfig lo0
> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
> >         inet 127.0.0.1 netmask ff000000
> > #
> > # netstat -r -n | grep lo0
> > 127.0.0.1            127.0.0.1            UH        5       9638 lo0
> > ::1                         ::1                         UH      7    1612 lo0
> > #
> > # ipf -Fa
> > #
> > # ipfstat -io
> > empty list for ipfilter(out)
> > empty list for ipfilter(in)
> > #
> > # netstat -anv | grep 12345
> > #
> > # svccfg -s ipfilter:default listprop |grep firewall_config
> > firewall_config_default                       com.sun,fw_configuration
> > firewall_config_default/value_authorization   astring  solaris.smf.value.firewall.config
> > firewall_config_default/version               count    1
> > firewall_config_default/apply_to              astring
> > firewall_config_default/exceptions            astring
> > firewall_config_default/policy                astring  custom
> > firewall_config_default/custom_policy_file    astring  /etc/ipf/ipf.conf
> > firewall_config_default/open_ports            astring
> > firewall_config_override                      com.sun,fw_configuration
> > firewall_config_override/apply_to             astring
> > firewall_config_override/value_authorization  astring  solaris.smf.value.firewall.config
> > firewall_config_override/policy               astring  none
> > #
> > # reboot
> > #
> > # telnet 127.0.0.1 12345
> > Trying 127.0.0.1...
> > telnet: Unable to connect to remote host: Connection refused
> > #
> > # snoop -I lo0 -tr -r
> > Using device ipnet/lo0 (promiscuous mode)
> >   0.00000    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=53940 Syn Seq=1084268217 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 6061 0,nop,wscale 2>
> >   0.00005    127.0.0.1 -> 127.0.0.1    TCP D=53940 S=12345 Rst Ack=1084268218 Win=0
> > ^C#
> >
> > Thanks.
> >
> > Best regards,
> > Patrick
> 
