[OpenIndiana-discuss] TCP Reset Packet Problem

Patrick Yu ipaq3870 at gmail.com
Mon Aug 6 07:09:22 UTC 2012


Hi,

I am experiencing a very strange TCP problem (missing reset packets) with
my new oi_151a5 install. The machine ran fine for the first day or two
after a fresh reboot. After that, SSH connections broke down and hung
mysteriously during the SSL handshake, and no connections could be made
either from outside or from inside over the loopback interface lo0.

It took me a while to track it down to this bug,
https://www.illumos.org/issues/1983, where the posted workaround solved
my SSH problem. But on closer examination I found that the source of the
problem is actually something else in my particular case. It turns out
that a TCP connection to a closed port (one with no listener) does not
generate a TCP reset packet from the networking core. Any client
connecting to such a port hangs there through lengthy retries.
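
The symptom can be checked without telnet using a small Python sketch
like the one below. It is only an illustration: the function name
probe_port and the 3-second timeout are my own choices, and the three
result labels are made up for this example.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a single TCP connect attempt.

    "open"     - something accepted the connection
    "refused"  - the peer answered with RST (normal for a closed port)
    "no-reply" - nothing came back before the timeout (the symptom here)
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except ConnectionRefusedError:
            # The stack on the far side sent a reset for the closed port.
            return "refused"
        except socket.timeout:
            # No SYN-ACK and no RST arrived within the timeout.
            return "no-reply"
        return "open"
```

On a healthy machine, probe_port("127.0.0.1", 12345) should come back
as "refused" almost immediately when nothing listens on that port. On
the affected VM I would expect "no-reply" instead.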

I initially thought it was due to ipfilter, but even after I cleared
the table, RST was still not being sent no matter which interface was
involved (lo0, e1000g0). RST packets come back after a reboot, and the
problem recurs after a few days even with low or no load; this is a
test installation running as a VM.

Things like X don't start properly while TCP RST is missing. I haven't
had time to look into it, but I presume it's related to this problem
too. Worth noting is that ports with a listener exhibit no problems
whatsoever: I can even run iperf across the network with very good
results.

I can do something silly like the ipf.conf snippet below to "force"
RST packets to be sent. But if there is any pass statement at the end,
such as "pass in quick on lo0", RST disappears again!

set intercept_loopback true;
block return-rst in

Does anyone have an idea what the cause could be? A misconfiguration
or a bug? Any pointers would be greatly appreciated. I have kept a
snapshot of the problematic VM and am ready to do more experiments
with it. Below is what the problematic session looks like, followed by
a normal snoop after a reboot.

# telnet 127.0.0.1 12345
Trying 127.0.0.1...
^C

# snoop -I lo0 -tr -r
Using device ipnet/lo0 (promiscuous mode)
  0.00000    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn
Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp
71716119 0,nop,wscale 2>
  1.13752    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn
Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp
71716119 0,nop,wscale 2>
  3.40631    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn
Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp
71716119 0,nop,wscale 2>
  7.92479    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn
Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp
71716119 0,nop,wscale 2>
 16.93940    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=36692 Syn
Seq=1227588634 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp
71716119 0,nop,wscale 2>
^C#

# ifconfig lo0
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu
8232 index 1
        inet 127.0.0.1 netmask ff000000
#
# netstat -r -n | grep lo0
127.0.0.1            127.0.0.1            UH        5       9638 lo0
::1                         ::1                         UH      7    1612 lo0
#
# ipf -Fa
#
# ipfstat -io
empty list for ipfilter(out)
empty list for ipfilter(in)
#
# netstat -anv | grep 12345
#
# svccfg -s ipfilter:default listprop |grep firewall_config
firewall_config_default                       com.sun,fw_configuration
firewall_config_default/value_authorization   astring
solaris.smf.value.firewall.config
firewall_config_default/version               count    1
firewall_config_default/apply_to              astring
firewall_config_default/exceptions            astring
firewall_config_default/policy                astring  custom
firewall_config_default/custom_policy_file    astring  /etc/ipf/ipf.conf
firewall_config_default/open_ports            astring
firewall_config_override                      com.sun,fw_configuration
firewall_config_override/apply_to             astring
firewall_config_override/value_authorization  astring
solaris.smf.value.firewall.config
firewall_config_override/policy               astring  none
#
# reboot
#
# telnet 127.0.0.1 12345
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
#
# snoop -I lo0 -tr -r
Using device ipnet/lo0 (promiscuous mode)
  0.00000    127.0.0.1 -> 127.0.0.1    TCP D=12345 S=53940 Syn
Seq=1084268217 Len=0 Win=32768 Options=<mss 8192,sackOK,tstamp 6061
0,nop,wscale 2>
  0.00005    127.0.0.1 -> 127.0.0.1    TCP D=53940 S=12345 Rst
Ack=1084268218 Win=0
^C#
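
For comparison with the healthy trace, where the RST arrives 0.00005 s
after the SYN, the spacing of the five SYNs in the failing trace can be
worked out with a short Python sketch. The timestamps are copied from
the first snoop output; the helper names are made up for illustration.

```python
# Timestamps (seconds) of the five SYN packets in the failing snoop trace.
syn_times = [0.00000, 1.13752, 3.40631, 7.92479, 16.93940]


def retransmit_gaps(times):
    """Return the delay between each packet and the next one."""
    return [round(later - earlier, 5)
            for earlier, later in zip(times, times[1:])]


def gap_ratios(gaps):
    """Return how much each delay grew relative to the previous delay."""
    return [round(later / earlier, 2)
            for earlier, later in zip(gaps, gaps[1:])]


gaps = retransmit_gaps(syn_times)
ratios = gap_ratios(gaps)
print("gaps between SYNs:", gaps)
print("growth per retry: ", ratios)
```

Each gap is roughly twice the previous one, which looks like ordinary
SYN retransmission backoff: the client never hears anything back, so
it keeps retrying the same SYN.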

Thanks.

Best regards,
Patrick


