From: Oliver Fromme
To: freebsd-current@FreeBSD.ORG
Date: Tue, 20 Nov 2007 11:16:25 +0100 (CET)
Message-Id: <200711201016.lAKAGPj0043221@lurza.secnetix.de>
Subject: Sockets stuck in SYN_RCVD (re(4), RELENG_7, i386)

Hello,

I'm watching a very strange problem here. There are two machines
with almost identical hardware (see dmesg and pciconf output at
the bottom).
They also run identical sources: RELENG_7 (i386) as of October 18.
I know that's a few weeks old, but I haven't seen any changes in
CVS that might be related to the following problem.

On the first machine, I see a slow but constant increase in the
number of sockets in state SYN_RCVD in "netstat -an" output.
The number of those sockets is the same as sysctl
net.inet.tcp.syncache.count. This does not happen on the second
machine at all (count is zero).

At the moment, the count on the first machine is 702. We first
noticed it three days ago when the count was 330. Assuming roughly
linear growth (about 124 per day), that suggests the problem
started about six days ago. However, the machine has an uptime of
32 days, so something must have triggered the problem after about
26 days of uptime.

The port numbers and remote IPs of the SYN_RCVD sockets seem to
be completely random. Most of the local ports are port 25, but a
few are also port 80 or port 53. These are the ports most often
used on the machine; all other ports are blocked in IPFW.

In very rare cases a socket leaves the SYN_RCVD state. For
example, yesterday I watched a socket with local destination
port 80 that was in state SYN_RCVD for about 40 minutes and then
disappeared.

Both machines are only very lightly loaded. In fact they are
pretty much 100% idle most of the time. They run sendmail,
apache, BIND and a few minor things, but they really don't do
much. There's nothing in the logs.

Both machines have an re(4) interface. However, one interesting
difference is that the first machine runs in GigE mode, while the
second runs only at 100 Mbps. I don't know if the speed changed;
the machines are colocated and I have no idea what kind of switch
ports they are connected to. It could well be that the first
machine's port was changed from 100M to GigE six days ago.
I'm reluctant to change the speed manually to 100M, because I
might lose the link if the switch is fixed at GigE.
I would have to initiate a remote reboot in that case.

Another thing worth noting is that the second machine only has an
uptime of 21 days. I'm curious whether it will start collecting
SYN_RCVD sockets when it reaches 26 days, too. :-)

By the way, the problem does not seem to affect normal operation,
so I'm not too worried at the moment. I can connect to the
machine's services (ssh, http, smtp, dns) without any problems.

Some data:

$ sysctl net.inet.tcp.syncache
net.inet.tcp.syncache.rst_on_sock_fail: 1
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.count: 702
net.inet.tcp.syncache.cachelimit: 15360
net.inet.tcp.syncache.bucketlimit: 30

$ netstat -s | sed -n '/sync/,/rec/p'
        395637 syncache entries added
                12023 retransmitted
                8719 dupsyn
                0 dropped
                391666 completed
                0 bucket overflow
                0 cache overflow
                1926 reset
                1404 stale
                1 aborted
                0 badack
                21 unreach
                0 zone failures
        395637 cookies sent
        175 cookies received

Output from dmesg and pciconf of the first machine is here:

http://www.secnetix.de/~olli/dmesg/box/7.0-PRE-20071018.dmesg.txt
http://www.secnetix.de/~olli/dmesg/box/7.0-PRE-20071018.pciconf.txt

For comparison, this is the second machine which does _not_
exhibit the problem:

http://www.secnetix.de/~olli/dmesg/pluto/7.0-PRE-20071018.dmesg.txt
http://www.secnetix.de/~olli/dmesg/pluto/7.0-PRE-20071018.pciconf.txt

Please let me know if I should provide more information.
The next thing I would try is to reboot the machine, so I can
see whether the problem occurs immediately or only after some
uptime.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch.
mbH, Handelsregister: Registergericht München, HRB 125758,
Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"Python tricks" is a tough one, cuz the language is so clean. E.g.,
C makes an art of confusing pointers with arrays and strings, which
leads to lotsa neat pointer tricks; APL mistakes everything for an
array, leading to neat one-liners; and Perl confuses everything
period, making each line a joyous adventure.
        -- Tim Peters
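
Addendum: the "about six days ago" and "about 26 days of uptime"
figures in the message follow from a linear extrapolation of the two
observed syncache counts (330 three days ago, 702 now, 32 days of
uptime). The sketch below only reproduces that arithmetic. The
function names are hypothetical, and the constant growth rate is an
assumption that the message does not confirm.

```python
def growth_per_day(count_earlier, count_now, days_between):
    """Average increase in stuck entries per day between two samples."""
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    return (count_now - count_earlier) / days_between


def days_since_onset(count_earlier, count_now, days_between):
    """Days since the count was zero, ASSUMING constant (linear) growth."""
    rate = growth_per_day(count_earlier, count_now, days_between)
    if rate <= 0:
        raise ValueError("count is not growing; cannot extrapolate")
    return count_now / rate


if __name__ == "__main__":
    # Figures taken from the message: 330 three days ago, 702 now,
    # and a machine uptime of 32 days.
    rate = growth_per_day(330, 702, 3)
    onset = days_since_onset(330, 702, 3)
    print(f"growth rate:            {rate:.0f} entries/day")
    print(f"estimated onset:        {onset:.1f} days ago")
    print(f"uptime at onset (est.): {32 - onset:.1f} days")
```

Sampling net.inet.tcp.syncache.count at more than two points in time
would show whether the growth really is linear, which this estimate
takes for granted.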