From: Robert Watson
Date: Thu, 26 Jun 2008 08:50:13 +0100 (BST)
To: Ali Niknam
Cc: net@freebsd.org
Subject: Re: FreeBSD 7.0: sockets stuck in CLOSED state...
Message-ID: <20080626081831.V96707@fledge.watson.org>
In-Reply-To: <4862BCF5.4070900@transip.nl>
References: <486283B0.3060805@transip.nl> <20080625195523.N29013@fledge.watson.org> <4862BCF5.4070900@transip.nl>
List-Id: Networking and TCP/IP with FreeBSD

On Wed, 25 Jun 2008, Ali Niknam wrote:

>> precisely matches that what you'd expect: lots of TCP connections in the
>> CLOSED state reflecting a series of connections built by an application but
>> then not properly discarded. Likewise, when the application is killed, all
>> of the connections go away -- most likely because the file descriptors are
>> all closed, allowing them to be garbage collected and connection state
>> freed.
>> If it is this sort of bug, then most likely you're missing a call to
>> close() in a work loop somewhere, and in some exceptional case, you fall
>> out of the loop without calling close().
>
> I will double check this once more, but honestly, i strongly doubt it...
>
> Also one other thing that I've noticed, is that it's always the input buffer
> that has bytes left; never the output buffer...
>
> Moreover, i've seen that close() reports EBADF, but due to the insane amount
> of connections I can not say for certain that that's when the connection
> goes into CLOSED state. The ip's do match, but it's very common for the same
> ip's to make numerous connections too.

I think the first logical step is to wait for the application to get into that
state again, and then run procstat or fstat to dump the file descriptor array
for the process. Presumably in the normal steady state, you expect to see a
few IPC sockets (syslog, etc), a TCP listen socket, and some number of
in-progress TCP sessions. The question, of course, is whether you see a lot
more file descriptors than that, and in particular, ones that match the
CLOSED entries in netstat.

If you find that there are lots of open file descriptors and they match up
approximately with netstat, then it's an application bug that just manifests a
bit differently in 7.x than in 6.x. On the other hand, if you see only a small
number of open file descriptors, then we may be looking at something quite a
bit more complicated.

I would next seek to confirm the analysis that "they go away when the
application is killed" -- do they really disappear at the very moment it
exits, or do they kind of disappear over time and it just happens that by the
time you run netstat after killing the application, they're gone? I.e., I'd
try something like "netstat -na > file1 ; kill pid ; sleep 1 ; netstat -na >
file2 ; diff -u file1 file2".
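
To put numbers on that before/after comparison, something like the following could be used. It is only a rough sketch: the function names count_closed and snapshot_kill are made up for illustration, and it assumes that in "netstat -na" output the protocol column starts with "tcp" and the state is the last column on the line.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any FreeBSD tool.

# Print the number of TCP sockets in the CLOSED state found in a saved
# "netstat -na" output file. Assumes the state is the last field.
count_closed() {
    awk '$1 ~ /^tcp/ && $NF == "CLOSED" { n++ } END { print n + 0 }' "$1"
}

# Take a snapshot, kill the process, wait a second, take another snapshot,
# then report how many CLOSED sockets each snapshot contains.
# Usage: snapshot_kill <pid>
snapshot_kill() {
    pid=$1
    netstat -na > file1
    kill "$pid"
    sleep 1
    netstat -na > file2
    echo "CLOSED before: $(count_closed file1)  after: $(count_closed file2)"
}
```

Running snapshot_kill with the pid of the application leaves file1 and file2 in the current directory, so "diff -u file1 file2" can still be run on them afterwards.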
If they really all go away in a large quantity the moment the process dies,
then the reference model is working (i.e., they are freed), but perhaps
references are being held onto in an unexpected way. For example, is the
incomplete listen queue somehow getting filled with CLOSED sockets that are
only garbage collected when close() is called on the listen socket? If we
suspect that, we can actually test it by having your application close the
listen socket and re-open it once in a while, and see if the CLOSED sockets
fail to stack up.

Speaking of which, I meant to ask: are you using accept filters, and if so,
which one?

Robert N M Watson
Computer Laboratory
University of Cambridge