From owner-freebsd-questions@FreeBSD.ORG  Wed Jun 18 14:36:12 2008
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Message-ID: <4859197A.8040203@transip.nl>
Date: Wed, 18 Jun 2008 16:19:38 +0200
From: Ali Niknam <freebsd-questions@transip.nl>
Organization: Transip BV
User-Agent: Thunderbird 2.0.0.14 (Windows/20080421)
MIME-Version: 1.0
To: freebsd-questions@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Sockets stuck in CLOSED state...

Dear All,

Recently I've been upgrading some of my machines from FreeBSD 6.x amd64
to FreeBSD 7.0 amd64.

After upgrading I noticed a weird error/bug. After several thousand TCP
connections, some appear to hang in the 'CLOSED' state.

netstat -n gives:
...
tcp4      0       0  1.2.3.4.*          4.5.6.7.42149       CLOSED
tcp4      39      0  1.2.3.4.*          4.5.6.7.54103       CLOSED
tcp4      35      0  1.2.3.4.*          4.5.6.7.41718       CLOSED
tcp4      38      0  1.2.3.4.*          4.5.6.7.55618       CLOSED
tcp4      41      0  1.2.3.4.*          4.5.6.7.44230       CLOSED
tcp4      39      0  1.2.3.4.*          4.5.6.7.49439       CLOSED
...

These never go away; they keep accumulating until the application
starts giving errors (probably because some socket or file descriptor
limit is reached). When the application is killed, these entries
disappear.

The application in question is a self-written, multithreaded DNS server
that has been running fine for years without any trouble on both
FreeBSD 5.x and 6.x, and on 6.x in both 32-bit and 64-bit builds.

Of course that doesn't mean that the application is error free. However,
after extensive testing I really cannot find anything wrong with the
application itself, so I'm thinking maybe there's a change somewhere
that causes this? I know that the TCP/network code has been completely
redone...

What basically happens in the application is this:
  - one main tcp thread runs an infinite while loop waiting for new 
connections to arrive
  - as soon as one arrives a new thread is spawned that handles the 
newly created stream
  - it reads some bytes, writes some bytes, then closes it
  - thread exits
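
The structure listed above can be sketched roughly as follows. This is
not the actual server code: the function names, the 512-byte buffer, the
echo reply, and the use of detached threads are all assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Per-connection worker: read some bytes, write some bytes, close.
 * Returning from the start routine ends the thread like pthread_exit. */
void *handle_connection(void *arg)
{
    assert(arg != NULL);
    int fd = *(int *)arg;
    free(arg);

    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) {
        /* Placeholder reply: echo the request back. */
        ssize_t w = write(fd, buf, (size_t)n);
        (void)w;
    }

    close(fd);
    return NULL;
}

/* Main TCP thread: loop on accept() and spawn one thread per client.
 * Only returns (with -1) if accept() fails for a reason other than EINTR. */
int accept_loop(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        int *arg = malloc(sizeof *arg);
        if (arg == NULL) {
            close(fd);
            continue;
        }
        *arg = fd;

        pthread_t tid;
        if (pthread_create(&tid, NULL, handle_connection, arg) != 0) {
            /* The thread never started, so clean up here. */
            free(arg);
            close(fd);
            continue;
        }
        pthread_detach(tid);  /* nobody joins the worker threads */
    }
}
```

The descriptor is passed through a heap-allocated int so that each thread
gets its own copy; every path that does not hand the descriptor to a
running thread closes it in the accept loop.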

What appears to happen is this: after the new thread is spawned, it
tries to read 2 bytes (the DNS TCP length prefix). It gets back 0 bytes
(EOF), so it closes the socket and calls pthread_exit. However, in
netstat that same stream often appears to have bytes 'stuck' in the
receive queue...
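
For reference, reading the 2-byte length prefix while telling EOF apart
from a read error could look like the sketch below. The helper names
read_full and read_dns_length are hypothetical, and the server may well
do this differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Read up to len bytes, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than len means EOF was hit),
 * or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;              /* peer closed the stream */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Read the 2-byte DNS-over-TCP length prefix (network byte order).
 * Returns 1 and stores the length on success, 0 if the stream ended
 * before 2 bytes arrived, -1 on error. */
int read_dns_length(int fd, uint16_t *out)
{
    unsigned char hdr[2];

    assert(out != NULL);

    ssize_t n = read_full(fd, hdr, sizeof hdr);
    if (n < 0)
        return -1;
    if (n < 2)
        return 0;

    *out = (uint16_t)((hdr[0] << 8) | hdr[1]);
    return 1;
}
```

Logging errno when this returns -1, separately from the EOF case, would
show whether the thread really sees a clean EOF on the connections that
later show up as 'CLOSED'.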

I really can't see how this can cause sockets to hang in the 'CLOSED'
state. Even if the incoming queue isn't read entirely, a call to close
should close the socket. I also can't find any documentation, for
netstat or elsewhere, about the 'CLOSED' state...


Any help would be greatly appreciated!


Kind Regards,


Ali Niknam