Date: Sun, 29 Jun 2008 07:42:24 +0100 (BST)
From: Robert Watson
To: Ali Niknam
Cc: net@freebsd.org
Subject: Probably not a kernel bug (was: Re: FreeBSD 7.0: sockets stuck in CLOSED state...)
Message-ID: <20080629073519.D10134@fledge.watson.org>
In-Reply-To: <20080627090939.M78484@fledge.watson.org>

On Fri, 27 Jun 2008, Robert Watson wrote:

> I've asked Ali to do a bit more debugging and tracing of the application to
> see if we can reach any conclusions about this. In particular, if he traces
> to a file all file descriptor numbers returned by accept(2), then we can
> later compare that file with the leaked descriptors present in
> netstat/sockstat and decide whether the application *should* have known they
> were open or not.
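The comparison described in the quoted text can be sketched roughly as follows. This is only an illustration: the function name `split_leaked` and the plain lists of descriptor numbers are assumptions made for the example, not the format of Ali's actual trace file or of sockstat output.

```python
def split_leaked(accepted_fds, leaked_fds):
    """Split leaked descriptors by whether accept(2) ever returned them.

    accepted_fds: descriptor numbers logged by the application after accept(2)
                  (hypothetical trace, one number per accepted connection).
    leaked_fds:   descriptor numbers seen stuck in netstat/sockstat
                  (hypothetical, extracted by hand or by a separate script).
    """
    accepted = set(accepted_fds)
    # Descriptors the application was handed, so it should know they are open.
    returned_to_app = sorted(fd for fd in leaked_fds if fd in accepted)
    # Descriptors the application never saw come back from accept(2).
    never_returned = sorted(fd for fd in leaked_fds if fd not in accepted)
    return returned_to_app, never_returned


if __name__ == "__main__":
    known, unknown = split_leaked([4, 5, 6, 5], [5, 9])
    print("returned by accept(2):", known)
    print("never returned by accept(2):", unknown)
```

Descriptor numbers are reused over the life of a process, so a match only shows that the application was handed that number at some point; a real comparison would also need timestamps or the order of events.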
Another public follow-up: Ali has been sending me debugging information privately, since it includes application source code and IP addresses. Tracing of the application suggests that there is an application concurrency bug that causes one socket to be closed twice and another socket to be left open. The bug might be triggering in 7.x but not in earlier releases because of the change to libthr, which can lead to more parallelism/asynchrony in the application.

In conclusion: we currently believe that this report of sockets stuck in the CLOSED state is not the result of a kernel bug. If any further information comes to light, I will send a follow-up.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
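The application code was not posted, so the following is only a generic illustration of why closing a descriptor twice is dangerous. It relies on the POSIX rule that a new descriptor gets the lowest free number: once a number has been closed, the next open or accept(2) will normally reuse it, and a stale second close then hits the new owner. The single-threaded function `double_close_demo` is a hypothetical stand-in for two threads; with more real parallelism, the window between the two closes is more likely to contain another thread's accept(2).

```python
import os


def double_close_demo():
    """Return (stale_fd, new_fd, new_fd_still_open) after a deliberate double close."""
    # First owner: obtain a descriptor, remember its number, close it (correct).
    read_end, write_end = os.pipe()
    stale_fd = read_end
    os.close(read_end)

    # Second owner: open something new. The lowest free descriptor number
    # is handed out, so this normally reuses stale_fd's number.
    new_fd = os.open(os.devnull, os.O_RDONLY)

    # First owner again: the buggy second close of the same number.
    try:
        os.close(stale_fd)
    except OSError:
        pass  # number was not reused; the second close just fails with EBADF

    # Check whether the second owner's descriptor survived.
    try:
        os.fstat(new_fd)
        still_open = True
    except OSError:
        still_open = False

    # Clean up whatever is left.
    os.close(write_end)
    if still_open:
        os.close(new_fd)
    return stale_fd, new_fd, still_open


if __name__ == "__main__":
    stale, new, alive = double_close_demo()
    print(f"stale fd {stale}, new fd {new}, new fd still open: {alive}")
```

In this sketch the second owner loses its descriptor without ever calling close itself, which is the kind of bookkeeping mismatch that can leave an application's view of its sockets out of step with the kernel's.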