From: Oliver Fromme
To: freebsd-current@freebsd.org
Date: Mon, 26 Nov 2007 20:37:08 +0100 (CET)
Subject: Re: Sockets stuck in SYN_RCVD (re(4), RELENG_7, i386)
Message-Id: <200711261937.lAQJb84U050605@lurza.secnetix.de>
In-Reply-To: <200711210956.lAL9uI0o097057@lurza.secnetix.de>

Hello,

Now I have an additional piece of information for this bug.
Today I noticed that the second system -- which had not exhibited
the problem so far -- has also started collecting sockets in the
SYN_RCVD state ("netstat -n"). Extrapolating from the current count
and growth rate, it must have started on Saturday. The machine then
had an uptime of 25 days -- about the same uptime as the first
machine when it started to show this problem.

Whatever triggers the bug, it seems to be uptime-related. Both
machines are running with HZ=1000 (the default). A signed int
variable counting ticks at HZ speed overflows after 2^31 ticks,
i.e. after 2^31/HZ seconds, which at HZ=1000 is about 24.9 days ...

So it seems this is what's happening: Somewhere in the kernel
(probably the TCP syncache code) there's a piece of code using
uptime information in HZ resolution for timing purposes or
whatever. However, it uses a signed int, maybe just for
intermediate results, causing an overflow after 2^31/HZ seconds,
which leads to wrong results and finally to sockets hanging in the
SYN_RCVD state.

Could anyone familiar with that code help me locate the bug?
I'm pretty sure that my analysis isn't far from the truth. I'm
also pretty sure that a type cast in the right place will fix it.
The problem is finding the right place.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht München, HRA 74606, Management:
secnetix Verwaltungsgesellsch. mbH, Commercial register: Registergericht
München, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"C is quirky, flawed, and an enormous success."
        -- Dennis M. Ritchie.
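
P.S. To make the suspected failure mode concrete, here is a minimal
userland sketch in C. It is NOT taken from the kernel: the names
days_until_wrap, ticks_add, expired_naive, expired_wrapsafe and
wrap_demo are made up for illustration, and whether the syncache code
actually contains a comparison like expired_naive is only my assumption.
It shows how a deadline computed shortly before a signed 32-bit tick
counter wraps can appear to "never expire" afterwards, and how a cast
on the difference avoids that.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000 /* ticks per second, as on both affected machines */

/* Days until a signed 32-bit tick counter reaches 2^31 ticks. */
double days_until_wrap(int hz)
{
    return 2147483648.0 / hz / 86400.0; /* about 24.86 for hz = 1000 */
}

/*
 * Advance a tick value with wraparound. The addition is done in unsigned
 * arithmetic (well defined); converting the result back to int32_t is
 * implementation-defined in C, but wraps on the usual two's-complement
 * platforms.
 */
int32_t ticks_add(int32_t t, uint32_t delta)
{
    return (int32_t)((uint32_t)t + delta);
}

/* Hypothetical buggy check: compares absolute tick values directly. */
int expired_naive(int32_t now, int32_t deadline)
{
    return now >= deadline;
}

/* Wrap-safe check: subtract as unsigned, then look at the sign. */
int expired_wrapsafe(int32_t now, int32_t deadline)
{
    return (int32_t)((uint32_t)now - (uint32_t)deadline) >= 0;
}

/* Scenario: the deadline lies 0.1 s before the wrap, "now" 0.1 s after. */
void wrap_demo(void)
{
    int32_t deadline = INT32_MAX - 100;
    int32_t now = ticks_add(deadline, 200);  /* wrapped, so negative */

    assert(now < 0);
    assert(!expired_naive(now, deadline));   /* timer looks pending forever */
    assert(expired_wrapsafe(now, deadline)); /* correctly seen as expired */
}
```

With the naive comparison the timer in this scenario would stay
"pending" until the counter has climbed all the way back up, i.e. for
roughly another 2^31/HZ seconds, which would be consistent with entries
piling up in SYN_RCVD. The wrap-safe form only assumes that deadlines
are never more than 2^31 ticks away from the current time.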