From: Robert Watson
Date: Mon, 18 Oct 2004 18:24:06 -0400 (EDT)
To: Vlad
Cc: scottl@freebsd.org, current@freebsd.org, Marc UBM Bocklet
Subject: Re: [BETA7-panic] sodealloc(): so_count 1

On Sun, 17 Oct 2004, Vlad wrote:

> is there a specific condition when that happens? I tried to simulate
> heavy tcp traffic from a number of sources but could not induce the panic
> by such artificial traffic. It happened to me only in 'natural' way ;)
>
> so maybe if you know exactly how to trigger it, and share that with us,
> we could do some workaround on live production servers so it doesn't
> happen, until it's fixed in the code?

I merged a likely fix for the problem to HEAD a minute or two ago. It
broadens the scope of the accept mutex to reduce the opportunity for
races: it both expands the coverage to some additional reference
operations and avoids dropping a lock in order to reorder lock
acquisition. With this change in place, I'm no longer able to easily
reproduce the problem -- I've had a couple of SMP boxes running for an
hour or two trying without success. Previously, with just the right
traffic, I had the reproduction time down to a second or two.

I'll merge the fix to RELENG_5 shortly so that it can be merged to
RELENG_5_3 before 5.3 goes out the door. Because this change comes very
late in the release cycle, any help in getting testing exposure for it
would be most welcome. A copy of the patch can be found at:

    http://www.watson.org/~robert/freebsd/netperf/20041018-sofree-race-fix.diff

A complete description can be found in the commit message.

Thanks to everyone who has helped diagnose and fix this! Hopefully we've
got the right fix now, although the next few days of testing will tell.

Thanks,

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research

> >
> > The good news and the bad news: after spending a day or two hacking up an
> > IP stack simulator to simulate various nasty combinations of TCP packets,
> > I've managed to reproduce the problem, and am able to get a core. I'm
> > currently working on tracking down the problem.
>
> --
> Vlad
>