Date: Mon, 18 Oct 2004 18:24:06 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
To: Vlad <marchenko@gmail.com>
Cc: Marc UBM Bocklet <ubm@u-boot-man.de>
Subject: Re: [BETA7-panic] sodealloc(): so_count 1
Message-ID: <Pine.NEB.3.96L.1041018182045.47572H-100000@fledge.watson.org>
In-Reply-To: <cd70c6810410171416149f3b2@mail.gmail.com>

On Sun, 17 Oct 2004, Vlad wrote:

> is there a specific condition when that happens? I tried to simulate
> heavy tcp traffic from number of sources but could not induct the panic
> by such artificial traffic. It happened to me only in 'natural' way ;)
>
> so maybe if you know exactly how to trigger it, and share that with us,
> we could do some workaround on live production servers so it doesn't
> happen, until it's fixed in the code?

I merged a likely fix for the problem to HEAD a minute or two ago. It
broadens the scope of the accept mutex to reduce the opportunity for
races: it extends the mutex's coverage to some additional reference
operations, and it avoids dropping a lock in order to reorder lock
acquisition. With this change in place, I can no longer easily reproduce
the problem -- I've had a couple of SMP boxes trying for an hour or two
without success. Previously, with just the right traffic, I had the
reproduction time down to a second or two.

I'll merge the fix to RELENG_5 shortly, for merge to RELENG_5_3 before 5.3
goes out the door. Since this change comes very late in the release
cycle, any help in getting testing exposure for it would be most welcome.
A copy of the patch can be found at:

    http://www.watson.org/~robert/freebsd/netperf/20041018-sofree-race-fix.diff

A complete description can be found in the commit message.

Thanks to everyone who has helped diagnose and fix this! Hopefully we've
got the right fix now, although we'll only know for sure as the next few
days of testing play out.

Thanks,

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research

> > > > The good news and the bad news: after spending a day or two hacking up an
> > IP stack simulator to simulate various nasty combinations of TCP packets,
> > I've managed to reproduce the problem, and am able to get a core. I'm
> > currently working on tracking down the problem.
>
> --
> Vlad
>
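
For readers who want a concrete picture of the kind of race described in the message above, here is a minimal userspace sketch. It is not the actual patch and not FreeBSD kernel code: `struct obj`, `try_ref`, `release_racy`, and `release_safe` are hypothetical names invented for illustration, and a pthread mutex stands in for the accept mutex. The "other thread" is simulated by a callback invoked in the unlocked window, so the bad interleaving is deterministic rather than timing-dependent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted socket-like object. */
struct obj {
    pthread_mutex_t lock;  /* plays the role of the broadened mutex */
    int refs;              /* reference count */
    bool freed;            /* set once the object is "deallocated" */
};

typedef bool (*window_hook)(struct obj *);

static void obj_init(struct obj *o)
{
    pthread_mutex_init(&o->lock, NULL);
    o->refs = 1;
    o->freed = false;
}

/* Another thread's view: take a reference unless the object is gone. */
static bool try_ref(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    bool ok = !o->freed;
    if (ok)
        o->refs++;
    pthread_mutex_unlock(&o->lock);
    return ok;
}

/*
 * Racy release: decides "last reference" under the lock, drops the lock
 * (as one might in order to re-acquire locks in a different order), then
 * acts on the now-stale decision. in_window simulates a thread that runs
 * while the lock is dropped. Returns the reference count at the moment of
 * deallocation, or -1 if nothing was deallocated.
 */
static int release_racy(struct obj *o, window_hook in_window)
{
    pthread_mutex_lock(&o->lock);
    assert(o->refs > 0);
    bool last = (--o->refs == 0);
    pthread_mutex_unlock(&o->lock);      /* window opens here */

    if (in_window != NULL)
        (void)in_window(o);              /* a new reference can sneak in */

    pthread_mutex_lock(&o->lock);
    int refs_at_free = -1;
    if (last) {
        o->freed = true;                 /* stale decision */
        refs_at_free = o->refs;
    }
    pthread_mutex_unlock(&o->lock);
    return refs_at_free;
}

/* Safe release: the decision and the teardown share one critical section. */
static int release_safe(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    assert(o->refs > 0);
    int refs_at_free = -1;
    if (--o->refs == 0) {
        o->freed = true;
        refs_at_free = o->refs;
    }
    pthread_mutex_unlock(&o->lock);
    return refs_at_free;
}
```

With a freshly initialised object, `release_racy(&o, try_ref)` deallocates while one reference is still outstanding, which is loosely analogous to the `so_count 1` condition in the subject line. `release_safe(&o)` deallocates with zero references, and a later `try_ref(&o)` is refused.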