Date: Sat, 2 Jul 2005 00:13:08 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Gary Mulder <gmulder@infotechfl.com>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: panic in RELENG_5 UMA - two new stack traces
Message-ID: <20050701201308.GD59610@cell.sick.ru>
In-Reply-To: <42C58373.60008@infotechfl.com>
References: <20050621090701.GB34406@cell.sick.ru>
 <20050621105154.GA36538@cell.sick.ru>
 <42B961B9.7A5856B3@freebsd.org>
 <20050623104230.GB61389@cell.sick.ru>
 <20050623141514.GD738@obiwan.tataz.chchile.org>
 <42BC5EE2.2020003@infotechfl.com>
 <20050627082958.GB97832@cell.sick.ru>
 <42C16BBF.4060107@infotechfl.com>
 <20050701085808.GD52023@cell.sick.ru>
 <42C58373.60008@infotechfl.com>
On Fri, Jul 01, 2005 at 01:54:59PM -0400, Gary Mulder wrote:
G> >On Tue, Jun 28, 2005 at 11:24:47AM -0400, Gary Mulder wrote:
G> >G> I spent the day yesterday trying to reproduce the crash that I posted
G> >G> last week and you kindly replied to. This is due to the fact that I
G> >G> stupidly managed to overwrite the kernel.debug that I used to generate
G> >G> the stack trace. Sadly I could not cause the system to crash again with
G> >G> the same sb* errors.
G> >G>
G> >G> I did however remove both the Berkeley Packet Filter and IPFilter from my
G> >G> custom kernel to try and isolate the problem. This has caused the crash
G> >G> to occur in a different and more reproducible form. I have both
G> >G> INVARIANTS and WITNESS enabled, as you can see from my kernel conf.
G> >G> which is included at the end of this e-mail.
G> >G>
G> >G> Below are the latest stack traces (using bge and then fxp NICs), kernel
G> >G> conf. and dmesg. Any help would be appreciated. This time I have a copy
G> >G> of both the core files and corresponding kernel.debug so I can hopefully
G> >G> provide you with any info you need.
G> >
G> >How often does it crash? Does debug.mpsafenet=0 increase stability?
G>
G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp
G> -d scripts, all running in parallel.
G>
G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+
G> instances of the above script and the system has been stable for over an
G> hour.

Thanks! We definitely see that the bug is a race, not broken logic. I am
almost sure that you are experiencing the same bug that I described at the
beginning of the thread.

Although no fix is available yet for the race between 'arp -d' and an
outgoing packet, there is one for the race between an incoming ARP reply and
an outgoing packet. We will probably commit it soon, after more review.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE