From: Gary Mulder
Date: Tue, 05 Jul 2005 11:22:22 -0400
To: Gleb Smirnoff
Cc: freebsd-stable@freebsd.org
Subject: Re: panic in RELENG_5 UMA - two new stack traces
Message-ID: <42CAA5AE.7050806@infotechfl.com>
In-Reply-To: <20050701201308.GD59610@cell.sick.ru>

Gleb Smirnoff wrote:
> G> >How often does it crash? Does debug.mpsafenet=0 increases stability?
> G>
> G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp
> G> -d scripts, all running in parallel.
> G>
> G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+
> G> instances of the above script and the system has been stable for over an
> G> hour.
>
> Thanks! We definitely see that the bug is a race, not a broken logic. I am
> almost sure, that you are experiencing the same bug as I described in
> the beginning of the thread.
>
> Although there is no yet fix available for race between 'arp -d' and
> outgoing packet, there is one for race between incoming ARP reply and
> outgoing packet. We will probably commit it soon, after more review.

Sorry to say, it looks like debug.mpsafenet=0 reduced the frequency of
the problem but did not eliminate it. The system crashed and hung again
over the weekend with very little load. There was no kernel panic, so
there are no core files.

I can leave 5.4 on this system for a week or so before installing 4.11,
if you want me to continue doing diagnostics on it.

Gary