From owner-freebsd-current@FreeBSD.ORG Thu Aug 26 18:18:34 2004
Date: Thu, 26 Aug 2004 11:18:34 -0700 (PDT)
From: Doug White
To: Craig Boston
cc: freebsd-current@freebsd.org
Subject: Re: PLEASE TEST: IPI deadlock avoidance patch
In-Reply-To: <20040826145324.GA40029@nowhere>
Message-ID: <20040826110809.F37301@carver.gumbysoft.com>
References: <20040822115345.Y94593@carver.gumbysoft.com> <20040826145324.GA40029@nowhere>
List-Id: Discussions about the use of FreeBSD-current

On Thu, 26 Aug 2004, Craig Boston wrote:

> On Sun, Aug 22, 2004 at 12:05:39PM -0700, Doug White wrote:
> > If you have a reasonably fast i386 or amd64 multiprocessor and/or
> > hyperthreading machine and are experiencing reproducible hangs during -j
> > buildwords and other highly parallel operations, please try this patch:
>
> Just a follow-up to my off-list message and another data point, with
> this patch I no longer get deadlocks, however I now get random data
> corruption.

Okay, for those of you experiencing the data corruption issue, I need to
know the following:

. cvsup date & time for the affected kernel(s)
. branch you're tracking
. revision of src/sys/kern/kern_lock.c - I'm checking for a specific set
  of commits here
. reproduction case - applications involved and a detailed description of
  the operation(s) involved.

It would also be nice if you could set up a serial console and attempt to
break into the debugger with an NMI, if your system is so equipped. You'll
want to set these sysctls beforehand:

machdep.panic_on_nmi=0
debug.kdb.stop_cpus=0

That should prevent the usual suspects from disrupting your entry to ddb.
This usually works for me for getting into ddb in the IPI deadlock
situation.

If you are tracking RELENG_5, be aware the patch is NOT committed there,
and cvsup will happily obliterate the changed files on its next run. So be
sure to reapply the patch after each cvsup until the patch is merged, which
should be Real Soon Now.

> Disabling the second processor or falling back to an older kernel (one
> from before the IPI hangs started) both fix the problem.

My guess here is that another change, one that was masked by the IPI
problems, is causing this, and getting SMP usable again has brought it
into the light.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org