From: John Baldwin
Date: Fri, 14 May 2010 07:50:42 -0400
To: Terry Kennedy
Cc: freebsd-stable@freebsd.org
Subject: Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

Terry Kennedy wrote:
> I'm reposting this over here at the suggestion of the Forums moderator.
> The original post is at http://forums.freebsd.org/showthread.php?t=14163
>
> Got an interesting crash just now (well, as interesting as a crash on a
> soon-to-be production system can be).
>
> This is 8-STABLE/amd64, last cvsup'd early in the morning of May 9th.
>
> The system didn't complete the crash dump, so it needed a manual reset to get
> it going again.
>
> The crash was a "page fault while in kernel mode" with the current process
> being the interrupt service routine for the bce0 GigE. Things progressed
> reasonably until partway through the dump, when the system locked up with a
> "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
> same PID as reported in the main crash.

Hmm. You could try changing the code to not do a nested panic in that
case. You would update subr_turnstile.c to just return if panicstr is
not NULL rather than calling panic. However, there is still a good
chance you will end up deadlocking in that case. I have another patch I
can send you next week that prevents blocking on mutexes during a panic,
which may also help.

> 3) Is there any way to rig the system to obtain more info if this happens
> again? Right now I'm using an embedded remote console server, but I could
> switch the system to a serial port if enabling the kernel debugger might help.
> But I think that the sleeping thread bit would happen even at the debugger
> prompt, wouldn't it?

Include DDB and enable the 'trace_on_panic' sysctl knob perhaps.

> I just booted the new kernel and tried this again, and got another crash. The
> message is identical to the first, except that the instruction pointer changed
> by 0x10 (presumably due to code differences between the old and new kernels)
> and it got 6MB further writing the crash dump.
>
> Since it seems I can reproduce this at will, I'll be glad to either perform
> additional information-gathering or give a developer access to the box for
> testing purposes.
>
> Is it possible to correlate the source line in the kernel with the instruction
> pointer in the panic?

If you are booted into the same kernel with the same modules loaded, you
can probably run 'kgdb' as root and do 'l *'.

-- 
John Baldwin
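
The DDB suggestion above could look roughly like the fragment below. This
is only a sketch and is not taken from the thread: it assumes a custom
kernel configuration file, and it assumes the knob referred to as
'trace_on_panic' is the debug.trace_on_panic sysctl. Check both against
your own source tree before relying on them.

```
# Additions to the kernel configuration file (rebuild and reinstall
# the kernel afterwards).
options 	KDB		# kernel debugger framework
options 	DDB		# interactive in-kernel debugger
options 	KDB_TRACE	# print a stack trace on panic by default

# Line for /etc/sysctl.conf; the same knob can be set at runtime
# with sysctl(8).
debug.trace_on_panic=1
```

With a serial console attached, the trace printed at panic time may be
captured even if the crash dump itself later hangs.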