From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Terry Kennedy
Date: Mon, 17 May 2010 10:55:41 -0400
Subject: Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write
Message-Id: <201005171055.41191.jhb@freebsd.org>
In-Reply-To: <01NN3LDWWAQ6006QOF@tmk.com>

On Friday 14 May 2010 7:59:40 am Terry Kennedy wrote:
> > > The crash was a "page fault while in kernel mode" with the current process
> > > being the interrupt service routine for the bce0 GigE. Things progressed
> > > reasonably until partway through the dump, when the system locked up with a
> > > "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
> > > same PID as reported in the main crash.
> >
> > Hmm. You could try changing the code to not do a nested panic in that
> > case. You would update subr_turnstile.c to just return if panicstr is
> > not NULL rather than calling panic. However, there is still a good
> > chance you will end up deadlocking in that case. I have another patch I
> > can send you next week that prevents blocking on mutexes during a panic
> > which may also help.
>
> Ok, I'll be glad to try that.

--- //depot/vendor/freebsd/src/sys/kern/kern_mutex.c	2010/01/23 15:55:14
+++ //depot/projects/smpng/sys/kern/kern_mutex.c	2010/03/10 22:33:24
@@ -348,6 +348,15 @@
 		return;
 	}
 
+	/*
+	 * If we have already panic'd and this is the thread that called
+	 * panic(), then don't block on any mutexes but silently succeed.
+	 * Otherwise, the kernel will deadlock since the scheduler isn't
+	 * going to run the thread that holds the lock we need.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
 	lock_profile_obtain_lock_failed(&m->lock_object,
 		    &contested, &waittime);
 	if (LOCK_LOG_TEST(&m->lock_object, opts))
@@ -664,6 +673,15 @@
 	}
 
 	/*
+	 * If we failed to unlock this lock and we are a thread that has
+	 * called panic(), it may be due to the bypass in _mtx_lock_sleep()
+	 * above. In that case, just return and leave the lock alone to
+	 * avoid changing the state.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
+	/*
 	 * We have to lock the chain before the turnstile so this turnstile
 	 * can be removed from the hash list if it is empty.
 	 */

> > > 3) Is there any way to rig the system to obtain more info if this happens
> > > again? Right now I'm using an embedded remote console server, but I could
> > > switch the system to a serial port if enabling the kernel debugger might help.
> > > But I think that the sleeping thread bit would happen even at the debugger
> > > prompt, wouldn't it?
> >
> > Include DDB and enable the 'trace_on_panic' sysctl knob perhaps.
>
> Hmmm. Do you think it will get very far before the sleeping thread business
> locks it up?

It should be able to print the backtrace when it panics at least.

-- 
John Baldwin
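As a rough illustration of what the two hunks in the kern_mutex.c patch do, here is a single-threaded userspace model in C. It is a sketch under stated assumptions, not kernel code. The names toy_thread, toy_mtx, toy_mtx_lock(), toy_mtx_unlock() and toy_panic() are hypothetical stand-ins, the TDF_INPANIC value is arbitrary, and LOCK_WOULD_BLOCK marks the point where the real _mtx_lock_sleep() would go on to spin or block. Only the "panicstr != NULL && td_flags & TDF_INPANIC" test is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: this flag value is arbitrary, not the kernel's. */
#define TDF_INPANIC 0x1

struct toy_thread {
	int tid;
	int td_flags;
};

struct toy_mtx {
	struct toy_thread *owner;	/* NULL while the mutex is unowned */
};

enum lock_result { LOCK_ACQUIRED, LOCK_BYPASSED, LOCK_WOULD_BLOCK };

static const char *panicstr = NULL;	/* set once toy_panic() runs */
static struct toy_thread *curthread;	/* the "running" thread */

/* Loosely mimics panic(): record the message and mark the caller. */
void
toy_panic(const char *msg)
{
	panicstr = msg;
	curthread->td_flags |= TDF_INPANIC;
}

enum lock_result
toy_mtx_lock(struct toy_mtx *m)
{
	if (m->owner == NULL) {		/* uncontested: take the lock */
		m->owner = curthread;
		return (LOCK_ACQUIRED);
	}
	/* Contested: the same test the patch adds to _mtx_lock_sleep(). */
	if (panicstr != NULL && (curthread->td_flags & TDF_INPANIC))
		return (LOCK_BYPASSED);	/* "succeed" without owning it */
	return (LOCK_WOULD_BLOCK);	/* real code would spin or block */
}

void
toy_mtx_unlock(struct toy_mtx *m)
{
	if (m->owner == curthread) {	/* normal release */
		m->owner = NULL;
		return;
	}
	/* The same test the patch adds to _mtx_unlock_sleep(): change nothing. */
	if (panicstr != NULL && (curthread->td_flags & TDF_INPANIC))
		return;
	assert(!"unlocking a mutex owned by another thread");
}
```

With one toy thread holding the mutex, a second thread gets LOCK_WOULD_BLOCK before toy_panic() and LOCK_BYPASSED after it, and toy_mtx_unlock() from the panicking thread leaves the recorded owner unchanged, which is the "leave the lock alone" behaviour the second hunk describes. One simplification: the model only reaches the unlock bypass for a lock the thread does not own, whereas the real slow path is entered whenever the fast-path release fails.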