Date: Mon, 17 May 2010 10:55:41 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Cc: Terry Kennedy <TERRY@tmk.com>
Subject: Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write
Message-ID: <201005171055.41191.jhb@freebsd.org>
In-Reply-To: <01NN3LDWWAQ6006QOF@tmk.com>
References: <01NN32EOXMYC006UN1@tmk.com> <01NN3LDWWAQ6006QOF@tmk.com>

On Friday 14 May 2010 7:59:40 am Terry Kennedy wrote:
> > > The crash was a "page fault while in kernel mode" with the current process
> > > being the interrupt service routine for the bce0 GigE. Things progressed
> > > reasonably until partway through the dump, when the system locked up with a
> > > "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
> > > same PID as reported in the main crash.
> >
> > Hmm. You could try changing the code to not do a nested panic in that
> > case. You would update subr_turnstile.c to just return if panicstr is
> > not NULL rather than calling panic. However, there is still a good
> > chance you will end up deadlocking in that case. I have another patch I
> > can send you next week that prevents blocking on mutexes during a panic
> > which may also help.
>
> Ok, I'll be glad to try that.

--- //depot/vendor/freebsd/src/sys/kern/kern_mutex.c	2010/01/23 15:55:14
+++ //depot/projects/smpng/sys/kern/kern_mutex.c	2010/03/10 22:33:24
@@ -348,6 +348,15 @@
 		return;
 	}
 
+	/*
+	 * If we have already panic'd and this is the thread that called
+	 * panic(), then don't block on any mutexes but silently succeed.
+	 * Otherwise, the kernel will deadlock since the scheduler isn't
+	 * going to run the thread that holds the lock we need.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
 	lock_profile_obtain_lock_failed(&m->lock_object,
 		    &contested, &waittime);
 	if (LOCK_LOG_TEST(&m->lock_object, opts))
@@ -664,6 +673,15 @@
 	}
 
 	/*
+	 * If we failed to unlock this lock and we are a thread that has
+	 * called panic(), it may be due to the bypass in _mtx_lock_sleep()
+	 * above.  In that case, just return and leave the lock alone to
+	 * avoid changing the state.
+	 */
+	if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+		return;
+
+	/*
 	 * We have to lock the chain before the turnstile so this turnstile
 	 * can be removed from the hash list if it is empty.
 	 */

> > > 3) Is there any way to rig the system to obtain more info if this happens
> > > again? Right now I'm using an embedded remote console server, but I could
> > > switch the system to a serial port if enabling the kernel debugger might help.
> > > But I think that the sleeping thread bit would happen even at the debugger
> > > prompt, wouldn't it?
> >
> > Include DDB and enable the 'trace_on_panic' sysctl knob perhaps.
>
> Hmmm. Do you think it will get very far before the sleeping thread business
> locks it up?

It should be able to print the backtrace when it panics at least.

-- 
John Baldwin
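
For anyone who wants to see the idea of the patch above in isolation, here is
a rough userland sketch of the same logic. It is not kernel code: every toy_*
name is hypothetical, toy_panicstr and toy_curthread_inpanic merely stand in
for panicstr and the TDF_INPANIC bit, and a "would block" return value stands
in for actually sleeping on the turnstile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for panicstr and curthread's TDF_INPANIC flag. */
static const char *toy_panicstr = NULL;
static bool toy_curthread_inpanic = false;

enum toy_lock_result { TOY_ACQUIRED, TOY_BYPASSED, TOY_WOULD_BLOCK };
enum toy_unlock_result { TOY_RELEASED, TOY_LEFT_ALONE, TOY_NOT_OWNER };

struct toy_mtx {
	int owner;	/* 0 when unowned, otherwise the owning thread id */
};

/* True only for the thread that called panic(), as in the patch. */
static bool
toy_panicking(void)
{
	return (toy_panicstr != NULL && toy_curthread_inpanic);
}

static enum toy_lock_result
toy_mtx_lock(struct toy_mtx *m, int tid)
{
	if (m->owner == 0) {
		m->owner = tid;
		return (TOY_ACQUIRED);
	}
	/*
	 * Contested.  After a panic the owner will never be scheduled
	 * again, so pretend to succeed instead of waiting forever.
	 */
	if (toy_panicking())
		return (TOY_BYPASSED);
	return (TOY_WOULD_BLOCK);	/* a real kernel would block here */
}

static enum toy_unlock_result
toy_mtx_unlock(struct toy_mtx *m, int tid)
{
	if (m->owner == tid) {
		m->owner = 0;
		return (TOY_RELEASED);
	}
	/*
	 * We do not own the lock.  If we are the panicking thread this is
	 * probably the result of an earlier bypass, so leave the state alone.
	 */
	if (toy_panicking())
		return (TOY_LEFT_ALONE);
	return (TOY_NOT_OWNER);
}

/* Walks through the panic case; returns the lock owner afterwards. */
static int
toy_demo(void)
{
	struct toy_mtx m = { .owner = 7 };	/* held by a sleeping thread */

	toy_panicstr = "page fault while in kernel mode";
	toy_curthread_inpanic = true;
	assert(toy_mtx_lock(&m, 1) == TOY_BYPASSED);
	assert(toy_mtx_unlock(&m, 1) == TOY_LEFT_ALONE);
	return (m.owner);
}
```

The point of the second check is that a bypassed lock is never really owned
by the panicking thread, so the matching unlock must not touch the owner
field either.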
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201005171055.41191.jhb>