From owner-freebsd-current@FreeBSD.ORG Tue Dec 20 14:52:51 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 222461065672; Tue, 20 Dec 2011 14:52:51 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id DABA58FC15; Tue, 20 Dec 2011 14:52:50 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 8E45C46B0A; Tue, 20 Dec 2011 09:52:50 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 15AF8B93E; Tue, 20 Dec 2011 09:52:50 -0500 (EST) From: John Baldwin To: Attilio Rao Date: Tue, 20 Dec 2011 09:35:30 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p8; KDE/4.5.5; amd64; ; ) References: <4EED2F1C.2060409@zedat.fu-berlin.de> <201112200852.23300.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201112200935.30772.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 20 Dec 2011 09:52:50 -0500 (EST) Cc: mdf@freebsd.org, "O. Hartmann" , freebsd-current@freebsd.org, Robert Watson Subject: Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2011 14:52:51 -0000 On Tuesday, December 20, 2011 9:20:09 am Attilio Rao wrote: > 2011/12/20 John Baldwin : > > On Saturday, December 17, 2011 10:41:15 pm mdf@freebsd.org wrote: > >> On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev wrote: > >> > On Sun, 18 Dec 2011 01:09:00 +0100 > >> > "O. Hartmann" wrote: > >> > > >> >> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock > >> >> panic: sleeping thread > >> >> cpuid = 0 > >> >> > >> >> PID 16 is always USB on my box. > >> > > >> > You really need to give us a backtrace when you quote panics. It is > >> > impossible to make any sense of the above panic message without more > >> > context. > >> > >> In the case of this panic, the stack of the thread which panics is > >> useless; it's someone trying to propagate priority that discovered it. > >> A backtrace on tid 100033 would be useful. > >> > >> With WITNESS enabled, it's possible to have this panic display the > >> stack of the incorrectly sleeping thread at the time it acquired the > >> lock, as well, but this code isn't in CURRENT or any release. I have > >> a patch at $WORK I can dig up on Monday. > > > > Huh? The stock kernel dumps a stack trace of the offending thread if you have > > DDB enabled: > > > > /* > > * If the thread is asleep, then we are probably about > > * to deadlock. To make debugging this easier, just > > * panic and tell the user which thread misbehaved so > > * they can hopefully get a stack trace from the truly > > * misbehaving thread. > > */ > > if (TD_IS_SLEEPING(td)) { > > printf( > > "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n", > > td->td_tid, td->td_proc->p_pid); > > #ifdef DDB > > db_trace_thread(td, -1); > > #endif > > panic("sleeping thread"); > > } > > > > It may be that we can make use of the STACK API here instead to output this > > trace even when DDB isn't enabled. The patch below tries to do that > > (untested). It does some odd thigns though since it is effectively running > > from a panic context already, so it uses a statically allocated 'struct stack' > > rather than using stack_create() and uses stack_print_ddb() since it is > > holding spin locks and can't possibly grab an sx lock: > > > > Index: subr_turnstile.c > > =================================================================== > > --- subr_turnstile.c (revision 228534) > > +++ subr_turnstile.c (working copy) > > @@ -72,6 +72,7 @@ __FBSDID("$FreeBSD$"); > > #include > > #include > > #include > > +#include > > #include > > #include > > > > @@ -175,6 +176,7 @@ static void turnstile_fini(void *mem, int size); > > static void > > propagate_priority(struct thread *td) > > { > > + static struct stack st; > > struct turnstile *ts; > > int pri; > > > > @@ -217,8 +219,10 @@ propagate_priority(struct thread *td) > > printf( > > "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n", > > td->td_tid, td->td_proc->p_pid); > > -#ifdef DDB > > - db_trace_thread(td, -1); > > +#ifdef STACK > > + stack_zero(&st); > > + stack_save_td(&st, td); > > + stack_print_ddb(&st); > > #endif > > panic("sleeping thread"); > > } > > > > -- > > I'm not sure it is a wise idea to trimm the DDB part, because it is > much more common than STACK enabled. Note that stack(9) is working if > you define DDB too, so I'd say to do that for both. > Also, I don't think you need the stack_zero() prior to set it. Err, STACK is enabled in GENERIC in release kernels but DDB is not, so I think STACK is the more common one. As far as stack_zero(), I was just being paranoid. > As we are here, however, I have a question for Robert here: do you > think we should support the _ddb() variant of options even in the case > DDB is not enabled in the kernel? > Probabilly the way it is nowadays is easier to have stack(9) both > defined for DDB and STACK, but in general I wouldn't advertise that. The _ddb variants are always enabled by my reading. They just use different entry points into the linker that don't use locking. -- John Baldwin