From owner-freebsd-current@FreeBSD.ORG Tue Dec 20 15:17:50 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50515106564A; Tue, 20 Dec 2011 15:17:50 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 4C5038FC25; Tue, 20 Dec 2011 15:17:48 +0000 (UTC) Received: by wgbdr11 with SMTP id dr11so12619911wgb.31 for ; Tue, 20 Dec 2011 07:17:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=YsG1CUJ/Sa8GJf9frrdySgIN5dCZjnRRyoIEKDoS0vQ=; b=xK6+4ujg/lWvYf5ujXaAJt/yk9OFT4vFj9IY3RIAoR5mH35cDLlELFImKA15O1F4Og zO66qsw5POjM8BcTh/YsHkAQ6mPvU34D/aD+lruPZhTjfJ/FJIws015rdSl0bzkSPH5v 3L+WtLkZkLmHxsZXeWUG3lhQdW2/rD7BvB3jg= MIME-Version: 1.0 Received: by 10.227.207.15 with SMTP id fw15mr2454483wbb.15.1324394267574; Tue, 20 Dec 2011 07:17:47 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.216.18.130 with HTTP; Tue, 20 Dec 2011 07:17:46 -0800 (PST) In-Reply-To: <201112200935.30772.jhb@freebsd.org> References: <4EED2F1C.2060409@zedat.fu-berlin.de> <201112200852.23300.jhb@freebsd.org> <201112200935.30772.jhb@freebsd.org> Date: Tue, 20 Dec 2011 16:17:46 +0100 X-Google-Sender-Auth: 2_yp9qY4qs3mk2bgydurTF8shng Message-ID: From: Attilio Rao To: John Baldwin Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: mdf@freebsd.org, "O. Hartmann" , freebsd-current@freebsd.org, Robert Watson Subject: Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Dec 2011 15:17:50 -0000 2011/12/20 John Baldwin : > On Tuesday, December 20, 2011 9:20:09 am Attilio Rao wrote: >> 2011/12/20 John Baldwin : >> > On Saturday, December 17, 2011 10:41:15 pm mdf@freebsd.org wrote: >> >> On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev = wrote: >> >> > On Sun, 18 Dec 2011 01:09:00 +0100 >> >> > "O. Hartmann" wrote: >> >> > >> >> >> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock >> >> >> panic: sleeping thread >> >> >> cpuid =3D 0 >> >> >> >> >> >> PID 16 is always USB on my box. >> >> > >> >> > You really need to give us a backtrace when you quote panics. It is >> >> > impossible to make any sense of the above panic message without mor= e >> >> > context. >> >> >> >> In the case of this panic, the stack of the thread which panics is >> >> useless; it's someone trying to propagate priority that discovered it= . >> >> =C2=A0A backtrace on tid 100033 would be useful. >> >> >> >> With WITNESS enabled, it's possible to have this panic display the >> >> stack of the incorrectly sleeping thread at the time it acquired the >> >> lock, as well, but this code isn't in CURRENT or any release. =C2=A0I= have >> >> a patch at $WORK I can dig up on Monday. >> > >> > Huh? =C2=A0The stock kernel dumps a stack trace of the offending threa= d if you have >> > DDB enabled: >> > >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * If the threa= d is asleep, then we are probably about >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * to deadlock.= =C2=A0To make debugging this easier, just >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * panic and te= ll the user which thread misbehaved so >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * they can hop= efully get a stack trace from the truly >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * misbehaving = thread. >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */ >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (TD_IS_SLEEP= ING(td)) { >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0printf( >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"Sleeping threa= d (tid %d, pid %d) owns a non-sleepable lock\n", >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_tid, td->td_proc->p_pid); >> > #ifdef DDB >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0db_trace_thread(td, -1); >> > #endif >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0panic("sleeping thread"); >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} >> > >> > It may be that we can make use of the STACK API here instead to output= this >> > trace even when DDB isn't enabled. =C2=A0The patch below tries to do t= hat >> > (untested). =C2=A0It does some odd thigns though since it is effective= ly running >> > from a panic context already, so it uses a statically allocated 'struc= t stack' >> > rather than using stack_create() and uses stack_print_ddb() since it i= s >> > holding spin locks and can't possibly grab an sx lock: >> > >> > Index: subr_turnstile.c >> > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> > --- subr_turnstile.c =C2=A0 =C2=A0(revision 228534) >> > +++ subr_turnstile.c =C2=A0 =C2=A0(working copy) >> > @@ -72,6 +72,7 @@ __FBSDID("$FreeBSD$"); >> > =C2=A0#include >> > =C2=A0#include >> > =C2=A0#include >> > +#include >> > =C2=A0#include >> > =C2=A0#include >> > >> > @@ -175,6 +176,7 @@ static void turnstile_fini(void *mem, int size); >> > =C2=A0static void >> > =C2=A0propagate_priority(struct thread *td) >> > =C2=A0{ >> > + =C2=A0 =C2=A0 =C2=A0 static struct stack st; >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0struct turnstile *ts; >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0int pri; >> > >> > @@ -217,8 +219,10 @@ propagate_priority(struct thread *td) >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0printf( >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0"Sleeping threa= d (tid %d, pid %d) owns a non-sleepable lock\n", >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_tid, td->td_proc->p_pid); >> > -#ifdef DDB >> > - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 db_trace_thread(td, -1); >> > +#ifdef STACK >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 stack_zero(&st); >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 stack_save_td(&st, td); >> > + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 stack_print_ddb(&st); >> > =C2=A0#endif >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0panic("sleeping thread"); >> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} >> > >> > -- >> >> I'm not sure it is a wise idea to trimm the DDB part, because it is >> much more common than STACK enabled. Note that stack(9) is working if >> you define DDB too, so I'd say to do that for both. >> Also, I don't think you need the stack_zero() prior to set it. > > Err, STACK is enabled in GENERIC in release kernels but DDB is not, so I = think > STACK is the more common one. =C2=A0As far as stack_zero(), I was just be= ing paranoid. And what is the point for not having #ifdef STACK as #if defined(STACK) || defined(DDB) ? >> As we are here, however, I have a question for Robert here: do you >> think we should support the _ddb() variant of options even in the case >> DDB is not enabled in the kernel? >> Probabilly the way it is nowadays is easier to have stack(9) both >> defined for DDB and STACK, but in general I wouldn't advertise that. > > The _ddb variants are always enabled by my reading. =C2=A0They just use d= ifferent > entry points into the linker that don't use locking. My question is different: why we define them anyway even when DDB is not enabled? Attilio --=20 Peace can only be achieved by understanding - A. Einstein