From owner-freebsd-current@FreeBSD.ORG Thu Nov 17 20:54:32 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C1DA71065804; Thu, 17 Nov 2011 20:54:22 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id C2EF08FC0C; Thu, 17 Nov 2011 20:54:20 +0000 (UTC) Received: by wwg14 with SMTP id 14so3671721wwg.31 for ; Thu, 17 Nov 2011 12:54:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=zDmUEm1cvECcP8j2dN+OcgeS6b2WD6i+TlqxjXFZExE=; b=CebhFPw5z+PfVnKBhVnegh6f/8ERFYGHG+rS2M0QNfA86KZ748alpdbOAh9ekxSDMW 53VQUj7ydvseLOklB1t0m5GXTlyNMJ/QYZBMh4eCkOMJZkm8Prl/KQk2VGFRxrznvwux ImNzayuEPcpeyplxQpZg0NihAtlBtG673yN60= MIME-Version: 1.0 Received: by 10.227.205.11 with SMTP id fo11mr172272wbb.16.1321563259865; Thu, 17 Nov 2011 12:54:19 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.216.85.8 with HTTP; Thu, 17 Nov 2011 12:54:19 -0800 (PST) In-Reply-To: <4EC563BB.60209@FreeBSD.org> References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <201111171137.18663.jhb@freebsd.org> <4EC53D1B.4000308@FreeBSD.org> <201111171409.37629.jhb@freebsd.org> <4EC563BB.60209@FreeBSD.org> Date: Thu, 17 Nov 2011 21:54:19 +0100 X-Google-Sender-Auth: 8CxYW3poKDwinW9YCARc-gtLoUQ Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: Kostik Belousov , Alexander Motin , freebsd-current@freebsd.org Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Nov 2011 20:54:32 -0000 2011/11/17 Andriy Gapon : > on 17/11/2011 21:09 John Baldwin said the following: >> On Thursday, November 17, 2011 11:58:03 am Andriy Gapon wrote: >>> on 17/11/2011 18:37 John Baldwin said the following: >>>> On Thursday, November 17, 2011 4:47:42 am Andriy Gapon wrote: >>>>> on 17/11/2011 10:34 Andriy Gapon said the following: >>>>>> on 17/11/2011 10:15 Kostik Belousov said the following: >>>>>>> I have the following change for eons on my test boxes. Without it, >>>>>>> I simply cannot get _any_ dump. >>>>>>> >>>>>>> diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c >>>>>>> index 10b89c7..a38e42f 100644 >>>>>>> --- a/sys/cam/cam_xpt.c >>>>>>> +++ b/sys/cam/cam_xpt.c >>>>>>> @@ -4230,7 +4230,7 @@ xpt_done(union ccb *done_ccb) >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0TAILQ_INSERT_TAIL(&cam_simq, sim, links); >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0mtx_unlock(&cam_simq_lock); >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0sim->flags |=3D CAM_SIM_ON_DONEQ; >>>>>>> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0if (first) >>>>>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0if (first && panicstr =3D=3D NULL) >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0swi_sched(cambio_ih, 0)= ; >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} >>>>>> >>>>>> I think that this (or similar) change should go into the patch and t= he tree. >>>>>> >>>>> >>>>> And, BTW, I still would like to do something like the following (perh= aps with >>>>> td_oncpu =3D NOCPU and td_flags &=3D ~TDF_NEEDRESCHED also moved to t= he common code): >>>>> >>>>> Index: sys/kern/sched_ule.c >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> --- sys/kern/sched_ule.c =C2=A0 (revision 227608) >>>>> +++ sys/kern/sched_ule.c =C2=A0 (working copy) >>>>> @@ -1790,7 +1790,6 @@ sched_switch(struct thread *td, struct thread *= new >>>>> =C2=A0 =C2=A0td->td_oncpu =3D NOCPU; >>>>> =C2=A0 =C2=A0if (!(flags & SW_PREEMPT)) >>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_flags &=3D ~TDF_NEEDR= ESCHED; >>>>> - =C2=A0td->td_owepreempt =3D 0; >>>>> =C2=A0 =C2=A0tdq->tdq_switchcnt++; >>>>> =C2=A0 =C2=A0/* >>>>> =C2=A0 =C2=A0 * The lock pointer in an idle thread should never chang= e. =C2=A0Reset it >>>>> Index: sys/kern/kern_synch.c >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> --- sys/kern/kern_synch.c =C2=A0(revision 227608) >>>>> +++ sys/kern/kern_synch.c =C2=A0(working copy) >>>>> @@ -406,6 +406,8 @@ mi_switch(int flags, struct thread *newtd) >>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0("mi_switch: switch must be voluntary or i= nvoluntary")); >>>>> =C2=A0 =C2=A0KASSERT(newtd !=3D curthread, ("mi_switch: preempting ba= ck to ourself")); >>>>> >>>>> + =C2=A0td->td_owepreempt =3D 0; >>>>> + >>>>> =C2=A0 =C2=A0/* >>>>> =C2=A0 =C2=A0 * Don't perform context switches from the debugger. >>>>> =C2=A0 =C2=A0 */ >>>>> Index: sys/kern/sched_4bsd.c >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> --- sys/kern/sched_4bsd.c =C2=A0(revision 227608) >>>>> +++ sys/kern/sched_4bsd.c =C2=A0(working copy) >>>>> @@ -940,7 +940,6 @@ sched_switch(struct thread *td, struct thread *ne= w >>>>> =C2=A0 =C2=A0td->td_lastcpu =3D td->td_oncpu; >>>>> =C2=A0 =C2=A0if (!(flags & SW_PREEMPT)) >>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_flags &=3D ~TDF_NEEDR= ESCHED; >>>>> - =C2=A0td->td_owepreempt =3D 0; >>>>> =C2=A0 =C2=A0td->td_oncpu =3D NOCPU; >>>>> >>>>> =C2=A0 =C2=A0/* >>>>> >>>>> Does anybody see any potential problems with such a change? >>>> >>>> Hmm, does this mean the preemption will be lost if you break into the >>>> debugger and continue in the non-panic case? >>> >>> Not sure which exact scenario you have in mind. >>> Please note that the above diff just moves resetting of td_owepreempt t= o an >>> earlier place. =C2=A0As far as I can see there are no checks of td_owep= reempt value >>> between the new place and the old places. >> >> I'm worried that you are clearing td_owepreempt even in cases where a co= ntext >> switch is not performed. =C2=A0So say you enter DDB with td_owepreempt s= et and that >> DDB bails on a context switch. =C2=A0With your change it will now clear = td_owepreempt >> and "lose" the preemption. >> > > And without the change we get the recursion and double-fault because of > kdb_switch -> thread_unlock -> =C2=A0spinlock_exit -> =C2=A0critical_exit= -> > mi_switch in this case ? > > BTW, it is my opinion that we really should not let the debugger code cal= l > mi_switch for any reason. Yes, I agree with this, this is why the sched_bind() in boot() is broken (immagine calling things like doadump from KDB. KDB right now can be thought as a first cut of this patch because it does disable the CPUs when entering the context, thus, the bug here is that if you stop all CPUs including CPU0 and later on you want bind on it you are death). We need to discuss this and the patch more extensively, I'm taking my time and hopefully will do a full review during the weekend. Attilio --=20 Peace can only be achieved by understanding - A. Einstein