Date: Thu, 17 Nov 2011 21:54:19 +0100 From: Attilio Rao <attilio@freebsd.org> To: Andriy Gapon <avg@freebsd.org> Cc: Kostik Belousov <kostikbel@gmail.com>, Alexander Motin <mav@freebsd.org>, freebsd-current@freebsd.org Subject: Re: Stop scheduler on panic Message-ID: <CAJ-FndB7-UeFVQhqe0sTnpJ2PhWO5ijaCLaE-6bzMU%2B8=gYXeg@mail.gmail.com> In-Reply-To: <4EC563BB.60209@FreeBSD.org> References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <201111171137.18663.jhb@freebsd.org> <4EC53D1B.4000308@FreeBSD.org> <201111171409.37629.jhb@freebsd.org> <4EC563BB.60209@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
2011/11/17 Andriy Gapon <avg@freebsd.org>: > on 17/11/2011 21:09 John Baldwin said the following: >> On Thursday, November 17, 2011 11:58:03 am Andriy Gapon wrote: >>> on 17/11/2011 18:37 John Baldwin said the following: >>>> On Thursday, November 17, 2011 4:47:42 am Andriy Gapon wrote: >>>>> on 17/11/2011 10:34 Andriy Gapon said the following: >>>>>> on 17/11/2011 10:15 Kostik Belousov said the following: >>>>>>> I have the following change for eons on my test boxes. Without it, >>>>>>> I simply cannot get _any_ dump. >>>>>>> >>>>>>> diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c >>>>>>> index 10b89c7..a38e42f 100644 >>>>>>> --- a/sys/cam/cam_xpt.c >>>>>>> +++ b/sys/cam/cam_xpt.c >>>>>>> @@ -4230,7 +4230,7 @@ xpt_done(union ccb *done_ccb) >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0TAILQ_INSERT_TAIL(&cam_simq, sim, links); >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0mtx_unlock(&cam_simq_lock); >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0sim->flags |=3D CAM_SIM_ON_DONEQ; >>>>>>> - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0if (first) >>>>>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0if (first && panicstr =3D=3D NULL) >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0swi_sched(cambio_ih, 0)= ; >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} >>>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} >>>>>> >>>>>> I think that this (or similar) change should go into the patch and t= he tree. >>>>>> >>>>> >>>>> And, BTW, I still would like to do something like the following (perh= aps with >>>>> td_oncpu =3D NOCPU and td_flags &=3D ~TDF_NEEDRESCHED also moved to t= he common code): >>>>> >>>>> Index: sys/kern/sched_ule.c >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> --- sys/kern/sched_ule.c =C2=A0 (revision 227608) >>>>> +++ sys/kern/sched_ule.c =C2=A0 (working copy) >>>>> @@ -1790,7 +1790,6 @@ sched_switch(struct thread *td, struct thread *= new >>>>> =C2=A0 =C2=A0td->td_oncpu =3D NOCPU; >>>>> =C2=A0 =C2=A0if (!(flags & SW_PREEMPT)) >>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_flags &=3D ~TDF_NEEDR= ESCHED; >>>>> - =C2=A0td->td_owepreempt =3D 0; >>>>> =C2=A0 =C2=A0tdq->tdq_switchcnt++; >>>>> =C2=A0 =C2=A0/* >>>>> =C2=A0 =C2=A0 * The lock pointer in an idle thread should never chang= e. =C2=A0Reset it >>>>> Index: sys/kern/kern_synch.c >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> --- sys/kern/kern_synch.c =C2=A0(revision 227608) >>>>> +++ sys/kern/kern_synch.c =C2=A0(working copy) >>>>> @@ -406,6 +406,8 @@ mi_switch(int flags, struct thread *newtd) >>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0("mi_switch: switch must be voluntary or i= nvoluntary")); >>>>> =C2=A0 =C2=A0KASSERT(newtd !=3D curthread, ("mi_switch: preempting ba= ck to ourself")); >>>>> >>>>> + =C2=A0td->td_owepreempt =3D 0; >>>>> + >>>>> =C2=A0 =C2=A0/* >>>>> =C2=A0 =C2=A0 * Don't perform context switches from the debugger. >>>>> =C2=A0 =C2=A0 */ >>>>> Index: sys/kern/sched_4bsd.c >>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> --- sys/kern/sched_4bsd.c =C2=A0(revision 227608) >>>>> +++ sys/kern/sched_4bsd.c =C2=A0(working copy) >>>>> @@ -940,7 +940,6 @@ sched_switch(struct thread *td, struct thread *ne= w >>>>> =C2=A0 =C2=A0td->td_lastcpu =3D td->td_oncpu; >>>>> =C2=A0 =C2=A0if (!(flags & SW_PREEMPT)) >>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_flags &=3D ~TDF_NEEDR= ESCHED; >>>>> - =C2=A0td->td_owepreempt =3D 0; >>>>> =C2=A0 =C2=A0td->td_oncpu =3D NOCPU; >>>>> >>>>> =C2=A0 =C2=A0/* >>>>> >>>>> Does anybody see any potential problems with such a change? >>>> >>>> Hmm, does this mean the preemption will be lost if you break into the >>>> debugger and continue in the non-panic case? >>> >>> Not sure which exact scenario you have in mind. >>> Please note that the above diff just moves resetting of td_owepreempt t= o an >>> earlier place. =C2=A0As far as I can see there are no checks of td_owep= reempt value >>> between the new place and the old places. >> >> I'm worried that you are clearing td_owepreempt even in cases where a co= ntext >> switch is not performed. =C2=A0So say you enter DDB with td_owepreempt s= et and that >> DDB bails on a context switch. =C2=A0With your change it will now clear = td_owepreempt >> and "lose" the preemption. >> > > And without the change we get the recursion and double-fault because of > kdb_switch -> thread_unlock -> =C2=A0spinlock_exit -> =C2=A0critical_exit= -> > mi_switch in this case ? > > BTW, it is my opinion that we really should not let the debugger code cal= l > mi_switch for any reason. Yes, I agree with this, this is why the sched_bind() in boot() is broken (immagine calling things like doadump from KDB. KDB right now can be thought as a first cut of this patch because it does disable the CPUs when entering the context, thus, the bug here is that if you stop all CPUs including CPU0 and later on you want bind on it you are death). We need to discuss this and the patch more extensively, I'm taking my time and hopefully will do a full review during the weekend. Attilio --=20 Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndB7-UeFVQhqe0sTnpJ2PhWO5ijaCLaE-6bzMU%2B8=gYXeg>