Date:      Thu, 17 Nov 2011 22:05:29 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        mdf@freebsd.org
Cc:        Kostik Belousov <kostikbel@gmail.com>, Alexander Motin <mav@freebsd.org>, freebsd-current@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: Stop scheduler on panic
Message-ID:  <CAJ-FndDLTpognq6VTZS514XY%2Beh12LrW=XSoS5=6eF3T7KApUg@mail.gmail.com>
In-Reply-To: <CAMBSHm-jne0qFb5A9ua1N_KOuc8O9e=pjX7_iwMG2dODzpy%2B_A@mail.gmail.com>
References:  <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <201111171137.18663.jhb@freebsd.org> <4EC53D1B.4000308@FreeBSD.org> <201111171409.37629.jhb@freebsd.org> <4EC563BB.60209@FreeBSD.org> <CAJ-FndB7-UeFVQhqe0sTnpJ2PhWO5ijaCLaE-6bzMU%2B8=gYXeg@mail.gmail.com> <CAMBSHm-jne0qFb5A9ua1N_KOuc8O9e=pjX7_iwMG2dODzpy%2B_A@mail.gmail.com>

2011/11/17  <mdf@freebsd.org>:
> On Thu, Nov 17, 2011 at 12:54 PM, Attilio Rao <attilio@freebsd.org> wrote:
>> 2011/11/17 Andriy Gapon <avg@freebsd.org>:
>>> BTW, it is my opinion that we really should not let the debugger code call
>>> mi_switch for any reason.
>>
>> Yes, I agree with this; this is why the sched_bind() in boot() is
>> broken (imagine calling things like doadump from KDB. KDB right now
>> can be thought of as a first cut of this patch because it disables
>> the CPUs when entering the context; thus, the bug here is that if you
>> stop all CPUs including CPU0 and later on you want to bind to it, you
>> are dead).
>
> Another patch related to this area we have at $WORK:
>
>  #if defined(SMP)
> -       /*
> -        * Bind us to CPU 0 so that all shutdown code runs there.  Some
> -        * systems don't shutdown properly (i.e., ACPI power off) if we
> -        * run on another processor.
> -        */
> -       thread_lock(curthread);
> -       sched_bind(curthread, 0);
> -       thread_unlock(curthread);
> -       KASSERT(PCPU_GET(cpuid) == 0, ("%s: not running on cpu 0", __func__));
> +       /*
> +        * sched_bind can't be done reliably inside of panic.  cpu_reset() will
> +        * rebind us in any case, more reliably.
> +        */
> +       if (panicstr == NULL) {
> +               /*
> +                * Bind us to CPU 0 so that all shutdown code runs there.  Some
> +                * systems don't shutdown properly (i.e., ACPI power off) if we
> +                * run on another processor.
> +                */
> +               thread_lock(curthread);
> +               sched_bind(curthread, 0);
> +               thread_unlock(curthread);
> +               KASSERT(PCPU_GET(cpuid) == 0, ("boot: not running on cpu 0"));
> +       }
>  #endif
>         /* We're in the process of rebooting. */
>         rebooting = 1;

This doesn't cover the KDB case, which is the most broken one here.
(I'm a bit unsure about the names of the functions and I cannot check now,
but in short):
- you enter KDB via debug.kdb.enter=1 (for example)
- kdb_enter() stops CPUs, and if it is on CPU1 it stops CPU0
- you call functions entering boot() from the KDB prompt (IIRC "call
doadump" should do it)
- boot() wants to bind to CPU0, which has been stopped

This patch only takes care of the panic case, which is not enough.
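
A user-space toy model of the sequence above may make the gap easier to
see. It is only a sketch and not kernel code: the struct and every
model_* name are hypothetical stand-ins, and "binding fails" here stands
for the real kernel needing to switch onto a CPU that can no longer run
anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NCPU 2

/* Toy stand-ins for kernel state; none of this is real kernel code. */
struct model {
	bool cpu_stopped[MODEL_NCPU];	/* true once a CPU has been parked */
	int curcpu;			/* CPU the current thread runs on */
	const char *panicstr;		/* non-NULL only during a panic */
};

/* Model of entering the debugger: park every CPU except the current one. */
static void
model_kdb_enter(struct model *m)
{
	for (int i = 0; i < MODEL_NCPU; i++)
		m->cpu_stopped[i] = (i != m->curcpu);
}

/*
 * Model of binding to a CPU.  Returns false where the real kernel would
 * need a context switch onto a CPU that has been stopped.
 */
static bool
model_sched_bind(struct model *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NCPU);
	if (m->cpu_stopped[cpu])
		return (false);
	m->curcpu = cpu;
	return (true);
}

/*
 * Model of the guarded bind in the quoted patch: skip the bind during a
 * panic, attempt it otherwise.  Returns true if shutdown can proceed.
 */
static bool
model_boot(struct model *m)
{
	if (m->panicstr == NULL)
		return (model_sched_bind(m, 0));
	return (true);
}
```

Starting on CPU 1 with panicstr left NULL, calling model_kdb_enter() and
then model_boot() returns false, which corresponds to the KDB case
described above. With panicstr set, model_boot() skips the bind and
returns true, which is the only case the quoted patch addresses.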

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


