Date:      Fri, 19 Jan 2018 01:23:40 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        Wojciech Macek <wma@freebsd.org>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r328110 - head/sys/kern
Message-ID:  <20180119004249.V2293@besplex.bde.org>
In-Reply-To: <bdd209e5-70f9-6c13-6c39-67daf03802cf@FreeBSD.org>
References:  <201801180738.w0I7cswv054484@repo.freebsd.org> <bdd209e5-70f9-6c13-6c39-67daf03802cf@FreeBSD.org>

On Thu, 18 Jan 2018, Andriy Gapon wrote:

> On 18/01/2018 09:38, Wojciech Macek wrote:
>> ...
>> Log:
>>   KDB: restart only CPUs stopped by KDB
>>
>>   There is a case when not all CPUs went online. In that situation,
>>   restart only APs which were operational before entering KDB.
>
> What is the context here?
> I mean, what is the state of those CPUs that are not online?

I think they are there but already stopped.  The patch avoids trying to
stop them again (which might cause problems) and starting them again
(which does cause problems).  I recently fixed x86 suspension abusing
the stopped_cpus mask (only for resume IIRC).  Mixtures of suspended and
stopped CPUs still have no chance of working.
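To make the mask bookkeeping concrete, here is a minimal userspace sketch of the idea as I read it: on entry, stop only the online CPUs that are not already stopped, remember exactly that set, and restart only that set on exit.  Everything in it (cpumask_t as a plain 64-bit word, struct kdb_model, model_kdb_enter(), model_kdb_exit()) is a hypothetical name for illustration.  It is not the code from r328110 and not the kernel's cpuset_t API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_t;

struct kdb_model {
	cpumask_t online;	/* CPUs that actually came online */
	cpumask_t stopped;	/* CPUs currently stopped, by anyone */
	cpumask_t kdb_stopped;	/* CPUs stopped by this debugger entry */
};

/* Enter the debugger on CPU 'self': stop only CPUs that are running. */
static void
model_kdb_enter(struct kdb_model *m, unsigned self)
{
	cpumask_t others;

	assert(self < 64);
	others = m->online & ~((cpumask_t)1 << self);
	/* Skip CPUs that are already stopped; do not stop them again. */
	m->kdb_stopped = others & ~m->stopped;
	m->stopped |= m->kdb_stopped;
}

/* Leave the debugger: restart only what this entry stopped. */
static void
model_kdb_exit(struct kdb_model *m)
{
	m->stopped &= ~m->kdb_stopped;
	m->kdb_stopped = 0;
}
```

With online = 0x0f, stopped = 0x08 and entry on CPU 0, this model stops CPUs 1 and 2 only, and on exit CPU 3 stays stopped as it was before.  The model says nothing about suspended CPUs or about CPUs that come online while the debugger is active.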

> Also, it seems you allow for the situation where a CPU that was not online at
> the time of kdb_trap becomes online (and running) while kdb is active?
> If that's so, then it can mess up the system big time.
>
> I think that this is not a right solution.

I have much larger patches in this area (mostly in subr_smp.c).  It is
essential for kdb_enter() and panic() to stop all CPUs and wait forever
if necessary.  Returning early leaves the other CPUs in an unknown state
where they might clobber kdb invariants.  Sometimes they are just looping,
but this is only safe if they keep looping for the duration of the stopping
and the looping doesn't cause overheating.  CPUs in NMI handlers cannot be
stopped immediately, and at least on x86 there are races sending new (STOP)
NMIs to them.  I handle this case by stopping NMI handlers in a special loop

> P.S.
> While not a recipe for a solution, these musing may be of interest to you:
> https://lists.freebsd.org/pipermail/freebsd-arch/2011-June/011373.html

I didn't know that you had uncommitted musings on this.  Most of it is too
complicated for me.  I implemented some of points 1 and 2.

Bruce


