From owner-svn-src-head@freebsd.org  Thu Jan 18 14:23:49 2018
Date: Fri, 19 Jan 2018 01:23:40 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Andriy Gapon <avg@freebsd.org>
cc: Wojciech Macek <wma@freebsd.org>, src-committers@freebsd.org, 
 svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r328110 - head/sys/kern
In-Reply-To: <bdd209e5-70f9-6c13-6c39-67daf03802cf@FreeBSD.org>
Message-ID: <20180119004249.V2293@besplex.bde.org>
References: <201801180738.w0I7cswv054484@repo.freebsd.org>
 <bdd209e5-70f9-6c13-6c39-67daf03802cf@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Thu, 18 Jan 2018, Andriy Gapon wrote:

> On 18/01/2018 09:38, Wojciech Macek wrote:
>> ...
>> Log:
>>   KDB: restart only CPUs stopped by KDB
>>
>>   There is a case when not all CPUs went online. In that situation,
>>   restart only APs which were operational before entering KDB.
>
> What is the context here?
> I mean, what is the state of those CPUs that are not online?

I think they are there but already stopped.  The patch avoids trying to
stop them again (which might cause problems) and starting them again
(which does cause problems).  I recently fixed x86 suspension's abuse of
the stopped_cpus mask (only for resume, IIRC).  Mixtures of suspended and
stopped CPUs still have no chance of working.
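
The mask bookkeeping described above can be sketched roughly as follows.
This is a minimal illustration, not the committed code: a 64-bit integer
stands in for the kernel's cpuset_t, and the type and function names
(cpumask_t, kdb_stop_mask, kdb_restart_mask) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpuset_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_t;

/*
 * CPUs the debugger should stop on entry: every known CPU except the
 * current one and any CPU that is already stopped.
 */
static cpumask_t
kdb_stop_mask(cpumask_t all_cpus, cpumask_t already_stopped, int self)
{
	return (all_cpus & ~already_stopped & ~((cpumask_t)1 << self));
}

/*
 * CPUs to restart on exit: only those that this debugger session
 * stopped and that are still marked as stopped.
 */
static cpumask_t
kdb_restart_mask(cpumask_t stopped_by_kdb, cpumask_t stopped_now)
{
	return (stopped_by_kdb & stopped_now);
}
```

With four CPUs (mask 0xF), CPU 2 already stopped and CPU 0 entering the
debugger, kdb_stop_mask() yields 0xA, so only CPUs 1 and 3 are stopped
and later restarted; CPU 2 is left alone in both directions.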

> Also, it seems you allow for the situation where a CPU that was not online at
> the time of kdb_trap becomes online (and running) while kdb is active?
> If that's so, then it can mess up the system big time.
>
> I think that this is not a right solution.

I have much larger patches in this area (mostly in subr_smp.c).  It is
essential for kdb_enter() and panic() to stop all CPUs and wait forever
if necessary.  Returning early leaves the other CPUs in an unknown state
where they might clobber kdb invariants.  Sometimes they are just looping,
but this is only safe if they keep looping for the duration of the stopping
and the looping doesn't cause overheating.  CPUs in NMI handlers cannot be
stopped immediately, and at least on x86 there are races sending new (STOP)
NMIs to them.  I handle this case by stopping NMI handlers in a special loop.
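
The "wait forever if necessary" rule can be sketched as below.  This is
only an illustration under stated assumptions, not the patches referred
to above: C11 atomics stand in for the kernel's atomic operations, and
every name (stopped_mask, cpu_ack_stop, cpu_ack_restart, all_stopped,
wait_for_stop) is hypothetical.  The special handling of CPUs in NMI
handlers is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpuset_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_t;

/* Bits are set by CPUs as they acknowledge a stop request. */
static _Atomic cpumask_t stopped_mask;

/* Called on a target CPU once it has parked itself. */
static void
cpu_ack_stop(int cpu)
{
	atomic_fetch_or(&stopped_mask, (cpumask_t)1 << cpu);
}

/* Called on a target CPU when it resumes running. */
static void
cpu_ack_restart(int cpu)
{
	atomic_fetch_and(&stopped_mask, ~((cpumask_t)1 << cpu));
}

/* True only when every requested CPU has acknowledged the stop. */
static bool
all_stopped(cpumask_t requested)
{
	return ((atomic_load(&stopped_mask) & requested) == requested);
}

/*
 * Deliberately has no timeout: returning early would leave some CPUs
 * running in an unknown state while the debugger is active.
 */
static void
wait_for_stop(cpumask_t requested)
{
	while (!all_stopped(requested))
		;	/* real kernel code would pause here, e.g. cpu_spinwait() */
}
```

The design point is the missing timeout in wait_for_stop(): the caller
proceeds only after every requested CPU has acknowledged, however long
that takes.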

> P.S.
> While not a recipe for a solution, these musing may be of interest to you:
> https://lists.freebsd.org/pipermail/freebsd-arch/2011-June/011373.html

I didn't know that you had uncommitted musings on this.  Most of it is too
complicated for me.  I implemented some of points 1 and 2.

Bruce