From: Bruce Evans
Date: Fri, 19 Jan 2018 01:23:40 +1100 (EST)
To: Andriy Gapon
cc: Wojciech Macek, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r328110 - head/sys/kern
Message-ID: <20180119004249.V2293@besplex.bde.org>
References: <201801180738.w0I7cswv054484@repo.freebsd.org>

On Thu, 18 Jan 2018, Andriy Gapon wrote:

> On 18/01/2018 09:38, Wojciech Macek wrote:
>> ...
>> Log:
>>   KDB: restart only CPUs stopped by KDB
>>
>>   There is a case when not all CPUs went online. In that situation,
>>   restart only APs which were operational before entering KDB.
>
> What is the context here?
> I mean, what is the state of those CPUs that are not online?

I think they are there but already stopped.  The patch avoids trying to
stop them again (which might cause problems) and starting them again
(which does cause problems).

I recently fixed x86 suspension abusing the stopped_cpus mask (only for
resume IIRC).  Mixtures of suspended and stopped CPUs still have no
chance of working.

> Also, it seems you allow for the situation where a CPU that was not online at
> the time of kdb_trap becomes online (and running) while kdb is active?
> If that's so, then it can mess up the system big time.
>
> I think that this is not a right solution.

I have much larger patches in this area (mostly in subr_smp.c).  It is
essential for kdb_enter() and panic() to stop all CPUs and wait forever
if necessary.  Returning early leaves the other CPUs in an unknown state
where they might clobber kdb invariants.  Sometimes they are just
looping, but this is only safe if they keep looping for the duration of
the stopping and the looping doesn't cause overheating.  CPUs in NMI
handlers cannot be stopped immediately, and at least on x86 there are
races sending new (STOP) NMIs to them.  I handle this case by stopping
NMI handlers in a special loop.

> P.S.
> While not a recipe for a solution, these musing may be of interest to you:
> https://lists.freebsd.org/pipermail/freebsd-arch/2011-June/011373.html

I didn't know that you had uncommitted musings on this.  Most of it is
too complicated for me.  I implemented some of points 1 and 2.

Bruce
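
The bookkeeping discussed in this thread (stop only CPUs that are online and not already stopped, then restart only the ones the debugger itself stopped) can be illustrated with a small userland model. This is only a sketch under stated assumptions: it is not the code from r328110 or from the uncommitted subr_smp.c patches, and the names cpu_mask, struct dbg_state, cpu_bit(), dbg_enter() and dbg_exit() are hypothetical. It models masks only; it sends no IPIs or NMIs and does not wait for CPUs to acknowledge the stop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_mask;

struct dbg_state {
	cpu_mask online;		/* CPUs that finished coming online */
	cpu_mask stopped;		/* CPUs currently stopped, by anyone */
	cpu_mask stopped_by_dbg;	/* CPUs stopped by this debugger entry */
};

/* Return the mask bit for one CPU id. */
static cpu_mask
cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return ((cpu_mask)1 << cpu);
}

/*
 * Debugger entry on CPU 'self': target every online CPU except 'self'
 * that is not already stopped, and remember exactly which ones we
 * stopped so that exit can undo only that.
 */
static void
dbg_enter(struct dbg_state *s, unsigned int self)
{
	cpu_mask targets;

	targets = s->online & ~s->stopped & ~cpu_bit(self);
	s->stopped |= targets;
	s->stopped_by_dbg = targets;
}

/*
 * Debugger exit: restart only the CPUs recorded at entry.  CPUs that
 * were stopped earlier for another reason, or that never came online,
 * are left alone.  Returns the mask of CPUs to restart.
 */
static cpu_mask
dbg_exit(struct dbg_state *s)
{
	cpu_mask restart;

	restart = s->stopped_by_dbg;
	s->stopped &= ~restart;
	s->stopped_by_dbg = 0;
	return (restart);
}
```

For example, with CPUs 0-2 online, CPU 2 already stopped, and the debugger entered on CPU 0, dbg_enter() records only CPU 1, and dbg_exit() returns a mask containing only CPU 1, so CPU 2 stays stopped.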