From owner-freebsd-current@FreeBSD.ORG  Fri Dec  2 22:32:07 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E7DAD106564A;
	Fri,  2 Dec 2011 22:32:06 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id AFCBD8FC12;
	Fri,  2 Dec 2011 22:32:05 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA07476;
	Sat, 03 Dec 2011 00:32:03 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1RWbeF-000B5l-Gb; Sat, 03 Dec 2011 00:32:03 +0200
Message-ID: <4ED951E0.9000000@FreeBSD.org>
Date: Sat, 03 Dec 2011 00:32:00 +0200
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:8.0) Gecko/20111108 Thunderbird/8.0
MIME-Version: 1.0
To: John Baldwin <jhb@FreeBSD.org>
References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua>
	<201112011349.50502.jhb@freebsd.org> <4ED7E6B0.30400@FreeBSD.org>
	<201112011553.34432.jhb@freebsd.org>
	<4ED7F4BC.3080206@FreeBSD.org> <4ED855E6.20207@FreeBSD.org>
	<4ED8A306.9020801@FreeBSD.org> <4ED8F1C1.7010206@FreeBSD.org>
	<CAJ-FndCBXXGG+ihS_rVfM5TqcopHABg80U0my9PxguYY8QBD=Q@mail.gmail.com>
	<4ED91B8D.2080808@FreeBSD.org>
In-Reply-To: <4ED91B8D.2080808@FreeBSD.org>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Attilio Rao <attilio@FreeBSD.org>, freebsd-current@FreeBSD.org,
	Konstantin Belousov <kib@FreeBSD.org>
Subject: Re: Stop scheduler on panic
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Fri, 02 Dec 2011 22:32:07 -0000

on 02/12/2011 20:40 John Baldwin said the following:
> On 12/2/11 12:18 PM, Attilio Rao wrote:
>> 2011/12/2 John Baldwin<jhb@freebsd.org>:
>>> On 12/2/11 5:05 AM, Andriy Gapon wrote:
>>>>
>>>> on 02/12/2011 06:36 John Baldwin said the following:
>>>>>
>>>>> Ah, ok (I had thought SCHEDULER_STOPPED was going to always be true when
>>>>> kdb was
>>>>> active).  But I think these two changes should cover critical_exit() ok.
>>>>>
>>>>
>>>> I attempted to start a discussion about this a few times already :-)
>>>> Should we treat kdb context the same as SCHEDULER_STOPPED context (in the
>>>> current definition) ?  That is, skip all locks in the same fashion?
>>>> There are pros and contras.
>>>
>>>
>>> kdb should not block on locks, no.  Most debugger commands should not go
>>> near locks anyway unless they are intended to carefully modify the existing
>>> system in a safe manner (such as the 'kill' command which should only be
>>> using try locks and fail if it cannot safely post the signal).
>>
>> The biggest problem to KDB as the same as panic is that doing proper
>> 'continue' is impossible.
>> One of the features of the 'skip-locking' path is that it doesn't take
>> into account fast locking paths, where sometimes the lock can succeed
>> and other times fail, and you don't know about them. Also the restarted CPUs
>> can find corrupted data (as it can be arbitrarily updated); I'm
>> sure it is far too panic-prone.
> 
> Yes, my thought is that kdb commands, etc. should be using dedicated routines
> that do not use locks whenever possible.  The problem of a user
> calling an arbitrary routine is not solvable (so I don't think we should try to
> solve that, you use 'call' at your own risk), but built-in commands should
> explicitly either 1) not use locking, or 2) only use try locks and fail out
> cleanly (including dropping any try locks acquired) if a try fails.  Now, that's
> an ideal view, I don't know how close we are to that in practice or if it is a
> realistically attainable goal.
> 


I agree with what Attilio and you say.  Initially it was tempting for me to
apply the same SCHEDULER_STOPPED medicine to the kdb_active context, but
after trying to deal with the kdb_active x SCHEDULER_STOPPED x ukbd situation I
really changed my mind.


I would classify the code that can be called in kdb_active context as follows:
o debugger code proper (kdb, ddb, gdb stub, etc) - this obviously must not
(and does not need to) use any locking

o code that can be invoked via the 'call' command - this is essentially any code
and I don't think that it can/should do anything special for the kdb_active
context [*]

o debugger helper routines - those that do something trivial should not acquire
any locks; those that access shared resources should try the relevant locks and
bail out if a resource can be in an inconsistent state, or should be equipped to
deal correctly with such a state; this is the same as what you say above

o common code that the debuggers have to use - most obviously this is console
code and drivers that serve a particular console; on one hand those drivers can
have a non-trivial state that must be lock protected during normal operation, on
the other hand the debugger must disregard those locks and grab its console;
this is the most complex case in my opinion.
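
To illustrate the third category (helper routines that try the relevant locks
and bail out), here is a minimal userland sketch in C.  It is only an analogue,
not kernel code: the tiny try-lock built on C11 atomic_flag, the lock names
proc_lock and sig_lock, the counter pending_signals and the helper
dbg_try_post_signal() are all hypothetical names invented for this example.
The point is the shape: never block, and drop any try lock already acquired
if a later try fails.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal try-lock; a stand-in for a kernel try-lock primitive. */
struct trylock {
	atomic_flag held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool
trylock_acquire(struct trylock *l)
{
	return (!atomic_flag_test_and_set(&l->held));
}

static void
trylock_release(struct trylock *l)
{
	atomic_flag_clear(&l->held);
}

/* Hypothetical shared state, protected by two locks in normal operation. */
static struct trylock proc_lock = { ATOMIC_FLAG_INIT };
static struct trylock sig_lock = { ATOMIC_FLAG_INIT };
static int pending_signals;

/*
 * Hypothetical debugger helper: post a signal only if both locks can be
 * taken without blocking.  Returns true on success; returns false, with
 * no lock left held and no state modified, if either try fails.
 */
static bool
dbg_try_post_signal(void)
{
	if (!trylock_acquire(&proc_lock))
		return (false);
	if (!trylock_acquire(&sig_lock)) {
		/* Fail out cleanly: drop the try lock we already hold. */
		trylock_release(&proc_lock);
		return (false);
	}
	pending_signals++;
	trylock_release(&sig_lock);
	trylock_release(&proc_lock);
	return (true);
}
```

If sig_lock happens to be held when the helper runs, it returns false, leaves
pending_signals untouched and leaves proc_lock free again, so the interrupted
owner of sig_lock sees nothing unexpected after 'continue'.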

Dealing with panics is much simpler, because it's a one way road to a system
reset.  The possibility of entering and then exiting the debugger implies
additional complications.  So it doesn't look like SCHEDULER_STOPPED can be used
equivalently for panic and for kdb_active.  kdb_active requires more elaborate
handling.
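
A toy, single-threaded simulation in C of why skip-locking is harmless on the
one way road but not across 'continue'.  Everything here is hypothetical and
made up for the illustration (scheduler_stopped as a stand-in for the
SCHEDULER_STOPPED condition, sim_lock(), sim_unlock(), counter); it is not how
the kernel implements any of this.

```c
#include <stdbool.h>

static bool scheduler_stopped;	/* stand-in for the SCHEDULER_STOPPED state */
static bool lock_held;		/* toy lock protecting 'counter' */
static int counter;

/* Returns true if the caller may enter the critical section. */
static bool
sim_lock(void)
{
	if (scheduler_stopped)
		return (true);	/* skip-locking: pretend the lock was taken */
	if (lock_held)
		return (false);	/* a real lock would block here */
	lock_held = true;
	return (true);
}

static void
sim_unlock(void)
{
	if (!scheduler_stopped)
		lock_held = false;
}

/*
 * Simulates: thread A is stopped in the middle of a locked update, code in
 * debugger context skips A's lock and modifies the same data, then the
 * system is continued.  Returns the final value of 'counter'.
 */
static int
sim_enter_and_continue(void)
{
	int snapshot;

	counter = 0;
	lock_held = false;
	scheduler_stopped = false;

	/* Thread A takes the lock and reads the counter... */
	(void)sim_lock();
	snapshot = counter;

	/* ...and is stopped right here when the debugger is entered. */
	scheduler_stopped = true;

	/* Debugger-context code ignores the lock that A still holds. */
	if (sim_lock()) {
		counter += 10;
		sim_unlock();
	}

	/* 'continue': thread A resumes with its stale snapshot. */
	scheduler_stopped = false;
	counter = snapshot + 1;
	sim_unlock();

	return (counter);
}
```

In this sketch the update made in debugger context is silently lost once A
resumes.  After a panic nobody resumes, so the same skipped lock has no
observable consequence.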

[*] - but currently we depend on some "general purpose" routines being
'callable' from the debugger where we should really have a debugger command; the
most popular example is 'call doadump'.
-- 
Andriy Gapon