From owner-freebsd-current@FreeBSD.ORG Fri Nov 18 21:59:38 2011 Return-Path: Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EBFC106564A; Fri, 18 Nov 2011 21:59:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 16DA88FC08; Fri, 18 Nov 2011 21:59:36 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA14238; Fri, 18 Nov 2011 23:59:34 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1RRWT8-000FOG-Fn; Fri, 18 Nov 2011 23:59:34 +0200 Message-ID: <4EC6D544.5040803@FreeBSD.org> Date: Fri, 18 Nov 2011 23:59:32 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:8.0) Gecko/20111108 Thunderbird/8.0 MIME-Version: 1.0 To: John Baldwin References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <4EC563BB.60209@FreeBSD.org> <201111171635.07278.jhb@freebsd.org> <201111171638.03208.jhb@freebsd.org> In-Reply-To: <201111171638.03208.jhb@freebsd.org> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Kostik Belousov , Alexander Motin , freebsd-current@FreeBSD.org Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Nov 2011 21:59:38 -0000 on 17/11/2011 23:38 John Baldwin said the following: > On Thursday, November 17, 2011 4:35:07 pm John Baldwin wrote: >> Hmmm, you could also make critical_exit() not perform deferred preemptions >> if SCHEDULER_STOPPED? That would fix the recursion and still let the >> preemption "work" when resuming from the debugger? Yes, that's a good solution, I think. I just didn't want to touch such a "low level" code, but I guess there is no rational reason for that. > Or you could let the debugger run in a permament critical section (though > perhaps that breaks too many other things like debugger routines that try > to use locks like the 'kill' command (which is useful!)). Effectively what you > are trying to do is having the debugger run in a critical section until the > debugger is exited. It would be cleanest to let it run that way explicitly > if possible since then you don't have to catch as many edge cases. I like this idea, but likely it would take some effort to get done. Related to this is something that I attempted to discuss before. I think that because the debugger acts on a frozen system image (the debugger is a sole actor and observer), the debugger should try to minimize its interaction with the debugged system. In this vein I think that the debugger should also bypass any locks just like with SCHEDULER_STOPPED. The debugger should also be careful to note a state of any subsystems that it uses (e.g. the keyboard) and return them to the initial state if it had to be changed. But that's a bit different story. And I really get beyond my knowledge zone when I try to think about things like handling 'call func_xxxx' in the debugger where func_xxxx may want to acquire some locks or noticeably change some state within a system. But to continue about the locks... I have this idea to re-implement SCHEDULER_STOPPED as some more general check that could be abstractly denoted as LOCKING_POLICY_CHECK(context). Where the context could be defined by flags like normal, in-panic, in-debugger, etc. And the locking policies could be: normal, bypass, warn, panic, etc. However, I am not sure if this could be useful (and doable properly) in practice. I am just concerned with the interaction between the debugger and the locks. It still seems to me inconsistent that we are going with SCHEDULER_STOPPED for panic, but we are continuing to use "if (!kdb_active)" around some locks that could be problematic under kdb (e.g. in USB). In my opinion the amount of code shared between normal context and kdb context is about the same as amount of code shared between normal context and panic context. But I haven't really quantified those. -- Andriy Gapon