From owner-freebsd-current@FreeBSD.ORG Tue Dec 6 15:21:49 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7FA61106566C; Tue, 6 Dec 2011 15:21:49 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 7E7898FC08; Tue, 6 Dec 2011 15:21:48 +0000 (UTC) Received: by bkat2 with SMTP id t2so10523569bka.13 for ; Tue, 06 Dec 2011 07:21:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=ZgZ8VtqjmroMOuyFckueuHH4N+UZw9BFdAT/cJBbppY=; b=rFIYGBrHDuW6CUOte6NxcPC2WgxtFCu4qAbhZiyXAWsjLZNyILF6z9GjK4FAyZUROq dbdweAId0u0Y1dG7gIwfzzvNuEyHw199SuF+uuB7cW7nPyt98lkC2jF0ko9X+KH+fxMB 3pEScKhqKV7p0SsR6MY+o0u98u2L7xALEdkvg= MIME-Version: 1.0 Received: by 10.216.180.209 with SMTP id j59mr2888555wem.1.1323184907201; Tue, 06 Dec 2011 07:21:47 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.216.47.211 with HTTP; Tue, 6 Dec 2011 07:21:47 -0800 (PST) In-Reply-To: <4ED951E0.9000000@FreeBSD.org> References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <201112011349.50502.jhb@freebsd.org> <4ED7E6B0.30400@FreeBSD.org> <201112011553.34432.jhb@freebsd.org> <4ED7F4BC.3080206@FreeBSD.org> <4ED855E6.20207@FreeBSD.org> <4ED8A306.9020801@FreeBSD.org> <4ED8F1C1.7010206@FreeBSD.org> <4ED91B8D.2080808@FreeBSD.org> <4ED951E0.9000000@FreeBSD.org> Date: Tue, 6 Dec 2011 16:21:47 +0100 X-Google-Sender-Auth: nLm1gsjnDcIyTODJxGymg3KFrJs Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-current@freebsd.org, Konstantin Belousov Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2011 15:21:49 -0000 2011/12/2 Andriy Gapon : > on 02/12/2011 20:40 John Baldwin said the following: >> On 12/2/11 12:18 PM, Attilio Rao wrote: >>> 2011/12/2 John Baldwin: >>>> On 12/2/11 5:05 AM, Andriy Gapon wrote: >>>>> >>>>> on 02/12/2011 06:36 John Baldwin said the following: >>>>>> >>>>>> Ah, ok (I had thought SCHEDULER_STOPPED was going to always be true = when >>>>>> kdb was >>>>>> active). =C2=A0But I think these two changes should cover critical_e= xit() ok. >>>>>> >>>>> >>>>> I attempted to start a discussion about this a few times already :-) >>>>> Should we treat kdb context the same as SCHEDULER_STOPPED context (in= the >>>>> current definition) ? =C2=A0That is, skip all locks in the same fashi= on? >>>>> There are pros and contras. >>>> >>>> >>>> kdb should not block on locks, no. =C2=A0Most debugger commands should= not go >>>> near locks anyway unless they are intended to carefully modify the exi= sting >>>> system in a safe manner (such as the 'kill' command which should only = be >>>> using try locks and fail if it cannot safely post the signal). >>> >>> The biggest problem to KDB as the same as panic is that doing proper >>> 'continue' is impossible. >>> One of the features of the 'skip-locking' path is that it doesn't take >>> into account fast locking paths, where sometimes the lock can succeed >>> and other fails and you don't know about them. Also the restarted CPUs >>> can find corrupted datas (as they can be arbitrarely updated), I'm >>> sure it is too much panic prone. >> >> Yes, my thought is that kdb commands, etc. should be using dedicated rou= tines >> that do not use locks whenever possible. =C2=A0The problem of a user >> calling an arbitrary routine is not solvable (so I don't think we should= try to >> solve that, you use 'call' at your own risk), but built-in commands shou= ld >> explicitly either 1) not use locking, or 2) only use try locks and fail = out >> cleanly (including dropping any try locks acquired) if a try fails. =C2= =A0Now, that's >> an ideal view, I don't know how close we are to that in practice or if i= t is a >> realistically attainable goal. >> > > > I agree with what Attilio and you say. =C2=A0Initially it was tempting fo= r me to > apply the same SCHEDULER_STOPPED stopped medicine to the kdb_active conte= xt, but > after trying to deal with kdb_active x SCHEDULER_STOPPED x ukbd situation= I > really changed my mind. > > > I would classify the code that can be called in kdb_active context as fol= lows: > o debugger code proper (kdb, ddb, gdb stub, etc) - this obviously must no= t > (doesn't have to) use any locking > > o code that can be invoked via 'call' command - this is essentially any c= ode and > I don't think that it can/should do anything special for the kdb_active c= ontext [*] > > o debugger helper routines - those that do something trivial should not a= cquire > any locks; those that access shared resources should try the relevant loc= ks and > bail out if a resource can be in inconsistent state, or should be equippe= d to > deal correctly with such a state; this is the same as what you say above > > o common code that the debuggers have to use - most obviously this is con= sole > code and drivers that serve a particular console; on one hand those drive= rs can > have a non-trivial state that must be lock protected during normal operat= ion, on > the other hand the debugger must disregard those locks and grab its conso= le; > this is the most complex case in my opinion. Thanks for summarizing this up. However, please note that code in 2 and 4 entries may have the same issues or being the same thing, in practice. Anyway, I'm thinking now that if we really want to stop CPUs when entering KDB (and I think we do) functions at 2 and 4 should basically just be totally lockless or in general being totally re-entrant because when we restart CPUs we don't really want them finding datas to be corrupted. Also, skipping locking, is totally broken for this very specific reason. Functions at point 2 and 4 should be totally lockless then and possibly just work on read mode. For point 2, specifically, I think we need an explicit KPI to define functions within the subsystem themselves (something like DB_SHOW_COMMAND()) which marks undoublty functions to be called within ddb (the only KDB backend we implement right now) and likely for functions at point 4 we need to find a way to stress their belonging to the KDB area. Attilio --=20 Peace can only be achieved by understanding - A. Einstein