From: Attilio Rao
To: mdf@freebsd.org
Cc: Kostik Belousov, Alexander Motin, freebsd-current@freebsd.org, Andriy Gapon
Date: Thu, 17 Nov 2011 22:05:29 +0100
Subject: Re: Stop scheduler on panic
List-Id: Discussions about the use of FreeBSD-current

2011/11/17 <mdf@freebsd.org>:
> On Thu, Nov 17, 2011 at 12:54 PM, Attilio Rao wrote:
>> 2011/11/17 Andriy Gapon:
>>> BTW, it is my opinion that we really should not let the debugger code call
>>> mi_switch for any reason.
>>
>> Yes, I agree with this, this is why the sched_bind() in boot() is
>> broken (imagine calling things like doadump from KDB). KDB right now
>> can be thought of as a first cut of this patch because it does disable
>> the CPUs when entering the context; thus, the bug here is that if you
>> stop all CPUs including CPU0 and later on you want to bind to it, you
>> are dead.
>
> Another patch related to this area we have at $WORK:
>
>  #if defined(SMP)
> -       /*
> -        * Bind us to CPU 0 so that all shutdown code runs there.  Some
> -        * systems don't shutdown properly (i.e., ACPI power off) if we
> -        * run on another processor.
> -        */
> -       thread_lock(curthread);
> -       sched_bind(curthread, 0);
> -       thread_unlock(curthread);
> -       KASSERT(PCPU_GET(cpuid) == 0, ("%s: not running on cpu 0", __func__));
> +       /*
> +        * sched_bind can't be done reliably inside of panic.  cpu_reset() will
> +        * rebind us in any case, more reliably.
> +        */
> +       if (panicstr == NULL) {
> +               /*
> +                * Bind us to CPU 0 so that all shutdown code runs there.  Some
> +                * systems don't shutdown properly (i.e., ACPI power off) if we
> +                * run on another processor.
> +                */
> +               thread_lock(curthread);
> +               sched_bind(curthread, 0);
> +               thread_unlock(curthread);
> +               KASSERT(PCPU_GET(cpuid) == 0, ("boot: not running on cpu 0"));
> +       }
>  #endif
>        /* We're in the process of rebooting. */
>        rebooting = 1;

This doesn't cover the KDB case, which is the most broken here.
(I'm a bit unsure about the names of the functions and I cannot check
now, but in short):
- you enter KDB via debug.kdb.enter=1 (for example)
- kdb_enter() stops the CPUs, and if it is running on CPU1 it stops CPU0
- you call functions that enter boot() from the KDB prompt (IIRC "call
  doadump" should do it)
- boot() wants to bind to CPU0, which has been stopped

This patch only takes care of the panic case, which is not enough.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
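
To make the KDB sequence above concrete, here is a toy userland model of it in C. It is only a sketch and not kernel code: struct machine, model_kdb_enter(), bind_would_hang(), model_boot_hangs() and the guard names are all made up for illustration. The in_debugger flag stands in for whatever the kernel uses to know it is inside KDB (possibly something like kdb_active, but that mapping is an assumption). It compares a guard that only checks for panic, as in the quoted patch, with one that also checks for the debugger.

```c
#include <stdbool.h>

#define NCPU 2

/* Toy machine state; every name here is hypothetical, not a kernel symbol. */
struct machine {
	bool cpu_running[NCPU];	/* false once a CPU has been stopped */
	int  curcpu;		/* CPU executing the current thread */
	bool in_panic;		/* stands in for panicstr != NULL */
	bool in_debugger;	/* stands in for "we are inside KDB" */
};

/* Which condition makes the boot() model skip the bind to CPU 0. */
enum guard { GUARD_NONE, GUARD_PANIC_ONLY, GUARD_PANIC_OR_DEBUGGER };

/* Model of debugger entry: stop every CPU except the current one. */
static void
model_kdb_enter(struct machine *m)
{
	for (int i = 0; i < NCPU; i++)
		m->cpu_running[i] = (i == m->curcpu);
	m->in_debugger = true;
}

/* Binding means migrating to the target CPU, which needs that CPU running. */
static bool
bind_would_hang(const struct machine *m, int target)
{
	return (m->curcpu != target && !m->cpu_running[target]);
}

/* Returns true if the boot() model would get stuck binding to CPU 0. */
static bool
model_boot_hangs(const struct machine *m, enum guard g)
{
	bool skip_bind = false;

	if (g == GUARD_PANIC_ONLY)
		skip_bind = m->in_panic;
	else if (g == GUARD_PANIC_OR_DEBUGGER)
		skip_bind = m->in_panic || m->in_debugger;
	return (!skip_bind && bind_would_hang(m, 0));
}
```

Starting on CPU1 with both CPUs running, calling model_kdb_enter() and then model_boot_hangs() with GUARD_PANIC_ONLY reports a hang, while GUARD_PANIC_OR_DEBUGGER does not. If the debugger is entered on CPU0 instead, neither guard matters because no migration is needed.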