From owner-freebsd-current@FreeBSD.ORG Tue Dec 6 22:11:03 2011 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2DD2A1065670; Tue, 6 Dec 2011 22:11:03 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 480288FC17; Tue, 6 Dec 2011 22:11:02 +0000 (UTC) Received: by lagv3 with SMTP id v3so1472487lag.13 for ; Tue, 06 Dec 2011 14:11:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=mUmXWc+9NHugaaQqyaDh8vjvQTujtts/I1tR4xkS9kI=; b=WQn49LamWh3nUSt3HvCqIk9M0j3CvbZmoKTFIjkSgz0E59aPndUqBUw/p2HTq0xocW bnYe0w3mOU1LReW7TsN6ZsRCEL6IKe8TP7Vqgizmrzwfkg5XmzHSxi1uOQmq01pJaY6z 6BZRe1Pp2dHxAh/y8FT6b/U6Ir954IpfrXRzQ= MIME-Version: 1.0 Received: by 10.152.104.236 with SMTP id gh12mr10236562lab.49.1323209461166; Tue, 06 Dec 2011 14:11:01 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.152.21.104 with HTTP; Tue, 6 Dec 2011 14:11:01 -0800 (PST) In-Reply-To: <4EDE8931.1080506@FreeBSD.org> References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <4EDE8931.1080506@FreeBSD.org> Date: Tue, 6 Dec 2011 23:11:01 +0100 X-Google-Sender-Auth: Qw9LDTbXrPdH4Vh3LLWZ4o-7gio Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: Kostik Belousov , arch@freebsd.org, current@freebsd.org Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2011 22:11:03 -0000 2011/12/6 Andriy Gapon : > on 06/12/2011 20:34 Attilio Rao said the following: > [snip] >> - I'm not entirely sure, why we want to disable interrupts at this >> moment (before to stop other CPUs)?: > > Because I believe that stop_cpus_hard() should run in a context with inte= rrupts > and preemption disabled. =C2=A0Also, I believe that the whole panic handl= ing =C2=A0code > should run in the same context. =C2=A0So it was only natural for me to en= ter that > context at this point. I'm not against that, I don't see anything wrong with having interrupts disabled at that point. >> @@ -547,13 +555,18 @@ panic(const char *fmt, ...) >> =C2=A0{ >> =C2=A0#ifdef SMP >> =C2=A0 =C2=A0 =C2=A0 static volatile u_int panic_cpu =3D NOCPU; >> + =C2=A0 =C2=A0 cpuset_t other_cpus; >> =C2=A0#endif >> =C2=A0 =C2=A0 =C2=A0 struct thread *td =3D curthread; >> =C2=A0 =C2=A0 =C2=A0 int bootopt, newpanic; >> =C2=A0 =C2=A0 =C2=A0 va_list ap; >> =C2=A0 =C2=A0 =C2=A0 static char buf[256]; >> >> - =C2=A0 =C2=A0 critical_enter(); >> + =C2=A0 =C2=A0 if (stop_scheduler_on_panic) >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 spinlock_enter(); >> + =C2=A0 =C2=A0 else >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 critical_enter(); >> + >> >> - In this chunk I don't entirely understand the kdb_active check: >> @@ -566,11 +579,18 @@ panic(const char *fmt, ...) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 PCPU_GET(= cpuid)) =3D=3D 0) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 while (panic_cpu !=3D NOCPU) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ; /* nothing */ >> + =C2=A0 =C2=A0 if (stop_scheduler_on_panic) { >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (panicstr =3D=3D NULL && = !kdb_active) { >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = other_cpus =3D all_cpus; >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = CPU_CLR(PCPU_GET(cpuid), &other_cpus); >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = stop_cpus_hard(other_cpus); >> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 } >> + =C2=A0 =C2=A0 } >> =C2=A0#endif >> >> =C2=A0 =C2=A0 =C2=A0 bootopt =3D RB_AUTOBOOT; >> =C2=A0 =C2=A0 =C2=A0 newpanic =3D 0; >> - =C2=A0 =C2=A0 if (panicstr) >> + =C2=A0 =C2=A0 if (panicstr !=3D NULL) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bootopt |=3D RB_NOSYNC; >> =C2=A0 =C2=A0 =C2=A0 else { >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bootopt |=3D RB_DUMP; >> >> Is it for avoiding to pass an empty mask to stop_cpus() in kdb_trap() >> (I saw you changed the policy there)? > > Yes. =C2=A0You know my position about elimination of the cpuset parameter= to > stop_cpus_* and my intention to do so. =C2=A0This is related to that. =C2= =A0Right now that > check is not strictly necessary, =C2=A0but it doesn't do any harm either.= =C2=A0We know > that all other CPUs are already stopped when kdb_active (ditto for panics= tr !=3D > NULL). I see there could be races with disabiling interrupts and having 2 different stopping mechanisms that want to stop cpus, even using stop_cpus_hard(), on architectures that don't use a privileged channel (NMI) as mostly of our !x86. They are mostly very rare races (requiring kdb_trap() and panic() to happen in parallel on different CPUs). >> Maybe we can find a better integration among the two. > > What kind of integration? > Right now I have simple model - both stop all other CPUs. Well, there is no synchronization atm between panic stopping cpus and the kdb_trap(). When kdb_trap() stop cpus it has interrupts disabled and if panic already started they will both deadlock because IPI_STOP won't be properly delivered. However, I see all this as a problem with other arches not having/not implementing a real dedicated channel for cpu_stop_hard(), so we should not think about it now. I think we may need some sort of control as panic already does with panic_cpu before to disable interrupts, but don't worry about it now. >> I'd also move the setting of stop_scheduler variable in the "if", it >> seems a bug to me to have it set otherwise. > > Can you please explain what bug do you suspect here? > I do not see any. I just see more natural to move it within the above if (panicstr =3D=3D NULL ...) condition. > [snip] >> - I'm not sure I like to change the policies on cpu stopping for KDB >> with this patchset. > > I am not sure what exactly you mean by change in policies. =C2=A0I do not= see any > such change, entering kdb always stops all other CPUs, with or without th= e patch. Yes, I was confused by older code did just stopped CPUs before kdb_trap() manually, I think what kdb_trap() does now is ok (and you just retain it). I'd just change this check on panicstr: @@ -606,9 +603,13 @@ kdb_trap(int type, int code, struct trapframe *tf) intr =3D intr_disable(); #ifdef SMP - other_cpus =3D all_cpus; - CPU_CLR(PCPU_GET(cpuid), &other_cpus); - stop_cpus_hard(other_cpus); + if (panicstr =3D=3D NULL) { + other_cpus =3D all_cpus; + CPU_CLR(PCPU_GET(cpuid), &other_cpus); + stop_cpus_hard(other_cpus); + did_stop_cpus =3D 1; + } else + did_stop_cpus =3D 0; to be SCHEDULER_STOPPED(). If you agree I can fix the kern_mutex, kern_sx and kern_rwlock parts and it should be done. Attilio --=20 Peace can only be achieved by understanding - A. Einstein