From owner-freebsd-arch@FreeBSD.ORG Thu Nov 17 17:28:24 2011
From: Andrew Boyer
Date: Thu, 17 Nov 2011 12:11:07 -0500
To: Kostik Belousov
Cc: arch@freebsd.org, current@freebsd.org, avg@freebsd.org
Subject: Re: Stop scheduler on panic
Message-Id: <0850D6DB-386B-4588-A362-D53637D25F7D@averesystems.com>
In-Reply-To: <20111113083215.GV50300@deviant.kiev.zoral.com.ua>
References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua>
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
X-Mailer: Apple Mail (2.1084)
List-Id: Discussion related to FreeBSD architecture

On Nov 13, 2011, at 3:32 AM, Kostik Belousov wrote:

> I was tricked into finishing the work by Andrey Gapon, who developed
> the patch to reliably stop other processors on panic. The patch
> greatly improves the chances of getting dump on panic on SMP host.
> Several people already saw the patchset, and I remember that Andrey
> posted it to some lists.
>
> The change stops other (*) processors early upon the panic. This way,
> no parallel manipulation of the kernel memory is performed by CPUs.
> In particular, the kernel memory map is static. Patch prevents the
> panic thread from blocking and switching out.
>
> * - in the context of the description, other means not current.
>
> Since other threads are not run anymore, lock owner cannot release a
> lock which is required by panic thread. Due to this, we need to fake
> a lock acquisition after the panic, which adds minimal overhead to the
> locking cost. The patch tries to not add any overhead on the fast path
> of the lock acquire. The check for the after-panic condition was
> reduced to single memory access, done only when the quick cas lock
> attempt failed, and braced with __unlikely compiler hint.
>
> For now, the new mode of operation is disabled by default, since some
> further USB changes are needed to make USB keyboard usable in that
> environment.
>
> With the patch, getting a dump from the machine without debugger
> compiled in is much more realistic. Please comment, I will commit the
> change in 2 weeks unless strong reasons not to are given.
>
> http://people.freebsd.org/~kib/misc/stop_cpus_on_panic.1.patch
>

We have many systems running Andriy's latest version of the patch under
8.2. I also brought in the related USB patch; without it, the system
hangs up while dumping almost every time. With both patches in place*
it has worked flawlessly for us.

-Andrew

* - with one change: always do the critical_enter() / critical_exit().
Using spinlock_enter() blocks the software watchdog, which needs to
still be active in case the dump hangs for other reasons. This is
obviously not ideal but the best solution I have right now. We also
stop all of the network interfaces at the beginning of boot().

--------------------------------------------------
Andrew Boyer	aboyer@averesystems.com