From: Kostik Belousov
To: Andriy Gapon
Cc: arch@freebsd.org, current@freebsd.org
Date: Mon, 14 Nov 2011 15:08:57 +0200
Subject: Re: Stop scheduler on panic
Message-ID: <20111114130857.GC50300@deviant.kiev.zoral.com.ua>
In-Reply-To: <4EC0E838.50609@FreeBSD.org>
References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <4EC0E838.50609@FreeBSD.org>

On Mon, Nov 14, 2011 at 12:06:48PM +0200, Andriy Gapon wrote:
> on 13/11/2011 10:32 Kostik Belousov said the following:
> > I was tricked into finishing the work by Andrey Gapon, who developed the
> > patch to reliably stop other processors on panic. The patch greatly
> > improves the chances of getting dump on panic on SMP host. Several people
> > already saw the patchset, and I remember that Andrey posted it to some
> > lists.
> >
> > The change stops other (*) processors early upon the panic. This way, no
> > parallel manipulation of the kernel memory is performed by CPUs. In
> > particular, the kernel memory map is static. Patch prevents the panic
> > thread from blocking and switching out.
> >
> > * - in the context of the description, other means not current.
> >
> > Since other threads are not run anymore, lock owner cannot release a lock
> > which is required by panic thread. Due to this, we need to fake a lock
> > acquisition after the panic, which adds minimal overhead to the locking
> > cost. The patch tries to not add any overhead on the fast path of the lock
> > acquire. The check for the after-panic condition was reduced to single
> > memory access, done only when the quick cas lock attempt failed, and braced
> > with __unlikely compiler hint.
> >
> > For now, the new mode of operation is disabled by default, since some
> > further USB changes are needed to make USB keyboard usable in that
> > environment.
> >
> > With the patch, getting a dump from the machine without debugger compiled
> > in is much more realistic.
> > Please comment, I will commit the change in 2
> > weeks unless strong reasons not to are given.
> >
> > http://people.freebsd.org/~kib/misc/stop_cpus_on_panic.1.patch
> >
>
>
> On a more serious note:
> - some code in my latest version of the patch was contributed by or was based
> on the code or ideas contributed by jhb and mdf (so that attributions are not
> lost)

Please provide me with proper attribution for contributors and testers.

> - there was a concern about how sync-on-panic would work
>
> About the latter, I have never really tested it. mdf has suggested to
> move the sync-on-panic code to a place after we ensure that there is
> only one CPU in panic(), but before we stop other CPUs.

sync_on_panic is incompatible with the patch. I argue that it carries a
non-zero chance of damaging good filesystems even if the panic was unrelated
to the fs/bio/device layer. As an example, consider the case when another
CPU was modifying the in-memory representation of the metadata, and the
panic happens on this CPU. If you write the half-changed block back, you do
more damage to the filesystem than if you do not. The half-baked sync spoils
any journaling or SU consistency guarantees. The issues arising from the
multithreaded nature of our storage subsystem are secondary.

The user who sets either tunable should know what he is doing.

>
> I think that I've already sent you a list of the early testers for
> various WIP versions of the patch.

I do not have the list.

BTW, if you want, feel free to handle the commit yourself. You definitely
spent much more effort on the stuff and deserve the credit.

I was promised in private that a review will be provided during this weekend.
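
The lock behaviour described in the quoted text (a CAS fast path with no extra cost, and a single flag read on the slow path that fakes the acquisition once the scheduler is stopped) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not code from the patch: the names sched_stopped, toy_mtx, toy_mtx_lock and toy_mtx_unlock are hypothetical, and the GCC/Clang builtin __builtin_expect stands in for the compiler hint mentioned in the message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: set once by the panicking CPU after the others stop. */
atomic_bool sched_stopped;

/* Hypothetical toy lock: owner is 0 when free, else an opaque thread id. */
struct toy_mtx {
	atomic_uintptr_t owner;
};

/* Branch hint; an assumption standing in for the hint named in the mail. */
#define TOY_UNLIKELY(x)	__builtin_expect(!!(x), 0)

/*
 * Returns true if the lock was really taken, false if the acquisition
 * was faked because the scheduler is stopped.
 */
bool
toy_mtx_lock(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t expected = 0;

	/* Fast path: a single CAS, with no after-panic check at all. */
	if (atomic_compare_exchange_strong(&m->owner, &expected, tid))
		return (true);

	/* Slow path: only here do we pay for one extra memory read. */
	for (;;) {
		if (TOY_UNLIKELY(atomic_load(&sched_stopped)))
			return (false);	/* owner can never run; pretend we hold it */
		expected = 0;
		if (atomic_compare_exchange_weak(&m->owner, &expected, tid))
			return (true);
	}
}

/* Release only a lock we really own; a faked acquisition changes nothing. */
void
toy_mtx_unlock(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t expected = tid;

	atomic_compare_exchange_strong(&m->owner, &expected, 0);
}
```

In this sketch the uncontended path is unchanged by the panic support, which matches the stated goal of adding no overhead to the fast path; a real kernel lock would also have to deal with sleeping, spinning policy, and lock debugging, none of which is modelled here.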