From owner-freebsd-current@FreeBSD.ORG Thu Nov 17 08:15:41 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6108E106564A; Thu, 17 Nov 2011 08:15:41 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id D41588FC0A; Thu, 17 Nov 2011 08:15:40 +0000 (UTC) Received: from alf.home (alf.kiev.zoral.com.ua [10.1.1.177]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id pAH8FX5T012366 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 17 Nov 2011 10:15:34 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from alf.home (kostik@localhost [127.0.0.1]) by alf.home (8.14.5/8.14.5) with ESMTP id pAH8FXBd012030; Thu, 17 Nov 2011 10:15:33 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by alf.home (8.14.5/8.14.5/Submit) id pAH8FXQb012029; Thu, 17 Nov 2011 10:15:33 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: alf.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 17 Nov 2011 10:15:33 +0200 From: Kostik Belousov To: Alexander Motin Message-ID: <20111117081533.GP50300@deviant.kiev.zoral.com.ua> References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <20111116202714.5ee4bd53@fabiankeil.de> <4EC43764.1020202@FreeBSD.org> <4EC4423A.3020904@FreeBSD.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rJakZGbrL8S1ZGl5" Content-Disposition: inline In-Reply-To: <4EC4423A.3020904@FreeBSD.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-current@freebsd.org, Andriy Gapon Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Nov 2011 08:15:41 -0000 --rJakZGbrL8S1ZGl5 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Nov 17, 2011 at 01:07:38AM +0200, Alexander Motin wrote: > On 17.11.2011 00:21, Andriy Gapon wrote: > >on 16/11/2011 21:27 Fabian Keil said the following: > >>Kostik Belousov wrote: > >> > >>>I was tricked into finishing the work by Andrey Gapon, who developed > >>>the patch to reliably stop other processors on panic. The patch > >>>greatly improves the chances of getting dump on panic on SMP host. > >> > >>I tested the patch trying to get a dump (from the debugger) for > >>kern/162036, which currently results in the double fault reported in: > >>http://lists.freebsd.org/pipermail/freebsd-current/2011-September/02776= 6.html > >> > >>It didn't help, but also didn't make anything worse. > >> > >>Fabian > > > >The mi_switch recursion looks very familiar to me: > >mi_switch() at mi_switch+0x270 > >critical_exit() at critical_exit+0x9b > >spinlock_exit() at spinlock_exit+0x17 > >mi_switch() at mi_switch+0x275 > >critical_exit() at critical_exit+0x9b > >spinlock_exit() at spinlock_exit+0x17 > >[several pages of the previous three lines skipped] > >mi_switch() at mi_switch+0x275 > >critical_exit() at critical_exit+0x9b > >spinlock_exit() at spinlock_exit+0x17 > >intr_even_schedule_thread() at intr_event_schedule_thread+0xbb > >ahci_end_transaction() at ahci_end_transaction+0x398 > >ahci_ch_intr() at ahci_ch_intr+0x2b5 > >ahcipoll() at ahcipoll+0x15 > >xpt_polled_action() at xpt_polled_action+0xf7 > > > >In fact I once discussed with jhb this recursion triggered from a differ= ent > >place. To quote myself: > > spinlock_exit -> critical_exit -> mi_switch -> kdb_switch -> > >thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -> ... > > in the kdb context > > this issue seems to be triggered by td_owepreempt being true at= =20 > >the time > >kdb is entered > > and there of course has to be an initial spinlock_exit call=20 > >somewhere > > in my case it's because of usb keyboard > > I wonder if it would make sense to clear td_owepreempt right=20 > >before > >calling kdb_switch in mi_switch > > instead of in sched_switch() > > clearing td_owepreempt seems like a scheduler-independent=20 > >operation to me > > or is it better to just skip locking in usb when kdb_active is = set > > ? > > > >The workaround described above should work in this case. > >Another possibility is to pessimize mtx_unlock_spin() implementations to= =20 > >check > >SCHEDULER_STOPPED() and to bypass any further actions in that case. But= =20 > >that > >would add unnecessary overhead to the sunny day code paths. > > > >Going further up the stack one can come up with the following proposals: > >- check SCHEDULER_STOPPED() swi_sched() and return early > >- do not call swi_sched() from xpt_done() if we somehow know that we are= =20 > >in a > >polling mode >=20 > There is no flag in CAM now to indicate polling mode, but if needed, it= =20 > should not be difficult to add one and not call swi_sched(). I have the following change for eons on my test boxes. Without it, I simply cannot get _any_ dump. diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c index 10b89c7..a38e42f 100644 --- a/sys/cam/cam_xpt.c +++ b/sys/cam/cam_xpt.c @@ -4230,7 +4230,7 @@ xpt_done(union ccb *done_ccb) TAILQ_INSERT_TAIL(&cam_simq, sim, links); mtx_unlock(&cam_simq_lock); sim->flags |=3D CAM_SIM_ON_DONEQ; - if (first) + if (first && panicstr =3D=3D NULL) swi_sched(cambio_ih, 0); } } --rJakZGbrL8S1ZGl5 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk7EwqQACgkQC3+MBN1Mb4iB3QCgvD8S2klE4Svfxy9maBIBZH2m lE4Ani/T7aqvzD2mUT+KSM51xfeir360 =/f4D -----END PGP SIGNATURE----- --rJakZGbrL8S1ZGl5--