From owner-freebsd-current@FreeBSD.ORG Thu Nov 17 18:32:55 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0E51106566B; Thu, 17 Nov 2011 18:32:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 234488FC19; Thu, 17 Nov 2011 18:32:54 +0000 (UTC) Received: from alf.home (alf.kiev.zoral.com.ua [10.1.1.177]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id pAHIWn3k016623 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 17 Nov 2011 20:32:49 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from alf.home (kostik@localhost [127.0.0.1]) by alf.home (8.14.5/8.14.5) with ESMTP id pAHIWndK014524; Thu, 17 Nov 2011 20:32:49 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by alf.home (8.14.5/8.14.5/Submit) id pAHIWns1014523; Thu, 17 Nov 2011 20:32:49 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: alf.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 17 Nov 2011 20:32:49 +0200 From: Kostik Belousov To: Alexander Motin Message-ID: <20111117183249.GY50300@deviant.kiev.zoral.com.ua> References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <20111116202714.5ee4bd53@fabiankeil.de> <4EC43764.1020202@FreeBSD.org> <4EC4423A.3020904@FreeBSD.org> <20111117081533.GP50300@deviant.kiev.zoral.com.ua> <4EC4C89A.2040208@FreeBSD.org> <20111117090653.GR50300@deviant.kiev.zoral.com.ua> <4EC4D3DE.8080103@FreeBSD.org> <20111117111405.GT50300@deviant.kiev.zoral.com.ua> <4EC500CD.6040305@FreeBSD.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="IchbHJI4BusCGQwO" Content-Disposition: inline In-Reply-To: <4EC500CD.6040305@FreeBSD.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-current@freebsd.org, Andriy Gapon Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Nov 2011 18:32:55 -0000 --IchbHJI4BusCGQwO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Nov 17, 2011 at 02:40:45PM +0200, Alexander Motin wrote: > On 11/17/11 13:14, Kostik Belousov wrote: > > On Thu, Nov 17, 2011 at 11:29:02AM +0200, Alexander Motin wrote: > >> On 11/17/11 11:06, Kostik Belousov wrote: > >>> On Thu, Nov 17, 2011 at 10:40:58AM +0200, Alexander Motin wrote: > >>>> On 11/17/11 10:15, Kostik Belousov wrote: > >>>>> On Thu, Nov 17, 2011 at 01:07:38AM +0200, Alexander Motin wrote: > >>>>>> On 17.11.2011 00:21, Andriy Gapon wrote: > >>>>>>> on 16/11/2011 21:27 Fabian Keil said the following: > >>>>>>>> Kostik Belousov wrote: > >>>>>>>> > >>>>>>>>> I was tricked into finishing the work by Andrey Gapon, who deve= loped > >>>>>>>>> the patch to reliably stop other processors on panic. The patch > >>>>>>>>> greatly improves the chances of getting dump on panic on SMP ho= st. > >>>>>>>> > >>>>>>>> I tested the patch trying to get a dump (from the debugger) for > >>>>>>>> kern/162036, which currently results in the double fault reporte= d in: > >>>>>>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-Septembe= r/027766.html > >>>>>>>> > >>>>>>>> It didn't help, but also didn't make anything worse. > >>>>>>>> > >>>>>>>> Fabian > >>>>>>> > >>>>>>> The mi_switch recursion looks very familiar to me: > >>>>>>> mi_switch() at mi_switch+0x270 > >>>>>>> critical_exit() at critical_exit+0x9b > >>>>>>> spinlock_exit() at spinlock_exit+0x17 > >>>>>>> mi_switch() at mi_switch+0x275 > >>>>>>> critical_exit() at critical_exit+0x9b > >>>>>>> spinlock_exit() at spinlock_exit+0x17 > >>>>>>> [several pages of the previous three lines skipped] > >>>>>>> mi_switch() at mi_switch+0x275 > >>>>>>> critical_exit() at critical_exit+0x9b > >>>>>>> spinlock_exit() at spinlock_exit+0x17 > >>>>>>> intr_even_schedule_thread() at intr_event_schedule_thread+0xbb > >>>>>>> ahci_end_transaction() at ahci_end_transaction+0x398 > >>>>>>> ahci_ch_intr() at ahci_ch_intr+0x2b5 > >>>>>>> ahcipoll() at ahcipoll+0x15 > >>>>>>> xpt_polled_action() at xpt_polled_action+0xf7 > >>>>>>> > >>>>>>> In fact I once discussed with jhb this recursion triggered from a= different > >>>>>>> place. To quote myself: > >>>>>>> spinlock_exit -> critical_exit -> mi_switch -> kdb_sw= itch -> > >>>>>>> thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -= > ... > >>>>>>> in the kdb context > >>>>>>> this issue seems to be triggered by td_owepreempt being = true at=20 > >>>>>>> the time > >>>>>>> kdb is entered > >>>>>>> and there of course has to be an initial spinlock_exit c= all=20 > >>>>>>> somewhere > >>>>>>> in my case it's because of usb keyboard > >>>>>>> I wonder if it would make sense to clear td_owepreempt r= ight=20 > >>>>>>> before > >>>>>>> calling kdb_switch in mi_switch > >>>>>>> instead of in sched_switch() > >>>>>>> clearing td_owepreempt seems like a scheduler-independen= t=20 > >>>>>>> operation to me > >>>>>>> or is it better to just skip locking in usb when kdb_act= ive is set > >>>>>>> ? > >>>>>>> > >>>>>>> The workaround described above should work in this case. > >>>>>>> Another possibility is to pessimize mtx_unlock_spin() implementat= ions to=20 > >>>>>>> check > >>>>>>> SCHEDULER_STOPPED() and to bypass any further actions in that cas= e. But=20 > >>>>>>> that > >>>>>>> would add unnecessary overhead to the sunny day code paths. > >>>>>>> > >>>>>>> Going further up the stack one can come up with the following pro= posals: > >>>>>>> - check SCHEDULER_STOPPED() swi_sched() and return early > >>>>>>> - do not call swi_sched() from xpt_done() if we somehow know that= we are=20 > >>>>>>> in a > >>>>>>> polling mode > >>>>>> > >>>>>> There is no flag in CAM now to indicate polling mode, but if neede= d, it=20 > >>>>>> should not be difficult to add one and not call swi_sched(). > >>>>> > >>>>> I have the following change for eons on my test boxes. Without it, > >>>>> I simply cannot get _any_ dump. > >>>>> > >>>>> diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c > >>>>> index 10b89c7..a38e42f 100644 > >>>>> --- a/sys/cam/cam_xpt.c > >>>>> +++ b/sys/cam/cam_xpt.c > >>>>> @@ -4230,7 +4230,7 @@ xpt_done(union ccb *done_ccb) > >>>>> TAILQ_INSERT_TAIL(&cam_simq, sim, links); > >>>>> mtx_unlock(&cam_simq_lock); > >>>>> sim->flags |=3D CAM_SIM_ON_DONEQ; > >>>>> - if (first) > >>>>> + if (first && panicstr =3D=3D NULL) > >>>>> swi_sched(cambio_ih, 0); > >>>>> } > >>>>> } > >>>> > >>>> That should be OK for kernel dumping. I was thinking about CAM abusi= ng > >>>> polling not only for dumping. But looking on cases where it does it = now, > >>>> may be it is better to rewrite them instead. > >>> > >>> So, should I interpret your response as 'Reviewed by" ? > >> > >> It feels somehow dirty to me. I don't like these global variables. If > >> you consider it is fine, proceed, I see no much harm. But if not, I can > >> add polling flag to the CAM. Flip a coin for me. :) > > You promised to add the polling at summer' meeting in Kiev. Will you do > > it now ? >=20 > Sorry, I've probably forgot. The patch is attached. >=20 > --=20 > Alexander Motin > Index: cam_sim.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- cam_sim.h (revision 227580) > +++ cam_sim.h (working copy) > @@ -104,7 +104,8 @@ > u_int32_t flags; > #define CAM_SIM_REL_TIMEOUT_PENDING 0x01 > #define CAM_SIM_MPSAFE 0x02 > -#define CAM_SIM_ON_DONEQ 0x04 > +#define CAM_SIM_ON_DONEQ 0x04 > +#define CAM_SIM_POLLED 0x08 > struct callout callout; > struct cam_devq *devq; /* Device Queue to use for this SIM */ > int refcount; /* References to the SIM. */ > Index: cam_xpt.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- cam_xpt.c (revision 227580) > +++ cam_xpt.c (working copy) > @@ -2957,6 +2957,9 @@ > =20 > mtx_assert(sim->mtx, MA_OWNED); > =20 > + /* Don't use ISR for this SIM while polling. */ > + sim->flags |=3D CAM_SIM_POLLED; > + > /* > * Steal an opening so that no other queued requests > * can get it before us while we simulate interrupts. > @@ -2996,6 +2999,9 @@ > } else { > start_ccb->ccb_h.status =3D CAM_RESRC_UNAVAIL; > } > + > + /* Use CAM ISR for this SIM again. */ > + sim->flags &=3D ~CAM_SIM_POLLED; > } > =20 > /* > @@ -4224,7 +4230,7 @@ > TAILQ_INSERT_TAIL(&sim->sim_doneq, &done_ccb->ccb_h, > sim_links.tqe); > done_ccb->ccb_h.pinfo.index =3D CAM_DONEQ_INDEX; > - if ((sim->flags & CAM_SIM_ON_DONEQ) =3D=3D 0) { > + if ((sim->flags & (CAM_SIM_ON_DONEQ | CAM_SIM_POLLED)) =3D=3D 0) { > mtx_lock(&cam_simq_lock); > first =3D TAILQ_EMPTY(&cam_simq); > TAILQ_INSERT_TAIL(&cam_simq, sim, links); Seems it worked for me, on the same machine where I absolutely needed my change before. Thanks. --IchbHJI4BusCGQwO Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk7FU1AACgkQC3+MBN1Mb4jT+wCgqyLadu1mCsiDNz2unAyGpj7d QdQAn2SSW/UurJ5yPhQhRZkz7XzwCQWs =KguT -----END PGP SIGNATURE----- --IchbHJI4BusCGQwO--