From owner-freebsd-current@FreeBSD.ORG Thu Nov 17 12:40:51 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1AF0D106567A; Thu, 17 Nov 2011 12:40:50 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com [209.85.215.182]) by mx1.freebsd.org (Postfix) with ESMTP id 37AFF8FC25; Thu, 17 Nov 2011 12:40:49 +0000 (UTC) Received: by eyd10 with SMTP id 10so2767285eyd.13 for ; Thu, 17 Nov 2011 04:40:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=G38ieQvJBRUWS5QyuoIRlZWUZZvfA/GZD8wqpEseBTc=; b=slQYnjceFKw4rwo+h2APvIkxi5qzYtHogXa2ggx0qvrLMV0HzeXLJ/hKgtWQ7niCez NlUunVN/cJMsM84qBTuJmpwHwzVfoyM0ia1rSLYC2DUHUwglb2NZ/gBIJsYHa/0vMqzU AC2ixeXz+TQQWl488xi8QotTMMqKETF24UEI8= Received: by 10.213.22.133 with SMTP id n5mr4961ebb.149.1321533648562; Thu, 17 Nov 2011 04:40:48 -0800 (PST) Received: from mavbook2.mavhome.dp.ua (pc.mavhome.dp.ua. [212.86.226.226]) by mx.google.com with ESMTPS id q28sm89343784eea.6.2011.11.17.04.40.46 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 17 Nov 2011 04:40:47 -0800 (PST) Sender: Alexander Motin Message-ID: <4EC500CD.6040305@FreeBSD.org> Date: Thu, 17 Nov 2011 14:40:45 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111003 Thunderbird/7.0.1 MIME-Version: 1.0 To: Kostik Belousov References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <20111116202714.5ee4bd53@fabiankeil.de> <4EC43764.1020202@FreeBSD.org> <4EC4423A.3020904@FreeBSD.org> <20111117081533.GP50300@deviant.kiev.zoral.com.ua> <4EC4C89A.2040208@FreeBSD.org> <20111117090653.GR50300@deviant.kiev.zoral.com.ua> <4EC4D3DE.8080103@FreeBSD.org> <20111117111405.GT50300@deviant.kiev.zoral.com.ua> In-Reply-To: <20111117111405.GT50300@deviant.kiev.zoral.com.ua> Content-Type: multipart/mixed; boundary="------------090104080008010605000800" Cc: freebsd-current@freebsd.org, Andriy Gapon Subject: Re: Stop scheduler on panic X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Nov 2011 12:40:51 -0000 This is a multi-part message in MIME format. --------------090104080008010605000800 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit On 11/17/11 13:14, Kostik Belousov wrote: > On Thu, Nov 17, 2011 at 11:29:02AM +0200, Alexander Motin wrote: >> On 11/17/11 11:06, Kostik Belousov wrote: >>> On Thu, Nov 17, 2011 at 10:40:58AM +0200, Alexander Motin wrote: >>>> On 11/17/11 10:15, Kostik Belousov wrote: >>>>> On Thu, Nov 17, 2011 at 01:07:38AM +0200, Alexander Motin wrote: >>>>>> On 17.11.2011 00:21, Andriy Gapon wrote: >>>>>>> on 16/11/2011 21:27 Fabian Keil said the following: >>>>>>>> Kostik Belousov wrote: >>>>>>>> >>>>>>>>> I was tricked into finishing the work by Andrey Gapon, who developed >>>>>>>>> the patch to reliably stop other processors on panic. The patch >>>>>>>>> greatly improves the chances of getting dump on panic on SMP host. >>>>>>>> >>>>>>>> I tested the patch trying to get a dump (from the debugger) for >>>>>>>> kern/162036, which currently results in the double fault reported in: >>>>>>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html >>>>>>>> >>>>>>>> It didn't help, but also didn't make anything worse. >>>>>>>> >>>>>>>> Fabian >>>>>>> >>>>>>> The mi_switch recursion looks very familiar to me: >>>>>>> mi_switch() at mi_switch+0x270 >>>>>>> critical_exit() at critical_exit+0x9b >>>>>>> spinlock_exit() at spinlock_exit+0x17 >>>>>>> mi_switch() at mi_switch+0x275 >>>>>>> critical_exit() at critical_exit+0x9b >>>>>>> spinlock_exit() at spinlock_exit+0x17 >>>>>>> [several pages of the previous three lines skipped] >>>>>>> mi_switch() at mi_switch+0x275 >>>>>>> critical_exit() at critical_exit+0x9b >>>>>>> spinlock_exit() at spinlock_exit+0x17 >>>>>>> intr_even_schedule_thread() at intr_event_schedule_thread+0xbb >>>>>>> ahci_end_transaction() at ahci_end_transaction+0x398 >>>>>>> ahci_ch_intr() at ahci_ch_intr+0x2b5 >>>>>>> ahcipoll() at ahcipoll+0x15 >>>>>>> xpt_polled_action() at xpt_polled_action+0xf7 >>>>>>> >>>>>>> In fact I once discussed with jhb this recursion triggered from a different >>>>>>> place. To quote myself: >>>>>>> spinlock_exit -> critical_exit -> mi_switch -> kdb_switch -> >>>>>>> thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -> ... >>>>>>> in the kdb context >>>>>>> this issue seems to be triggered by td_owepreempt being true at >>>>>>> the time >>>>>>> kdb is entered >>>>>>> and there of course has to be an initial spinlock_exit call >>>>>>> somewhere >>>>>>> in my case it's because of usb keyboard >>>>>>> I wonder if it would make sense to clear td_owepreempt right >>>>>>> before >>>>>>> calling kdb_switch in mi_switch >>>>>>> instead of in sched_switch() >>>>>>> clearing td_owepreempt seems like a scheduler-independent >>>>>>> operation to me >>>>>>> or is it better to just skip locking in usb when kdb_active is set >>>>>>> ? >>>>>>> >>>>>>> The workaround described above should work in this case. >>>>>>> Another possibility is to pessimize mtx_unlock_spin() implementations to >>>>>>> check >>>>>>> SCHEDULER_STOPPED() and to bypass any further actions in that case. But >>>>>>> that >>>>>>> would add unnecessary overhead to the sunny day code paths. >>>>>>> >>>>>>> Going further up the stack one can come up with the following proposals: >>>>>>> - check SCHEDULER_STOPPED() swi_sched() and return early >>>>>>> - do not call swi_sched() from xpt_done() if we somehow know that we are >>>>>>> in a >>>>>>> polling mode >>>>>> >>>>>> There is no flag in CAM now to indicate polling mode, but if needed, it >>>>>> should not be difficult to add one and not call swi_sched(). >>>>> >>>>> I have the following change for eons on my test boxes. Without it, >>>>> I simply cannot get _any_ dump. >>>>> >>>>> diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c >>>>> index 10b89c7..a38e42f 100644 >>>>> --- a/sys/cam/cam_xpt.c >>>>> +++ b/sys/cam/cam_xpt.c >>>>> @@ -4230,7 +4230,7 @@ xpt_done(union ccb *done_ccb) >>>>> TAILQ_INSERT_TAIL(&cam_simq, sim, links); >>>>> mtx_unlock(&cam_simq_lock); >>>>> sim->flags |= CAM_SIM_ON_DONEQ; >>>>> - if (first) >>>>> + if (first && panicstr == NULL) >>>>> swi_sched(cambio_ih, 0); >>>>> } >>>>> } >>>> >>>> That should be OK for kernel dumping. I was thinking about CAM abusing >>>> polling not only for dumping. But looking on cases where it does it now, >>>> may be it is better to rewrite them instead. >>> >>> So, should I interpret your response as 'Reviewed by" ? >> >> It feels somehow dirty to me. I don't like these global variables. If >> you consider it is fine, proceed, I see no much harm. But if not, I can >> add polling flag to the CAM. Flip a coin for me. :) > You promised to add the polling at summer' meeting in Kiev. Will you do > it now ? Sorry, I've probably forgot. The patch is attached. -- Alexander Motin --------------090104080008010605000800 Content-Type: text/plain; name="cam_poll.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="cam_poll.patch" Index: cam_sim.h =================================================================== --- cam_sim.h (revision 227580) +++ cam_sim.h (working copy) @@ -104,7 +104,8 @@ u_int32_t flags; #define CAM_SIM_REL_TIMEOUT_PENDING 0x01 #define CAM_SIM_MPSAFE 0x02 -#define CAM_SIM_ON_DONEQ 0x04 +#define CAM_SIM_ON_DONEQ 0x04 +#define CAM_SIM_POLLED 0x08 struct callout callout; struct cam_devq *devq; /* Device Queue to use for this SIM */ int refcount; /* References to the SIM. */ Index: cam_xpt.c =================================================================== --- cam_xpt.c (revision 227580) +++ cam_xpt.c (working copy) @@ -2957,6 +2957,9 @@ mtx_assert(sim->mtx, MA_OWNED); + /* Don't use ISR for this SIM while polling. */ + sim->flags |= CAM_SIM_POLLED; + /* * Steal an opening so that no other queued requests * can get it before us while we simulate interrupts. @@ -2996,6 +2999,9 @@ } else { start_ccb->ccb_h.status = CAM_RESRC_UNAVAIL; } + + /* Use CAM ISR for this SIM again. */ + sim->flags &= ~CAM_SIM_POLLED; } /* @@ -4224,7 +4230,7 @@ TAILQ_INSERT_TAIL(&sim->sim_doneq, &done_ccb->ccb_h, sim_links.tqe); done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX; - if ((sim->flags & CAM_SIM_ON_DONEQ) == 0) { + if ((sim->flags & (CAM_SIM_ON_DONEQ | CAM_SIM_POLLED)) == 0) { mtx_lock(&cam_simq_lock); first = TAILQ_EMPTY(&cam_simq); TAILQ_INSERT_TAIL(&cam_simq, sim, links); --------------090104080008010605000800--