From: Alexander Motin
Date: Thu, 17 Nov 2011 01:07:38 +0200
To: Andriy Gapon
Cc: freebsd-current@FreeBSD.org, Konstantin Belousov
Subject: Re: Stop scheduler on panic
Message-ID: <4EC4423A.3020904@FreeBSD.org>
In-Reply-To: <4EC43764.1020202@FreeBSD.org>
References: <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <20111116202714.5ee4bd53@fabiankeil.de> <4EC43764.1020202@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

On 17.11.2011 00:21, Andriy Gapon wrote:
> on 16/11/2011 21:27 Fabian Keil said the following:
>> Kostik Belousov wrote:
>>
>>> I was tricked into finishing the work by Andrey Gapon, who developed
>>> the patch to reliably stop other processors on panic. The patch
>>> greatly improves the chances of getting dump on panic on SMP host.
>>
>> I tested the patch trying to get a dump (from the debugger) for
>> kern/162036, which currently results in the double fault reported in:
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html
>>
>> It didn't help, but also didn't make anything worse.
>>
>> Fabian
>
> The mi_switch recursion looks very familiar to me:
> mi_switch() at mi_switch+0x270
> critical_exit() at critical_exit+0x9b
> spinlock_exit() at spinlock_exit+0x17
> mi_switch() at mi_switch+0x275
> critical_exit() at critical_exit+0x9b
> spinlock_exit() at spinlock_exit+0x17
> [several pages of the previous three lines skipped]
> mi_switch() at mi_switch+0x275
> critical_exit() at critical_exit+0x9b
> spinlock_exit() at spinlock_exit+0x17
> intr_event_schedule_thread() at intr_event_schedule_thread+0xbb
> ahci_end_transaction() at ahci_end_transaction+0x398
> ahci_ch_intr() at ahci_ch_intr+0x2b5
> ahcipoll() at ahcipoll+0x15
> xpt_polled_action() at xpt_polled_action+0xf7
>
> In fact I once discussed with jhb this recursion triggered from a different
> place. To quote myself:
> spinlock_exit -> critical_exit -> mi_switch -> kdb_switch ->
> thread_unlock -> spinlock_exit -> critical_exit -> mi_switch -> ...
> in the kdb context
> this issue seems to be triggered by td_owepreempt being true at the time
> kdb is entered
> and there of course has to be an initial spinlock_exit call somewhere
> in my case it's because of usb keyboard
> I wonder if it would make sense to clear td_owepreempt right before
> calling kdb_switch in mi_switch
> instead of in sched_switch()
> clearing td_owepreempt seems like a scheduler-independent operation to me
> or is it better to just skip locking in usb when kdb_active is set
> ?
>
> The workaround described above should work in this case.
> Another possibility is to pessimize mtx_unlock_spin() implementations to check
> SCHEDULER_STOPPED() and to bypass any further actions in that case. But that
> would add unnecessary overhead to the sunny day code paths.
>
> Going further up the stack one can come up with the following proposals:
> - check SCHEDULER_STOPPED() in swi_sched() and return early
> - do not call swi_sched() from xpt_done() if we somehow know that we are in a
> polling mode

CAM currently has no flag to indicate polling mode, but if needed it
should not be difficult to add one and skip the swi_sched() call.

-- 
Alexander Motin
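
Editorial illustration, not part of the original message: a small
user-space sketch of the two proposals quoted above. It is not kernel
code and does not reflect how swi_sched() or xpt_done() are actually
implemented. Everything in it is an assumption made for the example:
scheduler_stopped stands in for SCHEDULER_STOPPED(), cam_polling is a
hypothetical polling-mode flag (the message says CAM has none today),
and the model_* functions are made-up stand-ins for the real routines.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; all names here are hypothetical, not FreeBSD symbols. */
static bool scheduler_stopped;  /* stands in for SCHEDULER_STOPPED() */
static bool cam_polling;        /* hypothetical CAM polling-mode flag */
static int  swi_scheduled;      /* how many times the SWI path really ran */

/* Stand-in for swi_sched(): proposal 1, return early once stopped. */
static void
model_swi_sched(void)
{
    if (scheduler_stopped)
        return;         /* no thread scheduling, so no spinlock_exit() */
    swi_scheduled++;    /* the real code would schedule the ithread here */
}

/* Stand-in for xpt_done(): proposal 2, skip swi_sched() while polling. */
static void
model_xpt_done(void)
{
    if (cam_polling)
        return;         /* the poller picks up the completion itself */
    model_swi_sched();
}

/* Walks the three cases; each assert documents the expected count. */
static void
model_demo(void)
{
    scheduler_stopped = false;
    cam_polling = false;
    swi_scheduled = 0;

    model_xpt_done();               /* normal operation */
    assert(swi_scheduled == 1);

    cam_polling = true;
    model_xpt_done();               /* polling: SWI path skipped */
    assert(swi_scheduled == 1);

    cam_polling = false;
    scheduler_stopped = true;
    model_xpt_done();               /* after panic: early return */
    assert(swi_scheduled == 1);
}
```

Calling model_demo() exercises all three paths. In this model either
check alone keeps the dump path away from the scheduling code that
starts the spinlock_exit -> critical_exit -> mi_switch chain in the
backtrace; whether that holds in the real kernel is for the patch
authors to confirm.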