From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Stefan Bethke, Giovanni Trematerra, Dan Naumov, Attilio Rao, barbara, "Bjoern A. Zeeb", Robert Watson, "C. C. Tang"
Subject: Re: spinlock held too long on reboot
Date: Wed, 29 Jul 2009 09:50:42 -0400
Message-Id: <200907290950.43842.jhb@freebsd.org>
In-Reply-To: <3bbf2fe10907281943m2392a9f9w7c69303e6c3b91d0@mail.gmail.com>
References: <746CE32B-BCF8-460A-982D-25341554E8FD@lassitu.de> <226F1AFF-45D8-4E4C-BE7F-D2EDC35EC8F6@lassitu.de> <3bbf2fe10907281943m2392a9f9w7c69303e6c3b91d0@mail.gmail.com>

On Tuesday 28 July 2009 10:43:36 pm Attilio Rao wrote:
> 2009/5/23 Stefan Bethke :
> > I wrote:
> >
> >> Syncing disks, vnodes remaining...0 done
> >> All buffers synced.
> >> GEOM_MIRROR: Device diesel_root: provider mirror/diesel_root destroyed.
> >> Uptime: 6m32s
> >> GEOM_MIRROR: Device diesel_root destroyed.
> >> Rebooting...
> >> cpu_reset: Stopping other CPUs
> >> spin lock 0xffffffff8078c900 (sched lock 1) held by 0xffffff00014d4ab0
> >> (tid 100002) too long
> >> panic: spin lock held too long
> >> cpuid = 0
> >> KDB: enter: panic
> >> [thread pid 77 tid 100090 ]
> >> Stopped at kdb_enter+0x3d: movq $0,0x48bbd0(%rip)
> >> db> bt
> >> Tracing pid 77 tid 100090 td 0xffffff000457bab0
> >> kdb_enter() at kdb_enter+0x3d
> >> panic() at panic+0x17b
> >> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> >> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> >> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x72
> >> sched_balance_group() at sched_balance_group+0xc5
> >> sched_balance_group() at sched_balance_group+0x1f8
> >> sched_balance() at sched_balance+0xa2
> >> sched_clock() at sched_clock+0xf6
> >> statclock() at statclock+0xbd
> >> lapic_handle_timer() at lapic_handle_timer+0x197
> >> Xtimerint() at Xtimerint+0x8c
> >> --- interrupt, rip = 0xffffffff80541cc4, rsp = 0xffffff80771dba90, rbp =
> >> 0xffffff80771dbab0 ---
> >> DELAY() at DELAY+0x64
> >> cpu_reset() at cpu_reset+0xdd
> >> boot() at boot+0x2e6
> >> reboot() at reboot+0x42
> >> syscall() at syscall+0x1a5
> >> Xfast_syscall() at Xfast_syscall+0xd0
> >> --- syscall (55, FreeBSD ELF64, reboot), rip = 0x800788eec, rsp =
> >> 0x7fffffffeca8, rbp = 0 ---
> >
> >
> > I've only seen this once. If I should encounter it again, is there
> > something you'd like me to look at?
>
> [ Sorry, trying to add anyone who already reported such a problem even
> if I know many of you experienced it on -STABLE]
>
> Could you try this patch against -CURRENT:
> http://www.freebsd.org/~attilio/stop_nmi.diff
>
> This patch basically does 2 things:
> 1) Removing the STOP_NMI option, and adding the infrastructure for
> using NMI on KDB invocation and normal stop IPIs on standard cpu
> shutdown.
> In order to accomplish that and foresee a better design than what
> STOP_NMI does now, 2 new functions are introduced:
> * ipi_hstop_selected() which does, if the architecture offers such an
> option, the possibility to send a "forced" IPI through a privileged
> channel (NMI on amd64 and ia32) in order to stop CPUs passed in the
> mask. Note that for the other architectures that are not amd64 and
> ia32 ipi_hstop_selected() is defaulted to ipi_selected(..., STOP_IPI),
> but if maintainers want to override that they can simply implement
> something harder

Why not just add a new IPI_STOP_HARD that maps to IPI_STOP on most archs
and does the NMI logic on x86?  That avoids adding a new API
(ipi_hstop_selected()) and instead just adds a new logical IPI.

-- 
John Baldwin
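
To make the IPI_STOP_HARD suggestion concrete, here is a rough userland sketch of the idea, not kernel code and not what the patch does. Only IPI_STOP and IPI_STOP_HARD come from the thread. The struct arch capability flag, the delivery enum, the ipi_record return value, and the extra first argument to ipi_selected() are all assumptions made up for illustration; the real ipi_selected() does not look like this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userland model only.  Apart from IPI_STOP and IPI_STOP_HARD, every
 * name here is hypothetical and exists just to show the dispatch.
 */
enum ipi { IPI_STOP, IPI_STOP_HARD };

/* How a stop request is physically delivered to the target CPUs. */
enum delivery { DELIVER_VECTOR, DELIVER_NMI };

/* Stand-in for a per-architecture capability (assumption). */
struct arch {
	const char *name;
	int has_nmi;		/* nonzero if an NMI-style channel exists */
};

/* What was sent, so a caller can inspect the outcome. */
struct ipi_record {
	uint32_t cpus;		/* target CPU mask */
	enum ipi ipi;		/* logical IPI after any fallback */
	enum delivery how;
};

/*
 * Model of the existing entry point: no new function is added, the
 * caller only passes a different logical IPI.
 */
static struct ipi_record
ipi_selected(const struct arch *a, uint32_t cpus, enum ipi ipi)
{
	struct ipi_record r;

	assert(ipi == IPI_STOP || ipi == IPI_STOP_HARD);
	r.cpus = cpus;
	if (ipi == IPI_STOP_HARD && a->has_nmi) {
		/* x86-style path: deliver the stop as an NMI. */
		r.ipi = IPI_STOP_HARD;
		r.how = DELIVER_NMI;
	} else {
		/* Everywhere else IPI_STOP_HARD degrades to IPI_STOP. */
		r.ipi = IPI_STOP;
		r.how = DELIVER_VECTOR;
	}
	return (r);
}
```

In this model a caller that wants the forced stop passes IPI_STOP_HARD to the same function it already uses for IPI_STOP, and an architecture without an NMI-like channel needs no extra code to get the fallback.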