From: Attilio Rao
To: John Baldwin
Cc: Stefan Bethke, freebsd-current@freebsd.org, Giovanni Trematerra, Dan Naumov, barbara, "Bjoern A. Zeeb", Robert Watson, "C. C. Tang"
Subject: Re: spinlock held too long on reboot
Date: Wed, 29 Jul 2009 16:13:22 +0200
Message-ID: <3bbf2fe10907290713s3feeb83am177a32e77215114d@mail.gmail.com>
In-Reply-To: <200907290950.43842.jhb@freebsd.org>
References: <746CE32B-BCF8-460A-982D-25341554E8FD@lassitu.de> <226F1AFF-45D8-4E4C-BE7F-D2EDC35EC8F6@lassitu.de> <3bbf2fe10907281943m2392a9f9w7c69303e6c3b91d0@mail.gmail.com> <200907290950.43842.jhb@freebsd.org>
Sender: asmrookie@gmail.com
List-Id: Discussions about the use of FreeBSD-current

2009/7/29 John Baldwin:
> On Tuesday 28 July 2009 10:43:36 pm Attilio Rao wrote:
>> 2009/5/23 Stefan Bethke:
>> > I wrote:
>> >
>> >> Syncing disks, vnodes remaining...0 done
>> >> All buffers synced.
>> >> GEOM_MIRROR: Device diesel_root: provider mirror/diesel_root destroyed.
>> >> Uptime: 6m32s
>> >> GEOM_MIRROR: Device diesel_root destroyed.
>> >> Rebooting...
>> >> cpu_reset: Stopping other CPUs
>> >> spin lock 0xffffffff8078c900 (sched lock 1) held by 0xffffff00014d4ab0
>> >> (tid 100002) too long
>> >> panic: spin lock held too long
>> >> cpuid = 0
>> >> KDB: enter: panic
>> >> [thread pid 77 tid 100090 ]
>> >> Stopped at kdb_enter+0x3d: movq $0,0x48bbd0(%rip)
>> >> db> bt
>> >> Tracing pid 77 tid 100090 td 0xffffff000457bab0
>> >> kdb_enter() at kdb_enter+0x3d
>> >> panic() at panic+0x17b
>> >> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
>> >> _mtx_lock_spin() at _mtx_lock_spin+0x9e
>> >> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x72
>> >> sched_balance_group() at sched_balance_group+0xc5
>> >> sched_balance_group() at sched_balance_group+0x1f8
>> >> sched_balance() at sched_balance+0xa2
>> >> sched_clock() at sched_clock+0xf6
>> >> statclock() at statclock+0xbd
>> >> lapic_handle_timer() at lapic_handle_timer+0x197
>> >> Xtimerint() at Xtimerint+0x8c
>> >> --- interrupt, rip = 0xffffffff80541cc4, rsp = 0xffffff80771dba90, rbp =
>> >> 0xffffff80771dbab0 ---
>> >> DELAY() at DELAY+0x64
>> >> cpu_reset() at cpu_reset+0xdd
>> >> boot() at boot+0x2e6
>> >> reboot() at reboot+0x42
>> >> syscall() at syscall+0x1a5
>> >> Xfast_syscall() at Xfast_syscall+0xd0
>> >> --- syscall (55, FreeBSD ELF64, reboot), rip = 0x800788eec, rsp =
>> >> 0x7fffffffeca8, rbp = 0 ---
>> >
>> >
>> > I've only seen this once. If I should encounter it again, is there
>> > something you'd like me to look at?
>>
>> [ Sorry, trying to add anyone who already reported such a problem even
>> if I know many of you experienced it on -STABLE]
>>
>> Could you try this patch against -CURRENT:
>> http://www.freebsd.org/~attilio/stop_nmi.diff
>>
>> This patch basically does 2 things:
>> 1) Removing the STOP_NMI option, and adding the infrastructure for
>> using NMI on KDB invocation and normal stop IPIs on standard cpu
>> shutdown.
>> In order to accomplish that and foresee a better design than what
>> STOP_NMI does now, 2 new functions are introduced:
>> * ipi_hstop_selected() which, if the architecture offers such an
>> option, sends a "forced" IPI through a privileged channel (NMI on
>> amd64 and ia32) in order to stop the CPUs passed in the mask. Note
>> that for the other architectures that are not amd64 and ia32,
>> ipi_hstop_selected() defaults to ipi_selected(..., STOP_IPI),
>> but if maintainers want to override that they can simply implement
>> something harder
>
> Why not just add a new IPI_STOP_HARD that maps to IPI_STOP on most archs and
> does the NMI logic on x86? This avoids adding a new API
> (ipi_hstop_selected()), instead just adding a new logical IPI.

When choosing between the two, since we already have APIs like
ipi_all_but_self(), I thought we preferred explicit APIs over logical
ones. Anyway, I can reimplement it that way; it is something I like
more as well.
For now I just want to know whether the patch fixes the problem for
the users.
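
To make the "logical IPI" alternative concrete, here is a rough userland
sketch of the mapping John describes. It is not taken from
stop_nmi.diff and is not kernel code: sim_ipi_delivery(),
sim_ipi_selected(), the enum values and the ARCH_HAS_NMI_STOP switch are
all hypothetical names made up for illustration. Only IPI_STOP and
IPI_STOP_HARD come from the discussion above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Logical IPI kinds. IPI_STOP and IPI_STOP_HARD are the names used in
 * the thread; the numeric values here are arbitrary.
 */
enum ipi_kind { IPI_STOP = 1, IPI_STOP_HARD = 2 };

/* How the simulated IPI is delivered (hypothetical, illustration only). */
enum delivery { DELIVER_NORMAL, DELIVER_NMI };

/* Assumption: 1 models amd64/ia32, 0 models every other architecture. */
#ifndef ARCH_HAS_NMI_STOP
#define ARCH_HAS_NMI_STOP 1
#endif

#define SIM_MAXCPU 32

/* Map a logical IPI to a delivery channel for the modelled architecture. */
static enum delivery
sim_ipi_delivery(enum ipi_kind ipi)
{
#if ARCH_HAS_NMI_STOP
	if (ipi == IPI_STOP_HARD)
		return (DELIVER_NMI);
#endif
	(void)ipi;	/* Without an NMI channel, the hard stop degrades to a normal one. */
	return (DELIVER_NORMAL);
}

/*
 * Stand-in for a "send this IPI to the CPUs in the mask" call. It only
 * prints what would be sent and returns the number of CPUs targeted.
 */
static int
sim_ipi_selected(unsigned int cpus, enum ipi_kind ipi)
{
	int sent = 0;

	assert(ipi == IPI_STOP || ipi == IPI_STOP_HARD);
	for (unsigned int cpu = 0; cpu < SIM_MAXCPU; cpu++) {
		if ((cpus & (1u << cpu)) == 0)
			continue;
		printf("cpu%u: stop via %s\n", cpu,
		    sim_ipi_delivery(ipi) == DELIVER_NMI ? "NMI" : "normal IPI");
		sent++;
	}
	return (sent);
}
```

With this shape the shutdown path and the KDB path would call the same
function and differ only in the logical IPI they pass, for example
sim_ipi_selected(0x6, IPI_STOP_HARD) to target CPUs 1 and 2. The
explicit-API variant would instead add a second entry point that
hardcodes the hard stop.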

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein