Date: Wed, 29 Jul 2009 16:13:22 +0200
From: Attilio Rao <attilio@freebsd.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Stefan Bethke <stb@lassitu.de>, freebsd-current@freebsd.org, Giovanni Trematerra <giovanni.trematerra@gmail.com>, Dan Naumov <dan.naumov@gmail.com>, barbara <barbara.xxx1975@libero.it>, "Bjoern A. Zeeb" <bz@freebsd.org>, Robert Watson <rwatson@freebsd.org>, "C. C. Tang" <hiyorin@gmail.com>
Subject: Re: spinlock held too long on reboot
Message-ID: <3bbf2fe10907290713s3feeb83am177a32e77215114d@mail.gmail.com>
In-Reply-To: <200907290950.43842.jhb@freebsd.org>
References: <746CE32B-BCF8-460A-982D-25341554E8FD@lassitu.de> <226F1AFF-45D8-4E4C-BE7F-D2EDC35EC8F6@lassitu.de> <3bbf2fe10907281943m2392a9f9w7c69303e6c3b91d0@mail.gmail.com> <200907290950.43842.jhb@freebsd.org>

2009/7/29 John Baldwin <jhb@freebsd.org>:
> On Tuesday 28 July 2009 10:43:36 pm Attilio Rao wrote:
>> 2009/5/23 Stefan Bethke <stb@lassitu.de>:
>> > I wrote:
>> >
>> >> Syncing disks, vnodes remaining...0 done
>> >> All buffers synced.
>> >> GEOM_MIRROR: Device diesel_root: provider mirror/diesel_root destroyed.
>> >> Uptime: 6m32s
>> >> GEOM_MIRROR: Device diesel_root destroyed.
>> >> Rebooting...
>> >> cpu_reset: Stopping other CPUs
>> >> spin lock 0xffffffff8078c900 (sched lock 1) held by 0xffffff00014d4ab0
>> >> (tid 100002) too long
>> >> panic: spin lock held too long
>> >> cpuid = 0
>> >> KDB: enter: panic
>> >> [thread pid 77 tid 100090 ]
>> >> Stopped at kdb_enter+0x3d: movq $0,0x48bbd0(%rip)
>> >> db> bt
>> >> Tracing pid 77 tid 100090 td 0xffffff000457bab0
>> >> kdb_enter() at kdb_enter+0x3d
>> >> panic() at panic+0x17b
>> >> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
>> >> _mtx_lock_spin() at _mtx_lock_spin+0x9e
>> >> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x72
>> >> sched_balance_group() at sched_balance_group+0xc5
>> >> sched_balance_group() at sched_balance_group+0x1f8
>> >> sched_balance() at sched_balance+0xa2
>> >> sched_clock() at sched_clock+0xf6
>> >> statclock() at statclock+0xbd
>> >> lapic_handle_timer() at lapic_handle_timer+0x197
>> >> Xtimerint() at Xtimerint+0x8c
>> >> --- interrupt, rip = 0xffffffff80541cc4, rsp = 0xffffff80771dba90, rbp =
>> >> 0xffffff80771dbab0 ---
>> >> DELAY() at DELAY+0x64
>> >> cpu_reset() at cpu_reset+0xdd
>> >> boot() at boot+0x2e6
>> >> reboot() at reboot+0x42
>> >> syscall() at syscall+0x1a5
>> >> Xfast_syscall() at Xfast_syscall+0xd0
>> >> --- syscall (55, FreeBSD ELF64, reboot), rip = 0x800788eec, rsp =
>> >> 0x7fffffffeca8, rbp = 0 ---
>> >
>> >
>> > I've only seen this once. If I should encounter it again, is there
>> > something you'd like me to look at?
>>
>> [ Sorry, trying to add anyone who already reported such a problem even
>> if I know many of you experienced it on -STABLE]
>>
>> Could you try this patch against -CURRENT:
>> http://www.freebsd.org/~attilio/stop_nmi.diff
>>
>> This patch basically does 2 things:
>> 1) Removing the STOP_NMI option, and adding the infrastructure for
>> using NMI on KDB invocation and normal stop IPIs on standard cpu
>> shutdown.
>> In order to accomplish that and foresee a better design than what
>> STOP_NMI does now, 2 new functions are introduced:
>> * ipi_hstop_selected() which does, if the architecture offers such an
>> option, the possibility to send a "forced" IPI through a privileged
>> channel (NMI on amd64 and ia32) in order to stop CPUs passed in the
>> mask. Note that for the other architectures that are not amd64 and
>> ia32 ipi_hstop_selected() is defaulted to ipi_selected(..., STOP_IPI),
>> but if maintainers want to override that they can simply implement
>> something harder
>
> Why not just add a new IPI_STOP_HARD that maps to IPI_STOP on most archs and
> does the NMI logic on x86? This avoids adding a new API
> (ipi_hstop_selected()), instead just adding a new logical IPI.

When choosing between the two, given that we already have APIs like
ipi_all_but_self(), I thought we preferred more explicit APIs over
logical ones.
Anyway, I can reimplement it that way; it is something I like more as
well. For now I just want to know whether the patch fixes the problem
for users.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
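
To make the two designs discussed above concrete, here is a minimal userland sketch in C. It is not the code from stop_nmi.diff and not FreeBSD kernel code: the types cpumask_t, enum ipi_kind and enum delivery, the ARCH_HAS_NMI_STOP switch, and the function bodies are all assumptions made for illustration. Only the names ipi_selected(), ipi_hstop_selected(), IPI_STOP and IPI_STOP_HARD come from the thread, and their signatures here are guesses.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real kernel types and constants differ. */
typedef unsigned int cpumask_t;

enum ipi_kind { IPI_STOP, IPI_STOP_HARD };
enum delivery { DELIVER_VECTOR, DELIVER_NMI };

/* Assumption: 1 models amd64/ia32, 0 models an arch without an NMI path. */
#ifndef ARCH_HAS_NMI_STOP
#define ARCH_HAS_NMI_STOP 1
#endif

/*
 * "Logical IPI" design: one entry point, and IPI_STOP_HARD quietly
 * degrades to a regular stop IPI where no NMI channel exists.
 */
enum delivery
ipi_selected(cpumask_t cpus, enum ipi_kind ipi)
{
	enum delivery how;

	assert(cpus != 0);	/* an empty CPU mask is a caller bug */
	how = (ipi == IPI_STOP_HARD && ARCH_HAS_NMI_STOP) ?
	    DELIVER_NMI : DELIVER_VECTOR;
	printf("stop cpus 0x%x via %s\n", cpus,
	    how == DELIVER_NMI ? "NMI" : "regular IPI");
	return (how);
}

/*
 * "Explicit API" design: a dedicated function. In this sketch it is a
 * thin wrapper, which shows that both designs reach the same delivery.
 */
enum delivery
ipi_hstop_selected(cpumask_t cpus)
{
	return (ipi_selected(cpus, IPI_STOP_HARD));
}
```

With ARCH_HAS_NMI_STOP set to 1, ipi_selected(0x6, IPI_STOP_HARD) and ipi_hstop_selected(0x6) both report NMI delivery, while ipi_selected(0x6, IPI_STOP) stays on the regular IPI path. Building with -DARCH_HAS_NMI_STOP=0 models the other architectures, where every request falls back to the regular stop IPI.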