Date: Tue, 7 Jul 2009 03:27:48 +0200
From: Attilio Rao <attilio@freebsd.org>
To: Dan Naumov <dan.naumov@gmail.com>
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject: Re: 7.2-release/amd64: panic, spin lock held too long
Message-ID: <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com>
In-Reply-To: <cf9b1ee00907061825r34165c48x6727c50b3219d5fb@mail.gmail.com>
References: <cf9b1ee00907061812r3da70018i1c8d8d12bb038a80@mail.gmail.com> <3bbf2fe10907061818v245abd0cgc3ca5073cb93aea4@mail.gmail.com> <cf9b1ee00907061825r34165c48x6727c50b3219d5fb@mail.gmail.com>

2009/7/7 Dan Naumov <dan.naumov@gmail.com>:
> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao<attilio@freebsd.org> wrote:
>> 2009/7/7 Dan Naumov <dan.naumov@gmail.com>:
>>> I just got a panic followed by a reboot a few seconds after running
>>> "portsnap update", /var/log/messages shows the following:
>>>
>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>> Jul 7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long
>>
>> That's a known bug, affecting -CURRENT as well.
>> The cpustop IPI is handled through an NMI, which means it could
>> interrupt a CPU at any moment, even while it is holding a spinlock,
>> violating one well-known FreeBSD rule.
>> That means that the CPU can stop itself while the thread is holding
>> the sched lock spinlock and never release it (there is no way, modulo
>> highly hackish ones, to fix that).
>> In the meantime hardclock() wants to schedule something else to run and
>> gets stuck on the thread lock.
>>
>> The ideal fix would involve not using an NMI for serving the cpustop while
>> having a cheap way (not making the common path too hard) to tell
>> hardclock() to avoid scheduling while cpustop is in flight.
>>
>> Thanks,
>> Attilio
>
> Any idea if a fix is being worked on and how unlucky must one be to
> run into this issue, should I expect it to happen again? Is it
> basically completely random?

I'd like to work on that issue before BETA3 (and backport to STABLE_7),
I'm just time-constrained right now.
It is completely random.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
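
For illustration only, the failure mode described above can be modelled in a few lines of user-space Python. This is a deterministic toy, not FreeBSD kernel code: the names `SpinLock`, `Cpu`, `simulate` and the `SPIN_LIMIT` spin budget are all hypothetical, and the real kernel's timeout and lock implementation differ. The sketch only shows why a CPU that is stopped while it owns the lock makes every other spinner run out of budget.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical spin budget in simulated ticks (not the kernel's real limit).
SPIN_LIMIT = 5


class SpinLockHeldTooLong(Exception):
    """Stands in for the kernel panic 'spin lock held too long'."""


@dataclass
class Cpu:
    cpu_id: int
    stopped: bool = False  # True once a cpustop request has halted this CPU


class SpinLock:
    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Optional[Cpu] = None

    def try_acquire(self, cpu: Cpu) -> bool:
        # Succeeds only when nobody owns the lock.
        if self.owner is None:
            self.owner = cpu
            return True
        return False

    def release(self, cpu: Cpu) -> None:
        assert self.owner is cpu, "only the owner may release"
        self.owner = None


def simulate(nmi_while_locked: bool) -> int:
    """Return the tick at which CPU 0 gets the lock, or raise on timeout."""
    sched_lock = SpinLock("sched lock")
    cpu0, cpu1 = Cpu(0), Cpu(1)

    # CPU 1 enters its critical section.
    assert sched_lock.try_acquire(cpu1)

    # Assumption: the stop request arrives as an NMI, so it cannot be
    # deferred until the lock is dropped; CPU 1 halts while still the owner.
    if nmi_while_locked:
        cpu1.stopped = True

    # CPU 0 (think: hardclock() wanting to schedule) spins on the same lock.
    for tick in range(SPIN_LIMIT):
        # A running owner finishes its short critical section at tick 1.
        if not cpu1.stopped and tick == 1:
            sched_lock.release(cpu1)
        if sched_lock.try_acquire(cpu0):
            return tick

    raise SpinLockHeldTooLong(
        f"spin lock ({sched_lock.name}) held by cpu{sched_lock.owner.cpu_id} too long"
    )
```

Calling `simulate(nmi_while_locked=False)` lets the owner release normally and the spinner gets the lock; calling it with `True` leaves the lock owned by a halted CPU, so the spinner exhausts its budget and the model raises `SpinLockHeldTooLong`.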