Date: Wed, 8 Jul 2009 03:57:29 +0300
From: Dan Naumov <dan.naumov@gmail.com>
To: Attilio Rao <attilio@freebsd.org>
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject: Re: 7.2-release/amd64: panic, spin lock held too long
Message-ID: <cf9b1ee00907071757i169d2a82la260798f364054f9@mail.gmail.com>
In-Reply-To: <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com>
References: <cf9b1ee00907061812r3da70018i1c8d8d12bb038a80@mail.gmail.com> <3bbf2fe10907061818v245abd0cgc3ca5073cb93aea4@mail.gmail.com> <cf9b1ee00907061825r34165c48x6727c50b3219d5fb@mail.gmail.com> <3bbf2fe10907061827g35eaeb49g26cf6fdb64436ca7@mail.gmail.com>

On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao<attilio@freebsd.org> wrote:
> 2009/7/7 Dan Naumov <dan.naumov@gmail.com>:
>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao<attilio@freebsd.org> wrote:
>>> 2009/7/7 Dan Naumov <dan.naumov@gmail.com>:
>>>> I just got a panic followed by a reboot a few seconds after running
>>>> "portsnap update". /var/log/messages shows the following:
>>>>
>>>> Jul  7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>> Jul  7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>>> Jul  7 03:49:38 atom kernel: panic: spin lock held too long
>>>
>>> That's a known bug, affecting -CURRENT as well.
>>> The cpustop IPI is handled through an NMI, which means it could
>>> interrupt a CPU at any moment, even while it is holding a spinlock,
>>> violating one well-known FreeBSD rule.
>>> That means that the CPU can stop itself while the thread is holding
>>> the sched lock spinlock and never release it (there is no way to fix
>>> that, short of something highly hackish).
>>> In the meantime hardclock() wants to schedule something else to run
>>> and gets stuck on the thread lock.
>>>
>>> The ideal fix would involve not using an NMI for serving the cpustop
>>> while having a cheap way (not making the common path too hard) to tell
>>> hardclock() to avoid scheduling while cpustop is in flight.
>>>
>>> Thanks,
>>> Attilio
>>
>> Any idea if a fix is being worked on, and how unlucky must one be to
>> run into this issue? Should I expect it to happen again? Is it
>> basically completely random?
>
> I'd like to work on that issue before BETA3 (and backport to
> STABLE_7), I'm just time-constrained right now.
> It is completely random.
>
> Thanks,
> Attilio

OK, this is getting pretty bad: 23 hours later, I got the same kind of
panic. The only difference is that instead of "portsnap update", this
one was triggered by "portsnap cron", which I have running between 3
and 4 am every day:

Jul 8 03:03:49 atom kernel: ssppiinn lloocckk 00xxffffffffffffffff8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d 10100006070)5 )t otoo ol olnogng
Jul 8 03:03:49 atom kernel: p
Jul 8 03:03:49 atom kernel: anic: spin lock held too long
Jul 8 03:03:49 atom kernel: cpuid = 0
Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s

- Sincerely,
Dan Naumov
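
The failure mode Attilio describes in the quoted text (one CPU stopped while still holding the sched lock, another spinning on it until it gives up) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is user-space Python, not FreeBSD kernel code, and the names spin_acquire, simulate, SpinLockHeldTooLong, and the max_spins budget are all hypothetical and chosen for illustration.

```python
import threading


class SpinLockHeldTooLong(RuntimeError):
    """Hypothetical stand-in for the kernel panic; not a FreeBSD name."""


def spin_acquire(lock, max_spins):
    """Try to take `lock` without blocking, giving up after `max_spins` tries."""
    for _ in range(max_spins):
        if lock.acquire(blocking=False):
            return
    raise SpinLockHeldTooLong("spin lock held too long")


def simulate(holder_is_stopped, max_spins=1000):
    """Model two CPUs contending for one scheduler lock.

    "CPU 1" takes the lock first. If it is stopped while holding it
    (the NMI case from the thread), it never releases the lock, and the
    spinning "CPU 0" runs out of attempts.
    """
    sched_lock = threading.Lock()
    sched_lock.acquire()              # CPU 1 enters its critical section
    if not holder_is_stopped:
        sched_lock.release()          # normal case: CPU 1 leaves promptly
    try:
        spin_acquire(sched_lock, max_spins)   # CPU 0 wants the same lock
    except SpinLockHeldTooLong as exc:
        return f"panic: {exc}"
    sched_lock.release()
    return "ok"


if __name__ == "__main__":
    print(simulate(holder_is_stopped=False))  # prints "ok"
    print(simulate(holder_is_stopped=True))   # prints "panic: spin lock held too long"
```

The model is deliberately single-threaded so that it is deterministic. The point is only that a bounded spin cannot succeed once the holder has been stopped inside its critical section.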