FreeBSD Mail Archives

Date:      Thu, 18 Aug 2011 03:04:32 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Hiroki Sato <hrs@freebsd.org>
Cc:        freebsd-stable@freebsd.org, sterling@camdensoftware.com, avg@freebsd.org, Nick Esborn <nick@desert.net>, kostikbel@gmail.com, mdtansca@freebsd.org
Subject:   Re: panic: spin lock held too long (RELENG_8 from today)
Message-ID:  <CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
In-Reply-To: <20110818.091600.831954331552558249.hrs@allbsd.org>
References:  <20110818.023832.373949045518579359.hrs@allbsd.org> <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com> <20110818.043332.27079545013461535.hrs@allbsd.org> <20110818.091600.831954331552558249.hrs@allbsd.org>


2011/8/18 Hiroki Sato <hrs@freebsd.org>:
> Hiroki Sato <hrs@freebsd.org> wrote
>  in <20110818.043332.27079545013461535.hrs@allbsd.org>:
>
> hr> Attilio Rao <attilio@freebsd.org> wrote
> hr>   in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:
> hr>
> hr> at> 2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> hr> at> > Hi,
> hr> at> >
> hr> at> > Mike Tancsa <mike@sentex.net> wrote
> hr> at> >  in <4E15A08C.6090407@sentex.net>:
> hr> at> >
> hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> hr> at> > mi> >>
> hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> hr> at> > mi> >> for the owner thread was not available.
> hr> at> > mi> >>
> hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
> hr> at> > mi> >> If the situation is reproducible, you could try to revert r221937. This
> hr> at> > mi> >> is pure speculation, though.
> hr> at> > mi> >
> hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
> hr> at> > mi> > instead?
> hr> at> >
> hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
> hr> at> >  because this panic locked up the machine just after it occurred, but
> hr> at> >  according to the stack trace it was the same as the one posted.
> hr> at> >  Switching to an 8.2R kernel can prevent this panic.
> hr> at> >
> hr> at> >  Any progress on the investigation?
> hr> at>
> hr> at> Hiroki,
> hr> at> how easily can you reproduce it?
> hr>
> hr>  It takes 5-10 hours.  I installed another kernel for debugging just
> hr>  now, so I think I will be able to collect more detailed information in
> hr>  a couple of days.
> hr>
> hr> at> It would be important to have a DDB textdump with this information:
> hr> at> - bt
> hr> at> - ps
> hr> at> - show allpcpu
> hr> at> - alltrace
> hr> at>
> hr> at> Alternatively, a coredump which has the stop cpu patch which Andriy can provide.
> hr>
> hr>  Okay, I will post them once I can get another panic.  Thanks!
>
>  I got the panic with a crash dump this time.  The result of bt, ps,
>  allpcpu, and traces can be found at the following URL:
>
>  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

Actually, I think I see the bug here.

In callout_cpu_switch(), if a low-priority thread is migrating the
callout and gets preempted after it releases the outgoing CPU's queue
lock (and is rescheduled much later), we get this problem.

A critical section might be enough to fix this bug, but I think the
switch should really be interrupt safe, so I'd wrap the lock handoff
in spinlock_enter()/spinlock_exit(). Fortunately,
callout_cpu_switch() should be called rarely, and we already do
expensive locking operations in callout, so this should not be a
problem performance-wise.
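
As an illustration only (this is neither the patch linked below nor
kernel code), the window described above can be modelled in userspace
C. Everything prefixed model_ is hypothetical, and so is the CPUBLOCK
marker, which here simply assumes that a migrating callout is flagged
as "in transit" between the two queue locks. The model checks whether
a preemption point between dropping the old lock and taking the new
one is reachable with preemption still enabled.

```c
#include <assert.h>
#include <stdbool.h>

#define CPUBLOCK (-1)            /* assumed marker: callout is mid-migration */

struct model_cpu { bool locked; };      /* stand-in for a per-CPU queue lock */
struct model_callout { int c_cpu; };    /* CPU the callout is queued on */

static int  spin_nesting;   /* > 0 models "interrupts and preemption off" */
static bool window_seen;    /* set if preemption was possible mid-migration */

/* Userspace stand-ins for spinlock_enter()/spinlock_exit(). */
static void model_spinlock_enter(void) { spin_nesting++; }
static void model_spinlock_exit(void)  { spin_nesting--; }

static void model_lock(struct model_cpu *cc)   { cc->locked = true; }
static void model_unlock(struct model_cpu *cc) { cc->locked = false; }

/* A place where the scheduler could take the CPU away from the thread. */
static void
model_preemption_point(const struct model_callout *c)
{
	if (c->c_cpu == CPUBLOCK && spin_nesting == 0)
		window_seen = true;
}

/*
 * Move the callout from its current CPU queue to new_cpu.  With
 * protect == false the lock handoff is left open to preemption; with
 * protect == true it is wrapped in the spinlock section.  Returns
 * true if the thread could have been preempted mid-migration.
 */
static bool
model_cpu_switch(struct model_callout *c, struct model_cpu *cpus,
    int new_cpu, bool protect)
{
	struct model_cpu *old_cc = &cpus[c->c_cpu];
	struct model_cpu *new_cc = &cpus[new_cpu];

	window_seen = false;
	model_lock(old_cc);             /* the caller would already hold this */
	c->c_cpu = CPUBLOCK;
	if (protect)
		model_spinlock_enter();
	model_unlock(old_cc);
	model_preemption_point(c);      /* the gap between the two locks */
	model_lock(new_cc);
	if (protect)
		model_spinlock_exit();
	c->c_cpu = new_cpu;
	model_unlock(new_cc);
	return (window_seen);
}
```

Calling model_cpu_switch() with protect set to false reports the open
window; with protect set to true the same handoff has no point at
which the migrating thread can be descheduled.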

Could the people I CC'ed here try the following patch with the
original kernel options that were leading to the deadlock? (That is,
revert any debugging patch/option you added for the moment.)
http://www.freebsd.org/~attilio/callout-fixup.diff

Please note that this patch is for STABLE_8. If you can confirm that
it works, I'll commit it to -CURRENT and then backmerge as soon as
possible.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
