Date:      Wed, 17 Aug 2011 19:55:50 -0700
From:      Chip Camden <sterling@camdensoftware.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: panic: spin lock held too long (RELENG_8 from today)
Message-ID:  <20110818025550.GA1971@libertas.local.camdensoftware.com>
In-Reply-To: <CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>
References:  <20110818.023832.373949045518579359.hrs@allbsd.org> <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com> <20110818.043332.27079545013461535.hrs@allbsd.org> <20110818.091600.831954331552558249.hrs@allbsd.org> <CAJ-FndCL70m41dQ9FPmzUg0V8a9JacvLOnjmMQL=3PfN7NmPfQ@mail.gmail.com>

Quoth Attilio Rao on Thursday, 18 August 2011:
> 2011/8/18 Hiroki Sato <hrs@freebsd.org>:
> > Hiroki Sato <hrs@freebsd.org> wrote
> >  in <20110818.043332.27079545013461535.hrs@allbsd.org>:
> >
> > hr> Attilio Rao <attilio@freebsd.org> wrote
> > hr>   in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:
> > hr>
> > hr> at> 2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> > hr> at> > Hi,
> > hr> at> >
> > hr> at> > Mike Tancsa <mike@sentex.net> wrote
> > hr> at> >  in <4E15A08C.6090407@sentex.net>:
> > hr> at> >
> > hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> > hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> > hr> at> > mi> >>
> > hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> > hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> > hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> > hr> at> > mi> >> for the owner thread was not available.
> > hr> at> > mi> >>
> > hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
> > hr> at> > mi> >> If the situation is reproducible, you could try to revert r221937. This
> > hr> at> > mi> >> is pure speculation, though.
> > hr> at> > mi> >
> > hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> > hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
> > hr> at> > mi> > instead?
> > hr> at> >
> > hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> > hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
> > hr> at> >  because this panic locked up the machine just after it occurred, but
> > hr> at> >  according to the stack trace it was the same as the posted one.
> > hr> at> >  Switching to an 8.2R kernel can prevent this panic.
> > hr> at> >
> > hr> at> >  Any progress on the investigation?
> > hr> at>
> > hr> at> Hiroki,
> > hr> at> how easily can you reproduce it?
> > hr>
> > hr>  It takes 5-10 hours.  I installed another kernel for debugging just
> > hr>  now, so I think I will be able to collect more detailed information in
> > hr>  a couple of days.
> > hr>
> > hr> at> It would be important to have a DDB textdump with this information:
> > hr> at> - bt
> > hr> at> - ps
> > hr> at> - show allpcpu
> > hr> at> - alltrace
> > hr> at>
> > hr> at> Alternatively, a coredump from a kernel with the stop cpu patch, which Andriy can provide.
> > hr>
> > hr>  Okay, I will post them once I can get another panic.  Thanks!
> >
> >  I got the panic with a crash dump this time.  The result of bt, ps,
> >  allpcpu, and traces can be found at the following URL:
> >
> >  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt
>
> Actually, I think I see the bug here.
>
> In callout_cpu_switch(), if a low-priority thread is migrating the
> callout and gets preempted after the outgoing cpu queue lock is released
> (and is scheduled again much later), we get this problem.
>=20
> In order to fix this bug it could be enough to use a critical section,
> but I think this should really be interrupt safe, thus I'd wrap them
> up with spinlock_enter()/spinlock_exit(). Fortunately
> callout_cpu_switch() should be called rarely and also we already do
> expensive locking operations in callout, thus we should not have
> a problem performance-wise.
>
> Can the guys I also CC'ed here try the following patch, with all the
> initial kernel options that were leading you to the deadlock? (thus
> revert any debugging patch/option you added for the moment):
> http://www.freebsd.org/~attilio/callout-fixup.diff
>
> Please note that this patch is for STABLE_8; if you can confirm the
> good result, I'll commit to -CURRENT and then backmerge as soon as
> possible.
>
> Thanks,
> Attilio
>

Thanks, Attilio.  I've applied the patch and removed the extra debug
options I had added (though keeping debug symbols).  I'll let you know if
I experience any more panics.

Regards,

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com
