From: Chip Camden
To: freebsd-stable@freebsd.org
Date: Wed, 17 Aug 2011 19:55:50 -0700
Subject: Re: panic: spin lock held too long (RELENG_8 from today)
Message-ID: <20110818025550.GA1971@libertas.local.camdensoftware.com>
References: <20110818.023832.373949045518579359.hrs@allbsd.org> <20110818.043332.27079545013461535.hrs@allbsd.org> <20110818.091600.831954331552558249.hrs@allbsd.org>

Quoth Attilio Rao on Thursday, 18 August 2011:
> 2011/8/18 Hiroki Sato :
> > Hiroki Sato wrote
> >  in <20110818.043332.27079545013461535.hrs@allbsd.org>:
> >
> > hr> Attilio Rao wrote
> > hr>   in :
> > hr>
> > hr> at> 2011/8/17 Hiroki Sato :
> > hr> at> > Hi,
> > hr> at> >
> > hr> at> > Mike Tancsa wrote
> > hr> at> >  in <4E15A08C.6090407@sentex.net>:
> > hr> at> >
> > hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> > hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> > hr> at> > mi> >>
> > hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> > hr> at> > mi> >> is the sched lock N, on busy 8-core box recently upgraded to the
> > hr> at> > mi> >> stable/8. Unfortunately, machine hung dumping core, so the stack trace
> > hr> at> > mi> >> for the owner thread was not available.
> > hr> at> > mi> >>
> > hr> at> > mi> >> I was unable to make any conclusion from the data that was present.
> > hr> at> > mi> >> If the situation is reproducible, you could try to revert r221937. This
> > hr> at> > mi> >> is pure speculation, though.
> > hr> at> > mi> >
> > hr> at> > mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> > hr> at> > mi> > unless there is any extra debugging you want me to add to the kernel
> > hr> at> > mi> > instead?
> > hr> at> >
> > hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> > hr> at> >  NFS server with heavy I/O load.
> > hr> at> >  I could not get a kernel dump
> > hr> at> >  because this panic locked up the machine just after it occurred, but
> > hr> at> >  according to the stack trace it was the same as the one posted.
> > hr> at> >  Switching to an 8.2R kernel can prevent this panic.
> > hr> at> >
> > hr> at> >  Any progress on the investigation?
> > hr> at>
> > hr> at> Hiroki,
> > hr> at> how easily can you reproduce it?
> > hr>
> > hr>  It takes 5-10 hours.  I installed another kernel for debugging just
> > hr>  now, so I think I will be able to collect more detailed information in
> > hr>  a couple of days.
> > hr>
> > hr> at> It would be important to have a DDB textdump with this information:
> > hr> at> - bt
> > hr> at> - ps
> > hr> at> - show allpcpu
> > hr> at> - alltrace
> > hr> at>
> > hr> at> Alternatively, a coredump which has the stop cpu patch which Andryi can provide.
> > hr>
> > hr>  Okay, I will post them once I can get another panic.  Thanks!
> >
> >  I got the panic with a crash dump this time.  The result of bt, ps,
> >  allpcpu, and traces can be found at the following URL:
> >
> >  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt
>
> Actually, I think I see the bug here.
>
> In callout_cpu_switch(), if a low priority thread is migrating the
> callout and gets preempted after the outgoing CPU's queue lock is released
> (and is scheduled much later), we get this problem.
>
> In order to fix this bug it could be enough to use a critical section,
> but I think this should be really interrupt safe, thus I'd wrap them
> up with spinlock_enter()/spinlock_exit(). Fortunately
> callout_cpu_switch() should be called rarely and also we already do
> expensive locking operations in callout, thus we should not have
> a problem performance-wise.
>
> Can the guys I also CC'ed here try the following patch, with all the
> initial kernel options that were leading you to the deadlock?
> (thus revert any debugging patch/option you added for the moment):
> http://www.freebsd.org/~attilio/callout-fixup.diff
>
> Please note that this patch is for STABLE_8; if you can confirm the
> good result I'll commit to -CURRENT and then backmerge as soon as
> possible.
>
> Thanks,
> Attilio
>

Thanks, Attilio.  I've applied the patch and removed the extra debug
options I had added (though keeping debug symbols).  I'll let you know if
I experience any more panics.

Regards,

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterling@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com