From: John Baldwin
To: pluknet
Cc: freebsd-stable@freebsd.org
Date: Wed, 13 May 2009 12:48:08 -0400
Subject: Re: lock up in 6.2 (procs massively stuck in Giant)
Message-Id: <200905131248.08465.jhb@freebsd.org>
References: <200905131015.27431.jhb@freebsd.org>

On Wednesday 13 May 2009 11:41:22 am pluknet wrote:
> 2009/5/13 John Baldwin :
> > On Wednesday 13 May 2009 2:40:33 am pluknet wrote:
> >> 2009/5/13 pluknet :
> >> > 2009/5/13 John Baldwin :
> >> >> On Tuesday 12 May 2009 4:59:19 pm pluknet wrote:
> >> >>> Hi.
> >> >>>
> >> >>> From just another box (not from the first two mentioned earlier)
> >> >>> with a similar locking issue. If it would make sense, since there are
> >> >>> possibly a bit different conditions.
> >> >>> clock proc here is on swi4, I hope it's a non-important difference.
> >> >>>
> >> >>>    18     0     0     0  LL     *Giant    0xd0a6b140 [swi4: clock sio]
> >> >>> db> bt 18
> >> >>
> >> >> Ok, this is a known issue in 6.x.  It is fixed in 6.4.
> >> >>
> >> >
> >>
> >> Looking at the face of kern_timeout.c I suspect that was fixed in r181012.
> >
> > No, this particular issue is fixed by a change to sched_4bsd.c in r179975.
> >
>
> Gah.. We constrained to use ule scheduler on 6.x (yes, I know that
> "it's known to be broken (c)"), since we have had a very bad interactivity
> on 4bsd on our workload. Ok, that's just another reason to move to 7.x.

Hmmm, I would have thought ULE wouldn't have suffered from this bug.  The
problem on 4BSD was this: if softclock ever blocked on Giant, and the thread
holding Giant was on a run queue, pinned to a specific CPU on which another
userland thread was already running, then that userland thread would never
yield the CPU as long as it kept busy, since the round-robin timeout (run by
softclock) would never fire.

--
John Baldwin
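A toy model may help make the circular wait concrete. This is a hypothetical Python simulation, not FreeBSD code: it assumes a single CPU, a busy userland thread that is only ever displaced by the round-robin timeout, and the Giant holder pinned to that CPU's run queue. The function name, the thread labels and the `quantum_ticks` parameter are illustrative assumptions.

```python
# Hypothetical toy model of the 4BSD starvation described above.
# Simplifying assumptions (not sched_4bsd.c): one CPU, and the running
# userland thread is preempted only by the round-robin timeout.

def first_tick_holder_runs(ticks, softclock_needs_giant, quantum_ticks=10):
    """Return the tick at which the pinned Giant holder first gets the CPU,
    or None if it never runs within `ticks` ticks."""
    running = "busy_user"      # userland thread currently on the CPU
    runq = ["giant_holder"]    # thread holding Giant, pinned to this CPU
    giant_held = True          # Giant stays held until the holder runs

    for tick in range(1, ticks + 1):
        # The round-robin timeout is run by softclock, so it can only fire
        # if softclock is not blocked waiting for Giant.
        softclock_runs = not (softclock_needs_giant and giant_held)
        if softclock_runs and tick % quantum_ticks == 0 and runq:
            # Forced reschedule: current thread goes to the back of the queue.
            runq.append(running)
            running = runq.pop(0)
        if running == "giant_holder":
            return tick
    return None
```

Under these assumptions, `first_tick_holder_runs(1000, softclock_needs_giant=True)` returns `None` (the holder never runs, so Giant is never released), while `first_tick_holder_runs(1000, softclock_needs_giant=False)` returns `10`, the first round-robin tick.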