From owner-freebsd-stable@FreeBSD.ORG  Wed May 13 16:52:31 2009
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
From: John Baldwin <jhb@freebsd.org>
To: pluknet <pluknet@gmail.com>
Date: Wed, 13 May 2009 12:48:08 -0400
User-Agent: KMail/1.9.7
References: <a31046fc0904292336w17aca317hefd32dad5bc28007@mail.gmail.com>
	<200905131015.27431.jhb@freebsd.org>
	<a31046fc0905130841p6c762d95h40ec989bd0355c9d@mail.gmail.com>
In-Reply-To: <a31046fc0905130841p6c762d95h40ec989bd0355c9d@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200905131248.08465.jhb@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: lock up in 6.2 (procs massively stuck in Giant)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 13 May 2009 16:52:32 -0000

On Wednesday 13 May 2009 11:41:22 am pluknet wrote:
> 2009/5/13 John Baldwin <jhb@freebsd.org>:
> > On Wednesday 13 May 2009 2:40:33 am pluknet wrote:
> >> 2009/5/13 pluknet <pluknet@gmail.com>:
> >> > 2009/5/13 John Baldwin <jhb@freebsd.org>:
> >> >> On Tuesday 12 May 2009 4:59:19 pm pluknet wrote:
> >> >>> Hi.
> >> >>>
> >> >>> This is from yet another box (not one of the two mentioned earlier)
> >> >>> with a similar locking issue, in case it helps, since the conditions
> >> >>> may be a bit different.
> >> >>> The clock proc here is on swi4; I hope that's an unimportant difference.
> >> >>>
> >> >>>    18     0     0     0  LL     *Giant    0xd0a6b140 [swi4: clock sio]
> >> >>> db> bt 18
> >> >>
> >> >> OK, this is a known issue in 6.x.  It is fixed in 6.4.
> >> >>
> >>
> >> Looking at kern_timeout.c, I suspect that was fixed in r181012.
> >
> > No, this particular issue is fixed by a change to sched_4bsd.c in r179975.
> >
>
> Gah... We are constrained to use the ULE scheduler on 6.x (yes, I know
> that "it's known to be broken (c)"), since we had very bad interactivity
> with 4BSD on our workload. OK, that's just another reason to move to 7.x.

Hmmm, I would have thought ULE wouldn't have suffered from this bug.  The
problem on 4BSD was this: if softclock ever blocked on Giant while the thread
holding Giant sat on a run queue pinned to a specific CPU, and another
userland thread was already running on that CPU, then the userland thread
would never yield the CPU as long as it stayed busy, because the round-robin
timeout (itself a callout run by softclock) would never fire.  The Giant
holder therefore never got to run and release Giant, and softclock stayed
blocked.
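To make the cycle concrete, here is a toy Python model of the 4BSD scenario described above. It is a hypothetical sketch, not FreeBSD scheduler code: the names `simulate`, `busy_user`, `giant_holder` and the `roundrobin_is_callout` flag are invented for illustration. There is one CPU, a busy userland thread that never yields voluntarily, and the Giant holder pinned to that CPU's run queue (pinning means no other CPU can pick it up, so only this CPU is modeled). With `roundrobin_is_callout=True`, the forced switch depends on softclock, which is blocked on Giant, so the holder never runs. With it set to False, the model stands in for a hypothetical scheduler that forces the switch without going through softclock.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    running: str                               # thread currently on this CPU
    runq: list = field(default_factory=list)   # threads pinned here, waiting to run


def simulate(ticks: int, roundrobin_is_callout: bool) -> bool:
    """Return True if the Giant holder gets to run within `ticks` clock ticks."""
    # Hypothetical setup: a busy userland thread owns the CPU and the thread
    # holding Giant sits on this CPU's run queue, pinned so it cannot migrate.
    cpu = Cpu(running="busy_user", runq=["giant_holder"])
    softclock_blocked_on_giant = True

    for _ in range(ticks):
        # If the round-robin timeout is a callout, it only fires when softclock
        # can run callouts, and softclock is stuck waiting for Giant.
        rr_fires = not (roundrobin_is_callout and softclock_blocked_on_giant)
        if rr_fires and cpu.runq:
            # Forced time-slice switch: preempt the running thread and run
            # the next thread from the run queue.
            cpu.runq.append(cpu.running)
            cpu.running = cpu.runq.pop(0)
        if cpu.running == "giant_holder":
            # The holder runs and can release Giant, unblocking softclock.
            return True
        # Otherwise busy_user just keeps computing and never yields on its own.
    return False
```

In this model, `simulate(1000, roundrobin_is_callout=True)` returns False (the deadlock), while `simulate(1000, roundrobin_is_callout=False)` returns True on the first tick.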

-- 
John Baldwin