Mime-Version: 1.0 (Apple Message framework v752.2)
Content-Transfer-Encoding: 7bit
Message-Id: <876647A0-1066-4B9C-A48A-1649FE3315C6@hughes.net>
Content-Type: text/plain; charset=US-ASCII; format=flowed
To: freebsd-stable@freebsd.org
From: Chris <eagletree@hughes.net>
Date: Tue, 14 Nov 2006 06:31:45 -0800
X-Mailer: Apple Mail (2.752.2)
Subject: 128 Bucket Free Count

I asked this question on the freebsd-questions list yesterday
but realize it probably belongs on this list. I have a Tyan
S4882 quad Opteron with 8GB RAM on 6.2 PR running a
healthy load of webserver traffic. This machine hangs
occasionally (it lasted 4 days the first time we put it in,
and we are on day 5 of the second round). I've been
through the hardware repeatedly since March, when we
bought it, and at this point I don't think we have a hardware
problem. Memory has been replaced, and I went to 6.2
to pick up the bge driver changes.

I have it set up to dump if the problem occurs again, and
if we get a dump I'll certainly chase that as best I can.
I've never done that process before, so I have my doubts
that I know what I'm doing.

Taking another tack, I started sending vmstat -z output to
another server every 2 minutes, looking at the counters and
noting the differences between boot time and the current
sample as well as between the last and current samples.
I put in checks to flag anything that jumped or dropped by
more than 25% in any of the vm parameters. That turned out
to be meaningless because approximately half of them make
such jumps every cycle.
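
For reference, the comparison described above could be sketched
roughly as follows in Python. This is not the script actually in
use, only an illustration. It assumes each vmstat -z row looks
like "name: size, limit, used, free, requests" (the exact columns
may differ between releases), and the function names and the 25%
default are hypothetical.

```python
def parse_vmstat_z(text):
    """Parse vmstat -z text into {zone: {"size", "limit", "used", "free"}}.

    Assumes rows of the form "name: size, limit, used, free, ...".
    """
    zones = {}
    for line in text.splitlines():
        # Zone names can contain spaces ("128 Bucket"), so split on the last colon.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header or blank line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) < 4 or not all(f.isdigit() for f in fields[:4]):
            continue  # not a data row in the assumed format
        size, limit, used, free = (int(f) for f in fields[:4])
        zones[name.strip()] = {
            "size": size, "limit": limit, "used": used, "free": free,
        }
    return zones


def pct_change(old, new):
    """Percent change from old to new; infinite if old is 0 and new is not."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return (new - old) * 100.0 / old


def flag_jumps(prev, curr, threshold=25.0):
    """Return (zone, field, percent) for used/free changes beyond threshold."""
    flagged = []
    for zone in sorted(curr):
        if zone not in prev:
            continue  # zone appeared since the previous sample
        for field in ("used", "free"):
            pct = pct_change(prev[zone][field], curr[zone][field])
            if abs(pct) > threshold:
                flagged.append((zone, field, pct))
    return flagged
```

Passing the boot-time sample as prev gives the since-boot
comparison, and passing the previous 2-minute sample gives the
per-cycle comparison.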

I put in one additional check, which was to look for anything
that showed frequent changes yet had a Free count that
often went to zero. I found that the stat called
128 Bucket gradually dropped from a starting point
of about 600 to 0, where it hovers between 0, 1 and 2 even
as the Used count gradually grows. Oddly, at 12 midnight
on Sunday, it freed up completely and popped back up above
500. One of the weekly cron cleanups?
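
The extra check amounts to something like the Python sketch
below, given only as an illustration. It assumes each sample has
already been reduced to a dict mapping a zone name to a
(used, free) pair. The function name, the "free <= 2" cutoff and
the "at least half of the samples" rule are hypothetical choices.

```python
def starved_zones(samples, low_free=2, min_fraction=0.5):
    """Find zones that keep changing yet often have almost nothing free.

    samples: list of dicts, each mapping zone name -> (used, free).
    Returns {zone: fraction of samples with free <= low_free} for zones
    whose used count changed at least once across the samples.
    """
    low_counts = {}
    used_seen = {}
    for sample in samples:
        for zone, (used, free) in sample.items():
            used_seen.setdefault(zone, []).append(used)
            if free <= low_free:
                low_counts[zone] = low_counts.get(zone, 0) + 1

    result = {}
    for zone, used_values in used_seen.items():
        fraction = low_counts.get(zone, 0) / len(used_values)
        active = len(set(used_values)) > 1  # used count actually moved
        if active and fraction >= min_fraction:
            result[zone] = fraction
    return result
```

A zone that sits at zero free but never changes is skipped, since
the interest here is in busy zones that run out of free items.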

Does the sampling I'm doing have validity and what is a
128 Bucket?

Coincidentally, yesterday people on the "questions" list
discussed the 6.2 PR todo page. When I looked at it, I noted
a panic related to 128 Bucket (the only reference on the
net). The problem is that I don't think I'm getting a panic.
But I have to cycle power remotely to restart it, because
it started failing to reboot when I supped to 6.2. So I'm
not really sure what is happening, because I can't see the
console. I set the acpi*reboot to 1, so I'm hoping that
problem will disappear on the next failure. I also
only allocated 8GB of swap, so perhaps I've kept it from
crashing correctly.

Is it possible I'm seeing a problem related to the 128
Bucket panic issue? Is it normal for 128 Bucket to sit
at a free count of 0, 1, or 2 and only be reset at
12:00 midnight on Sunday night?

Thank you,
Chris Pratt