From: Chris
To: freebsd-stable@freebsd.org
Date: Tue, 14 Nov 2006 06:31:45 -0800
Subject: 128 Bucket Free Count

I asked this question on freebsd-questions yesterday but realize it probably belongs on this list. I have a Tyan S4882 quad Opteron with 8GB RAM on 6.2 PR running a healthy load of webserver traffic. This machine hangs occasionally (it lasted 4 days the last time we put it in service, and we are on day 5 of the second round). I've been through the hardware repeatedly since March, when we bought it, and at this point I don't think we have a hardware problem. Memory has been replaced, and I went to 6.2 to pick up bge driver changes.

I have it set up to dump if the problem occurs again, and if we get a dump I'll certainly chase that as best I can. I have never done that process before, so I have my doubts that I know what I'm doing.

Taking another tack, I started sending vmstat -z output to another server every 2 minutes. I look at the counters and note the differences between boot time and the current sample, along with the differences between the last sample and the current one. I put in checks to flag anything that jumped or dropped by more than 25% for any of the vm parameters. That turned out to be meaningless, because approximately half of them make such jumps every cycle. I put in one additional check, which was to look for anything that changed frequently yet had a Free Count that often went to zero.

I found that the stat called 128 Bucket gradually dropped from a starting point of about 600 to 0, where it hovers between 0, 1 and 2 even as the used count gradually grows. Oddly, at 12 midnight on Sunday it totally freed up and popped back above 500. One of the weekly cron cleanups? Does the sampling I'm doing have validity, and what is a 128 Bucket?

Coincidentally, yesterday people on the "questions" list discussed the 6.2 PR todo page. When I looked at it, I noted a panic related to 128 Bucket (the only reference on the net). The problem is that I don't think I'm getting a panic. But... I have to remotely cycle power to restart the machine, because it started failing to reboot when I supped to 6.2. So I'm not really sure what is happening, because I can't see the console. I set the acpi*reboot to 1, so I'm hoping that problem will disappear on the next failure. I also only allocated 8GB of swap, so perhaps I've kept it from crashing correctly.

Is it possible I'm seeing a problem related to the 128 Bucket panic issue? Is it normal for 128 Bucket to sit at a free count of 0, 1, or 2 and only be reset on Sunday night at 12:00 midnight?

Thank you,
Chris Pratt
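
The free-count check described above could be sketched roughly as follows. This is only a minimal sketch, not the script actually in use. It assumes the vmstat -z output has a header line starting with ITEM followed by one "NAME: SIZE, LIMIT, USED, FREE, REQUESTS" line per zone. The function names and the thresholds (max_free, min_fraction) are hypothetical and chosen for illustration.

```python
def parse_vmstat_z(text):
    """Parse one `vmstat -z` capture into {zone_name: {column: int}}.

    Assumption: a header line starting with ITEM names the columns, and
    each zone line looks like 'NAME: SIZE, LIMIT, USED, FREE, REQUESTS'.
    """
    zones = {}
    columns = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("ITEM"):
            # Column names after ITEM, lower-cased: size, limit, used, free, ...
            columns = [c.lower() for c in line.split()[1:]]
            continue
        # Zone names may contain spaces, so split on the last colon.
        name, sep, rest = line.rpartition(":")
        if not sep or columns is None:
            continue
        try:
            values = [int(v) for v in rest.split(",")]
        except ValueError:
            continue  # skip lines that do not fit the assumed layout
        row = dict(zip(columns, values))
        if "used" in row and "free" in row:
            zones[name.strip()] = row
    return zones


def starved_zones(samples, max_free=2, min_fraction=0.5):
    """Return zones whose used count changes between samples while their
    free count is at or below max_free in at least min_fraction of samples.
    """
    hits = {}
    for sample in samples:
        for name, row in sample.items():
            if row["free"] <= max_free:
                hits[name] = hits.get(name, 0) + 1
    flagged = []
    for name, count in hits.items():
        used = [s[name]["used"] for s in samples if name in s]
        active = len(set(used)) > 1  # the zone is actually being exercised
        if active and count / len(samples) >= min_fraction:
            flagged.append(name)
    return sorted(flagged)
```

Each 2-minute capture would be passed through parse_vmstat_z, and the resulting list of samples handed to starved_zones. A zone such as 128 Bucket that keeps growing in use while its free count sits at 0, 1 or 2 would then show up in the result.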