Date: Mon, 26 Jun 2006 14:06:31 +0100 (BST)
From: Robert Watson
To: "Marc G. Fournier"
Cc: freebsd-acpi@freebsd.org, freebsd-stable@freeBSD.org, Pete French
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
In-Reply-To: <20060626081029.L1114@ganymede.hub.org>
Message-ID: <20060626140333.M38418@fledge.watson.org>
References: <20060626100949.G24406@fledge.watson.org> <20060626081029.L1114@ganymede.hub.org>
List-Id: Production branch of FreeBSD source code

On Mon, 26 Jun 2006, Marc G. Fournier wrote:

>> I'm also running 6.x on several dual-PIII without problems. An issue local
>> to Marc's setup is definitely indicated. Given the failure mode, I would
>> be worried about a potential hardware issue, although subtle hardware and
>> subtle system software problems are sometimes difficult to distinguish.
>
> Well, I've been trying to do it 'the hardway' ... went back to the original
> kernel, and am slowly upgrading forward ... I'm currently running a June
> 15th kernel with none of the problems that I was seeing before ... I'm just
> in the process of running my third 'make -j3 buildworld' on this kernel, and
> its clean ... going to go forward to June 22nd next, see if that too is
> clean *cross fingers*

I think this is a useful activity, especially if you've already run extensive
memory testing on the box. If you haven't yet done that, I encourage you to
take a break from buildworlds and make sure the memory tests pass. I spent
several months on and off trying to track down a bug a few years ago, which
turned out to be a one-bit error in memory on the box. It would appear and
disappear based on how the memory page was used -- for debugging kernels, it
consistently got mapped to padding in the kernel's bss. For non-debugging
kernels, it typically manifested in other usable kernel memory. Changes in
kernel versions would move the bit around kernel memory and user memory,
resulting in hard-to-debug failure modes. I wish I'd run the memory test
earlier, but the lesson is clear!

Robert N M Watson
Computer Laboratory
University of Cambridge