Date: Mon, 26 Jun 2006 14:06:31 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: "Marc G. Fournier" <scrappy@hub.org>
Cc: freebsd-acpi@freebsd.org, freebsd-stable@freeBSD.org, Pete French <petefrench@ticketswitch.com>
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
Message-ID: <20060626140333.M38418@fledge.watson.org>
In-Reply-To: <20060626081029.L1114@ganymede.hub.org>
References: <E1FuYsL-000HT3-H2@dilbert.firstcallgroup.co.uk> <20060626100949.G24406@fledge.watson.org> <20060626081029.L1114@ganymede.hub.org>
On Mon, 26 Jun 2006, Marc G. Fournier wrote:

>> I'm also running 6.x on several dual-PIII without problems. An issue local
>> to Marc's setup is definitely indicated. Given the failure mode, I would
>> be worried about a potential hardware issue, although subtle hardware and
>> subtle system software problems are sometimes difficult to distinguish.
>
> Well, I've been trying to do it 'the hardway' ... went back to the original
> kernel, and am slowly upgrading forward ... I'm currently running a June
> 15th kernel with none of the problems that I was seeing before ... I'm just
> in the process of running my third 'make -j3 buildworld' on this kernel, and
> its clean ... going to go forward to June 22nd next, see if that too is
> clean *cross fingers*

I think this is a useful activity, especially if you've already run extensive
memory testing on the box. If you haven't yet done that, I encourage you to
take a break from buildworlds and make sure the memory tests pass.

I spent several months, on and off, trying to track down a bug a few years ago,
which turned out to be a one-bit error in memory on the box. It would appear
and disappear based on how the memory page was used -- for debugging kernels,
it consistently got mapped to padding in the kernel's bss. For non-debugging
kernels, it typically manifested in other usable kernel memory. Changes in
kernel versions would move the bit around kernel memory and user memory,
resulting in hard-to-debug failure modes. I wish I'd run the memory test
earlier, but the lesson is clear!

Robert N M Watson
Computer Laboratory
University of Cambridge