From: Robert Gray
To: freebsd-current@freebsd.org
Date: Fri, 15 Oct 2004 11:41:22 -0600
Subject: Re: 5.3-BETA7 install cd: kernel trap 12 with interrupts disabled (fwd)

Yes, a built-in memory tester would be helpful. However, my experience is that many memory problems pass the x86memtest, but cause "seg faults" during buildworld, partly due to the extra load and heat from disk activity.
I'm a firm believer that we should encourage our users to buy ECC systems: motherboards that support ECC, and the more expensive SIMMs/DIMMs that carry the extra check bits. I'd rather spend a few more dollars to *know* when memory is the problem during normal operation than try to track down the source of the problem after something bad happens.

I would be happy to lead a "documentation" project for FreeBSD that covers these issues and the current hardware options. I've started an article for USENIX's ;login: magazine. If you'd like me to publish it, please email me and suggest an appropriate mailing list or forum.

Thanks
-robert gray

Robert Watson, Fri, 15 Oct 2004 04:14:04 EDT, says:
>On Fri, 15 Oct 2004, Guido van Rooij wrote:
>
>> It turns out this was a memory problem. Refitting the dimms was all
>> that was needed to 'solve' the issue. The weird thing is that
>> 1. The BIOS had memtesting enabled and did not complain
>> 2. W2K seemed to install ok (though I never let it hit the disk)
>> 3. FBSD would repeatedly crash at exactly the same spot.
>>
>> I am beginning to wonder if we should have a boot option that enables a
>> thorough memtest from within the kernel...(e.g. boot -m).
>
>I guess the old /dev/test_for_shoddy_workmanship driver is working :-).
>
>It's probably just bad luck -- BIOS memory testing is generally pretty
>poor, and it could just be FreeBSD stored your root vnode pointer (or some
>other critical thing) in a memory word that Win2K was using for an icon,
>so a single bit twiddle did pretty different things. Glad it's fixed. I
>occasionally wonder if we shouldn't build a memory tester into the FreeBSD
>boot loader to help diagnose this sort of stuff, though. Not to run every
>boot, but as a diagnostics option.
>
>Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
>robert@fledge.watson.org Principal Research Scientist, McAfee Research
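For readers wondering what a "thorough memtest" option in the loader or kernel would actually do, here is a minimal sketch of two classic pattern tests: walking ones (each single-bit pattern written to every word and read back) and address-in-address (each word stores its own address, then its complement). It is an illustration under stated assumptions, not a proposed FreeBSD implementation: it runs on a simulated array of 32-bit words rather than physical RAM, and the names `FaultyMemory`, `walking_ones`, and `address_in_address` are hypothetical. The simulated fault is a single bit stuck at 0 or 1 in one word.

```python
# Illustrative sketch only: tests a simulated word array, not physical RAM.
# FaultyMemory, walking_ones and address_in_address are hypothetical names.
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


class FaultyMemory:
    """Simulated RAM of 32-bit words, optionally with one bit in one word
    stuck at a fixed value (a classic "stuck-at" fault)."""

    def __init__(self, nwords, stuck_addr=None, stuck_bit=0, stuck_value=0):
        self.words = [0] * nwords
        self.stuck = (stuck_addr, stuck_bit, stuck_value)

    def write(self, addr, value):
        stuck_addr, bit, val = self.stuck
        value &= WORD_MASK
        if addr == stuck_addr:
            # The faulty cell ignores what was written to this one bit.
            value = value | (1 << bit) if val else value & ~(1 << bit)
        self.words[addr] = value

    def read(self, addr):
        return self.words[addr]

    def __len__(self):
        return len(self.words)


def walking_ones(mem):
    """Write each single-bit pattern to every word, then read it back.
    Returns a list of (address, expected, actual) mismatches."""
    errors = []
    for bit in range(WORD_BITS):
        pattern = 1 << bit
        for addr in range(len(mem)):
            mem.write(addr, pattern)
        for addr in range(len(mem)):
            got = mem.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


def address_in_address(mem):
    """Store each word's own address, then its complement, and verify.
    On real hardware this also exposes address-decoding faults where two
    addresses alias one cell; this simulation does not model aliasing."""
    errors = []
    for invert in (False, True):
        for addr in range(len(mem)):
            mem.write(addr, ~addr & WORD_MASK if invert else addr)
        for addr in range(len(mem)):
            expected = ~addr & WORD_MASK if invert else addr
            got = mem.read(addr)
            if got != expected:
                errors.append((addr, expected, got))
    return errors
```

For example, `walking_ones(FaultyMemory(1024, stuck_addr=100, stuck_bit=7, stuck_value=0))` reports the single mismatch `(100, 128, 0)`. Note what a model like this cannot show: a test pass on a cool, idle machine only exercises faults that are present at that moment, which fits the observation above that some bad memory passes a tester yet still fails under the load and heat of a buildworld.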