Message-Id: <200410152156.i9FLuLkf082072@gw.catspoiler.org>
Date: Fri, 15 Oct 2004 14:56:21 -0700 (PDT)
From: Don Lewis
To: bob@boulderlabs.com
In-Reply-To: <200410151741.i9FHfMxQ036620@vec.boulderlabs.com>
cc: freebsd-current@FreeBSD.org
Subject: Re: 5.3-BETA7 install cd: kernel trap 12 with interrupts disabled (fwd)

On 15 Oct, Robert Gray wrote:
>
> Yes, a built-in memory tester would be helpful. However, my experience
> is that many memory problems pass the x86memtest, but cause
> "seg faults" during buildworld, partly due to the extra load and heat
> from disk activity.
>
> I'm a firm believer that we should encourage our users to
> buy ECC systems - motherboards that support ECC, and the
> more expensive SIMMs/DIMMs that have redundant bits.

So do I, but we don't have any support for reporting ECC errors.
Hardware ECC support will paper over defective memory that has bad bits,
but it won't be reliable.
Frequent correctable ECC errors are a good indication that there is a
hardware problem that needs to be fixed. Blindly turning ECC on will
make hardware problems harder to detect and fix.

I have one motherboard/memory combo (ECC on both) that sets the memory
timing incorrectly (the memory is rated CL 2.5, but the BIOS configures
it as CL 2 when it is configured to set the timing automatically). I
was seeing files occasionally get corrupted when they were cached in RAM
(/usr/src and /usr/obj would get hit), and some of the longer-running
tests in memtest86 would detect the problem. The problem went away when
I manually set the memory timing to the correct value.