From owner-freebsd-bugs@FreeBSD.ORG Sat Nov 29 08:40:18 2003
Date: Sat, 29 Nov 2003 08:40:17 -0800 (PST)
Message-Id: <200311291640.hATGeHpe068807@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Don Bowman
Subject: RE: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Reply-To: Don Bowman
List-Id: Bug reports

The following reply was made to PR kern/59719; it has been noted by GNATS.

From: Don Bowman
To: 'Uwe Doering', freebsd-gnats-submit@FreeBSD.org
Cc: freebsd-bugs@freebsd.org, freebsd-stable@freebsd.org
Subject: RE: kern/59719 Re: 4.9 Stable Crashes on SuperMicro with SMP
Date: Sat, 29 Nov 2003 11:33:58 -0500

From: Uwe Doering [mailto:gemini@geminix.org]
> Jonathan Gilpin wrote:
> > I've run memtest (memtest86.com) kindly provided by Don and it passed all
> > the tests. I've installed a kernel module to test for memory
> > errors and found that again no memory errors are found... So this means
> > it's either a problem with the CPUs or a genuine bug in the kernel.
> > (bugger!)
>
> No, that's unfortunately not what it means. If a memory test fails you
> can draw the conclusion that you have bad memory, but this doesn't work
> the other way round. If a memory test passes there is still a
> possibility that a memory chip is the culprit since memory test software
> cannot find all errors.
>
> Also, there is the chip set on the mainboard that coordinates bus access
> etc. for the two CPUs. Mainboard and chip set developers are known to
> make errors, too. In this case you would have to swap the entire
> mainboard, possibly with one from a different manufacturer. I can tell
> you from my own experience that it is really hard to find reliable PC
> hardware these days, in light of ever shorter and faster product release
> cycles.

I have several hundred of the motherboards the poster is using,
and they work reliably in MP operation with 4.X.
The memtest86 that I sent him understands the ECC registers
on the e7501 MCH; it should find all correctable and uncorrectable
errors.

--don