Message-ID: <3CA8EADE.C11C8DF7@mindspring.com>
Date: Mon, 01 Apr 2002 15:18:54 -0800
From: Terry Lambert
To: Christian Weisgerber
Cc: freebsd-alpha@freebsd.org
Subject: Re: Source of "processor correctable error"?

Christian Weisgerber wrote:
> Since the weekend my PC164 has taken to almost continuously spewing
> gobs of
>
>   Warning: received processor correctable error.
>
> In fact I first noticed this because writing the error messages to
> the serial console took so much time the machine became sluggish.
> I've switched to a graphics console now.
>
> Anyway, is there a way to narrow down the source of the underlying
> hardware problem?  What are the candidates anyway?  On-chip cache,
> off-chip cache, main memory?

FWIW, if they are correctable, it's complaining about memory errors
that can be corrected using the ECC bits, i.e. the system is using
ECC memory.

There are generally three causes of this problem which I have seen
in nature:

1)	Cooling of the system is insufficient, which introduces
	thermally related errors (fix: better cooling).

2)	The memory was being clocked faster than the speed it was
	rated to run at (fix: clock it slower or buy more expensive
	memory).

3)	The "ECC" memory was fake ECC instead of real ECC, so the
	correction codes were incorrect, either as a result of a
	cheap vendor ripping a buyer off, or a cheap buyer not
	jumpering the system to not use ECC... or the system not
	having the option to be jumpered that way (fix: use real
	ECC memory, and not forgeries).

It's always possible that you have bad RAM, or that the PCI bus-on
time is set too high in the PCI chipset for the amount of RAM in the
system, such that the DRAM refresh is delayed enough under load that
your memory starts losing bits, etc.  But while there are other
possibilities, I have never seen them personally in nature (with
ECC; I have seen the DRAM refresh starvation with a Cyrix MediaGX
chipset [5532?] that the BIOS had programmed improperly).

-- Terry
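
As a rough illustration of what "correctable" means in the message above, here is a toy single-error-correcting code in Python. This is only a sketch: it uses a Hamming(7,4) code, whereas real ECC memory protects much wider words with a different (SECDED-style) layout, and the function names `hamming74_encode` and `hamming74_decode` are made up for this example. The idea is the same, though: extra check bits let the hardware locate and flip back a single bad bit, and a nonzero syndrome is the event that gets reported as a corrected error.

```python
def hamming74_encode(data4):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data4
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word7):
    """Return (data4, error_position).

    error_position is 0 when the word is clean, otherwise the 1-based
    position of the single bit that was flipped and has been corrected.
    """
    w = list(word7)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        # The syndrome is the position of the bad bit; flip it back.
        w[syndrome - 1] ^= 1
    return [w[2], w[4], w[5], w[6]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)
stored[4] ^= 1                      # simulate one bit decaying in "memory"
print(hamming74_decode(stored))     # -> ([1, 0, 1, 1], 5)
```

A code like this can only repair one flipped bit per word. Check bits that do not match the data at all (the "fake ECC" case) would make clean data look damaged on nearly every read.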