Date: Tue, 2 Apr 2002 08:06:03 +0200
From: Wilko Bulte <wkb@freebie.xs4all.nl>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Christian Weisgerber <naddy@mips.inka.de>, freebsd-alpha@FreeBSD.ORG
Subject: Re: Source of "processor correctable error"?
Message-ID: <20020402080603.D41537@freebie.xs4all.nl>
In-Reply-To: <3CA8EADE.C11C8DF7@mindspring.com>; from tlambert2@mindspring.com on Mon, Apr 01, 2002 at 03:18:54PM -0800
References: <a89rrl$2vek$1@kemoauc.mips.inka.de> <3CA8EADE.C11C8DF7@mindspring.com>

On Mon, Apr 01, 2002 at 03:18:54PM -0800, Terry Lambert wrote:
> Christian Weisgerber wrote:
> > Since the weekend my PC164 has taken to almost continuously spewing
> > gobs of
> >
> >   Warning: received processor correctable error.
> >
> > In fact I first noticed this because writing the error messages to
> > the serial console took so much time the machine became sluggish.
> > I've switched to a graphics console now.
> >
> > Anyway, is there a way to narrow down the source of the underlying
> > hardware problem?  What are the candidates anyway?  On-chip cache,
> > off-chip cache, main memory?
>
> FWIW, if they are correctable, it's complaining about memory
> errors which are correctable using the ECC bits, in the use of
> ECC memory.

The PC164 has ECC throughout, also on the caches etc.

> There are generally three causes of this problem which I have
> seen in nature:
>
> 1)	Thermal cooling of the system is insufficient, which
>	introduces thermal related errors (fix: better cooling).

Possible. I've seen an overheating Alpha CPU do warped things.

> 3)	The "ECC" memory was fake ECC instead of real ECC, so
>	the correction codes were incorrect, either as a result
>	of a cheap vendor ripping a buyer off, or a cheap buyer
>	not jumpering the system to not use ECC... or the system
>	not having the option to be jumpered that way (fix: use
>	real ECC memory, and not forgeries).

Fake parity won't even allow you to get an SRM console prompt on Alphas
that need ECC. They want the real stuff.

> It's always possible that you have bad RAM, or that the PCI
> bus-on time is set too high in the PCI chipset for the amount
> of RAM in the system, such that the DRAM refresh is delayed
> enough under load that your memory starts losing bits, etc.

Alphas don't allow you to fiddle with this in the BIOS as they don't
have a BIOS setup like that :)

-- 
|   / o / /_  _   		wilko@FreeBSD.org
|/|/ / / /( (_)  Bulte		Arnhem, the Netherlands
We are FreeBSD. Resistance is futile. Prepare to be committed.