Date: Tue, 2 Apr 2002 02:23:04 +0200
From: Bernd Walter
To: Terry Lambert
Cc: Christian Weisgerber, freebsd-alpha@FreeBSD.ORG
Subject: Re: Source of "processor correctable error"?
Message-ID: <20020402002303.GH41357@cicely8.cicely.de>
In-Reply-To: <3CA8EADE.C11C8DF7@mindspring.com>

On Mon, Apr 01, 2002 at 03:18:54PM -0800, Terry Lambert wrote:
> Christian Weisgerber wrote:
> >
> > Since the weekend my PC164 has taken to almost continuously spewing
> > gobs of
> >
> >   Warning: received processor correctable error.
> >
> > In fact I first noticed this because writing the error messages to
> > the serial console took so much time the machine became sluggish.
> > I've switched to a graphics console now.
> >
> > Anyway, is there a way to narrow down the source of the underlying
> > hardware problem?  What are the candidates anyway?  On-chip cache,
> > off-chip cache, main memory?
>
> FWIW, if they are correctable, it's complaining about memory
> errors which are correctable using the ECC bits, in the use of
> ECC memory.

Right - it also doesn't have to be main memory, as the D, I and B caches
and some data paths are ECC protected too.
But I have never seen a message about cache failures, so they might look
different.

> There are generally three causes of this problem which I have
> seen in nature:
>
> 1) Thermal cooling of the system is insufficient, which
>    introduces thermal related errors (fix: better cooling).

Possible, but my first guess would be a bad SIMM or a bad contact.

> 2) The memory was being clocked faster than the speed it
>    was rated to run at (fix: clock it slower or buy more
>    expensive memory).

Unlikely, as overclocked memory tends to produce multi-bit errors in my
experience.

> 3) The "ECC" memory was fake ECC instead of real ECC, so
>    the correction codes were incorrect, either as a result
>    of a cheap vendor ripping a buyer off, or a cheap buyer
>    not jumpering the system to not use ECC... or the system
>    not having the option to be jumpered that way (fix: use
>    real ECC memory, and not forgeries).

Believe me - these boards don't even let you boot with that kind of
RAM, as you never get a chance to get past SRM because of all the
error messages.
I can speak from experience here, as I had an unsoldered pin on a SIMM
which my stupid x86 box silently corrected for years...
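
As an aside on the single-bit versus multi-bit distinction above: the
following is only a toy sketch of a SEC-DED (single error correct,
double error detect) code, using an extended Hamming code over 4 data
bits. It is not what the PC164 or any particular chipset implements;
real ECC memory protects much wider words, and the function names
encode() and decode() are made up for this illustration. It shows why
one flipped bit, such as a single bad pin or contact, can be repaired
and reported as "correctable", while two flipped bits in the same word
can only be detected.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SEC-DED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits live at positions 3, 5, 6, 7 (Hamming layout).
    code[3], code[5], code[6], code[7] = d
    # Check bits at positions 1, 2, 4 each cover a subset of positions.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 holds an overall parity bit over positions 1..7.
    code[0] = sum(code[1:]) & 1
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits;
    # it is 0 for a valid codeword and equals the position of a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1  # 0 while the overall parity still holds

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with intact overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    single = list(word)
    single[5] ^= 1            # one bad bit, e.g. one bad contact
    double = list(single)
    double[6] ^= 1            # a second bad bit in the same word
    print(decode(word))
    print(decode(single))
    print(decode(double))
```

In this sketch a double-bit error yields no data at all, which loosely
corresponds to a hardware error that cannot be corrected, as opposed to
the warning discussed in this thread.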
> It's always possible that you have bad RAM, or that the PCI
> bus-on time is set too high in the PCI chipset for the amount
> of RAM in the system, such that the DRAM refresh is delayed
> enough under load that your memory starts losing bits, etc.

I doubt that this is a refresh problem, as the chipset has well-designed
datapaths.

> But while there are other possibilities, I have never seen them
> personally in nature (with ECC; I've seen the DRAM refresh
> starvation with an improperly BIOS programmed Cyrix Media GX
> chipset [5532?]).

Phew, that's bad.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de         Usergroup           info@cosmo-project.de