From: Marius Strobl
To: John Floren
Cc: freebsd-sparc64@freebsd.org
Date: Sun, 4 Jul 2010 12:54:27 +0200
Subject: Re: ECC errors causing panic on Sun Fire V210
Message-ID: <20100704105427.GA82962@alchemy.franken.de>
List-Id: Porting FreeBSD to the Sparc

On Sat, Jul 03, 2010 at 12:12:29PM -0400, John Floren wrote:
> Hi everyone.
>
> I recently installed FreeBSD on my newly-acquired Sun Fire V210. In
> the three days I've had it up, it has spontaneously rebooted 4 times.
> The first 3 times I didn't catch any error messages on the serial
> console (I had disconnected my laptop), but this morning it did it
> again after I left my laptop hooked up.
> Here's what I saw:
>
> panic: trap: corrected ecc error
> cpuid = 0
> [forgot to write this part, it was something like not being able to
> find the dump device]
> [starts rebooting in OBP]
>
> It seems to me that the kernel should not be panicking when an ecc
> error is *corrected*. Have any of you come across this before? I
> attempted to search the archives but got an internal server error.
>

CURRENT, as well as 7-STABLE and 8-STABLE, should no longer panic on
ECC errors (8.1-RELEASE will be the first release to include this
change). The difficulty was finding a machine that exhibits ECC errors
frequently enough to verify that the kernel recovers gracefully from
them. Nevertheless, they are a sign of hardware problems.

Marius