From owner-freebsd-sparc64@FreeBSD.ORG Mon Mar 8 13:43:10 2004
Date: Mon, 8 Mar 2004 14:43:09 -0700 (MST)
From: RJ45
To: Kris Kennaway
In-Reply-To: <20040308213505.GA8475@xor.obsecurity.org>
cc: Thomas Moestl
cc: sparc64@freebsd.org
Subject: Re: panic: pcib: uncorrectable DMA error AFAR 0x25df020 AFSR 0x400000ff80800000
List-Id: Porting FreeBSD to the Sparc

Have you ever had Solaris on this machine? Solaris reports ECC errors in
/var/adm/messages; if you don't look at that file you won't see them,
because they are not reported on the console. I mention Solaris because it
is very sensitive to ECC and ecache errors, and it reports them with
details such as the memory bank, whether the error was correctable, and the
score of the error (the probability that the error is due to a real
hardware failure). It's normal to have errors like these once in a while;
they are due to cosmic rays. Sun will replace your memory only if the
errors occur at a rate of two per month or more.

What kind of machine do you have?
Errors like these are very common on UltraSPARC II CPUs above 400MHz; a big
stock of these CPUs had defective ecache modules manufactured by IBM. To
tell the truth, Solaris deals with these errors more gracefully. With
FreeBSD I have also experienced kernel panics due to hardware failures (ECC
errors). Nevertheless, FreeBSD is much better anyway. I just dumped
Solaris 9 for FreeBSD 5.2.1.

Rick

On Mon, 8 Mar 2004, Kris Kennaway wrote:

> On Mon, Mar 08, 2004 at 06:30:04PM +0100, Thomas Moestl wrote:
> > On Sun, 2004/03/07 at 14:06:07 -0800, Kris Kennaway wrote:
> > > One of the sparc package machines just died with this:
> > >
> > > panic: pcib: uncorrectable DMA error AFAR 0x25df020 AFSR 0x400000ff80800000
> > > at line 739 in file /var/portbuild/sparc64/src-client/sys/sparc64/pci/psycho.c
> > > cpuid = 0;
> > > Debugger("panic")
> > > Stopped at Debugger+0x1c: ta %xcc, 1
> > > db> trace
> > > __panic() at __panic+0x17c
> > > psycho_ue() at psycho_ue+0x7c
> > > intr_fast() at intr_fast+0x88
> > > -- interrupt level=0xd pil=0 %o7=0xc02a5a40 --
> > > spitfire_block_zero() at spitfire_block_zero+0x70
> > > vm_page_zero_idle() at vm_page_zero_idle+0x74
> > > vm_pagezero() at vm_pagezero+0xb4
> > > fork_exit() at fork_exit+0x8c
> > > fork_trampoline() at fork_trampoline+0x8
> > > db>
> > >
> > > Any ideas?
> >
> > Looks like a memory ECC error that occured during a DMA read, i.e., a
> > hardware problem. Has this box complained about correctable errors
> > before?
>
> Not that I've seen on the console. I'll keep an eye on it.
>
> Kris
>