Date:      Mon, 8 Mar 2004 14:43:09 -0700 (MST)
From:      RJ45 <rj45@slacknet.com>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        sparc64@freebsd.org
Subject:   Re: panic: pcib: uncorrectable DMA error AFAR 0x25df020 AFSR 0x400000ff80800000
Message-ID:  <Pine.LNX.4.21.0403081437090.26061-100000@slacknet.slacknet.com>
In-Reply-To: <20040308213505.GA8475@xor.obsecurity.org>


Have you ever had Solaris on this machine?
Solaris reports ECC errors in /var/adm/messages;
if you don't look at that file you won't see them, because they are not
reported on the console.
I am talking about Solaris because it is very sensitive to ECC
errors and ecache errors, and it reports them with details like the
memory bank, whether the error was correctable, and the score of the error
(the probability that the error is due to a real hardware failure).

It's normal to have errors like these once in a while; they are due to
cosmic rays. Sun will replace your memory only if the errors occur with
a frequency of 2 errors a month or more.
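
A rough sketch of checking that frequency against a log file, assuming
classic syslog lines that start with "Mon DD HH:MM:SS". The keyword
pattern, the function names and the sample lines are all made up for
illustration; they are not real Solaris output, so adjust the pattern to
whatever your system actually logs.

```python
import re
from collections import Counter

# Assumption: each line begins with a syslog-style stamp, e.g. "Jan  3 02:11:40".
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")
# Assumption: the interesting lines mention "ECC" or "ecache" somewhere.
KEYWORD = re.compile(r"\b(ECC|ecache)\b", re.IGNORECASE)


def errors_per_month(lines):
    """Count matching lines per month abbreviation.

    Syslog stamps carry no year, so the same month from different
    years is merged into one bucket.
    """
    counts = Counter()
    for line in lines:
        stamp = STAMP.match(line)
        if stamp and KEYWORD.search(line):
            counts[stamp.group(1)] += 1
    return counts


def months_over_threshold(counts, threshold=2):
    """Return the months with at least `threshold` matching lines."""
    return sorted(month for month, n in counts.items() if n >= threshold)


# Made-up sample lines, only to exercise the functions.
SAMPLE = [
    "Jan  3 02:11:40 host unix: correctable ECC error (made-up sample)",
    "Jan 19 17:05:02 host unix: correctable ECC error (made-up sample)",
    "Feb  7 09:30:15 host unix: ecache error (made-up sample)",
    "Feb  9 10:00:00 host sshd: unrelated line",
]

counts = errors_per_month(SAMPLE)
print(dict(counts))
print(months_over_threshold(counts))
```

To run it on a real file, pass the open file object to errors_per_month()
instead of SAMPLE.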

What kind of machine do you have?
Errors like these are very common on UltraSPARC II CPUs above 400MHz:
a big batch of these CPUs had defective ecache modules manufactured by
IBM.

To tell the truth, Solaris deals with these errors more gracefully; with
FreeBSD I have also experienced kernel panics due to hardware failures
(ECC errors).

Nevertheless, FreeBSD is much better anyway.
I just dumped Solaris 9 for FreeBSD 5.2.1.

Rick


On Mon, 8 Mar 2004, Kris Kennaway wrote:

> On Mon, Mar 08, 2004 at 06:30:04PM +0100, Thomas Moestl wrote:
> > On Sun, 2004/03/07 at 14:06:07 -0800, Kris Kennaway wrote:
> > > One of the sparc package machines just died with this:
> > > 
> > > panic: pcib: uncorrectable DMA error AFAR 0x25df020 AFSR 0x400000ff80800000
> > > at line 739 in file /var/portbuild/sparc64/src-client/sys/sparc64/pci/psycho.c
> > > cpuid = 0;
> > > Debugger("panic")
> > > Stopped at      Debugger+0x1c:  ta              %xcc, 1
> > > db> trace
> > > __panic() at __panic+0x17c
> > > psycho_ue() at psycho_ue+0x7c
> > > intr_fast() at intr_fast+0x88
> > > -- interrupt level=0xd pil=0 %o7=0xc02a5a40 --
> > > spitfire_block_zero() at spitfire_block_zero+0x70
> > > vm_page_zero_idle() at vm_page_zero_idle+0x74
> > > vm_pagezero() at vm_pagezero+0xb4
> > > fork_exit() at fork_exit+0x8c
> > > fork_trampoline() at fork_trampoline+0x8
> > > db>
> > > 
> > > Any ideas?
> > 
> > Looks like a memory ECC error that occurred during a DMA read, i.e., a
> > hardware problem. Has this box complained about correctable errors
> > before?
> 
> Not that I've seen on the console.  I'll keep an eye on it.
> 
> Kris
> 


