Date: Mon, 8 Mar 2004 14:43:09 -0700 (MST)
From: RJ45 <rj45@slacknet.com>
To: Kris Kennaway <kris@obsecurity.org>
Cc: sparc64@freebsd.org
Subject: Re: panic: pcib: uncorrectable DMA error AFAR 0x25df020 AFSR 0x400000ff80800000
Message-ID: <Pine.LNX.4.21.0403081437090.26061-100000@slacknet.slacknet.com>
In-Reply-To: <20040308213505.GA8475@xor.obsecurity.org>

Have you ever run Solaris on this machine? Solaris logs ECC errors in
/var/adm/messages; if you don't look at that file you won't see them,
because they are not reported on the console. I mention Solaris because
it is very sensitive to ECC and ecache errors, and it reports them with
details such as the memory bank, whether the error was correctable, and
the score of the error (the probability that the error is due to a real
hardware failure).

It's normal to see errors like these once in a while; they are due to
cosmic rays. Sun will replace your memory only if the errors occur at a
rate of two or more per month.

What kind of machine do you have? Errors like these are very common on
UltraSPARC II CPUs above 400 MHz: a big batch of these CPUs had
defective ecache modules manufactured by IBM.

To tell the truth, Solaris deals with these errors more gracefully; with
FreeBSD I have also experienced kernel panics due to hardware failures
(ECC errors). Nevertheless, FreeBSD is much better anyway. I just dumped
Solaris 9 for FreeBSD 5.2.1.

Rick

On Mon, 8 Mar 2004, Kris Kennaway wrote:

> On Mon, Mar 08, 2004 at 06:30:04PM +0100, Thomas Moestl wrote:
> > On Sun, 2004/03/07 at 14:06:07 -0800, Kris Kennaway wrote:
> > > One of the sparc package machines just died with this:
> > >
> > > panic: pcib: uncorrectable DMA error AFAR 0x25df020 AFSR 0x400000ff80800000
> > > at line 739 in file /var/portbuild/sparc64/src-client/sys/sparc64/pci/psycho.c
> > > cpuid = 0;
> > > Debugger("panic")
> > > Stopped at Debugger+0x1c: ta %xcc, 1
> > > db> trace
> > > __panic() at __panic+0x17c
> > > psycho_ue() at psycho_ue+0x7c
> > > intr_fast() at intr_fast+0x88
> > > -- interrupt level=0xd pil=0 %o7=0xc02a5a40 --
> > > spitfire_block_zero() at spitfire_block_zero+0x70
> > > vm_page_zero_idle() at vm_page_zero_idle+0x74
> > > vm_pagezero() at vm_pagezero+0xb4
> > > fork_exit() at fork_exit+0x8c
> > > fork_trampoline() at fork_trampoline+0x8
> > > db>
> > >
> > > Any ideas?
> > >
> > Looks like a memory ECC error that occurred during a DMA read, i.e., a
> > hardware problem. Has this box complained about correctable errors
> > before?
>
> Not that I've seen on the console. I'll keep an eye on it.
>
> Kris
>
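
For anyone who wants to compare panics like the one quoted above across
several crashes, here is a minimal sketch that pulls the two register
values out of the panic line and lists which bits of the AFSR are set.
It assumes the "AFAR 0x... AFSR 0x..." layout seen in this one message;
the helper names parse_panic and set_bits are made up for illustration,
and the sketch does not try to name the bits, since their meaning has to
be looked up in the bridge documentation or the driver source.

```python
import re

# The panic line exactly as quoted in this thread.
PANIC_LINE = ("panic: pcib: uncorrectable DMA error "
              "AFAR 0x25df020 AFSR 0x400000ff80800000")

# Assumption: the two values always appear as "AFAR <hex> AFSR <hex>".
_PATTERN = re.compile(r"AFAR\s+(0x[0-9a-fA-F]+)\s+AFSR\s+(0x[0-9a-fA-F]+)")


def parse_panic(line):
    """Return (afar, afsr) as integers, or raise ValueError if absent."""
    match = _PATTERN.search(line)
    if match is None:
        raise ValueError("no AFAR/AFSR pair found in line")
    return int(match.group(1), 16), int(match.group(2), 16)


def set_bits(value):
    """Return the positions of the 1 bits in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    afar, afsr = parse_panic(PANIC_LINE)
    print(f"AFAR = {afar:#x}")
    print(f"AFSR = {afsr:#018x}")
    print("AFSR bits set:", set_bits(afsr))
```

Running the same helper over the panic line from a later crash makes it
easy to see whether the fault address and status bits repeat, which
would point at one failing part rather than a random one-off error.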