From owner-freebsd-sparc64@FreeBSD.ORG Mon Jul 21 15:47:12 2003
Return-Path:
Delivered-To: freebsd-sparc64@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 1270037B401
	for ; Mon, 21 Jul 2003 15:47:12 -0700 (PDT)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id 7A6B643FAF
	for ; Mon, 21 Jul 2003 15:47:10 -0700 (PDT)
	(envelope-from tmoestl@gmx.net)
Received: (qmail 21138 invoked by uid 65534); 21 Jul 2003 22:47:09 -0000
Received: from p508E7DC3.dip.t-dialin.net (EHLO galatea.local) (80.142.125.195)
	by mail.gmx.net (mp014) with SMTP; 22 Jul 2003 00:47:09 +0200
Received: from tmm by galatea.local with local (Exim 4.20 #1)
	id 19ejQy-0005zA-MW; Tue, 22 Jul 2003 00:47:08 +0200
Date: Tue, 22 Jul 2003 00:47:08 +0200
From: Thomas Moestl
To: Chris Jackman
Message-ID: <20030721224708.GB768@crow.dom2ip.de>
Mail-Followup-To: Chris Jackman , freebsd-sparc64@freebsd.org
References: <20030721194436.GA42900@collab.or8.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="BOKacYhQ+x31HxR3"
Content-Disposition: inline
In-Reply-To: <20030721194436.GA42900@collab.or8.net>
User-Agent: Mutt/1.4.1i
Sender: Thomas Moestl
cc: freebsd-sparc64@freebsd.org
Subject: Re: correctable DMA error AFAR
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Porting FreeBSD to the Sparc
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 21 Jul 2003 22:47:12 -0000

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, 2003/07/21 at 15:44:36 -0400, Chris Jackman wrote:
> Error messages:
>
> pcib0: correctable DMA error AFAR 0x476d6140 AFSR 0x40e600003f800000
> and
> pcib0: correctable DMA error AFAR 0x40adbc40 AFSR 0x40c400003f800000

These signal correctable ECC errors during a DVMA read transaction.
The differences in the AFSR values indicate different ECC syndromes.

> My e250 has locked up twice in the last few weeks with these
> error messages. The error gets repeated over and over
> again on the serial console, and I can't do anything to the
> box except power cycle it.

This interrupt is informational only, and the documentation states
that no further cleanup is required. We should probably clear the
error bits in the status register, however, since it looks like the
interrupt is triggered again and again while any of those bits are
still set. The manual is a bit ambiguous on that point, but clearing
the bits is desirable anyway since it improves error reporting.

The attached patch implements this; can you please try it and report
how it behaves on the next ECC error?

Thanks,
	- Thomas

-- 
Thomas Moestl
http://www.tu-bs.de/~y0015675/
http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ce.diff"

Index: sparc64/pci/psycho.c
===================================================================
RCS file: /vol/ncvs/src/sys/sparc64/pci/psycho.c,v
retrieving revision 1.41
diff -u -r1.41 psycho.c
--- sparc64/pci/psycho.c	1 Jul 2003 15:52:06 -0000	1.41
+++ sparc64/pci/psycho.c	21 Jul 2003 22:41:12 -0000
@@ -745,12 +745,14 @@
 	struct psycho_softc *sc = (struct psycho_softc *)arg;
 	u_int64_t afar, afsr;
 
-	PSYCHO_WRITE8(sc, PSR_CE_INT_CLR, 0);
 	afar = PSYCHO_READ8(sc, PSR_CE_AFA);
 	afsr = PSYCHO_READ8(sc, PSR_CE_AFS);
 	/* It's correctable. Dump the regs and continue. */
 	device_printf(sc->sc_dev, "correctable DMA error AFAR %#lx "
 	    "AFSR %#lx\n", (u_long)afar, (u_long)afsr);
+	/* Clear the error bits that we caught. */
+	PSYCHO_WRITE8(sc, PSR_CE_AFS, afsr & CEAFSR_ERRMASK);
+	PSYCHO_WRITE8(sc, PSR_CE_INT_CLR, 0);
 }
 
 static void
Index: sparc64/pci/psychoreg.h
===================================================================
RCS file: /vol/ncvs/src/sys/sparc64/pci/psychoreg.h,v
retrieving revision 1.6
diff -u -r1.6 psychoreg.h
--- sparc64/pci/psychoreg.h	6 Jan 2003 16:51:06 -0000	1.6
+++ sparc64/pci/psychoreg.h	21 Jul 2003 22:36:03 -0000
@@ -232,13 +232,28 @@
 #define	PCICTL_6ENABLE	0x000000000000003f	/* enable 6 PCI slots */
 
 /* Uncorrectable error asynchronous fault status registers */
-#define	UEAFSR_BLK	(1UL << 22)	/* pri. error caused by read */
-#define	UEAFSR_P_DTE	(1UL << 56)	/* pri. DMA translation error */
-#define	UEAFSR_S_DTE	(1UL << 57)	/* sec. DMA translation error */
-#define	UEAFSR_S_DWR	(1UL << 58)	/* sec. error during write */
-#define	UEAFSR_S_DRD	(1UL << 59)	/* sec. error during read */
-#define	UEAFSR_P_DWR	(1UL << 61)	/* pri. error during write */
-#define	UEAFSR_P_DRD	(1UL << 62)	/* pri. error during read */
+#define	UEAFSR_BLK	(1UL << 23)	/* Error caused by block transaction. */
+#define	UEAFSR_P_DTE	(1UL << 56)	/* Pri. DVMA translation error. */
+#define	UEAFSR_S_DTE	(1UL << 57)	/* Sec. DVMA translation error. */
+#define	UEAFSR_S_DWR	(1UL << 58)	/* Sec. error during DVMA write. */
+#define	UEAFSR_S_DRD	(1UL << 59)	/* Sec. error during DVMA read. */
+#define	UEAFSR_S_PIO	(1UL << 60)	/* Sec. error during PIO access. */
+#define	UEAFSR_P_DWR	(1UL << 61)	/* Pri. error during DVMA write. */
+#define	UEAFSR_P_DRD	(1UL << 62)	/* Pri. error during DVMA read. */
+#define	UEAFSR_P_PIO	(1UL << 63)	/* Pri. error during PIO access. */
+
+/* Correctable error asynchronous fault status registers */
+#define	CEAFSR_BLK	(1UL << 23)	/* Error caused by block transaction. */
+#define	CEAFSR_S_DWR	(1UL << 58)	/* Sec. error caused by DVMA write. */
+#define	CEAFSR_S_DRD	(1UL << 59)	/* Sec. error caused by DVMA read. */
+#define	CEAFSR_S_PIO	(1UL << 60)	/* Sec. error caused by PIO access. */
+#define	CEAFSR_P_DWR	(1UL << 61)	/* Pri. error caused by DVMA write. */
+#define	CEAFSR_P_DRD	(1UL << 62)	/* Pri. error caused by DVMA read. */
+#define	CEAFSR_P_PIO	(1UL << 63)	/* Pri. error caused by PIO access. */
+
+#define	CEAFSR_ERRMASK \
+	(CEAFSR_P_PIO | CEAFSR_P_DRD | CEAFSR_P_DWR | \
+	CEAFSR_S_PIO | CEAFSR_S_DRD | CEAFSR_S_DWR)
 
 /* Definitions for the target address space register. */
 #define	PCITAS_ADDR_SHIFT	29

--BOKacYhQ+x31HxR3--
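
For anyone who wants to check the two reported AFSR values by hand, here is a rough userland sketch that applies the CEAFSR_* bit definitions from the patch to them. It is not part of the patch. The helper names (ce_afsr_errbits, ce_afsr_diff, ce_afsr_print) are made up for illustration, and UINT64_C(1) stands in for the patch's 1UL on the assumption that the sketch may be built on a host where long is 32 bits wide.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Bit definitions copied from the psychoreg.h hunk of the patch.
 * Assumption: UINT64_C(1) replaces the patch's 1UL so that this also
 * builds on hosts where long is only 32 bits wide.
 */
#define	CEAFSR_BLK	(UINT64_C(1) << 23)
#define	CEAFSR_S_DWR	(UINT64_C(1) << 58)
#define	CEAFSR_S_DRD	(UINT64_C(1) << 59)
#define	CEAFSR_S_PIO	(UINT64_C(1) << 60)
#define	CEAFSR_P_DWR	(UINT64_C(1) << 61)
#define	CEAFSR_P_DRD	(UINT64_C(1) << 62)
#define	CEAFSR_P_PIO	(UINT64_C(1) << 63)

#define	CEAFSR_ERRMASK \
	(CEAFSR_P_PIO | CEAFSR_P_DRD | CEAFSR_P_DWR | \
	CEAFSR_S_PIO | CEAFSR_S_DRD | CEAFSR_S_DWR)

/* Hypothetical helper: the value the patched handler writes back. */
uint64_t
ce_afsr_errbits(uint64_t afsr)
{
	return (afsr & CEAFSR_ERRMASK);
}

/* Hypothetical helper: the bits in which two AFSR snapshots differ. */
uint64_t
ce_afsr_diff(uint64_t a, uint64_t b)
{
	return (a ^ b);
}

/* Hypothetical helper: print the name of every known flag set in afsr. */
void
ce_afsr_print(FILE *fp, uint64_t afsr)
{
	static const struct {
		uint64_t bit;
		const char *name;
	} flags[] = {
		{ CEAFSR_P_PIO, "P_PIO" }, { CEAFSR_P_DRD, "P_DRD" },
		{ CEAFSR_P_DWR, "P_DWR" }, { CEAFSR_S_PIO, "S_PIO" },
		{ CEAFSR_S_DRD, "S_DRD" }, { CEAFSR_S_DWR, "S_DWR" },
		{ CEAFSR_BLK, "BLK" },
	};
	size_t i;

	fprintf(fp, "AFSR %#" PRIx64 ":", afsr);
	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
		if ((afsr & flags[i].bit) != 0)
			fprintf(fp, " %s", flags[i].name);
	fputc('\n', fp);
}
```

Fed the two values from the report, ce_afsr_errbits() returns CEAFSR_P_DRD (bit 62) for both, which matches the reading that these were primary errors during a DVMA read. ce_afsr_diff() on the pair gives 0x0022000000000000, so the two values differ only in bits 49 and 53.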