Date: Mon, 29 Nov 2010 21:57:25 +0000
From: Anton Shterenlikht
To: Marius Strobl
Cc: freebsd-sparc64@freebsd.org
Subject: Re: kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
Message-ID: <20101129215724.GA67174@mech-cluster241.men.bris.ac.uk>
In-Reply-To: <20101129184037.GB18481@alchemy.franken.de>
References: <20101129093231.GA96073@mech-cluster241.men.bris.ac.uk> <20101129184037.GB18481@alchemy.franken.de>
List-Id: Porting FreeBSD to the Sparc

On Mon, Nov 29, 2010 at 07:40:37PM +0100, Marius Strobl wrote:
> On Mon, Nov 29, 2010 at 09:32:31AM +0000, Anton Shterenlikht wrote:
> > On blade1500 silver 9.0-CURRENT #0 r212302 I got a panic,
> > which was preceded by these messages in /var/log/messages:
> >
> >
> > Nov 28 22:59:13 mech-anton240 ntpd[860]: time reset +0.313838 s
> > Nov 28 23:21:39 mech-anton240 ntpd[860]: time reset +0.354851 s
> > Nov 28 23:40:17 mech-anton240 ntpd[860]: time reset +0.319586 s
> > Nov 29 00:02:51 mech-anton240 ntpd[860]: time reset +0.357852 s
> > Nov 29 00:21:34 mech-anton240 ntpd[860]: time reset +0.327949 s
> > Nov 29 00:42:54 mech-anton240 ntpd[860]: time reset +0.347609 s
> > Nov 29 01:01:46 mech-anton240 ntpd[860]: time reset +0.329297 s
> > Nov 29 01:18:55 mech-anton240 ntpd[860]: time reset +0.317517 s
> > Nov 29 01:42:21 mech-anton240 ntpd[860]: time reset +0.354540 s
> > Nov 29 02:02:14 mech-anton240 ntpd[860]: time reset +0.344071 s
> > Nov 29 02:10:26 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:26 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:26 mech-anton240 kernel: corrected Epcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:26 mech-anton240 kernel: CC error
> > Nov 29 02:10:26 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:36 mech-anton240 last message repeated 40137 times
> > Nov 29 02:10:36 mech-anton240 kernel: FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:36 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 last message repeated 26464 times
> > Nov 29 02:10:42 mech-anton240 kernel: FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 last message repeated 14 times
> > Nov 29 02:10:42 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:42 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:45 mech-anton240 last message repeated 12750 times
> > Nov 29 02:10:46 mech-anton240 kernel: FAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:10:46 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:11:06 mech-anton240 last message repeated 73851 times
> > Nov 29 02:11:06 mech-anton240 kernel: pcib1: correctable DMA error AFFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:11:06 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:11:37 mech-anton240 last message repeated 72138 times
> > Nov 29 02:13:38 mech-anton240 last message repeated 180714 times
> > Nov 29 02:20:33 mech-anton240 last message repeated 623033 times
> > Nov 29 02:20:33 mech-anton240 ntpd[860]: time reset +0.317476 s
> > Nov 29 02:20:33 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:21:03 mech-anton240 last message repeated 44694 times
> > Nov 29 02:23:03 mech-anton240 last message repeated 179765 times
> > Nov 29 02:33:03 mech-anton240 last message repeated 900956 times
> > Nov 29 02:41:41 mech-anton240 last message repeated 774749 times
> > Nov 29 02:41:41 mech-anton240 ntpd[860]: time reset +0.338347 s
> > Nov 29 02:41:41 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 02:42:12 mech-anton240 last message repeated 45005 times
> > Nov 29 02:44:13 mech-anton240 last message repeated 180767 times
> > Nov 29 02:54:14 mech-anton240 last message repeated 901067 times
> > Nov 29 03:04:01 mech-anton240 last message repeated 953527 times
> > Nov 29 03:04:01 mech-anton240 ntpd[860]: time reset +0.352855 s
> > Nov 29 03:04:02 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 03:04:33 mech-anton240 last message repeated 59600 times
> > Nov 29 03:06:34 mech-anton240 last message repeated 210676 times
> > Nov 29 03:16:35 mech-anton240 last message repeated 901966 times
> > Nov 29 03:21:49 mech-anton240 last message repeated 473275 times
> > Nov 29 03:21:49 mech-anton240 ntpd[860]: time reset +0.330125 s
> > Nov 29 03:21:49 mech-anton240 kernel: pcib1: correctable DMA error AFAR 0x234323bc0 AFSR 0x1e000000
> > Nov 29 03:22:20 mech-anton240 last message repeated 44963 times
> > Nov 29 03:24:21 mech-anton240 last message repeated 182191 times
> > Nov 29 09:18:15 mech-anton240 syslogd: kernel boot file is /boot/kernel/kernel
> >
> > The panic was (copied by hand):
> >
> > panic: pcib1: JBus error 0.
> >
> > If it happens again, I'll post the full bt.
> >
> > Is /var/log/messages indicative of a hardware failure?
> > I'm also intrigued by ntpd time reset preceding most DMA errors.
> >
>
> This looks like RAM beginning to fail (note there's a "corrected ECC
> error" message intermixed with a "correctable DMA error" one), though
> it also could be just a problem with the connection and reseating the
> modules might help.

Marius,

many thanks
anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423