From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Andriy Gapon, Dan Langille
Date: Mon, 23 Aug 2010 08:20:35 -0400
Subject: Re: kernel MCA messages
Message-Id: <201008230820.35260.jhb@freebsd.org>
In-Reply-To: <4C7218D6.6090408@icyb.net.ua>
References: <4C71CC62.6060803@langille.org> <4C71D756.5080205@langille.org> <4C7218D6.6090408@icyb.net.ua>

On Monday, August 23, 2010 2:44:38 am Andriy Gapon wrote:
> on 23/08/2010 05:05 Dan Langille said the following:
> > On 8/22/2010 9:18 PM, Dan Langille wrote:
> >> What does this mean?
> >>
> >> kernel: MCA: Bank 4, Status 0x940c4001fe080813
> >> kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> >> kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> >> kernel: MCA: CPU 0 COR BUSLG Source RD Memory
> >> kernel: MCA: Address 0x7ff6b0
> >>
> >> FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43
> >
> > And another one:
> >
> > kernel: MCA: Bank 4, Status 0x9459c0014a080813
> > kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
> > kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> > kernel: MCA: CPU 0 COR BUSLG Source RD Memory
> > kernel: MCA: Address 0x7ff670
>
> I believe that you get correctable RAM ECC errors, but not entirely sure.
> There is mcelog utility that decodes such messages into human-friendly descriptions.
> The utility is available on Linux-based systems.
> John Baldwin has a port of it to FreeBSD, but it seems to be WIP and is private
> so far. Wait and watch John posting decoded text in this thread :-)

It is not private, it is in //depot/projects/mcelog/... in p4. It is not a
complete port yet though (doesn't support the daemon and client modes for
example). Details for these errors:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
ADDR 7ff6b0
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = fe18
  bit32 = err cpu0
  bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 940c4001fe080813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
ADDR 7ff670
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = 4ab3
  bit32 = err cpu0
  bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 9459c0014a080813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5

As Andriy guessed, I believe both of these are corrected ECC errors. You can
likely ignore them as a low rate of corrected ECC errors is not unexpected.

--
John Baldwin
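
For anyone who wants to check the decoding by hand, here is a minimal sketch
in Python that pulls the same fields out of the two Status values quoted
above. The bit positions are an assumption based on the AMD family 0Fh
northbridge (bank 4) status layout and the generic bus-error code format;
they are not taken from the FreeBSD kernel or from mcelog, and the name
decode_nb_status and the dictionary keys are made up for illustration.

```python
# Sketch only: field positions are assumptions drawn from the AMD
# family 0Fh northbridge (bank 4) status layout, not from the FreeBSD
# MCA code or from mcelog. decode_nb_status is a hypothetical helper.

PARTICIPATION = ["SRC", "RES", "OBS", "GEN"]      # PP field
REQUEST = ["ERR", "RD", "WR", "DRD", "DWR", "IRD",
           "PREFETCH", "EVICT", "SNOOP"]          # RRRR field
TARGET = ["MEM", "reserved", "IO", "GEN"]         # II field
LEVEL = ["L0", "L1", "L2", "LG"]                  # LL field


def decode_nb_status(status):
    """Split a 64-bit bank 4 status word into named fields."""
    code = status & 0xFFFF                        # low 16 bits: error code
    rrrr = (code >> 4) & 0xF
    return {
        "valid": bool((status >> 63) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_ecc": bool((status >> 46) & 1),
        "err_cpu0": bool((status >> 32) & 1),
        # Chipkill syndrome: high byte in bits 31:24, low byte in bits 54:47.
        "syndrome": (((status >> 24) & 0xFF) << 8) | ((status >> 47) & 0xFF),
        # Bus error codes have the form 0000 1PPT RRRR IILL.
        "bus_error": (code >> 11) == 0b00001,
        "participation": PARTICIPATION[(code >> 9) & 0x3],
        "timeout": bool((code >> 8) & 1),
        "request": REQUEST[rrrr] if rrrr < len(REQUEST) else "reserved",
        "target": TARGET[(code >> 2) & 0x3],
        "level": LEVEL[code & 0x3],
    }


if __name__ == "__main__":
    for status in (0x940C4001FE080813, 0x9459C0014A080813):
        d = decode_nb_status(status)
        print(f"{status:#018x}: syndrome={d['syndrome']:04x} "
              f"corrected_ecc={d['corrected_ecc']} "
              f"{d['participation']} {d['request']} "
              f"{d['target']} {d['level']}")
```

Under those assumptions both values come out as valid, corrected (UC clear)
bus errors for a generic read from memory, with syndromes fe18 and 4ab3,
which agrees with the mcelog output above.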