From: "Ronald Klop"
To: freebsd-stable@freebsd.org
Date: Tue, 24 Aug 2010 08:14:29 +0200
Subject: Re: kernel MCA messages

On Mon, 23 Aug 2010 14:20:35 +0200, John Baldwin wrote:

> On Monday, August 23, 2010 2:44:38 am Andriy Gapon wrote:
>> on 23/08/2010 05:05 Dan Langille said the following:
>> > On 8/22/2010 9:18 PM, Dan Langille wrote:
>> >> What does this mean?
>> >>
>> >> kernel: MCA: Bank 4, Status 0x940c4001fe080813
>> >> kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
>> >> kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
>> >> kernel: MCA: CPU 0 COR BUSLG Source RD Memory
>> >> kernel: MCA: Address 0x7ff6b0
>> >>
>> >> FreeBSD 7.3-STABLE #1: Sun Aug 22 23:16:43
>> >
>> > And another one:
>> >
>> > kernel: MCA: Bank 4, Status 0x9459c0014a080813
>> > kernel: MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
>> > kernel: MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
>> > kernel: MCA: CPU 0 COR BUSLG Source RD Memory
>> > kernel: MCA: Address 0x7ff670
>>
>> I believe that you get correctable RAM ECC errors, but not entirely sure.
>> There is mcelog utility that decodes such messages into human-friendly
>> descriptions.
>> The utility is available on Linux-based systems.
>> John Baldwin has a port of it to FreeBSD, but it seems to be WIP and is
>> private so far. Wait and watch John posting decoded text in this thread :-)
>
> It is not private, it is in //depot/projects/mcelog/... in p4. It is not a
> complete port yet though (doesn't support the daemon and client modes for
> example).
>
> Details for these errors:
>
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 0 4 northbridge
> ADDR 7ff6b0
> Northbridge RAM Chipkill ECC error
> Chipkill ECC syndrome = fe18
> bit32 = err cpu0
> bit46 = corrected ecc error
> bus error 'local node origin, request didn't time out
> generic read mem transaction
> memory access, level generic'
> STATUS 940c4001fe080813 MCGSTATUS 0
> MCGCAP 105 APICID 0 SOCKETID 0
> CPUID Vendor AMD Family 15 Model 5
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 0 4 northbridge
> ADDR 7ff670
> Northbridge RAM Chipkill ECC error
> Chipkill ECC syndrome = 4ab3
> bit32 = err cpu0
> bit46 = corrected ecc error
> bus error 'local node origin, request didn't time out
> generic read mem transaction
> memory access, level generic'
> STATUS 9459c0014a080813 MCGSTATUS 0
> MCGCAP 105 APICID 0 SOCKETID 0
> CPUID Vendor AMD Family 15 Model 5
>
> As Andriy guessed, I believe both of these are corrected ECC errors. You
> can likely ignore them as a low rate of corrected ECC errors is not
> unexpected.
>

Hi,

A little off topic, but what is 'a low rate of corrected ECC errors'? At
work one machine has them about once per day, but runs OK. Is once per day
much?

Ronald.
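
For reference, the fields mcelog printed in the quoted output can be pulled out of the raw Status values by hand. The following is only a rough sketch, not mcelog's actual code. The helper name decode_nb_status is made up for illustration. Bit 46 (corrected ECC) is taken from the mcelog output quoted above; bits 63 (valid) and 61 (uncorrected) are the generic x86 MCA status bits; the syndrome layout (high byte in bits 31:24, low byte in bits 54:47) is an assumption based on the AMD family 15 (K8) northbridge layout, so it should not be applied to other CPU families without checking.

```python
def decode_nb_status(status):
    """Decode a few fields of an AMD K8 northbridge MCA status word.

    Hypothetical helper for illustration; bit positions are assumptions
    described in the text above, not taken from mcelog's source.
    """
    return {
        "valid": bool((status >> 63) & 1),          # VAL: the bank holds a logged error
        "uncorrected": bool((status >> 61) & 1),    # UC: error was NOT corrected
        "corrected_ecc": bool((status >> 46) & 1),  # "bit46 = corrected ecc error"
        # Chipkill syndrome: high byte from bits 31:24, low byte from bits 54:47
        "syndrome": (((status >> 24) & 0xFF) << 8) | ((status >> 47) & 0xFF),
        "error_code": status & 0xFFFF,              # generic MCA error code
    }


# The two Status values from the kernel messages in this thread.
for raw in (0x940C4001FE080813, 0x9459C0014A080813):
    fields = decode_nb_status(raw)
    print(
        f"{raw:#018x}: corrected_ecc={fields['corrected_ecc']} "
        f"uncorrected={fields['uncorrected']} "
        f"syndrome={fields['syndrome']:04x}"
    )
```

Run against the two values from Dan's log, this gives syndromes fe18 and 4ab3 with the corrected-ECC bit set and the uncorrected bit clear, which agrees with the decoded text John posted.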