From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Miroslav Lachman <000.fbsd@quip.cz>
Date: Wed, 22 Dec 2010 09:57:26 -0500
Subject: Re: MCA messages after upgrade to 8.2-BEAT1
Message-Id: <201012220957.26854.jhb@freebsd.org>
In-Reply-To: <4D11F1F5.7050902@quip.cz>

On Wednesday, December 22, 2010 7:41:25 am
Miroslav Lachman wrote:
> Dec 21 12:42:26 kavkaz kernel: MCA: Bank 0, Status 0xd40e400000000833
> Dec 21 12:42:26 kavkaz kernel: MCA: Global Cap 0x0000000000000105,
> Status 0x0000000000000000
> Dec 21 12:42:26 kavkaz kernel: MCA: Vendor "AuthenticAMD", ID 0x40f33,
> APIC ID 0
> Dec 21 12:42:26 kavkaz kernel: MCA: CPU 0 COR OVER BUSLG Source DRD Memory
> Dec 21 12:42:26 kavkaz kernel: MCA: Address 0x236493c0

You are getting corrected ECC errors in your RAM.  You see them once an hour
because we poll the machine check registers once an hour.  If this happens
constantly, you might have a DIMM that is dying.

% ~/mcelog --ascii < foo.txt
mcelog: Cannot open /dev/mem for DMI decoding: Permission denied
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache ADDR 236493c0
  Data cache ECC error (syndrome 1c)
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS d40e400000000833 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 1 instruction cache ADDR 2a1c9440
  Instruction cache ECC error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      instruction fetch mem transaction
      memory access, level generic'
STATUS d400400000000853 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 2 bus unit
  L2 cache ECC error
  Bus or cache array error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      prefetch mem transaction
      memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
MISC e00d0fff00000000 ADDR 2cac9678
  Northbridge RAM ECC error
  ECC syndrome = 1c
       bit33 = err cpu1
       bit46 = corrected ecc error
       bit59 = misc error valid
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS dc0e400200000813 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 0 data cache ADDR 23649640
  Data cache ECC error (syndrome 1c)
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS d40e400000000833 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 1 instruction cache ADDR 2a1c9440
  Instruction cache ECC error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      instruction fetch mem transaction
      memory access, level generic'
STATUS d400400000000853 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit
  L2 cache ECC error
  Bus or cache array error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      prefetch mem transaction
      memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 67

-- 
John Baldwin
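The status words in the logs above can also be decoded by hand. What follows is a minimal Python sketch, not FreeBSD's mca.c and not mcelog, assuming the common x86 machine-check layout: flag bits 57-63 of MCi_STATUS, a 16-bit MCA error code in bits 15:0 with bus errors encoded as 0000 1PPT RRRR IILL, and MCG_CAP bits 7:0 holding the bank count. Bit 46 (corrected ECC) and the 8-bit syndrome in bits 54:47 are AMD family 0Fh specifics (the CPU here reports Family 15, i.e. 0Fh); they are assumptions taken from AMD's documentation and agree with the mcelog output above. The function names decode_status and decode_mcg_cap are hypothetical.

```python
"""Hypothetical sketch: decode x86 machine-check status words by hand."""

# Architectural flag bits of MCi_STATUS (common x86 MCA layout).
ARCH_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # uncorrected error (clear means corrected)
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupted
}

# Fields of the bus-error compound code 0000 1PPT RRRR IILL.
PARTICIPATION = {0b00: "Source", 0b01: "Responder", 0b10: "Observer", 0b11: "Generic"}
REQUEST = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH", 0b0111: "EVICT",
    0b1000: "SNOOP",
}
MEM_IO = {0b00: "Memory", 0b01: "Reserved", 0b10: "I/O", 0b11: "Generic"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    # Flags listed from bit 63 downward, like the kernel's summary line.
    flags = [name for bit, name in sorted(ARCH_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "corrected": not ((status >> 61) & 1),
        "amd_cecc": bool((status >> 46) & 1),   # AMD family 0Fh assumption
        "amd_syndrome": (status >> 47) & 0xFF,  # AMD family 0Fh assumption
        "error_code": code,
    }
    # Bus errors: bits 15:12 are zero and bit 11 is set.
    if (code & 0xF800) == 0x0800:
        result["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 0b11],
            "timeout": bool((code >> 8) & 1),
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "mem_io": MEM_IO[(code >> 2) & 0b11],
            "level": LEVEL[code & 0b11],
        }
    return result


def decode_mcg_cap(cap: int) -> dict:
    """Decode MCG_CAP: bits 7:0 are the bank count, bit 8 is MCG_CTL_P."""
    return {"banks": cap & 0xFF, "ctl_present": bool((cap >> 8) & 1)}


if __name__ == "__main__":
    info = decode_status(0xD40E400000000833)
    print(info["flags"], info["corrected"], hex(info["amd_syndrome"]))
    # prints: ['VAL', 'OVER', 'EN', 'ADDRV'] True 0x1c
    print(info["bus"])
    print(decode_mcg_cap(0x105))
```

Run against the bank 0 status word, the sketch reports a valid, corrected error with overflow and a valid address, syndrome 0x1c, and a bus error from a local-node data read of memory at a generic cache level. That is the same information as the kernel's "COR OVER BUSLG Source DRD Memory" line. MCG_CAP 0x105 decodes to five banks, which fits the reports above mentioning banks 0 through 4.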