From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Adam Vande More
Subject: Re: MCA messages in dmesg
Date: Thu, 30 Sep 2010 13:25:15 -0400
Message-Id: <201009301325.15113.jhb@freebsd.org>
References: <201009300940.43136.jhb@freebsd.org>

On Thursday, September 30, 2010 12:33:24 pm Adam Vande More wrote:
> On Thu, Sep 30, 2010 at 8:40 AM, John Baldwin wrote:
>
> > On Thursday, September 30, 2010 2:49:24 am Adam Vande More wrote:
> > > For a while now, my home server has been acting up.  Actually it had a bad
> > > set of RAM long ago; I replaced it and it worked fine.  It's been weird
> > > again now, and I've found this in dmesg:
> > >
> > > MCA: Bank 0, Status 0xf200000000000800
> > > MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
> > > MCA: Vendor "GenuineIntel", ID 0x6fb, APIC ID 2
> > > MCA: CPU 2 UNCOR PCC OVER BUSL0 Source ERR Memory
> > > MCA: Bank 0, Status 0xf200000000000800
> > > MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
> > > MCA: Vendor "GenuineIntel", ID 0x6fb, APIC ID 3
> > > MCA: CPU 3 UNCOR PCC OVER BUSL0 Source ERR Memory
> >
> > Are you getting a panic when this happens?
> >
>
> Its symptoms vary, but yes, I think so.  The box is headless, so I depend on
> logs after boot to see what happens.  Sometimes the box panics and powers
> off with no warning, and other times it just seems to hit a stall state
> where everything becomes unresponsive and I have to manually power off.

Ok, it is a memory error of some sort, but mcelog claims it is a transaction
timeout rather than an ECC error, per se:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 0
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f200000000000800 MCGSTATUS 0
MCGCAP 806 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 0
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f200000000000800 MCGSTATUS 0
MCGCAP 806 APICID 3 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15

I've no idea what specific hardware is busted (memory or motherboard or CPU),
but I suspect something is likely broken.

-- 
John Baldwin
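
For anyone who wants to check the status word by hand, here is a minimal
sketch of how the flag lines reported by mcelog can be read out of the raw
value 0xf200000000000800. It assumes the architectural IA32_MCi_STATUS layout
from the Intel SDM (VAL in bit 63 down to PCC in bit 57, MCA error code in the
low 16 bits). The names decode_mci_status and STATUS_FLAGS are made up for
illustration and do not come from mcelog or the FreeBSD kernel. The sketch
deliberately stops at the flag bits, the error-code class, and the level
field; the remaining subfields are left to mcelog.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM layout).
STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # error overflow: an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MISC register holds valid data
    (58, "ADDRV"),  # ADDR register holds valid data
    (57, "PCC"),    # processor context corrupt
]


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flag names and the MCA error code."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits: MCA error code

    # Bus and interconnect errors have the form 000F 1PPT RRRR IILL.
    # Bit 12 (F) is masked out of the comparison because some processors
    # use it as a filtering flag.
    is_bus = (code & 0xE800) == 0x0800

    return {
        "flags": flags,
        "mca_error_code": code,
        "bus_error": is_bus,
        # LL field (bits 1:0) is the level, only meaningful for this class here.
        "level": (code & 0x3) if is_bus else None,
    }


if __name__ == "__main__":
    result = decode_mci_status(0xF200000000000800)
    print(result["flags"], hex(result["mca_error_code"]),
          result["bus_error"], result["level"])
```

For the value in this thread the set flags are VAL, OVER, UC, EN and PCC,
which lines up with the "Error overflow / Uncorrected error / Error enabled /
Processor context corrupt" lines above, and the error code 0x0800 falls in
the bus/interconnect class at level 0, matching "BUSL0" in dmesg.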