Date: Thu, 30 Sep 2010 13:25:15 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Cc: Adam Vande More <amvandemore@gmail.com>
Subject: Re: MCA messages in dmesg
Message-ID: <201009301325.15113.jhb@freebsd.org>
In-Reply-To: <AANLkTinyBrF65LbjPfcBdEcHn1PE-=sHWaJhwnHibVvt@mail.gmail.com>
References: <AANLkTine8Prmd-TOrHixJijHiR%2BNEMzwSKdcoTUsBJ_B@mail.gmail.com> <201009300940.43136.jhb@freebsd.org> <AANLkTinyBrF65LbjPfcBdEcHn1PE-=sHWaJhwnHibVvt@mail.gmail.com>

On Thursday, September 30, 2010 12:33:24 pm Adam Vande More wrote:
> On Thu, Sep 30, 2010 at 8:40 AM, John Baldwin <jhb@freebsd.org> wrote:
>
> > On Thursday, September 30, 2010 2:49:24 am Adam Vande More wrote:
> > > For awhile now, my home server has been acting up.  Actually it had a bad
> > > set of RAM long ago, replaced and it and worked fine.  It's been weird again
> > > now, and I've found this in dmesg:
> > >
> > > MCA: Bank 0, Status 0xf200000000000800
> > > MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
> > > MCA: Vendor "GenuineIntel", ID 0x6fb, APIC ID 2
> > > MCA: CPU 2 UNCOR PCC OVER BUSL0 Source ERR Memory
> > > MCA: Bank 0, Status 0xf200000000000800
> > > MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
> > > MCA: Vendor "GenuineIntel", ID 0x6fb, APIC ID 3
> > > MCA: CPU 3 UNCOR PCC OVER BUSL0 Source ERR Memory
> >
> > Are you getting a panic when this happens?
> >
>
> It's symptoms vary, but yes I think so.  The box is headless, so I depend on
> logs after boot to see what happens.  Sometimes the box panics and powers
> off with no warning, and other times it just seems to hit a stall state
> where everything become unresponsive and I have to manually power off.

Ok, it is a memory error of some sort, but mcelog claims it is a transaction
timeout rather than an ECC error, per se:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 0
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f200000000000800 MCGSTATUS 0
MCGCAP 806 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 0
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f200000000000800 MCGSTATUS 0
MCGCAP 806 APICID 3 SOCKETID 0
CPUID Vendor Intel Family 6 Model 15

I've no idea what specific hardware is busted (memory or motherboard or CPU),
but I suspect something is likely broken.

-- 
John Baldwin

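
For readers who want to see where the "UNCOR PCC OVER BUSL0 Source ERR Memory"
line comes from, here is a rough sketch of decoding the Status value
0xf200000000000800 by hand. It is not the kernel's or mcelog's code. The
function and table names are made up for illustration, and the bit layout is
an assumption based on the architectural IA32_MCi_STATUS format and the
"0000 1PPT RRRR IILL" bus/interconnect error code in the Intel SDM. Check it
against the SDM before relying on it.

```python
# Hypothetical helper: decode the architectural part of an IA32_MCi_STATUS
# value. Bit positions are assumed from the Intel SDM machine-check chapter.

STATUS_FLAGS = [
    (63, "VAL"),    # register contains a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MCi_MISC holds valid data
    (58, "ADDRV"),  # MCi_ADDR holds a valid address
    (57, "PCC"),    # processor context corrupt
]

LEVEL = ["L0", "L1", "L2", "LG"]                  # LL, bits 1:0
MEMORY_IO = ["M", "reserved", "IO", "other"]      # II, bits 3:2
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}  # RRRR, bits 7:4
PARTICIPATION = ["SRC", "RES", "OBS", "generic"]  # PP, bits 10:9


def decode_mci_status(status):
    """Return the set flags and, for bus errors, the decoded sub-fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code lives in the low 16 bits
    result = {"flags": flags, "mca_code": code}

    # Bus/interconnect errors match the pattern 0000 1PPT RRRR IILL.
    if (code & 0xF800) == 0x0800:
        result["bus"] = {
            "level": LEVEL[code & 0x3],
            "memory_io": MEMORY_IO[(code >> 2) & 0x3],
            "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),  # T, bit 8
        }
    return result


decoded = decode_mci_status(0xF200000000000800)
print(decoded["flags"])
print(decoded["bus"])
```

Under those assumptions the flags come out as VAL, OVER, UC, EN and PCC, and
the bus fields as L0 / SRC / ERR / M, which lines up with the dmesg output
quoted above. This sketch reads bit 8 (T) as clear for this value, so it would
not report a timeout on its own. The "Request-timeout" wording comes from the
mcelog output and is not reproduced here.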
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201009301325.15113.jhb>