From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: Tim Daneliuk, FreeBSD Hardware Mailing List
Subject: Re: Need Help With MCA Code
Date: Fri, 31 Jan 2014 12:22:12 -0500

On Wednesday, January 29, 2014 6:49:21 pm Tim Daneliuk wrote:
> Resending in hopes that people on one of the other lists will have some insight here:
>
> On 01/27/2014 10:50 PM, Tim Daneliuk wrote:
> > I am running 9.2 stable i386 r261207.
> > As noted earlier:
> >
> >> I just replaced mobo/CPU on FBSD server (Gigabyte Z-87-D3HP with
> >> an Intel i3-4130). I am not overclocking ... but I continue to see this sort of thing:
> >
> >> MCA: CPU 0 COR (1) internal parity error
> >
> > Dmesg shows:
> >
> >> MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 0
> >> MCA: CPU 0 COR (1) internal parity error
> >> MCA: Bank 0, Status 0x90000040000f0005
> >> MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000
> >
> > I've swapped CPUs (i5). I've fiddled with an endless supply of
> > mobo settings. I've switched power supplies. I've moved mem
> > sticks around .... No joy.
> >
> > So, I dug through the sources and found this:
> >
> > mca_log(const struct mca_record *rec)
> > {
> >     uint16_t mca_error;
> >
> >     printf("MCA: Bank %d, Status 0x%016llx\n", rec->mr_bank,
> >         (long long)rec->mr_status);
> >     printf("MCA: Global Cap 0x%016llx, Status 0x%016llx\n",
> >         (long long)rec->mr_mcg_cap, (long long)rec->mr_mcg_status);
> >     printf("MCA: Vendor \"%s\", ID 0x%x, APIC ID %d\n", cpu_vendor,
> >         rec->mr_cpu_id, rec->mr_apic_id);
> >     printf("MCA: CPU %d ", rec->mr_cpu);
> >     if (rec->mr_status & MC_STATUS_UC)
> >         printf("UNCOR ");
> >     else {
> >         printf("COR ");
> >         if (rec->mr_mcg_cap & MCG_CAP_CMCI_P)
> >             printf("(%lld) ", ((long long)rec->mr_status &
> >                 MC_STATUS_COR_COUNT) >> 38);
> >     }
> >
> > It looks like the trailing else clause is kicking out the error, but I am
> > unclear what the error means, beyond the fact that it appears to be a parity
> > error somewhere within the CPU's internal memory (cache?). Is this error
> > getting corrected? Is this benign? Should I get a different mobo?
> >
> > Um .... Haaaaalp :)
>
> I have now tried different motherboards, CPUs, memory, and power supplies, and
> this error is still showing up now and then.
>
> This points strongly to either FreeBSD bogus reporting, or these errors being
> benign.
> It's hard to believe that the exact same error might occur with
> completely different hardware ... unless it's being caused by the case.

Are they all the same model CPU?

Since it is a corrected error, you can probably ignore it, but it is not bogus
reporting. FreeBSD only reports these errors because they show up in registers
on your CPU.

-- 
John Baldwin