From: Tim Daneliuk
Date: Fri, 31 Jan 2014 11:48:42 -0600
To: John Baldwin, freebsd-stable@freebsd.org
Cc: FreeBSD Hardware Mailing List
Subject: Re: Need Help With MCA Code
Message-ID: <52EBE1FA.2040603@tundraware.com>
In-Reply-To: <201401311222.12136.jhb@freebsd.org>

On 01/31/2014 11:22 AM, John Baldwin wrote:
> On Wednesday, January 29, 2014 6:49:21 pm Tim Daneliuk wrote:
>> Resending in hopes that people on one of the other lists will have some insight here:
>>
>> On 01/27/2014 10:50 PM, Tim Daneliuk wrote:
>>> I am running 9.2 stable i386 r261207.  As noted earlier:
>>>
>>>> I just replaced mobo/CPU on FBSD server (Gigabyte Z-87-D3HP with
>>>> an Intel i3-4130).  I am not overclocking ... but I continue to see this sort of thing:
>>>
>>>> MCA: CPU 0 COR (1) internal parity error
>>>
>>> Dmesg shows:
>>>
>>>> MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 0
>>>> MCA: CPU 0 COR (1) internal parity error
>>>> MCA: Bank 0, Status 0x90000040000f0005
>>>> MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000_
>>>
>>> I've swapped CPUs (i5).  I've fiddled with an endless supply of
>>> mobo settings.  I've switched power supplies.  I've moved mem
>>> sticks around ....  No joy.
>>>
>>> So, I dug through the sources and found this:
>>>
>>>
>>> mca_log(const struct mca_record *rec)
>>> {
>>>     uint16_t mca_error;
>>>
>>>     printf("MCA: Bank %d, Status 0x%016llx\n", rec->mr_bank,
>>>         (long long)rec->mr_status);
>>>     printf("MCA: Global Cap 0x%016llx, Status 0x%016llx\n",
>>>         (long long)rec->mr_mcg_cap, (long long)rec->mr_mcg_status);
>>>     printf("MCA: Vendor \"%s\", ID 0x%x, APIC ID %d\n", cpu_vendor,
>>>         rec->mr_cpu_id, rec->mr_apic_id);
>>>     printf("MCA: CPU %d ", rec->mr_cpu);
>>>     if (rec->mr_status & MC_STATUS_UC)
>>>         printf("UNCOR ");
>>>     else {
>>>         printf("COR ");
>>>         if (rec->mr_mcg_cap & MCG_CAP_CMCI_P)
>>>             printf("(%lld) ", ((long long)rec->mr_status &
>>>                 MC_STATUS_COR_COUNT) >> 38);
>>>     }
>>>
>>>
>>> It looks like the trailing else clause is kicking out the error but I am
>>> unclear what the error means, beyond the fact that it appears to be a parity
>>> error somewhere within the CPU's internal memory (cache?).  Is this error
>>> getting corrected?  Is this benign, Should I get a different mobo?
>>>
>>> Um .... Haaaaalp :)
>>
>>
>> I have now tried different motherboards, CPUs, memory, and power supplies and
>> this error is still showing up now and then.
>>
>> This points strongly to either FreeBSD bogus reporting, or these errors being
>> benign.  It's hard to believe that the exact same error might occur with
>> completely different hardware ... unless it's being caused by the case.
>
> Are they all the same model CPU?  Since it is a corrected error you can
> probably ignore it, but it is not bogus reporting.  FreeBSD only reports
> these errors because they show up in registers on your CPU.
>

It's looking like this is an artifact of running 9.2-STABLE i386 on that
hardware.  I just installed 10-STABLE x64 and am beating the hardware to
death and have yet to see an MCA check.  It *is* possible the 9.2 install
is boogered up (I went to grad school to learn how to say that), so I am
pursuing a full rebuild of the server.
While painful, this will also finally move this machine to x64 which is
long overdue.

-- 
----------------------------------------------------------------------------
Tim Daneliuk     tundra@tundraware.com
PGP Key:         http://www.tundraware.com/PGP/