Date: Wed, 24 Feb 2016 15:51:01 -0500
Subject: Re: MCA error, possible causes?
From: Ultima
To: John Baldwin
Cc: freebsd-hardware@freebsd.org

Hi John,

Thanks for the explanation. I ran some tests, and the cause turned out to be a
power-saving mode (aka unstable mode?). Disabling this feature put an end to
the freezes.

I came to this conclusion by stress testing the box for 3 days with no issues
at all. Then I stopped the stress test, and about 15-30 min later it froze.
It seemed to occur only during periods of low load. I have not received any
of these errors since turning off this power-saving mode.

On Wed, Feb 24, 2016 at 3:14 PM, John Baldwin wrote:
> On Friday, February 12, 2016 08:11:37 PM Ultima wrote:
> > Recently installed some CPUs and received two MCA errors. Using mcelog, I
> > found that the version in ports is about 5 years out of date and didn't
> > support my CPU. Decided to update it to the newest version (will post on
> > bugzilla shortly) to pull some more info. Going to post the original and
> > decoded mcelog output.
> >
> >
> > Raw:
> > MCA: Bank 20, Status 0xc800084000310e0f
> > MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> > MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 0
> > MCA: CPU 0 COR (33) OVER BUSLG ??? ERR Other
> > MCA: Misc 0x1df87b000d9eff
> > MCA: Bank 5, Status 0xc800008000310e0f
> > MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> > MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 42
> > MCA: CPU 34 COR (2) OVER BUSLG ??? ERR Other
> > MCA: Misc 0xdf87b008d9eff
> >
> > mcelog v131:
> > Hardware event. This is not a software error.
> > CPU 0 BANK 20
> > MISC 1df87b000d9eff
> > MCG status:
> > QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> > STATUS c800084000310e0f MCGSTATUS 0
> > MCGCAP 7000c16 APICID 0 SOCKETID 0
> > CPUID Vendor Intel Family 6 Model 63
> > Hardware event. This is not a software error.
> > CPU 34 BANK 5
> > MISC df87b008d9eff
> > MCG status:
> > QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> > STATUS c800008000310e0f MCGSTATUS 0
> > MCGCAP 7000c16 APICID 2a SOCKETID 0
> > CPUID Vendor Intel Family 6 Model 63
> >
> > After receiving this error, the system was in a frozen state. Any ideas
> > what may cause this?
>
> Well, hardware causes it. QPI is the interconnect bus between your
> CPUs and RAM. "Rx detected CRC error" implies that a CPU detected a
> corrupted message on that bus, but when it requested a resend the
> resent message was ok. Normally corrected errors shouldn't hang your
> machine, but perhaps your machine had another hardware error after this
> that broke it too badly to report and/or log the subsequent error.
>
> --
> John Baldwin
>
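As a rough illustration of how the raw "Status" words in the log map onto lines like "COR (33) OVER BUSLG ??? ERR Other", here is a minimal Python sketch that splits an IA32_MCi_STATUS value into its architectural fields. It assumes the bit layout and the bus/interconnect compound-code format (000F 1PPT RRRR IILL) described in Intel's SDM Vol. 3B; the function and table names are hypothetical. This is not FreeBSD's kernel code or mcelog's decoder, and it deliberately does not interpret the model-specific bits (0x0031 here) that mcelog uses to identify the QPI CRC error.

```python
# Hypothetical helper names; bit layout assumed from Intel SDM Vol. 3B (IA32_MCi_STATUS).

def decode_mci_status(status: int) -> dict:
    """Split a raw MCi_STATUS value into its architectural fields."""

    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "VAL": bit(63),    # register holds a valid error
        "OVER": bit(62),   # an earlier error was overwritten
        "UC": bit(61),     # uncorrected (False means corrected, "COR")
        "EN": bit(60),     # error reporting was enabled
        "MISCV": bit(59),  # MCi_MISC contents are valid
        "ADDRV": bit(58),  # MCi_ADDR contents are valid
        "PCC": bit(57),    # processor context corrupt
        "cor_count": (status >> 38) & 0x7FFF,       # bits 52:38, corrected-error count
        "model_specific": (status >> 16) & 0xFFFF,  # bits 31:16, not interpreted here
        "mca_code": status & 0xFFFF,                # bits 15:0, compound error code
    }


LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}
PARTICIPATION = {0: "SRC", 1: "RES", 2: "OBS", 3: "generic"}
REQUESTS = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
            5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
MEM_IO = {0: "Memory", 1: "reserved", 2: "I/O", 3: "Other"}


def describe_bus_error(code: int) -> str:
    """Describe a bus/interconnect compound code (format 000F 1PPT RRRR IILL)."""
    if (code & 0xE800) != 0x0800:
        return "not a bus/interconnect error"
    text = "BUS{} {} {} {}".format(
        LEVELS[code & 0x3],                          # LL: cache level
        PARTICIPATION[(code >> 9) & 0x3],            # PP: participation
        REQUESTS.get((code >> 4) & 0xF, "reserved"), # RRRR: request type
        MEM_IO[(code >> 2) & 0x3],                   # II: memory or I/O
    )
    if code & 0x100:  # T bit: request timed out
        text += " timeout"
    return text


if __name__ == "__main__":
    for raw in (0xC800084000310E0F, 0xC800008000310E0F):
        fields = decode_mci_status(raw)
        print(hex(raw), fields["cor_count"], describe_bus_error(fields["mca_code"]))
```

Run against the two Status words from the log, the count field comes out as 33 and 2 (lining up with "COR (33)" and "COR (2)"), UC is clear (a corrected error), and the low 16 bits 0x0e0f decode as a generic-level bus/interconnect error with generic participation, which corresponds to the "BUSLG ??? ERR Other" text in the kernel's output.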