From owner-freebsd-current@freebsd.org  Mon Jan  4 15:10:45 2016
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: Steven Hartland <killing@multiplay.co.uk>
Subject: Re: FreeBsd MCA Panic Crash !!
Date: Mon, 04 Jan 2016 07:10:18 -0800
Message-ID: <7090189.HS4ZXl3oYZ@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; )
In-Reply-To: <568A7F0F.6060307@multiplay.co.uk>
References: <1451903649383-6064691.post@n5.nabble.com>
 <568A7F0F.6060307@multiplay.co.uk>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>

On Monday, January 04, 2016 02:17:51 PM Steven Hartland wrote:
> Bank 5 seems to be common to all the crashes, which may suggest you have 
> some dodgy ram or possibly the driving CPU's memory controller.

No, this has nothing to do with that.  "Bank 5" refers to bank 5 of the
machine check registers in the processor (MC5_*), which is the register
bank reporting the errors.  Different "banks" of the MC registers handle
errors for different parts of the hardware (and this varies by CPU).  For
example, on Nehalem CPUs, the memory controller logs errors (e.g. ECC
errors) in bank 8, but that has no correlation to the "bank" of DIMMs that
the error occurred in.  Later Intel CPUs can log the same errors in
register banks 8 through 12 (IIRC).  Depending on the CPU model, you can
determine more info about the error using the CPU manuals (for Intel, the
SDM).
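
To make the register-bank numbering concrete, here is a minimal sketch of
how the MSRs for a given bank are located.  It assumes the architectural
layout described in the Intel SDM (four MSRs per bank, starting at
IA32_MC0_CTL = 0x400); the helper name mc_bank_msrs is hypothetical and
not taken from FreeBSD or mcelog.

```python
# Sketch only: architectural MCA MSR layout as described in the Intel SDM.
IA32_MC0_CTL = 0x400   # first MSR of bank 0
REGS_PER_BANK = 4      # CTL, STATUS, ADDR, MISC


def mc_bank_msrs(bank):
    """Return the MSR addresses of machine check register bank `bank`."""
    if bank < 0:
        raise ValueError("bank number must be non-negative")
    base = IA32_MC0_CTL + REGS_PER_BANK * bank
    return {
        "CTL": base,         # MCi_CTL: which error sources are enabled
        "STATUS": base + 1,  # MCi_STATUS: what happened
        "ADDR": base + 2,    # MCi_ADDR: address, if STATUS says it is valid
        "MISC": base + 3,    # MCi_MISC: extra info, if STATUS says it is valid
    }


if __name__ == "__main__":
    for name, msr in mc_bank_msrs(5).items():
        print("MC5_%s = MSR 0x%x" % (name, msr))
```

The bank number only selects a group of MSRs inside the CPU.  Which
hardware unit reports into that bank has to be looked up per CPU model.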

> As the error says this is a Hardware issue.

Well, mcelog has this hardcoded and prints it for every MCA as a matter of
course.  It isn't selective; it assumes every machine check is a hardware
error.  They are, though some are warnings for corrected events that you
can ignore because the hardware hasn't degraded enough to warrant
replacement.  However, corrected events don't generate panics, just
messages in the logs, and only a subset of corrected events include the
"yellow / green" indicators, for which you can ignore "green" events.
Even corrected ECC errors I would ignore if you get only a few events with
a count of 1 that don't recur.
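
As a rough sketch of how corrected and uncorrected events can be told
apart, the MCi_STATUS value from a log entry can be decoded as below.  The
bit positions follow the architectural layout in the Intel SDM (VAL = bit
63, UC = bit 61, PCC = bit 57, "green / yellow" status = bits 54:53,
corrected error count = bits 52:38).  The function name and the tes_p /
cmci_p parameters are hypothetical; they stand in for the MCG_CAP
capability bits that say whether those two fields are implemented.

```python
# Sketch only: decode architectural fields of an MCi_STATUS value.
HEALTH = {0: "no tracking", 1: "green", 2: "yellow", 3: "reserved"}


def decode_mci_status(status, tes_p=True, cmci_p=True):
    """Decode a 64-bit MCi_STATUS value into a small dict.

    tes_p / cmci_p are assumptions standing in for MCG_CAP.TES_P and
    MCG_CAP.CMCI_P; without them the health and count fields are undefined.
    """
    uncorrected = bool((status >> 61) & 1)
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": uncorrected,
        "context_corrupt": bool((status >> 57) & 1),
        "mca_error_code": status & 0xFFFF,
        "health": None,
        "corrected_count": None,
    }
    # The green/yellow indicator only applies to corrected errors.
    if tes_p and not uncorrected:
        info["health"] = HEALTH[(status >> 53) & 0x3]
    if cmci_p:
        info["corrected_count"] = (status >> 38) & 0x7FFF
    return info


if __name__ == "__main__":
    # Made-up example value: a valid, corrected, "green" event, count 1.
    example = (1 << 63) | (1 << 53) | (1 << 38) | 0x009F
    print(decode_mci_status(example))
```

An entry with the UC bit set decodes as uncorrected and carries no
green / yellow indicator.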

-- 
John Baldwin