Subject: Re: ECC support
From: Bob Bishop
Date: Thu, 22 Oct 2015 19:49:13 +0100
To: John Baldwin
Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org, Dieter BSD

Hi,

> On 22 Oct 2015, at 19:09, John Baldwin wrote:
>
> On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote:
>> Chris:
>>> MCA: Bank 1, Status 0x9400000000000151
>>> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
>>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2
>>>
>>> MCA: Address 0x81cc0e9f0
>>>
>>> Kind of freaky. I've never had this error on this board before.
>>> On others tho.
>>>
>>> Try a search for MCA instead.
>>
>> Is there a decoder ring for those messages? I don't recall seeing
>> messages like that, although I wasn't looking for them, and they
>> don't leap out at you screaming ERROR! ERROR! Digital Unix had its
>> problems, but at least the error messages were fairly clear.
>> Something like "single bit memory error at address 0x12345..."
>> A simple edit to sys/x86/x86/mca.c
>> s/printf("UNCOR ");/printf("Uncorrectable ");/
>> s/printf("COR ");/printf("Correctable ");/
>> would make the messages at least slightly more meaningful to a viewer
>> who isn't intimently(sp) familiar with the mca. Which most people aren't.
>
> The problem is that there are other fields to decode and you can only fit so
> much in one line. Also, there is not a CPU-independent way to know the
> address of an ECC error. [etc]

On server-class hardware, the platform management (BMC or whatever) is
probably decoding this stuff for event logs and can be interrogated via
IPMI (or whatever).

--
Bob Bishop
rb@gid.co.uk
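
On the "decoder ring" question quoted above: the top bits of the per-bank Status value follow the architectural MCi_STATUS layout, so the quoted value can be unpacked by hand. What follows is a minimal sketch in Python, not the code in sys/x86/x86/mca.c. The names MC_STATUS_FLAGS and decode_mc_status are made up for illustration, and the bit positions are the ones given in the machine-check chapters of the Intel SDM and AMD BKDG as I understand them. It does not attempt to interpret the low 16-bit error code, which is where the vendor-specific detail lives.

```python
# Hypothetical helper, for illustration only: unpack the architectural
# flag bits of an MCi_STATUS value as printed in "MCA: Bank N, Status ...".
# Bit positions are assumed from the Intel SDM / AMD BKDG layout.
MC_STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error record
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds a valid address
    57: "PCC",    # processor context may be corrupt
}


def decode_mc_status(status):
    """Return the set flags and the raw error-code fields of a status word."""
    flags = [
        name
        for bit, name in sorted(MC_STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        # Only meaningful when VAL is set.
        "severity": "Uncorrectable" if "UC" in flags else "Correctable",
        # Bits 15:0: MCA error code; bits 31:16: model-specific code.
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


if __name__ == "__main__":
    decoded = decode_mc_status(0x9400000000000151)
    print(decoded["severity"], " ".join(decoded["flags"]),
          hex(decoded["mca_error_code"]))
```

For the status quoted in this thread, the sketch reports VAL, EN and ADDRV set with UC clear, that is, a corrected error with a valid address, which fits the separate "MCA: Address" line in the log.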