Date:      Tue, 15 Sep 2015 15:10:25 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        dieterbsd@gmail.com
Cc:        freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org
Subject:   Re: ECC support
Message-ID:  <201509152210.t8FMAPnv022327@gw.catspoiler.org>
In-Reply-To: <CAA3ZYrBXZn1WpHWYGJYWJDPsk7iDahCas8RhnHC4w%2Babf4w4hA@mail.gmail.com>

On 15 Sep, Dieter BSD wrote:
> Many of AMD's CPU/APU parts support ECC memory.  Not just the top of the
> line parts, but also many of the less expensive, less power hungry parts.
> However, many (most?) of the boards for these chips do not support ECC,
> or at least do not admit to it.  They specify "non-ECC memory".
> 
> Obviously there have to be connections between the memory controller and
> the memory for the extra bits.  Aside from a little extra time for the
> board designer to add a few traces to the wire list, this would not
> raise the cost of the board.  Despite this I have read that some boards
> lack the necessary traces.

I don't think the current APU parts support ECC.  My guess is that the
current APU sockets don't have the connections to support it.

I'm typing this on a FreeBSD machine with an AMD CPU and ECC RAM.  I won't
put together a machine without ECC.  My experience is that many ASUS
motherboards support ECC RAM and usually document that fact.  Many
Gigabyte motherboards also support ECC RAM, but don't document it.  Even
if you look at the BIOS screenshots in the manual, you won't see the
knobs to configure ECC, I suspect because those knobs are not displayed
unless ECC RAM is installed.

> Does the firmware have to do anything to support ECC?  Program a few
> registers in the memory controller perhaps?  A few boards have FLOSS
> firmware available, so this code could be added, but most boards do not
> have firmware sources available.
> 
> Assuming that a board does have the necessary connections but
> the firmware does not have ECC support, is there some reason that
> ECC support could not be added to the OS instead of the firmware?
> I grepped through FreeBSD 8.2 and 10.1 sources but couldn't find
> anything that looked relevant.  Also did not find any code that
> reported ECC errors, other than one device.  Perhaps I missed it?

It's in there ...

> I've been running machines with ECC for 15-20 years and have never seen
> a report of an ECC error from either NetBSD or FreeBSD.  I have seen
> reports of ECC errors from Digital Unix.  And remember getting panics
> due to parity errors on machines before ECC.  So I'm thinking that
> the BSDs must ignore hardware reports of single bit ECC errors.  :-(

From daily mail to root about a month ago:

+MCA: Bank 4, Status 0x944a400096080a13
+MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
+MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0
+MCA: CPU 0 COR BUSLG Responder RD Memory
+MCA: Address 0x213e98b10
+MCA: Bank 4, Status 0xd44a400096080a13
+MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
+MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0
+MCA: CPU 0 COR OVER BUSLG Responder RD Memory
+MCA: Address 0x213e98b10
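
For anyone trying to read the Status values in the report above, here is a
minimal Python sketch of how they break down.  It is not the kernel's code.
The function and table names are made up for illustration.  The flag
positions (VAL 63, OVER 62, UC 61, EN 60, ADDRV 58, PCC 57) and the bus
error code layout 0000 1PPT RRRR IILL are taken from the x86 machine check
architecture as I understand it, and the CECC/UECC bits (46/45) are assumed
to be the AMD-specific ones that apply to this CPU family.

```python
# Illustrative decoder for an x86 MCi_STATUS value (hypothetical helper names).
# Assumption: bits 46/45 are AMD's correctable/uncorrectable ECC flags.

PARTICIPATION = ["Source", "Responder", "Observer", "Generic"]
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TARGET = ["Memory", "Reserved", "I/O", "Other"]
LEVEL = ["L0", "L1", "L2", "LG"]


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_mca_status(status):
    """Split a 64-bit MCi_STATUS value into named fields."""
    code = status & 0xFFFF  # MCA error code lives in the low 16 bits
    info = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),        # an earlier error was overwritten
        "uncorrected": bit(status, 61),     # False means the error was corrected
        "enabled": bit(status, 60),
        "address_valid": bit(status, 58),
        "context_corrupt": bit(status, 57),
        "correctable_ecc": bit(status, 46),    # AMD-specific (assumed)
        "uncorrectable_ecc": bit(status, 45),  # AMD-specific (assumed)
        "error_code": code,
    }
    # Bus/interconnect error codes have the form 0000 1PPT RRRR IILL.
    if (code & 0xF800) == 0x0800:
        info["bus"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bit(code, 8),
            "request": REQUEST.get((code >> 4) & 0xF, "?"),
            "target": TARGET[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return info


if __name__ == "__main__":
    for status in (0x944A400096080A13, 0xD44A400096080A13):
        print(hex(status), decode_mca_status(status))
```

Under those assumptions both values decode as a corrected, ECC-flagged bus
error (Responder, RD, Memory, level LG) with a valid address, and the second
one additionally has the overflow bit set, which agrees with the "COR" and
"COR OVER" lines in the report.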
