Date: Tue, 15 Sep 2015 16:52:30 -0500
From: Jim Thompson <jim@netgate.com>
To: Dieter BSD <dieterbsd@gmail.com>
Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: ECC support
Message-ID: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com>
In-Reply-To: <CAA3ZYrBXZn1WpHWYGJYWJDPsk7iDahCas8RhnHC4w%2Babf4w4hA@mail.gmail.com>
References: <CAA3ZYrBXZn1WpHWYGJYWJDPsk7iDahCas8RhnHC4w%2Babf4w4hA@mail.gmail.com>
ECC is implemented by a ‘hashing’ algorithm that works on eight (8) bytes (64 bits) at a time, and places the result into an 8-bit ECC ‘word’.

Errors are corrected "on the fly"; corrected data is almost never placed back in memory. If the same corrupt data is read again, the correction process is repeated. Replacing the data in memory would require processing overhead that could accumulate and significantly diminish system performance. If the error occurred because of random events and isn't a defect in the memory, the memory address will be cleaned of the error when the data is overwritten with other data.

In terms of expense, at a minimum, where you had 8 bytes to make up a memory system, you will now have 9 (to hold the extra 8 bits). This means your memory, without the extra complexity of the controller, is 12.5% more expensive. This isn’t a huge impact at 8GB (you’ll need another 1GB of RAM), but at 1024GB you’ll need another 128GB, and that much RAM still costs enough that your wallet won’t be happy.

The memory controller has to be able to run the ECC algorithm on every read, *and* supply the corrected data as needed, within the cycle time of the read. If you involve software in this path, the performance of your machine will be glacial.

Yes, the firmware has to program the memory controller. “Program a few registers” is all you need, but the MRC setup on Intel and AMD is both complex and proprietary.

Good luck getting the details for this. This is “Intel Red Book” territory, and you’ll need to be an employee with a need to know. The MRC setup code is a binary blob for otherwise open source boot firmware such as Coreboot.

Others have answered (in the positive) about the OS reporting ECC errors on FreeBSD.
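To make the "64 data bits plus 8 check bits" arrangement above concrete, here is a minimal Python sketch of a (72,64) single-error-correct, double-error-detect code built as an extended Hamming code. This is an assumption chosen for illustration: the bit layout and the names `encode`, `decode`, `PARITY_POSITIONS` and `DATA_POSITIONS` are hypothetical, and real memory controllers use their own check matrices and do all of this in hardware within the read cycle.

```python
# Illustrative (72,64) SECDED sketch using an extended Hamming code.
# The bit layout is an assumption for this example, not any vendor's scheme.

# Bit 0 holds overall parity; bits at powers of two hold Hamming check bits.
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
# The remaining 64 positions in 1..71 carry the data bits.
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]


def encode(data):
    """Return a 72-bit codeword (as an int) for a 64-bit data word."""
    assert 0 <= data < (1 << 64)
    word = 0
    syndrome = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
            syndrome ^= pos
    # Set check bits so the XOR of all set positions becomes zero.
    for p in PARITY_POSITIONS:
        if syndrome & p:
            word |= 1 << p
    # Overall parity bit makes the total number of set bits even.
    if bin(word).count("1") % 2:
        word |= 1
    return word


def decode(word):
    """Return (data, status): status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            syndrome ^= pos
    overall_odd = bin(word).count("1") % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        if syndrome > 71:
            # Syndrome points outside the codeword: more than one bit flipped.
            return None, "uncorrectable"
        # Single flipped bit; the syndrome is its position (0 = parity bit).
        word ^= 1 << syndrome
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


if __name__ == "__main__":
    stored = encode(0x0123456789ABCDEF)
    flipped = stored ^ (1 << 37)   # simulate one flipped bit in memory
    print(decode(flipped))         # original data comes back, status "corrected"
```

Note that `decode` only repairs the value it returns; the stored word is left as it was, which mirrors the "corrected on the fly, not written back" behaviour described above. The 8 check bits per 64 data bits are also where the 12.5% overhead comes from.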
Jim

> On Sep 15, 2015, at 3:53 PM, Dieter BSD <dieterbsd@gmail.com> wrote:
>
> Many of AMD's CPU/APU parts support ECC memory. Not just the top of the
> line parts, but also many of the less expensive, less power hungry parts.
> However, many (most?) of the boards for these chips do not support ECC,
> or at least do not admit to it. They specify "non-ECC memory".
>
> Obviously there have to be connections between the memory controller and
> the memory for the extra bits. Aside from a little extra time for the
> board designer to add a few traces to the wire list, this would not
> raise the cost of the board. Despite this I have read that some boards
> lack the necessary traces.
>
> Does the firmware have to do anything to support ECC? Program a few
> registers in the memory controller perhaps? A few boards have FLOSS
> firmware available, so this code could be added, but most boards do not
> have firmware sources available.
>
> Assuming that a board does have the necessary connections but
> the firmware does not have ECC support, is there some reason that
> ECC support could not be added to the OS instead of the firmware?
> I grepped through FreeBSD 8.2 and 10.1 sources but couldn't find
> anything that looked relevant. Also did not find any code that
> reported ECC errors, other than one device. Perhaps I missed it?
>
> I've been running machines with ECC for 15-20 years and have never seen
> a report of an ECC error from either NetBSD or FreeBSD. I have seen
> reports of ECC errors from Digital Unix. And remember getting panics
> due to parity errors on machines before ECC. So I'm thinking that
> the BSDs must ignore hardware reports of single bit ECC errors. :-(
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"