Date:      Wed, 16 Sep 2015 08:51:53 +0100
From:      Bob Bishop <rb@gid.co.uk>
To:        freebsd-hardware@freebsd.org
Cc:        Andriy Gapon <avg@freebsd.org>, freebsd-hackers@freebsd.org, Dieter BSD <dieterbsd@gmail.com>, Konstantin Belousov <kostikbel@gmail.com>
Subject:   Re: ECC support
Message-ID:  <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk>
In-Reply-To: <20150916035904.GE67105@kib.kiev.ua>
References:  <CAA3ZYrBXZn1WpHWYGJYWJDPsk7iDahCas8RhnHC4w%2Babf4w4hA@mail.gmail.com> <55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua>

Hi,

Arriving late to this thread, a few observations:

- Obviously the more RAM you have, the more errors you are going to see.
In other words, ECC makes increasing sense as RAM sizes get larger. All
server-class hardware should have it.

- DRAM has to be refreshed. In sensible designs, ECC scrub is integrated
with refresh to minimise overhead. It doesn't have to be very
frequent, maybe every 24 hours.

- On server-class hardware, the platform management (BMC or whatever)
should be picking up, logging, and possibly alarming on ECC errors
regardless of the OS.

- You might think that as memory density increases (i.e. bit cell size
shrinks), error rates would increase. Apparently this wasn't so
up to 2009 at least, see:

 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

which reports on a study of these issues across Google's estate
at the time. I don't know of any more recent similar work.
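A rough back-of-envelope check on the scrub point: reading every cell once per 24 hours needs only a small average bandwidth, and that bandwidth grows linearly with RAM size. This is a minimal sketch assuming a hypothetical 256 GiB machine and a full pass per day; `scrub_bandwidth` is an illustrative helper, not any real firmware or FreeBSD interface.

```python
GIB = 1024 ** 3
SECONDS_PER_DAY = 24 * 60 * 60

def scrub_bandwidth(ram_bytes: int, interval_s: int = SECONDS_PER_DAY) -> float:
    """Hypothetical helper: average bytes/second needed to read every
    byte of RAM once per scrub interval (ignores refresh interleaving)."""
    return ram_bytes / interval_s

# Assumed example: 256 GiB scrubbed once every 24 hours.
rate = scrub_bandwidth(256 * GIB)
print(f"{rate / 1e6:.2f} MB/s")  # prints "3.18 MB/s"
```

That is a tiny fraction of what a modern memory system can sustain, which is why a daily pass folded into refresh costs so little.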

--
Bob Bishop
rb@gid.co.uk