From: Bob Bishop
Date: Wed, 16 Sep 2015 08:51:53 +0100
Subject: Re: ECC support
To: freebsd-hardware@freebsd.org
Cc: Andriy Gapon, freebsd-hackers@freebsd.org, Dieter BSD, Konstantin Belousov
Message-Id: <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk>
In-Reply-To: <20150916035904.GE67105@kib.kiev.ua>
References: <55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua>

Hi,

Arriving late to this thread, a few observations:

- Obviously the more RAM you have, the more errors you are going to see.
  In other words, ECC makes increasing sense as RAM sizes get larger. All
  server-class hardware should have it.

- DRAM has to be refreshed. In sensible designs, ECC scrub is integrated
  with refresh to minimise overhead.
  The scrub doesn’t have to be very frequent, maybe every 24 hours.

- On server-class hardware, the platform management (BMC or whatever)
  should be picking up, logging, and possibly alarming on ECC errors
  regardless of the OS.

- You might think that as memory density increases (i.e. bit cell size
  shrinks), error rates would increase. Apparently this wasn’t so up to
  2009 at least, see:

  http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

  which reports on a study of these issues across Google’s estate at the
  time. I don’t know of any more recent similar work.

--
Bob Bishop
rb@gid.co.uk
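
A rough illustration of the first observation, for what it is worth: if one assumes a constant, independent error rate per Mbit (a simplification), the expected error count grows linearly with RAM size. The function name `expected_errors` and the rate used below are made up for illustration; they are not taken from the paper linked above.

```python
def expected_errors(ram_gib: float, fit_per_mbit: float, hours: float) -> float:
    """Expected number of memory errors over a period.

    Assumes a constant error rate per Mbit, expressed in FIT
    (failures per 10**9 device-hours). This is a simplification.
    """
    mbits = ram_gib * 1024 * 8  # GiB -> MiB -> (binary) Mbit
    return fit_per_mbit * mbits * hours / 1e9


if __name__ == "__main__":
    rate = 1000.0  # hypothetical FIT per Mbit, not a measured value
    one_year = 24 * 365
    for gib in (8, 64, 512):
        # Same rate and period, only the RAM size changes.
        print(f"{gib:4d} GiB: {expected_errors(gib, rate, one_year):.1f} expected errors/year")
```

Under this model, doubling the RAM doubles the expected error count, which is the sense in which ECC matters more as memory sizes grow.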