From: Jim Thompson
Date: Tue, 15 Sep 2015 16:52:30 -0500
Subject: Re: ECC support
To: Dieter BSD
Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org
Message-Id: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com>
List-Id: Technical Discussions relating to FreeBSD

ECC is implemented by a ‘hashing’ algorithm that works on eight (8)
bytes (64 bits) at a time, and places the result into an 8-bit ECC
‘word’.

Errors are corrected "on-the-fly"; corrected data is almost never
placed back in memory. If the same corrupt data is read again, the
correction process is repeated. Replacing the data in memory would
require processing overhead that could accumulate and significantly
diminish system performance. If the error occurred because of random
events and isn't a defect in the memory, the memory address will be
cleaned of the error when the data is overwritten with other data.

In terms of expense, at a minimum, where you had 8 bytes to make up a
memory system, you will now have 9 (to hold the extra 8 bits). This
means your memory, without the extra complexity of the controller, is
12.5% more expensive.

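To make the 64-data-bit / 8-check-bit arrangement concrete, here is a
minimal Python sketch. It assumes a textbook extended Hamming
SECDED(72,64) code, which is not necessarily what any particular memory
controller implements; the function names (syndrome, encode, decode,
ecc_overhead_gb) and the bit layout are invented for illustration only.

```python
# Sketch only: an extended Hamming SECDED(72,64) code. Real memory
# controllers use their own (often proprietary) check-bit matrices.

CHECK_POSITIONS = (1, 2, 4, 8, 16, 32, 64)   # 7 Hamming check bits
DATA_POSITIONS = [p for p in range(1, 72) if p not in CHECK_POSITIONS]
# Position 0 holds an overall parity bit, giving 8 check bits in total.


def syndrome(word: int) -> int:
    """XOR of the positions (1..71) of all set bits; 0 for a clean word."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data: int) -> int:
    """Spread 64 data bits over a 72-bit word and fill in 8 check bits."""
    if not 0 <= data < 1 << 64:
        raise ValueError("data must fit in 64 bits")
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    s = syndrome(word)
    for p in CHECK_POSITIONS:
        if s & p:
            word |= 1 << p          # force the syndrome to zero
    if bin(word).count("1") & 1:
        word |= 1                   # overall parity: even number of 1s
    return word


def decode(word: int):
    """Return (data, status); the caller's stored word is not rewritten."""
    s = syndrome(word)
    odd = bin(word).count("1") & 1
    if odd:
        if s > 71:
            return None, "uncorrectable"   # points outside the word
        word ^= 1 << s              # single-bit error at position s
        status = "corrected"
    elif s:
        return None, "uncorrectable"       # even parity, bad syndrome
    else:
        status = "ok"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


def ecc_overhead_gb(data_gb: float) -> float:
    """Extra capacity needed for 8 check bits per 64 data bits."""
    return data_gb * 8 / 64


if __name__ == "__main__":
    stored = encode(0x0123456789ABCDEF)
    flipped = stored ^ (1 << 37)            # simulate one flipped bit
    print(decode(flipped))                  # corrected on this read
    print(ecc_overhead_gb(8), ecc_overhead_gb(1024))   # 1.0 128.0
```

In this sketch decode() hands back corrected data without rewriting the
stored word, so a second read of the same bad word repeats the
correction, much like the on-the-fly behaviour described above. The
ecc_overhead_gb() helper is just the 8/64 = 12.5% ratio.
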
That 12.5% isn’t a huge impact at 8GB (you’ll need another 1GB of RAM),
but at 1024GB you’ll need another 128GB, and that much RAM still costs
enough that your wallet won’t be happy.

The memory controller has to be able to run the ECC algorithm on every
read, *and* supply the corrected data as needed, within the cycle time
of the read. If you involve software in this path, the performance of
your machine will be glacial.

Yes, the firmware has to program the memory controller. “Program a few
registers” is all you need; the catch is that the MRC (Memory Reference
Code) setup on Intel and AMD is both complex and proprietary.

Good luck getting the details for this. This is “Intel Red Book”
territory, and you’ll need to be an employee with a need to know. The
MRC setup code is a binary blob for otherwise open source boot firmware
such as Coreboot.

Others have answered (in the positive) about the OS reporting ECC
errors on FreeBSD.

Jim

> On Sep 15, 2015, at 3:53 PM, Dieter BSD wrote:
>
> Many of AMD's CPU/APU parts support ECC memory. Not just the top of the
> line parts, but also many of the less expensive, less power hungry parts.
> However, many (most?) of the boards for these chips do not support ECC,
> or at least do not admit to it. They specify "non-ECC memory".
>
> Obviously there have to be connections between the memory controller and
> the memory for the extra bits. Aside from a little extra time for the
> board designer to add a few traces to the wire list, this would not
> raise the cost of the board. Despite this I have read that some boards
> lack the necessary traces.
>
> Does the firmware have to do anything to support ECC? Program a few
> registers in the memory controller perhaps? A few boards have FLOSS
> firmware available, so this code could be added, but most boards do not
> have firmware sources available.
>
> Assuming that a board does have the necessary connections but
> the firmware does not have ECC support, is there some reason that
> ECC support could not be added to the OS instead of the firmware?
> I grepped through FreeBSD 8.2 and 10.1 sources but couldn't find
> anything that looked relevant. Also did not find any code that
> reported ECC errors, other than one device. Perhaps I missed it?
>
> I've been running machines with ECC for 15-20 years and have never seen
> a report of an ECC error from either NetBSD or FreeBSD. I have seen
> reports of ECC errors from Digital Unix. And remember getting panics
> due to parity errors on machines before ECC. So I'm thinking that
> the BSDs must ignore hardware reports of single bit ECC errors. :-(