From owner-freebsd-hardware@freebsd.org Tue Sep 15 20:53:28 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D9E479C35E8; Tue, 15 Sep 2015 20:53:28 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: from mail-ig0-x231.google.com (mail-ig0-x231.google.com [IPv6:2607:f8b0:4001:c05::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AB15B1B14; Tue, 15 Sep 2015 20:53:28 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: by igxx6 with SMTP id x6so20545630igx.1; Tue, 15 Sep 2015 13:53:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=XqGQWr6b+KnJ3c5D9icW30yWVWo8sFa9ef0IqYkX+JA=; b=ZZfknXKk7T+cfEkqLJ0CLR0GmZ9c3JRTNTh9ebnLUHjJQ+P9YVuQ6tfnpuS8N/HlbP ykxJ6fv+T+CAaex7cYNqI5dNW1vM9VJAI/lIHrWSJrhy4ovBQtUDvpRIASiSayClBHNr 8iW/axY1Rsa+cGLBjh3XuOO5EleroFxdfhzmb0AQiHvZP933sXIrrpMv19SEpmieXoIc v7+15cvLTJ6eqWeF5lTeZdOm5Lpo6gQ3mgJ9vzgqyrsTPxxtgnsMgKElddUnuUp3EVg1 J+CMQnWTO7Gb71QSWYCnMPuPeTFieMungr+Kyu7rfoX9JjV8TFobsQfWnVbtBU4SU6pn BerA== MIME-Version: 1.0 X-Received: by 10.50.98.39 with SMTP id ef7mr9947461igb.2.1442350408172; Tue, 15 Sep 2015 13:53:28 -0700 (PDT) Received: by 10.64.2.132 with HTTP; Tue, 15 Sep 2015 13:53:27 -0700 (PDT) Date: Tue, 15 Sep 2015 13:53:27 -0700 Message-ID: Subject: ECC support From: Dieter BSD To: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 
2015 20:53:29 -0000 Many of AMD's CPU/APU parts support ECC memory. Not just the top of the line parts, but also many of the less expensive, less power hungry parts. However, many (most?) of the boards for these chips do not support ECC, or at least do not admit to it. They specify "non-ECC memory". Obviously there have to be connections between the memory controller and the memory for the extra bits. Aside from a little extra time for the board designer to add a few traces to the wire list, this would not raise the cost of the board. Despite this I have read that some boards lack the necessary traces. Does the firmware have to do anything to support ECC? Program a few registers in the memory controller perhaps? A few boards have FLOSS firmware available, so this code could be added, but most boards do not have firmware sources available. Assuming that a board does have the necessary connections but the firmware does not have ECC support, is there some reason that ECC support could not be added to the OS instead of the firmware? I grepped through FreeBSD 8.2 and 10.1 sources but couldn't find anything that looked relevant. Also did not find any code that reported ECC errors, other than one device. Perhaps I missed it? I've been running machines with ECC for 15-20 years and have never seen a report of an ECC error from either NetBSD or FreeBSD. I have seen reports of ECC errors from Digital Unix. And remember getting panics due to parity errors on machines before ECC. So I'm thinking that the BSDs must ignore hardware reports of single bit ECC errors. 
:-( From owner-freebsd-hardware@freebsd.org Tue Sep 15 21:02:09 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F1B309C3C8A; Tue, 15 Sep 2015 21:02:09 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from anubis.delphij.net (anubis.delphij.net [64.62.153.212]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D95C710DB; Tue, 15 Sep 2015 21:02:09 +0000 (UTC) (envelope-from delphij@delphij.net) Received: from zeta.ixsystems.com (unknown [12.229.62.2]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 2248E20FDD; Tue, 15 Sep 2015 14:02:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1442350923; x=1442365323; bh=x3YN3kGCcLwT50qiEyhiHxtJir2J+onQoyeDHlw21RQ=; h=Reply-To:Subject:References:To:From:Date:In-Reply-To; b=Q9FgTUwm9KL2vF0UFCbT8pgRI4/l49BGCsfXHeGQudUmO49OlV/NwscXwBZWurY24 8v0sgziOyi83eByLGqpUOr0n+meJX75mE0VFaCpJM0wFGrHDPGVqeChG1iGD03cNFQ ndoyYKUJBi58nMp3NIC+inqqY6+8YSkz6Jq1Pw44= Reply-To: d@delphij.net Subject: Re: ECC support References: To: Dieter BSD , freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org From: Xin Li Organization: The FreeBSD Project Message-ID: <55F8874A.9030807@delphij.net> Date: Tue, 15 Sep 2015 14:02:02 -0700 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="x3qAOq2idBBLVQR3ojvVstq4GkSx0ATLM" X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 21:02:10 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--x3qAOq2idBBLVQR3ojvVstq4GkSx0ATLM
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 09/15/15 13:53, Dieter BSD wrote:
> I've been running machines with ECC for 15-20 years and have never seen
> a report of an ECC error from either NetBSD or FreeBSD. I have seen
> reports of ECC errors from Digital Unix. And remember getting panics
> due to parity errors on machines before ECC. So I'm thinking that
> the BSDs must ignore hardware reports of single bit ECC errors. :-(

I'm not sure about NetBSD but FreeBSD supports reporting ECC errors (via Machine Check Architecture, added in 2009), and yes we have seen it in the field.

Cheers,
-- 
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die

--x3qAOq2idBBLVQR3ojvVstq4GkSx0ATLM
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.1.8 (FreeBSD)

iQIcBAEBCgAGBQJV+IdKAAoJEJW2GBstM+nssdoP/RLizgBZ9q/5+F43N9vzpv0p
HxG9e5pb/IgCH+FmdC3J2DbLd6WSnGvuObfDzcgb9mCVwHwgESbrdAnYeF7LbJW8
DLN+wYipCpDugvNubgN/YoEgniR2jg+cl6MJKnecZhmFF1hxzWR/kHr5kTw/cAiT
T5BddK4Ye4Emf+mBmtwE3XFshJovh4hSi0MwO0Nu57yyhbezyeng1vp1rE1ZobO0
LtYby4Zvim/t8ns2Bwc/ETb6A3ONJHBbokH+pZLUUbZYXvbwqIOg3mkb6Xc0im+S
Zv/LkJFenjLoaB+0L9AqAfMrmqsdI9eEz6XB5bn0cvpIEYsgWTpkbzw5UFQgIds1
AKxKxmQGpkB3Afs7FaDf38vWUGoPfGZIKJ7HG2/8pByOnC84JEY6UP9bbI1ew/Wc
64cs7Je0JqmZXxJSPb47G5vLLgQnj8Dh0hFLnfS15nxgd+I8cBnC8WhQ39OCDY1K
w4c0odPv4xVPDU6Mc51Sv3aHJHtXFDsbjRCzt6qDIPYqeFWpwP+6LhacVM45SzXW
rDzw9vEEypq1rijnjAO2HDPGtt5D1BVTHfw5iZ5neHSvVBo9SfpbB3+ibiSJKCpg
STlEt0FPkCf2P8z04WaHK2NRQKpPwuCa8gz3O8ipQfgX4F/AbWDPU4AZpo4M6ADj
U9wQ5ONSpF9bmxkpnr1k
=dm4P
-----END PGP SIGNATURE-----

--x3qAOq2idBBLVQR3ojvVstq4GkSx0ATLM--

From 
owner-freebsd-hardware@freebsd.org Tue Sep 15 21:15:32 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A85CB9C2634; Tue, 15 Sep 2015 21:15:32 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id B5843195B; Tue, 15 Sep 2015 21:15:31 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA25755; Wed, 16 Sep 2015 00:15:22 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZbxZS-0003nk-3u; Wed, 16 Sep 2015 00:15:22 +0300 Subject: Re: ECC support To: Dieter BSD , freebsd-hardware@FreeBSD.org, freebsd-hackers@FreeBSD.org References: From: Andriy Gapon X-Enigmail-Draft-Status: N1110 Message-ID: <55F88A18.6090504@FreeBSD.org> Date: Wed, 16 Sep 2015 00:14:00 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 21:15:32 -0000 On 15/09/2015 23:53, Dieter BSD wrote: > Assuming that a board does have the necessary connections but > the firmware does not have ECC support, is there some reason that > ECC support could not be added to the OS instead of the firmware? Yes, there is. The memory controller is programmed by the code that runs from ROM and uses no RAM (or the CPU cache is used as the RAM). 
Once the real RAM gets used it's too late to reprogram the DRAM controller. This is true at least for most or all of the modern day x86 hardware. -- Andriy Gapon From owner-freebsd-hardware@freebsd.org Tue Sep 15 21:52:33 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ED4FB9CDD34 for ; Tue, 15 Sep 2015 21:52:33 +0000 (UTC) (envelope-from jim@netgate.com) Received: from mail-ob0-x230.google.com (mail-ob0-x230.google.com [IPv6:2607:f8b0:4003:c01::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B5E341129 for ; Tue, 15 Sep 2015 21:52:33 +0000 (UTC) (envelope-from jim@netgate.com) Received: by obbbh8 with SMTP id bh8so147056737obb.0 for ; Tue, 15 Sep 2015 14:52:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netgate.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=fZAuLl3hvf6NzSKX5fLyQLm3ubF3UgxfmExQOcLNhag=; b=eADSrglSTHD8iBpQaa6ZI4LOW6IRIiwT5yvq+OL4JjqTBBzdUGLA0fFLBdn3LIMNh4 ehYHYywQJV0ZqexBzKRHpFNDzEQu1d/l+UdORQLRlS6kHR2vGdXgBx/3Os+nr7o8HJx5 LSTIvi9zzUcbg9zlyOMwLcHprSPzJ6YL9O1jU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to; bh=fZAuLl3hvf6NzSKX5fLyQLm3ubF3UgxfmExQOcLNhag=; b=FRJWpXF7ZDRpcASDiCM4NgHBs28Tbw7qyl2WNdbeDV+M08J4m84vfe9dzzOLb4kfHE lQVn/cpXE1rhD/6jIJ7dEnIN+HZ/c1tcevrnV/riMkCnkYQjnQ4r/2uV3RHexP04H3do 2GwUELXdWxkC8r5Cu+2r/8MNgeCxq/ioFFIgFQzBj0IzsHfXSB73KQhhHn/W/XubSDPJ n1CPMKCAekrnQgdZTy+1T7OluuCnArQvXpV55kpJysl9unWLzv+ERd/sXSe2GyQ1kwY/ 
jspxoC8/hYGourmOQGEuUYHUc6Z8ULL41WggyegH9bWCtGjqBM2RJA04wuxR4y5VEVbM ohMw== X-Gm-Message-State: ALoCoQmgSL2o3zy+I3xdoYb+he/m+PJQXh/Ko51AGqMadXbVLwnuuQx8bbbmEflTozPDhrwH9ijn X-Received: by 10.182.120.100 with SMTP id lb4mr20360344obb.71.1442353952882; Tue, 15 Sep 2015 14:52:32 -0700 (PDT) Received: from ?IPv6:2610:160:11:33:343f:2e55:fcb2:6efb? ([2610:160:11:33:343f:2e55:fcb2:6efb]) by smtp.gmail.com with ESMTPSA id r63sm9637522oia.16.2015.09.15.14.52.31 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 15 Sep 2015 14:52:32 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Jim Thompson In-Reply-To: Date: Tue, 15 Sep 2015 16:52:30 -0500 Cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com> References: To: Dieter BSD X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 21:52:34 -0000

ECC is implemented by a ‘hashing’ algorithm that works on eight (8) bytes (64 bits) at a time, and places the result into an 8-bit ECC ‘word’.

Errors are corrected "on-the-fly," corrected data is almost never placed back in memory. If the same corrupt data is read again, the correction process is repeated. Replacing the data in memory would require processing overhead that could accumulate and significantly diminish system performance. If the error occurred because of random events and isn't a defect in the memory, the memory address will be cleaned of the error when the data is overwritten with other data.
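
To make the "64 data bits plus 8 check bits" arrangement concrete, here is a minimal sketch of a single-error-correct, double-error-detect (SECDED) code over 72 bits. The thread does not say which code a given memory controller uses; an extended Hamming code is assumed here as a common textbook choice, and the bit layout and the names `encode` and `decode` are hypothetical. Real controllers do this in hardware on every access.

```python
# Sketch only: extended Hamming (72,64) SECDED, layout chosen for illustration.
# Bit position 0 holds an overall parity bit; positions 1..71 hold a Hamming
# code whose check bits sit at the power-of-two positions 1, 2, 4, ..., 64.

DATA_BITS = 64
CHECK_BITS = 8  # 7 Hamming check bits + 1 overall parity bit

# The 64 positions in 1..71 that are not powers of two carry the data bits.
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1) != 0]
assert len(DATA_POSITIONS) == DATA_BITS


def _syndrome(word):
    """XOR of the positions (1..71) of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """Return a 72-bit codeword (as an int) for a 64-bit data value."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    # Set the check bits so that the syndrome of the finished word is zero.
    s = _syndrome(word)
    for k in range(7):
        if (s >> k) & 1:
            word |= 1 << (1 << k)
    # Overall parity bit makes the total number of set bits even.
    if bin(word).count("1") & 1:
        word |= 1
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    s = _syndrome(word)
    parity_odd = bin(word).count("1") & 1
    if s == 0 and not parity_odd:
        status = "ok"
    elif parity_odd and s <= 71:
        # Odd overall parity: assume one flipped bit, located by the syndrome
        # (a syndrome of 0 means the overall parity bit itself flipped).
        word ^= 1 << s
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


if __name__ == "__main__":
    value = 0x0123456789ABCDEF
    codeword = encode(value)
    print(decode(codeword))                          # clean read
    print(decode(codeword ^ (1 << 17)))              # one flipped bit
    print(decode(codeword ^ (1 << 3) ^ (1 << 40)))   # two flipped bits
    print("storage overhead:", CHECK_BITS / DATA_BITS)
```

Note that `decode` fixes the value it returns but never writes anything back, which mirrors the "corrected on the fly" behaviour described above: the stored word stays wrong until it is overwritten.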

In terms of expense, at a minimum, where you had 8 bytes to make up a memory system, you will now have 9 (to hold the extra 8 bits). This means your memory, without the extra complexity of the controller, is 12.5% more expensive. This isn’t a huge impact at 8GB (you’ll need another 1GB of RAM), but at 1024GB you’ll need another 128GB, and that much RAM still costs enough that your wallet won’t be happy.

The memory controller has to be able to run the ECC algorithm on every read, *and* supply the corrected data as needed, within the cycle time of the read. If you involve software in this path, the performance of your machine will be glacial.

Yes, the firmware has to program the memory controller. “Program a few registers” is all you need, but the MRC setup on Intel and AMD is both complex and proprietary.

Good luck getting the details for this. This is “Intel Red Book” territory, and you’ll need to be an employee with a need to know. The MRC setup code is a binary blob for otherwise open source boot firmware such as Coreboot.

Others have answered (in the positive) about the OS reporting ECC errors on FreeBSD.

Jim

> On Sep 15, 2015, at 3:53 PM, Dieter BSD wrote:
>
> Many of AMD's CPU/APU parts support ECC memory. Not just the top of the
> line parts, but also many of the less expensive, less power hungry parts.
> However, many (most?) of the boards for these chips do not support ECC,
> or at least do not admit to it. They specify "non-ECC memory".
>
> Obviously there have to be connections between the memory controller and
> the memory for the extra bits. Aside from a little extra time for the
> board designer to add a few traces to the wire list, this would not
> raise the cost of the board. Despite this I have read that some boards
> lack the necessary traces.
>
> Does the firmware have to do anything to support ECC?
Program a few
> registers in the memory controller perhaps? A few boards have FLOSS
> firmware available, so this code could be added, but most boards do not
> have firmware sources available.
>
> Assuming that a board does have the necessary connections but
> the firmware does not have ECC support, is there some reason that
> ECC support could not be added to the OS instead of the firmware?
> I grepped through FreeBSD 8.2 and 10.1 sources but couldn't find
> anything that looked relevant. Also did not find any code that
> reported ECC errors, other than one device. Perhaps I missed it?
>
> I've been running machines with ECC for 15-20 years and have never seen
> a report of an ECC error from either NetBSD or FreeBSD. I have seen
> reports of ECC errors from Digital Unix. And remember getting panics
> due to parity errors on machines before ECC. So I'm thinking that
> the BSDs must ignore hardware reports of single bit ECC errors. :-(
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hardware@freebsd.org Tue Sep 15 22:10:34 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1B629C2831; Tue, 15 Sep 2015 22:10:34 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 782161C44; Tue, 15 Sep 2015 22:10:34 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) 
with ESMTP id t8FMAPnv022327; Tue, 15 Sep 2015 15:10:29 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201509152210.t8FMAPnv022327@gw.catspoiler.org> Date: Tue, 15 Sep 2015 15:10:25 -0700 (PDT) From: Don Lewis Subject: Re: ECC support To: dieterbsd@gmail.com cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 22:10:34 -0000

On 15 Sep, Dieter BSD wrote:
> Many of AMD's CPU/APU parts support ECC memory. Not just the top of the
> line parts, but also many of the less expensive, less power hungry parts.
> However, many (most?) of the boards for these chips do not support ECC,
> or at least do not admit to it. They specify "non-ECC memory".
>
> Obviously there have to be connections between the memory controller and
> the memory for the extra bits. Aside from a little extra time for the
> board designer to add a few traces to the wire list, this would not
> raise the cost of the board. Despite this I have read that some boards
> lack the necessary traces.

I don't think the current APU parts support ECC. My guess is that the current APU sockets don't have the connections to support it.

I'm typing this on a FreeBSD machine with an AMD CPU and ECC RAM. I won't put together a machine without ECC. My experience is that many ASUS motherboards support ECC RAM and usually document that fact. Many Gigabyte motherboards also support ECC RAM, but don't document it. Even if you look at the BIOS screenshots in the manual, you won't see the knobs to configure ECC, I suspect because those knobs are not displayed unless ECC RAM is installed.

> Does the firmware have to do anything to support ECC?
Program a few
> registers in the memory controller perhaps? A few boards have FLOSS
> firmware available, so this code could be added, but most boards do not
> have firmware sources available.
>
> Assuming that a board does have the necessary connections but
> the firmware does not have ECC support, is there some reason that
> ECC support could not be added to the OS instead of the firmware?
> I grepped through FreeBSD 8.2 and 10.1 sources but couldn't find
> anything that looked relevant. Also did not find any code that
> reported ECC errors, other than one device. Perhaps I missed it?

It's in there ...

> I've been running machines with ECC for 15-20 years and have never seen
> a report of an ECC error from either NetBSD or FreeBSD. I have seen
> reports of ECC errors from Digital Unix. And remember getting panics
> due to parity errors on machines before ECC. So I'm thinking that
> the BSDs must ignore hardware reports of single bit ECC errors. :-(

>From daily mail to root about a month ago:

+MCA: Bank 4, Status 0x944a400096080a13
+MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
+MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0
+MCA: CPU 0 COR BUSLG Responder RD Memory
+MCA: Address 0x213e98b10
+MCA: Bank 4, Status 0xd44a400096080a13
+MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
+MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0
+MCA: CPU 0 COR OVER BUSLG Responder RD Memory
+MCA: Address 0x213e98b10

From owner-freebsd-hardware@freebsd.org Tue Sep 15 22:20:13 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0A9549CD1EB; Tue, 15 Sep 2015 22:20:13 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: from mail-wi0-x22c.google.com (mail-wi0-x22c.google.com [IPv6:2a00:1450:400c:c05::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN 
"smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 98B5213B5; Tue, 15 Sep 2015 22:20:12 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: by wiclk2 with SMTP id lk2so46421989wic.1; Tue, 15 Sep 2015 15:20:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=Ywu45W0TmIThbrBRdQGk+TLp855viyrKm19mvovyCvk=; b=T1nZhNfajaQoehw4UHAW9H+AHjHc9CjxEB+3AekyqMaRB+i00bJQzTOXcRjlduKjpB a7E+k64n186EdvsuhLdPK9bF1ywwVs58oj/Zl5/hJoGzbZwE6qw6Qp83+q34JWIAkRAs gHXt94y2LrD51EYUCUopvdx2IHCETm1Q0+8PkKid4Dm8QdtZexLQ0ZUPhMb0Nj3yq+ac Dm2mL7U30G4vWNpG3CwQPnztu8mcTln9IeS37Q7V+yMxs3nK3EucC/HF+Lr9qXslevX5 K0ORlNH48c9nJVy2euceCsiVzLo3m1DvIJykxCtE7CPVnvhmPNM+V15XMC6tBffOjaQu cgrA== X-Received: by 10.180.24.3 with SMTP id q3mr12248775wif.24.1442355610637; Tue, 15 Sep 2015 15:20:10 -0700 (PDT) MIME-Version: 1.0 Sender: mozolevsky@gmail.com Received: by 10.28.55.18 with HTTP; Tue, 15 Sep 2015 15:19:31 -0700 (PDT) In-Reply-To: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com> References: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com> From: Igor Mozolevsky Date: Tue, 15 Sep 2015 23:19:31 +0100 X-Google-Sender-Auth: cwYKB4BO_2rXOu8O0nVQDetnIpQ Message-ID: Subject: Re: ECC support To: Jim Thompson Cc: Dieter BSD , Hackers freeBSD , freebsd-hardware@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 22:20:13 -0000 On 15 September 2015 at 22:52, Jim Thompson wrote: Errors are corrected "on-the-fly," corrected data is almost never placed > back in memory. 
If the same corrupt data is read again, the correction > process is repeated. Replacing the data in memory would require processing > overhead that could accumulate and significantly diminish system > performance. If the error occurred because of random events and isn't a > defect in the memory, the memory address will be cleaned of the error when > the data is overwritten with other data. > Just to correct a small oversight- most (if not all?) boards have an option to scrub ECC memory in the background so as to prevent single bit (recoverable) errors from turning into double bit (irrecoverable but detectable) errors ;-) -- Igor M. From owner-freebsd-hardware@freebsd.org Tue Sep 15 22:34:47 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 81EE79CD9A5 for ; Tue, 15 Sep 2015 22:34:47 +0000 (UTC) (envelope-from jim@netgate.com) Received: from mail-ob0-x233.google.com (mail-ob0-x233.google.com [IPv6:2607:f8b0:4003:c01::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4FF111F14 for ; Tue, 15 Sep 2015 22:34:47 +0000 (UTC) (envelope-from jim@netgate.com) Received: by obbzf10 with SMTP id zf10so92179342obb.2 for ; Tue, 15 Sep 2015 15:34:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netgate.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to; bh=8Pw11p5+QtfpIjZDWwqcsOdqtUvUhJ1X3jXp4g/aR9Y=; b=nL6oRZQY89VzRjI1a6Im+KreK6Z6EvZlDTgAMdbUZTLA0slGFZuSc8QAUWXL5AnNhC KRWYIiOverCDq2s49B0TJyPvrZblyuqcUsOVXYutapi9WwFcrX7pt1nBCXMmZVhOWXfN oAFlEYJbAAVJnB+8alLby2YrwtoThVsMSWzvA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:message-id:references:to; bh=8Pw11p5+QtfpIjZDWwqcsOdqtUvUhJ1X3jXp4g/aR9Y=; b=G7kygo0ukhcX3iPRcrcaOq3udAfrUnEpxrTqkjeKlyR5rvZPI9PtmPI3Ou2Q/2RxtL XplbdE3ETXCb4Qw/yQL9iSBMZUKODw4FvuAogpxkUefyMSRbxCTDzbgcxU/8jPBxzrGb v6WlKIkj2PndomRb/LmpWzHKo/ACN2fuTtOYWQ07fuVpgL9GPcvPV2Gc1Ia3GWSr1aVi 5eo3BOECKnst8bTdXxFkhw+kf+rGqQoZ6r3CY+IILom7p2mOERQFgysCEK1yC9eQF62d lnhI2//lkFcbApgkBaQXkTH8Fjhh9GIOOty3EvDQHtdUdn+hfANtUXNDoj4z4aPJcTec uncg== X-Gm-Message-State: ALoCoQlw6FyA1QoviIQS2FNwzn+glqyZZxeWTZc3SKBLpLG+/j49eZf1/QdTIEv8WNtmHt8jRUSW X-Received: by 10.60.142.170 with SMTP id rx10mr20567897oeb.28.1442356486530; Tue, 15 Sep 2015 15:34:46 -0700 (PDT) Received: from ?IPv6:2610:160:11:33:343f:2e55:fcb2:6efb? ([2610:160:11:33:343f:2e55:fcb2:6efb]) by smtp.gmail.com with ESMTPSA id dc9sm9705080obb.17.2015.09.15.15.34.45 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 15 Sep 2015 15:34:46 -0700 (PDT) Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Jim Thompson In-Reply-To: Date: Tue, 15 Sep 2015 17:34:44 -0500 Cc: Dieter BSD , Hackers freeBSD , freebsd-hardware@freebsd.org Message-Id: <8435FBF3-2F8E-4A25-ABEA-B7038AFFE372@netgate.com> References: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com> To: Igor Mozolevsky X-Mailer: Apple Mail (2.2104) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 22:34:47 -0000 > On Sep 15, 2015, at 5:19 PM, Igor Mozolevsky = wrote: >=20 > On 15 September 2015 at 22:52, Jim Thompson > wrote: >=20 > >=20 > Errors are corrected "on-the-fly," corrected data is almost never = placed back in memory. 
If the same corrupt data is read again, the correction process is repeated. Replacing the data in memory would require processing overhead that could accumulate and significantly diminish system performance. If the error occurred because of random events and isn't a defect in the memory, the memory address will be cleaned of the error when the data is overwritten with other data.
>
> Just to correct a small oversight- most (if not all?) boards have an option to scrub ECC memory in the background so as to prevent single bit (recoverable) errors from turning into double bit (irrecoverable but detectable) errors ;-)

I think you’ll find that the default for ‘scrub’ is off on most (perhaps all) boards. There are reasons, and these relate directly to “significantly diminish system performance” (above), as well as the greatly increased RAM sizes in use today.

‘Scrub’ was popular about a decade ago, when DDR2 RAM was around $100/GB. DDR3-1600 is about $6/GB today.
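
A rough back-of-the-envelope sketch of why RAM size matters for background scrubbing: to touch every byte once per pass, the scrubber's read traffic grows linearly with installed memory. The 24-hour pass interval, the 8 GiB and 1024 GiB sizes, and the function names below are hypothetical values picked for illustration; actual scrub rates are set by the firmware and memory controller and are not given in this thread.

```python
# Sketch only: how much memory traffic a full background scrub pass implies.

GIB = 2**30


def scrub_bandwidth(ram_bytes, pass_seconds):
    """Bytes per second that must be read to cover all RAM once per pass."""
    return ram_bytes / pass_seconds


def pass_duration(ram_bytes, scrub_bytes_per_second):
    """Seconds needed for one full pass at a fixed scrub rate."""
    return ram_bytes / scrub_bytes_per_second


if __name__ == "__main__":
    one_day = 24 * 60 * 60  # hypothetical target: one full pass per day
    for size_gib in (8, 1024):
        rate = scrub_bandwidth(size_gib * GIB, one_day)
        print(f"{size_gib} GiB: {rate / 2**20:.2f} MiB/s for one pass per day")
```

Holding the scrub rate fixed instead stretches the pass time by the same factor, which leaves a single-bit error sitting in memory longer before it is rewritten.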
Jim From owner-freebsd-hardware@freebsd.org Tue Sep 15 22:40:02 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 92ABE9CDC8A for ; Tue, 15 Sep 2015 22:40:02 +0000 (UTC) (envelope-from jim@netgate.com) Received: from mail-ob0-x230.google.com (mail-ob0-x230.google.com [IPv6:2607:f8b0:4003:c01::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 524AC115C for ; Tue, 15 Sep 2015 22:40:02 +0000 (UTC) (envelope-from jim@netgate.com) Received: by obqa2 with SMTP id a2so148304019obq.3 for ; Tue, 15 Sep 2015 15:40:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netgate.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=cR3eLpcl4/rweOVkyUpD4L9F1JmUbl/wKdwcCX7fkQA=; b=fzhd5eZTZPsFa9BSzA2nLPpFdVkQvYL4NMoNO9z42sAIWHxVJGeQkifRvzyGTZoay0 MeFmoczAA41URwsh5Uqr49cFAV5bjBqihZx7t9Fyq9+FJo9z7cACADQLdDlyNweqEDXc BI9aBen6ae5SI7kbL8JXOR3R1hRz2LAp7vBLU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to; bh=cR3eLpcl4/rweOVkyUpD4L9F1JmUbl/wKdwcCX7fkQA=; b=QVXytYnhH40ZSXCNMh4hMmp837b4xI1gObc0dfyf82el6e2FzHqz8cXvrje+PmArrx epItpTuyE/KsH1K5mkwhGtjprdicuAenlcSMB37NDmYpv7juyDCLdlb9NDAAPRw0fUx2 5qCGls6ErD0tYt7+3MuigsLRJCDKW2djvNU87qrwvSQAlwCfTB6cGbjozh2b6FfmAWtW bw/Lc6OuMuYaBOHSUleADss4ZnlO+FnAW8TfyOBC6oFyRRaKudlG5BkKylwNoVj71AhR Doua18aYnky/LPpkEDR1n178bPr10ia2KwSnIlx3DBhMgy8IXmcOA+uFFbuYgFvfnxmz fV4g== X-Gm-Message-State: ALoCoQluFIaPdQfG3ldBgqQa8ssYJitSMgo18/sIazs14ypgdhh/bsVCnWjVaMG2mDQt/zLRHAiM X-Received: by 
10.182.33.39 with SMTP id o7mr20701087obi.42.1442356801732; Tue, 15 Sep 2015 15:40:01 -0700 (PDT) Received: from jims-mbp.pfmechanics.com ([208.123.73.28]) by smtp.gmail.com with ESMTPSA id ec5sm6579516obb.22.2015.09.15.15.40.00 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 15 Sep 2015 15:40:00 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Jim Thompson In-Reply-To: <201509152210.t8FMAPnv022327@gw.catspoiler.org> Date: Tue, 15 Sep 2015 17:39:59 -0500 Cc: dieterbsd@gmail.com, freebsd-hackers@freebsd.org, freebsd-hardware@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <9E71A23E-2563-43FC-89F2-8ECB098EAD63@netgate.com> References: <201509152210.t8FMAPnv022327@gw.catspoiler.org> To: Don Lewis X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 22:40:02 -0000

> On Sep 15, 2015, at 5:10 PM, Don Lewis wrote:
>
> On 15 Sep, Dieter BSD wrote:
>> Many of AMD's CPU/APU parts support ECC memory. Not just the top of the
>> line parts, but also many of the less expensive, less power hungry parts.
>> However, many (most?) of the boards for these chips do not support ECC,
>> or at least do not admit to it. They specify "non-ECC memory".
>>
>> Obviously there have to be connections between the memory controller and
>> the memory for the extra bits. Aside from a little extra time for the
>> board designer to add a few traces to the wire list, this would not
>> raise the cost of the board. Despite this I have read that some boards
>> lack the necessary traces.
>
> I don't think the current APU parts support ECC. My guess is that the
> current APU sockets don't have the connections to support it.
The G-Series (such as the T40E used on the APU) doesn’t support ECC.

“Kabini” (“G-Series 2.0” aka GX-210 / GX-415/420) supports a single channel of ECC ram.

Honestly, at the densities used by some of these boards, ECC doesn’t make much sense.
(Obviously, if you’re running a storage appliance, this position is reversed.)

From owner-freebsd-hardware@freebsd.org Tue Sep 15 22:52:41 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D166E9C2580; Tue, 15 Sep 2015 22:52:41 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com [IPv6:2a00:1450:400c:c05::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 697A71C14; Tue, 15 Sep 2015 22:52:41 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: by wiclk2 with SMTP id lk2so47084109wic.1; Tue, 15 Sep 2015 15:52:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=J/66igBcl8DIAG5+r6Ta9PFaAASr3njGy3pNBbrHGAI=; b=K79NGAiQZjplY1ijOeVGxTZHtaZ8dIjkyi4NT7F8zXzrgXbivdO9Hn46fJn3L7CrMz J6anaeql7y6b/qz1OsACDvHce/7OP2OQsJuxYfXGmRFrtKlClbP71cZNqTlL3iVr6ODz zMy8Jg7xx4m4rFHkco3TnoSU7veKjJxmOiIy3CRifJzdiUjKx8PDirQKN0X2jx4YF6ZN CKIwCRC4z7OalveNd6Sk+nahyEpVOPHDt+yPz6zv09KVKEu+Q4sARScv0u6j8K8tggzw Gq+3eSynBHwxffxKXTfBIeWwfoEGFKjE0dLp2dEOWqkPNZTd4HfQHtKwSZWftk01Jq+v M9Cg== X-Received: by 10.181.13.166 with SMTP id ez6mr12496273wid.24.1442357559559; Tue, 15 Sep 2015 15:52:39 -0700 (PDT) MIME-Version: 1.0 Sender: mozolevsky@gmail.com Received: by 10.28.55.18 with HTTP; Tue, 15 Sep 2015
15:52:00 -0700 (PDT) In-Reply-To: <8435FBF3-2F8E-4A25-ABEA-B7038AFFE372@netgate.com> References: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com> <8435FBF3-2F8E-4A25-ABEA-B7038AFFE372@netgate.com> From: Igor Mozolevsky Date: Tue, 15 Sep 2015 23:52:00 +0100 X-Google-Sender-Auth: saJB1cOsq7agNNwBVJMRpV6QVBA Message-ID: Subject: Re: ECC support To: Jim Thompson Cc: Dieter BSD , Hackers freeBSD , freebsd-hardware@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 22:52:42 -0000

On 15 September 2015 at 23:34, Jim Thompson wrote:

> I think you’ll find that the default for ‘scrub’ is off on most (perhaps
> all) boards. There are reasons, and these relate directly to
> “significantly diminish system performance”, (above), as well as the
> greatly increased RAM sizes in use today.
>

Perhaps I missed something- what point is it that you're trying to make? I
was saying that scrubbing aims to remove errors at the source (cf. "on
demand") and prevent multi-bit errors that become detectable but
irrecoverable, or worse, undetectable. Get hit by a few of the latter two
at "interesting" points and you'd wish that scrubbing were on!

And seriously, ECC scrubbing is slow but ZFS (or even hardware RAID)
scrubbing is lightning fast??! C'mon are we going for data integrity or
speed here?!

> ‘Scrub’ was popular about a decade ago, when DDR2 RAM was around $100/GB.
> DDR3-1600 is about $6/GB today.
>

Yup- with a much higher density of smaller memory bits! ;-)

-- 
Igor M.
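
The distinction drawn above between recoverable single-bit errors, detectable-but-irrecoverable double-bit errors, and the role of scrubbing can be sketched with a toy SECDED code. This is only an illustration under stated assumptions: it uses an 8-bit extended Hamming code over 4 data bits, whereas real ECC DIMMs typically protect 64 data bits with 8 check bits, and the names encode, decode and flip are made up for this sketch rather than taken from any memory-controller documentation.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    code = [0] * 8
    # Data bits live at Hamming positions 3, 5, 6 and 7.
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Parity bits at positions 1, 2 and 4 cover the positions whose
    # index has the corresponding binary digit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 holds overall parity so the total number of 1s is even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble, repaired_codeword)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    odd_parity = sum(code) % 2 == 1
    fixed = list(code)
    if odd_parity:
        # An odd number of flips: assume exactly one, located at index
        # `syndrome` (index 0 is the overall parity bit itself).
        fixed[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even parity but a non-zero syndrome: two bits flipped.
        status = "uncorrectable"
    else:
        status = "ok"
    nibble = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, nibble, fixed


def flip(code, pos):
    """Return a copy of the codeword with one bit inverted."""
    out = list(code)
    out[pos] ^= 1
    return out


stored = encode(0b1011)

# No scrubbing: a second flip lands before the first one is repaired.
no_scrub = decode(flip(flip(stored, 5), 2))
print("without scrub:", no_scrub[0])

# Scrubbing: the first flip is corrected and written back, so the
# second flip is again an isolated single-bit error.
_, _, repaired = decode(flip(stored, 5))
with_scrub = decode(flip(repaired, 2))
print("with scrub:", with_scrub[0], bin(with_scrub[1]))
```

In this toy code three simultaneous flips have odd parity and would be treated as a single-bit error, which is one way an error can slip through without being flagged.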
From owner-freebsd-hardware@freebsd.org Tue Sep 15 23:01:51 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 073999C2C90; Tue, 15 Sep 2015 23:01:51 +0000 (UTC) (envelope-from alex.burlyga.ietf@gmail.com) Received: from mail-vk0-x229.google.com (mail-vk0-x229.google.com [IPv6:2607:f8b0:400c:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B54D31170; Tue, 15 Sep 2015 23:01:50 +0000 (UTC) (envelope-from alex.burlyga.ietf@gmail.com) Received: by vkgd64 with SMTP id d64so90339911vkg.0; Tue, 15 Sep 2015 16:01:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=q0yhzQavVtq1XHaKweWEIqwNGgW/5hbexLoMphim8pc=; b=DInN7ZnYWcdVC7UewLIicGBN5oj3ny8y/IeL0/9v5TGLTX9Ueoezayvbf6EcdjRRLj 5bXze5tMxwSXi3brEmnAygVSWbBNVja2Xpla3iNJHJmYjbtEfBurDBLjWu60CiVOZKFU LVz85jCfWG3lWJcbTQl8CeSzmaJus6x3eBFSJD8oRXIw0gzq1Kyy+xlQUHwMGfAdcT3k NN4fBb6Mw9z8/h3DQHioygvZr4oriPQU/I7dI+1IrjtpkQmLI6kxL62XqeGiO51HvX1R 20SKccZRTCkDsX6OajIDlnBH/lJnr6NN+TIiuPE3Q5oEa/1X/r3QqvZw7IhyhbnpchfF TR0g== MIME-Version: 1.0 X-Received: by 10.31.33.134 with SMTP id h128mr23799766vkh.138.1442358109882; Tue, 15 Sep 2015 16:01:49 -0700 (PDT) Received: by 10.103.81.193 with HTTP; Tue, 15 Sep 2015 16:01:49 -0700 (PDT) In-Reply-To: References: <41EFCF21-D3B0-4EC4-8EAB-417CA33821FC@netgate.com> <8435FBF3-2F8E-4A25-ABEA-B7038AFFE372@netgate.com> Date: Tue, 15 Sep 2015 16:01:49 -0700 Message-ID: Subject: Re: ECC support From: "alex.burlyga.ietf alex.burlyga.ietf" To: Igor Mozolevsky Cc: Jim Thompson , Hackers freeBSD , Dieter BSD , freebsd-hardware@freebsd.org 
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2015 23:01:51 -0000

On Tue, Sep 15, 2015 at 3:52 PM, Igor Mozolevsky wrote:
> On 15 September 2015 at 23:34, Jim Thompson wrote:
>
>> I think you’ll find that the default for ‘scrub’ is off on most (perhaps
>> all) boards. There are reasons, and these relate directly to
>> “significantly diminish system performance”, (above), as well as the
>> greatly increased RAM sizes in use today.
>>
>
> Perhaps I missed something- what point is it that you're trying to make? I
> was saying that scrubbing aims to remove errors at the source (cf. "on
> demand") and prevent multi-bit errors that become detectable but
> irrecoverable, or worse, undetectable. Get hit by a few of the latter two
> at "interesting" points and you'd wish that scrubbing were on!
>
> And seriously, ECC scrubbing is slow but ZFS (or even hardware RAID)
> scrubbing is lightning fast??! C'mon are we going for data integrity or
> speed here?!

If I remember correctly, enabling Patrol Scrub guarantees that each address
gets hit once per 24 hours. So on a 128GB system you are generating maybe
1-2MiB/s of reads. I'd say it's a good trade-off if you bothered to put
ECC memory in.

>
>> ‘Scrub’ was popular about a decade ago, when DDR2 RAM was around $100/GB.
>> DDR3-1600 is about $6/GB today.
>>
>
> Yup- with a much higher density of smaller memory bits! ;-)
>
> --
> Igor M.
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hardware@freebsd.org Wed Sep 16 01:17:54 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C94849C24B2; Wed, 16 Sep 2015 01:17:54 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id ACE87104D; Wed, 16 Sep 2015 01:17:54 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id t8G1Hhbq023239; Tue, 15 Sep 2015 18:17:48 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201509160117.t8G1Hhbq023239@gw.catspoiler.org> Date: Tue, 15 Sep 2015 18:17:43 -0700 (PDT) From: Don Lewis Subject: Re: ECC support To: jim@netgate.com cc: dieterbsd@gmail.com, freebsd-hackers@freebsd.org, freebsd-hardware@freebsd.org In-Reply-To: <9E71A23E-2563-43FC-89F2-8ECB098EAD63@netgate.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=iso-8859-13 Content-Transfer-Encoding: 8BIT X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 01:17:54 -0000

On 15 Sep, Jim Thompson wrote:
>
>> On Sep 15, 2015, at 5:10 PM, Don Lewis wrote:
>>
>> On 15 Sep, Dieter BSD wrote:
>>> Many of AMD's CPU/APU parts support ECC memory. Not just the top of the
>>> line parts, but also many of the less expensive, less power hungry parts.
>>> However, many (most?) of the boards for these chips do not support ECC,
>>> or at least do not admit to it. They specify "non-ECC memory".
>>>
>>> Obviously there have to be connections between the memory controller and
>>> the memory for the extra bits. Aside from a little extra time for the
>>> board designer to add a few traces to the wire list, this would not
>>> raise the cost of the board. Despite this I have read that some boards
>>> lack the necessary traces.
>>
>> I don't think the current APU parts support ECC. My guess is that the
>> current APU sockets don't have the connections to support it.
>
> The G-Series (such as the T40E used on the APU) doesn’t support ECC.
>
> “Kabini” (“G-Series 2.0” aka GX-210 / GX-415/420) supports a single channel of ECC ram.

Interesting ... it's been a while since I looked. I think the primary
sockets at the time were FM1, FM2, and FM2+, and the mobile sockets, and
they didn't support ECC.

AM1 motherboard ECC support seems to be pretty lacking, though.
From owner-freebsd-hardware@freebsd.org Wed Sep 16 01:23:50 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8A74A9C2934; Wed, 16 Sep 2015 01:23:50 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6D598157B; Wed, 16 Sep 2015 01:23:50 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id t8G1NdvM023263; Tue, 15 Sep 2015 18:23:43 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201509160123.t8G1NdvM023263@gw.catspoiler.org> Date: Tue, 15 Sep 2015 18:23:39 -0700 (PDT) From: Don Lewis Subject: Re: ECC support To: jim@netgate.com cc: igor@hybrid-lab.co.uk, freebsd-hackers@freebsd.org, dieterbsd@gmail.com, freebsd-hardware@freebsd.org In-Reply-To: <8435FBF3-2F8E-4A25-ABEA-B7038AFFE372@netgate.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=iso-2022-jp X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 01:23:50 -0000

On 15 Sep, Jim Thompson wrote:
>
>> On Sep 15, 2015, at 5:19 PM, Igor Mozolevsky wrote:
>>
>> On 15 September 2015 at 22:52, Jim Thompson wrote:
>>
>> Errors are corrected "on-the-fly," corrected data is almost never
>> placed back in memory. If the same corrupt data is read again, the
>> correction process is repeated. Replacing the data in memory would
>> require processing overhead that could accumulate and significantly
>> diminish system performance. If the error occurred because of random
>> events and isn't a defect in the memory, the memory address will be
>> cleaned of the error when the data is overwritten with other data.
>>
>> Just to correct a small oversight- most (if not all?) boards have an
>> option to scrub ECC memory in the background so as to prevent single
>> bit (recoverable) errors from turning into double bit (irrecoverable
>> but detectable) errors ;-)
>
> I think you’ll find that the default for ‘scrub’ is off on most
> (perhaps all) boards. There are reasons, and these relate directly to
> “significantly diminish system performance”, (above), as well as the
> greatly increased RAM sizes in use today.

The Gigabyte AM3+ motherboards that I'm using have all sorts of knobs for
controlling the scrub rate, with different knobs for cache scrubbing vs.
main memory scrubbing. My somewhat more recent Asus AM3+ board with a
different BIOS brand basically just has an ECC on/off knob.
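
As a rough check on the scrub-rate figure mentioned earlier in the thread (one full pass per 24 hours on a 128GB system, estimated at 1-2MiB/s of reads), here is a minimal Python sketch of the arithmetic. It assumes the scrubber reads every byte exactly once per period at a constant rate and treats "128GB" as 128 GiB; the function name is made up for illustration and does not correspond to any BIOS knob. The DDR3-1600 peak of 12.8 GB/s per channel is the nominal theoretical figure, used only for scale.

```python
def scrub_read_rate_mib_s(ram_gib, period_hours=24.0):
    """Average read rate (MiB/s) needed to touch all of RAM once per period."""
    total_mib = ram_gib * 1024
    return total_mib / (period_hours * 3600)


rate = scrub_read_rate_mib_s(128)
print(f"128 GiB scrubbed once per 24 h: {rate:.2f} MiB/s")

# Compare against the nominal peak of one DDR3-1600 channel (12.8 GB/s).
ddr3_1600_peak_bytes_s = 12.8e9
fraction = rate * 1024 * 1024 / ddr3_1600_peak_bytes_s
print(f"share of one DDR3-1600 channel: {fraction:.4%}")
```

Under these assumptions the result is about 1.52 MiB/s, inside the 1-2MiB/s estimate, and a very small share of the bandwidth of a single memory channel. Boards that expose a configurable scrub rate may of course be set to scrub much faster than once a day.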
From owner-freebsd-hardware@freebsd.org Wed Sep 16 03:59:15 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A5E8A9CD7F4; Wed, 16 Sep 2015 03:59:15 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3973E187B; Wed, 16 Sep 2015 03:59:15 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id t8G3x4iA061706 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 16 Sep 2015 06:59:05 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua t8G3x4iA061706 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id t8G3x4Rj061705; Wed, 16 Sep 2015 06:59:04 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 16 Sep 2015 06:59:04 +0300 From: Konstantin Belousov To: Andriy Gapon Cc: Dieter BSD , freebsd-hardware@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: ECC support Message-ID: <20150916035904.GE67105@kib.kiev.ua> References: <55F88A18.6090504@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <55F88A18.6090504@FreeBSD.org> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list 
List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 03:59:15 -0000 On Wed, Sep 16, 2015 at 12:14:00AM +0300, Andriy Gapon wrote: > On 15/09/2015 23:53, Dieter BSD wrote: > > Assuming that a board does have the necessary connections but > > the firmware does not have ECC support, is there some reason that > > ECC support could not be added to the OS instead of the firmware? > > Yes, there is. The memory controller is programmed by the code that runs from > ROM and uses no RAM (or the CPU cache is used as the RAM). Once the real RAM > gets used it's too late to reprogram the DRAM controller. This is true at least > for most or all of the modern day x86 hardware. For modern Intel hardware, the IMC config is locked before BIOS passes the control to the user code, i.e. OS loader. It does not help much that the documentation for IMC is not provided even under NDA. From owner-freebsd-hardware@freebsd.org Wed Sep 16 07:52:07 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C90A19C270F; Wed, 16 Sep 2015 07:52:07 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 748D71E14; Wed, 16 Sep 2015 07:52:07 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from [194.32.164.24] (80-46-130-69.static.dsl.as9105.com [80.46.130.69]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t8G7pw6u085964; Wed, 16 Sep 2015 08:51:58 +0100 (BST) (envelope-from rb@gid.co.uk) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Bob Bishop In-Reply-To: 
<20150916035904.GE67105@kib.kiev.ua> Date: Wed, 16 Sep 2015 08:51:53 +0100 Cc: Andriy Gapon , freebsd-hackers@freebsd.org, Dieter BSD , Konstantin Belousov Content-Transfer-Encoding: quoted-printable Message-Id: <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk> References: <55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua> To: freebsd-hardware@freebsd.org X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 07:52:07 -0000

Hi,

Arriving late to this thread, a few observations:

- Obviously the more RAM you have, the more errors you are going to see. In other words, ECC makes increasing sense as RAM sizes get larger. All server-class hardware should have it.

- DRAM has to be refreshed. In sensible designs, ECC scrub is integrated with refresh to minimise overhead. It doesn’t have to be very frequent, maybe every 24 hours.

- On server-class hardware, the platform management (BMC or whatever) should be picking up, logging, and possibly alarming on ECC errors regardless of the OS.

- You might think that as memory density increases (ie bit cell size shrinks), error rates would increase. Apparently this wasn’t so up to 2009 at least, see:

http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

which reports on a study of these issues across Google’s estate at the time. I don’t know of any more recent similar work.

-- Bob Bishop rb@gid.co.uk From owner-freebsd-hardware@freebsd.org Wed Sep 16 10:49:22 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 488929C2B92; Wed, 16 Sep 2015 10:49:22 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: from mail-wi0-x22c.google.com (mail-wi0-x22c.google.com [IPv6:2a00:1450:400c:c05::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D49AA182B; Wed, 16 Sep 2015 10:49:21 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: by wiclk2 with SMTP id lk2so64824695wic.1; Wed, 16 Sep 2015 03:49:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=cly4E9Jaznh9xi6dPpqv0ErKCxiVRbD7tZYyReSQCf4=; b=zdTrx7crBj5UbiSIhN+6QuI5tsiUVTIeFck0xBNoX6xcsqVxZ+Njyi9v7I0TDGUjBf BN9GlLtu+lyO3jh32E/O5Wf3SGmYTCW2NiqqJHFwtCMLATfmhYp+lNwNi8qcpFNcdoOL UJ7pTqq9D6fHZqSXxR3Vv+AJHbHvgxzMUdIEDS8Xf+MeF1L2Em9MAAtKquo/pCZdOb8G ZPJI7t1XoX4eMJ+vsROsWpBdZCAGchM4CW1Lb5N8983ZxxRtC+w3YjWXTKhrnSdcguo/ DZg2kfSVl4hXFcp8EQyQeqNQDJRgHy6lUDcjQtUSMDtDUZ4eXhp7lxkOdrr0OYKSVJhe SY/g== X-Received: by 10.194.82.167 with SMTP id j7mr50695046wjy.123.1442400560244; Wed, 16 Sep 2015 03:49:20 -0700 (PDT) MIME-Version: 1.0 Sender: mozolevsky@gmail.com Received: by 10.28.55.18 with HTTP; Wed, 16 Sep 2015 03:48:37 -0700 (PDT) In-Reply-To: <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk> References: <55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua> <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk> From: Igor Mozolevsky Date: Wed, 16 Sep 2015 11:48:37 +0100 X-Google-Sender-Auth: OKGI12eR9wodPG2yv1zYc03yIkM Message-ID: Subject: Re: ECC 
support To: Bob Bishop Cc: freebsd-hardware@freebsd.org, Konstantin Belousov , Hackers freeBSD , Dieter BSD , Andriy Gapon Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 10:49:22 -0000

On 16 September 2015 at 08:51, Bob Bishop wrote:

> - You might think that as memory density increases (ie bit cell size
> shrinks), error rates would increase. Apparently this wasn’t so up to 2009
> at least, see:
>
> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

subsection 5.1:

"… Figure 6 indicates a trend towards worse error behavior
for increased capacities, although this trend is not consistent.
While in some cases the doubling of capacity has a clear
negative effect (factors larger than 1 in the graph), in others it
has hardly any effect (factor close to 1 in the graph). For example,
for Platform A - Mfg1 and Platform F - Mfg1 doubling the capacity
increases uncorrectable errors, but not correctable errors. Conversely,
for Platform D - Mfg6 doubling the capacity affects correctable errors,
but not uncorrectable error."

There are also other environmental factors which would be more apparent in
"lone-server" configuration vs well maintained and insulated data centres
with very good power conditioning ;-)

-- 
Igor M.
From owner-freebsd-hardware@freebsd.org Wed Sep 16 11:35:03 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ADDD49CEC71; Wed, 16 Sep 2015 11:35:03 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3D44218E1; Wed, 16 Sep 2015 11:35:02 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from [194.32.164.24] (80-46-130-69.static.dsl.as9105.com [80.46.130.69]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t8GBYxFj012557; Wed, 16 Sep 2015 12:34:59 +0100 (BST) (envelope-from rb@gid.co.uk) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Bob Bishop In-Reply-To: Date: Wed, 16 Sep 2015 12:34:54 +0100 Cc: Konstantin Belousov , Hackers freeBSD , Dieter BSD , Andriy Gapon , freebsd-hardware@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <3678FC1E-DDC5-4FB2-B6E9-6FC90D0C988E@gid.co.uk> References: <55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua> <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk> To: Igor Mozolevsky X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 11:35:03 -0000

Hi,

> On 16 Sep 2015, at 11:48, Igor Mozolevsky wrote:
>
> On 16 September 2015 at 08:51, Bob Bishop wrote:
>
>> - You might think that as memory density increases (ie bit cell size
>> shrinks), error rates would increase. Apparently this wasn’t so up to 2009
>> at least, see:
>>
>> http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
>
> subsection 5.1:
>
> "… Figure 6 indicates a trend towards worse error behavior
> for increased capacities, although this trend is not consistent.

[etc]

That’s talking about DIMM capacity, not the capacity (density) of individual chips on which they say (at the end of the same subsection):

"The best we can conclude therefore is that any chip size effect is unlikely to dominate error rates given that the trends are not consistent across various other confounders such as age and manufacturer."

I’ll admit to talking that point up a bit but it is counterintuitive. Memory designers have always been scared of cosmic rays etc but the suspected effects simply have not been noticeable. Most likely as they shrink features ever smaller, other factors like material purity dominate.

> There are also other environmental factors which would be more apparent in
> "lone-server" configuration vs well maintained and insulated data centres
> with very good power conditioning ;-)

Indeed, and that’s a whole other PITA. We went to colo and never looked back, but low-power options for small servers are getting better.

> -- 
> Igor M.
> _______________________________________________ > freebsd-hardware@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hardware > To unsubscribe, send any mail to = "freebsd-hardware-unsubscribe@freebsd.org" -- Bob Bishop rb@gid.co.uk From owner-freebsd-hardware@freebsd.org Wed Sep 16 11:53:40 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A2FA69CD83B; Wed, 16 Sep 2015 11:53:40 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: from mail-wi0-x229.google.com (mail-wi0-x229.google.com [IPv6:2a00:1450:400c:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2D0FE1347; Wed, 16 Sep 2015 11:53:40 +0000 (UTC) (envelope-from mozolevsky@gmail.com) Received: by wiclk2 with SMTP id lk2so67144756wic.1; Wed, 16 Sep 2015 04:53:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=RAhQvhmCVsEIXrRZsE2XUBJE5H1dWnWb7K2X93C+OeA=; b=dnC+PIUXlFQHxYPrnzDEslZyeN4XElUxUqASnA0RRMHfmB8hfB5pmv7gIHOaMc7PyA 2yTORDIf0pEZVzo6W2XcrYDDFZXzum18YkeW8n0sxkLSTLWvhqY0nNQgNy3SkqNPE0cZ nG0UHrUXAidZBHfW6McfXTcytoqpDhSlJO5uvfJxyBrVtRq6xF207S3DZz0gdHwgo7z7 JV2bv0xsq5aKaYlmDyhTMpuiPfiyYGulgZeBGFhHXBiMaOzjEqR6rL6MTMrH4/GZjuV+ 2fT4FccZ0S1QCOmFGj7m5OuGQcPzcTJ5YCyDM8mgBk4mJZCWX9MxlfIxIEpNHJllksly I63A== X-Received: by 10.181.13.166 with SMTP id ez6mr18779967wid.24.1442404417758; Wed, 16 Sep 2015 04:53:37 -0700 (PDT) MIME-Version: 1.0 Sender: mozolevsky@gmail.com Received: by 10.28.55.18 with HTTP; Wed, 16 Sep 2015 04:52:58 -0700 (PDT) In-Reply-To: <3678FC1E-DDC5-4FB2-B6E9-6FC90D0C988E@gid.co.uk> References: 
<55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua> <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk> <3678FC1E-DDC5-4FB2-B6E9-6FC90D0C988E@gid.co.uk> From: Igor Mozolevsky Date: Wed, 16 Sep 2015 12:52:58 +0100 X-Google-Sender-Auth: _DFQg-Oibe5b_dGmy7JWDcQFFdI Message-ID: Subject: Re: ECC support To: Bob Bishop Cc: Konstantin Belousov , Hackers freeBSD , Dieter BSD , Andriy Gapon , freebsd-hardware@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 11:53:40 -0000

On 16 September 2015 at 12:34, Bob Bishop wrote:

> "The best we can conclude therefore is that any chip size effect is
> unlikely to dominate error rates given that the trends are not consistent
> across various other confounders such as age and manufacturer."
>
> I’ll admit to talking that point up a bit but it is counterintuitive.
> Memory designers have always been scared of cosmic rays etc but the
> suspected effects simply have not been noticeable. Most likely as they
> shrink features ever smaller, other factors like material purity dominate.
>

I saw that after I posted, and had a long ponder as to why it would be so.
The only thing I could think of is that the fab process was(/is?) large
enough to not worry about "nonsense" like cosmic rays &c (but then I've not
had much exposure to semiconductor electronics theory since the late 90s).
Perhaps we're at a point where the fab process can't really shrink much
more with DRAM due to the underlying tech (effectively many tiny RC
circuits), which is the reason the manufacturers just stack ranks to get
more capacity per DIMM instead of packing more in a single chip?..

-- 
Igor M.
From owner-freebsd-hardware@freebsd.org Wed Sep 16 12:04:28 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 414069CE340; Wed, 16 Sep 2015 12:04:28 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DCF2E1B6D; Wed, 16 Sep 2015 12:04:27 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from [194.32.164.24] (80-46-130-69.static.dsl.as9105.com [80.46.130.69]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t8GC4PLp018426; Wed, 16 Sep 2015 13:04:25 +0100 (BST) (envelope-from rb@gid.co.uk) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Bob Bishop In-Reply-To: Date: Wed, 16 Sep 2015 13:04:20 +0100 Cc: Konstantin Belousov , Hackers freeBSD , Dieter BSD , Andriy Gapon , freebsd-hardware@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <93106DFF-9741-4515-B6E0-AC43C0AF2179@gid.co.uk> References: <55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua> <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk> <3678FC1E-DDC5-4FB2-B6E9-6FC90D0C988E@gid.co.uk> To: Igor Mozolevsky X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 12:04:28 -0000

> On 16 Sep 2015, at 12:52, Igor Mozolevsky wrote:
>
> On 16 September 2015 at 12:34, Bob Bishop wrote:
>
>> "The best we can conclude therefore is that any chip size effect is
>> unlikely to dominate error rates given that the trends are not consistent
>> across various other confounders such as age and manufacturer."
>>
>> I’ll admit to talking that point up a bit but it is counterintuitive.
>> Memory designers have always been scared of cosmic rays etc but the
>> suspected effects simply have not been noticeable. Most likely as they
>> shrink features ever smaller, other factors like material purity dominate.
>>
>
> I saw that after I posted, and had a long ponder as to why it would be so.
> The only thing I could think of is that the fab process was(/is?) large
> enough to not worry about "nonsense" like cosmic rays &c (but then I've not
> had much exposure to semi-conductor electronics theory since late 90s).
> Perhaps we're at a point where the fab process can't really shrink much
> more with DRAM due to the underlying tech (effectively many tiny RC
> circuits), which is the reason the manufacturers just stack ranks to get
> more capacity per DIMM instead of packing more in a single chip?..

Dunno. I’ll ask my tame semiconductor expert when I see him tomorrow...

> -- 
> Igor M.
> _______________________________________________ > freebsd-hardware@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hardware > To unsubscribe, send any mail to = "freebsd-hardware-unsubscribe@freebsd.org" -- Bob Bishop rb@gid.co.uk From owner-freebsd-hardware@freebsd.org Wed Sep 16 17:56:53 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3A46E9CD1D6; Wed, 16 Sep 2015 17:56:53 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: from mail-ig0-x233.google.com (mail-ig0-x233.google.com [IPv6:2607:f8b0:4001:c05::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0C0571CA0; Wed, 16 Sep 2015 17:56:53 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: by igcpb10 with SMTP id pb10so40089211igc.1; Wed, 16 Sep 2015 10:56:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=K2EqzLBHn/Q1rYHgLspRYQvHN2yT2gHt5G5Im9UfmEk=; b=ZWJ6UZg86GL0Ahx3XcYRnYqRUMYjUeThY7PucvHitBZ7pBImYYLfG4+Ou70mSlKDtc mbQ+xzcAohPX4nIBkJs9JAtC/o8PtSO1QQ/oQGraPK2BYKBWKnYKNsCkC10B7lpEWFTX a9YY3xlU7rfccqI93iddYBD3OPRt3YI25Z11XrKE+lg0CIfJuExo0Hl7akbRtjnQN0il pXUAH2H+IRYdEqC7wNjR6qlBi2Jcvye2SCqPB4+pBgppy6NQV29OsfEwrHfXGCXKOyUh HLBYzdv6PGuW2d6M8d3+051U11gOX3WLVw387R612gQgh4jzhb2s4zqVl8tvls64oxKo 9S7w== MIME-Version: 1.0 X-Received: by 10.50.78.138 with SMTP id b10mr13351087igx.67.1442426212442; Wed, 16 Sep 2015 10:56:52 -0700 (PDT) Received: by 10.64.2.132 with HTTP; Wed, 16 Sep 2015 10:56:52 -0700 (PDT) Date: Wed, 16 Sep 2015 10:56:52 -0700 Message-ID: Subject: Re: ECC support From: Dieter BSD To: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org 
Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Sep 2015 17:56:53 -0000

Andriy:
>> Assuming that a board does have the necessary connections but
>> the firmware does not have ECC support, is there some reason that
>> ECC support could not be added to the OS instead of the firmware?
>
> Yes, there is. The memory controller is programmed by the code that
> runs from ROM and uses no RAM (or the CPU cache is used as the RAM).
> Once the real RAM gets used it's too late to reprogram the DRAM controller.

Perhaps one of the several bootloader stages could get itself into
CPU cache, program the memory controller, then load and execute the
next stage or the OS?

Jim:
> Replacing the data in memory would require processing overhead
> that could accumulate and significantly diminish system performance.

If it only replaces data when there is a correctable error,
and the errors are occasional soft errors, the effect on
performance should be minimal. If there is a hard error,
you would want to replace the defective memory before you get
an additional error and it becomes uncorrectable.

> If the error occurred because of random events and isn't a defect in
> the memory, the memory address will be cleaned of the error when the
> data is overwritten with other data.

If and when new data gets written to that location. If that location
contains info that never changes, such as kernel text, the bad bit will
never get fixed.

> memory, without the extra complexity of the controller, is 12.5% more
> expensive. This isn't a huge impact at 8GB, (you'll need
> another 1GB of RAM), but at 1024GB you'll need another 128GB,
> and that much ram still costs enough that your wallet won't be happy.

It is 12.5% in both cases.
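The 12.5% figure can be checked with a few lines of arithmetic. This is only a sketch, and it assumes the common layout of 8 check bits stored alongside every 64 data bits (a 72-bit ECC word); the function name is made up for illustration.

```python
# Assumption: a 72-bit ECC word, i.e. 64 data bits plus 8 check bits.
DATA_BITS = 64
CHECK_BITS = 8


def ecc_overhead_gb(usable_gb):
    """Extra DRAM (in GB) needed to protect usable_gb of data."""
    return usable_gb * CHECK_BITS / DATA_BITS


for usable in (8, 1024):
    extra = ecc_overhead_gb(usable)
    # The ratio is the same at every capacity; only the absolute size grows.
    print(f"{usable} GB usable -> {extra:g} GB extra ({extra / usable:.1%})")
```

Under that assumption the extra memory is 1 GB for 8 GB and 128 GB for 1024 GB, which is the same proportion in both cases.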
How much does it cost to have undetected errors in your data? How much does it cost when an Interstate bridge collapses? How much does it cost when one of NASA's missions fails? How much does it cost when your pharmacy receives a prescription with an error in the dose? > the MRC setup on Intel and AMD is both complex and proprietary One wonders why the secrecy. AMD has been much more open than many (most?) chipmakers. They even forced the ATI people to document how to program their chips. I don't see a lot of companies popping up making competing chips. #include standard joke: "How do you make a small fortune in chipmaking? Start with a very large fortune." I can't see what secret would be revealed by saying "set bit 7 of register 4 to 1 to enable ECC". > Intel Red Book So the secret books are red this week, yawn. I remember the nightmare of the merced orange books and the brain damaged "features" the chips had. Not recommended. I'm interested in chips that work correctly, hence the interest in ECC and AMD. Looked for ARM boards with ECC but didn't find any. Is the Sparc stuff any more reliable than it used to be? Other arch choices? > The MRC setup code is a binary blob for otherwise open source boot > firmware such as Coreboot. So the libreboot people are forced to work on reverse engineering these blobs? :-( Don: > I don't think the current APU parts support ECC. According to wikipedia, socket FM2+ does not support ECC. :-( Kabini has support for ECC. And Berlin, (and I assume Toronto) but word is that Berlin and Toronto are basically dead. :-( I think Carrizo and Turion are supposed to support ECC? There really ought to be a list of which CPUs/APUs/sockets/boards do or do not support ECC. > My experience is that many ASUS motherboard support ECC RAM and > usually document that fact. Also many Gigabyte mother boards also > support ECC RAM, but don't document it. >From what I've been reading, both Asus and Gigabyte make good boards. 
I've seen reviews that complained about Gigabyte's firmware. http://www.xbitlabs.com/articles/mainboards/display/gigabyte-ga-990fxa-ud5_8.html I've also seen claims that the firmware bricked boards. Reviewers like Asus' firmware. I've seen complaints about Asus's support, and their website has significant problems. The firmware on my Tyan board is crap, and they refused to tell me how much power it needs. Which means I don't know how much other stuff I can run from the same P/S. It should have *way* more power than needed, but experience says "not enough", so I added a 2nd p/s for the disk farm and suddenly had fewer problems. The 2 p/s setup does allow powercycling the mainboard (because of the crappy firmware) without powercycling the disks. Given my experience with the Tyan board, and the apparent lack of FLOSS firmware for recent boards, I'm not real excited about the Gigabyte boards. Asus has a couple of AMD3+ boards that I could probably live with, if their website actually had things like lists of exactly which CPUs and memory are approved, and firmware updates, ... But there are also applications could use a lower wattage solution. Anyone have opinions on other mainboard companies? ECS? Asrock? MSI? Zotac? Others? Don: > +MCA: Bank 4, Status 0x944a400096080a13 > +MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 > +MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0 > +MCA: CPU 0 COR BUSLG Responder RD Memory > +MCA: Address 0x213e98b10 > +MCA: Bank 4, Status 0xd44a400096080a13 > +MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 > +MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0 > +MCA: CPU 0 COR OVER BUSLG Responder RD Memory > +MCA: Address 0x213e98b10 Chris: > MCA: Bank 1, Status 0x9400000000000151 > MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 > MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 > > MCA: Address 0x81cc0e9f0 > > Kind of freaky. I've never had this error on this board before. > On others tho. 
>
> Try a search for MCA instead.

Is there a decoder ring for those messages? I don't recall seeing
messages like that, although I wasn't looking for them, and they
don't leap out at you screaming ERROR! ERROR! Digital Unix had its
problems, but at least the error messages were fairly clear.
Something like "single bit memory error at address 0x12345..."
A simple edit to sys/x86/x86/mca.c
s/printf("UNCOR ");/printf("Uncorrectable ");/
s/printf("COR ");/printf("Correctable ");/
would make the messages at least slightly more meaningful to a viewer
who isn't intimately familiar with the mca. Which most people aren't.
I used to maintain code that dealt with a memory controller, and
used a hardware circuit to inject errors into a memory board.
But looking at those messages doesn't tell me anything beyond
"Something happened, maybe I should grep through the source
code for clues about those messages." Looking at the source
doesn't add much, you'd need documentation for the mca.
Which most people aren't going to have. And you'd need a lot
of time to figure it out.

# find /var/log | xargs bzgrep -i mca
found no error messages.

I seem to be buried under a mountain of boards that would be useful,
if only they supported ECC. (and had firmware that actually works...)
And I'm hardly the only one. So how do we fix this?

Lobby AMD (and other chipmakers) to include ECC support in *all*
memory controllers and sockets? It isn't like they have to redesign
the logic for every chip, they only need one design per memory width.

Lobby AMD to publish documentation on how to program the memory
controller?

Lobby the companies that make boards?
From owner-freebsd-hardware@freebsd.org Thu Sep 17 05:25:05 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A7F4F9CED58; Thu, 17 Sep 2015 05:25:05 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 72C2C1293; Thu, 17 Sep 2015 05:25:05 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id t8H5OuPj031505; Wed, 16 Sep 2015 22:25:00 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201509170525.t8H5OuPj031505@gw.catspoiler.org> Date: Wed, 16 Sep 2015 22:24:56 -0700 (PDT) From: Don Lewis Subject: Re: ECC support To: dieterbsd@gmail.com cc: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Sep 2015 05:25:05 -0000 On 16 Sep, Dieter BSD wrote: > Andriy: >>> Assuming that a board does have the necessary connections but >>> the firmware does not have ECC support, is there some reason that >>> ECC support could not be added to the OS instead of the firmware? >> >> Yes, there is. The memory controller is programmed by the code that >> runs from ROM and uses no RAM (or the CPU cache is used as the RAM). >> Once the real RAM gets used it's too late to reprogram the DRAM controller. 
>
> Perhaps one of the several bootloader stages could get itself into
> CPU cache, program the memory controller, then load and execute the
> next stage or the OS?
>
> Jim:
>> Replacing the data in memory would require processing overhead
>> that could accumulate and significantly diminish system performance.
>
> If it only replaces data when there is a correctable error,
> and the errors are occasional soft errors, the effect on
> performance should be minimal. If there is a hard error,
> you would want to replace the defective memory before you get
> an additional error and it becomes uncorrectable.
>
>> If the error occurred because of random events and isn't a defect in
>> the memory, the memory address will be cleaned of the error when the
>> data is overwritten with other data.
>
> If and when new data gets written to that location. If that location
> contains info that never changes, such as kernel text, the bad bit will
> never get fixed.
>
>> memory, without the extra complexity of the controller, is 12.5% more
>> expensive. This isn't a huge impact at 8GB, (you'll need
>> another 1GB of RAM), but at 1024GB you'll need another 128GB,
>> and that much ram still costs enough that your wallet won't be happy.
>
> It is 12.5% in both cases. How much does it cost to have undetected
> errors in your data? How much does it cost when an Interstate
> bridge collapses? How much does it cost when one of NASA's missions
> fails? How much does it cost when your pharmacy receives a
> prescription with an error in the dose?
>
>> the MRC setup on Intel and AMD is both complex and proprietary
>
> One wonders why the secrecy. AMD has been much more open than many
> (most?) chipmakers. They even forced the ATI people to document
> how to program their chips. I don't see a lot of companies popping up
> making competing chips. #include standard joke: "How do you make a small
> fortune in chipmaking? Start with a very large fortune."
I can't > see what secret would be revealed by saying "set bit 7 of register 4 > to 1 to enable ECC". AMD documents a lot of this stuff in the BIOS and Kernel Developer's Guide (BKDG) for each CPU family. >> Intel Red Book > > So the secret books are red this week, yawn. I remember the nightmare > of the merced orange books and the brain damaged "features" the chips had. > Not recommended. I'm interested in chips that work correctly, hence the > interest in ECC and AMD. Looked for ARM boards with ECC but didn't find > any. Is the Sparc stuff any more reliable than it used to be? Other > arch choices? Supermicro has some Atom motherboards with ECC support. >> The MRC setup code is a binary blob for otherwise open source boot >> firmware such as Coreboot. > > So the libreboot people are forced to work on reverse engineering > these blobs? :-( > > Don: >> I don't think the current APU parts support ECC. > > According to wikipedia, socket FM2+ does not support ECC. :-( > Kabini has support for ECC. And Berlin, (and I assume Toronto) but > word is that Berlin and Toronto are basically dead. :-( > I think Carrizo and Turion are supposed to support ECC? There really > ought to be a list of which CPUs/APUs/sockets/boards do or do not > support ECC. Socket AM1 (Kabini) is supposed to support ECC, but motherboards with this socket that support ECC is another story. >> My experience is that many ASUS motherboard support ECC RAM and >> usually document that fact. Also many Gigabyte mother boards also >> support ECC RAM, but don't document it. > > From what I've been reading, both Asus and Gigabyte make good boards. > I've seen reviews that complained about Gigabyte's firmware. > http://www.xbitlabs.com/articles/mainboards/display/gigabyte-ga-990fxa-ud5_8.html > I've also seen claims that the firmware bricked boards. > Reviewers like Asus' firmware. I've seen complaints about Asus's support, > and their website has significant problems. 
I've got one of the Gigabyte GA-990FXA-UD5 boards. I actually like the
BIOS. I'm not trying to overclock, but it does have lots of ECC-related
knobs. I think you can even tell it to gang the two memory controller
channels so that you can enable Chipkill. The latter isn't as good as
it sounds because it really only works properly with DIMMs that use x4
DRAM chips, and there don't seem to be any unbuffered versions of those.
The only unbuffered DDR3 DIMMs I've found use x8 DRAM chips. In that
case, if multiple bits coming out of the chip are incorrect, the ECC
checker has just under a 100% chance of detecting the error, but it is
still uncorrectable. With x4 DRAM chips, Chipkill can correct the error
even if all four bits from the DRAM are incorrect. Unfortunately, the
only DDR3 DIMMs that use x4 chips are registered. Also, ganging the
memory controllers does hurt performance.

The things that I don't like about this board are the SATA connector
placement (though it wasn't too bad in my specific application), and the
combined keyboard/mouse PS/2 connector. I'm still using a PS/2 KVM
switch here and I need motherboards with separate keyboard and mouse
connectors, and the Y-adaptors don't seem to work. I'd love to upgrade
to a newer KVM, but I'd want to also switch from VGA to DVI and KVMs
that handle more than two dual-link DVI inputs are serious $$$.

My newest motherboard is an Asus M5A97 R2.0. I bought it because it was
inexpensive, had sufficient expansion potential, and had separate
keyboard and mouse PS/2 connectors. I don't like the BIOS nearly as
much. It's got lots of whizzy graphics, but it's hard to find where the
various knobs are hidden. As I recall, ECC control is basically on/off.
I also wasn't able to get WOL to work. If I power off the machine with
shutdown -p, the LAN link light stays on, but sending a WOL packet
doesn't start the machine. It might wake from sleep mode, but I didn't
try that.
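For anyone wanting to rule out the sending side when WOL does not work, the standard magic packet is simple enough to build by hand. The following is a minimal sketch, not the tool used above: the function names, the example MAC address, and the choice of UDP port 9 are assumptions for illustration.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + raw * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Send the packet as a UDP broadcast (port 9 is a common convention)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))


# Hypothetical usage; replace with the target NIC's real MAC address:
# send_wol("00:11:22:33:44:55")
```

If a packet built this way is visible on the wire and the board still stays off, the problem is more likely in the firmware or NIC power settings than in the sender.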
> The firmware on my Tyan board is crap, and they refused to tell me > how much power it needs. Which means I don't know how much other stuff > I can run from the same P/S. It should have *way* more power than needed, > but experience says "not enough", so I added a 2nd p/s for the disk farm > and suddenly had fewer problems. The 2 p/s setup does allow powercycling > the mainboard (because of the crappy firmware) without powercycling the disks. > > Given my experience with the Tyan board, and the apparent lack of > FLOSS firmware for recent boards, I'm not real excited about the > Gigabyte boards. Asus has a couple of AMD3+ boards that I could > probably live with, if their website actually had things like > lists of exactly which CPUs and memory are approved, and firmware > updates, ... But there are also applications could use a lower wattage > solution. > > Anyone have opinions on other mainboard companies? ECS? Asrock? > MSI? Zotac? Others? If you are interested in something with low power consumption, take a look at the Supermicro C2000 series Atom boards: I'm seriously considering picking up an A1SRM-LN5F-2358. At first glance it seems pricey, especially considering the amount of CPU grunt, but I don't need much and I can use the extra LAN ports and possibly IPMI, so I don't have to add the cost of a CPU, an decent aftermarket cooler, extra NICs, or a video card. 
> Don: >> +MCA: Bank 4, Status 0x944a400096080a13 >> +MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 >> +MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0 >> +MCA: CPU 0 COR BUSLG Responder RD Memory >> +MCA: Address 0x213e98b10 >> +MCA: Bank 4, Status 0xd44a400096080a13 >> +MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 >> +MCA: Vendor "AuthenticAMD", ID 0x100f53, APIC ID 0 >> +MCA: CPU 0 COR OVER BUSLG Responder RD Memory >> +MCA: Address 0x213e98b10 > > Chris: >> MCA: Bank 1, Status 0x9400000000000151 >> MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000 >> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 >> >> MCA: Address 0x81cc0e9f0 >> >> Kind of freaky. I've never had this error on this board before. >> On others tho. >> >> Try a search for MCA instead. > > Is there a decoder ring for those messages? I don't recall seeing > messages like that, although I wasn't looking for them, and they > don't leap out at you screaming ERROR! ERROR! Digital Unix had its > problems, but at least the error messages were fairly clear. > Something like "single bit memory error at address 0x12345..." > A simple edit to sys/x86/x86/mca.c > s/printf("UNCOR ");/printf("Uncorrectable ");/ > s/printf("COR ");/printf("Correctable ");/ > would make the messages at least slightly more meaningful to a viewer > who isn't intimently(sp) familiar with the mca. Which most people aren't. > I used to maintain code that dealt with a memory controller, and > used a hardware circuit to inject errors into a memory board. > But looking at those messages doesn't tell me anything beyond > "Something happened, maybe I should grep through the source > code for clues about those messages." Looking at the source > doesn't add much, you'd need documentation for the mca. > Which most people aren't going to have. And you'd need a lot > of time to figure it out. I think jhb@ has some software that decodes this stuff. I'm not sure if it is in ports. 
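In the meantime, the top bits of the Status value can be read by hand. The sketch below is not jhb@'s decoder; it only assumes the architectural flag positions of the x86 machine-check status register (VAL in bit 63 down to PCC in bit 57) and leaves the low 16-bit error code undecoded. The function and table names are made up for illustration.

```python
# Architectural flag bits of an x86 MCi_STATUS value (bits 63..57).
FLAGS = [
    (63, "VAL"),    # register contains a valid error
    (62, "OVER"),   # an earlier error was overwritten (overflow)
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MISC register is valid
    (58, "ADDRV"),  # ADDR register is valid
    (57, "PCC"),    # processor context corrupt
]


def decode_status(status):
    """Return the set flags, whether the error was corrected, and the raw code."""
    flags = [name for bit, name in FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "corrected": "UC" not in flags,
        "mca_error_code": status & 0xFFFF,
    }


# Status values copied from the MCA messages quoted above.
for value in (0x944A400096080A13, 0xD44A400096080A13, 0x9400000000000151):
    info = decode_status(value)
    print(f"{value:#018x}: {' '.join(info['flags'])} "
          f"corrected={info['corrected']} code={info['mca_error_code']:#06x}")
```

Read this way, all three quoted reports have UC clear, which matches the "COR" in the kernel output, and the second one has OVER set, matching "COR OVER".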
From owner-freebsd-hardware@freebsd.org Thu Sep 17 23:05:51 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 312309CF520; Thu, 17 Sep 2015 23:05:51 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from mx0.gid.co.uk (mx0.gid.co.uk [194.32.164.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CACD21330; Thu, 17 Sep 2015 23:05:50 +0000 (UTC) (envelope-from rb@gid.co.uk) Received: from [194.32.164.24] (80-46-130-69.static.dsl.as9105.com [80.46.130.69]) by mx0.gid.co.uk (8.14.2/8.14.2) with ESMTP id t8HN5eaN097503; Fri, 18 Sep 2015 00:05:40 +0100 (BST) (envelope-from rb@gid.co.uk) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: ECC support From: Bob Bishop In-Reply-To: <93106DFF-9741-4515-B6E0-AC43C0AF2179@gid.co.uk> Date: Fri, 18 Sep 2015 00:05:35 +0100 Cc: Konstantin Belousov , Hackers freeBSD , Dieter BSD , Andriy Gapon , freebsd-hardware@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <55F88A18.6090504@FreeBSD.org> <20150916035904.GE67105@kib.kiev.ua> <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk> <3678FC1E-DDC5-4FB2-B6E9-6FC90D0C988E@gid.co.uk> <93106DFF-9741-4515-B6E0-AC43C0AF2179@gid.co.uk> To: Igor Mozolevsky X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Sep 2015 23:05:51 -0000

Hi,

> On 16 Sep 2015, at 13:04, Bob Bishop wrote:
>
>> On 16 Sep 2015, at 12:52, Igor Mozolevsky wrote:
>>
>> […]The only thing I could think of is that the fab process was(/is?) large
>> enough to not worry about "nonsense" like cosmic rays &c (but then I've not
>> had much exposure to semi-conductor electronics theory since late 90s).
>> […]
>
> Dunno. I’ll ask my tame semiconductor expert when I see him tomorrow…

The answer is quite interesting. A few process shrinks ago, alpha
particle effects were becoming worryingly intrusive and everybody was
concerned how much smaller features on ICs could actually be pushed.

Then they did the next process shrink, and the effects disappeared
completely! A couple more shrinks later and they still haven’t
reappeared. Nobody understands why, but they don’t worry about it any
more.

>> -- 
>> Igor M.
>
> --
> Bob Bishop
> rb@gid.co.uk

From owner-freebsd-hardware@freebsd.org Fri Sep 18 02:49:27 2015 Return-Path: Delivered-To: freebsd-hardware@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0117A9CFEB5; Fri, 18 Sep 2015 02:49:27 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: from mail-io0-x229.google.com (mail-io0-x229.google.com [IPv6:2607:f8b0:4001:c06::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C58E61770; Fri, 18 Sep 2015 02:49:26 +0000 (UTC) (envelope-from dieterbsd@gmail.com) Received: by iofh134 with SMTP id h134so43120227iof.0; Thu, 17 Sep 2015 19:49:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type;
bh=hs1AHxzYfflL/aTj78Xmr09kpKFFK3SZPuUtG7KSNCg=; b=o6vL7Vr+NQ/ttr3o9E3X14ZdUMn8TFD+btNWJrJU0NoNIA4/0KfKlg/Is6Ezw9ESOB vDHesGYrVbo9v8vA8y1fDothANNs50U3x0MV/KIeJdgOEgEePN0pcpBXkAv5gbAMu0gb IPSKskqq9X/uX5kLRmlNh1D86T5qhfemI7c+7IDlmfjjuP241rnuIyekx8BJGYDWGq5p ET88WMgC8IWv5AEMoiA3Hra1LLLptWmwNTyUffQ8/ymJ1oJ65dXZo+YtkTvZveyh0hDF /I5x0FhJ9Do1inctukTLPsrPukIs2lF/jvn1Zkuy6WKReoP4eJImmNmueT2RhCbC7YcY UtcQ== MIME-Version: 1.0 X-Received: by 10.107.149.129 with SMTP id x123mr10760727iod.68.1442544566156; Thu, 17 Sep 2015 19:49:26 -0700 (PDT) Received: by 10.64.2.132 with HTTP; Thu, 17 Sep 2015 19:49:26 -0700 (PDT) Date: Thu, 17 Sep 2015 19:49:26 -0700 Message-ID: Subject: Re: ECC support From: Dieter BSD To: freebsd-hardware@freebsd.org, freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hardware@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: General discussion of FreeBSD hardware List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Sep 2015 02:49:27 -0000 It appears that they are no longer selling the MSI 880GMA-E45. There used to be a web page with useful info about how well various boards worked with FreeBSD. My notes say it was http://www.freebsd.org/platforms/amd64/motherboards.html but that URL now gives: "Page not found. Oh no. :(" Don: > Supermicro has some Atom motherboards with ECC support. Thanks, but the company that designed the atom has a rather long history of design problems. The whole point of ECC is to avoid corrupting the right answer, not to avoid corrupting the wrong answer. They also steal technology from other companies, admit to it, and somehow usually get away with it. > Socket AM1 (Kabini) is supposed to support ECC, but motherboards with > this socket that support ECC is another story. Word is that Asus updated the AM1M-A manual to say that it supports ECC. 
http://www.planet3dnow.de/vbulletin/threads/421749-Geruecht-Zen-kommt-zuerst-als-Opteron?p=4988619&viewfull=1#post4988619 Google translation from pages 1 & 2: Onkel_Dithmeyer: I have Athlon 5350, Asus AM1M-A and ECC Ram drSeehas: Simply read the CPU registers D18F3xE8 and post the result here. I bet there comes 1F74F00h out. Onkel_Dithmeyer: The bet you've won! https://en.wikipedia.org/wiki/List_of_AMD_accelerated_processing_unit_microprocessors lists 5350 as "Kabini" So it sounds like Kabini doesn't support ECC after all? :-( Word is that old versions of memtest86 incorrectly assumed that ECC was available and can therefore give incorrect results. > Gigabyte GA_990FXA-UD5 > I actually like the BIOS. Will the firmware talk to a RS-232 console? The slot selection looks better than most: x16 x8 x8 x4 x4 x1 pci The x1 slot looks crowded, some of the newegg user reviews complained about that. Some of the x1 cards are small, if not, a riser should work. At least it *has* a 7th slot. More than 7 slots would be great, but such a board doesn't seem to exist. The hardware looks ok. Does FreeBSD have *good* support for all the hardware? (Other than the VIA VT6308P firewire which probably has the same problems as the 6307. If so, that uses up one slot for a firewire card. Looks like the firmware chips are soldered to the board, the Asus sabertooth has a socket. Newegg user review: "North and South will get hot." Word is that UD5 and UD7 have "vdroop issues due to lack of an LLC unit" Is that something to be concerned about if I'm not overclocking? Gigabyte's website is obscenely slow: 186 B/s :-( They know about it: "#1. Download speed may be varied in different region. If you have experienced lower download speed, please try other region download sites." At least I found the lists of approved CPUs and memory, and the firmware and manuals, unlike Asus. 
> The things that I don't like about this board are the SATA connector > placement Location looks okay, as long as they aren't too crowded or something. Sata cables can be reasonably long. 2 meters works for me, even with ports that don't claim to be e-sata. (e-sata is supposed to have slightly higher Voltages) Bad placement is putting a pata connector on the far left next to the i/o panel. Wimpy short cable barely reaches the drive. HVD SCSI was nice, the drives could be on the other side of the room, and often were. *grin* > I'd want to also switch from VGA to DVI Nothing against DVI, but isn't it in the process of going away? Displayport looks good, as long as you don't need analog. High resolution, Freesync, inexpensive adapters to DVI and HDMI. > combined keyboard/mouse PS/2 connector They get to save a bit of space on the i/o panel, and they get to sell you a Y cable. > the Y-adaptors don't seem to work I assume you've tried more than one adapter, and tried them without the KVM? As long as both the firmware and BSD will listen to USB, I guess I can go shopping for USB ones that I can stand. Presumably the USB ones are safe to hotplug, unlike ps/2. I have proper Unix style keyboards with control next to 'a' but both the firmware and FreeBSD think I have some brain-dead keyboard with control and cap-lock switched. Xmodmap fixes it in X, but I get a lot of typos in firmware and single user mode. :-( > Asus M5A97 R2.0 > It's got lots of whizzy graphics Firmware shouldn't have graphics at all. Firmware needs to be absolutely reliable. Graphics adds a lot of unneeded complexity. Graphics over RS-232 are rather slow. Word is that Asus firmware doesn't support an RS-232 console which is a major negative. YA Asus board with only 6 slots. Can't they count to at least 7? The best Asus board I've found is the Sabertooth 990FX R2.0. Again only 6 slots, but at least they have more lanes. And it has 4 extra sata ports. But again, Asus firmware is said to not talk RS-232. 
Sometimes things fly by too fast to read. With RS-232 you can scroll
back, capture it in a disk file, etc.

> If you are interested in something with low power consumption,

I need a new firewall/gateway/proxy/uucp/mail/... machine, which
shouldn't need massive cpu power, could be headless if RS-232 console
works, and doesn't need massive amounts of i/o. A low power consumption
machine should work great, if I can find one. Current machine is dying,
so need a replacement asap. Same or similar machine with a good
framebuffer (>= 4K, Freesync) and UVD could be X terminal / HTPC.
Minimal GPU, if any, needed. But can't find a video card with a good
framebuffer that doesn't also have some total overkill gpu that
is expensive, lots of power&heat, uses up 2 slots and has a fan.
Using up 2 slots is unacceptable when there are so few slots to start
with. I'm sure it will be easy to find a replacement for the oddball
board specific fan when it dies.

Also need a faster box with more i/o. Take the i/o on the UD5,
twice as many of everything would be about right.

I don't see what IPMI can do that an RS-232 console can't, other than
talk to the firmware with the machine mostly powered down, and powering
the machine up and down. I don't need to do that. At least there is
*some* feature I don't need! (besides a hyperthyroid gpu)

Bob:
> Then they did the next process shrink, and the effects disappeared
> completely! [ ... ] Nobody understands why

Sounds very bizarre. Figuring out why would make a good project for a
PhD student?
From owner-freebsd-hardware@freebsd.org Fri Sep 18 09:14:14 2015
Date: Fri, 18 Sep 2015 10:14:13 +0100
Subject: Re: ECC support
From: Tom Evans
To: Dieter BSD
Cc: freebsd-hardware@freebsd.org
Content-Type: text/plain; charset=UTF-8

On Fri, Sep 18, 2015 at 3:49 AM, Dieter BSD wrote:
> Current machine is dying,
> so need a replacement asap. Same or similar machine with a good
> framebuffer (>= 4K, Freesync) and UVD could be X terminal / HTPC.
> Minimal GPU, if any, needed. But can't find a video card with a good
> framebuffer that doesn't also have some total overkill gpu that
> is expensive, lots of power&heat, uses up 2 slots and has a fan.
> Using up 2 slots is unacceptable when there are so few slots to start
> with. I'm sure it will be easy to find a replacement for the oddball
> board specific fan when it dies.

nVidia GT 720 Silent. Single slot, no fan, 4k H264 decoding, enough GL
to do anything you might want to do on a desktop.

Cheers

Tom

From owner-freebsd-hardware@freebsd.org Sat Sep 19 06:11:04 2015
Date: Sat, 19 Sep 2015 16:10:32 +1000
From: Peter Jeremy
To: Bob Bishop
Cc: Hackers freeBSD, freebsd-hardware@freebsd.org
Subject: Re: ECC support
Message-ID: <20150919061032.GB20691@server.rulingia.com>
References: <55F88A18.6090504@FreeBSD.org>
 <20150916035904.GE67105@kib.kiev.ua>
 <93871ADA-EDA3-481C-9959-1D371AB44479@gid.co.uk>
 <3678FC1E-DDC5-4FB2-B6E9-6FC90D0C988E@gid.co.uk>
 <93106DFF-9741-4515-B6E0-AC43C0AF2179@gid.co.uk>

On 2015-Sep-18 00:05:35 +0100, Bob Bishop wrote:
>The answer is quite interesting. A few process shrinks ago, alpha
>particle effects were becoming worryingly intrusive and everybody was
>concerned how much smaller features on ICs could actually be pushed.

I recall when the 64kb DRAMs first appeared and alpha particle hits
were suddenly a critical issue - the doomsayers were claiming DRAM
cells couldn't be shrunk any further. I suspect moving from CERDIP to
plastic packages and adding suitable coatings to stop alpha particles
helped. In any case, things have shrunk by about 5 orders of magnitude
and no-one mentions alpha particles any more.

-- 
Peter Jeremy