From: Dmitry Pryanishnikov
To: "M.Hirsch"
Cc: freebsd-stable@freebsd.org
Date: Tue, 27 Jun 2006 09:41:14 +0300 (EEST)
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
Message-ID: <20060627092159.T35218@atlantis.atlantis.dp.ua>
In-Reply-To: <44A06FFB.40104@hirsch.it>

On Tue, 27 Jun 2006, M.Hirsch wrote:

> Yes, the result may be correct.

If you're talking about a single-bit error, you aren't quite correct.
It isn't "may be correct"; it's _definitely_ correct in the mathematical
sense. That is, the correcting code proves that we have one and only one
error, in bit number N; the hardware just inverts this bit, and the result
_is_ OK.

> 'Do not take "ECC" for "equals additional security"'

Not security. ECC adds reliability.

> But, in FreeBSD, the function is a result of hardware-level correction.
> Something that only kicks in in _real_ _serious_ situations.
> I just would like you (not specifically you, Dmitry) to acknowledge that
> broken RAM is worth a "panic" in "standard situations" - if I may call it
> like that.

The predominant RAM errors are exactly the single-bit ones. Moreover, they
usually _don't_ reappear at the same cell. They may be caused, for example,
by spontaneous alpha radioactivity (brought into your computer by ordinary
dust), and as such they don't indicate that the RAM module must be replaced.
They just break your data in an unpredictable way, not your hardware.
Single-bit errors are the main reason why ECC-capable memory and an
ECC-capable chipset must be used in any computer that calculates or
transfers actually valuable data.

> If the RAM is broken for some bits, chances are great that there are more
> following soon.

If a multiple-bit error happens, then yes, it can be a sign of an actual
hardware fault. And yes, the ECC logic will report this event instantly.

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail: dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
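
The distinction drawn above between a single-bit error (corrected exactly)
and a multiple-bit error (detected and reported) can be illustrated with a
toy code. What follows is only a minimal sketch, assuming a Hamming(7,4)
code extended with one overall parity bit; it is not what any particular
chipset implements, and real ECC memory typically protects much wider
words. The names encode, decode and DATA_POSITIONS are made up for this
illustration.

```python
# Toy SEC-DED code: Hamming(7,4) plus one overall parity bit.
# Illustration only; real memory controllers use wider codes.

DATA_POSITIONS = (3, 5, 6, 7)  # codeword positions that carry data bits


def _syndrome(word):
    """XOR of the (1-based) positions of all set bits in positions 1..7."""
    s = 0
    for pos in range(1, 8):
        if word[pos]:
            s ^= pos
    return s


def encode(nibble):
    """Encode 4 data bits as a list of 8 bits (index 0 = overall parity)."""
    if not 0 <= nibble < 16:
        raise ValueError("nibble must be in range 0..15")
    word = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (nibble >> i) & 1
    s = _syndrome(word)
    # Set parity bits 1, 2, 4 so that the syndrome of the full word is 0.
    for p in (1, 2, 4):
        word[p] = 1 if s & p else 0
    # Overall parity bit makes the total number of set bits even.
    word[0] = sum(word[1:]) % 2
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    s = _syndrome(word)
    odd = sum(word) % 2  # 1 if the overall parity check fails
    fixed = list(word)
    if s == 0 and not odd:
        status = "ok"
    elif odd:
        # Treated as a single flipped bit: the syndrome is its position
        # (0 means the overall parity bit itself). Invert it.
        fixed[s] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    data = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                 # simulate one flipped cell
    print(decode(word))
    word[2] ^= 1                 # simulate a second flipped cell
    print(decode(word))
```

In this sketch any one flipped bit is located and inverted, and any two
flipped bits are reported rather than silently passed on. The guarantee
holds only while at most two bits of a word are wrong; three or more flips
can be miscorrected, which is a property of this class of codes in general.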