From owner-freebsd-stable@FreeBSD.ORG  Mon Jun 26 23:44:41 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6BA1A16A47B
	for <freebsd-stable@freebsd.org>; Mon, 26 Jun 2006 23:44:41 +0000 (UTC)
	(envelope-from dmitry@atlantis.dp.ua)
Received: from postman.atlantis.dp.ua (postman.atlantis.dp.ua [193.108.47.1])
	by mx1.FreeBSD.org (Postfix) with ESMTP id E8BAC442A1
	for <freebsd-stable@freebsd.org>; Mon, 26 Jun 2006 23:21:36 +0000 (GMT)
	(envelope-from dmitry@atlantis.dp.ua)
Received: from smtp.atlantis.dp.ua (smtp.atlantis.dp.ua [193.108.46.231])
	by postman.atlantis.dp.ua (8.13.1/8.13.1) with ESMTP id k5QNLSlY017832; 
	Tue, 27 Jun 2006 02:21:28 +0300 (EEST)
	(envelope-from dmitry@atlantis.dp.ua)
Date: Tue, 27 Jun 2006 02:21:28 +0300 (EEST)
From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
To: "M.Hirsch" <webmaster@hirsch.it>
In-Reply-To: <44A068A7.3090403@hirsch.it>
Message-ID: <20060627020819.L3403@atlantis.atlantis.dp.ua>
References: <E1FuYsL-000HT3-H2@dilbert.firstcallgroup.co.uk>
	<20060626100949.G24406@fledge.watson.org>
	<20060626081029.L1114@ganymede.hub.org>
	<20060626140333.M38418@fledge.watson.org>
	<20060626235355.Q95667@atlantis.atlantis.dp.ua>
	<44A04FD2.1030001@hirsch.it>
	<20060627011512.N95667@atlantis.atlantis.dp.ua>
	<44A06233.1090704@hirsch.it>
	<20060627014335.E87535@atlantis.atlantis.dp.ua>
	<44A068A7.3090403@hirsch.it>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jun 2006 23:44:41 -0000

On Tue, 27 Jun 2006, M.Hirsch wrote:
>> If you're using hardware w/o ECC, it just can't tell whether error present
>> or absent. So ECC _is_ the way to detect (not mask) broken hardware.
>> 
> Ok, thanks. I think I understand the meaning of ECC now.
> So, unlike my supplier claims, ECC is not supposed to help against hardware 
> failures.
> But it is the way to detect them, right?

  ECC stands for Error Checking and Correction (more formally, Error-Correcting
Code). It's a hardware feature, and its primary task is Checking (that is,
detection) of errors. It just happens that the number of additional bits which
carry the check code is sufficient to correct _any_ _single-bit_ data error
(not mask it, but really correct it), and to detect any double-bit error and
most errors affecting more than two bits (w/o correction).
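
  To make the correct-one/detect-two behaviour above concrete, here is a toy
sketch in Python. It is only an illustration under my own assumptions: it uses
a small extended Hamming code (4 data bits, 3 check bits, 1 overall parity
bit), whereas real ECC memory does this in hardware over much wider words.
The names encode() and decode() are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Positions 1..7 form a Hamming(7,4) word; 1, 2 and 4 hold check bits.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 is an overall parity bit covering positions 1..7.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits; it is 0 for
    # a valid codeword and equals the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_ok = sum(code) % 2 == 0

    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd overall parity: treat as a single-bit error and flip it back.
        # A syndrome of 0 here means the parity bit itself was hit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped.
        # This can be detected but not corrected.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                      # one flipped bit
    print(decode(word))
    word[2] ^= 1                      # a second flipped bit
    print(decode(word))
```

  In this sketch any single flipped bit is repaired and the original data
comes back intact, while any two flipped bits are reported but not repaired.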

>> Intel's ECC-capable chipset allows it. But if we're speaking about
>> production environment, such behaviour (abnormal termination on _corrected_
>> error) is unacceptable.
>
> "abnormal termination" is not only acceptable for me, it is what I am looking 
> for.
> Make the node crash completely, so one of the others can take over its 
> task(s).

  Again, when a single-bit correction has happened, it's not fake: the result
is actually correct. Why panic the machine immediately if all the data is OK?

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail:  dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE