From owner-freebsd-stable@FreeBSD.ORG  Mon Jun 26 22:57:29 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AFBED16A404
	for <freebsd-stable@freebsd.org>; Mon, 26 Jun 2006 22:57:29 +0000 (UTC)
	(envelope-from dmitry@atlantis.dp.ua)
Received: from postman.atlantis.dp.ua (postman.atlantis.dp.ua [193.108.47.1])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9F1F443D66
	for <freebsd-stable@freebsd.org>; Mon, 26 Jun 2006 22:57:26 +0000 (GMT)
	(envelope-from dmitry@atlantis.dp.ua)
Received: from smtp.atlantis.dp.ua (smtp.atlantis.dp.ua [193.108.46.231])
	by postman.atlantis.dp.ua (8.13.1/8.13.1) with ESMTP id k5QMvHoe098868; 
	Tue, 27 Jun 2006 01:57:17 +0300 (EEST)
	(envelope-from dmitry@atlantis.dp.ua)
Date: Tue, 27 Jun 2006 01:57:17 +0300 (EEST)
From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
To: "M.Hirsch" <M.Hirsch@hirsch.it>
In-Reply-To: <44A06233.1090704@hirsch.it>
Message-ID: <20060627014335.E87535@atlantis.atlantis.dp.ua>
References: <E1FuYsL-000HT3-H2@dilbert.firstcallgroup.co.uk>
	<20060626100949.G24406@fledge.watson.org>
	<20060626081029.L1114@ganymede.hub.org>
	<20060626140333.M38418@fledge.watson.org>
	<20060626235355.Q95667@atlantis.atlantis.dp.ua>
	<44A04FD2.1030001@hirsch.it>
	<20060627011512.N95667@atlantis.atlantis.dp.ua>
	<44A06233.1090704@hirsch.it>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jun 2006 22:57:29 -0000

On Tue, 27 Jun 2006, M.Hirsch wrote:
>> On Mon, 26 Jun 2006, M.Hirsch wrote:
>>> ECC is a way to mask broken hardware. I rather have my hardware fail 
>>> directly when it does first, so I can replace it _immediately_
>> 
>>
>>  You got it backwards. If your data has any value to you, then you don't 
>> 
> Nope, I am right on track.
> I do not want to lose any data. So I'd prefer a ECC error to raise a panic so 
> I can replace the hardware ASAP.

  When you wrote "ECC is a way to mask broken hardware", you were plain wrong.
If you're using hardware w/o ECC, it simply can't tell whether an error is
present or absent. So ECC _is_ the way to detect (not mask) broken hardware.

  If you want the ECC corrector to raise an NMI on corrected errors (as well
as uncorrectable ones), just set the appropriate bit in the control register -
every ECC-capable Intel chipset allows it. But if we're speaking about a
production environment, such behaviour (abnormal termination on a _corrected_
error) is unacceptable.
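
  To make the corrected/uncorrectable distinction concrete, here is a minimal
sketch of a SECDED (single-error-correct, double-error-detect) code in Python.
It uses an extended Hamming(8,4) code over one nibble purely as an
illustration; real memory controllers use wider codes (typically over 64-bit
words), and the names encode() and decode() are hypothetical, not any chipset
or FreeBSD interface.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    for pos in range(1, 8):
        code[0] ^= code[pos]
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits; it is 0 for
    # a valid codeword and equals the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = 0
    for bit in code:
        overall ^= bit

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: assume exactly one and repair it.
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

  Without the check bits, a flipped data bit would simply be read back as
valid data. With them, a single-bit error is both reported and repaired, and
a double-bit error is reported as uncorrectable; whether the "corrected" case
is merely logged or escalated is a policy decision on top of that.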

> Don't get me wrong, but tracking bugs in FreeBSD is quite more of an effort 
> than "just" akquiring a new box...

  I don't see the connection between this sentence and ECC (which is a
hardware option).

> Does the standard fs, UFS2, do "extra sanity checks", then?

  Ditto. And don't forget that _every_ data sector on an HDD _is_ checked
with CRC. So are ATA data transfers in UDMA modes. Data in the CPU cache is
likewise protected (by parity or ECC). Every extra check gives extra
reliability.
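
  As a rough illustration of what such a check buys you, the sketch below
stores a CRC next to a 512-byte "sector" and verifies it on read. It uses
Python's zlib.crc32 (CRC-32) only as a stand-in; the actual codes used by
drives and by UDMA transfers differ, and sector_is_intact() is a hypothetical
helper, not a real driver API.

```python
import zlib


def sector_is_intact(data, stored_crc):
    """Recompute the CRC over the data and compare it with the stored value."""
    return zlib.crc32(data) == stored_crc


# Write path: compute the checksum once and keep it alongside the data.
sector = bytes(range(256)) * 2          # 512 bytes of sample data
stored_crc = zlib.crc32(sector)

# Read path, clean case: the checksum matches.
clean_read = sector_is_intact(sector, stored_crc)

# Read path, damaged case: one flipped bit is enough to fail the check.
damaged = bytearray(sector)
damaged[100] ^= 0x01
damaged_read = sector_is_intact(bytes(damaged), stored_crc)
```

  A CRC like this only detects the damage; it does not repair it. That is
exactly the difference from ECC, which can also correct a limited number of
bad bits.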

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail:  dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE