Date: Tue, 27 Jun 2006 01:57:17 +0300 (EEST)
From: Dmitry Pryanishnikov
To: "M.Hirsch"
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
Message-ID: <20060627014335.E87535@atlantis.atlantis.dp.ua>
In-Reply-To: <44A06233.1090704@hirsch.it>

On Tue, 27 Jun 2006, M.Hirsch wrote:

>> On Mon, 26 Jun 2006, M.Hirsch wrote:
>>> ECC is a way to mask broken hardware. I rather have my hardware fail
>>> directly when it does first, so I can replace it _immediately_
>>
>>
>> You got it backwards.
>> If your data has any value to you, then you don't
>>
>
> Nope, I am right on track.
> I do not want to lose any data. So I'd prefer a ECC error to raise a panic so
> I can replace the hardware ASAP.

When you wrote "ECC is a way to mask broken hardware", you were plain wrong.
If you're using hardware without ECC, it just can't tell whether an error is
present or absent. So ECC _is_ the way to detect (not mask) broken hardware.
If you want the ECC corrector to raise an NMI on corrected errors (as well as
on uncorrectable ones), just set the appropriate bit in the control register;
every Intel ECC-capable chipset allows it. But in a production environment,
such behaviour (abnormal termination on a _corrected_ error) is unacceptable.

> Don't get me wrong, but tracking bugs in FreeBSD is quite more of an effort
> than "just" akquiring a new box...

I don't see the connection between this sentence and ECC (which is a hardware
option).

> Does the standard fs, UFS2, do "extra sanity checks", then?

Ditto. And don't forget that _every_ data sector on an HDD _is_ checked with a
CRC, as are ATA data transfers in UDMA modes and the data in the CPU cache.
Extra checks give extra reliability.

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail: dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
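To illustrate the point that ECC detects rather than masks errors, here is a minimal Python sketch of a toy extended Hamming(8,4) SECDED code (single-error correction, double-error detection). It is an illustration under stated assumptions, not how any particular chipset works: real ECC memory commonly protects 64-bit words with 8 check bits, and the function names `encode` and `decode` are hypothetical. What it shows is that the decoder always *reports* a corrected error (the event a chipset can log or turn into an NMI), while a two-bit error is flagged as uncorrectable instead of being passed through silently.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode a 4-bit value into an 8-bit SECDED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits occupy the non-power-of-two positions 3, 5, 6 and 7.
    code[3], code[5], code[6], code[7] = d
    # Hamming parity bits at positions 1, 2 and 4.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity bit at position 0 extends Hamming(7,4) to SECDED.
    code[0] = sum(code[1:]) % 2
    return code


def decode(word: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(word)  # work on a copy; leave the caller's word untouched
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions locates a single error
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one.
        # A syndrome of 0 means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"  # the error was fixed *and* reported
    else:
        # Even overall parity but non-zero syndrome: two bits flipped.
        return None, "uncorrectable"
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1          # simulate one flipped memory bit
    print(decode(word))   # (11, 'corrected')
    word[6] ^= 1          # a second flip in the same word
    print(decode(word))   # (None, 'uncorrectable')
```

Plain non-ECC storage of the same four bits would hand back the flipped value with no status at all, which is exactly the case where the hardware can't tell that anything went wrong.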