Date: Tue, 27 Jun 2006 01:57:17 +0300 (EEST)
From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
To: "M.Hirsch" <M.Hirsch@hirsch.it>
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
Message-ID: <20060627014335.E87535@atlantis.atlantis.dp.ua>
In-Reply-To: <44A06233.1090704@hirsch.it>
References: <E1FuYsL-000HT3-H2@dilbert.firstcallgroup.co.uk>
 <20060626100949.G24406@fledge.watson.org>
 <20060626081029.L1114@ganymede.hub.org>
 <20060626140333.M38418@fledge.watson.org>
 <20060626235355.Q95667@atlantis.atlantis.dp.ua>
 <44A04FD2.1030001@hirsch.it>
 <20060627011512.N95667@atlantis.atlantis.dp.ua>
 <44A06233.1090704@hirsch.it>

On Tue, 27 Jun 2006, M.Hirsch wrote:
>> On Mon, 26 Jun 2006, M.Hirsch wrote:
>>> ECC is a way to mask broken hardware. I rather have my hardware fail
>>> directly when it does first, so I can replace it _immediately_
>>
>>
>> You got it backwards. If your data has any value to you, then you don't
>>
> Nope, I am right on track.
> I do not want to lose any data. So I'd prefer a ECC error to raise a panic so
> I can replace the hardware ASAP.

When you wrote "ECC is a way to mask broken hardware", you were plain wrong.
If you're using hardware without ECC, it simply cannot tell whether an error
is present or not. So ECC _is_ the way to detect (not mask) broken hardware.
If you want the ECC logic to raise an NMI on a corrected error (as well as on
an uncorrectable one), just set the appropriate bit in the control register;
every ECC-capable Intel chipset allows it. But in a production environment,
such behaviour (abnormal termination on a _corrected_ error) is unacceptable.

> Don't get me wrong, but tracking bugs in FreeBSD is quite more of an effort
> than "just" akquiring a new box...

I don't see the connection between this sentence and ECC (which is a
hardware option).

> Does the standard fs, UFS2, do "extra sanity checks", then?

Ditto. And don't forget that _every_ data sector on an HDD _is_ checked with
a CRC. So are ATA data transfers in UDMA modes. So is the data in the CPU
cache. An extra check gives extra reliability.

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail: dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
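
P.S. To illustrate the "detect, not mask" point, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code. It is a toy extended Hamming code over 4 data bits, chosen only for brevity; it is an assumption for illustration and not the code any particular chipset uses, and the names encode() and decode() are hypothetical. The point is that the decoder always reports what happened ("ok", "corrected" or "uncorrectable"), whereas memory without check bits would return a flipped value silently.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # overall parity over the other 7 bits
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid codeword
    overall_bad = sum(code) % 2 == 1    # overall parity violated?
    if syndrome == 0 and not overall_bad:
        status = "ok"
    elif overall_bad:
        code[syndrome] ^= 1             # single-bit error: syndrome is its position
        status = "corrected"            # (syndrome 0 means the parity bit itself)
    else:
        return "uncorrectable", None    # two bits flipped: detected, not fixable
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)

one_flip = list(word)
one_flip[5] ^= 1                        # simulate a single-bit memory error

two_flips = list(word)
two_flips[5] ^= 1
two_flips[2] ^= 1                       # simulate a double-bit memory error

print(decode(word))
print(decode(one_flip))
print(decode(two_flips))
```

Whether a "corrected" result is merely logged or escalated (for example to an NMI) is a policy decision made on top of the status; the code itself never hides the event.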