Date: Tue, 27 Jun 2006 01:07:19 +0200
From: "M.Hirsch" <webmaster@hirsch.it>
To: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
Message-ID: <44A068A7.3090403@hirsch.it>
In-Reply-To: <20060627014335.E87535@atlantis.atlantis.dp.ua>
References: <E1FuYsL-000HT3-H2@dilbert.firstcallgroup.co.uk>
 <20060626100949.G24406@fledge.watson.org>
 <20060626081029.L1114@ganymede.hub.org>
 <20060626140333.M38418@fledge.watson.org>
 <20060626235355.Q95667@atlantis.atlantis.dp.ua>
 <44A04FD2.1030001@hirsch.it>
 <20060627011512.N95667@atlantis.atlantis.dp.ua>
 <44A06233.1090704@hirsch.it>
 <20060627014335.E87535@atlantis.atlantis.dp.ua>

Dmitry Pryanishnikov wrote:
> When you wrote "ECC is a way to mask broken hardware", you were plain
> wrong.
> If you're using hardware w/o ECC, it just can't tell whether error
> present
> or absent. So ECC _is_ the way to detect (not mask) broken hardware.
>

Ok, thanks. I think I understand the meaning of ECC now.
So, contrary to what my supplier claims, ECC is not supposed to help
against hardware failures. But it is the way to detect them, right?

> If you want ECC corrector to raise NMI on corrected error (as well as
> uncorrectable), just set appropriate bit in control register - every
> Intel's ECC-capable chipset allows it. But if we're speaking about
> production environment, such behaviour (abnormal termination on
> _corrected_
> error) is unacceptable.

"Abnormal termination" is not only acceptable for me, it is what I am
looking for. Make the node crash completely, so one of the others can
take over its task(s).

>> Don't get me wrong, but tracking bugs in FreeBSD is quite more of an
>> effort than "just" acquiring a new box...
>
> I don't see connection between this sentence and ECC (which is
> hardware option).

What I wanted to say:
Looking for errors in the logs takes only a few seconds. Finding out
what caused them takes hours...
Acquiring a new box is only $29,95 ;) - that's like 30 minutes of
work, if you look at it from the business side.
...
I would rather rent 100 boxes to do the task of ten than employ 100
admins to find the "real" problem.

Thanks, Dmitry. I think I know what to look for now...

M.