From owner-freebsd-stable@FreeBSD.ORG Mon Jun 26 23:07:39 2006
Message-ID: <44A068A7.3090403@hirsch.it>
Date: Tue, 27 Jun 2006 01:07:19 +0200
From: "M.Hirsch"
To: Dmitry Pryanishnikov
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
In-Reply-To: <20060627014335.E87535@atlantis.atlantis.dp.ua>
References: <20060626100949.G24406@fledge.watson.org>
 <20060626081029.L1114@ganymede.hub.org>
 <20060626140333.M38418@fledge.watson.org>
 <20060626235355.Q95667@atlantis.atlantis.dp.ua>
 <44A04FD2.1030001@hirsch.it>
 <20060627011512.N95667@atlantis.atlantis.dp.ua>
 <44A06233.1090704@hirsch.it>
 <20060627014335.E87535@atlantis.atlantis.dp.ua>
List-Id: Production branch of FreeBSD source code

Dmitry Pryanishnikov wrote:

> When you wrote "ECC is a way to mask broken hardware", you were plain
> wrong. If you're using hardware w/o ECC, it just can't tell whether
> error present or absent. So ECC _is_ the way to detect (not mask)
> broken hardware.

Ok, thanks. I think I understand the meaning of ECC now.
So, contrary to what my supplier claims, ECC is not supposed to help against hardware failures. But it is the way to detect them, right?

> If you want ECC corrector to raise NMI on corrected error (as well as
> uncorrectable), just set appropriate bit in control register - every
> Intel's ECC-capable chipset allows it. But if we're speaking about
> production environment, such behaviour (abnormal termination on
> _corrected_ error) is unacceptable.

"Abnormal termination" is not only acceptable for me, it is what I am looking for.
Make the node crash completely, so one of the others can take over its task(s).

>> Don't get me wrong, but tracking bugs in FreeBSD is quite more of an
>> effort than "just" acquiring a new box...
>
> I don't see connection between this sentence and ECC (which is
> hardware option).

What I wanted to say:
Looking for errors in the logs takes only a few seconds. Finding out what caused them takes hours...
Acquiring a new box is only $29,95 ;) - that's like 30 minutes, if you look at it from the business side.
...
I would rather rent 100 boxes to do the task of ten than employ 100 admins to find the "real" problem.

Thanks, Dmitry. I think I know what to look for now...

M.
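
The detect-versus-mask distinction argued over in this message can be illustrated with a toy example. The following is only a sketch in Python, assuming a textbook Hamming(7,4) code with one extra overall parity bit (single-error correction, double-error detection). It is not how any real memory controller or chipset register is programmed, and the names `Status`, `encode`, `decode` and `should_fail_node` are hypothetical, chosen for this illustration. The `fail_on_corrected` flag stands in for the policy choice discussed above: whether a corrected error should already take the node down.

```python
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    OK = "ok"
    CORRECTED = "corrected"          # single-bit error, fixed and reported
    UNCORRECTABLE = "uncorrectable"  # double-bit error, detected only


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit codeword.

    Index 0 holds the overall parity bit; indices 1..7 hold a
    Hamming(7,4) code with parity bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of all 8 bits even
    return code


def decode(code: List[int]) -> Tuple[int, Status]:
    """Return (data, status) without modifying the input codeword."""
    word = list(code)
    # The syndrome is the XOR of the positions of all set bits;
    # it is 0 for a valid codeword.
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos
    overall_odd = sum(word) % 2 == 1

    if syndrome == 0 and not overall_odd:
        status = Status.OK
    elif overall_odd:
        # Exactly one bit flipped; the syndrome is its position
        # (0 means the overall parity bit itself).
        word[syndrome] ^= 1
        status = Status.CORRECTED
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        status = Status.UNCORRECTABLE

    nibble = word[3] | word[5] << 1 | word[6] << 2 | word[7] << 3
    return nibble, status


def should_fail_node(status: Status, fail_on_corrected: bool) -> bool:
    """Policy sketch: always fail on uncorrectable errors,
    and on corrected errors only if the operator asked for that."""
    if status is Status.UNCORRECTABLE:
        return True
    return status is Status.CORRECTED and fail_on_corrected
```

In this sketch the error is never silently hidden: `decode` always returns a status alongside the data, so the caller can log a corrected error, ignore it, or (with `fail_on_corrected=True`) treat it as a reason to take the node out of service so another one can take over.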