Date: Wed, 8 Oct 2008 00:01:42 -0700
From: Jeremy Chadwick
To: Mister Olli
Cc: Jerry McAllister, freebsd-questions@freebsd.org
Subject: Re: analyzing freebsd core dumps
Message-ID: <20081008070142.GA69250@icarus.home.lan>
In-Reply-To: <1223447412.5896.9.camel@phoenix.blechhirn.net>

On Wed, Oct 08, 2008 at 08:30:12AM +0200, Mister Olli wrote:
> hi...
>
> thanks for the feedback on this topic.
> the first step to clean the machine and check all connectors has been
> done yesterday. I hope that this will fix the problem, and that it's not
> some kind of hardware failure.
>
> to run tests with memtest is quite a problem, since the machine has high
> availability requirements. to take it off for nearly one hour for
> cleaning and checking during daily work of our company was a pain.
> 6 hours or more of RAM tests is not possible.
>
> is there some other way to detect hardware failure with less time
> consuming tool/ process?

Yes -- you start replacing hardware one piece at a time until the
problem goes away. That will also require downtime, quite regularly,
and waste money.

So to answer your question: no, there is no way to easily track down the
source of a hardware failure, or determine what piece has failed (if
any). This is completely 100% normal when it comes to computers,
especially x86 PCs. Anyone who has worked in the IT field for many
years knows this. :-)

I'm amazed that in this day and age, any company would have a single
host as a single-point-of-failure. You can't take this machine down for
troubleshooting, but you have no failover available. The company has
put themselves into this situation.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |