From: Jeremy Chadwick
To: David Naylor
Cc: freebsd-stable@freebsd.org
Date: Thu, 8 Nov 2007 22:52:01 -0800
Subject: Re: Harddisk failure causes system crash, please help
Message-ID: <20071109065201.GA47328@eos.sc1.parodius.com>

On Fri, Nov 09, 2007 at 08:29:52AM +0200, David Naylor wrote:
> I remember seeing a timeout of sorts once, it was while doing a dd. I
> have done further dd tests and only the one slice causes this problem:
> ad0e

Okay, so it's probably that area of the disk which has some problem...

> > broken somehow), but all your problems seem to indicate issues with the
> > disk.
>
> Do you know of any test I can run using Windows (BartPE) that could
> possibly diagnose the problem (or at least confirm it is not FreeBSD's
> fault for rebooting and just hardware error)?
There's a free utility called HDTune which has a sector scanner ("Error
Scan") that explicitly looks for bad sectors. I would *uncheck* the Quick
Scan box. If nothing shows up there, I'd check your Event Log to see if
there are any reports of disk/controller issues.

You might also be able to use that utility to get SMART stats for the
drive, although smartctl -a /dev/ad0 should suffice too. The disk itself
may have been relocating data onto working sectors all this time; usually
SMART will show that (but not always, since it depends on how the disk
manufacturer wrote their firmware).

But keep in mind Windows is one of the most silent OSes I've ever seen
when it comes to disk errors. A disk can be failing miserably and it'll
never bother to report ATA timeouts or anything else in the event log.
The easiest failures to detect are mechanical ones, since all disk I/O
will stop ("why is my machine hanging?!?"), and if you're "lucky", you'll
hear the drive making scary noises.

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
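For anyone who wants to script the SMART check mentioned in the message
above, here is a minimal Python sketch (not something from the thread) that
pulls the relocation-related counters out of smartctl -a output. It assumes
smartctl's default ATA attribute table layout, with RAW_VALUE as the tenth
column. The function names run_smartctl, parse_attributes and
looks_suspicious are hypothetical, and vendors report these attributes
differently, so a zero count does not prove the disk is healthy.

```python
"""Sketch: extract relocation-related SMART counters from `smartctl -a` output.

Assumption: smartctl's default ATA attribute table layout, i.e.
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
"""
import re
import subprocess
from typing import Dict

# SMART attribute IDs commonly tied to bad or remapped sectors:
# 5 Reallocated_Sector_Ct, 196 Reallocated_Event_Count,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable.
WATCHED_IDS = {5, 196, 197, 198}


def run_smartctl(device: str) -> str:
    """Run `smartctl -a DEVICE` and return its stdout.

    smartctl's exit status is a bit mask that is non-zero for many
    disk problems, so a non-zero status is deliberately not raised here.
    """
    result = subprocess.run(
        ["smartctl", "-a", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return result.stdout


def parse_attributes(text: str) -> Dict[str, int]:
    """Return {ATTRIBUTE_NAME: raw_value} for the watched attribute IDs."""
    found: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have >= 10 columns, a numeric ID and a hex FLAG;
        # this skips the "ID#" header row and all other smartctl output.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if not fields[2].startswith("0x"):
            continue
        if int(fields[0]) not in WATCHED_IDS:
            continue
        # RAW_VALUE is the 10th column; keep only its leading integer.
        match = re.match(r"\d+", fields[9])
        if match:
            found[fields[1]] = int(match.group())
    return found


def looks_suspicious(attrs: Dict[str, int]) -> bool:
    """True if any watched counter is non-zero (a hint, not a verdict)."""
    return any(value > 0 for value in attrs.values())
```

Usage would be something like
looks_suspicious(parse_attributes(run_smartctl("/dev/ad0"))). A non-zero
Reallocated_Sector_Ct or Current_Pending_Sector is the kind of relocation
evidence SMART may show; an all-zero result only means this particular
firmware isn't reporting any.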