From owner-freebsd-stable@FreeBSD.ORG  Fri Nov  9 06:52:07 2007
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 681F416A41B
	for <freebsd-stable@freebsd.org>; Fri,  9 Nov 2007 06:52:07 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: from mx01.sc1.parodius.com (mx01.sc1.parodius.com [72.20.106.3])
	by mx1.freebsd.org (Postfix) with ESMTP id 5086613C4B3
	for <freebsd-stable@freebsd.org>; Fri,  9 Nov 2007 06:52:07 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: by mx01.sc1.parodius.com (Postfix, from userid 1000)
	id 162411CC079; Thu,  8 Nov 2007 22:52:01 -0800 (PST)
Date: Thu, 8 Nov 2007 22:52:01 -0800
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: David Naylor <blackdragon@highveldmail.co.za>
Message-ID: <20071109065201.GA47328@eos.sc1.parodius.com>
References: <b53f6f940711081240q7100a08djae76b560cddfed6f@mail.gmail.com>
	<20071108212921.GA34721@eos.sc1.parodius.com>
	<b53f6f940711082229l67f9a77ch497ee6270490249a@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <b53f6f940711082229l67f9a77ch497ee6270490249a@mail.gmail.com>
User-Agent: Mutt/1.5.16 (2007-06-09)
Cc: freebsd-stable@freebsd.org
Subject: Re: Harddisk failure causes system crash, please help

On Fri, Nov 09, 2007 at 08:29:52AM +0200, David Naylor wrote:
> I remember seeing a timeout of sorts once, it was while doing a dd.  I
> have done further dd tests and only the one slice causes this problem:
> ad0e

Okay, so it's probably that area of the disk that has a problem...

> > broken somehow), but all your problems seem to indicate issues with the
> > disk.
> 
> Do you know of any test I can run using Windows (BartPE) that could
> possibly diagnose the problem (or at least confirm it is not FreeBSD's
> fault for rebooting and just hardware error)?

There's a free utility called HDTune which has a sector scanner that
explicitly looks for bad sectors ("Error Scan").  I would *uncheck* the
Quick Scan box.  If nothing shows up there, I'd check your Event Log to
see if there are any reports of disk/controller issues.
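If you'd rather do a similar surface scan from FreeBSD itself, below is a
rough Python sketch (not something from HDTune or the base system) that
reads a device sequentially, read-only, and records the offsets of blocks
whose reads fail.  The function name, the 64 KiB default block size and the
/dev/ad0e path in the usage note are illustrative assumptions; it needs read
access to the device node (normally root), uses os.pread so it is
Unix-only, and being a plain sequential read it is slow on a large disk.

```python
import os

def scan_for_read_errors(path, block_size=64 * 1024, size=None):
    """Read `path` sequentially (read-only) and return the byte offsets of
    blocks that could not be read.  If `size` is given, stop there;
    otherwise stop at the first empty read (end of device/file)."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while size is None or offset < size:
            # Never ask for more than what is left when a size is known.
            length = block_size if size is None else min(block_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
            except OSError:
                # Typically EIO from an unreadable sector: note it, skip the block.
                bad_offsets.append(offset)
                offset += length
                continue
            if not data:
                break  # end of device/file
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

For example, scan_for_read_errors("/dev/ad0e") run as root would list the
starting offsets of unreadable 64 KiB blocks on that partition; rescanning
just those regions with block_size=512 narrows them down to individual
sectors.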

You might also be able to use that utility to get SMART stats for the
drive, although smartctl -a /dev/ad0 (from sysutils/smartmontools) should
suffice too.  The disk itself may have been relocating data onto working
sectors all this time; SMART will usually show that (look at the
Reallocated_Sector_Ct attribute), but not always, since it depends on how
the disk manufacturer implemented their firmware.
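To make the SMART check a bit more concrete, here's a rough Python sketch
that pulls the raw values of the sector-remapping counters out of
smartctl -a output.  It assumes smartctl is installed and prints the usual
ATA attribute table (numeric ID first, ATTRIBUTE_NAME second, RAW_VALUE as
the tenth column); the function names and the choice of the three watched
attributes are illustrative assumptions, and vendors don't all report
these attributes the same way.

```python
import subprocess

# Counters that track remapped or suspect sectors, as named by smartctl.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_smart_attributes(text):
    """Return {ATTRIBUTE_NAME: raw_value} for rows of smartctl's attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # only the first token of RAW_VALUE is used (e.g. "35 (Min/Max 20/45)").
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs

def suspicious_counters(text):
    """Return the watched counters whose raw value is non-zero."""
    attrs = parse_smart_attributes(text)
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}

def read_smart(device="/dev/ad0"):
    # smartctl's exit status is a bit mask, so a non-zero code is not
    # treated as a failure here; the text output is parsed either way.
    result = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True)
    return result.stdout
```

Something like print(suspicious_counters(read_smart("/dev/ad0"))) run as
root would show any non-zero counters.  An empty result doesn't prove the
disk is healthy, for the firmware reasons mentioned above.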

But keep in mind Windows is one of the most silent OSes I've ever seen
when it comes to disk errors.  A disk can be failing miserably and it'll
never bother to report ATA timeouts or anything else in the event log.
The easiest failures to detect are mechanical ones, since all disk I/O
will stop ("why is my machine hanging?!?"), and if you're "lucky",
you'll hear the drive making scary noises.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |