From owner-freebsd-questions@FreeBSD.ORG  Sun Feb 19 14:47:09 2006
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 62B2616A420
	for <freebsd-questions@freebsd.org>;
	Sun, 19 Feb 2006 14:47:09 +0000 (GMT) (envelope-from cswiger@mac.com)
Received: from smtpout.mac.com (smtpout.mac.com [17.250.248.86])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0396043D45
	for <freebsd-questions@freebsd.org>;
	Sun, 19 Feb 2006 14:47:08 +0000 (GMT) (envelope-from cswiger@mac.com)
Received: from mac.com (smtpin01-en2 [10.13.10.146])
	by smtpout.mac.com (Xserve/8.12.11/smtpout04/MantshX 4.0) with ESMTP id
	k1JEl81E004698; Sun, 19 Feb 2006 06:47:08 -0800 (PST)
Received: from [192.168.1.3] (pool-68-160-251-207.ny325.east.verizon.net
	[68.160.251.207]) (authenticated bits=0)
	by mac.com (Xserve/smtpin01/MantshX 4.0) with ESMTP id k1JEl5B9025896
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 19 Feb 2006 06:47:07 -0800 (PST)
Message-ID: <43F884EC.70902@mac.com>
Date: Sun, 19 Feb 2006 09:47:08 -0500
From: Chuck Swiger <cswiger@mac.com>
Organization: The Courts of Chaos
User-Agent: Thunderbird 1.5 (Windows/20051201)
MIME-Version: 1.0
To: "Don O'Neil" <don@lizardhill.com>
References: <008201c63550$2a413490$6d00a8c0@MickeyLaptop>
In-Reply-To: <008201c63550$2a413490$6d00a8c0@MickeyLaptop>
X-Enigmail-Version: 0.94.0.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-questions@freebsd.org
Subject: Re: 3Ware Escalade Issues
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 19 Feb 2006 14:47:09 -0000

Don O'Neil wrote:
> There appears to be a bad sector on one of the drives according to smartctl,
> but nothing serious. 

That may mean the drive has already developed many bad sectors, each remapped
to a spare, and that it has now run out of spare sectors to remap to.

That drive may well fail catastrophically, soon.
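
As a rough sketch of how to check this: the drive's SMART attribute table
reports how many sectors have been remapped so far (attribute 5,
Reallocated_Sector_Ct).  The helper name below is made up for illustration, the
device names are placeholders, and it assumes the usual smartctl -A table
layout where the raw value is the tenth column.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value of attribute 5 (Reallocated_Sector_Ct), i.e. sectors remapped so far.
reallocated_sectors() {
    # Assumes the standard attribute table: ID in column 1, RAW_VALUE in column 10.
    awk '$1 == "5" { print $10 }'
}

# Usage (device names are placeholders, adjust for your system):
#   smartctl -A /dev/ad4 | reallocated_sectors
# For a drive behind a 3ware card, smartctl needs the port number, e.g.:
#   smartctl -A -d 3ware,0 /dev/twe0 | reallocated_sectors
```

A raw value that is large, or that keeps climbing between checks, would be
consistent with the spares being used up.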

> However, every time the system tried to write to that sector in the array,
> the system would freeze, and then reboot, and of course it would say the
> file system isn't clean, etc...
> 
> Since the file system is 1 TB in size, it would take 8+ hours to FSCK it.
> The array is only striped, and not mirrored or built with redundancy. I'm
> basically using the card/driver to make one large volume for a web server.

OK.  Well, if this data is important to you, you should consider using a
RAID-1, RAID-10, or RAID-5 configuration to gain redundancy.
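
To give a feel for what that redundancy costs in space, here is a small
sketch of the usual capacity arithmetic.  The function name, the drive count,
and the drive size are all assumptions for illustration (I don't know how many
drives are in your array), and RAID-10 is assumed to use two-way mirrors.

```shell
# Hypothetical helper: usable capacity in GB for N identical drives of SIZE GB.
# usage: usable_gb LEVEL N SIZE
usable_gb() {
    level=$1
    n=$2
    size=$3
    case $level in
        0)  echo $(( n * size )) ;;          # striping only, no redundancy
        1)  echo "$size" ;;                  # every drive holds a full copy
        10) echo $(( n / 2 * size )) ;;      # striped two-way mirrors
        5)  echo $(( (n - 1) * size )) ;;    # one drive's worth of parity
        *)  echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}

# Example with four made-up 250 GB drives:
#   usable_gb 0 4 250    # striped
#   usable_gb 10 4 250   # mirrored stripes
#   usable_gb 5 4 250    # parity
```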

> I have a few questions:
> 
> 1) Is this a known bug? I'm running FreeBSD 4.11 (for software compatibility
> issues at the moment, I will upgrade at some point in the future)

Normally, the OS will only kill the processes that touch the affected sector.
Without knowing where that sector is, though, it may belong to some important
file, such as the kernel itself or /bin/sh.

> 2) How can I trap the errors and eliminate the re-boot issue?

Shut down the system.  Replace the failing hard drive: on some other system, use
dd to make an exact copy of it onto the new drive, then put the new drive back
into the array.  Note that the replacement drive must be an exact match for this
to work; otherwise you will have to back up your data and rebuild the array.
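
A minimal sketch of the dd step, assuming both drives are attached to the
other system.  The function name and the device names are placeholders, so
double-check which device is which before running anything: dd will overwrite
the destination without asking.

```shell
# Hypothetical helper: copy SRC onto DST block by block.
# usage: clone_drive SRC DST
clone_drive() {
    src=$1
    dst=$2
    # noerror: keep going after a read error instead of stopping.
    # sync:    pad short or failed reads with zeros so that later blocks
    #          land at the same offsets on the copy.
    dd if="$src" of="$dst" bs=64k conv=noerror,sync
}

# Example (placeholder device names, old drive first, new drive second):
#   clone_drive /dev/ad4 /dev/ad6
```

Any blocks that cannot be read from the old drive will come out as zeros on the
copy, so some files may still be damaged afterwards.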

Speaking of which, do you have known-good backups available?

> 3) Is there some way I can do a faster FSCK, or perhaps 'fool' the system
> into thinking the file system is clean?

If you update to 5.x or later, you can use background fsck rather than having to
wait for fsck to complete before the system comes up, as you do under 4.x.
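
For reference, a sketch of the relevant /etc/rc.conf settings on 5.x.
Background fsck relies on soft updates being enabled on the file system, and
the delay value here is only an example.

```shell
# /etc/rc.conf fragment (FreeBSD 5.x or later)
background_fsck="YES"        # check file systems in the background after boot
background_fsck_delay="60"   # seconds to wait before the background check starts

# Soft updates can be enabled on an unmounted file system with tunefs, e.g.:
#   tunefs -n enable /dev/da0s1e    (placeholder device name)
```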

> 4) Any suggestions on how to fix this?

Also, if you update to 5.x, you can run the smartmontools, which will let you
run a drive self-test using SMART.  This will give much better information about
what is going on with the drive, and also give an estimate of its remaining
lifespan.
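
Roughly, the self-test could be driven like this.  The two function names are
invented for illustration and the device names are placeholders; the smartctl
options themselves (-t, -H, -A, -l, -d) are the standard ones.

```shell
# Hypothetical helper: start a long SMART self-test on a drive.
# Any arguments after the device are passed through to smartctl,
# e.g. "-d 3ware,0" for a drive on port 0 of a 3ware controller.
smart_selftest() {
    dev=$1
    shift
    smartctl -t long "$@" "$dev"
}

# Hypothetical helper: show overall health, the attribute table,
# and the self-test log once the test has finished.
smart_report() {
    dev=$1
    shift
    smartctl -H -A -l selftest "$@" "$dev"
}

# Examples (placeholder device names):
#   smart_selftest /dev/ad4
#   smart_selftest /dev/twe0 -d 3ware,0
#   smart_report   /dev/twe0 -d 3ware,0
```

The long test runs on the drive itself and can take a while on large disks, so
the report is only meaningful after it has completed.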

How old are the drives, if you know?

-- 
-Chuck