From owner-freebsd-questions@FreeBSD.ORG  Wed Feb 22 17:32:46 2006
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
From: "Don O'Neil" <don@lizardhill.com>
To: <freebsd-questions@freebsd.org>
Date: Wed, 22 Feb 2006 09:31:54 -0800
Message-ID: <03d101c637d5$dfe89040$0300020a@mickey>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2670
In-Reply-To: <20060219214815.30F6A16A423@hub.freebsd.org>
Thread-Index: AcY1nqqafAI9ZkcWRh6M2m11n/geCwAt11Jg
Cc: 
Subject: Re: 3Ware Escalade Issues

Chuck,
  Thanks for the response, this helps me a lot... My answers are inline:

>Don O'Neil wrote:
>> There appears to be a bad sector on one of the drives according to
>> smartctl, but nothing serious.

>What that may mean is that there have been many bad sectors, which have
>been corrected using the spares, until no more spare sectors are left for
>replacements.

>That drive may well fail catastrophically, soon.

I figured as much, which is why I'm going to re-build the whole array with a
new drive, etc... Fortunately I got all my data off OK without any issues.

>> However, every time the system tried to write to that sector in the
>> array, the system would freeze, and then reboot, and of course it would
>> say the file system isn't clean, etc...
>>
>> Since the file system is 1 TB in size, it would take 8+ hours to FSCK it.
>> The array is only striped, and not mirrored or built with redundancy. I'm
>> basically using the card/driver to make one large volume for a web
>> server.

>OK.  Well, if this data is important to you, you should give consideration
>to using a RAID-1, RAID-10, or RAID-5 configuration to gain redundancy.

Yes, and when I re-build it, it will be RAID-5 rather than just RAID-0.
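
To put numbers on the trade-off between the RAID levels mentioned above, here
is a rough sketch of the usable-capacity arithmetic. It assumes four 250 GB
drives (inferred from the 1 TB striped volume and the drive size, not stated
outright), and the function name and level labels are made up for
illustration. It ignores controller overhead and any limits on which levels a
particular card supports.

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Return the usable capacity in GB for a simple RAID layout.

    Illustrative only: assumes identical drives and ignores metadata overhead.
    """
    if level == "RAID-0":
        # Pure striping: all space is usable, no redundancy.
        return drives * drive_gb
    if level == "RAID-1":
        # Plain mirror of two drives: one drive's worth of space.
        if drives != 2:
            raise ValueError("this sketch treats RAID-1 as a two-drive mirror")
        return drive_gb
    if level == "RAID-10":
        # Striped mirrors: half the drives hold copies.
        if drives < 4 or drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_gb
    if level == "RAID-5":
        # One drive's worth of space goes to parity.
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) * drive_gb
    raise ValueError("unknown RAID level: %s" % level)


if __name__ == "__main__":
    for level in ("RAID-0", "RAID-10", "RAID-5"):
        print(level, usable_capacity_gb(level, 4, 250), "GB")
```

Under those assumptions, moving from RAID-0 to RAID-5 on the same four drives
gives up one drive's worth of space (1000 GB down to 750 GB) in exchange for
surviving a single drive failure.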

>> I have a few questions:
>> 
>> 1) Is this a known bug? I'm running FreeBSD 4.11 (for software
>> compatibility issues at the moment, I will upgrade at some point in the
>> future)

>Normally, the OS will only kill the affected processes using that sector,
>but without knowing where it is, perhaps it's affecting some important file
>like the kernel itself, /bin/sh...?

Actually the only thing that was on the array was a DB, so I think the
failure may have been causing MySQL to go nuts, and cascading up. 

>> 2) How can I trap the errors and eliminate the re-boot issue?

>Shut down the system.  Replace the failing hard drive.  Use dd to make an
>exact copy onto the new drive on some other system, and put the new drive
>back into the array.  Note that the replacement drive must be an exact match
>for this to work, otherwise you will have to backup your data and rebuild
>the array.

>Speaking of which, do you have known-good backups available?

Of course I have backups!! Never work without them. I'm going to re-build
with RAID-5 this time.
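
For anyone following the dd suggestion above, the idea of a block-by-block
copy that keeps going past unreadable sectors can be sketched roughly as
follows. This is an illustration of the technique, not a replacement for dd:
the function name and the `read_block`/`write_block` callables are
hypothetical, and zero-filling unreadable blocks is an assumption about how
you would want bad areas handled.

```python
def clone_blocks(read_block, write_block, total_size, block_size=65536):
    """Copy total_size bytes block by block, zero-filling unreadable blocks.

    read_block(offset, length) returns bytes and may raise OSError.
    write_block(offset, data) stores the data at the same offset.
    Returns the list of offsets whose reads failed.
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            data = read_block(offset, length)
        except OSError:
            # Unreadable block: remember where it was and substitute zeros.
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads so later blocks stay at the right offset.
        data = data.ljust(length, b"\0")
        write_block(offset, data)
        offset += length
    return bad_offsets
```

On a Unix system the two callables could be thin wrappers around `os.pread`
and `os.pwrite` on file descriptors opened for the old and new drives. The
returned offsets show which parts of the copy are zeros rather than real data,
so you know which files to restore from backup.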

>> 3) Is there some way I can do a faster FSCK, or perhaps 'fool' the system
>> into thinking the file system is clean?

>If you update to 5.x or later, you can use background FSCK rather than
>having to wait for the FSCK to complete the way it does under 4.x.

I wasn't aware 5.x could do this. My next question is how are my existing
apps going to be affected by upgrading to 5.x? I have some builds of
packages that were done by a company that is no longer in operation. I
haven't fully figured out how they built the software yet, so I can't
re-build under 5.x yet. If I try to put the ELF binaries and the other
builds from 4.x on 5.x, are they going to run OK or do I just need to give it
a try? Would you suggest going all the way to 6.x or sticking with the 5.x
chain?

>> 4) Any suggestions on how to fix this?

>Also, if you update to 5.x, you can run the smartmon tools, which will let
>you do a drive self-test using SMART, this will give much better information
>about what is going on with the drive, and also give an estimate of its
>remaining lifespan.

Yes, this would help a lot!!!
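
As a rough sketch of how the SMART data mentioned above could be checked from
a script, the snippet below pulls raw attribute values (such as
Reallocated_Sector_Ct, which tracks sectors remapped to spares) out of
`smartctl -A` output. The function names are made up for illustration, the
parser assumes the usual ten-column attribute table, and the device path is
whatever applies on your system. Drives behind a 3ware controller generally
need smartctl's `-d 3ware,N` device type, which can be passed through
`extra_args`.

```python
import subprocess


def parse_smart_attributes(report):
    """Map attribute name -> raw value from `smartctl -A` style text."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and banner lines do not match and are skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            name, raw = fields[1], fields[9]
            if raw.isdigit():
                attributes[name] = int(raw)
    return attributes


def read_smart_attributes(device, extra_args=()):
    """Run smartctl on a device node and parse the result (usually needs root)."""
    command = ["smartctl", "-A", *extra_args, device]
    result = subprocess.run(command, capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)
```

A nonzero and growing Reallocated_Sector_Ct or Current_Pending_Sector value is
the sort of thing worth watching for before a drive gets to the point of
freezing the box.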

>How old are the drives, if you know?

They're less than 2 years old, and still under warranty. This is the second
drive to fail and it's driving me nuts.

They're Maxtor DiamondMax Plus 9 6Y250P0 250 GB PATA drives... Never had a
problem with that particular drive until this batch. 

Can anyone suggest some good 250GB PATA drives for me to use? I might as
well swap them all out since I'm starting over. The 6000 series Escalade
card I'm using doesn't support anything more than 250 GB.

Thanks all again!!!
Don