From: "Don O'Neil"
Date: Wed, 22 Feb 2006 09:31:54 -0800
Subject: Re: 3Ware Escalade Issues
Message-ID: <03d101c637d5$dfe89040$0300020a@mickey>
In-Reply-To: <20060219214815.30F6A16A423@hub.freebsd.org>
List-Id: User questions

Chuck,

Thanks for the response, this helps me a lot... My answers are inline:

>Don O'Neil wrote:
>> There appears to be a bad sector on one of the drives according to smartctl,
>> but nothing serious.
>What that may mean is that there have been many bad sectors, which have been
>corrected using the spares, until no more spare sectors are left for replacements.
>That drive may well fail catastrophically, soon.

I figured as much, which is why I'm going to re-build the whole array with a new drive, etc... Fortunately I got all my data off OK without any issues.

>> However, every time the system tried to write to that sector in the array,
>> the system would freeze, and then reboot, and of course it would say the
>> file system isn't clean, etc...
>>
>> Since the file system is 1 TB in size, it would take 8+ hours to FSCK it.
>> The array is only striped, and not mirrored or built with redundancy. I'm
>> basically using the card/driver to make one large volume for a web server.
>OK. Well, if this data is important to you, you should give consideration to
>using a RAID-1, RAID-10, or RAID-5 configuration to gain redundancy.

Yes, and when I re-build it, it will be RAID-5 rather than just RAID-0.

>> I have a few questions:
>>
>> 1) Is this a known bug? I'm running FreeBSD 4.11 (for software compatibility
>> issues at the moment, I will upgrade at some point in the future)
>Normally, the OS will only kill the affected processes using that sector, but
>without knowing where it is, perhaps it's affecting some important file like the
>kernel itself, /bin/sh...?

Actually, the only thing that was on the array was a DB, so I think the failure may have been causing MySQL to go nuts, and cascading up.

>> 2) How can I trap the errors and eliminate the re-boot issue?
>Shut down the system. Replace the failing hard drive. Use dd to make an exact
>copy onto the new drive on some other system, and put the new drive back into
>the array. Note that the replacement drive must be an exact match for this to
>work, otherwise you will have to backup your data and rebuild the array.
>Speaking of which, do you have known-good backups available?

Of course I have backups!! Never work without them. I'm going to re-build with RAID-5 this time.

>> 3) Is there some way I can do a faster FSCK, or perhaps 'fool' the system
>> into thinking the file system is clean?
>If you update to 5.x or later, you can use background FSCK rather than having to
>wait for the FSCK to complete the way it does under 4.x.

I wasn't aware 5.x could do this.
My next question is how my existing apps are going to be affected by upgrading to 5.x. I have some builds of packages that were done by a company that is no longer in operation. I haven't fully figured out how they built the software yet, so I can't re-build under 5.x yet. If I put the ELF binaries and the other builds from 4.x on 5.x, are they going to run OK, or do I just need to give it a try?

Would you suggest going all the way to 6.x or sticking with the 5.x chain?

>> 4) Any suggestions on how to fix this?
>Also, if you update to 5.x, you can run the smartmon tools, which will let you
>do a drive self-test using SMART, this will give much better information about
>what is going on with the drive, and also give an estimate of its remaining
>lifespan.

Yes, this would help a lot!!!

>How old are the drives, if you know?

They're less than 2 years old, and still under warranty. This is the second drive to fail and it's driving me nuts. They're Maxtor DiamondMax Plus 9 6Y250P0 250 GB PATA drives... I never had a problem with that particular model until this batch.

Can anyone suggest some good 250 GB PATA drives for me to use? I might as well swap them all out since I'm starting over. The 6000 series Escalade card I'm using doesn't support anything larger than 250 GB.

Thanks all again!!!

Don
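
P.S. A rough sketch of the capacity trade-off between the current RAID-0 layout and a RAID-5 rebuild, in Python. The drive count of four is an assumption (inferred from a 1 TB stripe of 250 GB drives), and `usable_gb` is a hypothetical helper for illustration, not anything from the 3Ware tools. It ignores file system overhead and the difference between decimal and binary gigabytes.

```python
def usable_gb(level: str, drives: int, drive_gb: int) -> int:
    """Return the usable capacity in GB for a few common RAID levels."""
    if level == "RAID-0":
        # Pure striping: every drive holds data, no redundancy.
        return drives * drive_gb
    if level == "RAID-5":
        # One drive's worth of space is consumed by distributed parity.
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) * drive_gb
    if level == "RAID-10":
        # Striped mirrors: half of the drives hold copies.
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID-10 needs an even number of drives, at least 4")
        return (drives // 2) * drive_gb
    raise ValueError(f"unsupported level: {level}")


DRIVES = 4      # assumption: 1 TB stripe built from 250 GB drives
DRIVE_GB = 250  # largest drive size the card is said to support

for level in ("RAID-0", "RAID-5", "RAID-10"):
    print(f"{level}: {usable_gb(level, DRIVES, DRIVE_GB)} GB usable")
```

Under those assumptions, moving from RAID-0 to RAID-5 on the same four drives gives up one drive's worth of space in exchange for surviving a single drive failure.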