Message-ID: <64c038660711102109x2ea186afjdd219292d8eed700@mail.gmail.com>
Date: Sat, 10 Nov 2007 22:09:48 -0700
From: Modulok
To: "David Newman"
In-Reply-To: <4736593E.1090905@networktest.com>
References: <4736593E.1090905@networktest.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: dealing with a failing drive

>> I'd welcome suggestions on how (or whether) to try to revive a SCSI
>> drive that's failing.

It depends on how valuable the data on the array is and, more
importantly, how much funding you have at your disposal to fix the
problem. If it were me, I would set aside the bad disk, connect a new
disk to the card, and re-synchronize the array (assuming one of the
members still retains a good copy of the data). Afterwards I would
destroy the old disk or toss it in the trash can, depending on the
sensitivity of the data stored on it.

>> Is there some other way to:
>> b) monitor the health of disks on a Compaq controller so it doesn't
>> get to this point to begin with?

There are various tools out there that attempt to 'monitor' the
condition of disk drives and predict when failure is imminent. For
valuable data, it is safer to set up a mirror and simply toss out bad
disks as they fail. For extremely valuable data, use a 3-disk array.
With a 3-disk setup you will still be covered in the event that an
additional disk craps out during the re-sync.

To quote Google's article on disk failure, regarding SMART:

"...we find that failure prediction models based on SMART parameters
alone are likely to be severely limited in the prediction accuracy,
given that a large fraction of our failed drives have shown no SMART
error signals whatsoever. This result suggests that SMART models are
more useful in predicting trends for large aggregate populations than
for individual components."

http://labs.google.com/papers/disk_failures.pdf

My 2 cents.
-Modulok-

On 11/10/07, David Newman wrote:
> I'd welcome suggestions on how (or whether) to try to revive a SCSI
> drive that's failing.
>
> This is on FreeBSD 6.2-RELENG on a Compaq Proliant DL320, onboard RAID
> and two SCSI drives in a RAID1 array.
>
> Today this system rebooted and hung on Compaq's "what do you want the
> RAID controller to do?" message. I told it to fix any errors.
>
> When I brought the system back up (after running fsck in single-user
> mode), the log had lots of errors like this:
>
> Nov 10 09:00:40 mail kernel: ida0: hard write error
> Nov 10 09:00:40 mail kernel: ida0: invalid request
> Nov 10 09:01:48 mail last message repeated 35 times
> Nov 10 09:03:49 mail last message repeated 571 times
> Nov 10 09:12:27 mail last message repeated 796 times
>
> I vaguely remember trying about a year ago to load a SMART utility from
> the ports collection but it wouldn't work on drives in a RAID array.
>
> Is there some other way to:
>
> a) diagnose/fix the errant disk here?
> b) monitor the health of disks on a Compaq controller so it doesn't get
> to this point to begin with?
>
> thanks in advance
>
> dn
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
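
On question (b): since SMART tools may not see drives behind the RAID controller, one low-tech option is to watch the system log for the ida driver messages quoted above. Below is a minimal sketch in Python, assuming syslog lines in exactly the format shown in the quoted log. The function name count_ida_errors and the two regular expressions are illustrative choices of mine, not part of any FreeBSD or Compaq tool.

```python
import re

# Matches kernel messages from the ida driver, e.g.
# "Nov 10 09:00:40 mail kernel: ida0: hard write error"
IDA_MSG = re.compile(r"kernel: (ida\d+): (.+)$")
# Matches syslog's compression of duplicate lines, e.g.
# "Nov 10 09:01:48 mail last message repeated 35 times"
REPEAT = re.compile(r"last message repeated (\d+) times?$")


def count_ida_errors(lines):
    """Return {(device, message): count}, expanding 'repeated N times' lines."""
    counts = {}
    last_key = None  # most recent ida message that a "repeated" line can refer to
    for line in lines:
        line = line.rstrip()
        m = IDA_MSG.search(line)
        if m:
            last_key = (m.group(1), m.group(2))
            counts[last_key] = counts.get(last_key, 0) + 1
            continue
        r = REPEAT.search(line)
        if r and last_key is not None:
            counts[last_key] += int(r.group(1))
            continue
        # Any other line ends the run, so a later "repeated" line
        # would refer to a non-ida message and must not be counted.
        last_key = None
    return counts


# The log excerpt from the quoted message, used as sample input.
SAMPLE = """\
Nov 10 09:00:40 mail kernel: ida0: hard write error
Nov 10 09:00:40 mail kernel: ida0: invalid request
Nov 10 09:01:48 mail last message repeated 35 times
Nov 10 09:03:49 mail last message repeated 571 times
Nov 10 09:12:27 mail last message repeated 796 times
""".splitlines()

if __name__ == "__main__":
    for (dev, msg), n in sorted(count_ida_errors(SAMPLE).items()):
        print(f"{dev}: {msg}: {n}")
```

In practice you would feed it the lines of the system log (on a default FreeBSD install that is usually /var/log/messages, but check your syslog.conf) from cron and mail yourself when any count is non-zero. This only tells you that the controller is already reporting errors; it does not predict failure, which fits the point above that a mirror plus prompt disk replacement is the real safety net.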