From owner-freebsd-questions@FreeBSD.ORG Wed Feb 14 20:06:43 2007
Date: Wed, 14 Feb 2007 15:06:38 -0500 (EST)
From: "Brian A. Seklecki"
X-X-Sender: lavalamp@arbitor.digitalfreaks.org
To: Alex Zbyslaw, Bob Johnson
Cc: Dave, Robin Becker, freebsd-questions@freebsd.org,
	wmoran@collaborativefusion.com
Subject: geom(4)/gmirror(4) automatic device DEGRADED status promotion
	(WAS:Re: gmirror HD failure detection)
In-Reply-To: <4511745C.2080701@dial.pipex.com>
Message-ID: <20070214150017.G59589@arbitor.digitalfreaks.org>
References: <45116E76.6020009@chamonix.reportlab.co.uk>
	<4511745C.2080701@dial.pipex.com>

All:

For a while our strategy was to use NRPE2 plus a custom Nagios check
(check_raid_fbsdgmirror; ugly-as-hell Perl, but I can make it available
to the public).

However, this morning a drive in a Dell PE1850 (one without a PERC4
controller) started erroring. It has just a regular old (bad) mpt(4)
controller.

The problem is that gmirror(4) never marked the drive as failed. I'd have
to tear through the code to find where the logic is for automatic demotion
of a failed mirror.

Either way, the original thinking behind the Nagios plugin check was that
gmirror(4) would have some threshold of failed attempts to read from or
write to a provider disk, beyond which it would flag the provider as
"DEGRADED".

It's entirely possible that we never had a chance to test it. Now I have
to go back and revisit all of that.

~BAS

On Wed, 20 Sep 2006, Alex Zbyslaw wrote:

> Robin Becker wrote:
>
>> After using Dru Lavigne's excellent article http://tinyurl.com/da66a about
>> Raid-1 I have a full Raid-1 mirror on a new rack server.
>> I'm wondering if anyone can tell me how best to monitor the hardware
>> status to detect imminent failure of one of the disks? Do I use something
>> like smartctl in a cron or what?
>
> Assuming that the disks support SMART then just read the man page for smartd.
> No need for cron. You can also schedule "short" and "long" tests to run in
> off hours. smartmontools is easy to uninstall if it doesn't work for you.
> However, this will tell you that a disk is failing (or failed) which is not
> quite the same as array status. An array (theoretically) might be
> sub-optimal for non-SMART reasons. Someone familiar with gmirror will have
> to answer that bit... but gmirror status -s looks from the man page like it
> might be interesting and *that* could be run from cron and parsed to weed out
> "status OK results".
>
> --Alex

l8*
	-lava (Brian A. Seklecki - Pittsburgh, PA, USA)
	       http://www.spiritual-machines.org/

"...from back in the heady days when "helpdesk" meant nothing, "diskquota"
meant everything, and lives could be bought and sold for a couple of pages
of laser printout - and frequently were."
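
P.S. For the cron idea quoted above (run gmirror status -s and weed out the
OK results), here is a minimal Python sketch. It is not the Perl plugin
mentioned earlier. It assumes the script-friendly output has one line per
component in the form "name status component", for example
"mirror/gm0 COMPLETE ad4 (ACTIVE)", and that COMPLETE is the only healthy
status; check both against gmirror(8) on your release. The function names
and exit code are made up for illustration. Note that this only catches
what gmirror itself reports, so it would not have flagged the drive
described above, which was erroring but never marked as failed.

```python
import subprocess


def unhealthy_mirrors(status_text):
    """Return sorted names of mirrors whose status field is not COMPLETE.

    Assumes each non-empty line looks like: <name> <status> <component...>
    (an assumption about `gmirror status -s` output, not a verified spec).
    """
    bad = set()
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        name, status = fields[0], fields[1]
        if status != "COMPLETE":
            bad.add(name)
    return sorted(bad)


def main():
    """Run the check once; print only on trouble so cron mails only then."""
    result = subprocess.run(
        ["gmirror", "status", "-s"], capture_output=True, text=True
    )
    bad = unhealthy_mirrors(result.stdout)
    if result.returncode != 0 or bad:
        print("gmirror problem:", ", ".join(bad) or result.stderr.strip())
        return 2  # hypothetical non-zero exit code for "needs attention"
    return 0
```

To use it from cron, add "import sys" and a final "sys.exit(main())" line
to the script. Pairing it with smartd, as suggested in the quoted reply,
covers the case where the disk is failing but the mirror still looks
complete.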