From owner-freebsd-questions@FreeBSD.ORG Wed Feb 14 20:56:05 2007
Date: Wed, 14 Feb 2007 15:56:00 -0500 (EST)
From: "Brian A. Seklecki"
To: freebsd-questions@freebsd.org
In-Reply-To: <20070214150017.G59589@arbitor.digitalfreaks.org>
Message-ID: <20070214155407.O59589@arbitor.digitalfreaks.org>
References: <45116E76.6020009@chamonix.reportlab.co.uk> <4511745C.2080701@dial.pipex.com> <20070214150017.G59589@arbitor.digitalfreaks.org>
Subject: geom(4)/gmirror(4) automatic device DEGRADED status demotion (WAS: Re: gmirror HD failure detection)

On Wed, 14 Feb 2007, Brian A. Seklecki wrote:

> All:
>
> For a while our strategy was to use NRPE2 + a custom Nagios check
> (check_raid_fbsdgmirror -- ugly-as-hell Perl, but which I can make
> available to the public).
>
> However, this morning a drive in a Dell PE1850 (one without a PERC4
> controller) started erroring.  It has just a regular old (bad) mpt(4)
> controller.
>
> The problem is that gmirror(4) never marked the drive as failed.
>
> I'd have to tear through the code to find where the logic is for automatic
> demotion of a failed mirror.
>
> Either way, the original thinking behind the Nagios plugin check was that
> gmirror(4) would have some threshold of failed attempts to read from or
> write to a provider disk, beyond which it would flag the provider as
> "DEGRADED".
>
> It's entirely possible that we never had a chance to test it.
>
> Now I have to go back and revisit all of that.
>
> ~BAS
>
> On Wed, 20 Sep 2006, Alex Zbyslaw wrote:
>
>> Robin Becker wrote:
>>
>>> After using Dru Lavigne's excellent article http://tinyurl.com/da66a about
>>> Raid-1 I have a full Raid-1 mirror on a new rack server.
>>> I'm wondering if anyone can tell me how best to monitor the hardware
>>> status to detect imminent failure of one of the disks? Do I use
>>> something like smartctl in a cron or what?
>>
>> Assuming that the disks support SMART then just read the man page for
>> smartd.  No need for cron.  You can also schedule "short" and "long" tests
>> to run in off hours.  smartmontools is easy to uninstall if it doesn't work
>> for you.  However, this will tell you that a disk is failing (or failed)
>> which is not quite the same as array status.  An array (theoretically)
>> might be sub-optimal for non-SMART reasons.  Someone familiar with gmirror
>> will have to answer that bit... but gmirror status -s looks from the man
>> page like it might be interesting and *that* could be run from cron and
>> parsed to weed out "status OK" results.
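
A minimal sketch of the cron-and-parse idea quoted above, in Python rather
than the Perl of the Nagios check mentioned earlier.  It assumes that each
line of "gmirror status -s" output has the form "name status component"
(for example "mirror/gm0 COMPLETE ad4 (ACTIVE)") and that a healthy mirror
reports COMPLETE; check both against the output on your own system.  The
function names and the sample text are hypothetical.  Note that this only
reports what gmirror itself flags, so it would not have caught the case
described above, where a drive was erroring but was never marked as failed.

```python
import subprocess


def unhealthy_mirrors(status_text):
    """Return {mirror_name: status} for every mirror that is not COMPLETE.

    Assumes each line looks like "name status component ...", which is an
    assumption about the "gmirror status -s" output format.
    """
    problems = {}
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        name, status = fields[0], fields[1]
        if status != "COMPLETE":
            problems[name] = status
    return problems


def read_gmirror_status():
    """Run "gmirror status -s" and return its output (needs a FreeBSD host)."""
    result = subprocess.run(
        ["gmirror", "status", "-s"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample text, used here so the sketch runs without gmirror.
SAMPLE = (
    "mirror/gm0 COMPLETE ad4 (ACTIVE)\n"
    "mirror/gm0 COMPLETE ad6 (ACTIVE)\n"
    "mirror/gm1 DEGRADED ad8 (ACTIVE)\n"
)

for mirror, status in unhealthy_mirrors(SAMPLE).items():
    print(f"WARNING: {mirror} is {status}")
```

From cron, one would pass read_gmirror_status() instead of SAMPLE; since
cron mails any output to the owner of the crontab, staying silent for
healthy mirrors weeds out the "status OK" results.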