Date: Wed, 14 Feb 2007 15:56:00 -0500 (EST)
From: "Brian A. Seklecki" <lavalamp@spiritual-machines.org>
To: freebsd-questions@freebsd.org
Subject: geom(4)/gmirror(4) automatic device DEGRADED status demotion (WAS:Re: gmirror HD failure detection)
Message-ID: <20070214155407.O59589@arbitor.digitalfreaks.org>
In-Reply-To: <20070214150017.G59589@arbitor.digitalfreaks.org>
References: <45116E76.6020009@chamonix.reportlab.co.uk> <4511745C.2080701@dial.pipex.com> <20070214150017.G59589@arbitor.digitalfreaks.org>
On Wed, 14 Feb 2007, Brian A. Seklecki wrote:

> All:
>
> For a while our strategy was to use NRPE2 + a custom Nagios check
> (check_raid_fbsdgmirror -- ugly-as-hell Perl, but which I can make
> available to the public).
>
> However, this morning a drive in a Dell PE1850 (one without a PERC4
> controller) started erroring. It has just a regular old (bad) mpt(4)
> controller.
>
> The problem is that gmirror(4) never marked the drive as failed.
>
> I'd have to tear through the code to find where the logic is for automatic
> demotion of a failed mirror.
>
> Either way, the original thinking behind the Nagios plugin check was that
> gmirror(4) would have some threshold of failed attempts to write/read from a
> provider disk that should lead to flagging a provider as "DEGRADED".
>
> It's entirely possible that we never had a chance to test it.
>
> Now I have to go back and revisit all of that.
>
> ~BAS
>
> On Wed, 20 Sep 2006, Alex Zbyslaw wrote:
>
>> Robin Becker wrote:
>>
>>> After using Dru Lavigne's excellent article http://tinyurl.com/da66a about
>>> Raid-1 I have a full Raid-1 mirror on a new rack server. I'm wondering if
>>> anyone can tell me how best to monitor the hardware status to detect
>>> imminent failure of one of the disks? Do I use something like smartctl in
>>> a cron or what?
>>
>> Assuming that the disks support SMART then just read the man page for
>> smartd. No need for cron. You can also schedule "short" and "long" tests
>> to run in off hours. smartmontools is easy to uninstall if it doesn't work
>> for you. However, this will tell you that a disk is failing (or failed)
>> which is not quite the same as array status. An array (theoretically)
>> might be sub-optimal for non-SMART reasons. Someone familiar with gmirror
>> will have to answer that bit... but gmirror status -s looks from the man
>> page like it might be interesting and *that* could be run from cron and
>> parsed to weed out "status OK results".
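
The cron-and-parse idea quoted above could be sketched roughly as follows. This is only a minimal illustration in Python, not the check_raid_fbsdgmirror plugin mentioned in the thread. It assumes that each line of `gmirror status -s` output is whitespace-separated with the mirror name in the first field and its status in the second, and that a healthy mirror reports COMPLETE; both assumptions should be checked against real output on the target system. The names find_unhealthy and main are hypothetical.

```python
import subprocess

# Assumption: gmirror reports this status string for a fully healthy mirror.
HEALTHY = "COMPLETE"


def find_unhealthy(status_text):
    """Return sorted (mirror, status) pairs whose status is not HEALTHY.

    Assumes every non-empty line looks like:
        <mirror name> <status> <component> (<component state>)
    """
    problems = set()
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        name, status = fields[0], fields[1]
        if status != HEALTHY:
            problems.add((name, status))
    return sorted(problems)


def main():
    """Run gmirror and return a cron-friendly exit code (0 = all healthy)."""
    result = subprocess.run(
        ["gmirror", "status", "-s"], capture_output=True, text=True
    )
    if result.returncode != 0:
        print("gmirror status failed: " + result.stderr.strip())
        return 2
    problems = find_unhealthy(result.stdout)
    for name, status in problems:
        print(name + ": " + status)
    return 1 if problems else 0
```

A cron wrapper would call `sys.exit(main())` and rely on cron mailing any output. Note that a check like this only reports what gmirror(4) itself has flagged; it would not have caught the case described above, where the drive was erroring but the mirror was never marked as failed.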