Date: Sat, 18 Nov 2006 19:40:53 +0100
From: Anders Nordby <anders@FreeBSD.org>
To: Edwin Groothuis <edwin@mavetju.org>
Cc: freebsd-proliant@freebsd.org
Subject: Re: RAID monitoring tools
Message-ID: <20061118184053.GA55302@fupp.net>
In-Reply-To: <20061029043926.GI90772@k7.mavetju>
References: <20061029043926.GI90772@k7.mavetju>
--4Ckj6UjgE2iN1+kY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi,

On Sun, Oct 29, 2006 at 03:39:26PM +1100, Edwin Groothuis wrote:
> Last week we had two failing disks, and if it wasn't for a walk
> through the datacenter (which is off-site, and ten dollars away)
> we wouldn't have noticed it. I've read the thread about hpacucli,
> and have had my failed attempts to get it up and running under the
> linuxolator.
>
> So the question is: how do *you* monitor the status of your disks
> and RAID arrays? Any suggestions will be appreciated.

Apart from using camcontrol, you can do log monitoring to catch events
from the ciss driver. On a server that had a failing disk recently, I
got this in the messages log:

Nov 14 03:17:44 aicache7 kernel: ciss0: *** SCSI bus speed downshifted, SCSI port 2
Nov 14 03:17:48 aicache7 kernel: ciss0: *** Physical drive failure: SCSI port 2 ID 1
Nov 14 03:17:48 aicache7 kernel: ciss0: *** State change, logical drive 0
Nov 14 03:17:48 aicache7 kernel: ciss0: logical drive 0 (pass0) changed status OK->interim recovery, spare status 0x0

Attached is also a Nagios plugin to check the status of a Compaq RAID
using camcontrol.

Cheers,

-- 
Anders.

--4Ckj6UjgE2iN1+kY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=check_raid

#! /usr/bin/perl
# anders@aftenposten.no, 2006-08-22
# check status of COMPAQ RAID volumes in FreeBSD

%modelist=();
$okstatus="OK";
$arraytxt="COMPAQ RAID";
$ENV{PATH} = "/usr/local/bin:/usr/local/sbin:$ENV{PATH}:/sbin";
$volumes = 0;

# Exit codes follow the Nagios plugin convention:
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN
if (!open(CAM, "sudo -u root camcontrol devlist |")) {
	print "ERROR, could not open sudo -u root /sbin/camcontrol.\n";
	exit(3);
}

# Count the volumes per status string found in the devlist output
while(<CAM>) {
	next if ($_ !~ /$arraytxt/);
	$volumes++;
	$mode = $_;
	chomp($mode);
	# Strip everything around the status, leaving e.g. "OK"
	$mode =~ s@<COMPAQ RAID \d+\s+VOLUME @@;
	$mode =~ s@>.*@@;
#	print "Mode: $mode\n";
	if (defined $modelist{"$mode"}) {
		$modelist{"$mode"}++;
	} else {
		$modelist{"$mode"}=1;
	}
}
close(CAM);

if ($volumes == 0) {
	print "No $arraytxt arrays found. Sudo problem?\n";
	exit(3);
} elsif ($volumes == $modelist{"$okstatus"}) {
	# All volumes are OK
	print $modelist{"$okstatus"} . " of " . $volumes . " volumes OK\n";
	exit(0);
} else {
	# Not all volumes are OK
	print "ERROR, $volumes volumes:";
	foreach $key (keys %modelist) {
		print " " . $modelist{"$key"} . " $key"
	}
	print "\n";
	# This is critical
	exit(2);
}

--4Ckj6UjgE2iN1+kY--
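
The log-monitoring approach mentioned in the message could be sketched roughly as follows. This is a minimal illustration in Python (not the Perl of the attached plugin), and it assumes that ciss messages always look like the four sample lines quoted above. The function name find_ciss_events and the regular expression are hypothetical, not part of any existing tool, and other ciss driver messages may be worded differently.

```python
import re

# Assumed line shape, derived only from the sample log lines in the message:
#   "<timestamp> <host> kernel: ciss<N>: <message>"
CISS_LINE = re.compile(r"kernel: (ciss\d+): (.*)$")


def find_ciss_events(lines):
    """Return (controller, message) pairs for ciss lines that look like events.

    A line counts as an event if its message starts with "***" or reports
    a status change, as in the sample log. This is a heuristic, not a
    complete list of what the driver can emit.
    """
    events = []
    for line in lines:
        match = CISS_LINE.search(line.rstrip("\n"))
        if not match:
            continue
        controller, message = match.groups()
        if message.startswith("***") or "changed status" in message:
            # Drop the leading "*** " marker, if present.
            events.append((controller, message.lstrip("* ")))
    return events


if __name__ == "__main__":
    sample = [
        "Nov 14 03:17:44 aicache7 kernel: ciss0: *** SCSI bus speed downshifted, SCSI port 2",
        "Nov 14 03:17:48 aicache7 kernel: ciss0: *** Physical drive failure: SCSI port 2 ID 1",
        "Nov 14 03:17:48 aicache7 kernel: ciss0: *** State change, logical drive 0",
        "Nov 14 03:17:48 aicache7 kernel: ciss0: logical drive 0 (pass0) changed status OK->interim recovery, spare status 0x0",
    ]
    for controller, message in find_ciss_events(sample):
        print(f"{controller}: {message}")
```

In practice the function would be fed lines from the messages log (for example /var/log/messages on a default FreeBSD install) and any returned events would be mailed or passed to the monitoring system. Remembering which lines have already been reported is left out of this sketch.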