Date:      Sat, 18 Nov 2006 19:40:53 +0100
From:      Anders Nordby <anders@FreeBSD.org>
To:        Edwin Groothuis <edwin@mavetju.org>
Cc:        freebsd-proliant@freebsd.org
Subject:   Re: RAID monitoring tools
Message-ID:  <20061118184053.GA55302@fupp.net>
In-Reply-To: <20061029043926.GI90772@k7.mavetju>
References:  <20061029043926.GI90772@k7.mavetju>


--4Ckj6UjgE2iN1+kY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi,

On Sun, Oct 29, 2006 at 03:39:26PM +1100, Edwin Groothuis wrote:
> Last week we had two failing disks, and if it wasn't for a walk
> through the datacenter (which is off-site, and ten dollars away)
> we wouldn't have noticed it. I've read the thread about hpacucli,
> and have had my failed attempts to get it up and running under the
> linuxolator.
> 
> So the question is: how do *you* monitor the status of your disks
> and RAID arrays? Any suggestions will be appreciated.

Apart from using camcontrol, you can do log monitoring to catch events
from the ciss driver. On a server that had a failing disk recently, I
got this in the messages log:

Nov 14 03:17:44 aicache7 kernel: ciss0: *** SCSI bus speed downshifted, SCSI port 2
Nov 14 03:17:48 aicache7 kernel: ciss0: *** Physical drive failure: SCSI port 2 ID 1
Nov 14 03:17:48 aicache7 kernel: ciss0: *** State change, logical drive 0
Nov 14 03:17:48 aicache7 kernel: ciss0: logical drive 0 (pass0) changed status OK->interim recovery, spare status 0x0
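
As a rough illustration of the log monitoring idea, here is a minimal
Perl sketch that picks ciss events out of syslog-style lines. The two
patterns are only inferred from the sample entries above and are not a
complete list of what the driver can log. The function name
ciss_alerts and the third sample line are made up for the example, and
it assumes each entry occupies a single line in the log file.

```perl
use strict;
use warnings;

# Return the lines that look like ciss alerts: either a "***" event
# or a logical drive leaving the OK state. Patterns are assumptions
# based on the sample log entries, not on driver documentation.
sub ciss_alerts {
    my @lines = @_;
    return grep {
        /ciss\d+: \*\*\* /
            || /ciss\d+: logical drive \d+ .* changed status OK->/
    } @lines;
}

# Two entries taken from the log above, plus one made-up unrelated line.
my @sample = (
    "Nov 14 03:17:48 aicache7 kernel: ciss0: *** Physical drive failure: SCSI port 2 ID 1\n",
    "Nov 14 03:17:48 aicache7 kernel: ciss0: logical drive 0 (pass0) changed status OK->interim recovery, spare status 0x0\n",
    "Nov 14 03:20:00 aicache7 sshd[123]: unrelated line\n",
);

# Print only the entries that matched one of the patterns.
print ciss_alerts(@sample);
```

In practice the lines would come from the messages log rather than a
hard-coded list, and a match would raise an alert instead of printing.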

Attached is also a Nagios plugin to check the status of a Compaq RAID
using camcontrol.

Cheers,

-- 
Anders.

--4Ckj6UjgE2iN1+kY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=check_raid

#! /usr/bin/perl
# anders@aftenposten.no, 2006-08-22
# check status of COMPAQ RAID volumes in FreeBSD

%modelist=();
$okstatus="OK";
$arraytxt="COMPAQ RAID";
$ENV{PATH} = "/usr/local/bin:/usr/local/sbin:$ENV{PATH}:/sbin";

$volumes = 0;

if (!open(CAM, "sudo -u root camcontrol devlist |")) {
	print "ERROR, could not run 'sudo -u root camcontrol devlist': $!\n";
	exit(3);
}
while(<CAM>) {
	next if ($_ !~ /$arraytxt/);
	$volumes++;
	$mode = $_;
	chomp($mode);
	$mode =~ s@<COMPAQ RAID \d+\s+VOLUME @@;
	$mode =~ s@>.*@@;
#	print "Mode: $mode\n";
	# Count volumes per status; ++ on a missing key starts from 1.
	$modelist{$mode}++;
}
close(CAM);

if ($volumes == 0) {
	print "No $arraytxt arrays found. Sudo problem?\n";
	exit(3);
} elsif ($volumes == ($modelist{$okstatus} || 0)) {
	# All volumes are OK
	print $modelist{"$okstatus"} . " of " . $volumes . " volumes OK\n";
	exit(0);
} else {
	# Not all volumes are OK
	print "ERROR, $volumes volumes:";
	foreach $key (keys %modelist) {
		print " " . $modelist{"$key"} . " $key";
	}
	print "\n";
	# This is critical
	exit(2);
}

--4Ckj6UjgE2iN1+kY--
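
The status extraction in the plugin above can be tried without sudo or
a RAID controller by feeding it canned lines. The sketch below is a
variant, not the attached script: it uses a single capturing regex
instead of the two substitutions, and the function name count_states
is made up. The input lines are hypothetical, shaped only to fit the
plugin's regular expressions, and NOTOK is a placeholder since the
message does not show which non-OK strings camcontrol reports.

```perl
use strict;
use warnings;

# Count volume states from "camcontrol devlist"-style lines.
# Lines that do not describe a COMPAQ RAID volume are skipped.
sub count_states {
    my %count;
    for my $line (@_) {
        next unless $line =~ /<COMPAQ RAID \d+\s+VOLUME ([^>]*)>/;
        $count{$1}++;
    }
    return %count;
}

# Hypothetical input; NOTOK is a placeholder status, not a real one.
my @devlist = (
    "<COMPAQ RAID 1  VOLUME OK>   at scbus0 target 0 lun 0 (da0,pass0)\n",
    "<COMPAQ RAID 5  VOLUME NOTOK>   at scbus0 target 1 lun 0 (da1,pass1)\n",
);

my %states = count_states(@devlist);
print "$_: $states{$_}\n" for sort keys %states;
```

Keeping the parsing in a function that takes lines as arguments makes
it possible to check the logic separately from the camcontrol call.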



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061118184053.GA55302>