From owner-freebsd-bugs@FreeBSD.ORG Mon Dec 6 12:10:08 2010
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ABD66106564A
	for ; Mon, 6 Dec 2010 12:10:08 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 834618FC12
	for ; Mon, 6 Dec 2010 12:10:08 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id oB6CA8oH035554
	for ; Mon, 6 Dec 2010 12:10:08 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id oB6CA8tp035553;
	Mon, 6 Dec 2010 12:10:08 GMT
	(envelope-from gnats)
Resent-Date: Mon, 6 Dec 2010 12:10:08 GMT
Resent-Message-Id: <201012061210.oB6CA8tp035553@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Marian Jamrich
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7B6AF1065693
	for ; Mon, 6 Dec 2010 12:03:41 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (unknown [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 673F78FC0C
	for ; Mon, 6 Dec 2010 12:03:41 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id oB6C3f2w041449
	for ; Mon, 6 Dec 2010 12:03:41 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id oB6C3fYw041448;
	Mon, 6 Dec 2010 12:03:41 GMT
	(envelope-from nobody)
Message-Id: <201012061203.oB6C3fYw041448@red.freebsd.org>
Date: Mon, 6 Dec 2010 12:03:41 GMT
From: Marian Jamrich
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: misc/152859: [new port] net-mgmt/nagios-check_hdd_health , is a Nagios plug-in written in shell to check your HDD health using SmartMonTools
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 06 Dec 2010 12:10:08 -0000

>Number:         152859
>Category:       misc
>Synopsis:       [new port] net-mgmt/nagios-check_hdd_health , is a Nagios plug-in written in shell to check your HDD health using SmartMonTools
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 06 12:10:08 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Marian Jamrich
>Release:        8.2 prerelease
>Organization:
>Environment:
>Description:
check_hdd_health is a Nagios plug-in written in shell to check your HDD health using SmartMonTools.

The script checks the following S.M.A.R.T. values of the HDD:

- Spin Retry Count
- Reallocated Sector Ct
- Reallocated Event Count
- Current Pending Sector
- Offline Uncorrectable
- Total health test
>How-To-Repeat:

>Fix:

Patch attached with submission follows:

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	check_hdd_health
#
echo x - check_hdd_health
sed 's/^X//' >check_hdd_health << '53eb126359c9c0d8f2d23c32c84ef809'
X#!/bin/sh
X#
XPATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/usr/local/bin
X
XST_OK=0
XST_WR=1
XST_CR=2
XST_UN=3
X
Xsmartctl=$(which smartctl)
X
X## Smartmontools
XSMT=Smartmontools
X
X# Plugin name
XPROGNAME=`basename $0`
X
X# Version
XVERSION="Version 1.0"
X
X# Author
XAUTHOR="Marian Jamrich"
X
XTMPFILE=/tmp/smart.nagios.$$
X
X# Clean up when done or when aborting
Xtrap "rm -f ${TMPFILE}" 0 1 2 3 15
X
X#print_version() {
X#	echo "$PROGNAME $VERSION $1"
X#}
X
Xmini_help() {
X	echo "Usage $0 --device $device --without [src rsc rec cps ou]"
X}
X
Xprint_help() {
X	clear;
X	echo "*********************************************************************************"
X	echo "* $PROGNAME $VERSION $1""($AUTHOR) (2010) *"
X	echo "*********************************************************************************"
X	echo "This is a Nagios plugin to check HDD health from S.M.A.R.T. using Smartmontools."
X	echo '
XThe S.M.A.R.T. attributes are specific properties (parameters) of various parts of a disk.
XS.M.A.R.T. uses attributes to monitor the disk condition and to analyze its reliability.
X
XThe script checks the following S.M.A.R.T. attributes of the HDD (if your HDD supports them):
X
X** Spin Retry Count (src) **
XCount of retried spin-start attempts. This attribute stores the total count of spin-start attempts needed to reach full operational speed (under the
Xcondition that the first attempt was unsuccessful). A rising raw value (a falling normalized value) of this attribute is a sign of problems in the hard
Xdisk mechanical subsystem.
X
X** Reallocated Sector Count (rsc) **
XCount of reallocated sectors. When the hard drive finds a read/write/verification error, it marks the sector as "reallocated" and transfers its data to a
Xspecial reserved area (spare area). This process is also known as remapping, and reallocated sectors are called remaps. This is why, on modern hard
Xdisks, you cannot see "bad blocks" while testing the surface: all bad blocks are hidden in reallocated sectors.
X
X** Reallocated Event Count (rec) **
XCount of remap operations (transferring data from a bad sector to the special reserved disk area, the spare area). The raw value of this attribute shows
Xthe total number of attempts to transfer data from reallocated sectors to the spare area. Unsuccessful attempts are counted as well as successful ones.
X
X** Current Pending Sector (cps) **
XCurrent count of unstable sectors (waiting for remapping). The raw value of this attribute indicates the total number of sectors waiting for remapping.
XLater, when some of these sectors are read successfully, the value is decreased. If errors still occur when reading a sector, the hard drive will try
Xto restore the data, transfer it to the reserved disk area (spare area) and mark the sector as remapped. If this attribute value stays above zero, it
Xindicates that the quality of the corresponding surface area is low.
X
X** Offline Uncorrectable (ou) **
XQuantity of uncorrectable errors. The raw value of this attribute indicates the total number of uncorrectable errors when reading/writing a sector.
XA rise in the value of this attribute indicates evident defects of the disk surface and/or problems in the hard disk drive mechanical subsystem.
X
X** Total health test (pass) **
XThis is the overall health self-assessment reported by Smartmontools. If the overall disk state is healthy, Smartmontools reports it as "PASSED".
X	'
X	echo "Nagios states:"
X	echo
X	echo "OK - if all values are \"0\"."
X	echo "Warning - if only \"Spin Retry Count\" and/or \"Reallocated Event Count\" are non-zero, each between 1 and 9."
X	echo "Critical - if any other value is greater than \"0\", if \"Spin Retry Count\" or \"Reallocated Event Count\" is 10 or more, or if the health test is not \"PASSED\"."
X	echo -e "\n---------------------------------------------------------------------"
X	echo "Usage:"
X	echo "$0 --device /dev/ad0 [--without [src rsc rec cps ou]]"
X	echo "---------------------------------------------------------------------"
X	exit $ST_UN
X}
X
Xcase "$1" in
X	--help|-h|--usage|-u)
X		print_help
X		exit $ST_UN
X		;;
X	-d | --device)
X		device=$2
X		;;
X	-V)
X		echo "$PROGNAME $VERSION"
X		exit
X		;;
X	*)
X		echo "Unknown argument: $1"
X		echo "For more information please try -h or --help!"
X		exit $ST_UN
X		;;
Xesac
Xshift
X
Xtest -z "$device" && echo -e "\nYou forgot to define a device! Please try \"-h\" or \"--help\" for help." && exit $ST_UN
Xtest `uname` != "FreeBSD" && echo "This plugin is only for FreeBSD." && exit $ST_UN
X
Xif [ ! -e "$device" ]; then
X	echo
X	echo "Unknown device \"$device\"!"
X	exit $ST_UN
Xfi
X
Xif [ -z "$smartctl" ]; then
X	echo -e "\n$SMT is not installed. Please install it from http://smartmontools.sourceforge.net or with pkg_add -r \"smartmontools\"..."
X	exit $ST_UN
Xfi
X
X$smartctl -a "$device" > ${TMPFILE}
XSMART_SUPPORT=`awk '/SMART support is/ {print $4}' ${TMPFILE} | tail -n 1`
X
Xif [ "${SMART_SUPPORT}" = "Unavailable" ]; then
X	echo -e "\nS.M.A.R.T. support is unavailable for $device: the device lacks S.M.A.R.T. capability."
X	exit $ST_UN
Xelif [ "${SMART_SUPPORT}" != "Enabled" ]; then
X	echo -e "\nS.M.A.R.T. support is not enabled for $device in $SMT. Run \"smartctl -s on $device\" to turn it on, or the device does not support S.M.A.R.T."
X	exit $ST_UN
Xfi
X
X## start S.M.A.R.T test and set variables (field 10 is the RAW_VALUE column)
Xsrc=`awk '/Spin_Retry_Count/ {print $10}' ${TMPFILE} `
Xrsc=`awk '/Reallocated_Sector_Ct/ {print $10}' ${TMPFILE} `
Xrec=`awk '/Reallocated_Event_Count/ {print $10}' ${TMPFILE} `
Xcps=`awk '/Current_Pending_Sector/ {print $10}' ${TMPFILE} `
Xou=`awk '/Offline_Uncorrectable/ {print $10}' ${TMPFILE} `
Xpass=`awk -F\: '/test result/ { if ( $2 == " PASSED") print "PASSED"; else print "FAILED" }' ${TMPFILE} `
X
X## if one or more S.M.A.R.T attributes are not supported by your HDD, list them after --without and their values are set to "0"
Xfor arg; do
X	case "$arg" in
X		src)	src=0;;
X		rsc)	rsc=0;;
X		rec)	rec=0;;
X		cps)	cps=0;;
X		ou)	ou=0;;
X	esac
Xdone
X
X# test whether your HDD supports all parameters:
X[ -z "$src" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Spin_Retry_Count. Please try \"--without src\"." && mini_help && exit $ST_UN
X[ -z "$rsc" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Reallocated_Sector_Ct. Please try \"--without rsc\"." && mini_help && exit $ST_UN
X[ -z "$rec" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Reallocated_Event_Count. Please try \"--without rec\"." && mini_help && exit $ST_UN
X[ -z "$cps" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Current_Pending_Sector. Please try \"--without cps\"." && mini_help && exit $ST_UN
X[ -z "$ou" ] && echo -e "***********\n** ERROR **\n***********\n${device} does not support Offline_Uncorrectable. Please try \"--without ou\"." && mini_help && exit $ST_UN
X
X# Nagios performance data: space-separated numeric label=value pairs
Xperfdata="src=$src rsc=$rsc rec=$rec cps=$cps ou=$ou"
X
X##### finally run test, print result and set exit code #####
Xif [ $src -eq 0 ] && [ $rsc -eq 0 ] && [ $rec -eq 0 ] && [ $cps -eq 0 ] && [ $ou -eq 0 ] && [ "$pass" = "PASSED" ]; then
X	echo "OK - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X	exit $ST_OK
Xelif [ $src -lt 10 ] && [ $rec -lt 10 ] && [ $rsc -eq 0 ] && [ $cps -eq 0 ] && [ $ou -eq 0 ] && [ "$pass" = "PASSED" ]; then
X	# not OK, so src and/or rec is between 1 and 9 while everything else is clean
X	echo "WARNING - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X	exit $ST_WR
Xelse
X	echo "CRITICAL - HDD S.M.A.R.T health: src=$src, rsc=$rsc, rec=$rec, cps=$cps, ou=$ou, HEALTH_STATUS=$pass for $device. |${perfdata}"
X	exit $ST_CR
Xfi
53eb126359c9c0d8f2d23c32c84ef809
exit

>Release-Note:
>Audit-Trail:
>Unformatted:
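The script reads each attribute with awk '/Spin_Retry_Count/ {print $10}', which matches the attribute name anywhere on a line. A minimal sketch of a stricter lookup, assuming the usual smartctl attribute-table layout (ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) in which RAW_VALUE is the 10th whitespace-separated field. The function name raw_value and the two sample rows (with made-up values) are hypothetical and only illustrate the idea:

```shell
#!/bin/sh
# Hypothetical helper (not part of the submitted script): print the
# RAW_VALUE column (field 10) of the S.M.A.R.T. attribute whose name
# equals $1 exactly, reading a "smartctl -A" style table on stdin.
# Prints nothing if the attribute is absent, so callers can test with -z.
raw_value() {
	awk -v name="$1" '$2 == name { print $10; exit }'
}

# Two illustrative attribute rows (values are made up for the example).
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0'

printf '%s\n' "$sample" | raw_value Reallocated_Sector_Ct   # prints 3
printf '%s\n' "$sample" | raw_value Offline_Uncorrectable   # prints nothing
```

In the plugin this would be called as, for example, rsc=`raw_value Reallocated_Sector_Ct < ${TMPFILE}`; an empty result still triggers the existing "--without" hint.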
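The help text defines the Nagios states: OK when every value is 0, WARNING when only Spin Retry Count and/or Reallocated Event Count are between 1 and 9, CRITICAL otherwise. A sketch of that rule as a standalone sh function, assuming integer raw values and the PASSED/FAILED string the script stores in "pass". The name classify is hypothetical, and it returns the Nagios exit code instead of exiting, so it can be tested in isolation:

```shell
#!/bin/sh
# Hypothetical helper: return the Nagios exit code for the thresholds in
# the plugin's help text. Arguments (raw integer values, then the overall
# health result): $1=src $2=rsc $3=rec $4=cps $5=ou $6=PASSED|FAILED
#   0 OK       - every value is 0 and the health test PASSED
#   1 WARNING  - only src and/or rec are non-zero, each below 10
#   2 CRITICAL - anything else
classify() {
	# Any reallocated, pending or uncorrectable sector, or a failed
	# overall test, is critical regardless of the other values.
	if [ "$6" != "PASSED" ] || [ "$2" -ne 0 ] || [ "$4" -ne 0 ] || [ "$5" -ne 0 ]; then
		return 2
	fi
	# Nothing left to report.
	if [ "$1" -eq 0 ] && [ "$3" -eq 0 ]; then
		return 0
	fi
	# Spin retries and/or reallocation events, but fewer than 10 of each.
	if [ "$1" -lt 10 ] && [ "$3" -lt 10 ]; then
		return 1
	fi
	return 2
}

classify 0 0 0 0 0 PASSED; echo "$?"   # prints 0 (OK)
classify 3 0 0 0 0 PASSED; echo "$?"   # prints 1 (WARNING)
classify 0 1 0 0 0 PASSED; echo "$?"   # prints 2 (CRITICAL)
```

Checking the critical conditions first keeps each remaining branch to a single, simple test.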
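The script's argument handling looks only at "$1" in its case statement, so --device must come first, and every remaining word is then scanned for attribute names. A sketch of an order-independent parser, assuming the same option names (-d/--device and --without). The function name parse_args, the global "skip", and the use of exit code 3 (UNKNOWN, the value of ST_UN) for usage errors are hypothetical choices:

```shell
#!/bin/sh
# Hypothetical order-independent parser (a sketch, not the submitted code).
# Accepts "-d|--device DEV" and "--without NAME [NAME ...]" in any order.
# Sets the globals $device and $skip (space-separated attribute names to
# force to 0) and returns 3 (UNKNOWN, like ST_UN) on any usage error.
parse_args() {
	device= skip=
	while [ $# -gt 0 ]; do
		case "$1" in
		-d|--device)
			[ $# -ge 2 ] || return 3   # option needs a value
			device=$2
			shift 2
			;;
		--without)
			shift
			# Collect attribute names until the next option or the end.
			while [ $# -gt 0 ]; do
				case "$1" in
				src|rsc|rec|cps|ou) skip="$skip $1"; shift ;;
				-*) break ;;
				*) return 3 ;;   # unknown attribute name
				esac
			done
			;;
		*)
			return 3
			;;
		esac
	done
	[ -n "$device" ] || return 3   # --device is mandatory
}

parse_args --without src rec --device /dev/ad0 && echo "device=$device skip=$skip"
# prints: device=/dev/ad0 skip= src rec
```

A caller could then zero each skipped value with a pattern test such as case " $skip " in *" src "*) src=0;; esac.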