Date:      Sat, 14 May 2005 12:44:41 +0100
From:      Mike Woods <Mike@the-rubber-chicken-network.co.uk>
To:        Warren Block <wblock@wonkity.com>
Cc:        Duane Winner <dwinner-lists@att.net>
Subject:   Re: monitoring and alerting software  ????
Message-ID:  <4285E4A9.1040604@the-rubber-chicken-network.co.uk>
In-Reply-To: <20050512141024.U37797@wonkity.com>
References:  <428394B2.20409@att.net> <20050512141024.U37797@wonkity.com>

Warren Block wrote:
> On Thu, 12 May 2005, Duane Winner wrote:
> 
>> Does anybody have recommendations for a good solution to alert me 
>> while I am not at work if something goes wrong with my 
>> infrastructure/network/servers?
>> In other words, if I am at home, I need to be alerted if one of my 
>> FreeBSD servers goes down, but also if the router, firewall or switches 
>> go haywire.
> 
> 
> Here's something I wrote recently on setting up Nagios on FreeBSD:
> 
> http://www.wonkity.com/~wblock/nagios.pdf

Nagios is a good choice indeed. I've recently implemented a monitoring 
system for our rack at Redbus using Nagios, and I'm rather impressed with 
how well it all works!

I've picked up a couple of "tricks" while doing this. The first is 
simply to make very good use of service templates. Most of the services 
we monitor in our rack are websites (using check_http), so the same 
entry gets repeated throughout the config. To minimize this, I have a 
template defined for website checks containing all of the static values, 
which looks an awful lot like this:

define service{
         use                             generic-service
         name                            website-service
         is_volatile                     0
         check_period                    24x7
         max_check_attempts              5
         normal_check_interval           1
         retry_check_interval            1
         contact_groups                  admins
         notification_interval           240
         notification_period             24x7
         notification_options            w,u,c,r
         register                        0
         }


Since the check command includes the address of the site to query, it is 
different for each site, so it gets specified in the individual service 
definition, resulting in an entry that looks a lot like this:

define service{
         use                             website-service
         host_name                       <ServerName>
         service_description             <ServiceName>   ; I use the site name
         check_command                   check_site!http://<SiteName>
}

This greatly reduces the size of my config files and makes them a whole 
lot easier to maintain!
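
Note that check_site also has to exist as a command definition somewhere 
in the config. As a rough sketch only, assuming it is a thin wrapper 
around the stock check_http plugin and that the argument passed is a bare 
host name rather than a full URL, the command and one concrete service 
could look something like this (web1 and www.example.com are placeholder 
names, and $USER1$ is assumed to point at the plugin directory):

# -I connects to the host's address, -H sets the virtual host to request
define command{
         command_name                    check_site
         command_line                    $USER1$/check_http -I $HOSTADDRESS$ -H $ARG1$
         }

define service{
         use                             website-service
         host_name                       web1
         service_description             www.example.com
         check_command                   check_site!www.example.com
         }

Everything else (intervals, contacts, notification options) is inherited 
from the website-service template, so each new site costs only these few 
lines.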

The other trick I've picked up is to split all my host definitions into 
individual files, one per host, and then add an entry for each in the main 
Nagios config (much as I do with vhosts in Apache). Again, this makes it 
far easier to maintain, and has the bonus that removing a host is simply 
a matter of commenting out or deleting a line in the master config file.
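
As a sketch of what those entries can look like, the main config file 
takes one cfg_file line per object file. The directory and the host file 
names below are made up for illustration, with /usr/local/etc/nagios 
assumed to be where the config lives on a FreeBSD install:

# nagios.cfg (excerpt), one line per host file
cfg_file=/usr/local/etc/nagios/hosts/web1.cfg
cfg_file=/usr/local/etc/nagios/hosts/mail1.cfg
# retired host: its line is commented out, the file itself is left alone
#cfg_file=/usr/local/etc/nagios/hosts/oldbox.cfg

Any services attached to a removed host need to go at the same time, 
otherwise the config check will complain about the missing host.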

Two last things. First, nagios -v is your friend: run against the main 
config file, it gives you concise and quite useful information on any 
errors in your config files and saves you losing the system because of a 
typo. Second, for remote checks NRPE is a godsend. It lets Nagios check 
pretty much any local information on a remote machine and is quite easy 
to configure. For example, I have it monitoring the capacity of the /usr 
mount on our Solaris machine (along with a few other bits).
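
A rough sketch of how a disk check like that fits together: the remote 
machine's nrpe.cfg maps a command name to a local plugin call, and the 
Nagios server asks for that name through check_nrpe. The command name 
check_usr, the host name sol1, the plugin path and the thresholds are all 
illustrative assumptions, and generic-service is assumed to supply the 
remaining required directives (check period, contacts and so on):

# nrpe.cfg on the remote machine (plugin path varies by system)
# warn below 20% free, critical below 10% free on /usr
command[check_usr]=/usr/local/libexec/nagios/check_disk -w 20% -c 10% -p /usr

# on the Nagios server
define command{
         command_name                    check_nrpe
         command_line                    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
         }

define service{
         use                             generic-service
         host_name                       sol1
         service_description             usr disk space
         check_command                   check_nrpe!check_usr
         }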

Hope that's helpful to someone :)

---------------------
Mike Woods
Systems Administrator


