Date: Sat, 14 May 2005 12:44:41 +0100
From: Mike Woods <Mike@the-rubber-chicken-network.co.uk>
To: Warren Block <wblock@wonkity.com>
Cc: Duane Winner <dwinner-lists@att.net>
Subject: Re: monitoring and alerting software ????
Message-ID: <4285E4A9.1040604@the-rubber-chicken-network.co.uk>
In-Reply-To: <20050512141024.U37797@wonkity.com>
References: <428394B2.20409@att.net> <20050512141024.U37797@wonkity.com>

Warren Block wrote:
> On Thu, 12 May 2005, Duane Winner wrote:
>
>> Does anybody have recommendations for a good solution to alert me
>> while I am not at work if something goes wrong with my
>> infrastucture/network/servers?
>> In other words, if I am at home, I need to be alerted if one of my
>> FreeBSD servers go down, but also if the router, firewall or switches
>> go haywire.
>
> Here's something I wrote recently on setting up Nagios on FreeBSD:
>
> http://www.wonkity.com/~wblock/nagios.pdf

Nagios is a good choice indeed. I've recently implemented a monitoring
system for our rack at redbus using Nagios and I'm rather impressed with
how well it all works!

I've picked up a couple of "tricks" while doing this. The first is simply
to make very good use of service templates. Most of the services we
monitor in our rack are websites (using check_http), so that becomes a
rather repetitive entry in the config. To minimize this I have a template
defined for website checks containing all of the static values, which
looks an awful lot like this:

define service{
        use                     generic-service
        name                    website-service
        is_volatile             0
        check_period            24x7
        max_check_attempts      5
        normal_check_interval   1
        retry_check_interval    1
        contact_groups          admins
        notification_interval   240
        notification_period     24x7
        notification_options    w,u,c,r
        register                0
        }

The check command is different for each site, because it includes the
site address to query, so it gets specified in the per-site entry, which
looks a lot like this:

define service{
        use                     website-service
        host_name               <ServerName>
        service_description     <ServiceName>    ; I use the site name
        check_command           check_site!http://<SiteName>
        }

This greatly reduces the size of my config files and makes them a whole
lot easier to maintain!
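
To illustrate the saving, here is a rough sketch of two site checks built on the website-service template above. The host name web1 and the example.org/example.net addresses are made up for illustration, and check_site is assumed to be defined elsewhere in the config (its definition is not shown in this message).

```
# Two hypothetical sites on the same server, both inheriting every
# static value (check_period, contact_groups, ...) from website-service.
define service{
        use                     website-service
        host_name               web1
        service_description     www.example.org
        check_command           check_site!http://www.example.org
        }

define service{
        use                     website-service
        host_name               web1
        service_description     www.example.net
        check_command           check_site!http://www.example.net
        }
```

Each additional site costs only these four directives, and a change to, say, notification_interval is made once in the template rather than in every entry.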

The other trick I've picked up is to split my host definitions into an
individual file for each host, then add an entry for each file in the
main Nagios config (much as I do with vhosts in Apache). Again, this
makes things far easier to maintain, and has the bonus that removing a
host is simply a matter of commenting out or deleting a line in the
master config file.

Last two things. Firstly, nagios -v is your friend: it gives you concise
and quite useful information on any errors in your config files, and
saves you from losing the system because of a typo. Secondly, for remote
checks NRPE is a godsend: it lets Nagios check pretty much any local
information on a remote machine and is quite easy to configure. For
example, I have it monitoring the capacity of the /usr mount on our
Solaris machine (along with a few other bits).

Hope that's helpful to someone :)

---------------------
Mike Woods
Systems Administrator
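
For anyone wanting a starting point, the per-host files and the NRPE disk check described above might look roughly like the sketch below. All paths, host names (web1, solaris1, oldbox), the command name check_usr and the 20%/10% thresholds are assumptions for illustration, not the actual setup; adjust the plugin path to wherever your plugins are installed.

```
# --- nagios.cfg on the monitoring server: one cfg_file line per host ---
# (hypothetical paths and host names)
cfg_file=/usr/local/etc/nagios/hosts/web1.cfg
cfg_file=/usr/local/etc/nagios/hosts/solaris1.cfg
# Removing a host is just a matter of commenting out its line:
#cfg_file=/usr/local/etc/nagios/hosts/oldbox.cfg

# --- nrpe.cfg on the remote machine ---
# Illustrative thresholds: warn below 20% free on /usr, critical below 10%.
command[check_usr]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /usr

# --- object config on the monitoring server ---
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

# Other required directives (check_period, contact_groups, ...) are
# assumed to come from the template named in "use".
define service{
        use                     generic-service
        host_name               solaris1
        service_description     usr capacity
        check_command           check_nrpe!check_usr
        }
```

After any edit, running nagios -v against the main config file (for example nagios -v /usr/local/etc/nagios/nagios.cfg, if that is where yours lives) before restarting should catch typos in any of the included files.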