Date: Wed, 29 Jun 2011 04:19:15 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Glen Barber <gjb@FreeBSD.org>
Cc: Alexander Leidinger <Alexander@Leidinger.net>, fs@FreeBSD.org
Subject: Re: [RFC] [patch] periodic status-zfs: list pools in daily emails
Message-ID: <20110629111915.GA75648@icarus.home.lan>
In-Reply-To: <4E0B006C.8050000@FreeBSD.org>
References: <20110628203228.GA4957@onyx.glenbarber.us> <20110629104633.26824evikzh8tgtl@webmail.leidinger.net> <4E0B006C.8050000@FreeBSD.org>

On Wed, Jun 29, 2011 at 06:37:32AM -0400, Glen Barber wrote:
> Hi Alexander,
>
> On 6/29/11 4:46 AM, Alexander Leidinger wrote:
> >> I added a default behavior to list the pools on the system, in
> >> addition to checking if the pool is healthy.  I think it might be
> >> useful for others to have this as the default behavior, for example
> >> on systems where dedup is enabled to track the dedup statistics
> >> over time.
> >
> > I do not think this is a bad idea to be able to see the pools... but
> > IMHO it should be configurable (no strong opinion about "enabled or
> > disabled by default").
> >
>
> Agreed.  I can add this in.
>
> >> The output of the the script after my changes follows:
> >
> > Info to others: this is the default output, there is no special option
> > to track DEDUP.
> >
> >> Checking status of zfs pools:
> >> NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> >> zroot    456G   147G   309G    32%  1.00x  ONLINE  -
> >> zstore   928G   258G   670G    27%  1.00x  ONLINE  -
> >> all pools are healthy
> >>
> >> Feedback would be appreciated.  A diff is attached.
> >
> > Did you test it with an unhealthy pool? If yes, how does the result look
> > like?
> >
>
> I have not, yet.  I can do this later today by breaking a mirror.
>
> > For the healthy case we have redundant info (but as the brain is good at
> > pattern matching, I would object to replace the status with the list
> > output, in case someone would suggest this). In the unhealthy case we
> > will surely have more info, my inquiry about it is if an empty line
> > between the list and the status would make it more readable or not.
> >
>
> I will reply later today with of the script with an unhealthy pool, and
> will make listing the pools configurable.  I imagine an empty line would
> certainly make it more readable in either case.
> I would be reluctant to replace 'status' output with 'list' output for
> healthy pools mostly to avoid headaches for people parsing their daily
> email, specifically looking for (or missing) 'all pools are healthy.'

At my workplace we use a heavily modified version of Netsaint, with
various Nagios-like bits and pieces added.  I happened to write the Perl
code used to monitor our production Solaris systems (2000+ servers) for
ZFS pool status.  It parses "zpool status -x" output, monitoring read,
write, and checksum errors per pool, vdev, and device, in addition to
general pool status.

I tested a great many conditions, and had to deal with parsing pains
caused by ZFS code changes, plus supporting completely different
revisions of Solaris 10 in production.

And before someone asks: no, I cannot provide the source (employee
agreements, LCA, etc...).  I did have to dig through ZFS source code to
figure out a bunch of necessary bits too, so don't be surprised if you
have to as well.

My recommendation: just look for pools which are in any state other than
ONLINE (don't try to be smart with an OR regex looking for all the
combos; it doesn't scale when ZFS changes).  You should also handle
situations where a device is currently undergoing manual or automatic
device replacement (specifically regex '^[\t\s]+replacing\s+DEGRADED'),
which will be important to people who keep spares in pools.

This might be difficult with just standard BSD sh, but BSD awk should be
able to handle this.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
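
A minimal sketch of the recommendation above, written in Python rather than sh or awk purely for illustration. It flags any pool whose health is anything other than ONLINE, and separately checks for a vdev that is being replaced, using the regex quoted in the message. The function names are hypothetical, the input is assumed to be tab-separated text shaped like `zpool list -H -o name,health`, and the sample strings are hand-written stand-ins, not captured command output.

```python
import re

# Regex quoted in the message, applied line by line via MULTILINE.
REPLACING_RE = re.compile(r'^[\t\s]+replacing\s+DEGRADED', re.MULTILINE)


def unhealthy_pools(list_output):
    """Return (name, health) for every pool that is not ONLINE.

    Assumes tab-separated "name<TAB>health" lines, the shape produced
    by `zpool list -H -o name,health`. No list of "bad" states is kept:
    anything other than ONLINE is reported.
    """
    found = []
    for line in list_output.splitlines():
        fields = line.split('\t')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, health = fields[0].strip(), fields[1].strip()
        if health != 'ONLINE':
            found.append((name, health))
    return found


def has_replacing_vdev(status_output):
    """True if any line of `zpool status`-style text matches the regex."""
    return REPLACING_RE.search(status_output) is not None


# Hand-written illustrative input (an assumption, not real output).
SAMPLE_LIST = "zroot\tONLINE\nzstore\tDEGRADED\n"
SAMPLE_STATUS = (
    "\tzstore         DEGRADED     0     0     0\n"
    "\t  mirror       DEGRADED     0     0     0\n"
    "\t    replacing  DEGRADED     0     0     0\n"
)

if __name__ == "__main__":
    for name, health in unhealthy_pools(SAMPLE_LIST):
        print(f"pool {name} is {health}")
    if has_replacing_vdev(SAMPLE_STATUS):
        print("a device replacement is in progress")
```

Comparing against ONLINE alone means a state added by a later ZFS revision is still reported without touching the check. The vdev label printed by `zpool status` may differ between ZFS versions, so the regex should be verified against the output of the system being monitored.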