Date: Wed, 29 Jun 2011 04:19:15 -0700
From: Jeremy Chadwick <jdc@koitsu.dyndns.org>
To: Glen Barber
Cc: Alexander Leidinger, fs@FreeBSD.org
Subject: Re: [RFC] [patch] periodic status-zfs: list pools in daily emails
Message-ID: <20110629111915.GA75648@icarus.home.lan>
In-Reply-To: <4E0B006C.8050000@FreeBSD.org>

On Wed, Jun 29, 2011 at 06:37:32AM -0400, Glen Barber wrote:
> Hi Alexander,
>
> On 6/29/11 4:46 AM, Alexander Leidinger wrote:
> >> I added a default behavior to list the pools on the system, in
> >> addition to checking whether the pool is healthy.  I think it
> >> might be useful for others to have this as the default behavior,
> >> for example on systems where dedup is enabled, to track the dedup
> >> statistics over time.
> >
> > I do not think it is a bad idea to be able to see the pools... but
> > IMHO it should be configurable (no strong opinion about "enabled
> > or disabled by default").
>
> Agreed.  I can add this in.
>
> >> The output of the script after my changes follows:
> >
> > Info for others: this is the default output; there is no special
> > option to track DEDUP.
> >
> >> Checking status of zfs pools:
> >> NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
> >> zroot    456G   147G   309G  32%  1.00x  ONLINE  -
> >> zstore   928G   258G   670G  27%  1.00x  ONLINE  -
> >> all pools are healthy
> >>
> >> Feedback would be appreciated.  A diff is attached.
> >
> > Did you test it with an unhealthy pool?  If so, what does the
> > result look like?
>
> I have not, yet.  I can do this later today by breaking a mirror.
>
> > For the healthy case we have redundant info (but as the brain is
> > good at pattern matching, I would object to replacing the status
> > output with the list output, in case someone suggests this).  In
> > the unhealthy case we will surely have more info; my question is
> > whether an empty line between the list and the status would make
> > it more readable or not.
>
> I will reply later today with the output of the script against an
> unhealthy pool, and will make listing the pools configurable.
> I imagine an empty line would certainly make it more readable in
> either case.  I would be reluctant to replace 'status' output with
> 'list' output for healthy pools, mostly to avoid headaches for
> people parsing their daily email, specifically looking for (or
> missing) 'all pools are healthy.'

At my workplace we use a heavily modified version of Netsaint, with
various Nagios-like bits and pieces added.  I happened to write the
perl code we use to monitor our production Solaris systems (2000+
servers) for ZFS pool status.  It parses "zpool status -x" output,
monitoring read, write, and checksum errors per pool, vdev, and
device, in addition to general pool status.  I had to test too many
conditions to count, not to mention deal with parsing pains caused by
ZFS code changes, plus support completely different revisions of
Solaris 10 in production.

And before someone asks: no, I cannot provide the source (employee
agreements, LCA, etc.).  I did have to dig through the ZFS source
code to figure out a bunch of the necessary bits, so don't be
surprised if you have to as well.

My recommendation: just look for pools which are in any state other
than ONLINE (don't try to be smart with an OR regex covering all the
combos; it doesn't scale when ZFS changes).  You should also handle
situations where a device is currently undergoing manual or automatic
replacement (specifically the regex '^[\t\s]+replacing\s+DEGRADED'),
which will be important to people who keep spares in their pools.
This might be difficult with just standard BSD sh, but BSD awk should
be able to handle it; a rough sketch follows at the end of this
message.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
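P.S. Here is a minimal, untested sh/awk sketch of the approach above.
It assumes stock "zpool status" output (the "pool:" and "state:"
lines, plus an indented "replacing" vdev label); the exact layout
differs between ZFS versions, so treat it as a starting point rather
than a drop-in check:

    #!/bin/sh
    # Flag any pool whose state is not ONLINE, and any pool with a
    # device replacement in progress.  Exits non-zero if either is
    # found.  Untested sketch, not production code.
    zpool status | awk '
        $1 == "pool:"  { pool = $2 }
        $1 == "state:" && $2 != "ONLINE" {
            printf "pool %s is %s\n", pool, $2
            bad = 1
        }
        /^[\t ]+replacing/ {
            printf "pool %s has a device replacement in progress\n", pool
            bad = 1
        }
        END { exit bad }'

Matching on anything-but-ONLINE, rather than enumerating DEGRADED,
FAULTED, and friends, is the point: it keeps working when ZFS grows
new states.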