Date: Wed, 29 Jun 2011 04:19:15 -0700
From: Jeremy Chadwick <jdc@koitsu.dyndns.org>
To: Glen Barber
Cc: Alexander Leidinger, fs@FreeBSD.org
Subject: Re: [RFC] [patch] periodic status-zfs: list pools in daily emails
Message-ID: <20110629111915.GA75648@icarus.home.lan>
In-Reply-To: <4E0B006C.8050000@FreeBSD.org>

On Wed, Jun 29, 2011 at 06:37:32AM -0400, Glen Barber wrote:
> Hi Alexander,
>
> On 6/29/11 4:46 AM, Alexander Leidinger wrote:
> >> I added a default behavior to list the pools on the system, in
> >> addition to checking whether the pool is healthy.  I think it
> >> might be useful for others to have this as the default behavior,
> >> for example on systems where dedup is enabled, to track the dedup
> >> statistics over time.
> >
> > I do not think it is a bad idea to be able to see the pools... but
> > IMHO it should be configurable (no strong opinion about "enabled
> > or disabled by default").
>
> Agreed.  I can add this in.
>
> >> The output of the script after my changes follows:
> >
> > Info for others: this is the default output; there is no special
> > option to track DEDUP.
> >
> >> Checking status of zfs pools:
> >> NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
> >> zroot    456G   147G   309G  32%  1.00x  ONLINE  -
> >> zstore   928G   258G   670G  27%  1.00x  ONLINE  -
> >> all pools are healthy
> >>
> >> Feedback would be appreciated.  A diff is attached.
> >
> > Did you test it with an unhealthy pool?  If so, what does the
> > result look like?
>
> I have not, yet.  I can do this later today by breaking a mirror.
>
> > For the healthy case we have redundant info (but as the brain is
> > good at pattern matching, I would object to replacing the status
> > output with the list output, in case someone suggests this).  In
> > the unhealthy case we will surely have more info; my question is
> > whether an empty line between the list and the status would make
> > it more readable or not.
>
> I will reply later today with the output of the script against an
> unhealthy pool, and will make listing the pools configurable.
> I imagine an empty line would certainly make it more readable in
> either case.  I would be reluctant to replace 'status' output with
> 'list' output for healthy pools, mostly to avoid headaches for
> people parsing their daily email, specifically looking for (or
> missing) 'all pools are healthy.'

At my workplace we use a heavily modified version of Netsaint, with
various Nagios-like bits and pieces added.  I happened to write the
perl code we use to monitor our production Solaris systems (2000+
servers) for ZFS pool status.  It parses "zpool status -x" output,
monitoring read, write, and checksum errors per pool, vdev, and
device, in addition to general pool status.  I had to test too many
conditions to count, not to mention deal with parsing pains caused by
ZFS code changes, plus support completely different revisions of
Solaris 10 in production.

And before someone asks: no, I cannot provide the source (employee
agreements, LCA, etc.).  I did have to dig through the ZFS source
code to figure out a bunch of the necessary bits, so don't be
surprised if you have to as well.

My recommendation: just look for pools which are in any state other
than ONLINE (don't try to be smart with an OR regex covering all the
combos; it doesn't scale when ZFS changes).  You should also handle
situations where a device is currently undergoing manual or automatic
replacement (specifically the regex '^[\t\s]+replacing\s+DEGRADED'),
which will be important to people who keep spares in their pools.
This might be difficult with just standard BSD sh, but BSD awk should
be able to handle it; a rough sketch follows at the end of this
message.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
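P.S. Here is a minimal, untested sh/awk sketch of the approach above.
It assumes stock "zpool status" output (the "pool:" and "state:"
lines, plus an indented "replacing" vdev label); the exact layout
differs between ZFS versions, so treat it as a starting point rather
than a drop-in check:

    #!/bin/sh
    # Flag any pool whose state is not ONLINE, and any pool with a
    # device replacement in progress.  Exits non-zero if either is
    # found.  Untested sketch, not production code.
    zpool status | awk '
        $1 == "pool:"  { pool = $2 }
        $1 == "state:" && $2 != "ONLINE" {
            printf "pool %s is %s\n", pool, $2
            bad = 1
        }
        /^[\t ]+replacing/ {
            printf "pool %s has a device replacement in progress\n", pool
            bad = 1
        }
        END { exit bad }'

Matching on anything-but-ONLINE, rather than enumerating DEGRADED,
FAULTED, and friends, is the point: it keeps working when ZFS grows
new states.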