From owner-freebsd-fs@FreeBSD.ORG Sun Jan 20 22:26:57 2013
Date: Sun, 20 Jan 2013 17:26:50 -0500
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: freebsd-fs, FreeBSD Hackers
Subject: ZFS regimen: scrub, scrub, scrub and scrub again.
List-Id: Filesystems

Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there can be a
little room for improvement.

I use RAID pretty much everywhere. I don't like to lose data and disks are
cheap. I have a fair amount of experience with all flavors ... and ZFS has
become a go-to filesystem for most of my applications.

One of the best recommendations I can give for ZFS is its crash-recoverability.
As a counter-example, most hardware RAID, or a software whole-disk RAID, will
generally come up after a crash declaring one disk as good and the other disk
as "to be repaired" ... after which a full surface scan of the affected disks
--- reading one and writing the other --- ensues. On my Windows desktop, a
pair of 2T's takes 3 or 4 hours to do this. A pair of green 2T's can take over
6. You don't lose any data, but you have severely reduced performance until
the repair finishes. The rub is that you know only one or two blocks could
possibly even be different ... so this is a highly unoptimized way of going
about the problem.

ZFS is smart on this point: it recovers on reboot with a minimum of fuss. Even
if you dislodge a drive ... so that it's missing the last 'n' transactions,
ZFS seems to figure this out (which I thought deserved extra kudos).

MY PROBLEM comes from problems that scrub can fix. Let's talk, specifically,
about my home array. It has 9x 1.5T and 8x 2T drives in RAID-Z configuration
(2 sets, obviously). The drives themselves are housed (4 each) in external
drive bays with a single SATA connection for each. I think I have spoken of
this here before. A full scrub of my drives weighs in at 36 hours or so.
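For anyone following along, the scrub itself is just the stock zpool commands;
a minimal sketch, assuming the pool is named "vr2" (inferred from the error
output later in this post -- substitute your own pool name):

    # Kick off a scrub of the whole pool.
    zpool scrub vr2

    # Watch progress and see any errors found so far; -v lists affected files.
    zpool status -v vr2

    # An in-progress scrub can only be cancelled outright, not paused
    # (which is exactly point 1 below -- cancelling throws away progress).
    zpool scrub -s vr2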
Now, around Christmas, while moving some things, I managed to pull the plug on
one cabinet of 4 drives. Most likely the only active use of the filesystem was
an automated CVS check-in (backup), given that the errors only appeared on the
cvs directory. IN THE END, no data was lost, but I had to scrub 4 times to
remove the complaints, which showed up like this in "zpool status -v":

errors: Permanent errors have been detected in the following files:

        vr2/cvs:<0x1c1>

Now ... this is just an example: after each scrub, the hex number was
different. As a side note, I also couldn't actually find the error on the cvs
filesystem (see the zdb note at the end of this post for one way of chasing
such object numbers). Not many files are stored there, and they all seemed to
be present.

MY TAKEAWAY from this is that 2 major improvements could be made to ZFS:

1) A pause for scrub ... such that long scrubs could be paused during working
hours.

2) Going back over errors ... during each scrub, the "new" error was found
before the old error was cleared, and this new error then gets similarly
cleared by the next scrub. If the scrub returned to the newly found error
after fixing the "known" errors, whole extra scrub runs could be avoided.
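P.S. My understanding of the <0x...> notation: when zpool status can't resolve
a damaged object back to a path (the file may already have been deleted, or the
object may be dataset metadata), it prints dataset:object-number instead, which
would explain why nothing looked missing in the cvs filesystem. A rough sketch
of chasing such an object with zdb -- the dataset name vr2/cvs comes from the
output above, and 449 is just the example number 0x1c1 in decimal:

    # Dump what ZFS knows about object 449 (0x1c1) in the vr2/cvs dataset;
    # zdb takes a decimal object number, and more d's mean more detail.
    zdb -ddddd vr2/cvs 449

    # If the object still maps to a file, the dump includes its path.
    # If zdb finds nothing, the object was probably freed after the error was
    # logged, and the stale entry lingers until later scrubs clear it.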