Date: Mon, 21 Jan 2013 12:12:45 +0100 (CET)
From: Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>
To: Zaphod Beeblebrox
Cc: freebsd-fs, FreeBSD Hackers
Subject: Re: ZFS regimen: scrub, scrub, scrub and scrub again.

> Please don't misinterpret this post: ZFS's ability to recover from fairly
> catastrophic failures is pretty stellar, but I'm wondering if there can be

From my testing it is exactly the opposite. You have to see the difference
between marketing and reality.

> a little room for improvement.
>
> I use RAID pretty much everywhere. I don't like to lose data and disks
> are cheap. I have a fair amount of experience with all flavors ... and ZFS

Just like me. And because I want performance and, as you described, disks
are cheap, I use RAID-1 (gmirror).

> has become a go-to filesystem for most of my applications.

My applications don't tolerate low performance, overcomplexity, or a high
risk of data loss. That's why I use properly tuned UFS and gmirror, and
prefer multiple filesystems over gstripe.

> One of the best recommendations I can give for ZFS is its
> crash-recoverability.

Which is marketing, not truth. If you want bullet-proof recoverability, UFS
beats everything I've ever seen. If you want FAST crash recovery, use
softupdates+journal, available in FreeBSD 9.

> As a counterexample, if you have most hardware RAID setups or a software
> whole-disk RAID going, after a crash it will generally declare one disk as
> good and the other disk as "to be repaired" ... after which a full surface
> scan of the affected disks (reading one and writing the other) ensues.

True. gmirror does this too, but you can defer the mirror rebuild, which is
what I do. I have a script that sends me a mail when gmirror is degraded,
and then, after finding the cause of the problem and possibly replacing the
disk, I run the rebuild after work hours, so no slowdown is experienced.
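For illustration, a minimal sketch of such a watchdog script, suitable for
cron; the mirror name gm0 and the recipient root are assumptions, not my
actual setup:

  #!/bin/sh
  # Mail a warning when the gmirror array is no longer COMPLETE.
  MIRROR=gm0                       # adjust to your mirror name
  if gmirror status "$MIRROR" | grep -q DEGRADED; then
      gmirror status "$MIRROR" | mail -s "gmirror $MIRROR is DEGRADED" root
  fi

The deferred rebuild itself can be done by turning off autosynchronization
(gmirror configure -n gm0) and later running gmirror rebuild by hand.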
> ZFS is smart on this point: it will recover on reboot with a minimum amount
> of fuss. Even if you dislodge a drive ... so that it's missing the last
> 'n' transactions, ZFS seems to figure this out (which I thought was extra
> kudos).

Yes, this is marketing. Practice is somewhat different, as you discovered
yourself.

> MY PROBLEM comes from problems that scrub can fix.
>
> Let's talk, specifically, about my home array. It has 9x 1.5T and 8x 2T in
> a RAID-Z configuration (2 sets, obviously).

While RAID-Z is already the king of bad performance, I assume you mean two
POOLS, not two RAID-Z sets. If you mixed two different RAID-Z sets you would
spread the load unevenly and make performance even worse.

> A full scrub of my drives weighs in at 36 hours or so.

Which is funny, as ZFS is marketed as doing this efficiently (e.g. checking
only used space). dd if=/dev/disk of=/dev/null bs=2m would take no more than
a few hours per disk, and you may do them all in parallel (see the sketch at
the end of this message).

> vr2/cvs:<0x1c1>
>
> Now ... this is just an example: after each scrub, the hex number was

Seems like scrub simply does not do its work right.

> before the old error was cleared. Then this new error gets similarly
> cleared by the next scrub. It seems that if the scrub returned to this
> newly found error after fixing the "known" errors, this could save whole
> new scrub runs from being required.

Even better: use UFS, for both bullet-proof recoverability and performance.
If you need help with tuning you may ask me privately.
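For reference, the parallel raw read mentioned above, as a minimal sh
sketch; the device names are examples only, substitute your own disks:

  #!/bin/sh
  # Read every disk end to end, in parallel, to time a raw surface read.
  for d in ada0 ada1 ada2 ada3; do        # example device names
      dd if=/dev/$d of=/dev/null bs=2m &
  done
  wait                                    # wait for all dd processes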