From owner-freebsd-fs@FreeBSD.ORG Sun Jan 20 22:26:57 2013
Date: Sun, 20 Jan 2013 17:26:50 -0500
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: freebsd-fs, FreeBSD Hackers
Subject: ZFS regimen: scrub, scrub, scrub and scrub again.
List-Id: Filesystems

Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there can be a
little room for improvement.

I use RAID pretty much everywhere. I don't like to lose data and disks are
cheap. I have a fair amount of experience with all flavors ... and ZFS has
become a go-to filesystem for most of my applications.

One of the best recommendations I can give for ZFS is its crash-recoverability.
As a counter-example, most hardware RAID, or a software whole-disk RAID, will
generally come up after a crash declaring one disk as good and the other disk
as "to be repaired" ... after which a full surface scan of the affected disks
--- reading one and writing the other --- ensues. On my Windows desktop, a
pair of 2T's takes 3 or 4 hours to do this. A pair of green 2T's can take over
6. You don't lose any data, but you have severely reduced performance until
the repair finishes. The rub is that you know only one or two blocks could
possibly even be different ... so this is a highly unoptimized way of going
about the problem.

ZFS is smart on this point: it recovers on reboot with a minimum of fuss. Even
if you dislodge a drive ... so that it's missing the last 'n' transactions,
ZFS seems to figure this out (which I thought deserved extra kudos).

MY PROBLEM comes from problems that scrub can fix. Let's talk, specifically,
about my home array. It has 9x 1.5T and 8x 2T drives in RAID-Z configuration
(2 sets, obviously). The drives themselves are housed (4 each) in external
drive bays with a single SATA connection for each. I think I have spoken of
this here before. A full scrub of my drives weighs in at 36 hours or so.
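For anyone following along, the scrub itself is just the stock zpool commands;
a minimal sketch, assuming the pool is named "vr2" (inferred from the error
output later in this post -- substitute your own pool name):

    # Kick off a scrub of the whole pool.
    zpool scrub vr2

    # Watch progress and see any errors found so far; -v lists affected files.
    zpool status -v vr2

    # An in-progress scrub can only be cancelled outright, not paused
    # (which is exactly point 1 below -- cancelling throws away progress).
    zpool scrub -s vr2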
Now, around Christmas, while moving some things, I managed to pull the plug on
one cabinet of 4 drives. Most likely the only active use of the filesystem was
an automated CVS check-in (backup), given that the errors only appeared on the
cvs directory. IN THE END, no data was lost, but I had to scrub 4 times to
remove the complaints, which showed up like this in "zpool status -v":

errors: Permanent errors have been detected in the following files:

        vr2/cvs:<0x1c1>

Now ... this is just an example: after each scrub, the hex number was
different. As a side note, I also couldn't actually find the error on the cvs
filesystem (see the zdb note at the end of this post for one way of chasing
such object numbers). Not many files are stored there, and they all seemed to
be present.

MY TAKEAWAY from this is that 2 major improvements could be made to ZFS:

1) A pause for scrub ... such that long scrubs could be paused during working
hours.

2) Going back over errors ... during each scrub, the "new" error was found
before the old error was cleared, and this new error then gets similarly
cleared by the next scrub. If the scrub returned to the newly found error
after fixing the "known" errors, whole extra scrub runs could be avoided.
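P.S. My understanding of the <0x...> notation: when zpool status can't resolve
a damaged object back to a path (the file may already have been deleted, or the
object may be dataset metadata), it prints dataset:object-number instead, which
would explain why nothing looked missing in the cvs filesystem. A rough sketch
of chasing such an object with zdb -- the dataset name vr2/cvs comes from the
output above, and 449 is just the example number 0x1c1 in decimal:

    # Dump what ZFS knows about object 449 (0x1c1) in the vr2/cvs dataset;
    # zdb takes a decimal object number, and more d's mean more detail.
    zdb -ddddd vr2/cvs 449

    # If the object still maps to a file, the dump includes its path.
    # If zdb finds nothing, the object was probably freed after the error was
    # logged, and the stale entry lingers until later scrubs clear it.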