Date: Thu, 26 Apr 2012 13:44:52 +1000
From: Andrew Reilly <areilly@bigpond.net.au>
To: Bob Friesenhahn
Message-ID: <20120426034452.GB9016@johnny.reilly.home>
References: <20120424143014.GA2865@johnny.reilly.home> <4F96BAB9.9080303@brockmann-consult.de> <20120424232136.GA1441@johnny.reilly.home>
Cc: freebsd-fs@freebsd.org
Subject: Re: Odd file system corruption in ZFS pool

On Wed, Apr 25, 2012 at 08:58:41AM -0500, Bob Friesenhahn wrote:
> With properly implemented hardware (i.e. drives which obey the cache
> flush request) it should not be possible to corrupt zfs due to power
> failure.

Does that comment apply to enterprise-class SATA drives?  I was under
the impression that all SATA drives lied about cache flush status.
Hence the notion that I need to get myself a UPS.

> Some of the most recently written data may be lost, but zfs should
> come up totally coherent at some point in the recent past.

Certainly it has been my experience that ZFS is extremely robust in
this regard, even with the inexpensive hardware that I have.  The
power has gone down many times (mostly thanks to builders on site)
with no problems.  Not this time, though.

> It is important to use a system which supports ECC memory to assure
> that data is not corrupted in memory since zfs does not defend
> against that.

Not reasonable for an inexpensive home file/e-mail/whatever server,
IMO.  Well, none of the mini-ITX motherboards I saw touted ECC as an
available option.  This box does quite a bit of work, though, and
rebuilds itself from source every couple of weeks with nary a hiccup,
so I'm fairly confident that it's solid.

> Storage redundancy is necessary to correct any data read errors but
> should not be necessary to defend against the result of power
> failure.

I have raidz on the broken filesystem, and a separate nightly backup.
That ought to be enough redundancy to get me through, assuming that I
can work around the filesystem damage in the former, which seems to
have propagated itself to the latter.
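For context, the nightly backup is just zfs send piped into zfs
receive; roughly something like this (the dataset and snapshot names
here are placeholders rather than the real ones):

    # sketch of the nightly replication step -- names are illustrative only
    zfs snapshot tank/home@nightly.today
    zfs send -i tank/home@nightly.yesterday tank/home@nightly.today | \
        zfs receive -F bkp2pool/home

so whatever is wrong in the source dataset gets replicated into
bkp2pool each night.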
Here's what the damaged directory looks like on the backup pool:

johnny [220]$ /bin/ls -a /backup2/home/andrew/Maildir.bad/
.               ..
.AppleDouble    .Suppliers.2010    .Unix
johnny [221]$ /bin/ls -ai /backup2/home/andrew/Maildir.bad/
ls: .Suppliers.2010: No such file or directory
 7906 .             82016 .AppleDouble
 7810 ..            80774 .Unix
johnny [218]$ sudo zpool status bkp2pool
  pool: bkp2pool
 state: ONLINE
  scan: scrub in progress since Thu Apr 26 13:29:36 2012
        14.3G scanned out of 745G at 23.1M/s, 8h59m to go
        0 repaired, 1.93% done
config:

        NAME            STATE     READ WRITE CKSUM
        bkp2pool        ONLINE       0     0     0
          gpt/backup3g  ONLINE       0     0     0

errors: No known data errors

So: the corruption in the dangling .Suppliers.2010 reference
(a) has propagated to the backup, using zfs send/receive,
(b) is at a weirder level than simple inode corruption, because I
    can't even list the inode..., and
(c) doesn't show up in zpool status as cksum or other errors.

Cheers,

-- 
Andrew
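PS: the next step I have in mind is to poke at the directory object
itself with zdb, along these lines (object 7906 is the directory's own
inode number from the ls -ai output above; the dataset name is only a
guess on my part):

    # dump object 7906, including its ZAP directory entries (sketch only)
    sudo zdb -dddd bkp2pool/home 7906

If .Suppliers.2010 shows up in that listing pointing at an object that
zdb can't find, that would at least confirm a dangling directory entry
rather than a damaged inode.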