From owner-freebsd-fs@FreeBSD.ORG Thu Apr 26 03:33:22 2012
Date: Thu, 26 Apr 2012 13:33:14 +1000
From: Andrew Reilly <areilly@bigpond.net.au>
To: Peter Maloney
Cc: freebsd-fs@freebsd.org
Subject: Re: Odd file system corruption in ZFS pool
Message-ID: <20120426033314.GA9016@johnny.reilly.home>
In-Reply-To: <4F981A0D.40507@brockmann-consult.de>
References: <20120424143014.GA2865@johnny.reilly.home>
 <4F96BAB9.9080303@brockmann-consult.de>
 <20120424232136.GA1441@johnny.reilly.home>
 <4F981A0D.40507@brockmann-consult.de>

On Wed, Apr 25, 2012 at 05:36:45PM +0200, Peter Maloney wrote:
> On 04/25/2012 01:21 AM, Andrew Reilly wrote:
> > On Tue, Apr 24, 2012 at 04:37:45PM +0200, Peter Maloney wrote:
> > rm and rm -r don't work. Even as root, rm -rf Maildir.bad
> > returns a lot of messages of the form: foo/bar: no such file
> > or directory. The result is that I now have a directory that
> > contains no "good" files, but a concentrated collection of
> > breakage.
>
> That sucks. But there is one thing I forgot... you need to run the
> "rm" command immediately after the scrub (no export, reboot, etc.
> in between).

I believe that I've tried that, and it still didn't work. The system
behaves as though the directory contains files with illegal or
unallocated inode numbers. Directories don't seem to be amenable to
the old-school technique of inspecting them with hexdump or the like,
either, so I can't tell much more than that: the names exist in the
directory, but ask for any information that would live in the inode
and you get an error.

> Is your broken stuff limited to a single dataset, or the whole
> pool? You could try making a second dataset, copying the good files
> to it, and destroying the old one (losing all your snapshots on
> that dataset, of course).

The breakage seems to be associated only with the filesystem, rather
than the pool. Well, my "tank" pool (the raidz) shows zpool scrub
making zero fixes while reporting unrecoverable errors in
tank/home:<0x0>, but my backup pool (the one I send snapshot deltas
to) shows exactly the same broken files with no reported pool
problems. (Hmm. Hold that thought: I haven't actually tried a scrub
on the backup file system; it's just zpool status that shows no
errors. I'm running a scrub now. It will take a while on a fairly
slow USB2-connected disk: zpool status says to expect 10+ hours...)

> Here is another thread about it:
> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/027902.html

That does seem to be the same situation that I'm seeing.

> And this message looks interesting: "but if you search on the lists
> for up to a year or so, you'll find some useful commands to inspect
> and destroy corrupted objects."
> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/027926.html

I'm not sure about destroying corrupted objects at any granularity
finer than the whole filesystem. It's annoying: if I could just
remove these files I'd be happy, because I've already restored them
from the backup. Instead, it is starting to look as though the only
way forward is to destroy my home filesystem, recreate it, and
repopulate it from the backup (using something like rsync, which
doesn't also replicate the filesystem damage). That sounds like a lot
of down-time on what is a fairly busy system.
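(For concreteness, the rebuild I have in mind is roughly the
following; the dataset names and mount points are approximations of
my layout, and the @final snapshot name is just illustrative:

    # take one last snapshot of the backup, so it is current
    zfs snapshot bkp2pool/home@final

    # destroy the damaged filesystem and its snapshots, then
    # recreate it empty
    zfs destroy -r tank/home
    zfs create tank/home

    # repopulate at the file level with rsync, rather than
    # zfs send/recv, so the broken directory objects aren't
    # replicated along with the data
    rsync -aH /bkp2pool/home/ /home/

I haven't committed to any of that yet, though.)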
> And
> "I tried your suggestion and ran the command "zdb -ccv backups" to
> try and check the consistency of the troublesome "backups" pool.
> This is what I ended up with:"
>
> But they don't say what the solution is (other than destroying the
> pool; I would think destroying the dataset could be enough, since
> it is the filesystem that is corrupt, but maybe the whole pool is
> needed).

FYI: I've been running "zdb -ccv bkp2pool" on my backup disk, to see
if it has anything to say about the dangling directory entries. The
problem is that it currently has a process size of about 5G (RES
2305M) on a system with 4G of physical RAM, so it's paging like
crazy. Probably unhelpful.

> > I have another zpool scrub running at the moment. We'll see if
> > that is able to clean it up, but it hasn't had much luck in the
> > past.
> >
> > Note that none of these broken files or directories show up in
> > the zpool status -v error list. That just contains the one entry
> > for the zfs root directory: tank/home:<0x0>
> >
> > Cheers,
>
> I doubt that scrubbing more than once (repeating the same thing and
> expecting different results) would fix anything. But if you
> scrubbed on OpenIndiana, it would at least be different. And if
> that worked, you could file a PR about it.

Some of the (perhaps Solaris-related) ZFS web pages I've been
reading lately suggested that several zpool scrub passes were
beneficial. Certainly I seem to have hit a local minimum on the
goodness curve at the moment.

Thanks for the suggestions. Appreciated.

Cheers,

--
Andrew
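P.S. For the record, the "rm immediately after scrub" sequence that
I believe I tried was essentially this (Maildir.bad stands in for
the full path to the broken directory):

    zpool scrub tank
    # ...wait for the scrub to finish; poll with:
    zpool status -v tank
    # then, with no export or reboot in between:
    rm -rf Maildir.bad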