From owner-freebsd-fs@FreeBSD.ORG Thu Apr 26 03:33:22 2012
Date: Thu, 26 Apr 2012 13:33:14 +1000
From: Andrew Reilly <areilly@bigpond.net.au>
To: Peter Maloney
Cc: freebsd-fs@freebsd.org
Subject: Re: Odd file system corruption in ZFS pool
Message-ID: <20120426033314.GA9016@johnny.reilly.home>
In-Reply-To: <4F981A0D.40507@brockmann-consult.de>
References: <20120424143014.GA2865@johnny.reilly.home>
 <4F96BAB9.9080303@brockmann-consult.de>
 <20120424232136.GA1441@johnny.reilly.home>
 <4F981A0D.40507@brockmann-consult.de>

On Wed, Apr 25, 2012 at 05:36:45PM +0200, Peter Maloney wrote:
> On 04/25/2012 01:21 AM, Andrew Reilly wrote:
> > On Tue, Apr 24, 2012 at 04:37:45PM +0200, Peter Maloney wrote:
> > rm and rm -r don't work. Even as root, rm -rf Maildir.bad
> > returns a lot of messages of the form: foo/bar: no such file
> > or directory. The result is that I now have a directory that
> > contains no "good" files, but a concentrated collection of
> > breakage.
>
> That sucks. But there is one thing I forgot... you need to run the
> "rm" command immediately after the scrub (no export, reboot, etc.
> in between).

I believe that I've tried that, and it still didn't work. The system
behaves as though the directory contains files with illegal or
unallocated inode numbers. Directories don't seem to be amenable to
the old-school technique of inspecting them with hexdump or the like,
either, so I can't tell much more than that: the names exist in the
directory, but ask for any information that would live in the inode
and you get an error.

> Is your broken stuff limited to a single dataset, or the whole
> pool? You could try making a second dataset, copying the good files
> to it, and destroying the old one (losing all your snapshots on
> that dataset, of course).

The breakage seems to be associated only with the filesystem, rather
than the pool. Well, my "tank" pool (the raidz) shows zpool scrub
making zero fixes while reporting unrecoverable errors in
tank/home:<0x0>, but my backup pool (the one I send snapshot deltas
to) shows exactly the same broken files with no reported pool
problems. (Hmm. Hold that thought: I haven't actually tried a scrub
on the backup file system; it's just zpool status that shows no
errors. I'm running a scrub now. It will take a while on a fairly
slow USB2-connected disk: zpool status says to expect 10+ hours...)

> Here is another thread about it:
> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/027902.html

That does seem to be the same situation that I'm seeing.

> And this message looks interesting: "but if you search on the lists
> for up to a year or so, you'll find some useful commands to inspect
> and destroy corrupted objects."
> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/027926.html

I'm not sure about destroying corrupted objects at any granularity
finer than the whole filesystem. It's annoying: if I could just
remove these files I'd be happy, because I've already restored them
from the backup. Instead, it is starting to look as though the only
way forward is to destroy my home filesystem, recreate it, and
repopulate it from the backup (using something like rsync, which
doesn't also replicate the filesystem damage). That sounds like a lot
of down-time on what is a fairly busy system.
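(For concreteness, the rebuild I have in mind is roughly the
following; the dataset names and mount points are approximations of
my layout, and the @final snapshot name is just illustrative:

    # take one last snapshot of the backup, so it is current
    zfs snapshot bkp2pool/home@final

    # destroy the damaged filesystem and its snapshots, then
    # recreate it empty
    zfs destroy -r tank/home
    zfs create tank/home

    # repopulate at the file level with rsync, rather than
    # zfs send/recv, so the broken directory objects aren't
    # replicated along with the data
    rsync -aH /bkp2pool/home/ /home/

I haven't committed to any of that yet, though.)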
> And
> "I tried your suggestion and ran the command "zdb -ccv backups" to
> try and check the consistency of the troublesome "backups" pool.
> This is what I ended up with:"
>
> But they don't say what the solution is (other than destroying the
> pool; I would think destroying the dataset could be enough, since
> it is the filesystem that is corrupt, but maybe the whole pool is
> needed).

FYI: I've been running "zdb -ccv bkp2pool" on my backup disk, to see
if it has anything to say about the dangling directory entries. The
problem is that it currently has a process size of about 5G (RES
2305M) on a system with 4G of physical RAM, so it's paging like
crazy. Probably unhelpful.

> > I have another zpool scrub running at the moment. We'll see if
> > that is able to clean it up, but it hasn't had much luck in the
> > past.
> >
> > Note that none of these broken files or directories show up in
> > the zpool status -v error list. That just contains the one entry
> > for the zfs root directory: tank/home:<0x0>
> >
> > Cheers,
>
> I doubt that scrubbing more than once (repeating the same thing and
> expecting different results) would fix anything. But if you
> scrubbed on OpenIndiana, it would at least be different. And if
> that worked, you could file a PR about it.

Some of the (perhaps Solaris-related) ZFS web pages I've been
reading lately suggested that several zpool scrub passes were
beneficial. Certainly I seem to have hit a local minimum on the
goodness curve at the moment.

Thanks for the suggestions. Appreciated.

Cheers,

--
Andrew
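P.S. For the record, the "rm immediately after scrub" sequence that
I believe I tried was essentially this (Maildir.bad stands in for
the full path to the broken directory):

    zpool scrub tank
    # ...wait for the scrub to finish; poll with:
    zpool status -v tank
    # then, with no export or reboot in between:
    rm -rf Maildir.bad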