Date:      Tue, 27 Nov 2012 13:09:30 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Raymond Jimenez <raymondj@caltech.edu>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: ZFS kernel panics due to corrupt DVAs (despite RAIDZ)
Message-ID:  <50B49F6A.2020509@FreeBSD.org>
In-Reply-To: <50B3E680.8060606@caltech.edu>
References:  <50B3E680.8060606@caltech.edu>

on 27/11/2012 00:00 Raymond Jimenez said the following:
> Hello,
> 
> We recently sent our drives out for data recovery (blown drive
> electronics), and when we got the new drives/data back, ZFS
> started to kernel panic whenever listing certain items in a
> directory, or whenever a scrub is close to finishing (~99.97%)

Perhaps this thread could be of some interest to you:
http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/15611/focus=15616

> The zpool worked fine before data recovery, and most of the
> files are accessible (only a couple hundred unavailable out of
> several million).
> 
> Here's the kernel panic output if I scrub the disk:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address  = 0x38
> fault code             = supervisor read data, page not present
> instruction pointer    = 0x20:0xffffffff810792d1
> stack pointer          = 0x28:0xffffff8235122720
> frame pointer          = 0x28:0xffffff8235122750
> code segment           = base 0x0, limit 0xffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags       = interrupt enabled, resume, IOPL = 0
> current process        = 52 (txg_thread_enter)
> [thread pid 52 tid 101230 ]
> Stopped at vdev_is_dead+0x1: cmpq $0x5, 0x38(%rdi)
> 
> %rdi is zero, so this seems to be just a NULL pointer dereference.
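
For reference, the check that faults is essentially the one-liner below - a
sketch of vdev_is_dead() as it appears in the ZFS code of that vintage; the
0x38 offset is taken from your trap frame rather than verified against your
build:

boolean_t
vdev_is_dead(vdev_t *vd)
{
        /*
         * With vd == NULL, the load of vd->vdev_state (at offset 0x38
         * in your build, going by the faulting instruction) touches
         * virtual address 0x38, which is the fault address reported
         * in the trap output above.
         */
        return (vd->vdev_state < VDEV_STATE_DEGRADED);
}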
> 
> The vdev setup looks like:
> 
>   pool: mfs-zpool004
>  state: ONLINE
>   scan: scrub canceled on Mon Nov 26 05:40:49 2012
> config:
> 
>         NAME                        STATE     READ WRITE CKSUM
>         mfs-zpool004                ONLINE       0     0     0
>           raidz1-0                  ONLINE       0     0     0
>             gpt/lenin3-drive8       ONLINE       0     0     0
>             gpt/lenin3-drive9.eli   ONLINE       0     0     0
>             gpt/lenin3-drive10      ONLINE       0     0     0
>             gpt/lenin3-drive11.eli  ONLINE       0     0     0
>           raidz1-1                  ONLINE       0     0     0
>             gpt/lenin3-drive12      ONLINE       0     0     0
>             gpt/lenin3-drive13.eli  ONLINE       0     0     0
>             gpt/lenin3-drive14      ONLINE       0     0     0
>             gpt/lenin3-drive15.eli  ONLINE       0     0     0
> 
> errors: No known data errors
> 
> The initial scrub fixed some data (~24k) in the early stages, but
> also crashed at 99.97%.
> 
> Right now, I'm using an interim work-around patch[1] so that our
> users can get files without worrying about crashing the server.
> It's a small check in dbuf_findbp() that verifies the DVA about to be
> returned has a small (<= 16) vdev number, and if not, returns EIO.
> This just results in ZFS returning I/O errors for any of the corrupt
> files I try to access, which at least lets us get at our data for now.
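
For the archives, the check described amounts to roughly the following
(hypothetical helper name, not the actual diff - see [1] for that):

/*
 * Reject block pointers whose DVAs name an implausibly large vdev
 * before they can be handed back and reach vdev_lookup_top().
 */
static boolean_t
dbuf_bp_vdevs_plausible(const blkptr_t *bp)
{
        int d;

        if (bp == NULL || BP_IS_HOLE(bp))
                return (B_TRUE);
        for (d = 0; d < BP_GET_NDVAS(bp); d++) {
                if (DVA_GET_VDEV(&bp->blk_dva[d]) > 16)
                        return (B_FALSE);
        }
        return (B_TRUE);
}

/*
 * In dbuf_findbp(), just before the found blkptr is returned to the
 * caller, something along the lines of:
 */
        if (!dbuf_bp_vdevs_plausible(*bpp))
                return (EIO);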
> 
> My suspicion is that somehow, bad data is getting interpreted as
> a block pointer/shift constant, and this sends ZFS into the woods.
> I haven't been able to track down how this data could get past
> checksum verification, especially with RAIDZ.
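
That would fit the backtraces: garbage interpreted as a blkptr_t gives
DVA_GET_VDEV() a nonsense vdev id, vdev_lookup_top() returns NULL for any
id outside the root vdev's children, and that NULL then lands in
vdev_is_dead(). Roughly (simplified sketch, asserts omitted):

vdev_t *
vdev_lookup_top(spa_t *spa, uint64_t vdev)
{
        vdev_t *rvd = spa->spa_root_vdev;

        /* A corrupt DVA's vdev id falls outside the children array. */
        if (vdev < rvd->vdev_children)
                return (rvd->vdev_child[vdev]);

        return (NULL);
}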
> 
> Backtraces:
> 
> (both crashes due to vdev_is_dead() dereferencing a null pointer)
> 
> Scrub crash:
> http://wsyntax.com/~raymond/zfs/zfs-scrub-bt.txt
> 
> Prefetch off, ls -al of "/06/chunk_0000000001417E06_00000001.mfs":
> http://wsyntax.com/~raymond/zfs/zfs-ls-bt.txt
> 
> Regards,
> Raymond Jimenez
> 
> [1] http://wsyntax.com/~raymond/zfs/zfs-dva-corrupt-workaround.patch

For one reason or another, wrong data (but correct-looking: proper checksums,
etc.) got written to the disk.  I'd say use the patch, lift the data off, and
re-create the pool.

-- 
Andriy Gapon


