Date: Mon, 26 Nov 2012 16:53:58 -0800
From: Raymond Jimenez <raymondj@caltech.edu>
To: Garrett Cooper
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS kernel panics due to corrupt DVAs (despite RAIDZ)
Message-ID: <50B40F26.7070608@caltech.edu>
References: <50B3E680.8060606@caltech.edu>

On 11/26/2012 2:21 PM, Garrett Cooper wrote:
> On Mon, Nov 26, 2012 at 2:00 PM, Raymond Jimenez wrote:
>> Hello,
>>
>> We recently sent our drives out for data recovery (blown drive
>> electronics), and when we got the new drives/data back, ZFS
>> started to kernel panic whenever listing certain items in a
>> directory, or whenever a scrub is close to finishing (~99.97%).
>>
>> The zpool worked fine before data recovery, and most of the
>> files are accessible (only a couple hundred unavailable out of
>> several million).
>>
>> Here's the kernel panic output if I scrub the disk:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address   = 0x38
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff810792d1
>> stack pointer           = 0x28:0xffffff8235122720
>> frame pointer           = 0x28:0xffffff8235122750
>> code segment            = base 0x0, limit 0xffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 52 (txg_thread_enter)
>> [thread pid 52 tid 101230 ]
>> Stopped at vdev_is_dead+0x1: cmpq $0x5, 0x38(%rdi)
>>
>> $rdi is zero, so this seems to be just a null pointer exception.
>>
>> The vdev setup looks like:
>>
>>   pool: mfs-zpool004
>>  state: ONLINE
>>   scan: scrub canceled on Mon Nov 26 05:40:49 2012
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         mfs-zpool004                ONLINE       0     0     0
>>           raidz1-0                  ONLINE       0     0     0
>>             gpt/lenin3-drive8       ONLINE       0     0     0
>>             gpt/lenin3-drive9.eli   ONLINE       0     0     0
>>             gpt/lenin3-drive10      ONLINE       0     0     0
>>             gpt/lenin3-drive11.eli  ONLINE       0     0     0
>>           raidz1-1                  ONLINE       0     0     0
>>             gpt/lenin3-drive12      ONLINE       0     0     0
>>             gpt/lenin3-drive13.eli  ONLINE       0     0     0
>>             gpt/lenin3-drive14      ONLINE       0     0     0
>>             gpt/lenin3-drive15.eli  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> The initial scrub fixed some data (~24k) in the early stages, but
>> also crashed at 99.97%.
>>
>> Right now, I'm using an interim work-around patch[1] so that our
>> users can get files without worrying about crashing the server.
>> It's a small check in dbuf_findbp() that checks if the DVA that will
>> be returned has a small (<= 16) vdev number, and if not, returns EIO.
>> This just results in ZFS returning I/O errors for any of the corrupt
>> files I try to access, which at least lets us get at our data for now.
>>
>> My suspicion is that somehow, bad data is getting interpreted as
>> a block pointer/shift constant, and this sends ZFS into the woods.
>> I haven't been able to track down how this data could get past
>> checksum verification, especially with RAIDZ.
>>
>> Backtraces:
>>
>> (both crashes due to vdev_is_dead() dereferencing a null pointer)
>>
>> Scrub crash:
>> http://wsyntax.com/~raymond/zfs/zfs-scrub-bt.txt
>>
>> Prefetch off, ls -al of "/06/chunk_0000000001417E06_00000001.mfs":
>> http://wsyntax.com/~raymond/zfs/zfs-ls-bt.txt
>
> This is missing key details like uname, zpool version, etc.

Sorry, total oversight on my part.

uname -a:

FreeBSD 03.chunk.dabney 9.0-STABLE FreeBSD 9.0-STABLE #25: Sat Nov 24
05:02:35 PST 2012 root@mfsmaster.dabney:/usr/obj/usr/src/sys/LENIN amd64

(updated as of a couple of months ago)

ZFS pool version 28, ZFS filesystem version 5.

All disks are 3TB Seagate Barracuda 7200.14 ST3000DM001's on an
LSI 9211-8i, which shows up as:

mps0: port 0xb000-0xb0ff mem 0xfb33c000-0xfb33ffff,0xfb340000-0xfb37ffff
irq 16 at device 0.0 on pci1
mps0: Firmware: 07.00.00.00
mps0: IOCCapabilities: 185c

/boot/loader.conf:

vfs.zfs.prefetch_disable="0"
kern.geom.label.gptid.enable="0"
vfs.zfs.arc_max="5G"
kern.ipc.nmbclusters="131072"
kern.ipc.maxsockbuf=16777216
kern.ipc.nmbjumbo9="38300"
boot_multicons="YES"
boot_serial="YES"
console="comconsole,vidconsole"

No ZFS tunables are set in /etc/sysctl.conf. The ARC is capped at 5GB
because the machine has 8GB of memory and is a diskless client; we were
running into lockups when we didn't restrict the ARC.

Thanks,
Raymond Jimenez
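
P.S. For anyone who wants the flavor of the interim check: the sketch
below is only an illustration of the dbuf_findbp() guard described
above, not the actual patch[1]. The helper name dbuf_bp_sanity_check()
and the hard-coded bound of 16 vdevs are placeholders for this mail.

    /*
     * Illustrative sketch only, not the real patch[1]: reject any block
     * pointer whose DVAs name an implausibly large vdev id before
     * dbuf_findbp() hands it back to the caller.
     */
    #include <sys/errno.h>
    #include <sys/spa.h>    /* blkptr_t, BP_GET_NDVAS(), DVA_GET_VDEV(), BP_IS_HOLE() */

    #define SANE_VDEV_MAX   16      /* far more top-level vdevs than this pool has */

    static int
    dbuf_bp_sanity_check(const blkptr_t *bp)
    {
            int d;

            if (bp == NULL || BP_IS_HOLE(bp))
                    return (0);     /* nothing to validate */

            for (d = 0; d < BP_GET_NDVAS(bp); d++) {
                    /* A corrupt BP tends to show up as an absurd vdev number. */
                    if (DVA_GET_VDEV(&bp->blk_dva[d]) > SANE_VDEV_MAX)
                            return (EIO);   /* fail the lookup instead of panicking */
            }
            return (0);
    }

The idea is that the check runs just before dbuf_findbp() returns the
block pointer, so a corrupt BP surfaces to the caller as EIO instead of
a stray vdev id eventually reaching vdev_is_dead() as a NULL pointer.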