From: Garrett Cooper <yanegomi@gmail.com>
To: Raymond Jimenez
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS kernel panics due to corrupt DVAs (despite RAIDZ)
Date: Mon, 26 Nov 2012 14:21:51 -0800
In-Reply-To: <50B3E680.8060606@caltech.edu>

On Mon, Nov 26, 2012 at 2:00 PM, Raymond Jimenez wrote:
> Hello,
>
> We recently sent our drives out for data recovery (blown drive
> electronics), and when we got the new drives/data back, ZFS
> started to kernel panic whenever listing certain items in a
> directory, or whenever a scrub is close to finishing (~99.97%).
>
> The zpool worked fine before data recovery, and most of the
> files are accessible (only a couple hundred unavailable out of
> several million).
>
> Here's the kernel panic output if I scrub the disk:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address  = 0x38
> fault code             = supervisor read data, page not present
> instruction pointer    = 0x20:0xffffffff810792d1
> stack pointer          = 0x28:0xffffff8235122720
> frame pointer          = 0x28:0xffffff8235122750
> code segment           = base 0x0, limit 0xffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags       = interrupt enabled, resume, IOPL = 0
> current process        = 52 (txg_thread_enter)
> [thread pid 52 tid 101230 ]
> Stopped at      vdev_is_dead+0x1:       cmpq    $0x5, 0x38(%rdi)
>
> %rdi is zero, so this seems to be just a null pointer dereference.
>
> The vdev setup looks like:
>
>   pool: mfs-zpool004
>  state: ONLINE
>   scan: scrub canceled on Mon Nov 26 05:40:49 2012
> config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         mfs-zpool004                ONLINE       0     0     0
>           raidz1-0                  ONLINE       0     0     0
>             gpt/lenin3-drive8       ONLINE       0     0     0
>             gpt/lenin3-drive9.eli   ONLINE       0     0     0
>             gpt/lenin3-drive10      ONLINE       0     0     0
>             gpt/lenin3-drive11.eli  ONLINE       0     0     0
>           raidz1-1                  ONLINE       0     0     0
>             gpt/lenin3-drive12      ONLINE       0     0     0
>             gpt/lenin3-drive13.eli  ONLINE       0     0     0
>             gpt/lenin3-drive14      ONLINE       0     0     0
>             gpt/lenin3-drive15.eli  ONLINE       0     0     0
>
> errors: No known data errors
>
> The initial scrub fixed some data (~24k) in the early stages, but
> also crashed at 99.97%.
>
> Right now, I'm using an interim work-around patch[1] so that our
> users can get files without worrying about crashing the server.
> It's a small check in dbuf_findbp() that checks whether the DVA
> about to be returned has a small (<= 16) vdev number and, if not,
> returns EIO. This just results in ZFS returning I/O errors for any
> of the corrupt files I try to access, which at least lets us get at
> our data for now.
>
> My suspicion is that somehow, bad data is getting interpreted as
> a block pointer/shift constant, and this sends ZFS into the woods.
> I haven't been able to track down how this data could get past
> checksum verification, especially with RAIDZ.
>
> Backtraces (both crashes are due to vdev_is_dead() dereferencing a
> null pointer):
>
> Scrub crash:
> http://wsyntax.com/~raymond/zfs/zfs-scrub-bt.txt
>
> Prefetch off, ls -al of "/06/chunk_0000000001417E06_00000001.mfs":
> http://wsyntax.com/~raymond/zfs/zfs-ls-bt.txt

This is missing key details such as the output of uname and the zpool
version.

Thanks,
-Garrett
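
For reference, below is a minimal sketch of the kind of guard Raymond
describes adding in dbuf_findbp(). The helper name, the threshold
constant, and the exact placement are illustrative assumptions; the
actual interim patch[1] may differ.

    #include <sys/zfs_context.h>
    #include <sys/spa.h>

    /*
     * Illustrative threshold (assumption): the pool above has only two
     * top-level vdevs, so any DVA claiming a vdev index above 16 is
     * almost certainly garbage.
     */
    #define SANE_VDEV_MAX   16

    /*
     * Return B_FALSE if any DVA in the block pointer names an
     * implausibly large top-level vdev index.
     */
    static boolean_t
    dva_vdevs_look_sane(const blkptr_t *bp)
    {
            int d;

            for (d = 0; d < BP_GET_NDVAS(bp); d++) {
                    if (DVA_GET_VDEV(&bp->blk_dva[d]) > SANE_VDEV_MAX)
                            return (B_FALSE);
            }
            return (B_TRUE);
    }

    /*
     * In dbuf_findbp(), just before the block pointer is handed back
     * to the caller (placement is an assumption, not the actual patch):
     *
     *      if (*bpp != NULL && !dva_vdevs_look_sane(*bpp))
     *              return (EIO);
     */

The check is deliberately crude: it only keeps an obviously corrupt
vdev index from propagating further down the read path, where a failed
vdev lookup presumably leads to the vdev_is_dead() null dereference
shown in the backtraces. Returning EIO for the affected blocks matches
the behaviour Raymond reports (I/O errors instead of panics).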