Date: Mon, 26 Nov 2012 16:53:58 -0800
From: Raymond Jimenez <raymondj@caltech.edu>
To: Garrett Cooper
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS kernel panics due to corrupt DVAs (despite RAIDZ)
Message-ID: <50B40F26.7070608@caltech.edu>
References: <50B3E680.8060606@caltech.edu>

On 11/26/2012 2:21 PM, Garrett Cooper wrote:
> On Mon, Nov 26, 2012 at 2:00 PM, Raymond Jimenez wrote:
>> Hello,
>>
>> We recently sent our drives out for data recovery (blown drive
>> electronics), and when we got the new drives/data back, ZFS
>> started to kernel panic whenever listing certain items in a
>> directory, or whenever a scrub is close to finishing (~99.97%).
>>
>> The zpool worked fine before data recovery, and most of the
>> files are accessible (only a couple hundred unavailable out of
>> several million).
>>
>> Here's the kernel panic output if I scrub the disk:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address   = 0x38
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff810792d1
>> stack pointer           = 0x28:0xffffff8235122720
>> frame pointer           = 0x28:0xffffff8235122750
>> code segment            = base 0x0, limit 0xffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 52 (txg_thread_enter)
>> [thread pid 52 tid 101230 ]
>> Stopped at vdev_is_dead+0x1: cmpq $0x5, 0x38(%rdi)
>>
>> $rdi is zero, so this seems to be just a null pointer exception.
>>
>> The vdev setup looks like:
>>
>>   pool: mfs-zpool004
>>  state: ONLINE
>>   scan: scrub canceled on Mon Nov 26 05:40:49 2012
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         mfs-zpool004                ONLINE       0     0     0
>>           raidz1-0                  ONLINE       0     0     0
>>             gpt/lenin3-drive8       ONLINE       0     0     0
>>             gpt/lenin3-drive9.eli   ONLINE       0     0     0
>>             gpt/lenin3-drive10      ONLINE       0     0     0
>>             gpt/lenin3-drive11.eli  ONLINE       0     0     0
>>           raidz1-1                  ONLINE       0     0     0
>>             gpt/lenin3-drive12      ONLINE       0     0     0
>>             gpt/lenin3-drive13.eli  ONLINE       0     0     0
>>             gpt/lenin3-drive14      ONLINE       0     0     0
>>             gpt/lenin3-drive15.eli  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> The initial scrub fixed some data (~24k) in the early stages, but
>> also crashed at 99.97%.
>>
>> Right now, I'm using an interim work-around patch[1] so that our
>> users can get files without worrying about crashing the server.
>> It's a small check in dbuf_findbp() that checks if the DVA that will
>> be returned has a small (<= 16) vdev number, and if not, returns EIO.
>> This just results in ZFS returning I/O errors for any of the corrupt
>> files I try to access, which at least lets us get at our data for now.
>>
>> My suspicion is that somehow, bad data is getting interpreted as
>> a block pointer/shift constant, and this sends ZFS into the woods.
>> I haven't been able to track down how this data could get past
>> checksum verification, especially with RAIDZ.
>>
>> Backtraces:
>>
>> (both crashes due to vdev_is_dead() dereferencing a null pointer)
>>
>> Scrub crash:
>> http://wsyntax.com/~raymond/zfs/zfs-scrub-bt.txt
>>
>> Prefetch off, ls -al of "/06/chunk_0000000001417E06_00000001.mfs":
>> http://wsyntax.com/~raymond/zfs/zfs-ls-bt.txt
>
> This is missing key details like uname, zpool version, etc.

Sorry, total oversight on my part.

uname -a:

FreeBSD 03.chunk.dabney 9.0-STABLE FreeBSD 9.0-STABLE #25: Sat Nov 24
05:02:35 PST 2012 root@mfsmaster.dabney:/usr/obj/usr/src/sys/LENIN amd64

(updated as of a couple of months ago)

ZFS pool version 28, ZFS filesystem version 5.

All disks are 3TB Seagate Barracuda 7200.14 ST3000DM001's on an
LSI 9211-8i, which shows up as:

mps0: port 0xb000-0xb0ff mem 0xfb33c000-0xfb33ffff,0xfb340000-0xfb37ffff
irq 16 at device 0.0 on pci1
mps0: Firmware: 07.00.00.00
mps0: IOCCapabilities: 185c

/boot/loader.conf:

vfs.zfs.prefetch_disable="0"
kern.geom.label.gptid.enable="0"
vfs.zfs.arc_max="5G"
kern.ipc.nmbclusters="131072"
kern.ipc.maxsockbuf=16777216
kern.ipc.nmbjumbo9="38300"
boot_multicons="YES"
boot_serial="YES"
console="comconsole,vidconsole"

No ZFS tunables are set in /etc/sysctl.conf. The ARC is capped at 5GB
because the machine has 8GB of memory and is a diskless client; we were
running into lockups when we didn't restrict the ARC.

Thanks,
Raymond Jimenez
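
P.S. For anyone who wants the flavor of the interim check: the sketch
below is only an illustration of the dbuf_findbp() guard described
above, not the actual patch[1]. The helper name dbuf_bp_sanity_check()
and the hard-coded bound of 16 vdevs are placeholders for this mail.

    /*
     * Illustrative sketch only, not the real patch[1]: reject any block
     * pointer whose DVAs name an implausibly large vdev id before
     * dbuf_findbp() hands it back to the caller.
     */
    #include <sys/errno.h>
    #include <sys/spa.h>    /* blkptr_t, BP_GET_NDVAS(), DVA_GET_VDEV(), BP_IS_HOLE() */

    #define SANE_VDEV_MAX   16      /* far more top-level vdevs than this pool has */

    static int
    dbuf_bp_sanity_check(const blkptr_t *bp)
    {
            int d;

            if (bp == NULL || BP_IS_HOLE(bp))
                    return (0);     /* nothing to validate */

            for (d = 0; d < BP_GET_NDVAS(bp); d++) {
                    /* A corrupt BP tends to show up as an absurd vdev number. */
                    if (DVA_GET_VDEV(&bp->blk_dva[d]) > SANE_VDEV_MAX)
                            return (EIO);   /* fail the lookup instead of panicking */
            }
            return (0);
    }

The idea is that the check runs just before dbuf_findbp() returns the
block pointer, so a corrupt BP surfaces to the caller as EIO instead of
a stray vdev id eventually reaching vdev_is_dead() as a NULL pointer.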