From owner-freebsd-current@FreeBSD.ORG Sun May 24 19:02:34 2009
Message-Id: <4E6E325D-BB18-4478-BCFD-633D6F4CFD88@exscape.org>
From: Thomas Backman
To: freebsd-current@freebsd.org
Date: Sun, 24 May 2009 21:02:18 +0200
Subject: ZFS panic under extreme circumstances (2/3 disks corrupted)
List-Id: Discussions about the use of FreeBSD-current

So, I was playing around with RAID-Z and self-healing, when I decided to
take it another step and corrupt the data on *two* disks (well, files
via ggate) and see what happened. I obviously expected the pool to go
offline, but I didn't expect a kernel panic to follow!

What I did was something resembling:

1) Create three 100MB files, then use ggatel create to create GEOM
   providers from them
2) zpool create test raidz ggate{1..3}
3) Create a 100MB file inside the pool, md5 the file
4) Overwrite 10~20MB (IIRC) of disk2 with /dev/random, with
   dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40, or so
   (I now know that I wanted *seek*, not *skip*, but it still
   shouldn't panic!)
5) Check the md5 of the file: everything OK, zpool status shows a
   degraded pool.
6) Repeat step #4, but with disk 3.
7) zpool scrub test
8) Panic!

FreeBSD chaos.exscape.org 8.0-CURRENT FreeBSD 8.0-CURRENT #2: Thu May 21 22:42:42 CEST 2009 root@chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE amd64

May 24 09:13:12 chaos root: ZFS: vdev failure, zpool=test type=vdev.bad_label
May 24 09:13:15 chaos last message repeated 2 times

panic: solaris assert: 0 == zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_SCRUB_FUNC, sizeof (uint32_t), 1, &dp->dp_scrub_func, tx), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scrub.c, line: 122
cpuid = 0
KDB: enter: panic
panic: from debugger
cpuid = 0
Uptime: 22h47m41s
Physical memory: 2028 MB
Dumping 1754 MB: ...

#0  doadump () at pcpu.h:223
223	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff80576039 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:420
#2  0xffffffff8057648c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:576
#3  0xffffffff801d5b07 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801d5f11 in db_command (last_cmdp=0xffffffff80bd8820, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801d6160 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801d80f9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805a6ad5 in kdb_trap (type=3, code=0, tf=0xffffff803ea9e700) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff808610e8 in trap (frame=0xffffff803ea9e700) at /usr/src/sys/amd64/amd64/trap.c:613
#9  0xffffffff8083af97 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#10 0xffffffff805a6cad in kdb_enter (why=0xffffffff8095e234 "panic", msg=0xa) at cpufunc.h:63
#11 0xffffffff8057649b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:559
#12 0xffffffff80eaa157 in dsl_pool_scrub_setup_sync () from /boot/kernel/zfs.ko
#13 0xffffffff80ea562b in dsl_sync_task_group_sync () from /boot/kernel/zfs.ko
#14 0xffffff00560fb298 in ?? ()
#15 0xffffff803ea9e980 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0xffffff001ef49b48 in ?? ()
#18 0x0000000000000029 in ?? ()
#19 0xffffff00384c4b00 in ?? ()
#20 0xffffff803ea9ea00 in ?? ()
#21 0xffffff803ea9ea40 in ?? ()
#22 0xffffffff80ea5153 in dsl_pool_sync () from /boot/kernel/zfs.ko
Previous frame inner to this frame (corrupt stack?)

Full core.txt: http://pastebin.com/f546fefdf

Regards,
Thomas

PS. Should I file PRs regarding 8-CURRENT or not?
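
A note on step 4, as a minimal sketch rather than the exact commands used: dd's skip= applies to the input and seek= to the output, and without conv=notrunc dd truncates a regular output file. The command in step 4 would therefore be expected to replace disk2 with a shorter file of random data rather than damage a region in the middle of it. That may fit the vdev.bad_label messages in the log, since ZFS keeps vdev labels at both the start and the end of each device. The script below shows the difference on scratch files only; the 10 MiB size, the offsets, the file names and the use of /dev/urandom are assumptions chosen for illustration, and no ggate or zpool commands are involved.

```shell
#!/bin/sh
# Sketch only: scratch files stand in for the file-backed "disks".
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

bs=1048576   # 1 MiB, written out numerically for portability across dd variants

# Two identical 10 MiB images filled with zeros.
dd if=/dev/zero of="$work/seek.img" bs=$bs count=10 2>/dev/null
cp "$work/seek.img" "$work/skip.img"
size_before=$(wc -c < "$work/seek.img" | tr -d ' ')

# seek= counts blocks in the OUTPUT; conv=notrunc leaves the rest of the file alone.
dd if=/dev/urandom of="$work/seek.img" bs=$bs count=2 seek=4 conv=notrunc 2>/dev/null
seek_size=$(wc -c < "$work/seek.img" | tr -d ' ')
# Count non-zero bytes in the first 4 MiB, the region that should be untouched.
seek_head_dirty=$(head -c $((4 * bs)) "$work/seek.img" | tr -d '\000' | wc -c | tr -d ' ')

# skip= counts blocks in the INPUT; output starts at offset 0 and the file is truncated.
dd if=/dev/urandom of="$work/skip.img" bs=$bs count=2 skip=4 2>/dev/null
skip_size=$(wc -c < "$work/skip.img" | tr -d ' ')

echo "original size:      $size_before"
echo "after seek+notrunc: $seek_size (non-zero bytes in first 4 MiB: $seek_head_dirty)"
echo "after skip:         $skip_size"
```

With seek= and conv=notrunc the image keeps its size and only the targeted region changes; with skip= alone the image ends up smaller than it started, with random data at the very beginning.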