From: Thomas Backman
To: Kip Macy
Cc: freebsd-fs@freebsd.org, FreeBSD current
Date: Sat, 11 Jul 2009 16:08:26 +0200
Subject: Re: Reproducible ZFS panic, w/ script (Was: "New" ZFS crash on FS (pool?) unmount/export)

On Jul 10, 2009, at 21:27, Kip Macy wrote:

> "zfs export" does a forced unmount. We may not be properly handling
> dangling references.
>
> -Kip

A bit more digging:

[root@chaos ~]# bash zfs_crash.sh initial
[root@chaos ~]# bash zfs_crash.sh stress   ## with the unmount part (line 107) **commented out**

I then let that run for about 20 seconds to create a bunch of snapshots (ignoring the errors; in my own script I added a random number to the snapshot name to avoid collisions), and then:

[root@chaos ~]# zpool export crashtestmaster
[root@chaos ~]# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
crashtestslave              20.3M  40.7M    20K  /crashtestslave/crashtestslave
crashtestslave/test_cloned  19.8M  40.7M  19.8M  /crashtestslave/crashtestslave/test_cloned
crashtestslave/test_orig        0  40.7M  19.8M  /crashtestslave/crashtestslave/test_orig
tank                        5.67G  59.3G    18K  none
tank/root                    616M  59.3G   224M  /
tank/...
[root@chaos ~]# zfs unmount crashtestslave/test_orig
[root@chaos ~]# zfs unmount crashtestslave/test_cloned
[root@chaos ~]# zfs unmount crashtestslave

... panic here.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff803a5682
stack pointer           = 0x28:0xffffff803ea09980
frame pointer           = 0x28:0xffffff803ea099b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 5099 (zfs)

0xffffff002ac4a938: tag zfs, type VDIR
    usecount 1, writecount 0, refcount 1 mountedhere 0xffffff00068be8d0
    flags ()
    lock type zfs: EXCL by thread 0xffffff0006f13390 (pid 5099)

BT:
...
#9  0xffffffff805edc42 in trap (frame=0xffffff803ea098d0)
    at /usr/src/sys/amd64/amd64/trap.c:345
#10 0xffffffff805d36a7 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:223
#11 0xffffffff803a5682 in propagate_priority (td=0xffffff0027174ab0)
    at /usr/src/sys/kern/subr_turnstile.c:194
#12 0xffffffff803a64ec in turnstile_wait (ts=Variable "ts" is not available.
) at /usr/src/sys/kern/subr_turnstile.c:738
#13 0xffffffff80355101 in _mtx_lock_sleep (m=0xffffff002ca6d9f8,
    tid=18446742974314394512, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:447
#14 0xffffffff803f7893 in vfs_msync (mp=0xffffff00068be8d0, flags=1)
    at /usr/src/sys/kern/vfs_subr.c:3179
#15 0xffffffff803f0c7e in dounmount (mp=0xffffff00068be8d0, flags=0,
    td=Variable "td" is not available.
) at /usr/src/sys/kern/vfs_mount.c:1263
#16 0xffffffff803f1568 in unmount (td=0xffffff0006f13390, uap=0xffffff803ea09c00)
    at /usr/src/sys/kern/vfs_mount.c:1174
#17 0xffffffff805ed4cf in syscall (frame=0xffffff803ea09c90)
    at /usr/src/sys/amd64/amd64/trap.c:984
#18 0xffffffff805d3930 in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:364
#19 0x0000000800f4b9ac in ?? ()
Previous frame inner to this frame (corrupt stack?)

This is NOT the same backtrace as before (nothing after dounmount() matches the zpool export panic), and this time the panic came from zfs unmount, not zpool export.

I tried it again and got yet another backtrace(!). In both cases, though, it "ends" (or begins, depending on your view) with propagate_priority(), turnstile_wait() and _mtx_lock_sleep(). Here's the second one, which happened while doing the same as above: initial, stress, and then manually unmounting the datasets with zfs unmount. Once again, "zfs unmount crashtestslave" (the root fs) is what panics:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff803aa722
stack pointer           = 0x28:0xffffff8000025a60
frame pointer           = 0x28:0xffffff8000025a90
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 12 (swi4: clock)

...
#8  0xffffffff805f1fcd in trap_fatal (frame=0xffffff80000259b0,
    eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:847
#9  0xffffffff805f2e22 in trap (frame=0xffffff80000259b0)
    at /usr/src/sys/amd64/amd64/trap.c:345
#10 0xffffffff805d87c7 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:224
#11 0xffffffff803aa722 in propagate_priority (td=0xffffff00296ce390)
    at /usr/src/sys/kern/subr_turnstile.c:194
#12 0xffffffff803ab58c in turnstile_wait (ts=Variable "ts" is not available.
) at /usr/src/sys/kern/subr_turnstile.c:738
#13 0xffffffff8035a1c1 in _mtx_lock_sleep (m=0xffffffff808a1de0,
    tid=18446742974234830624, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:447
#14 0xffffffff8037ea92 in softclock (arg=Variable "arg" is not available.
) at /usr/src/sys/kern/kern_timeout.c:376
#15 0xffffffff803417b0 in intr_event_execute_handlers (p=Variable "p" is not available.
) at /usr/src/sys/kern/kern_intr.c:1165
#16 0xffffffff80342d1e in ithread_loop (arg=0xffffff000231e6a0)
    at /usr/src/sys/kern/kern_intr.c:1178
#17 0xffffffff8033ebb8 in fork_exit (callout=0xffffffff80342c90 ,
    arg=0xffffff000231e6a0, frame=0xffffff8000025c80)
    at /usr/src/sys/kern/kern_fork.c:842
#18 0xffffffff805d8c9e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:561
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000001 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()

Note that the active process is *not* zfs this time.

Regards,
Thomas
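The manual reproduction sequence described in this mail (initial, about 20 seconds of stress, zpool export of the master pool, then unmounting the slave datasets children first) can be collected into one script. This is a hypothetical wrapper, not the zfs_crash.sh from the thread: it assumes zfs_crash.sh sits in the current directory with line 107 already commented out, and that its stress mode keeps running until it is killed. The REALLY_RUN and STRESS_SECONDS variables are made up for this sketch; by default it only prints the commands, since actually running them is expected to panic the machine.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual reproduction steps in this mail.
# Dry-run by default: commands are only printed. Set REALLY_RUN=yes to
# execute them, which is expected to PANIC the machine.
# Assumptions: zfs_crash.sh is in the current directory, its unmount step
# (line 107) is commented out, and "stress" mode loops until killed.

STRESS_SECONDS=${STRESS_SECONDS:-20}   # "about 20 seconds" in the report

# Execute the command, or just print it in dry-run mode.
run() {
    if [ "${REALLY_RUN:-no}" = yes ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

repro() {
    run bash zfs_crash.sh initial

    # Let stress mode create a pile of snapshots, then stop it.
    if [ "${REALLY_RUN:-no}" = yes ]; then
        bash zfs_crash.sh stress &
        stress_pid=$!
        sleep "$STRESS_SECONDS"
        kill "$stress_pid"
        wait "$stress_pid" 2>/dev/null
    else
        echo "+ bash zfs_crash.sh stress (stopped after ${STRESS_SECONDS}s)"
    fi

    run zpool export crashtestmaster
    # Unmount the slave datasets children first; the last one panicked.
    run zfs unmount crashtestslave/test_orig
    run zfs unmount crashtestslave/test_cloned
    run zfs unmount crashtestslave
}

repro
```

With REALLY_RUN=yes the real sequence runs. Note that kill only stops the bash process started for stress mode; if zfs_crash.sh starts background jobs of its own, those would need to be stopped separately.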
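Checking which frames two backtraces have in common, as with the propagate_priority(), turnstile_wait() and _mtx_lock_sleep() frames shared by both traces above, can be done mechanically. The script below is a hypothetical helper, not something from the thread: it assumes each backtrace was saved to its own file in gdb's bt format (one "#N 0x... in func (...)" line per frame; continuation lines are ignored), matches frames by frame number and function name, and skips unresolved "??" frames.

```shell
#!/bin/sh
# Hypothetical helper: print the frames two saved gdb backtraces share.
# Frames are matched by frame number and function name; unresolved
# "??" frames and gdb continuation lines are ignored.

# Print "<frame number> <function name>" for every "#N ..." line in $1.
frames() {
    awk '/^#[0-9]+ / {
        n = substr($1, 2)
        for (i = 2; i < NF; i++)
            if ($i == "in") { print n, $(i + 1); next }
        print n, $2    # frame without an address, e.g. "#0  doadump () at ..."
    }' "$1"
}

# Print "#N function" for frames with the same number and name in $1 and $2.
common_frames() {
    fa=$(mktemp) || return 1
    fb=$(mktemp) || { rm -f "$fa"; return 1; }
    frames "$1" > "$fa"
    frames "$2" > "$fb"
    awk 'NR == FNR { name[$1] = $2; next }
         ($1 in name) && name[$1] == $2 && $2 != "??" { print "#" $1, $2 }' \
        "$fa" "$fb"
    rm -f "$fa" "$fb"
}

if [ $# -eq 2 ]; then
    common_frames "$1" "$2"
fi
```

Run as `sh common_frames.sh first_bt.txt second_bt.txt` (file names hypothetical). For the two traces in this mail it should list frames #9 (trap) through #13 (_mtx_lock_sleep); matching by frame number rather than from the top of the file keeps it working when the earlier frames are elided with "...".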