From: Thomas Backman
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org, FreeBSD current, Andriy Gapon
Subject: Re: zfs: Fatal trap 12: page fault while in kernel mode
Date: Fri, 31 Jul 2009 19:10:03 +0200

On Jul 30, 2009, at 20:29, Thomas Backman wrote:

> On Jul 30, 2009, at 18:41, Thomas Backman wrote:
>
>> On Jul 30, 2009, at 17:40, Andriy Gapon wrote:
>>> on 30/07/2009 18:25 Thomas Backman said the following:
>>>> PS. I'll test Pawel's patch sometime after dinner. ;)
>>>
>>> I believe that you should get a perfect result with it.
>>>
>>> -- Andriy Gapon
>> If I dare say it, you were right! I've been testing for about half
>> an hour or so (probably a bit more) now.
>> Still using DEBUG_VFS_LOCKS, and I've tried the test case several
>> times, ran an initial backup (i.e. destroy target pool and send|recv
>> the entire pool) and a few incrementals. Rebooted, tried it
>> again. No panic, no problems! :)
>> Let's hope it stays this way.
>>
>> So, in short: With that patch (copied here just in case:
>> http://exscape.org/temp/zfs_vnops.working.patch) and the libzfs
>> patch linked previously, it appears zfs send/recv works plain fine.
>> I have yet to try it with clone/promote and stuff, but since that
>> gave the same panic that this solved, I'm hoping there will be no
>> problems with that anymore.
>
> Arrrgh!
> I guess I spoke too soon after all... new panic yet again.
> :(
> *sigh* It feels as if this will never become stable right now.
> (Maybe that's because I've spent all day and most of yesterday too
> on this ;)
>
> [... same panic as I'm posting in the reply below snipped ...]
>
> Unfortunately, I'm not sure I can reproduce this reliably, since it
> worked a bunch of times both before and after my previous mail.
>
> Oh, and I'm still using -DDEBUG=1 and DEBUG_VFS_LOCKS... If this
> isn't a new panic because of the changes, perhaps it was triggered
> now and never before because of the -DDEBUG?

OK, I created a "test case" that triggers this panic for me every time,
and reproduced it on another machine, so it should, uh, "work" for
anyone reading this as well.

Here are my patches, and the script used to reproduce the panic:
(This assumes that you've got a clean SVN/cvsup source tree. If you
have any of the patches mentioned below, remove them from the .patch
first.)

http://exscape.org/temp/zfs_destroy_panic_patches.patch
(contains: James R. Van Artsdalen's libzfs_sendrecv patch that makes it
not coredump(...), activating ZFS debugging (-DDEBUG=1), and Pawel's
zfs_vnops.c patch.)

http://exscape.org/temp/zfs_destroy_panic.sh
(needs bash and 200MB free on your /root/-containing FS, unless you
change the variables at the top; usage: "bash ...sh crash")

You'll need to rebuild zfs.ko and libzfs, and if you use zfs.ko
already, of course, reboot. (The libzfs patch can be installed and
used without rebooting.)

1) cd /usr/src; fetch http://exscape.org/temp/zfs_destroy_panic_patches.patch && patch < zfs_destroy_panic_patches.patch
2) cd /usr/src/cddl/lib/libzfs/ ; make && make install
3) cd /usr/src/sys/modules/zfs ; make && make install
3b) (reboot, or kldload zfs)
4) fetch http://exscape.org/temp/zfs_destroy_panic.sh && bash zfs_destroy_panic.sh crash

My output (snipped for brevity, most is useless stuff from dd, etc.):
(I prepended a >> to output written by my script; the rest is from
zfs. This isn't in the script itself.)

>> Creating pools
>> Creating filesystems
>> Creating snapshot(s)
>> Doing initial clone to slave pool
receiving full stream of crashtestmaster@backup-20090731-185218 into crashtestslave@backup-20090731-185218
received 15.0KB stream in 1 seconds (15.0KB/sec)
receiving full stream of crashtestmaster/testroot@backup-20090731-185218 into crashtestslave/testroot@backup-20090731-185218
received 15.0KB stream in 1 seconds (15.0KB/sec)
receiving full stream of crashtestmaster/testroot/testfs@backup-20090731-185218 into crashtestslave/testroot/testfs@backup-20090731-185218
received 1.02MB stream in 1 seconds (1.02MB/sec)
>> Initial step done!
>> Destroying testfs
>> Taking snapshots
>> Starting backup...
sending from @backup-20090731-185218 to crashtestmaster@backup-20090731-185226-11214-7776
sending from @backup-20090731-185218 to crashtestmaster/testroot@backup-20090731-185226-11214-7776
attempting destroy crashtestslave/testroot/testfs@backup-20090731-185218
success
attempting destroy crashtestslave/testroot/testfs
success
receiving incremental stream of crashtestmaster@backup-20090731-185226-11214-7776 into crashtestslave@backup-20090731-185226-11214-7776
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of crashtestmaster/testroot@backup-20090731-185226-11214-7776 into crashtestslave/testroot@backup-20090731-185226-11214-7776
[... panic, no more output ...]

DDB info, etc (from the original box; not the same run as above, but
the same panic, so...):

Unread portion of the kernel message buffer:
panic: solaris assert: ((zp)->z_vnode)->v_usecount > 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 920
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
zfsvfs_teardown() at zfsvfs_teardown+0x24d
zfs_suspend_fs() at zfs_suspend_fs+0x2b
zfs_ioc_recv() at zfs_ioc_recv+0x28b
zfsdev_ioctl() at zfsdev_ioctl+0x8a
devfs_ioctl_f() at devfs_ioctl_f+0x77
kern_ioctl() at kern_ioctl+0xf6
ioctl() at ioctl+0xfd
syscall() at syscall+0x28f
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800fe5f7c, rsp = 0x7fffffff8ee8, rbp = 0x7fffffff9c20 ---
KDB: enter: panic
panic: from debugger
cpuid = 0
Uptime: 25m47s
Physical memory: 2030 MB
Dumping 1663 MB:

...
#11 0xffffffff8033abcb in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:558
#12 0xffffffff80b0ec5d in zfsvfs_teardown () from /boot/kernel/zfs.ko
#13 0x0000000000100000 in ?? ()
#14 0xffffff0048a7e250 in ?? ()
#15 0xffffff0048a7e000 in ?? ()
#16 0xffffff00063c0000 in ?? ()
#17 0xffffff803e8f27a0 in ?? ()
#18 0xffffff803e8f27d0 in ?? ()
#19 0xffffff803e8f2770 in ?? ()
#20 0xffffff803e8f2740 in ?? ()
#21 0xffffffff80b0ecab in zfs_suspend_fs () from /boot/kernel/zfs.ko
Previous frame inner to this frame (corrupt stack?)

I commented out -DDEBUG=1 and rebuilt+installed just the zfs module,
and the panic appears to be gone. With DEBUG, it panicked every time
(and I tried it at least 4-5 times). Without, it has worked flawlessly
three times in a row, as has my regular backup.

So, the big, TL;DR question is: is the ASSERT() unnecessary, as Andriy
proposed it *might* be, or is this a real issue that actually needs
fixing? It doesn't feel right to just ignore a potential bug by
ignoring a failed assertion...

Pawel?

Regards,
Thomas