Date: Thu, 17 Mar 2011 00:08:01 -0400
From: Luke Marsden <luke-lists@hybrid-logic.co.uk>
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, freebsd-current@freebsd.org
Subject: Guaranteed kernel panic with ZFS + nullfs
Message-ID: <1300334881.3837.126.camel@pow>
Hi all,

The following script reliably triggers a kernel panic on 8.1-R, 8.2-R and 8-STABLE as of today (2011-03-16). This happens with ZFS v14/v15, and also with v28 on 8.2-R with the mm@ patches from 2011-03. I suspect it may also affect 9-CURRENT, but I have not tested this yet.

#!/usr/local/bin/bash
export POOL=hpool # change this to your pool name
sudo zfs destroy -r $POOL/foo
sudo zfs create $POOL/foo
sudo zfs set mountpoint=/foo $POOL/foo
sudo mount -t nullfs /foo /bar
sudo touch /foo/baz
ls /bar # should see baz
sudo zfs umount -f $POOL/foo # seems okay (ls: /bar: Bad file descriptor)
sudo zfs mount $POOL/foo # PANIC!

Can anyone suggest a patch that fixes this? Preferably against 8-STABLE :-)

I also have a more subtle problem. After mounting and then quickly force-unmounting a ZFS filesystem (call it A) that has two nullfs-mounted filesystems and a devfs filesystem within it, running "ls" on the mountpoint of A's parent filesystem hangs. I'm working on narrowing it down to a shell script like the one above, and will post a follow-up as soon as I have one.

This latter problem is actually more of an issue for me: I can avoid the behaviour which triggers the panic ("if it hurts, don't do it"), but I need to be able to perform the actions which trigger the deadlock (mounting and unmounting filesystems). It also affects 8.1-R, 8.2-R, 8-STABLE and 8.2-R+v28.

It seems to be the "zfs umount -f" process that hangs, which in turn causes further accesses to the parent filesystem to hang. Note that I have definitely unmounted the nullfs and devfs mounts from within the filesystem correctly before forcing the unmount. Unfortunately, -f is necessary in my application.

After the hang:

hybrid@dev3:/opt/HybridCluster$ sudo ps ax |grep zfs
   41  ??  DL     0:00.11 [zfskern]
 3751  ??  D      0:00.03 /sbin/zfs unmount -f hpool/hcfs/filesystem1

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 3751
  PID    TID COMM             TDNAME           KSTACK
 3751 100264 zfs              -                mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c zfsvfs_teardown+0x269 zfs_umount+0x1a7 dounmount+0x28a unmount+0x3c8 syscall+0x1e7 Xfast_syscall+0xe1

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 41
  PID    TID COMM             TDNAME           KSTACK
   41 100058 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
   41 100062 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
   41 100090 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
   41 100091 zfskern          txg_thread_enter mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 txg_thread_wait+0x3c txg_sync_thread+0x355 fork_exit+0x118 fork_trampoline+0xe

I will keep trying to write a shell script that makes this latter bug easily reproducible. In the meantime, what further information can I gather? I will build a debug kernel in the morning.

If it helps accelerate finding a solution to this problem, Hybrid Logic Ltd might be able to fund a small bounty for a fix. Contact me off-list if you can help in this way.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Phone: +441172232002 / +16179496062