Date: Fri, 08 Jul 2016 14:03:48 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 203864] ZFS deadlock between zfs send, zfs rename and ctrl-C
Message-ID: <bug-203864-3630-lFJeIR1TaG@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-203864-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-203864-3630@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Andriy Gapon <avg@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mm@FreeBSD.org,
                   |                            |pjd@FreeBSD.org
             Status|New                         |Open

--- Comment #13 from Andriy Gapon <avg@FreeBSD.org> ---
I think that I've been able to reproduce this problem or, at least, something that looks very much like it.

I did the standard procstat debugging and I noticed something that did not appear in any of the previous reports:

    6 100572 zfskern  txg_thread_enter mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _sx_xlock_hard+0x49d _sx_xlock+0xc5 zvol_rename_minors+0x104 dsl_dataset_rename_snapshot_sync_impl+0x308 dsl_dataset_rename_snapshot_sync+0xc1 dsl_sync_task_sync+0xef dsl_pool_sync+0x45b spa_sync+0x7c7 txg_sync_thread+0x383 fork_exit+0x84 fork_trampoline+0xe
 1226 100746 zfs      -                mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b dsl_sync_task+0x205 dsl_dataset_user_release_impl+0x1cf dsl_dataset_user_release_onexit+0x86 zfs_onexit_destroy+0x56 zfsdev_close+0x88 devfs_destroy_cdevpriv+0x8b devfs_close_f+0x65 _fdrop+0x1a closef+0x200 closefp+0xa3 amd64_syscall+0x2db Xfast_syscall+0xfb
 1228 100579 zfs      -                mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b dsl_sync_task+0x205 dsl_dataset_rename_snapshot+0x3a zfs_ioc_rename+0x157 zfsdev_ioctl+0x635 devfs_ioctl_f+0x156 kern_ioctl+0x246 sys_ioctl+0x171 amd64_syscall+0x2db Xfast_syscall+0xfb

Thread 100746 is it. zfsdev_close() holds spa_namespace_lock and then calls dsl_sync_task() -> txg_wait_synced(). On the other hand, the sync thread (100572) gets stuck on spa_namespace_lock in a call to zvol_rename_minors().

My opinion is that the sync thread must never try to take spa_namespace_lock. The problem seems to have been introduced quite a while ago in base r219317. Some later commits, like base r272474, also followed the same pattern.
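The lock cycle described above can be modelled in user space. What follows is a minimal sketch, not FreeBSD code, under these assumptions: a threading.Lock stands in for spa_namespace_lock, an Event stands in for the "synced" condition that txg_wait_synced() sleeps on, and timeouts replace the kernel's unbounded sleeps so the demo terminates. The function name reproduce_lock_cycle and its hold_lock_while_waiting parameter are hypothetical.

```python
import threading


def reproduce_lock_cycle(timeout: float = 0.5,
                         hold_lock_while_waiting: bool = True) -> bool:
    """Return True if both threads gave up waiting on each other.

    Hypothetical model only: `namespace_lock` stands in for
    spa_namespace_lock and `synced` for the txg "synced" condition.
    Timeouts replace the kernel's unbounded sleeps so the demo ends.
    """
    namespace_lock = threading.Lock()   # models spa_namespace_lock
    task_queued = threading.Event()     # set once a sync task is queued
    synced = threading.Event()          # set once the sync task completed
    result = {}

    def closer() -> None:
        # Models zfsdev_close(): take the namespace lock, queue a sync
        # task (dsl_sync_task) and wait for it (txg_wait_synced).
        namespace_lock.acquire()
        task_queued.set()
        if not hold_lock_while_waiting:
            # Contrast case only, not a proposed fix: drop the lock first.
            namespace_lock.release()
        # Wait longer than the sync thread so it gives up first.
        result["closer_done"] = synced.wait(2 * timeout)
        if hold_lock_while_waiting:
            namespace_lock.release()

    def sync_thread() -> None:
        # Models txg_sync_thread() running a rename task that calls
        # zvol_rename_minors(), which needs the namespace lock.
        task_queued.wait()
        got_lock = namespace_lock.acquire(timeout=timeout)
        if got_lock:
            namespace_lock.release()
            synced.set()
        result["sync_done"] = got_lock

    threads = [threading.Thread(target=closer),
               threading.Thread(target=sync_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Deadlock: neither side made progress before its timeout expired.
    return not result["closer_done"] and not result["sync_done"]
```

With the default arguments both threads time out, because each is waiting for something only the other can provide. Passing hold_lock_while_waiting=False lets the modelled sync task complete; that case exists only to show that the cycle needs the lock to be held across the wait, and it is not a suggestion for how the ZFS code should be changed.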
The problem is certainly FreeBSD-specific, as illumos handles ZVOL names in a very different manner. Also, the problem is rather deep-rooted and at the moment I do not see any easy way to fix it without breaking ZVOL name tracking.

P.S. A bit of information from ddb:

db> p spa_namespace_lock
ffffffff822b1ee0
db> show lock 0xffffffff822b1ee0
 class: sx
 name: spa_namespace_lock
 state: XLOCK: 0xfffff8001da60500 (tid 100746, pid 1226, "zfs")
 waiters: exclusive
db> thread 100746
[ thread pid 1226 tid 100746 ]
sched_switch+0x48a:     movl    %gs:0x34,%eax
db> bt
Tracing pid 1226 tid 100746 td 0xfffff8001da60500
sched_switch() at sched_switch+0x48a/frame 0xfffffe004def4590
mi_switch() at mi_switch+0x167/frame 0xfffffe004def45c0
sleepq_switch() at sleepq_switch+0xe7/frame 0xfffffe004def4600
sleepq_wait() at sleepq_wait+0x43/frame 0xfffffe004def4630
_cv_wait() at _cv_wait+0x1e4/frame 0xfffffe004def4690
txg_wait_synced() at txg_wait_synced+0x13b/frame 0xfffffe004def46d0
dsl_sync_task() at dsl_sync_task+0x205/frame 0xfffffe004def4790
dsl_dataset_user_release_impl() at dsl_dataset_user_release_impl+0x1cf/frame 0xfffffe004def4910
dsl_dataset_user_release_onexit() at dsl_dataset_user_release_onexit+0x86/frame 0xfffffe004def4940
zfs_onexit_destroy() at zfs_onexit_destroy+0x56/frame 0xfffffe004def4970
zfsdev_close() at zfsdev_close+0x88/frame 0xfffffe004def4990
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame 0xfffffe004def49b0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe004def49e0
_fdrop() at _fdrop+0x1a/frame 0xfffffe004def4a00
closef() at closef+0x200/frame 0xfffffe004def4a90
closefp() at closefp+0xa3/frame 0xfffffe004def4ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe004def4bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe004def4bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8013f996a, rsp = 0x7fffffffd438, rbp = 0x7fffffffd450 ---

-- 
You are receiving this mail because:
You are the assignee for the bug.