From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 203864] ZFS deadlock between zfs send, zfs rename and ctrl-C
Date: Fri, 08 Jul 2016 14:03:48 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Andriy Gapon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mm@FreeBSD.org,
                   |                            |pjd@FreeBSD.org
             Status|New                         |Open

--- Comment #13 from Andriy Gapon ---
I think that I've been able to reproduce this problem or, at least, something
that looks very much like it.

I did the standard procstat debugging and I noticed something that did not
appear in any of the previous reports:

    6 100572 zfskern          txg_thread_enter
        mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43
        _sx_xlock_hard+0x49d _sx_xlock+0xc5 zvol_rename_minors+0x104
        dsl_dataset_rename_snapshot_sync_impl+0x308
        dsl_dataset_rename_snapshot_sync+0xc1 dsl_sync_task_sync+0xef
        dsl_pool_sync+0x45b spa_sync+0x7c7 txg_sync_thread+0x383
        fork_exit+0x84 fork_trampoline+0xe

 1226 100746 zfs              -
        mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4
        txg_wait_synced+0x13b dsl_sync_task+0x205
        dsl_dataset_user_release_impl+0x1cf
        dsl_dataset_user_release_onexit+0x86 zfs_onexit_destroy+0x56
        zfsdev_close+0x88 devfs_destroy_cdevpriv+0x8b devfs_close_f+0x65
        _fdrop+0x1a closef+0x200 closefp+0xa3 amd64_syscall+0x2db
        Xfast_syscall+0xfb

 1228 100579 zfs              -
        mi_switch+0x167 sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4
        txg_wait_synced+0x13b dsl_sync_task+0x205
        dsl_dataset_rename_snapshot+0x3a zfs_ioc_rename+0x157
        zfsdev_ioctl+0x635 devfs_ioctl_f+0x156 kern_ioctl+0x246
        sys_ioctl+0x171 amd64_syscall+0x2db Xfast_syscall+0xfb

Thread 100746 is it. zfsdev_close() holds spa_namespace_lock and then calls
dsl_sync_task() -> txg_wait_synced(). On the other hand, the sync thread
(100572) gets stuck on spa_namespace_lock in a call to zvol_rename_minors().

My opinion is that the sync thread must never try to take spa_namespace_lock.
The problem seems to have been introduced quite a while ago in base r219317.
Some later commits like base r272474 also followed the same pattern.
The problem is certainly FreeBSD-specific as illumos handles ZVOL names in a
very different manner.
Also, the problem is rather deep-rooted and at the moment I do not see any
easy way to fix it without breaking ZVOL name tracking.

P.S.
A bit of information from ddb:

db> p spa_namespace_lock
ffffffff822b1ee0
db> show lock 0xffffffff822b1ee0
 class: sx
 name: spa_namespace_lock
 state: XLOCK: 0xfffff8001da60500 (tid 100746, pid 1226, "zfs")
 waiters: exclusive
db> thread 100746
[ thread pid 1226 tid 100746 ]
sched_switch+0x48a:     movl    %gs:0x34,%eax
db> bt
Tracing pid 1226 tid 100746 td 0xfffff8001da60500
sched_switch() at sched_switch+0x48a/frame 0xfffffe004def4590
mi_switch() at mi_switch+0x167/frame 0xfffffe004def45c0
sleepq_switch() at sleepq_switch+0xe7/frame 0xfffffe004def4600
sleepq_wait() at sleepq_wait+0x43/frame 0xfffffe004def4630
_cv_wait() at _cv_wait+0x1e4/frame 0xfffffe004def4690
txg_wait_synced() at txg_wait_synced+0x13b/frame 0xfffffe004def46d0
dsl_sync_task() at dsl_sync_task+0x205/frame 0xfffffe004def4790
dsl_dataset_user_release_impl() at dsl_dataset_user_release_impl+0x1cf/frame 0xfffffe004def4910
dsl_dataset_user_release_onexit() at dsl_dataset_user_release_onexit+0x86/frame 0xfffffe004def4940
zfs_onexit_destroy() at zfs_onexit_destroy+0x56/frame 0xfffffe004def4970
zfsdev_close() at zfsdev_close+0x88/frame 0xfffffe004def4990
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame 0xfffffe004def49b0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe004def49e0
_fdrop() at _fdrop+0x1a/frame 0xfffffe004def4a00
closef() at closef+0x200/frame 0xfffffe004def4a90
closefp() at closefp+0xa3/frame 0xfffffe004def4ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe004def4bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe004def4bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8013f996a, rsp = 0x7fffffffd438, rbp = 0x7fffffffd450 ---
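
For readers who want the dependency cycle from the comment spelled out, here is a minimal sketch that models the three stuck threads as a wait-for graph and looks for a loop. The thread IDs and what each one waits on are taken from the procstat and ddb output in the report. The graph representation and the helper name find_wait_cycle are purely illustrative assumptions and are not part of ZFS or any FreeBSD tool.

```python
# Illustrative model only: each key is a blocked thread, each value is the
# thread it is (directly or via a lock) waiting for, as read from the report.
# find_wait_cycle is a hypothetical helper, not a FreeBSD or ZFS API.

def find_wait_cycle(waits_for):
    """Return the list of threads forming a wait-for cycle, or None."""
    for start in waits_for:
        path = []
        seen = set()
        node = start
        # Follow the single "waits for" edge of each thread until we either
        # run out of edges or come back to a thread already on the path.
        while node is not None and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            return path[path.index(node):]
    return None


waits_for = {
    # zfsdev_close() holds spa_namespace_lock and sleeps in txg_wait_synced(),
    # so it waits for the sync thread to finish the txg.
    "tid 100746 (zfsdev_close)": "tid 100572 (txg sync thread)",
    # The sync thread sleeps on spa_namespace_lock in zvol_rename_minors();
    # ddb shows that lock is owned by tid 100746.
    "tid 100572 (txg sync thread)": "tid 100746 (zfsdev_close)",
    # zfs rename also sleeps in txg_wait_synced(); it is a victim of the
    # cycle rather than a member of it.
    "tid 100579 (zfs_ioc_rename)": "tid 100572 (txg sync thread)",
}

cycle = find_wait_cycle(waits_for)
print(cycle)
```

In this model the cycle contains only threads 100746 and 100572; thread 100579 hangs because it depends on the sync thread, which matches the analysis in the comment.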