Date:      Fri, 08 Jul 2016 14:03:48 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-fs@FreeBSD.org
Subject:   [Bug 203864] ZFS deadlock between zfs send, zfs rename and ctrl-C
Message-ID:  <bug-203864-3630-lFJeIR1TaG@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-203864-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-203864-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203864

Andriy Gapon <avg@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mm@FreeBSD.org,
                   |                            |pjd@FreeBSD.org
             Status|New                         |Open

--- Comment #13 from Andriy Gapon <avg@FreeBSD.org> ---
I think that I've been able to reproduce this problem or, at least, something
that looks very much like it.  I did the standard procstat debugging and I
noticed something that did not appear in any of the previous reports:

    6 100572 zfskern          txg_thread_enter mi_switch+0x167
sleepq_switch+0xe7 sleepq_wait+0x43 _sx_xlock_hard+0x49d _sx_xlock+0xc5
zvol_rename_minors+0x104 dsl_dataset_rename_snapshot_sync_impl+0x308
dsl_dataset_rename_snapshot_sync+0xc1 dsl_sync_task_sync+0xef
dsl_pool_sync+0x45b spa_sync+0x7c7 txg_sync_thread+0x383 fork_exit+0x84
fork_trampoline+0xe

 1226 100746 zfs              -                mi_switch+0x167
sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b
dsl_sync_task+0x205 dsl_dataset_user_release_impl+0x1cf
dsl_dataset_user_release_onexit+0x86 zfs_onexit_destroy+0x56 zfsdev_close+0x88
devfs_destroy_cdevpriv+0x8b devfs_close_f+0x65 _fdrop+0x1a closef+0x200
closefp+0xa3 amd64_syscall+0x2db Xfast_syscall+0xfb

 1228 100579 zfs              -                mi_switch+0x167
sleepq_switch+0xe7 sleepq_wait+0x43 _cv_wait+0x1e4 txg_wait_synced+0x13b
dsl_sync_task+0x205 dsl_dataset_rename_snapshot+0x3a zfs_ioc_rename+0x157
zfsdev_ioctl+0x635 devfs_ioctl_f+0x156 kern_ioctl+0x246 sys_ioctl+0x171
amd64_syscall+0x2db Xfast_syscall+0xfb

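Thread 100746 is the culprit.  zfsdev_close() holds spa_namespace_lock and then
calls dsl_sync_task() -> txg_wait_synced().  On the other hand, the sync thread
(100572) gets stuck on spa_namespace_lock in a call to zvol_rename_minors().
Neither can make progress: the txg cannot sync until the lock is released, and
the lock is not released until the txg syncs.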
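
The cycle can be spelled out as a small wait-for graph.  This is only an
illustrative sketch in Python, not kernel code: the thread IDs, function names
and lock name come from the traces above, while the node labels, the "txg sync
completion" pseudo-resource and the find_cycle() helper are assumptions made
for the illustration.

```python
# Edges mean "is blocked on".  Built by hand from the procstat output above.
waits_for = {
    # zfsdev_close() holds spa_namespace_lock and sleeps in txg_wait_synced()
    "zfs[100746] (zfsdev_close)": "txg sync completion",
    # a txg is only synced by the sync thread
    "txg sync completion": "txg_sync_thread[100572]",
    # the sync thread sleeps in _sx_xlock() called from zvol_rename_minors()
    "txg_sync_thread[100572]": "spa_namespace_lock",
    # the lock is owned by thread 100746
    "spa_namespace_lock": "zfs[100746] (zfsdev_close)",
    # the rename ioctl is a bystander that also sleeps in txg_wait_synced()
    "zfs[100579] (zfs_ioc_rename)": "txg sync completion",
}


def find_cycle(graph, start):
    """Follow blocked-on edges from start; return the cycle reached, or None."""
    path = []
    index_of = {}
    node = start
    while node is not None:
        if node in index_of:
            # We came back to a node already on the path: that suffix is the cycle.
            return path[index_of[node]:]
        index_of[node] = len(path)
        path.append(node)
        node = graph.get(node)
    return None


cycle = find_cycle(waits_for, "zfs[100579] (zfs_ioc_rename)")
print(" -> ".join(cycle + cycle[:1]))
```

Starting from the rename thread, the walk ends in a loop that contains the
sync thread, spa_namespace_lock and thread 100746, but not the rename thread
itself, which matches the reading that 100579 is a victim rather than a
participant.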

My opinion is that the sync thread must never try to take spa_namespace_lock.
The problem seems to have been introduced quite a while ago in base r219317.
Some later commits like base r272474 also followed the same pattern.  The
problem is certainly FreeBSD-specific as illumos handles ZVOL names in a very
different manner.

Also, the problem is rather deep-rooted and at the moment I do not see any easy
way to fix it without breaking ZVOL name tracking.

P.S.
A bit of information from ddb:
db> p spa_namespace_lock
ffffffff822b1ee0
db> show lock 0xffffffff822b1ee0
 class: sx
 name: spa_namespace_lock
 state: XLOCK: 0xfffff8001da60500 (tid 100746, pid 1226, "zfs")
 waiters: exclusive
db> thread 100746
[ thread pid 1226 tid 100746 ]
sched_switch+0x48a:     movl    %gs:0x34,%eax
db> bt
Tracing pid 1226 tid 100746 td 0xfffff8001da60500
sched_switch() at sched_switch+0x48a/frame 0xfffffe004def4590
mi_switch() at mi_switch+0x167/frame 0xfffffe004def45c0
sleepq_switch() at sleepq_switch+0xe7/frame 0xfffffe004def4600
sleepq_wait() at sleepq_wait+0x43/frame 0xfffffe004def4630
_cv_wait() at _cv_wait+0x1e4/frame 0xfffffe004def4690
txg_wait_synced() at txg_wait_synced+0x13b/frame 0xfffffe004def46d0
dsl_sync_task() at dsl_sync_task+0x205/frame 0xfffffe004def4790
dsl_dataset_user_release_impl() at dsl_dataset_user_release_impl+0x1cf/frame
0xfffffe004def4910
dsl_dataset_user_release_onexit() at dsl_dataset_user_release_onexit+0x86/frame
0xfffffe004def4940
zfs_onexit_destroy() at zfs_onexit_destroy+0x56/frame 0xfffffe004def4970
zfsdev_close() at zfsdev_close+0x88/frame 0xfffffe004def4990
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0x8b/frame
0xfffffe004def49b0
devfs_close_f() at devfs_close_f+0x65/frame 0xfffffe004def49e0
_fdrop() at _fdrop+0x1a/frame 0xfffffe004def4a00
closef() at closef+0x200/frame 0xfffffe004def4a90
closefp() at closefp+0xa3/frame 0xfffffe004def4ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe004def4bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe004def4bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8013f996a, rsp =
0x7fffffffd438, rbp = 0x7fffffffd450 ---

-- 
You are receiving this mail because:
You are the assignee for the bug.
