Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 12 Aug 2023 16:11:10 +0200
From:      =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@FreeBSD.org>
To:        current@freebsd.org
Subject:   Re: ZFS deadlock in 14
Message-ID:  <86h6p4s64h.fsf@ltc.des.no>
In-Reply-To: <86leeltqcb.fsf@ltc.des.no> ("Dag-Erling =?utf-8?Q?Sm=C3=B8rg?= =?utf-8?Q?rav=22's?= message of "Tue, 08 Aug 2023 19:07:48 %2B0200")
References:  <86leeltqcb.fsf@ltc.des.no>

next in thread | previous in thread | raw e-mail | index | archive | help
Dag-Erling Sm=C3=B8rgrav <des@FreeBSD.org> writes:
> At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> deadlock was introduced in ZFS.

Trying to narrow this range down, I did not get a deadlock with
4e8d558c9d1c (10 June) but I did with b7198dcfc039 (16 June), albeit
after building ~1800 packages.  This is surprising since we have a
report of this or a very similar deadlock occurring with a kernel from 8
June (https://bugs.freebsd.org/271945).  Perhaps I should try
4e8d558c9d1c again.

Here's the complete kgdb session showing, once again, a zfs rollback
stuck waiting for the txg to sync:

    Reading symbols from /boot/GENERIC/kernel...
    Reading symbols from /usr/lib/debug//boot/GENERIC/kernel.debug...
=20=20=20=20
    Unread portion of the kernel message buffer:
    panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffe0356=
7a01e0 (sh), blocked for 180242 ticks
=20=20=20=20
    cpuid =3D 17
    time =3D 1691802362
    KDB: stack backtrace:
    db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe021=
507ce00
    vpanic() at vpanic+0x150/frame 0xfffffe021507ce50
    panic() at panic+0x43/frame 0xfffffe021507ceb0
    deadlkres() at deadlkres+0x350/frame 0xfffffe021507cef0
    fork_exit() at fork_exit+0x80/frame 0xfffffe021507cf30
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe021507cf30
    --- trap 0xdeadc0de, rip =3D 0xdeadc0dedeadc0de, rsp =3D 0xdeadc0dedead=
c0de, rbp =3D 0xdeadc0dedeadc0de ---
    KDB: enter: panic
=20=20=20=20
    __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:59
    59		__asm("movq %%gs:%P1,%0" : "=3Dr" (td) : "n" (offsetof(struct pcpu,
    (kgdb) tid 0xfffffe03567a01e0
    (kgdb) bt
    #0  sched_switch (td=3Dtd@entry=3D0xfffffe03567a01e0, flags=3Dflags@ent=
ry=3D259) at /usr/src/sys/kern/sched_ule.c:2299
    #1  0xffffffff80b5fbd4 in mi_switch (flags=3Dflags@entry=3D259) at /usr=
/src/sys/kern/kern_synch.c:550
    #2  0xffffffff80bb1257 in sleepq_switch (wchan=3D0xfffff80b139e4770, wc=
han@entry=3D0xffffffff8113878f, pri=3Dpri@entry=3D64) at /usr/src/sys/kern/=
subr_sleepqueue.c:609
    #3  0xffffffff80bb112e in sleepq_wait (wchan=3D<unavailable>, pri=3D<un=
available>) at /usr/src/sys/kern/subr_sleepqueue.c:660
    #4  0xffffffff80b21d6f in sleeplk (lk=3Dlk@entry=3D0xfffff80b139e4770, =
flags=3Dflags@entry=3D2122752, ilk=3Dilk@entry=3D0x0, wmesg=3Dwmesg@entry=
=3D0xffffffff8113878f "tmpfs", pri=3D<optimized out>, pri@entry=3D64, timo=
=3Dtimo@entry=3D6, queue=3D1) at /usr/src/sys/kern/kern_lock.c:310
    #5  0xffffffff80b1fd9f in lockmgr_slock_hard (lk=3D0xfffff80b139e4770, =
flags=3D2122752, ilk=3D<optimized out>, file=3D0xffffffff81296919 "/usr/src=
/sys/kern/vfs_lookup.c", line=3D1012, lwa=3D0x0) at /usr/src/sys/kern/kern_=
lock.c:705
    #6  0xffffffff80c5e444 in VOP_LOCK1 (vp=3D0xfffff80b139e4700, flags=3D2=
106368, file=3D0xffffffff81296919 "/usr/src/sys/kern/vfs_lookup.c", line=3D=
1012) at ./vnode_if.h:1120
    #7  _vn_lock (vp=3D0xfffff80b139e4700, flags=3D2106368, file=3D<unavail=
able>, line=3D<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:1808
    #8  0xffffffff80c36eae in vfs_lookup (ndp=3Dndp@entry=3D0xfffffe03c63a6=
bd8) at /usr/src/sys/kern/vfs_lookup.c:1010
    #9  0xffffffff80c36291 in namei (ndp=3Dndp@entry=3D0xfffffe03c63a6bd8) =
at /usr/src/sys/kern/vfs_lookup.c:689
    #10 0xffffffff80c5681f in kern_statat (td=3D0xfffffe03567a01e0, td@entr=
y=3D<unavailable>, flag=3D<optimized out>, fd=3D-100, path=3D0x1032a8685a15=
 <error: Cannot access memory at address 0x1032a8685a15>, pathseg=3Dpathseg=
@entry=3DUIO_USERSPACE, sbp=3Dsbp@entry=3D0xfffffe03c63a6d18)
        at /usr/src/sys/kern/vfs_syscalls.c:2441
    #11 0xffffffff80c56f17 in sys_fstatat (td=3D<unavailable>, td@entry=3D<=
error reading variable: value is not available>, uap=3D0xfffffe03567a05e0, =
uap@entry=3D<error reading variable: value is not available>) at /usr/src/s=
ys/kern/vfs_syscalls.c:2419
    #12 0xffffffff8104d8e0 in syscallenter (td=3D<optimized out>) at /usr/s=
rc/sys/amd64/amd64/../../kern/subr_syscall.c:190
    #13 amd64_syscall (td=3D0xfffffe03567a01e0, traced=3D0) at /usr/src/sys=
/amd64/amd64/trap.c:1199
    #14 <signal handler called>
    #15 0x0000103acaf3b03a in ?? ()
    Backtrace stopped: Cannot access memory at address 0x103ac929af28
    (kgdb) f 5
    #5  0xffffffff80b1fd9f in lockmgr_slock_hard (lk=3D0xfffff80b139e4770, =
flags=3D2122752, ilk=3D<optimized out>, file=3D0xffffffff81296919 "/usr/src=
/sys/kern/vfs_lookup.c", line=3D1012, lwa=3D0x0) at /usr/src/sys/kern/kern_=
lock.c:705
    705			error =3D sleeplk(lk, flags, ilk, iwmesg, ipri, itimo,
    (kgdb) p (struct thread *)(lk->lk_lock & ~0x1f)
    $1 =3D (struct thread *) 0xfffffe02ddae0e40
    (kgdb) tid 0xfffffe02ddae0e40
    (kgdb) bt
    #0  sched_switch (td=3Dtd@entry=3D0xfffffe02ddae0e40, flags=3Dflags@ent=
ry=3D259) at /usr/src/sys/kern/sched_ule.c:2299
    #1  0xffffffff80b5fbd4 in mi_switch (flags=3Dflags@entry=3D259) at /usr=
/src/sys/kern/kern_synch.c:550
    #2  0xffffffff80bb1257 in sleepq_switch (wchan=3Dwchan@entry=3D0xfffff8=
1ab3c81154, pri=3D87, pri@entry=3D-1278734048) at /usr/src/sys/kern/subr_sl=
eepqueue.c:609
    #3  0xffffffff80bb112e in sleepq_wait (wchan=3D<unavailable>, pri=3D<un=
available>) at /usr/src/sys/kern/subr_sleepqueue.c:660
    #4  0xffffffff80b5f11d in _sleep (ident=3D0xfffff81ab3c81154, lock=3D0x=
fffff81ab3c81120, priority=3D87, wmesg=3D0xffffffff82239fba "zfs teardown i=
nactive", sbt=3D0, pr=3D0, flags=3D256) at /usr/src/sys/kern/kern_synch.c:2=
25
    #5  0xffffffff80b4b640 in rms_rlock_fallback (rms=3Drms@entry=3D0xfffff=
81ab3c81120) at /usr/src/sys/kern/kern_rmlock.c:1015
    #6  0xffffffff80b4b51c in rms_rlock (rms=3D<unavailable>, rms@entry=3D0=
xfffff81ab3c81120) at /usr/src/sys/kern/kern_rmlock.c:1036
    #7  0xffffffff81faff5c in zfs_freebsd_reclaim (ap=3D<optimized out>) at=
 /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:5164
    #8  0xffffffff811215e4 in VOP_RECLAIM_APV (vop=3D0xffffffff822e61a0 <zf=
s_vnodeops>, a=3Da@entry=3D0xfffffe02fb2118a0) at vnode_if.c:2180
    #9  0xffffffff80c47d54 in VOP_RECLAIM (vp=3D0xfffff80912340000) at ./vn=
ode_if.h:1084
    #10 vgonel (vp=3Dvp@entry=3D0xfffff80912340000) at /usr/src/sys/kern/vf=
s_subr.c:4143
    #11 0xffffffff80c436f2 in vtryrecycle (vp=3D0xfffff80912340000) at /usr=
/src/sys/kern/vfs_subr.c:1693
    #12 vnlru_free_impl (count=3Dcount@entry=3D1, mnt_op=3Dmnt_op@entry=3D0=
x0, mvp=3D0xfffff8010945da00) at /usr/src/sys/kern/vfs_subr.c:1344
    #13 0xffffffff80c4dd83 in vnlru_free_locked (count=3D1) at /usr/src/sys=
/kern/vfs_subr.c:1357
    #14 vn_alloc_hard (mp=3Dmp@entry=3D0xfffffe0314140000) at /usr/src/sys/=
kern/vfs_subr.c:1744
    #15 0xffffffff80c43db1 in vn_alloc (mp=3D0xfffffe0314140000) at /usr/sr=
c/sys/amd64/include/atomic.h:375
    #16 getnewvnode (tag=3D0xffffffff8113878f "tmpfs", mp=3D0xfffffe0314140=
000, vops=3D0xffffffff816b7a70 <tmpfs_vnodeop_entries>, vpp=3Dvpp@entry=3D0=
xfffffe02fb211a30) at /usr/src/sys/kern/vfs_subr.c:1810
    #17 0xffffffff80a7b27c in tmpfs_alloc_vp (mp=3D0xfffffe0314140000, node=
=3Dnode@entry=3D0xfffff81924deabc8, lkflag=3Dlkflag@entry=3D524288, vpp=3D0=
xfffffe02fb211cf0) at /usr/src/sys/fs/tmpfs/tmpfs_subr.c:1027
    #18 0xffffffff80a7b985 in tmpfs_alloc_file (dvp=3Ddvp@entry=3D0xfffff80=
b139e4700, vpp=3D<unavailable>, vpp@entry=3D0xfffffe02fb211cf0, vap=3D<opti=
mized out>, cnp=3Dcnp@entry=3D0xfffffe02fb211d18, target=3Dtarget@entry=3D0=
x0) at /usr/src/sys/fs/tmpfs/tmpfs_subr.c:1203
    #19 0xffffffff80a74d28 in tmpfs_create (v=3D<optimized out>) at /usr/sr=
c/sys/fs/tmpfs/tmpfs_vnops.c:271
    #20 0xffffffff8111eb99 in VOP_CREATE_APV (vop=3D0xffffffff816b7a70 <tmp=
fs_vnodeop_entries>, a=3Da@entry=3D0xfffffe02fb211be0) at vnode_if.c:244
    #21 0xffffffff80c5d94c in VOP_CREATE (dvp=3D<unavailable>, vpp=3D0xffff=
fe02fb211cf0, cnp=3D0xfffffe02fb211d18, vap=3D0xfffffe02fb211b20) at ./vnod=
e_if.h:133
    #22 vn_open_cred (ndp=3Dndp@entry=3D0xfffffe02fb211c98, flagp=3Dflagp@e=
ntry=3D0xfffffe02fb211da4, cmode=3Dcmode@entry=3D420, vn_open_flags=3Dvn_op=
en_flags@entry=3D16, cred=3D0xfffff8010978bc00, fp=3D0xfffff804f42cff00) at=
 /usr/src/sys/kern/vfs_vnops.c:287
    #23 0xffffffff80c53f43 in kern_openat (td=3D0xfffffe02ddae0e40, td@entr=
y=3D<unavailable>, fd=3Dfd@entry=3D-100, path=3D0x8222f799b <error: Cannot =
access memory at address 0x8222f799b>, pathseg=3Dpathseg@entry=3DUIO_USERSP=
ACE, flags=3D1538, mode=3D<optimized out>)
        at /usr/src/sys/kern/vfs_syscalls.c:1167
    #24 0xffffffff80c53cad in sys_open (td=3Dtd@entry=3D<unavailable>, uap=
=3Duap@entry=3D<unavailable>) at /usr/src/sys/kern/vfs_syscalls.c:1095
    #25 0xffffffff82b18365 in filemon_wrapper_open (td=3D<unavailable>, td@=
entry=3D<error reading variable: value is not available>, uap=3D<unavailabl=
e>, uap@entry=3D<error reading variable: value is not available>) at /usr/s=
rc/sys/dev/filemon/filemon_wrapper.c:220
    #26 0xffffffff8104d8e0 in syscallenter (td=3D<optimized out>) at /usr/s=
rc/sys/amd64/amd64/../../kern/subr_syscall.c:190
    #27 amd64_syscall (td=3D0xfffffe02ddae0e40, traced=3D0) at /usr/src/sys=
/amd64/amd64/trap.c:1199
    #28 <signal handler called>
    #29 0x0000000829c8227a in ?? ()
    Backtrace stopped: Cannot access memory at address 0x8222f6868
    (kgdb) f 7
    #7  0xffffffff81faff5c in zfs_freebsd_reclaim (ap=3D<optimized out>) at=
 /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:5164
    5164		ZFS_TEARDOWN_INACTIVE_ENTER_READ(zfsvfs);
    (kgdb) p zp->z_zfsvfs->z_teardown_inactive_lock->owner
    $2 =3D (struct thread *) 0xfffffe0314249020
    (kgdb) tid 0xfffffe0314249020
    (kgdb) bt
    #0  sched_switch (td=3Dtd@entry=3D0xfffffe0314249020, flags=3Dflags@ent=
ry=3D259) at /usr/src/sys/kern/sched_ule.c:2299
    #1  0xffffffff80b5fbd4 in mi_switch (flags=3Dflags@entry=3D259) at /usr=
/src/sys/kern/kern_synch.c:550
    #2  0xffffffff80bb1257 in sleepq_switch (wchan=3Dwchan@entry=3D0xfffff8=
0108fe1540, pri=3D0, pri@entry=3D150869200) at /usr/src/sys/kern/subr_sleep=
queue.c:609
    #3  0xffffffff80bb112e in sleepq_wait (wchan=3D<unavailable>, wchan@ent=
ry=3D0xfffff80108fe1540, pri=3D<unavailable>, pri@entry=3D0) at /usr/src/sy=
s/kern/subr_sleepqueue.c:660
    #4  0xffffffff80ade224 in _cv_wait (cvp=3D0xfffff80108fe1540, lock=3D0x=
fffff80108fe14d0) at /usr/src/sys/kern/kern_condvar.c:146
    #5  0xffffffff820b383b in txg_wait_synced_impl (dp=3D0xfffff80108fe1000=
, txg=3D8751529, txg@entry=3D0, wait_sig=3Dwait_sig@entry=3D0) at /usr/src/=
sys/contrib/openzfs/module/zfs/txg.c:726
    #6  0xffffffff820b31eb in txg_wait_synced (dp=3D<unavailable>, txg=3D<u=
navailable>, txg@entry=3D0) at /usr/src/sys/contrib/openzfs/module/zfs/txg.=
c:736
    #7  0xffffffff81fa5fc5 in zfsvfs_teardown (zfsvfs=3D0xfffff81ab3c81000,=
 unmounting=3Dunmounting@entry=3D0) at /usr/src/sys/contrib/openzfs/module/=
os/freebsd/zfs/zfs_vfsops.c:1661
    #8  0xffffffff81fa5db9 in zfs_suspend_fs (zfsvfs=3D<unavailable>) at /u=
sr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1954
    #9  0xffffffff821680ff in zfs_ioc_rollback (fsname=3D0xfffffe0301913000=
 "zroot-default-ref/03", fsname@entry=3D<error reading variable: value is n=
ot available>, innvl=3D<unavailable>, innvl@entry=3D<error reading variable=
: value is not available>,=20
        outnvl=3D0xfffff81601748640, outnvl@entry=3D<error reading variable=
: value is not available>) at /usr/src/sys/contrib/openzfs/module/zfs/zfs_i=
octl.c:4401
    #10 0xffffffff82163836 in zfsdev_ioctl_common (vecnum=3Dvecnum@entry=3D=
25, zc=3Dzc@entry=3D0xfffffe0301913000, flag=3Dflag@entry=3D0) at /usr/src/=
sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7798
    #11 0xffffffff81f969aa in zfsdev_ioctl (dev=3D<optimized out>, zcmd=3D<=
unavailable>, zcmd@entry=3D<error reading variable: value is not available>=
, arg=3D0xfffffe02fd546d50 "\017", arg@entry=3D<error reading variable: val=
ue is not available>, flag=3D<optimized out>, td=3D<optimized out>)
        at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:1=
68
    #12 0xffffffff809dc9cc in devfs_ioctl (ap=3D0xfffffe02fd546c40) at /usr=
/src/sys/fs/devfs/devfs_vnops.c:935
    #13 0xffffffff80c5cac0 in vn_ioctl (fp=3D0xfffff81e9207f0a0, com=3D<opt=
imized out>, data=3D0xfffffe02fd546d50, active_cred=3D0xfffff8026a65a900, t=
d=3D<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:1697
    #14 0xffffffff809dd07e in devfs_ioctl_f (fp=3D<unavailable>, fp@entry=
=3D<error reading variable: value is not available>, com=3D<unavailable>, c=
om@entry=3D<error reading variable: value is not available>, data=3D<unavai=
lable>, data@entry=3D<error reading variable: value is not available>,=20
        cred=3D<unavailable>, cred@entry=3D<error reading variable: value i=
s not available>, td=3D<unavailable>, td@entry=3D<error reading variable: v=
alue is not available>) at /usr/src/sys/fs/devfs/devfs_vnops.c:866
    #15 0xffffffff80bca1ce in fo_ioctl (fp=3D0xfffff81e9207f0a0, com=3D3222=
821401, data=3D<unavailable>, active_cred=3D<unavailable>, td=3D<optimized =
out>) at /usr/src/sys/sys/file.h:367
    #16 kern_ioctl (td=3Dtd@entry=3D0xfffffe0314249020, fd=3D<optimized out=
>, com=3Dcom@entry=3D3222821401, data=3D<unavailable>, data@entry=3D0xfffff=
e02fd546d50 "\017") at /usr/src/sys/kern/sys_generic.c:807
    #17 0xffffffff80bc9f64 in sys_ioctl (td=3D0xfffffe0314249020, td@entry=
=3D<error reading variable: value is not available>, uap=3D0xfffffe03142494=
20, uap@entry=3D<error reading variable: value is not available>) at /usr/s=
rc/sys/kern/sys_generic.c:715
    #18 0xffffffff8104d8e0 in syscallenter (td=3D<optimized out>) at /usr/s=
rc/sys/amd64/amd64/../../kern/subr_syscall.c:190
    #19 amd64_syscall (td=3D0xfffffe0314249020, traced=3D0) at /usr/src/sys=
/amd64/amd64/trap.c:1199
    #20 <signal handler called>
    #21 0x000005c8e125953a in ?? ()
    Backtrace stopped: Cannot access memory at address 0x5c8d89c8018

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@FreeBSD.org



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86h6p4s64h.fsf>