Date:      Wed, 31 Mar 2021 08:12:59 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 224292] processes are hanging in state ufs / possible deadlock in file system
Message-ID:  <bug-224292-3630-MIee7Ko2wT@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-224292-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-224292-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292

--- Comment #18 from sigsys@gmail.com ---
(In reply to Konstantin Belousov from comment #17)
This sure seems to have helped.  I was about to report that the problem is
most likely gone since it hadn't happened in a while (despite running kyua
in a loop for hours) after getting that patch series.

But then it happened again with chrome this time and I got a dump.  Dunno if
running "sync" would have unwedged the whole thing since I made it panic
instead.

There were two threads from two processes looping and doing crazy I/O: a
chrome process and a zsh process.

zsh thread backtrace:

#0  sched_switch (td=td@entry=0xfffffe00aa35ce00, flags=<optimized out>,
flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at
/usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xfffffe00097da0a8,
pri=92, pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6f7fe in sleepq_wait (wchan=<optimized out>, pri=<optimized
out>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4  0xffffffff80c1e9e6 in _sleep (ident=ident@entry=0xfffffe00097da0a8,
lock=<optimized out>, lock@entry=0xfffffe000863b0c0,
priority=priority@entry=92, wmesg=<optimized out>, sbt=sbt@entry=0,
pr=pr@entry=0, flags=256) at /usr/src/sys/kern/kern_synch.c:221
#5  0xffffffff80cd5214 in bwait (bp=0xfffffe00097da0a8, pri=92 '\\',
wchan=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:5020
#6  bufwait (bp=bp@entry=0xfffffe00097da0a8) at
/usr/src/sys/kern/vfs_bio.c:4433
#7  0xffffffff80cd285a in bufwrite (bp=0xfffffe00097da0a8, bp@entry=<error
reading variable: value is not available>) at /usr/src/sys/kern/vfs_bio.c:2305
#8  0xffffffff80f01789 in bwrite (bp=<unavailable>) at
/usr/src/sys/sys/buf.h:430
#9  ffs_update (vp=vp@entry=0xfffff80004c61380, waitfor=waitfor@entry=1) at
/usr/src/sys/ufs/ffs/ffs_inode.c:204
#10 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80004c61380,
waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0)
at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#11 0xffffffff80f0f91d in softdep_prelink (dvp=dvp@entry=0xfffff80004c61380,
vp=vp@entry=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#12 0xffffffff80f3fee3 in ufs_makeinode (mode=33188, dvp=0xfffff80004c61380,
vpp=0xfffffe00aae0a9d8, cnp=<unavailable>, callfunc=<unavailable>) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:2741
#13 0xffffffff80f3bfa4 in ufs_create (ap=0xfffffe00aae0a8a8) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:213
#14 0xffffffff8118a31d in VOP_CREATE_APV (vop=0xffffffff81b63158
<ffs_vnodeops2>, a=a@entry=0xfffffe00aae0a8a8) at vnode_if.c:244
#15 0xffffffff80d15233 in VOP_CREATE (dvp=<unavailable>,
vpp=0xfffffe00aae0a9d8, cnp=0xfffffe00aae0aa00, vap=0xfffffe00aae0a7f0) at
./vnode_if.h:133
#16 vn_open_cred (ndp=ndp@entry=0xfffffe00aae0a968,
flagp=flagp@entry=0xfffffe00aae0aa94, cmode=cmode@entry=420,
vn_open_flags=<optimized out>, vn_open_flags@entry=0, cred=0xfffff80048d42e00,
fp=0xfffff8010aeabc30) at /usr/src/sys/kern/vfs_vnops.c:285
#17 0xffffffff80d14f6d in vn_open (ndp=<unavailable>,
ndp@entry=0xfffffe00aae0a968, flagp=<unavailable>,
flagp@entry=0xfffffe00aae0aa94, cmode=<unavailable>, cmode@entry=420,
fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:202
#18 0xffffffff80d08999 in kern_openat (td=0xfffffe00aa35ce00, fd=-100,
path=0x8002fd420 <error: Cannot access memory at address 0x8002fd420>,
pathseg=UIO_USERSPACE, flags=34306, mode=<optimized out>) at
/usr/src/sys/kern/vfs_syscalls.c:1142
#19 0xffffffff810c5803 in syscallenter (td=<optimized out>) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#20 amd64_syscall (td=0xfffffe00aa35ce00, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1156
#21 <signal handler called>
#22 0x00000008004f223a in ?? ()

chrome thread backtrace:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1475
#1  0xffffffff8108afe9 in ipi_nmi_handler () at
/usr/src/sys/x86/x86/mp_x86.c:1432
#2  0xffffffff810c4256 in trap (frame=0xfffffe0009848f30) at
/usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  vtpci_legacy_notify_vq (dev=<optimized out>, queue=0, offset=16) at
/usr/src/sys/dev/virtio/pci/virtio_pci_legacy.c:485
#5  0xffffffff80a45417 in VIRTIO_BUS_NOTIFY_VQ (dev=0xfffff8000362fb00,
queue=0, offset=16) at ./virtio_bus_if.h:144
#6  vq_ring_notify_host (vq=0xfffffe0063e27000) at
/usr/src/sys/dev/virtio/virtqueue.c:834
#7  virtqueue_notify (vq=0xfffffe0063e27000, vq@entry=0xfffff8004de6f600) at
/usr/src/sys/dev/virtio/virtqueue.c:439
#8  0xffffffff80a538c0 in vtblk_startio (sc=sc@entry=0xfffff8000362f100) at
/usr/src/sys/dev/virtio/block/virtio_blk.c:1123
#9  0xffffffff80a53bed in vtblk_strategy (bp=0xfffff8004de6f600) at
/usr/src/sys/dev/virtio/block/virtio_blk.c:571
#10 0xffffffff80b4bcfc in g_disk_start (bp=<optimized out>) at
/usr/src/sys/geom/geom_disk.c:473
#11 0xffffffff80b4f147 in g_io_request (bp=0xfffff80021d33c00, cp=<optimized
out>, cp@entry=0xfffff8000398ce80) at /usr/src/sys/geom/geom_io.c:589
#12 0xffffffff80b5b1a9 in g_part_start (bp=0xfffff8004e974900) at
/usr/src/sys/geom/part/g_part.c:2332
#13 0xffffffff80b4f147 in g_io_request (bp=0xfffff8004e974900, cp=<optimized
out>) at /usr/src/sys/geom/geom_io.c:589
#14 0xffffffff80cd284c in bstrategy (bp=0xfffffe0008ac5388) at
/usr/src/sys/sys/buf.h:442
#15 bufwrite (bp=0xfffffe0008ac5388) at /usr/src/sys/kern/vfs_bio.c:2302
#16 0xffffffff80f01789 in bwrite (bp=0x0) at /usr/src/sys/sys/buf.h:430
#17 ffs_update (vp=vp@entry=0xfffff80139495000, waitfor=waitfor@entry=1) at
/usr/src/sys/ufs/ffs/ffs_inode.c:204
#18 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80139495000,
waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0)
at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#19 0xffffffff80f0f86f in softdep_prelink (dvp=dvp@entry=0xfffff80139495000,
vp=vp@entry=0xfffff8013c8328c0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#20 0xffffffff80f3d797 in ufs_remove (ap=0xfffffe00aabdfa20) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:1011
#21 0xffffffff8118bf90 in VOP_REMOVE_APV (vop=0xffffffff81b63158
<ffs_vnodeops2>, a=a@entry=0xfffffe00aabdfa20) at vnode_if.c:1540
#22 0xffffffff80d0a468 in VOP_REMOVE (dvp=0x0, vp=0xfffff8013c8328c0,
cnp=<optimized out>) at ./vnode_if.h:802
#23 kern_funlinkat (td=0xfffffe00aa6e3100, dfd=dfd@entry=-100, path=0x8288d40e0
<error: Cannot access memory at address 0x8288d40e0>, fd=<optimized out>,
fd@entry=-200, pathseg=pathseg@entry=UIO_USERSPACE, flag=<optimized out>,
flag@entry=0, oldinum=0) at /usr/src/sys/kern/vfs_syscalls.c:1927
#24 0xffffffff80d0a138 in sys_unlink (td=0xfffff8000362fb00, uap=<optimized
out>) at /usr/src/sys/kern/vfs_syscalls.c:1808
#25 0xffffffff810c5803 in syscallenter (td=<optimized out>) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#26 amd64_syscall (td=0xfffffe00aa6e3100, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1156
#27 <signal handler called>
#28 0x000000080e40d17a in ?? ()

syncer backtrace:

#0  sched_switch (td=td@entry=0xfffffe00a5e29100, flags=<optimized out>,
flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at
/usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xffffffff81fa9550
<sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6fe3b in sleepq_timedwait
(wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at
/usr/src/sys/kern/subr_sleepqueue.c:690
#4  0xffffffff80ba34b0 in _cv_timedwait_sbt (cvp=0xffffffff81fa9550
<sync_wakeup>, lock=0xffffffff81fa9520 <sync_mtx>, sbt=<optimized out>,
pr=<optimized out>, pr@entry=0, flags=0, flags@entry=256) at
/usr/src/sys/kern/kern_condvar.c:312
#5  0xffffffff80d036dc in sched_sync () at /usr/src/sys/kern/vfs_subr.c:2739
#6  0xffffffff80bcb9a0 in fork_exit (callout=0xffffffff80d03090 <sched_sync>,
arg=0x0, frame=0xfffffe006a491c00) at /usr/src/sys/kern/kern_fork.c:1077
#7  <signal handler called>

It seems like some kind of livelock involving ERELOOKUP loops.  I can only
guess though, softupdates is way too complicated for me.
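
To make the guess above concrete, here is a toy model of the general shape
of such a retry loop.  It is only a sketch and not the UFS code: every name
in it (toy_dir, toy_prelink, toy_retry_op, TOY_ERELOOKUP) is made up for
illustration, and the assumption that something re-adds a dependency before
every retry is exactly the part I can't confirm from the dump.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's ERELOOKUP; the value here is arbitrary. */
#define TOY_ERELOOKUP (-1)

/* Toy directory state (hypothetical, not a kernel structure). */
struct toy_dir {
	int pending_deps;	/* unflushed dependencies */
	int sync_writes;	/* synchronous flushes issued so far */
};

/*
 * Hypothetical prelink step: if anything is pending, flush it
 * synchronously and ask the caller to redo the lookup.
 */
int
toy_prelink(struct toy_dir *d)
{
	if (d->pending_deps == 0)
		return (0);
	d->pending_deps = 0;
	d->sync_writes++;
	return (TOY_ERELOOKUP);
}

/*
 * Retry an operation until the prelink step succeeds or max_attempts
 * is reached.  If redirty is true, a new dependency shows up before
 * every retry, which models the suspected livelock.
 * Returns the number of attempts used, or -1 if the bound was hit.
 */
int
toy_retry_op(struct toy_dir *d, bool redirty, int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (toy_prelink(d) == 0)
			return (attempt);
		if (redirty)
			d->pending_deps++;
	}
	return (-1);
}
```

With one pending dependency and redirty set to false, the second attempt
succeeds after a single flush.  With redirty set to true, every attempt
flushes and restarts, so the loop never finishes and just keeps issuing
synchronous writes, which would look a lot like the two backtraces above.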

That's with cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 merged.  This kernel has
some local changes and I'm a little bit worried that this might be the problem
but I think it's unlikely.  The problem happens pretty rarely and that's the
only -CURRENT install on UFS that I'm working with so that's the best that I've
got.  That's with a virtio disk backed by a ZFS volume on bhyve BTW.



