Date: Wed, 31 Mar 2021 08:12:59 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 224292] processes are hanging in state ufs / possible deadlock in file system
Message-ID: <bug-224292-3630-MIee7Ko2wT@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-224292-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-224292-3630@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292

--- Comment #18 from sigsys@gmail.com ---
(In reply to Konstantin Belousov from comment #17)

This sure seems to have helped.

I was about to report that the problem is most likely gone since it hadn't
happened in a while (despite running kyua in a loop for hours) after getting
that patch series. But then it happened again with chrome this time and I got
a dump. Dunno if running "sync" would have unwedged the whole thing since I
made it panic instead.

There were two threads from two processes looping and doing crazy I/O: a
chrome process and a zsh process.

zsh thread backtrace:

#0  sched_switch (td=td@entry=0xfffffe00aa35ce00, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xfffffe00097da0a8, pri=92, pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6f7fe in sleepq_wait (wchan=<optimized out>, pri=<optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4  0xffffffff80c1e9e6 in _sleep (ident=ident@entry=0xfffffe00097da0a8, lock=<optimized out>, lock@entry=0xfffffe000863b0c0, priority=priority@entry=92, wmesg=<optimized out>, sbt=sbt@entry=0, pr=pr@entry=0, flags=256) at /usr/src/sys/kern/kern_synch.c:221
#5  0xffffffff80cd5214 in bwait (bp=0xfffffe00097da0a8, pri=92 '\\', wchan=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:5020
#6  bufwait (bp=bp@entry=0xfffffe00097da0a8) at /usr/src/sys/kern/vfs_bio.c:4433
#7  0xffffffff80cd285a in bufwrite (bp=0xfffffe00097da0a8, bp@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/vfs_bio.c:2305
#8  0xffffffff80f01789 in bwrite (bp=<unavailable>) at /usr/src/sys/sys/buf.h:430
#9  ffs_update (vp=vp@entry=0xfffff80004c61380, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#10 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80004c61380, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#11 0xffffffff80f0f91d in softdep_prelink (dvp=dvp@entry=0xfffff80004c61380, vp=vp@entry=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#12 0xffffffff80f3fee3 in ufs_makeinode (mode=33188, dvp=0xfffff80004c61380, vpp=0xfffffe00aae0a9d8, cnp=<unavailable>, callfunc=<unavailable>) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2741
#13 0xffffffff80f3bfa4 in ufs_create (ap=0xfffffe00aae0a8a8) at /usr/src/sys/ufs/ufs/ufs_vnops.c:213
#14 0xffffffff8118a31d in VOP_CREATE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aae0a8a8) at vnode_if.c:244
#15 0xffffffff80d15233 in VOP_CREATE (dvp=<unavailable>, vpp=0xfffffe00aae0a9d8, cnp=0xfffffe00aae0aa00, vap=0xfffffe00aae0a7f0) at ./vnode_if.h:133
#16 vn_open_cred (ndp=ndp@entry=0xfffffe00aae0a968, flagp=flagp@entry=0xfffffe00aae0aa94, cmode=cmode@entry=420, vn_open_flags=<optimized out>, vn_open_flags@entry=0, cred=0xfffff80048d42e00, fp=0xfffff8010aeabc30) at /usr/src/sys/kern/vfs_vnops.c:285
#17 0xffffffff80d14f6d in vn_open (ndp=<unavailable>, ndp@entry=0xfffffe00aae0a968, flagp=<unavailable>, flagp@entry=0xfffffe00aae0aa94, cmode=<unavailable>, cmode@entry=420, fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:202
#18 0xffffffff80d08999 in kern_openat (td=0xfffffe00aa35ce00, fd=-100, path=0x8002fd420 <error: Cannot access memory at address 0x8002fd420>, pathseg=UIO_USERSPACE, flags=34306, mode=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1142
#19 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#20 amd64_syscall (td=0xfffffe00aa35ce00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#21 <signal handler called>
#22 0x00000008004f223a in ?? ()

chrome thread backtrace:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1475
#1  0xffffffff8108afe9 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1432
#2  0xffffffff810c4256 in trap (frame=0xfffffe0009848f30) at /usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  vtpci_legacy_notify_vq (dev=<optimized out>, queue=0, offset=16) at /usr/src/sys/dev/virtio/pci/virtio_pci_legacy.c:485
#5  0xffffffff80a45417 in VIRTIO_BUS_NOTIFY_VQ (dev=0xfffff8000362fb00, queue=0, offset=16) at ./virtio_bus_if.h:144
#6  vq_ring_notify_host (vq=0xfffffe0063e27000) at /usr/src/sys/dev/virtio/virtqueue.c:834
#7  virtqueue_notify (vq=0xfffffe0063e27000, vq@entry=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/virtqueue.c:439
#8  0xffffffff80a538c0 in vtblk_startio (sc=sc@entry=0xfffff8000362f100) at /usr/src/sys/dev/virtio/block/virtio_blk.c:1123
#9  0xffffffff80a53bed in vtblk_strategy (bp=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/block/virtio_blk.c:571
#10 0xffffffff80b4bcfc in g_disk_start (bp=<optimized out>) at /usr/src/sys/geom/geom_disk.c:473
#11 0xffffffff80b4f147 in g_io_request (bp=0xfffff80021d33c00, cp=<optimized out>, cp@entry=0xfffff8000398ce80) at /usr/src/sys/geom/geom_io.c:589
#12 0xffffffff80b5b1a9 in g_part_start (bp=0xfffff8004e974900) at /usr/src/sys/geom/part/g_part.c:2332
#13 0xffffffff80b4f147 in g_io_request (bp=0xfffff8004e974900, cp=<optimized out>) at /usr/src/sys/geom/geom_io.c:589
#14 0xffffffff80cd284c in bstrategy (bp=0xfffffe0008ac5388) at /usr/src/sys/sys/buf.h:442
#15 bufwrite (bp=0xfffffe0008ac5388) at /usr/src/sys/kern/vfs_bio.c:2302
#16 0xffffffff80f01789 in bwrite (bp=0x0) at /usr/src/sys/sys/buf.h:430
#17 ffs_update (vp=vp@entry=0xfffff80139495000, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#18 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80139495000, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#19 0xffffffff80f0f86f in softdep_prelink (dvp=dvp@entry=0xfffff80139495000, vp=vp@entry=0xfffff8013c8328c0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#20 0xffffffff80f3d797 in ufs_remove (ap=0xfffffe00aabdfa20) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1011
#21 0xffffffff8118bf90 in VOP_REMOVE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aabdfa20) at vnode_if.c:1540
#22 0xffffffff80d0a468 in VOP_REMOVE (dvp=0x0, vp=0xfffff8013c8328c0, cnp=<optimized out>) at ./vnode_if.h:802
#23 kern_funlinkat (td=0xfffffe00aa6e3100, dfd=dfd@entry=-100, path=0x8288d40e0 <error: Cannot access memory at address 0x8288d40e0>, fd=<optimized out>, fd@entry=-200, pathseg=pathseg@entry=UIO_USERSPACE, flag=<optimized out>, flag@entry=0, oldinum=0) at /usr/src/sys/kern/vfs_syscalls.c:1927
#24 0xffffffff80d0a138 in sys_unlink (td=0xfffff8000362fb00, uap=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1808
#25 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#26 amd64_syscall (td=0xfffffe00aa6e3100, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#27 <signal handler called>
#28 0x000000080e40d17a in ?? ()

syncer backtrace:

#0  sched_switch (td=td@entry=0xfffffe00a5e29100, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6fe3b in sleepq_timedwait (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:690
#4  0xffffffff80ba34b0 in _cv_timedwait_sbt (cvp=0xffffffff81fa9550 <sync_wakeup>, lock=0xffffffff81fa9520 <sync_mtx>, sbt=<optimized out>, pr=<optimized out>, pr@entry=0, flags=0, flags@entry=256) at /usr/src/sys/kern/kern_condvar.c:312
#5  0xffffffff80d036dc in sched_sync () at /usr/src/sys/kern/vfs_subr.c:2739
#6  0xffffffff80bcb9a0 in fork_exit (callout=0xffffffff80d03090 <sched_sync>, arg=0x0, frame=0xfffffe006a491c00) at /usr/src/sys/kern/kern_fork.c:1077
#7  <signal handler called>

It seems like some kind of livelock involving ERELOOKUP loops. I can only
guess though, softupdates' is way too complicated for me.

That's with cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 merged. This kernel has
some local changes and I'm a little bit worried that this might be the problem
but I think it's unlikely. The problem happens pretty rarely and that's the
only -CURRENT install on UFS that I'm working with so that's the best that
I've got.

That's with a virtio disk backed by a ZFS volume on bhyve BTW.

-- 
You are receiving this mail because:
You are the assignee for the bug.