Date: Wed, 31 Mar 2021 08:12:59 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 224292] processes are hanging in state ufs / possible deadlock in file system
Message-ID: <bug-224292-3630-MIee7Ko2wT@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-224292-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-224292-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224292

--- Comment #18 from sigsys@gmail.com ---
(In reply to Konstantin Belousov from comment #17)

This sure seems to have helped. I was about to report that the problem is most likely gone since it hadn't happened in a while (despite running kyua in a loop for hours) after getting that patch series. But then it happened again, with chrome this time, and I got a dump. Dunno if running "sync" would have unwedged the whole thing since I made it panic instead.

There were two threads from two processes looping and doing crazy I/O: a chrome process and a zsh process.

zsh thread backtrace:

#0  sched_switch (td=td@entry=0xfffffe00aa35ce00, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xfffffe00097da0a8, pri=92, pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6f7fe in sleepq_wait (wchan=<optimized out>, pri=<optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4  0xffffffff80c1e9e6 in _sleep (ident=ident@entry=0xfffffe00097da0a8, lock=<optimized out>, lock@entry=0xfffffe000863b0c0, priority=priority@entry=92, wmesg=<optimized out>, sbt=sbt@entry=0, pr=pr@entry=0, flags=256) at /usr/src/sys/kern/kern_synch.c:221
#5  0xffffffff80cd5214 in bwait (bp=0xfffffe00097da0a8, pri=92 '\\', wchan=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:5020
#6  bufwait (bp=bp@entry=0xfffffe00097da0a8) at /usr/src/sys/kern/vfs_bio.c:4433
#7  0xffffffff80cd285a in bufwrite (bp=0xfffffe00097da0a8, bp@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/vfs_bio.c:2305
#8  0xffffffff80f01789 in bwrite (bp=<unavailable>) at /usr/src/sys/sys/buf.h:430
#9  ffs_update (vp=vp@entry=0xfffff80004c61380, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#10 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80004c61380, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#11 0xffffffff80f0f91d in softdep_prelink (dvp=dvp@entry=0xfffff80004c61380, vp=vp@entry=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#12 0xffffffff80f3fee3 in ufs_makeinode (mode=33188, dvp=0xfffff80004c61380, vpp=0xfffffe00aae0a9d8, cnp=<unavailable>, callfunc=<unavailable>) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2741
#13 0xffffffff80f3bfa4 in ufs_create (ap=0xfffffe00aae0a8a8) at /usr/src/sys/ufs/ufs/ufs_vnops.c:213
#14 0xffffffff8118a31d in VOP_CREATE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aae0a8a8) at vnode_if.c:244
#15 0xffffffff80d15233 in VOP_CREATE (dvp=<unavailable>, vpp=0xfffffe00aae0a9d8, cnp=0xfffffe00aae0aa00, vap=0xfffffe00aae0a7f0) at ./vnode_if.h:133
#16 vn_open_cred (ndp=ndp@entry=0xfffffe00aae0a968, flagp=flagp@entry=0xfffffe00aae0aa94, cmode=cmode@entry=420, vn_open_flags=<optimized out>, vn_open_flags@entry=0, cred=0xfffff80048d42e00, fp=0xfffff8010aeabc30) at /usr/src/sys/kern/vfs_vnops.c:285
#17 0xffffffff80d14f6d in vn_open (ndp=<unavailable>, ndp@entry=0xfffffe00aae0a968, flagp=<unavailable>, flagp@entry=0xfffffe00aae0aa94, cmode=<unavailable>, cmode@entry=420, fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:202
#18 0xffffffff80d08999 in kern_openat (td=0xfffffe00aa35ce00, fd=-100, path=0x8002fd420 <error: Cannot access memory at address 0x8002fd420>, pathseg=UIO_USERSPACE, flags=34306, mode=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1142
#19 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#20 amd64_syscall (td=0xfffffe00aa35ce00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#21 <signal handler called>
#22 0x00000008004f223a in ?? ()

chrome thread backtrace:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1475
#1  0xffffffff8108afe9 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1432
#2  0xffffffff810c4256 in trap (frame=0xfffffe0009848f30) at /usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  vtpci_legacy_notify_vq (dev=<optimized out>, queue=0, offset=16) at /usr/src/sys/dev/virtio/pci/virtio_pci_legacy.c:485
#5  0xffffffff80a45417 in VIRTIO_BUS_NOTIFY_VQ (dev=0xfffff8000362fb00, queue=0, offset=16) at ./virtio_bus_if.h:144
#6  vq_ring_notify_host (vq=0xfffffe0063e27000) at /usr/src/sys/dev/virtio/virtqueue.c:834
#7  virtqueue_notify (vq=0xfffffe0063e27000, vq@entry=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/virtqueue.c:439
#8  0xffffffff80a538c0 in vtblk_startio (sc=sc@entry=0xfffff8000362f100) at /usr/src/sys/dev/virtio/block/virtio_blk.c:1123
#9  0xffffffff80a53bed in vtblk_strategy (bp=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/block/virtio_blk.c:571
#10 0xffffffff80b4bcfc in g_disk_start (bp=<optimized out>) at /usr/src/sys/geom/geom_disk.c:473
#11 0xffffffff80b4f147 in g_io_request (bp=0xfffff80021d33c00, cp=<optimized out>, cp@entry=0xfffff8000398ce80) at /usr/src/sys/geom/geom_io.c:589
#12 0xffffffff80b5b1a9 in g_part_start (bp=0xfffff8004e974900) at /usr/src/sys/geom/part/g_part.c:2332
#13 0xffffffff80b4f147 in g_io_request (bp=0xfffff8004e974900, cp=<optimized out>) at /usr/src/sys/geom/geom_io.c:589
#14 0xffffffff80cd284c in bstrategy (bp=0xfffffe0008ac5388) at /usr/src/sys/sys/buf.h:442
#15 bufwrite (bp=0xfffffe0008ac5388) at /usr/src/sys/kern/vfs_bio.c:2302
#16 0xffffffff80f01789 in bwrite (bp=0x0) at /usr/src/sys/sys/buf.h:430
#17 ffs_update (vp=vp@entry=0xfffff80139495000, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#18 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80139495000, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#19 0xffffffff80f0f86f in softdep_prelink (dvp=dvp@entry=0xfffff80139495000, vp=vp@entry=0xfffff8013c8328c0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#20 0xffffffff80f3d797 in ufs_remove (ap=0xfffffe00aabdfa20) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1011
#21 0xffffffff8118bf90 in VOP_REMOVE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aabdfa20) at vnode_if.c:1540
#22 0xffffffff80d0a468 in VOP_REMOVE (dvp=0x0, vp=0xfffff8013c8328c0, cnp=<optimized out>) at ./vnode_if.h:802
#23 kern_funlinkat (td=0xfffffe00aa6e3100, dfd=dfd@entry=-100, path=0x8288d40e0 <error: Cannot access memory at address 0x8288d40e0>, fd=<optimized out>, fd@entry=-200, pathseg=pathseg@entry=UIO_USERSPACE, flag=<optimized out>, flag@entry=0, oldinum=0) at /usr/src/sys/kern/vfs_syscalls.c:1927
#24 0xffffffff80d0a138 in sys_unlink (td=0xfffff8000362fb00, uap=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1808
#25 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#26 amd64_syscall (td=0xfffffe00aa6e3100, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#27 <signal handler called>
#28 0x000000080e40d17a in ?? ()

syncer backtrace:

#0  sched_switch (td=td@entry=0xfffffe00a5e29100, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6fe3b in sleepq_timedwait (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:690
#4  0xffffffff80ba34b0 in _cv_timedwait_sbt (cvp=0xffffffff81fa9550 <sync_wakeup>, lock=0xffffffff81fa9520 <sync_mtx>, sbt=<optimized out>, pr=<optimized out>, pr@entry=0, flags=0, flags@entry=256) at /usr/src/sys/kern/kern_condvar.c:312
#5  0xffffffff80d036dc in sched_sync () at /usr/src/sys/kern/vfs_subr.c:2739
#6  0xffffffff80bcb9a0 in fork_exit (callout=0xffffffff80d03090 <sched_sync>, arg=0x0, frame=0xfffffe006a491c00) at /usr/src/sys/kern/kern_fork.c:1077
#7  <signal handler called>

It seems like some kind of livelock involving ERELOOKUP loops. I can only guess though, softupdates is way too complicated for me.

That's with cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 merged. This kernel has some local changes and I'm a little bit worried that this might be the problem, but I think it's unlikely. The problem happens pretty rarely and that's the only -CURRENT install on UFS that I'm working with, so that's the best that I've got.

That's with a virtio disk backed by a ZFS volume on bhyve BTW.

-- 
You are receiving this mail because:
You are the assignee for the bug.
