Date: Sun, 21 Feb 2021 18:51:38 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 244048] mksnap_ffs hangs machine (12.1 regression over 11.3)
Message-ID: <bug-244048-3630-RlsBk1Xrgy@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-244048-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-244048-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048

--- Comment #6 from ml@netfence.it ---
After some investigation this is what I found.
(Note that I'm no kernel expert, so I just hope I'm not saying stupid things).

The thread originating from mksnap_ffs is stuck in softdep_check_suspend,
sleeping on mp->mnt_secondary_writes ("secwr" for userland utilities).
Full backtrace:

#0  sched_switch (td=0xfffff8000237e760, newtd=0xfffff8000212c760, flags=<optimized out>)
    at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xfffff80004120a00, pri=119)
    at /usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xfffff80004120a00, pri=119)
    at /usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xfffff80004120a00, lock=<optimized out>, priority=631, wmesg=0xffffffff80921d1b "secwr", sbt=0, pr=0, flags=256)
    at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff807b6324 in softdep_check_suspend (mp=0xfffff80004120000, devvp=0xfffff800041ff780, softdep_depcnt=15, softdep_accdepcnt=50188, secondary_writes=1, secondary_accwrites=106)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:14299
#6  0xffffffff807c19a6 in ffs_sync (mp=0xfffff80004120000, waitfor=4)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1620
#7  0xffffffff8067c8bf in vfs_write_suspend (mp=0xfffff80004120000, flags=0)
    at /usr/src/sys/kern/vfs_vnops.c:1864
#8  0xffffffff8079c0b9 in ffs_snapshot (mp=0xfffff80004120000, snapfile=0xfffff80004214780 "A\003\215\200\377\377\377\377\070\232\254\200\377\377\377\377 \b\361\035")
    at /usr/src/sys/ufs/ffs/ffs_snapshot.c:430
#9  0xffffffff807bfe5a in ffs_mount (mp=<unavailable>)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:479
#10 0xffffffff80661a54 in vfs_domount_update (td=0xfffff80080ac9401, vp=<optimized out>, fsflags=<optimized out>, optlist=<optimized out>)
    at /usr/src/sys/kern/vfs_mount.c:1037
#11 vfs_domount (td=0xfffff80080ac9401, fstype=<optimized out>, fspath=<optimized out>, fsflags=<optimized out>, optlist=0xfffffe000053aa38)
    at /usr/src/sys/kern/vfs_mount.c:1191
#12 0xffffffff80660b27 in vfs_donmount (td=0xfffff8000237e760, fsflags=2166784, fsoptions=0xfffff80004108600)
    at /usr/src/sys/kern/vfs_mount.c:726
#13 0xffffffff80660312 in sys_nmount (td=0xfffff8000237e760, uap=0xfffff8000237eb20)
    at /usr/src/sys/kern/vfs_mount.c:431
#14 0xffffffff808418b7 in syscallenter (td=0xfffff8000237e760)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#15 amd64_syscall (td=0xfffff8000237e760, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1163
#16 <signal handler called>
#17 0x00000008002dcb9a in ?? ()

I *think* it should be awakened by the softdep_flush thread, in function
process_worklist_item.
Alas, this is stuck waiting for a buffer in bufspace_wait.
Full backtrace:

#0  sched_switch (td=0xfffff8000425f760, newtd=0xfffff800040df000, flags=<optimized out>)
    at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xffffffff80a0a8b8 <bdomain+33208>, pri=96)
    at /usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xffffffff80a0a8b8 <bdomain+33208>, pri=96)
    at /usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xffffffff80a0a8b8 <bdomain+33208>, lock=<optimized out>, priority=96, wmesg=0xffffffff808e5676 "newbuf", sbt=0, pr=0, flags=256)
    at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff8064f6ff in bufspace_wait (bd=0xffffffff80a02700 <bdomain>, vp=0xfffff800042145a0, gbflags=<optimized out>, slpflag=<optimized out>, slptimeo=<optimized out>)
    at /usr/src/sys/kern/vfs_bio.c:773
#6  0xffffffff8064bbfc in getnewbuf (vp=<optimized out>, slpflag=0, slptimeo=0, maxsize=32768, gbflags=0)
    at /usr/src/sys/kern/vfs_bio.c:3284
#7  0xffffffff80649155 in getblkx (vp=0xfffff800042145a0, blkno=<optimized out>, size=32768, slpflag=0, slptimeo=0, flags=0, bpp=0xfffffe00005853d8)
    at /usr/src/sys/kern/vfs_bio.c:4022
#8  0xffffffff8064b905 in getblk (vp=<unavailable>, blkno=<unavailable>, size=<unavailable>, slpflag=<unavailable>, slptimeo=<unavailable>, flags=<unavailable>)
    at /usr/src/sys/kern/vfs_bio.c:3802
#9  0xffffffff807c69b5 in readindir (vp=<unavailable>, lbn=<unavailable>, daddr=56528, bpp=0xfffffe0000585478)
    at /usr/src/sys/ufs/ufs/ufs_bmap.c:111
#10 0xffffffff807c6468 in ufs_bmaparray (vp=0xfffff800042145a0, bn=-6049804, bnp=0xfffffe0000585518, nbp=<optimized out>, runp=<optimized out>, runb=0x0)
    at /usr/src/sys/ufs/ufs/ufs_bmap.c:266
#11 0xffffffff807d3975 in ufs_strategy (ap=<optimized out>)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:2309
#12 0xffffffff808b9911 in VOP_STRATEGY_APV (vop=0xffffffff80aca540 <ufs_vnodeops>, a=0xfffffe0000585570)
    at vnode_if.c:2279
#13 0xffffffff80647114 in VOP_STRATEGY (vp=<unavailable>, bp=0xfffffe000090b5c0)
    at ./vnode_if.h:940
#14 bufstrategy (bo=<optimized out>, bp=0xfffffe000090b5c0)
    at /usr/src/sys/kern/vfs_bio.c:4999
#15 0xffffffff80648b1e in bstrategy (bp=<optimized out>)
    at /usr/src/sys/sys/buf.h:419
#16 breadn_flags (vp=<optimized out>, blkno=<optimized out>, size=<optimized out>, rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, flags=0, ckhashfunc=0x0, bpp=0xfffffe0000585780)
    at /usr/src/sys/kern/vfs_bio.c:2181
#17 0xffffffff807982b8 in ffs_balloc_ufs2 (vp=<optimized out>, startoffset=<optimized out>, size=<optimized out>, cred=0xfffff8000211b000, flags=<optimized out>, bpp=0xfffffe0000585820)
    at /usr/src/sys/ufs/ffs/ffs_balloc.c:894
#18 0xffffffff8079fb22 in ffs_snapblkfree (fs=0xfffffe0011c76000, devvp=<optimized out>, bno=48416024, size=32768, inum=5, vtype=VREG, wkhd=0xfffffe0000585950)
    at /usr/src/sys/ufs/ffs/ffs_snapshot.c:1790
#19 0xffffffff80790d16 in ffs_blkfree (ump=0xfffff80004116800, fs=0xfffffe0011c76000, devvp=0xfffff800041ff780, bno=48416024, size=32768, inum=5, vtype=VREG, dephd=0xfffffe0000585950, key=2)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:2602
#20 0xffffffff807baa4f in indir_trunc (freework=0xfffff800048ed480, dbn=<optimized out>, lbn=<optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:8259
#21 0xffffffff807ba93b in indir_trunc (freework=0xfffff800048ed480, dbn=<optimized out>, lbn=<optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:8240
#22 0xffffffff807ad4f9 in handle_workitem_indirblk (freework=<optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:7875
#23 handle_workitem_freeblocks (freeblks=0xfffff800048eda00, flags=512)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:7970
#24 0xffffffff807b5ba1 in process_worklist_item (mp=0xfffff80004120000, target=10, flags=512)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:1806
#25 0xffffffff807a1e92 in softdep_process_worklist (mp=0xfffff80004120000, full=0)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:1600
#26 0xffffffff807a580f in softdep_flush (addr=0xfffff80004120000)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:1402
#27 0xffffffff80556f8c in fork_exit (callout=0xffffffff807a5720 <softdep_flush>, arg=0xfffff80004120000, frame=0xfffffe0000585c00)
    at /usr/src/sys/kern/kern_fork.c:1080
#28 <signal handler called>

The above explains why mksnap_ffs is halted; however, the whole machine hangs.
It seems almost any user thread (e.g. an "ls" *on a different filesystem*) is
stuck in bufspace_wait.
Neither the buf_daemon thread nor its child bufspace_daemon is stuck (they are
running their normal loop).

So far I wasn't able to pinpoint what changed between 11.4 and 12.1 to cause
this.
Any hint appreciated.

-- 
You are receiving this mail because:
You are the assignee for the bug.