Date:      Sun, 21 Feb 2021 18:51:38 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 244048] mksnap_ffs hangs machine (12.1 regression over 11.3)
Message-ID:  <bug-244048-3630-RlsBk1Xrgy@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-244048-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-244048-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048

--- Comment #6 from ml@netfence.it ---
After some investigation, this is what I found.
(Note that I'm no kernel expert, so I hope I'm not saying anything stupid.)

The thread originating from mksnap_ffs is stuck in softdep_check_suspend,
sleeping on mp->mnt_secondary_writes ("secwr" for userland utilities).
Full backtrace:
#0  sched_switch (td=0xfffff8000237e760, newtd=0xfffff8000212c760,
flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xfffff80004120a00, pri=119) at
/usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xfffff80004120a00, pri=119) at
/usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xfffff80004120a00, lock=<optimized
out>, priority=631, wmesg=0xffffffff80921d1b "secwr", sbt=0, pr=0, flags=256)
at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff807b6324 in softdep_check_suspend (mp=0xfffff80004120000,
devvp=0xfffff800041ff780, softdep_depcnt=15, softdep_accdepcnt=50188,
secondary_writes=1, secondary_accwrites=106) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:14299
#6  0xffffffff807c19a6 in ffs_sync (mp=0xfffff80004120000, waitfor=4) at
/usr/src/sys/ufs/ffs/ffs_vfsops.c:1620
#7  0xffffffff8067c8bf in vfs_write_suspend (mp=0xfffff80004120000, flags=0) at
/usr/src/sys/kern/vfs_vnops.c:1864
#8  0xffffffff8079c0b9 in ffs_snapshot (mp=0xfffff80004120000,
snapfile=0xfffff80004214780
"A\003\215\200\377\377\377\377\070\232\254\200\377\377\377\377 \b\361\035") at
/usr/src/sys/ufs/ffs/ffs_snapshot.c:430
#9  0xffffffff807bfe5a in ffs_mount (mp=<unavailable>) at
/usr/src/sys/ufs/ffs/ffs_vfsops.c:479
#10 0xffffffff80661a54 in vfs_domount_update (td=0xfffff80080ac9401,
vp=<optimized out>, fsflags=<optimized out>, optlist=<optimized out>) at
/usr/src/sys/kern/vfs_mount.c:1037
#11 vfs_domount (td=0xfffff80080ac9401, fstype=<optimized out>,
fspath=<optimized out>, fsflags=<optimized out>, optlist=0xfffffe000053aa38) at
/usr/src/sys/kern/vfs_mount.c:1191
#12 0xffffffff80660b27 in vfs_donmount (td=0xfffff8000237e760, fsflags=2166784,
fsoptions=0xfffff80004108600) at /usr/src/sys/kern/vfs_mount.c:726
#13 0xffffffff80660312 in sys_nmount (td=0xfffff8000237e760,
uap=0xfffff8000237eb20) at /usr/src/sys/kern/vfs_mount.c:431
#14 0xffffffff808418b7 in syscallenter (td=0xfffff8000237e760) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#15 amd64_syscall (td=0xfffff8000237e760, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1163
#16 <signal handler called>
#17 0x00000008002dcb9a in ?? ()
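
For anyone comparing several such traces, a small helper can pull out the call path and the sleep message. This is only a rough sketch: the function name summarize() and both regular expressions are my own and are not part of kgdb or FreeBSD, and the sample input is just frames #4 to #6 from the trace above.

```python
import re

# Start of a kgdb frame, e.g. "#4  0xffff... in _sleep (" or "#11 vfs_domount (".
# Frames without a plain function name ("<signal handler called>", "??") are skipped.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?([A-Za-z_]\w*) \(", re.MULTILINE)
# The sleep message argument, e.g. wmesg=0xffffffff80921d1b "secwr".
WMESG_RE = re.compile(r'wmesg=0x[0-9a-f]+ "([^"]*)"')


def summarize(backtrace):
    """Return (function names in frame order, first wait message or None)."""
    frames = [m.group(2) for m in FRAME_RE.finditer(backtrace)]
    wmesg = WMESG_RE.search(backtrace)
    return frames, (wmesg.group(1) if wmesg else None)


# Hypothetical input: three frames copied from the mksnap_ffs backtrace.
sample = """\
#4  0xffffffff805a4a6b in _sleep (ident=0xfffff80004120a00, lock=<optimized
out>, priority=631, wmesg=0xffffffff80921d1b "secwr", sbt=0, pr=0, flags=256)
at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff807b6324 in softdep_check_suspend (mp=0xfffff80004120000,
devvp=0xfffff800041ff780, softdep_depcnt=15, softdep_accdepcnt=50188,
secondary_writes=1, secondary_accwrites=106) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:14299
#6  0xffffffff807c19a6 in ffs_sync (mp=0xfffff80004120000, waitfor=4) at
/usr/src/sys/ufs/ffs/ffs_vfsops.c:1620
"""

frames, wmesg = summarize(sample)
print(frames, wmesg)
```

The regular expressions only cover the frame layout seen in this report; other gdb output styles would likely need adjustments.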

I *think* it should be woken up by the softdep_flush thread, in the function
process_worklist_item.
Alas, that thread is itself stuck waiting for a buffer in bufspace_wait.
Full backtrace:
#0  sched_switch (td=0xfffff8000425f760, newtd=0xfffff800040df000,
flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xffffffff80a0a8b8
<bdomain+33208>, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xffffffff80a0a8b8
<bdomain+33208>, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xffffffff80a0a8b8 <bdomain+33208>,
lock=<optimized out>, priority=96, wmesg=0xffffffff808e5676 "newbuf", sbt=0,
pr=0, flags=256) at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff8064f6ff in bufspace_wait (bd=0xffffffff80a02700 <bdomain>,
vp=0xfffff800042145a0, gbflags=<optimized out>, slpflag=<optimized out>,
slptimeo=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:773
#6  0xffffffff8064bbfc in getnewbuf (vp=<optimized out>, slpflag=0, slptimeo=0,
maxsize=32768, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:3284
#7  0xffffffff80649155 in getblkx (vp=0xfffff800042145a0, blkno=<optimized
out>, size=32768, slpflag=0, slptimeo=0, flags=0, bpp=0xfffffe00005853d8) at
/usr/src/sys/kern/vfs_bio.c:4022
#8  0xffffffff8064b905 in getblk (vp=<unavailable>, blkno=<unavailable>,
size=<unavailable>, slpflag=<unavailable>, slptimeo=<unavailable>,
flags=<unavailable>) at /usr/src/sys/kern/vfs_bio.c:3802
#9  0xffffffff807c69b5 in readindir (vp=<unavailable>, lbn=<unavailable>,
daddr=56528, bpp=0xfffffe0000585478) at /usr/src/sys/ufs/ufs/ufs_bmap.c:111
#10 0xffffffff807c6468 in ufs_bmaparray (vp=0xfffff800042145a0, bn=-6049804,
bnp=0xfffffe0000585518, nbp=<optimized out>, runp=<optimized out>, runb=0x0) at
/usr/src/sys/ufs/ufs/ufs_bmap.c:266
#11 0xffffffff807d3975 in ufs_strategy (ap=<optimized out>) at
/usr/src/sys/ufs/ufs/ufs_vnops.c:2309
#12 0xffffffff808b9911 in VOP_STRATEGY_APV (vop=0xffffffff80aca540
<ufs_vnodeops>, a=0xfffffe0000585570) at vnode_if.c:2279
#13 0xffffffff80647114 in VOP_STRATEGY (vp=<unavailable>,
bp=0xfffffe000090b5c0) at ./vnode_if.h:940
#14 bufstrategy (bo=<optimized out>, bp=0xfffffe000090b5c0) at
/usr/src/sys/kern/vfs_bio.c:4999
#15 0xffffffff80648b1e in bstrategy (bp=<optimized out>) at
/usr/src/sys/sys/buf.h:419
#16 breadn_flags (vp=<optimized out>, blkno=<optimized out>, size=<optimized
out>, rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, flags=0, ckhashfunc=0x0,
bpp=0xfffffe0000585780) at /usr/src/sys/kern/vfs_bio.c:2181
#17 0xffffffff807982b8 in ffs_balloc_ufs2 (vp=<optimized out>,
startoffset=<optimized out>, size=<optimized out>, cred=0xfffff8000211b000,
flags=<optimized out>, bpp=0xfffffe0000585820) at
/usr/src/sys/ufs/ffs/ffs_balloc.c:894
#18 0xffffffff8079fb22 in ffs_snapblkfree (fs=0xfffffe0011c76000,
devvp=<optimized out>, bno=48416024, size=32768, inum=5, vtype=VREG,
wkhd=0xfffffe0000585950) at /usr/src/sys/ufs/ffs/ffs_snapshot.c:1790
#19 0xffffffff80790d16 in ffs_blkfree (ump=0xfffff80004116800,
fs=0xfffffe0011c76000, devvp=0xfffff800041ff780, bno=48416024, size=32768,
inum=5, vtype=VREG, dephd=0xfffffe0000585950, key=2) at
/usr/src/sys/ufs/ffs/ffs_alloc.c:2602
#20 0xffffffff807baa4f in indir_trunc (freework=0xfffff800048ed480,
dbn=<optimized out>, lbn=<optimized out>) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:8259
#21 0xffffffff807ba93b in indir_trunc (freework=0xfffff800048ed480,
dbn=<optimized out>, lbn=<optimized out>) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:8240
#22 0xffffffff807ad4f9 in handle_workitem_indirblk (freework=<optimized out>)
at /usr/src/sys/ufs/ffs/ffs_softdep.c:7875
#23 handle_workitem_freeblocks (freeblks=0xfffff800048eda00, flags=512) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:7970
#24 0xffffffff807b5ba1 in process_worklist_item (mp=0xfffff80004120000,
target=10, flags=512) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1806
#25 0xffffffff807a1e92 in softdep_process_worklist (mp=0xfffff80004120000,
full=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1600
#26 0xffffffff807a580f in softdep_flush (addr=0xfffff80004120000) at
/usr/src/sys/ufs/ffs/ffs_softdep.c:1402
#27 0xffffffff80556f8c in fork_exit (callout=0xffffffff807a5720
<softdep_flush>, arg=0xfffff80004120000, frame=0xfffffe0000585c00) at
/usr/src/sys/kern/kern_fork.c:1080
#28 <signal handler called>

The above explains why mksnap_ffs is halted; however, the whole machine is
hung.
It seems almost any user thread (e.g. an "ls" *on a different filesystem*) is
stuck in bufspace_wait.

Neither the buf_daemon thread nor its child bufspace_daemon is stuck (both are
running their normal loop).
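
To make the chain of waits easier to see, here is a toy model of it. It is only a sketch of my reading of the traces, not kernel code: the thread names and sleep messages are taken from this report, while the "expected waker" edge is the unverified guess stated earlier, and the names waiting_on, expected_waker and wait_chain() are made up for illustration.

```python
# Who sleeps on which wait message, as seen in the backtraces.
waiting_on = {
    "mksnap_ffs": "secwr",      # sleeping in softdep_check_suspend
    "softdep_flush": "newbuf",  # sleeping in bufspace_wait
    "ls": "newbuf",             # user thread on a different filesystem
}

# Who is expected to issue the wakeup. ASSUMPTION: only the reporter's guess
# for "secwr" is recorded; nothing here says who should wake "newbuf".
expected_waker = {
    "secwr": "softdep_flush",
}


def wait_chain(thread):
    """Follow thread -> wait message -> expected waker until the trail ends."""
    chain = [thread]
    seen = {thread}
    while thread in waiting_on:
        channel = waiting_on[thread]
        chain.append(channel)
        thread = expected_waker.get(channel)
        if thread is None or thread in seen:
            break  # unknown waker, or a cycle
        seen.add(thread)
        chain.append(thread)
    return chain


for name in ("mksnap_ffs", "ls"):
    print(" -> ".join(wait_chain(name)))
```

Under these assumptions both chains end at "newbuf", which matches the observation that the snapshot thread and unrelated user threads all end up waiting for buffer space.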


So far I haven't been able to pinpoint what changed between 11.4 and 12.1 to
cause this.
Any hint appreciated.

-- 
You are receiving this mail because:
You are the assignee for the bug.


