From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 244048] mksnap_ffs hangs machine (12.1 regression over 11.3)
Date: Sun, 21 Feb 2021 18:51:38 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 12.2-RELEASE
X-Bugzilla-Keywords: regression
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: ml@netfence.it
X-Bugzilla-Status: Open
X-Bugzilla-Assigned-To: fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048

--- Comment #6 from ml@netfence.it ---
After some investigation, this is what I found.
(Note that I'm no kernel expert, so I just hope I'm not saying stupid things.)

The thread originating from mksnap_ffs is stuck in softdep_check_suspend,
sleeping on mp->mnt_secondary_writes (shown as "secwr" by userland utilities).
Full backtrace:
#0  sched_switch (td=0xfffff8000237e760, newtd=0xfffff8000212c760, flags=)
    at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xfffff80004120a00, pri=119)
    at /usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xfffff80004120a00, pri=119)
    at /usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xfffff80004120a00, lock=, priority=631, wmesg=0xffffffff80921d1b "secwr", sbt=0, pr=0, flags=256)
    at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff807b6324 in softdep_check_suspend (mp=0xfffff80004120000, devvp=0xfffff800041ff780, softdep_depcnt=15, softdep_accdepcnt=50188, secondary_writes=1, secondary_accwrites=106)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:14299
#6  0xffffffff807c19a6 in ffs_sync (mp=0xfffff80004120000, waitfor=4)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1620
#7  0xffffffff8067c8bf in vfs_write_suspend (mp=0xfffff80004120000, flags=0)
    at /usr/src/sys/kern/vfs_vnops.c:1864
#8  0xffffffff8079c0b9 in ffs_snapshot (mp=0xfffff80004120000, snapfile=0xfffff80004214780 "A\003\215\200\377\377\377\377\070\232\254\200\377\377\377\377 \b\361\035")
    at /usr/src/sys/ufs/ffs/ffs_snapshot.c:430
#9  0xffffffff807bfe5a in ffs_mount (mp=)
    at /usr/src/sys/ufs/ffs/ffs_vfsops.c:479
#10 0xffffffff80661a54 in vfs_domount_update (td=0xfffff80080ac9401, vp=, fsflags=, optlist=)
    at /usr/src/sys/kern/vfs_mount.c:1037
#11 vfs_domount (td=0xfffff80080ac9401, fstype=, fspath=, fsflags=, optlist=0xfffffe000053aa38)
    at /usr/src/sys/kern/vfs_mount.c:1191
#12 0xffffffff80660b27 in vfs_donmount (td=0xfffff8000237e760, fsflags=2166784, fsoptions=0xfffff80004108600)
    at /usr/src/sys/kern/vfs_mount.c:726
#13 0xffffffff80660312 in sys_nmount (td=0xfffff8000237e760, uap=0xfffff8000237eb20)
    at /usr/src/sys/kern/vfs_mount.c:431
#14 0xffffffff808418b7 in syscallenter (td=0xfffff8000237e760)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#15 amd64_syscall (td=0xfffff8000237e760, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1163
#16
#17 0x00000008002dcb9a in ?? ()

I *think* it should be awakened by the softdep_flush thread, in function
process_worklist_item.
Alas, that thread is itself stuck waiting for a buffer in bufspace_wait.

Full backtrace:
#0  sched_switch (td=0xfffff8000425f760, newtd=0xfffff800040df000, flags=)
    at /usr/src/sys/kern/sched_ule.c:2143
#1  0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:452
#2  0xffffffff805f272b in sleepq_switch (wchan=0xffffffff80a0a8b8, pri=96)
    at /usr/src/sys/kern/subr_sleepqueue.c:626
#3  0xffffffff805f25c3 in sleepq_wait (wchan=0xffffffff80a0a8b8, pri=96)
    at /usr/src/sys/kern/subr_sleepqueue.c:705
#4  0xffffffff805a4a6b in _sleep (ident=0xffffffff80a0a8b8, lock=, priority=96, wmesg=0xffffffff808e5676 "newbuf", sbt=0, pr=0, flags=256)
    at /usr/src/sys/kern/kern_synch.c:217
#5  0xffffffff8064f6ff in bufspace_wait (bd=0xffffffff80a02700, vp=0xfffff800042145a0, gbflags=, slpflag=, slptimeo=)
    at /usr/src/sys/kern/vfs_bio.c:773
#6  0xffffffff8064bbfc in getnewbuf (vp=, slpflag=0, slptimeo=0, maxsize=32768, gbflags=0)
    at /usr/src/sys/kern/vfs_bio.c:3284
#7  0xffffffff80649155 in getblkx (vp=0xfffff800042145a0, blkno=, size=32768, slpflag=0, slptimeo=0, flags=0, bpp=0xfffffe00005853d8)
    at /usr/src/sys/kern/vfs_bio.c:4022
#8  0xffffffff8064b905 in getblk (vp=, blkno=, size=, slpflag=, slptimeo=, flags=)
    at /usr/src/sys/kern/vfs_bio.c:3802
#9  0xffffffff807c69b5 in readindir (vp=, lbn=, daddr=56528, bpp=0xfffffe0000585478)
    at /usr/src/sys/ufs/ufs/ufs_bmap.c:111
#10 0xffffffff807c6468 in ufs_bmaparray (vp=0xfffff800042145a0, bn=-6049804, bnp=0xfffffe0000585518, nbp=, runp=, runb=0x0)
    at /usr/src/sys/ufs/ufs/ufs_bmap.c:266
#11 0xffffffff807d3975 in ufs_strategy (ap=)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:2309
#12 0xffffffff808b9911 in VOP_STRATEGY_APV (vop=0xffffffff80aca540, a=0xfffffe0000585570)
    at vnode_if.c:2279
#13 0xffffffff80647114 in VOP_STRATEGY (vp=, bp=0xfffffe000090b5c0)
    at ./vnode_if.h:940
#14 bufstrategy (bo=, bp=0xfffffe000090b5c0)
    at /usr/src/sys/kern/vfs_bio.c:4999
#15 0xffffffff80648b1e in bstrategy (bp=)
    at /usr/src/sys/sys/buf.h:419
#16 breadn_flags (vp=, blkno=, size=, rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, flags=0, ckhashfunc=0x0, bpp=0xfffffe0000585780)
    at /usr/src/sys/kern/vfs_bio.c:2181
#17 0xffffffff807982b8 in ffs_balloc_ufs2 (vp=, startoffset=, size=, cred=0xfffff8000211b000, flags=, bpp=0xfffffe0000585820)
    at /usr/src/sys/ufs/ffs/ffs_balloc.c:894
#18 0xffffffff8079fb22 in ffs_snapblkfree (fs=0xfffffe0011c76000, devvp=, bno=48416024, size=32768, inum=5, vtype=VREG, wkhd=0xfffffe0000585950)
    at /usr/src/sys/ufs/ffs/ffs_snapshot.c:1790
#19 0xffffffff80790d16 in ffs_blkfree (ump=0xfffff80004116800, fs=0xfffffe0011c76000, devvp=0xfffff800041ff780, bno=48416024, size=32768, inum=5, vtype=VREG, dephd=0xfffffe0000585950, key=2)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:2602
#20 0xffffffff807baa4f in indir_trunc (freework=0xfffff800048ed480, dbn=, lbn=)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:8259
#21 0xffffffff807ba93b in indir_trunc (freework=0xfffff800048ed480, dbn=, lbn=)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:8240
#22 0xffffffff807ad4f9 in handle_workitem_indirblk (freework=)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:7875
#23 handle_workitem_freeblocks (freeblks=0xfffff800048eda00, flags=512)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:7970
#24 0xffffffff807b5ba1 in process_worklist_item (mp=0xfffff80004120000, target=10, flags=512)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:1806
#25 0xffffffff807a1e92 in softdep_process_worklist (mp=0xfffff80004120000, full=0)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:1600
#26 0xffffffff807a580f in softdep_flush (addr=0xfffff80004120000)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:1402
#27 0xffffffff80556f8c in fork_exit (callout=0xffffffff807a5720, arg=0xfffff80004120000, frame=0xfffffe0000585c00)
    at /usr/src/sys/kern/kern_fork.c:1080
#28

The above explains why mksnap_ffs is halted; however, the whole machine hangs.
It seems almost any user thread (e.g. an "ls" *on a different filesystem*) is
stuck in bufspace_wait.
Neither the buf_daemon thread nor its child bufspace_daemon is stuck (both are
running their normal loops).

So far I haven't been able to pinpoint what changed between 11.4 and 12.1 to
cause this.
Any hint appreciated.
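
The wait relationships reported in this comment can be summarized in a small sketch. It is only an illustration of the reasoning, not kernel code: the dictionary keys are labels taken from the comment, the edge from mksnap_ffs to softdep_flush is the author's stated guess rather than a confirmed fact, and the names `WAITS`, `wait_chain` and `root_blockers` are hypothetical helpers introduced here.

```python
# Wait-for edges as described in the comment. Names are plain labels, not
# kernel APIs. Each blocked thread maps to (wait message, what it needs).
WAITS = {
    # Assumption from the comment: softdep_flush is expected to do the wakeup.
    "mksnap_ffs": ("secwr", "softdep_flush"),
    "softdep_flush": ("newbuf", "buffer space"),
    "ls (other filesystem)": ("newbuf", "buffer space"),
}


def wait_chain(thread, waits=WAITS):
    """Follow wait-for edges from `thread` until reaching something that is
    not itself recorded as blocked (or until a cycle is detected)."""
    chain = [thread]
    seen = {thread}
    while chain[-1] in waits:
        needs = waits[chain[-1]][1]
        chain.append(needs)
        if needs in seen:  # a cycle would indicate a true deadlock
            break
        seen.add(needs)
    return chain


def root_blockers(waits=WAITS):
    """Return the set of resources at the end of every wait chain."""
    return {wait_chain(thread, waits)[-1] for thread in waits}


if __name__ == "__main__":
    for thread in WAITS:
        print(" -> ".join(wait_chain(thread)))
```

Under these assumptions every chain ends at buffer space, which matches the observation that the open question is why bufspace_wait never returns even though buf_daemon and bufspace_daemon keep running.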