Date: Fri, 3 Oct 2025 19:22:21 GMT
Message-Id: <202510031922.593JMLdx047401@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Mateusz Guzik
Subject: git: 641a58239520 - main - nullfs: avoid the interlock in null_lock with smr

The branch main has been updated by mjg:

URL: https://cgit.FreeBSD.org/src/commit/?id=641a58239520de9fc5a9077e9a709481cfc75dc0

commit 641a58239520de9fc5a9077e9a709481cfc75dc0
Author:     Mateusz Guzik
AuthorDate: 2025-10-01 10:06:39 +0000
Commit:     Mateusz Guzik
CommitDate: 2025-10-03 19:16:21 +0000

    nullfs: avoid the interlock in null_lock with smr

    This largely eliminates contention on the vnode interlock. Note this
    still does not scale, to be fixed(tm).

    Reviewed by:    kib
    Tested by:      pho (previous version)
    Differential Revision:  https://reviews.freebsd.org/D38761
---
 sys/fs/nullfs/null.h       |   1 +
 sys/fs/nullfs/null_vnops.c | 152 ++++++++++++++++++++++++++++-----------------
 2 files changed, 95 insertions(+), 58 deletions(-)

diff --git a/sys/fs/nullfs/null.h b/sys/fs/nullfs/null.h
index dd6cb4f71f07..aa7a689bec34 100644
--- a/sys/fs/nullfs/null.h
+++ b/sys/fs/nullfs/null.h
@@ -64,6 +64,7 @@ struct null_node {
 
 #define	MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data))
 #define	VTONULL(vp) ((struct null_node *)(vp)->v_data)
+#define	VTONULL_SMR(vp) ((struct null_node *)vn_load_v_data_smr(vp))
 #define	NULLTOV(xp) ((xp)->null_vnode)
 
 int nullfs_init(struct vfsconf *vfsp);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index dd176b34e4eb..375b6aa27531 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -174,6 +174,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -185,6 +187,8 @@
 #include
 #include
 
+VFS_SMR_DECLARE;
+
 static int null_bug_bypass = 0; /* for debugging: enables bypass printf'ing */
 SYSCTL_INT(_debug, OID_AUTO, nullfs_bug_bypass, CTLFLAG_RW,
 	&null_bug_bypass, 0, "");
@@ -768,75 +772,107 @@ null_rmdir(struct vop_rmdir_args *ap)
 }
 
 /*
- * We need to process our own vnode lock and then clear the
- * interlock flag as it applies only to our vnode, not the
- * vnodes below us on the stack.
+ * We need to process our own vnode lock and then clear the interlock flag as
+ * it applies only to our vnode, not the vnodes below us on the stack.
+ *
+ * We have to hold the vnode here to solve a potential reclaim race. If we're
+ * forcibly vgone'd while we still have refs, a thread could be sleeping inside
+ * the lowervp's vop_lock routine. When we vgone we will drop our last ref to
+ * the lowervp, which would allow it to be reclaimed. The lowervp could then
+ * be recycled, in which case it is not legal to be sleeping in its VOP. We
+ * prevent it from being recycled by holding the vnode here.
  */
+static struct vnode *
+null_lock_prep_with_smr(struct vop_lock1_args *ap)
+{
+	struct null_node *nn;
+	struct vnode *lvp;
+
+	vfs_smr_enter();
+
+	lvp = NULL;
+
+	nn = VTONULL_SMR(ap->a_vp);
+	if (__predict_true(nn != NULL)) {
+		lvp = nn->null_lowervp;
+		if (lvp != NULL && !vhold_smr(lvp))
+			lvp = NULL;
+	}
+
+	vfs_smr_exit();
+	return (lvp);
+}
+
+static struct vnode *
+null_lock_prep_with_interlock(struct vop_lock1_args *ap)
+{
+	struct null_node *nn;
+	struct vnode *lvp;
+
+	ASSERT_VI_LOCKED(ap->a_vp, __func__);
+
+	ap->a_flags &= ~LK_INTERLOCK;
+
+	lvp = NULL;
+
+	nn = VTONULL(ap->a_vp);
+	if (__predict_true(nn != NULL)) {
+		lvp = nn->null_lowervp;
+		if (lvp != NULL)
+			vholdnz(lvp);
+	}
+	VI_UNLOCK(ap->a_vp);
+	return (lvp);
+}
+
 static int
 null_lock(struct vop_lock1_args *ap)
 {
-	struct vnode *vp = ap->a_vp;
-	int flags;
-	struct null_node *nn;
 	struct vnode *lvp;
-	int error;
+	int error, flags;
 
-	if ((ap->a_flags & LK_INTERLOCK) == 0)
-		VI_LOCK(vp);
-	else
-		ap->a_flags &= ~LK_INTERLOCK;
-	flags = ap->a_flags;
-	nn = VTONULL(vp);
+	if (__predict_true((ap->a_flags & LK_INTERLOCK) == 0)) {
+		lvp = null_lock_prep_with_smr(ap);
+		if (__predict_false(lvp == NULL)) {
+			VI_LOCK(ap->a_vp);
+			lvp = null_lock_prep_with_interlock(ap);
+		}
+	} else {
+		lvp = null_lock_prep_with_interlock(ap);
+	}
+
+	ASSERT_VI_UNLOCKED(ap->a_vp, __func__);
+
+	if (__predict_false(lvp == NULL))
+		return (vop_stdlock(ap));
+
+	VNPASS(lvp->v_holdcnt > 0, lvp);
+	error = VOP_LOCK(lvp, ap->a_flags);
 	/*
-	 * If we're still active we must ask the lower layer to
-	 * lock as ffs has special lock considerations in its
-	 * vop lock.
+	 * We might have slept to get the lock and someone might have
+	 * clean our vnode already, switching vnode lock from one in
+	 * lowervp to v_lock in our own vnode structure. Handle this
+	 * case by reacquiring correct lock in requested mode.
 	 */
-	if (nn != NULL && (lvp = NULLVPTOLOWERVP(vp)) != NULL) {
-		/*
-		 * We have to hold the vnode here to solve a potential
-		 * reclaim race. If we're forcibly vgone'd while we
-		 * still have refs, a thread could be sleeping inside
-		 * the lowervp's vop_lock routine. When we vgone we will
-		 * drop our last ref to the lowervp, which would allow it
-		 * to be reclaimed. The lowervp could then be recycled,
-		 * in which case it is not legal to be sleeping in its VOP.
-		 * We prevent it from being recycled by holding the vnode
-		 * here.
-		 */
-		vholdnz(lvp);
-		VI_UNLOCK(vp);
-		error = VOP_LOCK(lvp, flags);
-
-		/*
-		 * We might have slept to get the lock and someone might have
-		 * clean our vnode already, switching vnode lock from one in
-		 * lowervp to v_lock in our own vnode structure. Handle this
-		 * case by reacquiring correct lock in requested mode.
-		 */
-		if (VTONULL(vp) == NULL && error == 0) {
-			ap->a_flags &= ~LK_TYPE_MASK;
-			switch (flags & LK_TYPE_MASK) {
-			case LK_SHARED:
-				ap->a_flags |= LK_SHARED;
-				break;
-			case LK_UPGRADE:
-			case LK_EXCLUSIVE:
-				ap->a_flags |= LK_EXCLUSIVE;
-				break;
-			default:
-				panic("Unsupported lock request %d\n",
-				    ap->a_flags);
-			}
-			VOP_UNLOCK(lvp);
-			error = vop_stdlock(ap);
+	if (VTONULL(ap->a_vp) == NULL && error == 0) {
+		flags = ap->a_flags;
+		ap->a_flags &= ~LK_TYPE_MASK;
+		switch (flags & LK_TYPE_MASK) {
+		case LK_SHARED:
+			ap->a_flags |= LK_SHARED;
+			break;
+		case LK_UPGRADE:
+		case LK_EXCLUSIVE:
+			ap->a_flags |= LK_EXCLUSIVE;
+			break;
+		default:
+			panic("Unsupported lock request %d\n",
+			    flags);
 		}
-		vdrop(lvp);
-	} else {
-		VI_UNLOCK(vp);
+		VOP_UNLOCK(lvp);
 		error = vop_stdlock(ap);
 	}
-
+	vdrop(lvp);
 	return (error);
 }
 
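The fast-path/fallback structure of the new null_lock() can be illustrated outside the kernel. What follows is a minimal userspace sketch in C11 that assumes POSIX threads. It is not FreeBSD code, and every name in it is hypothetical: struct lower_obj, struct upper_obj, try_hold_lockless(), hold_locked(), drop_hold(), lock_prep(), detach_lower() and DYING_FLAG.

The correspondences are loose:

- try_hold_lockless() plays the role of vhold_smr(), which takes a hold unless the object is already being torn down.
- hold_locked() plays the role of vholdnz() under the interlock.
- The mutex stands in for the vnode interlock.

SMR itself is not modeled. The sketch simply never frees objects, which approximates what the vfs_smr_enter()/vfs_smr_exit() section is there to guarantee for the lockless read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "being freed" bit kept inside the hold count. */
#define DYING_FLAG 0x80000000u

/* Stands in for the lower vnode; only its hold count is modeled. */
struct lower_obj {
	_Atomic unsigned int holdcnt;
};

/* Stands in for the nullfs vnode: interlock plus pointer to the lower object. */
struct upper_obj {
	pthread_mutex_t interlock;
	struct lower_obj *_Atomic lower;
};

/* Like vhold_smr(): take a hold unless the object is already dying. */
static bool
try_hold_lockless(struct lower_obj *o)
{
	unsigned int old = atomic_load(&o->holdcnt);

	do {
		if (old & DYING_FLAG)
			return (false);
	} while (!atomic_compare_exchange_weak(&o->holdcnt, &old, old + 1));
	return (true);
}

/* Like vholdnz(): the upper's own reference keeps the count nonzero. */
static void
hold_locked(struct lower_obj *o)
{
	unsigned int old = atomic_fetch_add(&o->holdcnt, 1);

	assert(old > 0 && (old & DYING_FLAG) == 0);
	(void)old;
}

/* Drop a hold; the last one marks the object dying unless re-held meanwhile. */
static void
drop_hold(struct lower_obj *o)
{
	unsigned int old = atomic_fetch_sub(&o->holdcnt, 1);
	unsigned int zero = 0;

	assert(old > 0 && (old & DYING_FLAG) == 0);
	if (old == 1)
		(void)atomic_compare_exchange_strong(&o->holdcnt, &zero,
		    DYING_FLAG);
}

/* Fast path, no interlock: NULL means "retry under the interlock". */
static struct lower_obj *
prep_lockless(struct upper_obj *u)
{
	struct lower_obj *o = atomic_load(&u->lower);

	if (o != NULL && !try_hold_lockless(o))
		o = NULL;
	return (o);
}

/* Slow path: re-read the pointer under the interlock and hold it. */
static struct lower_obj *
prep_locked(struct upper_obj *u)
{
	struct lower_obj *o;

	pthread_mutex_lock(&u->interlock);
	o = atomic_load(&u->lower);
	if (o != NULL)
		hold_locked(o);
	pthread_mutex_unlock(&u->interlock);
	return (o);
}

/* Same shape as null_lock(): lockless attempt first, interlock on failure. */
static struct lower_obj *
lock_prep(struct upper_obj *u)
{
	struct lower_obj *o = prep_lockless(u);

	if (o == NULL)
		o = prep_locked(u);
	return (o);
}

/* Like reclaim: detach under the interlock, then drop the upper's reference. */
static void
detach_lower(struct upper_obj *u)
{
	struct lower_obj *o;

	pthread_mutex_lock(&u->interlock);
	o = atomic_load(&u->lower);
	atomic_store(&u->lower, NULL);
	pthread_mutex_unlock(&u->interlock);
	if (o != NULL)
		drop_hold(o);
}
```

A caller would use it as follows:

1. Call lock_prep().
2. Lock the returned object.
3. Call drop_hold() when done.

A NULL return corresponds to the case where null_lock() falls back to vop_stdlock(). The design point mirrors the commit. The common case touches only the lower object's hold count. The mutex is taken only when the lockless attempt comes back empty, and at that point the authoritative state is re-read under the lock.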