From nobody Fri May 23 13:04:45 2025 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4b3lk63k10z5xDMJ; Fri, 23 May 2025 13:04:46 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R11" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4b3lk62Hkcz47nN; Fri, 23 May 2025 13:04:46 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1748005486; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=wJbtihOrTT02LOg2BKpl/Mw4LYr5gip1jYi36Vv0zSA=; b=bW9vGMadoqXjaP7r6RWAr5cpUwF+qhLaA5wcmntiEznB3I46FSjiMyvBGDtQsVUVth7o5e 2wBmqOvSaNYH4KMh6o17fdoaagN0qS+mFveZMP+WHJTYj3QLrPegn+fSsuVAxVv75APiu3 DzrntruxvyqTAgI5okdDYny9g3ZqVdXGr5xgOiuoELqv4ob14+St71lMLeqHEdv7zvFIqo aFeM/Y7HPecuDSL+3whMsNf9luG21/Im6WhYPMVndFjZQ0Vp+g1uro34cV/NdBGSsSCEvc jUvvH8/SDCDjOyaOv3fvtXE0MNhE6YO7Na9JT/JQer+V0BvmJIIww3xAXKNzdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1748005486; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=wJbtihOrTT02LOg2BKpl/Mw4LYr5gip1jYi36Vv0zSA=; b=RA2dxWsSxi1Yn9H9PXiHlxyAeOXN9zS6fHTwV/1Rmj28wlXFqTcyWiLg/KGh68VBT2NpBi zD4smaU2rL2bRHtBhKJCtOzgRbxq4wvi2THPS4/6nvk8MhH00FypyLtygRxJLrcxNiN/6E 8VUlm77qU1kTcATQJGlwxgTQFTLyym2/vTCUvGOJ5Qa5GWf/h8PmVRoyMUmRiA0+rU9fex 3us8Wfg0M0MThbWO397BQq0cu2Uj1PE/ulyKK7CEmhYsevbpQO6802feASDf1koySgHJfK XoRMztCCSNPxDzGy1Wnoai+WWsURWCTaeipaXBFXD1mEHZbzT0l/5EyKX8jE4w== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1748005486; a=rsa-sha256; cv=none; b=xpq7zfKtGVwGVuLpWzLxzznXKXeYnHDQuVezhgQxhCr7TGxiCrxDYvw+1M1ooMVfDdo9/9 E/WKqvgu3lLYNkHU66OnpY7PZujyzhaf0otRE7nq5z6Q7Ou677y6im9Aoidfdf5HM1rixE xLKLcTZRHd2wUobta3JNUS74LNSmRsYFalCJeQn0S4SHQSJQ4/fU8oPXpime7DxLGtsql7 3ToBRke9LX4Bed/5b5HPNsLmcXhN/zvFoCK4gR4wcwwmiNZmbDARzWs15Vq3yuI38atlPH 0TtGxRTvbO//RQBBARmM983qAjGxbM/YnHbpR78TU9FdL+usCwEWeWP6nQa4Gw== ARC-Authentication-Results: i=1; mx1.freebsd.org; none Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4b3lk61BHHz16ST; Fri, 23 May 2025 13:04:46 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.18.1/8.18.1) with ESMTP id 54ND4jv8044849; Fri, 23 May 2025 13:04:45 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.18.1/8.18.1/Submit) id 54ND4jMD044846; Fri, 23 May 2025 13:04:45 GMT (envelope-from git) Date: Fri, 23 May 2025 13:04:45 GMT Message-Id: <202505231304.54ND4jMD044846@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Mark Johnston Subject: git: 7587f6d4840f - main - namei: Make stackable filesystems check harder for jail roots List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-BeenThere: dev-commits-src-main@freebsd.org Sender: owner-dev-commits-src-main@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: markj X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 7587f6d4840f8d363e457cddc14c184cf1fe7cc1 Auto-Submitted: auto-generated The branch main has been updated by markj: URL: https://cgit.FreeBSD.org/src/commit/?id=7587f6d4840f8d363e457cddc14c184cf1fe7cc1 commit 7587f6d4840f8d363e457cddc14c184cf1fe7cc1 Author: Mark Johnston AuthorDate: 2025-05-23 12:52:24 +0000 Commit: Mark Johnston CommitDate: 2025-05-23 13:03:38 +0000 namei: Make stackable filesystems check harder for jail roots Suppose a process has its cwd pointing to a nullfs directory, where the lower directory is also visible in the jail's filesystem namespace. Suppose that the lower directory vnode is moved out from under the nullfs mount. The nullfs vnode still shadows the lower vnode, and dotdot lookups relative to that directory will instantiate new nullfs vnodes outside of the nullfs mountpoint, effectively shadowing the lower filesystem. This phenomenon can be abused to escape a chroot, since the nullfs vnodes instantiated by these dotdot lookups defeat the root vnode check in vfs_lookup(), which uses vnode pointer equality to test for the process root. Fix this by extending nullfs and unionfs to perform the same check, exploiting the fact that the passed componentname is embedded in a nameidata structure to avoid changing the VOP_LOOKUP interface. That is, add a flag to indicate that containerof can be used to get the full nameidata structure, and perform the root vnode check on the lower vnode when performing a dotdot lookup. PR: 262180 Reviewed by: olce, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D50418 --- share/man/man9/namei.9 | 9 +++++++++ sys/fs/nullfs/null_vnops.c | 28 ++++++++++++++++++---------- sys/fs/unionfs/union_vnops.c | 21 +++++++++++++++++++++ sys/kern/vfs_cache.c | 11 +---------- sys/kern/vfs_lookup.c | 41 ++++++++++++++++++++++++++++++----------- sys/sys/namei.h | 5 ++++- 6 files changed, 83 insertions(+), 32 deletions(-) diff --git a/share/man/man9/namei.9 b/share/man/man9/namei.9 index 3f2f43e94cd1..5bdffce8c360 100644 --- a/share/man/man9/namei.9 +++ b/share/man/man9/namei.9 @@ -281,6 +281,15 @@ The flag is used to implement .Dv O_RESOLVE_BENEATH flag for .Xr openat 2 . +.It Dv NAMEILOOKUP +The component is embedded in a +.Nm +lookup structure, and the +.Fn vfs_lookup_nameidata +function can be used to obtain that structure. +This can be useful in +.Xr VOP_LOOKUP 9 +implementations which need to obtain extra lookup metadata. .El .Sh PARAMETERS DESCRIPTORS FLAGS These flags are used for several purposes. diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c index a6b15c1dfa78..8608216e10e5 100644 --- a/sys/fs/nullfs/null_vnops.c +++ b/sys/fs/nullfs/null_vnops.c @@ -403,17 +403,25 @@ null_lookup(struct vop_lookup_args *ap) /* * Renames in the lower mounts might create an inconsistent - * configuration where lower vnode is moved out of the - * directory tree remounted by our null mount. Do not try to - * handle it fancy, just avoid VOP_LOOKUP() with DOTDOT name - * which cannot be handled by VOP, at least passing over lower - * root. + * configuration where lower vnode is moved out of the directory tree + * remounted by our null mount. + * + * Do not try to handle it fancy, just avoid VOP_LOOKUP() with DOTDOT + * name which cannot be handled by the VOP. */ - if ((ldvp->v_vflag & VV_ROOT) != 0 && (flags & ISDOTDOT) != 0) { - KASSERT((dvp->v_vflag & VV_ROOT) == 0, - ("ldvp %p fl %#x dvp %p fl %#x flags %#jx", - ldvp, ldvp->v_vflag, dvp, dvp->v_vflag, (uintmax_t)flags)); - return (ENOENT); + if ((flags & ISDOTDOT) != 0) { + struct nameidata *ndp; + + if ((ldvp->v_vflag & VV_ROOT) != 0) { + KASSERT((dvp->v_vflag & VV_ROOT) == 0, + ("ldvp %p fl %#x dvp %p fl %#x flags %#jx", + ldvp, ldvp->v_vflag, dvp, dvp->v_vflag, + (uintmax_t)flags)); + return (ENOENT); + } + ndp = vfs_lookup_nameidata(cnp); + if (ndp != NULL && vfs_lookup_isroot(ndp, ldvp)) + return (ENOENT); } /* diff --git a/sys/fs/unionfs/union_vnops.c b/sys/fs/unionfs/union_vnops.c index e1048e4ba7ab..a930e3921ab3 100644 --- a/sys/fs/unionfs/union_vnops.c +++ b/sys/fs/unionfs/union_vnops.c @@ -76,6 +76,21 @@ VNASSERT(((vp)->v_op == &unionfs_vnodeops), vp, \ ("%s: non-unionfs vnode", __func__)) +static bool +unionfs_lookup_isroot(struct componentname *cnp, struct vnode *dvp) +{ + struct nameidata *ndp; + + if (dvp == NULL) + return (false); + if ((dvp->v_vflag & VV_ROOT) != 0) + return (true); + ndp = vfs_lookup_nameidata(cnp); + if (ndp == NULL) + return (false); + return (vfs_lookup_isroot(ndp, dvp)); +} + static int unionfs_lookup(struct vop_cachedlookup_args *ap) { @@ -149,6 +164,12 @@ unionfs_lookup(struct vop_cachedlookup_args *ap) goto unionfs_lookup_return; } + if (unionfs_lookup_isroot(cnp, udvp) || + unionfs_lookup_isroot(cnp, ldvp)) { + error = ENOENT; + goto unionfs_lookup_return; + } + if (udvp != NULLVP) dtmpvp = udvp; else diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c index ff671e80b803..f2368877ec93 100644 --- a/sys/kern/vfs_cache.c +++ b/sys/kern/vfs_cache.c @@ -5276,7 +5276,6 @@ cache_fplookup_dotdot(struct cache_fpl *fpl) struct componentname *cnp; struct namecache *ncp; struct vnode *dvp; - struct prison *pr; u_char nc_flag; ndp = fpl->ndp; @@ -5288,15 +5287,7 @@ cache_fplookup_dotdot(struct cache_fpl *fpl) /* * XXX this is racy the same way regular lookup is */ - for (pr = cnp->cn_cred->cr_prison; pr != NULL; - pr = pr->pr_parent) - if (dvp == pr->pr_root) - break; - - if (dvp == ndp->ni_rootdir || - dvp == ndp->ni_topdir || - dvp == rootvnode || - pr != NULL) { + if (vfs_lookup_isroot(ndp, dvp)) { fpl->tvp = dvp; fpl->tvp_seqc = vn_seqc_read_any(dvp); if (seqc_in_modify(fpl->tvp_seqc)) { diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index 127d068f1fff..86c7bdaa02c0 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -611,12 +611,12 @@ namei(struct nameidata *ndp) } #endif ndp->ni_cnd.cn_cred = td->td_ucred; - KASSERT(ndp->ni_resflags == 0, ("%s: garbage in ni_resflags: %x\n", + KASSERT(ndp->ni_resflags == 0, ("%s: garbage in ni_resflags: %x", __func__, ndp->ni_resflags)); KASSERT(cnp->cn_cred && td->td_proc, ("namei: bad cred/proc")); KASSERT((cnp->cn_flags & NAMEI_INTERNAL_FLAGS) == 0, - ("namei: unexpected flags: %" PRIx64 "\n", - cnp->cn_flags & NAMEI_INTERNAL_FLAGS)); + ("namei: unexpected flags: %#jx", + (uintmax_t)(cnp->cn_flags & NAMEI_INTERNAL_FLAGS))); if (cnp->cn_flags & NOCACHE) KASSERT(cnp->cn_nameiop != LOOKUP, ("%s: NOCACHE passed with LOOKUP", __func__)); @@ -862,6 +862,30 @@ bad: return (error); } +struct nameidata * +vfs_lookup_nameidata(struct componentname *cnp) +{ + if ((cnp->cn_flags & NAMEILOOKUP) == 0) + return (NULL); + return (__containerof(cnp, struct nameidata, ni_cnd)); +} + +/* + * Would a dotdot lookup relative to dvp cause this lookup to cross a jail or + * chroot boundary? + */ +bool +vfs_lookup_isroot(struct nameidata *ndp, struct vnode *dvp) +{ + for (struct prison *pr = ndp->ni_cnd.cn_cred->cr_prison; pr != NULL; + pr = pr->pr_parent) { + if (dvp == pr->pr_root) + return (true); + } + return (dvp == ndp->ni_rootdir || dvp == ndp->ni_topdir || + dvp == rootvnode); +} + /* * FAILIFEXISTS handling. * @@ -1020,7 +1044,6 @@ vfs_lookup(struct nameidata *ndp) char *lastchar; /* location of the last character */ struct vnode *dp = NULL; /* the directory we are searching */ struct vnode *tdp; /* saved dp */ - struct prison *pr; size_t prev_ni_pathlen; /* saved ndp->ni_pathlen */ int docache; /* == 0 do not cache last component */ int wantparent; /* 1 => wantparent or lockparent flag */ @@ -1206,13 +1229,9 @@ dirloop: goto bad; } for (;;) { - for (pr = cnp->cn_cred->cr_prison; pr != NULL; - pr = pr->pr_parent) - if (dp == pr->pr_root) - break; - bool isroot = dp == ndp->ni_rootdir || - dp == ndp->ni_topdir || dp == rootvnode || - pr != NULL; + bool isroot; + + isroot = vfs_lookup_isroot(ndp, dp); if (__predict_false(isroot && (ndp->ni_lcf & (NI_LCF_STRICTREL | NI_LCF_STRICTREL_KTR)) != 0)) { if ((ndp->ni_lcf & NI_LCF_STRICTREL_KTR) != 0) diff --git a/sys/sys/namei.h b/sys/sys/namei.h index bbaa71f629e6..4a16ec923bd8 100644 --- a/sys/sys/namei.h +++ b/sys/sys/namei.h @@ -152,6 +152,7 @@ int cache_fplookup(struct nameidata *ndp, enum cache_fpl_status *status, #define LOCKSHARED 0x0100 /* Shared lock leaf */ #define NOFOLLOW 0x0000 /* do not follow symbolic links (pseudo) */ #define RBENEATH 0x100000000ULL /* No escape, even tmp, from start dir */ +#define NAMEILOOKUP 0x200000000ULL /* cnp is embedded in nameidata */ #define MODMASK 0xf000001ffULL /* mask of operational modifiers */ /* @@ -249,7 +250,7 @@ do { \ NDINIT_PREFILL(_ndp); \ NDINIT_DBG(_ndp); \ _ndp->ni_cnd.cn_nameiop = op; \ - _ndp->ni_cnd.cn_flags = flags; \ + _ndp->ni_cnd.cn_flags = (flags) | NAMEILOOKUP; \ _ndp->ni_segflg = segflg; \ _ndp->ni_dirp = namep; \ _ndp->ni_dirfd = dirfd; \ @@ -286,6 +287,8 @@ do { \ int namei(struct nameidata *ndp); int vfs_lookup(struct nameidata *ndp); +bool vfs_lookup_isroot(struct nameidata *ndp, struct vnode *dvp); +struct nameidata *vfs_lookup_nameidata(struct componentname *cnp); int vfs_relookup(struct vnode *dvp, struct vnode **vpp, struct componentname *cnp, bool refstart);