Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 17 May 2021 00:00:23 GMT
From:      Kirk McKusick <mckusick@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
Subject:   git: 9a2fac6ba65f - main - Fix handling of embedded symbolic links (and history lesson).
Message-ID:  <202105170000.14H00NgY054353@gitrepo.freebsd.org>

next in thread | raw e-mail | index | archive | help
The branch main has been updated by mckusick:

URL: https://cgit.FreeBSD.org/src/commit/?id=9a2fac6ba65fbd14d37ccedbc2aec27a190128ea

commit 9a2fac6ba65fbd14d37ccedbc2aec27a190128ea
Author:     Kirk McKusick <mckusick@FreeBSD.org>
AuthorDate: 2021-05-17 00:02:42 +0000
Commit:     Kirk McKusick <mckusick@FreeBSD.org>
CommitDate: 2021-05-17 00:04:11 +0000

    Fix handling of embedded symbolic links (and history lesson).
    
    The original filesystem release (4.2BSD) had no embedded sysmlinks.
    Historically symbolic links were just a different type of file, so
    the content of the symbolic link was contained in a single disk block
    fragment. We observed that most symbolic links were short enough that
    they could fit in the area of the inode that normally holds the block
    pointers. So we created embedded symlinks where the content of the
    link was held in the inode's pointer area thus avoiding the need to
    seek and read a data fragment and reducing the pressure on the block
    cache. At the time we had only UFS1 with 32-bit block pointers,
    so the test for a fastlink was:
    
            di_size < (NDADDR + NIADDR) * sizeof(daddr_t)
    
    (where daddr_t would be ufs1_daddr_t today).
    
    When embedded symlinks were added, a spare field in the superblock
    with a known zero value became fs_maxsymlinklen. New filesystems
    set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded
    symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus
    filesystems that preceeded this change always read from blocks
    (since fs->fs_maxsymlinklen == 0) and newer ones used embedded
    symlinks if they fit. Similarly symlinks created on pre-embedded
    symlink filesystems always spill into blocks while newer ones will
    embed if they fit.
    
    At the same time that the embedded symbolic links were added, the
    on-disk directory structure was changed splitting the former
    u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen.
    Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can
    be used to distinguish old directory formats. In retrospect that
    should have just been an added flag, but we did not realize we
    needed to know about that change until it was already in production.
    
    Code was split into ufs/ffs so that the log structured filesystem could
    use ufs functionality while doing its own disk layout. This meant
    that no ffs superblock fields could be used in the ufs code. Thus
    ffs superblock fields that were needed in ufs code had to be copied
    to fields in the mount structure. Since ufs_readlink needed to know
    if a link was embedded, fs_maxlinklen gets copied to mnt_maxsymlinklen.
    
    The kernel panic that arose to making this fix was triggered when a
    disk error created an inode of type symlink with no allocated data
    blocks but a large size. When readlink was called the uiomove was
    attempted which segment faulted.
    
    static int
    ufs_readlink(ap)
            struct vop_readlink_args /* {
                    struct vnode *a_vp;
                    struct uio *a_uio;
                    struct ucred *a_cred;
            } */ *ap;
    {
            struct vnode *vp = ap->a_vp;
            struct inode *ip = VTOI(vp);
            doff_t isize;
    
            isize = ip->i_size;
            if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
                DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
                    return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
            }
            return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
    }
    
    The second part of the "if" statement that adds
    
            DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
    
    is problematic. It never appeared in BSD released by Berkeley because
    as noted above mnt_maxsymlinklen is 0 for old format filesystems, so
    will always fall through to the VOP_READ as it should. I had to dig
    back through `git blame' to find that Rodney Grimes added it as
    part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.''
    He must have brought it across from an earlier FreeBSD. Unfortunately
    the source-control logs for FreeBSD up to the merger with the
    AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the
    agreement to let FreeBSD remain unencumbered, so I cannot pin-point
    where that line got added on the FreeBSD side.
    
    The one change needed here is that mnt_maxsymlinklen is declared as
    an `int' and should be changed to be `u_int64_t'.
    
    This discovery led us to check out the code that deletes symbolic
    links. Specifically
    
            if (vp->v_type == VLNK &&
                (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
                 datablocks == 0)) {
                    if (length != 0)
                            panic("ffs_truncate: partial truncate of symlink");
                    bzero(SHORTLINK(ip), (u_int)ip->i_size);
                    ip->i_size = 0;
                    DIP_SET(ip, i_size, 0);
                    UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE);
                    if (needextclean)
                            goto extclean;
                    return (ffs_update(vp, waitforupdate));
            }
    
    Here too our broken symlink inode with no data blocks allocated
    and a large size will segment fault as we are incorrectly using the
    test that we have no data blocks to decide that it is an embdedded
    symbolic link and attempting to bzero past the end of the inode.
    The test for datablocks == 0 is unnecessary as the test for
    ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right
    thing in all cases.
    
    The test for datablocks == 0 was added by David Greenman in this commit:
    
    Author: David Greenman <dg@FreeBSD.org>
    Date:   Tue Aug 2 13:51:05 1994 +0000
    
        Completed (hopefully) the kernel support for old style "fastlinks".
    
        Notes:
            svn path=/head/; revision=1821
    
    I am guessing that he likely earlier added the incorrect test in the
    ufs_readlink code.
    
    I asked David if he had any recollection of why he made this change.
    Amazingly, he still had a recollection of why he had made a one-line
    change more than twenty years ago. And unsurpisingly it was because
    he had been stuck between a rock and a hard place.
    
    FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code
    base. Prior to that, there were three years of development in all
    areas of the kernel, including the filesystem code, from the combined
    set of people including Bill Jolitz, Patchkit contributors, and
    FreeBSD Project members. The compatibility issue at hand was caused
    by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite
    changes David had to find a way to provide compatibility with both
    the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite.
    He felt that these changes would provide compatibility with both systems.
    
    In his words:
    ``My recollection is that the 'FASTLINKS' symlinks support in
    FreeBSD-1.x, as implemented by Curt Mayer, worked differently than
    4.4BSD. He used a spare field in the inode to duplicately store the
    length. When the 4.4BSD-Lite merge was done, the optimized symlinks
    support for existing filesystems (those that were initialized in
    FreeBSD-1.x) were broken due to the FFS on-disk structure of
    4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to
    restore the backward compatibility with FreeBSD-1.x filesystems.
    I think it was the best that could be done in the somewhat urgent
    circumstances of the post Berkeley-USL settlement. Also, regarding
    Rod's massive commit with little explanation, some context: John
    Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to
    the 386 platform in just 10 days. It was by far the most intense
    hacking effort of my life. In addition to the porting of tons of
    FreeBSD-1 code, I think we wrote more than 30,000 lines of new code
    in that time to deal with the missing pieces and architectural
    changes of 4.4BSD-Lite. We didn't make many notes along the way.
    There was a lot of pressure to get something out to the rest of the
    developer community as fast as possible, so detailed discrete commits
    didn't happen - it all came as a giant wad, which is why Rod's
    commit message was worded the way it was.''
    
    Reported by:  Chuck Silvers
    Tested by:    Chuck Silvers
    History by:   David Greenman Lawrence
    MFC after:    1 week
    Sponsored by: Netflix
---
 sys/kern/vfs_subr.c     | 3 ++-
 sys/sys/mount.h         | 2 +-
 sys/ufs/ffs/ffs_inode.c | 4 +---
 sys/ufs/ufs/ufs_vnops.c | 4 +---
 4 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 2be3a81ad799..468bd21dae21 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -4469,7 +4469,8 @@ DB_SHOW_COMMAND(mount, db_show_mount)
 	    mp->mnt_lazyvnodelistsize);
 	db_printf("    mnt_writeopcount = %d (with %d in the struct)\n",
 	    vfs_mount_fetch_counter(mp, MNT_COUNT_WRITEOPCOUNT), mp->mnt_writeopcount);
-	db_printf("    mnt_maxsymlinklen = %d\n", mp->mnt_maxsymlinklen);
+	db_printf("    mnt_maxsymlinklen = %jd\n",
+	    (uintmax_t)mp->mnt_maxsymlinklen);
 	db_printf("    mnt_iosize_max = %d\n", mp->mnt_iosize_max);
 	db_printf("    mnt_hashseed = %u\n", mp->mnt_hashseed);
 	db_printf("    mnt_lockref = %d (with %d in the struct)\n",
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index a6d750a1ff37..f341370ecd86 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -224,7 +224,7 @@ struct mount {
 	int		mnt_writeopcount;	/* (i) write syscalls pending */
 	struct vfsoptlist *mnt_opt;		/* current mount options */
 	struct vfsoptlist *mnt_optnew;		/* new options passed to fs */
-	int		mnt_maxsymlinklen;	/* max size of short symlink */
+	uint64_t	mnt_maxsymlinklen;	/* max size of short symlink */
 	struct statfs	mnt_stat;		/* cache of filesystem stats */
 	struct ucred	*mnt_cred;		/* credentials of mounter */
 	void *		mnt_data;		/* private data */
diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c
index 5598c7ba30e8..8fe7f7ab97de 100644
--- a/sys/ufs/ffs/ffs_inode.c
+++ b/sys/ufs/ffs/ffs_inode.c
@@ -329,9 +329,7 @@ ffs_truncate(vp, length, flags, cred)
 	}
 	if ((flags & IO_NORMAL) == 0)
 		return (0);
-	if (vp->v_type == VLNK &&
-	    (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
-	     datablocks == 0)) {
+	if (vp->v_type == VLNK && ip->i_size < vp->v_mount->mnt_maxsymlinklen) {
 #ifdef INVARIANTS
 		if (length != 0)
 			panic("ffs_truncate: partial truncate of symlink");
diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c
index 301c583291d1..70bf1a1d9036 100644
--- a/sys/ufs/ufs/ufs_vnops.c
+++ b/sys/ufs/ufs/ufs_vnops.c
@@ -2458,10 +2458,8 @@ ufs_readlink(ap)
 	doff_t isize;
 
 	isize = ip->i_size;
-	if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
-	    DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
+	if (isize < vp->v_mount->mnt_maxsymlinklen)
 		return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
-	}
 	return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
 }
 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?202105170000.14H00NgY054353>