Date:      Fri, 3 Feb 2023 19:16:22 +0100
From:      Maxim Sobolev <sobomax@freebsd.org>
To:        Dag-Erling Smørgrav <des@freebsd.org>
Cc:        src-committers <src-committers@freebsd.org>, dev-commits-src-all@freebsd.org,  dev-commits-src-main@freebsd.org
Subject:   Re: git: 69d94f4c7608 - main - Add tarfs, a filesystem backed by tarballs.
Message-ID:  <CAH7qZfsBdcsV9GqjUpKqhQ+bk8q73GcaHM=9Bdwf34fziLwxuw@mail.gmail.com>
In-Reply-To: <202302021720.312HKDQG099212@gitrepo.freebsd.org>
References:  <202302021720.312HKDQG099212@gitrepo.freebsd.org>

Wow, cool, thank you so much! It feels like Christmas again. >:-) I see
this being immediately useful for anyone building a custom system, as a
replacement for the uzip+ufs combo or similar methods of creating read-only
compressed storage containers!

Just curious: have you done any performance testing? Something like
"worldstone" but with /usr/src mounted off a tar archive vs. "normal" UFS
would be interesting to see.

Also, has any security audit, even a cursory one, been done on the tar
processing routines? Of course, with the functionality being opt-in, the
onus is on the user to make sure that only tars obtained from trusted
sources are used, and in a way that protects the tar file content from
modification by unprivileged users. However, that won't protect FreeBSD
from looking bad in the public eye if some high-profile institutional user
of FreeBSD is breached through a vulnerability in this code a few years
down the line, when it hits a RELENG branch.

At the very least, a big, fat warning could be added to the man page to
notify the user that the code is somewhat fresh and not on par
quality-wise with something like UFS or ZFS, plus some tips on best
practices for reducing exposure when tarfs is used (nosuid mount,
proper tar file permissions, trusted sources, etc.).

This is of course all hypothetical, but given the history of buffer/integer
overflows etc. in handling user-supplied data in simple syscalls operating
on structures 1-2 orders of magnitude smaller and less complex,
I find it unlikely that fresh-off-the-mill tar code won't have any.
Perhaps some automated fuzzing approach could be employed to see if the
kernel can be crashed by giving it a slightly corrupted but otherwise valid
tar file? If Juniper sponsored the development of this feature, I suspect
they would be quite interested in making sure that using it won't compromise
the security of their products. Pure speculation on my part, of course, but
pretty reasonable at that.
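
To make the "slightly corrupted but otherwise valid" idea concrete, here is
a minimal userland sketch in C of how such inputs could be produced. It
builds a POSIX ustar header, overwrites one byte, and then recomputes the
header checksum so that a parser which only verifies the checksum would
still accept the entry. The field offsets follow the ustar layout; the
function names are hypothetical, and this is not how the commit's own tests
work, just one possible starting point for a fuzzer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	BLOCK	512	/* tar headers and data are 512-byte blocks */

/* Sum of all header bytes, with the checksum field counted as spaces. */
static unsigned int
ustar_checksum(const uint8_t hdr[BLOCK])
{
	unsigned int sum = 0;

	for (size_t i = 0; i < BLOCK; i++)
		sum += (i >= 148 && i < 156) ? ' ' : hdr[i];
	return (sum);
}

/* Store the checksum as six octal digits, a NUL and a space. */
static void
ustar_fix_checksum(uint8_t hdr[BLOCK])
{
	snprintf((char *)hdr + 148, 8, "%06o", ustar_checksum(hdr));
	hdr[155] = ' ';
}

/* Fill in a minimal ustar header for a regular file of the given size. */
static void
ustar_header(uint8_t hdr[BLOCK], const char *name, unsigned long size)
{
	memset(hdr, 0, BLOCK);
	snprintf((char *)hdr, 100, "%s", name);			/* name */
	snprintf((char *)hdr + 100, 8, "%07o", 0644u);		/* mode */
	snprintf((char *)hdr + 108, 8, "%07o", 0u);		/* uid */
	snprintf((char *)hdr + 116, 8, "%07o", 0u);		/* gid */
	snprintf((char *)hdr + 124, 12, "%011lo", size);	/* size */
	snprintf((char *)hdr + 136, 12, "%011lo", 0ul);		/* mtime */
	hdr[156] = '0';						/* regular file */
	memcpy(hdr + 257, "ustar", 6);				/* magic and NUL */
	memcpy(hdr + 263, "00", 2);				/* version */
	ustar_fix_checksum(hdr);
}

/* Overwrite one byte outside the checksum field, then re-validate. */
static void
ustar_mutate(uint8_t hdr[BLOCK], size_t off, uint8_t val)
{
	assert(off < BLOCK && (off < 148 || off >= 156));
	hdr[off] = val;
	ustar_fix_checksum(hdr);
}

/* Write the header followed by the two zero blocks that end an archive. */
static int
ustar_write_nodata(FILE *fp, const uint8_t hdr[BLOCK])
{
	static const uint8_t zero[2 * BLOCK];

	if (fwrite(hdr, BLOCK, 1, fp) != 1 ||
	    fwrite(zero, sizeof(zero), 1, fp) != 1)
		return (-1);
	return (0);
}
```

A driver could loop over offsets and values, write each variant to a file
with ustar_write_nodata(), and try to mount and walk it on a disposable
test machine. Mutating the size field at offset 124, for example, yields an
entry that claims more data than the archive actually contains.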

Anyhow, just my few Canadian cents on the topic, while it is fresh. Thanks
again to everyone involved in making this available. I look forward to
getting my hands on it as soon as I get back from FOSDEM, if not sooner.

-Max

On Thu, Feb 2, 2023, 6:20 PM Dag-Erling Smørgrav <des@freebsd.org> wrote:

> The branch main has been updated by des:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=69d94f4c7608e41505996559367450706e91fbb8
>
> commit 69d94f4c7608e41505996559367450706e91fbb8
> Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
> AuthorDate: 2023-02-02 17:18:41 +0000
> Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
> CommitDate: 2023-02-02 17:19:29 +0000
>
>     Add tarfs, a filesystem backed by tarballs.
>
>     Sponsored by:   Juniper Networks, Inc.
>     Sponsored by:   Klara, Inc.
>     Reviewed by:    pauamma, imp
>     Differential Revision:  https://reviews.freebsd.org/D37753
> ---
>  etc/mtree/BSD.tests.dist         |    2 +
>  share/man/man5/Makefile          |    1 +
>  share/man/man5/tarfs.5           |  103 ++++
>  sys/conf/files                   |    4 +
>  sys/conf/options                 |    4 +
>  sys/fs/tarfs/tarfs.h             |  254 +++++++++
>  sys/fs/tarfs/tarfs_dbg.h         |   65 +++
>  sys/fs/tarfs/tarfs_io.c          |  727 +++++++++++++++++++++++
>  sys/fs/tarfs/tarfs_subr.c        |  603 ++++++++++++++++++++
>  sys/fs/tarfs/tarfs_vfsops.c      | 1173 ++++++++++++++++++++++++++++++++++++++
>  sys/fs/tarfs/tarfs_vnops.c       |  642 +++++++++++++++++++++
>  sys/kern/subr_witness.c          |    6 +
>  sys/modules/Makefile             |    1 +
>  sys/modules/tarfs/Makefile       |   23 +
>  tests/sys/fs/Makefile            |    1 +
>  tests/sys/fs/tarfs/Makefile      |   10 +
>  tests/sys/fs/tarfs/mktar.c       |  238 ++++++++
>  tests/sys/fs/tarfs/tarfs_test.sh |   54 ++
>  18 files changed, 3911 insertions(+)
>
> diff --git a/etc/mtree/BSD.tests.dist b/etc/mtree/BSD.tests.dist
> index 0d05ecaf06fc..b4b18997b7f9 100644
> --- a/etc/mtree/BSD.tests.dist
> +++ b/etc/mtree/BSD.tests.dist
> @@ -757,6 +757,8 @@
>          fs
>              fusefs
>              ..
> +            tarfs
> +            ..
>              tmpfs
>              ..
>          ..
> diff --git a/share/man/man5/Makefile b/share/man/man5/Makefile
> index 2d49d981c2f9..f6e91e4ed00b 100644
> --- a/share/man/man5/Makefile
> +++ b/share/man/man5/Makefile
> @@ -70,6 +70,7 @@ MAN=  acct.5 \
>         style.Makefile.5 \
>         style.mdoc.5 \
>         sysctl.conf.5 \
> +       tarfs.5 \
>         tmpfs.5 \
>         unionfs.5
>
> diff --git a/share/man/man5/tarfs.5 b/share/man/man5/tarfs.5
> new file mode 100644
> index 000000000000..b25131c323c1
> --- /dev/null
> +++ b/share/man/man5/tarfs.5
> @@ -0,0 +1,103 @@
> +.\"-
> +.\" SPDX-License-Identifier: BSD-2-Clause
> +.\"
> +.\" Copyright (c) 2022 Klara, Inc.
> +.\"
> +.\" Redistribution and use in source and binary forms, with or without
> +.\" modification, are permitted provided that the following conditions
> +.\" are met:
> +.\" 1. Redistributions of source code must retain the above copyright
> +.\"    notice, this list of conditions and the following disclaimer.
> +.\" 2. Redistributions in binary form must reproduce the above copyright
> +.\"    notice, this list of conditions and the following disclaimer in the
> +.\"    documentation and/or other materials provided with the distribution.
> +.\"
> +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> +.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> +.\" SUCH DAMAGE.
> +.\"
> +.Dd February 2, 2023
> +.Dt TARFS 5
> +.Os
> +.Sh NAME
> +.Nm tarfs
> +.Nd tarball filesystem
> +.Sh SYNOPSIS
> +To compile this driver into the kernel, place the following line in
> +your kernel configuration file:
> +.Bd -ragged -offset indent
> +.Cd "options TARFS"
> +.Ed
> +.Pp
> +Alternatively, to load the driver as a module at boot time, place the
> +following line in
> +.Xr loader.conf 5 :
> +.Bd -literal -offset indent
> +tarfs_load="YES"
> +.Ed
> +.Sh DESCRIPTION
> +The
> +.Nm
> +driver implements a read-only filesystem backed by a
> +.Xr tar 5
> +file.
> +Currently, only POSIX archives, optionally compressed with
> +.Xr zstd 1 ,
> +are supported.
> +.Pp
> +The preferred I/O size for
> +.Nm
> +filesystems can be adjusted using the
> +.Va vfs.tarfs.ioshift
> +sysctl setting and tunable.
> +Setting it to 0 will reset it to its default value.
> +Note that changes to this setting only apply to filesystems mounted
> +after the change.
> +.Sh DIAGNOSTICS
> +If enabled by the
> +.Dv TARFS_DEBUG
> +kernel option, the
> +.Va vfs.tarfs.debug
> +sysctl setting can be used to control debugging output from the
> +.Nm
> +driver.
> +Debugging output for individual sections of the driver can be enabled
> +by adding together the relevant values from the table below.
> +.Bl -column Value Description
> +.It 0x01 Ta Memory allocations
> +.It 0x02 Ta Checksum calculations
> +.It 0x04 Ta Filesystem operations (vfsops)
> +.It 0x08 Ta Path lookups
> +.It 0x10 Ta File operations (vnops)
> +.It 0x20 Ta General I/O
> +.It 0x40 Ta Decompression
> +.It 0x80 Ta Decompression index
> +.It 0x100 Ta Sparse file mapping
> +.El
> +.Sh SEE ALSO
> +.Xr tar 1 ,
> +.Xr zstd 1 ,
> +.Xr fstab 5 ,
> +.Xr tar 5 ,
> +.Xr mount 8 ,
> +.Xr sysctl 8
> +.Sh HISTORY
> +.An -nosplit
> +The
> +.Nm
> +driver was developed by
> +.An Stephen J. Kiernan Aq Mt stevek@FreeBSD.org
> +and
> +.An Dag-Erling Smørgrav Aq Mt des@FreeBSD.org
> +for Juniper Networks and Klara Systems.
> +This manual page was written by
> +.An Dag-Erling Smørgrav Aq Mt des@FreeBSD.org
> +for Juniper Networks and Klara Systems.
> diff --git a/sys/conf/files b/sys/conf/files
> index 6cb4abcd9223..08966a9b46e4 100644
> --- a/sys/conf/files
> +++ b/sys/conf/files
> @@ -3615,6 +3615,10 @@ fs/smbfs/smbfs_smb.c             optional smbfs
>  fs/smbfs/smbfs_subr.c          optional smbfs
>  fs/smbfs/smbfs_vfsops.c                optional smbfs
>  fs/smbfs/smbfs_vnops.c         optional smbfs
> +fs/tarfs/tarfs_io.c            optional tarfs compile-with "${NORMAL_C} -I$S/contrib/zstd/lib/freebsd"
> +fs/tarfs/tarfs_subr.c          optional tarfs
> +fs/tarfs/tarfs_vfsops.c                optional tarfs
> +fs/tarfs/tarfs_vnops.c         optional tarfs
>  fs/udf/osta.c                  optional udf
>  fs/udf/udf_iconv.c             optional udf_iconv
>  fs/udf/udf_vfsops.c            optional udf
> diff --git a/sys/conf/options b/sys/conf/options
> index 1f5003507539..3b2be66ba602 100644
> --- a/sys/conf/options
> +++ b/sys/conf/options
> @@ -265,6 +265,7 @@ NULLFS              opt_dontuse.h
>  PROCFS         opt_dontuse.h
>  PSEUDOFS       opt_dontuse.h
>  SMBFS          opt_dontuse.h
> +TARFS          opt_dontuse.h
>  TMPFS          opt_dontuse.h
>  UDF            opt_dontuse.h
>  UNIONFS                opt_dontuse.h
> @@ -273,6 +274,9 @@ ZFS         opt_dontuse.h
>  # Pseudofs debugging
>  PSEUDOFS_TRACE opt_pseudofs.h
>
> +# Tarfs debugging
> +TARFS_DEBUG    opt_tarfs.h
> +
>  # In-kernel GSS-API
>  KGSSAPI                opt_kgssapi.h
>  KGSSAPI_DEBUG  opt_kgssapi.h
> diff --git a/sys/fs/tarfs/tarfs.h b/sys/fs/tarfs/tarfs.h
> new file mode 100644
> index 000000000000..dffd60ee6d8a
> --- /dev/null
> +++ b/sys/fs/tarfs/tarfs.h
> @@ -0,0 +1,254 @@
> +/*-
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright (c) 2013 Juniper Networks, Inc.
> + * Copyright (c) 2022-2023 Klara, Inc.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#ifndef        _FS_TARFS_TARFS_H_
> +#define        _FS_TARFS_TARFS_H_
> +
> +#ifndef _KERNEL
> +#error Should only be included by kernel
> +#endif
> +
> +MALLOC_DECLARE(M_TARFSMNT);
> +MALLOC_DECLARE(M_TARFSNODE);
> +MALLOC_DECLARE(M_TARFSNAME);
> +
> +#ifdef SYSCTL_DECL
> +SYSCTL_DECL(_vfs_tarfs);
> +#endif
> +
> +struct componentname;
> +struct mount;
> +struct vnode;
> +
> +/*
> + * Internal representation of a tarfs file system node.
> + */
> +struct tarfs_node {
> +       TAILQ_ENTRY(tarfs_node) entries;
> +       TAILQ_ENTRY(tarfs_node) dirents;
> +
> +       struct mtx               lock;
> +
> +       struct vnode            *vnode;
> +       struct tarfs_mount      *tmp;
> +       enum vtype               type;
> +       ino_t                    ino;
> +       off_t                    offset;
> +       size_t                   size;
> +       size_t                   physize;
> +       char                    *name;
> +       size_t                   namelen;
> +
> +       /* Node attributes */
> +       uid_t                    uid;
> +       gid_t                    gid;
> +       mode_t                   mode;
> +       unsigned int             flags;
> +       nlink_t                  nlink;
> +       struct timespec          atime;
> +       struct timespec          mtime;
> +       struct timespec          ctime;
> +       struct timespec          birthtime;
> +       unsigned long            gen;
> +
> +       /* Block map */
> +       size_t                   nblk;
> +       struct tarfs_blk        *blk;
> +
> +       struct tarfs_node       *parent;
> +       union {
> +               /* VDIR */
> +               struct {
> +                       TAILQ_HEAD(, tarfs_node) dirhead;
> +                       off_t                    lastcookie;
> +                       struct tarfs_node       *lastnode;
> +               } dir;
> +
> +               /* VLNK */
> +               struct {
> +                       char                    *name;
> +                       size_t                   namelen;
> +               } link;
> +
> +               /* VBLK or VCHR */
> +               dev_t                    rdev;
> +
> +               /* VREG */
> +               struct tarfs_node       *other;
> +       };
> +};
> +
> +/*
> + * Entry in sparse file block map.
> + */
> +struct tarfs_blk {
> +       off_t    i;             /* input (physical) offset */
> +       off_t    o;             /* output (logical) offset */
> +       size_t   l;             /* length */
> +};
> +
> +/*
> + * Decompression buffer.
> + */
> +#define TARFS_ZBUF_SIZE 1048576
> +struct tarfs_zbuf {
> +       u_char           buf[TARFS_ZBUF_SIZE];
> +       size_t           off; /* offset of contents */
> +       size_t           len; /* length of contents */
> +};
> +
> +/*
> + * Internal representation of a tarfs mount point.
> + */
> +struct tarfs_mount {
> +       TAILQ_HEAD(, tarfs_node) allnodes;
> +       struct mtx               allnode_lock;
> +
> +       struct tarfs_node       *root;
> +       struct vnode            *vp;
> +       struct mount            *vfs;
> +       ino_t                    ino;
> +       struct unrhdr           *ino_unr;
> +       size_t                   iosize;
> +       size_t                   nblocks;
> +       size_t                   nfiles;
> +       time_t                   mtime; /* default mtime for directories */
> +
> +       struct tarfs_zio        *zio;
> +       struct vnode            *znode;
> +};
> +
> +struct tarfs_zio {
> +       struct tarfs_mount      *tmp;
> +
> +       /* decompression state */
> +#ifdef ZSTDIO
> +       struct tarfs_zstd       *zstd; /* decompression state (zstd) */
> +#endif
> +       off_t                    ipos; /* current input position */
> +       off_t                    opos; /* current output position */
> +
> +       /* index of compression frames */
> +       unsigned int             curidx; /* current index position*/
> +       unsigned int             nidx; /* number of index entries */
> +       unsigned int             szidx; /* index capacity */
> +       struct tarfs_idx { off_t i, o; } *idx;
> +};
> +
> +struct tarfs_fid {
> +       u_short                  len;   /* length of data in bytes */
> +       u_short                  data0; /* force alignment */
> +       ino_t                    ino;
> +       unsigned long            gen;
> +};
> +
> +#define        TARFS_NODE_LOCK(tnp) \
> +       mtx_lock(&(tnp)->lock)
> +#define        TARFS_NODE_UNLOCK(tnp) \
> +       mtx_unlock(&(tnp)->lock)
> +#define        TARFS_ALLNODES_LOCK(tmp) \
> +       mtx_lock(&(tmp)->allnode_lock)
> +#define        TARFS_ALLNODES_UNLOCK(tmp) \
> +       mtx_unlock(&(tmp)->allnode_lock)
> +
> +/*
> + * Data and metadata within tar files are aligned on 512-byte boundaries,
> + * to match the block size of the magnetic tapes they were originally
> + * intended for.
> + */
> +#define        TARFS_BSHIFT            9
> +#define        TARFS_BLOCKSIZE         (size_t)(1U << TARFS_BSHIFT)
> +#define        TARFS_BLKOFF(l)         ((l) % TARFS_BLOCKSIZE)
> +#define        TARFS_BLKNUM(l)         ((l) >> TARFS_BSHIFT)
> +#define        TARFS_SZ2BLKS(sz)       (((sz) + TARFS_BLOCKSIZE - 1) / TARFS_BLOCKSIZE)
> +
> +/*
> + * Our preferred I/O size.
> + */
> +extern unsigned int tarfs_ioshift;
> +#define        TARFS_IOSHIFT_MIN       TARFS_BSHIFT
> +#define        TARFS_IOSHIFT_DEFAULT   PAGE_SHIFT
> +#define        TARFS_IOSHIFT_MAX       PAGE_SHIFT
> +
> +#define        TARFS_ROOTINO           ((ino_t)3)
> +#define        TARFS_ZIOINO            ((ino_t)4)
> +#define        TARFS_MININO            ((ino_t)65535)
> +
> +#define        TARFS_COOKIE_DOT        0
> +#define        TARFS_COOKIE_DOTDOT     1
> +#define        TARFS_COOKIE_EOF        OFF_MAX
> +
> +#define        TARFS_ZIO_NAME          ".tar"
> +#define        TARFS_ZIO_NAMELEN       (sizeof(TARFS_ZIO_NAME) - 1)
> +
> +extern struct vop_vector tarfs_vnodeops;
> +
> +static inline
> +struct tarfs_mount *
> +MP_TO_TARFS_MOUNT(struct mount *mp)
> +{
> +
> +       MPASS(mp != NULL && mp->mnt_data != NULL);
> +       return (mp->mnt_data);
> +}
> +
> +static inline
> +struct tarfs_node *
> +VP_TO_TARFS_NODE(struct vnode *vp)
> +{
> +
> +       MPASS(vp != NULL && vp->v_data != NULL);
> +       return (vp->v_data);
> +}
> +
> +int    tarfs_alloc_node(struct tarfs_mount *tmp, const char *name,
> +           size_t namelen, enum vtype type, off_t off, size_t sz,
> +           time_t mtime, uid_t uid, gid_t gid, mode_t mode,
> +           unsigned int flags, const char *linkname, dev_t rdev,
> +           struct tarfs_node *parent, struct tarfs_node **node);
> +int    tarfs_load_blockmap(struct tarfs_node *tnp, size_t realsize);
> +void   tarfs_dump_tree(struct tarfs_node *tnp);
> +void   tarfs_free_node(struct tarfs_node *tnp);
> +struct tarfs_node *
> +       tarfs_lookup_dir(struct tarfs_node *tnp, off_t cookie);
> +struct tarfs_node *
> +       tarfs_lookup_node(struct tarfs_node *tnp, struct tarfs_node *f,
> +           struct componentname *cnp);
> +void   tarfs_print_node(struct tarfs_node *tnp);
> +int    tarfs_read_file(struct tarfs_node *tnp, size_t len, struct uio *uiop);
> +
> +int    tarfs_io_init(struct tarfs_mount *tmp);
> +int    tarfs_io_fini(struct tarfs_mount *tmp);
> +int    tarfs_io_read(struct tarfs_mount *tmp, bool raw,
> +    struct uio *uiop);
> +ssize_t        tarfs_io_read_buf(struct tarfs_mount *tmp, bool raw,
> +    void *buf, off_t off, size_t len);
> +unsigned int
> +       tarfs_strtofflags(const char *str, char **end);
> +
> +#endif /* _FS_TARFS_TARFS_H_ */
> diff --git a/sys/fs/tarfs/tarfs_dbg.h b/sys/fs/tarfs/tarfs_dbg.h
> new file mode 100644
> index 000000000000..45d11d679719
> --- /dev/null
> +++ b/sys/fs/tarfs/tarfs_dbg.h
> @@ -0,0 +1,65 @@
> +/*-
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright (c) 2013 Juniper Networks, Inc.
> + * Copyright (c) 2022 Klara, Inc.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#ifndef        _FS_TARFS_TARFS_DBG_H_
> +#define        _FS_TARFS_TARFS_DBG_H_
> +
> +#ifndef _KERNEL
> +#error Should only be included by kernel
> +#endif
> +
> +#ifdef TARFS_DEBUG
> +extern int tarfs_debug;
> +
> +#define        TARFS_DEBUG_ALLOC       0x01
> +#define        TARFS_DEBUG_CHECKSUM    0x02
> +#define        TARFS_DEBUG_FS          0x04
> +#define        TARFS_DEBUG_LOOKUP      0x08
> +#define        TARFS_DEBUG_VNODE       0x10
> +#define        TARFS_DEBUG_IO          0x20
> +#define        TARFS_DEBUG_ZIO         0x40
> +#define        TARFS_DEBUG_ZIDX        0x80
> +#define        TARFS_DEBUG_MAP         0x100
> +
> +#define        TARFS_DPF(category, fmt, ...)                                   \
> +       do {                                                            \
> +               if ((tarfs_debug & TARFS_DEBUG_##category) != 0)        \
> +                       printf(fmt, ## __VA_ARGS__);                    \
> +       } while (0)
> +#define        TARFS_DPF_IFF(category, cond, fmt, ...)                         \
> +       do {                                                            \
> +               if ((cond)                                              \
> +                   && (tarfs_debug & TARFS_DEBUG_##category) != 0)     \
> +                       printf(fmt, ## __VA_ARGS__);                    \
> +       } while (0)
> +#else
> +#define        TARFS_DPF(category, fmt, ...)
> +#define        TARFS_DPF_IFF(category, cond, fmt, ...)
> +#endif
> +
> +#endif /* _FS_TARFS_TARFS_DBG_H_ */
> diff --git a/sys/fs/tarfs/tarfs_io.c b/sys/fs/tarfs/tarfs_io.c
> new file mode 100644
> index 000000000000..b957ac11ff51
> --- /dev/null
> +++ b/sys/fs/tarfs/tarfs_io.c
> @@ -0,0 +1,727 @@
> +/*-
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright (c) 2013 Juniper Networks, Inc.
> + * Copyright (c) 2022-2023 Klara, Inc.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +
> +#include "opt_tarfs.h"
> +#include "opt_zstdio.h"
> +
> +#include <sys/param.h>
> +#include <sys/systm.h>
> +#include <sys/counter.h>
> +#include <sys/bio.h>
> +#include <sys/buf.h>
> +#include <sys/malloc.h>
> +#include <sys/mount.h>
> +#include <sys/sysctl.h>
> +#include <sys/uio.h>
> +#include <sys/vnode.h>
> +
> +#ifdef ZSTDIO
> +#define ZSTD_STATIC_LINKING_ONLY
> +#include <contrib/zstd/lib/zstd.h>
> +#endif
> +
> +#include <fs/tarfs/tarfs.h>
> +#include <fs/tarfs/tarfs_dbg.h>
> +
> +#ifdef TARFS_DEBUG
> +SYSCTL_NODE(_vfs_tarfs, OID_AUTO, zio, CTLFLAG_RD, 0,
> +    "Tar filesystem decompression layer");
> +COUNTER_U64_DEFINE_EARLY(tarfs_zio_inflated);
> +SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, inflated, CTLFLAG_RD,
> +    &tarfs_zio_inflated, "Amount of compressed data inflated.");
> +COUNTER_U64_DEFINE_EARLY(tarfs_zio_consumed);
> +SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, consumed, CTLFLAG_RD,
> +    &tarfs_zio_consumed, "Amount of compressed data consumed.");
> +COUNTER_U64_DEFINE_EARLY(tarfs_zio_bounced);
> +SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, bounced, CTLFLAG_RD,
> +    &tarfs_zio_bounced, "Amount of decompressed data bounced.");
> +
> +static int
> +tarfs_sysctl_handle_zio_reset(SYSCTL_HANDLER_ARGS)
> +{
> +       unsigned int tmp;
> +       int error;
> +
> +       tmp = 0;
> +       if ((error = SYSCTL_OUT(req, &tmp, sizeof(tmp))) != 0)
> +               return (error);
> +       if (req->newptr != NULL) {
> +               if ((error = SYSCTL_IN(req, &tmp, sizeof(tmp))) != 0)
> +                       return (error);
> +               counter_u64_zero(tarfs_zio_inflated);
> +               counter_u64_zero(tarfs_zio_consumed);
> +               counter_u64_zero(tarfs_zio_bounced);
> +       }
> +       return (0);
> +}
> +
> +SYSCTL_PROC(_vfs_tarfs_zio, OID_AUTO, reset,
> +    CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW,
> +    NULL, 0, tarfs_sysctl_handle_zio_reset, "IU",
> +    "Reset compression counters.");
> +#endif
> +
> +MALLOC_DEFINE(M_TARFSZSTATE, "tarfs zstate", "tarfs decompression state");
> +MALLOC_DEFINE(M_TARFSZBUF, "tarfs zbuf", "tarfs decompression buffers");
> +
> +#define XZ_MAGIC               (uint8_t[]){ 0xfd, 0x37, 0x7a, 0x58, 0x5a }
> +#define ZLIB_MAGIC             (uint8_t[]){ 0x1f, 0x8b, 0x08 }
> +#define ZSTD_MAGIC             (uint8_t[]){ 0x28, 0xb5, 0x2f, 0xfd }
> +
> +#ifdef ZSTDIO
> +struct tarfs_zstd {
> +       ZSTD_DStream *zds;
> +};
> +#endif
> +
> +/* XXX review use of curthread / uio_td / td_cred */
> +
> +/*
> + * Reads from the tar file according to the provided uio.  If the archive
> + * is compressed and raw is false, reads the decompressed stream;
> + * otherwise, reads directly from the original file.  Returns 0 on success
> + * and a positive errno value on failure.
> + */
> +int
> +tarfs_io_read(struct tarfs_mount *tmp, bool raw, struct uio *uiop)
> +{
> +       void *rl = NULL;
> +       off_t off = uiop->uio_offset;
> +       size_t len = uiop->uio_resid;
> +       int error;
> +
> +       if (raw || tmp->znode == NULL) {
> +               rl = vn_rangelock_rlock(tmp->vp, off, off + len);
> +               error = vn_lock(tmp->vp, LK_SHARED);
> +               if (error == 0) {
> +                       error = VOP_READ(tmp->vp, uiop,
> +                           IO_DIRECT|IO_NODELOCKED,
> +                           uiop->uio_td->td_ucred);
> +                       VOP_UNLOCK(tmp->vp);
> +               }
> +               vn_rangelock_unlock(tmp->vp, rl);
> +       } else {
> +               error = vn_lock(tmp->znode, LK_EXCLUSIVE);
> +               if (error == 0) {
> +                       error = VOP_READ(tmp->znode, uiop,
> +                           IO_DIRECT | IO_NODELOCKED,
> +                           uiop->uio_td->td_ucred);
> +                       VOP_UNLOCK(tmp->znode);
> +               }
> +       }
> +       TARFS_DPF(IO, "%s(%zu, %zu) = %d (resid %zd)\n", __func__,
> +           (size_t)off, len, error, uiop->uio_resid);
> +       return (error);
> +}
> +
> +/*
> + * Reads from the tar file into the provided buffer.  If the archive is
> + * compressed and raw is false, reads the decompressed stream; otherwise,
> + * reads directly from the original file.  Returns the number of bytes
> + * read on success, 0 on EOF, and a negative errno value on failure.
> + */
> +ssize_t
> +tarfs_io_read_buf(struct tarfs_mount *tmp, bool raw,
> +    void *buf, off_t off, size_t len)
> +{
> +       struct uio auio;
> +       struct iovec aiov;
> +       ssize_t res;
> +       int error;
> +
> +       if (len == 0) {
> +               TARFS_DPF(IO, "%s(%zu, %zu) null\n", __func__,
> +                   (size_t)off, len);
> +               return (0);
> +       }
> +       aiov.iov_base = buf;
> +       aiov.iov_len = len;
> +       auio.uio_iov = &aiov;
> +       auio.uio_iovcnt = 1;
> +       auio.uio_offset = off;
> +       auio.uio_segflg = UIO_SYSSPACE;
> +       auio.uio_rw = UIO_READ;
> +       auio.uio_resid = len;
> +       auio.uio_td = curthread;
> +       error = tarfs_io_read(tmp, raw, &auio);
> +       if (error != 0) {
> +               TARFS_DPF(IO, "%s(%zu, %zu) error %d\n", __func__,
> +                   (size_t)off, len, error);
> +               return (-error);
> +       }
> +       res = len - auio.uio_resid;
> +       if (res == 0 && len != 0) {
> +               TARFS_DPF(IO, "%s(%zu, %zu) eof\n", __func__,
> +                   (size_t)off, len);
> +       } else {
> +               TARFS_DPF(IO, "%s(%zu, %zu) read %zd | %*D\n", __func__,
> +                   (size_t)off, len, res,
> +                   (int)(res > 8 ? 8 : res), (uint8_t *)buf, " ");
> +       }
> +       return (res);
> +}
> +
> +#ifdef ZSTDIO
> +static void *
> +tarfs_zstate_alloc(void *opaque, size_t size)
> +{
> +
> +       (void)opaque;
> +       return (malloc(size, M_TARFSZSTATE, M_WAITOK));
> +}
> +#endif
> +
> +#ifdef ZSTDIO
> +static void
> +tarfs_zstate_free(void *opaque, void *address)
> +{
> +
> +       (void)opaque;
> +       free(address, M_TARFSZSTATE);
> +}
> +#endif
> +
> +#ifdef ZSTDIO
> +static ZSTD_customMem tarfs_zstd_mem = {
> +       tarfs_zstate_alloc,
> +       tarfs_zstate_free,
> +       NULL,
> +};
> +#endif
> +
> +/*
> + * Updates the decompression frame index, recording the current input and
> + * output offsets in a new index entry, and growing the index if
> + * necessary.
> + */
> +static void
> +tarfs_zio_update_index(struct tarfs_zio *zio, off_t i, off_t o)
> +{
> +
> +       if (++zio->curidx >= zio->nidx) {
> +               if (++zio->nidx > zio->szidx) {
> +                       zio->szidx *= 2;
> +                       zio->idx = realloc(zio->idx,
> +                           zio->szidx * sizeof(*zio->idx),
> +                           M_TARFSZSTATE, M_ZERO | M_WAITOK);
> +                       TARFS_DPF(ALLOC, "%s: resized zio index\n", __func__);
> +               }
> +               zio->idx[zio->curidx].i = i;
> +               zio->idx[zio->curidx].o = o;
> +               TARFS_DPF(ZIDX, "%s: index %u = i %zu o %zu\n", __func__,
> +                   zio->curidx, (size_t)zio->idx[zio->curidx].i,
> +                   (size_t)zio->idx[zio->curidx].o);
> +       }
> +       MPASS(zio->idx[zio->curidx].i == i);
> +       MPASS(zio->idx[zio->curidx].o == o);
> +}
> +
> +/*
> + * VOP_ACCESS for zio node.
> + */
> +static int
> +tarfs_zaccess(struct vop_access_args *ap)
> +{
> +       struct vnode *vp = ap->a_vp;
> +       struct tarfs_zio *zio = vp->v_data;
> +       struct tarfs_mount *tmp = zio->tmp;
> +       accmode_t accmode = ap->a_accmode;
> +       int error = EPERM;
> +
> +       if (accmode == VREAD) {
> +               error = vn_lock(tmp->vp, LK_SHARED);
> +               if (error == 0) {
> +                       error = VOP_ACCESS(tmp->vp, accmode, ap->a_cred, ap->a_td);
> +                       VOP_UNLOCK(tmp->vp);
> +               }
> +       }
> +       TARFS_DPF(ZIO, "%s(%d) = %d\n", __func__, accmode, error);
> +       return (error);
> +}
> +
> +/*
> + * VOP_GETATTR for zio node.
> + */
> +static int
> +tarfs_zgetattr(struct vop_getattr_args *ap)
> +{
> +       struct vattr va;
> +       struct vnode *vp = ap->a_vp;
> +       struct tarfs_zio *zio = vp->v_data;
> +       struct tarfs_mount *tmp = zio->tmp;
> +       struct vattr *vap = ap->a_vap;
> +       int error = 0;
> +
> +       VATTR_NULL(vap);
> +       error = vn_lock(tmp->vp, LK_SHARED);
> +       if (error == 0) {
> +               error = VOP_GETATTR(tmp->vp, &va, ap->a_cred);
> +               VOP_UNLOCK(tmp->vp);
> +               if (error == 0) {
> +                       vap->va_type = VREG;
> +                       vap->va_mode = va.va_mode;
> +                       vap->va_nlink = 1;
> +                       vap->va_gid = va.va_gid;
> +                       vap->va_uid = va.va_uid;
> +                       vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
> +                       vap->va_fileid = TARFS_ZIOINO;
> +                       vap->va_size = zio->idx[zio->nidx - 1].o;
> +                       vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
> +                       vap->va_atime = va.va_atime;
> +                       vap->va_ctime = va.va_ctime;
> +                       vap->va_mtime = va.va_mtime;
> +                       vap->va_birthtime = tmp->root->birthtime;
> +                       vap->va_bytes = va.va_bytes;
> +               }
> +       }
> +       TARFS_DPF(ZIO, "%s() = %d\n", __func__, error);
> +       return (error);
> +}
> +
> +#ifdef ZSTDIO
> +/*
> + * VOP_READ for zio node, zstd edition.
> + */
> +static int
> +tarfs_zread_zstd(struct tarfs_zio *zio, struct uio *uiop)
> +{
> +       void *ibuf = NULL, *obuf = NULL, *rl = NULL;
> +       struct uio auio;
> +       struct iovec aiov;
> +       struct tarfs_mount *tmp = zio->tmp;
> +       struct tarfs_zstd *zstd = zio->zstd;
> +       struct thread *td = curthread;
> +       ZSTD_inBuffer zib;
> +       ZSTD_outBuffer zob;
> +       off_t zsize;
> +       off_t ipos, opos;
> +       size_t ilen, olen;
> +       size_t zerror;
> +       off_t off = uiop->uio_offset;
> +       size_t len = uiop->uio_resid;
> +       size_t resid = uiop->uio_resid;
> +       size_t bsize;
> +       int error;
> +       bool reset = false;
> +
> +       /* do we have to rewind? */
> +       if (off < zio->opos) {
> +               while (zio->curidx > 0 && off < zio->idx[zio->curidx].o)
> +                       zio->curidx--;
> +               reset = true;
> +       }
> +       /* advance to the nearest index entry */
> +       if (off > zio->opos) {
> +               // XXX maybe do a binary search instead
> +               while (zio->curidx < zio->nidx - 1 &&
> +                   off >= zio->idx[zio->curidx + 1].o) {
> +                       zio->curidx++;
> +                       reset = true;
> +               }
> +       }
> +       /* reset the decompression stream if needed */
> +       if (reset) {
> +               zio->ipos = zio->idx[zio->curidx].i;
> +               zio->opos = zio->idx[zio->curidx].o;
> +               ZSTD_resetDStream(zstd->zds);
> +               TARFS_DPF(ZIDX, "%s: skipping to index %u = i %zu o %zu\n", __func__,
> +                   zio->curidx, (size_t)zio->ipos, (size_t)zio->opos);
> +       } else {
> +               TARFS_DPF(ZIDX, "%s: continuing at i %zu o %zu\n", __func__,
> +                   (size_t)zio->ipos, (size_t)zio->opos);
> +       }
> +
> +       /*
> +        * Set up a temporary buffer for compressed data.  Use the size
> +        * recommended by the zstd library; this is usually 128 kB, but
> +        * just in case, make sure it's a multiple of the page size and no
> +        * larger than MAXBSIZE.
> +        */
> +       bsize = roundup(ZSTD_CStreamOutSize(), PAGE_SIZE);
> +       if (bsize > MAXBSIZE)
> +               bsize = MAXBSIZE;
> +       ibuf = malloc(bsize, M_TEMP, M_WAITOK);
> +       zib.src = NULL;
> +       zib.size = 0;
> +       zib.pos = 0;
> +
> +       /*
> +        * Set up the decompression buffer.  If the target is not in
> +        * kernel space, we will have to set up a bounce buffer.
> +        *
> +        * TODO: to avoid using a bounce buffer, map destination pages
> +        * using vm_fault_quick_hold_pages().
> +        */
> +       MPASS(zio->opos <= off);
> +       MPASS(uiop->uio_iovcnt == 1);
> +       MPASS(uiop->uio_iov->iov_len >= len);
> +       if (uiop->uio_segflg == UIO_SYSSPACE) {
> +               zob.dst = uiop->uio_iov->iov_base;
> +       } else {
> +               TARFS_DPF(ALLOC, "%s: allocating %zu-byte bounce buffer\n",
> +                   __func__, len);
> +               zob.dst = obuf = malloc(len, M_TEMP, M_WAITOK);
> +       }
> +       zob.size = len;
> +       zob.pos = 0;
> +
> +       /* lock tarball */
> +       rl = vn_rangelock_rlock(tmp->vp, zio->ipos, OFF_MAX);
> +       error = vn_lock(tmp->vp, LK_SHARED);
> +       if (error != 0) {
> +               goto fail_unlocked;
> +       }
> +       /* check size */
> +       error = vn_getsize_locked(tmp->vp, &zsize, td->td_ucred);
> +       if (error != 0) {
> +               goto fail;
> +       }
> +       if (zio->ipos >= zsize) {
> +               /* beyond EOF */
> +               goto fail;
> +       }
> +
> +       while (resid > 0) {
> +               if (zib.pos == zib.size) {
> +                       /* request data from the underlying file */
> +                       aiov.iov_base = ibuf;
> +                       aiov.iov_len = bsize;
> +                       auio.uio_iov = &aiov;
> +                       auio.uio_iovcnt = 1;
> +                       auio.uio_offset = zio->ipos;
> +                       auio.uio_segflg = UIO_SYSSPACE;
> +                       auio.uio_rw = UIO_READ;
> +                       auio.uio_resid = aiov.iov_len;
> +                       auio.uio_td = td;
> +                       error = VOP_READ(tmp->vp, &auio,
> +                           IO_DIRECT | IO_NODELOCKED,
> +                           td->td_ucred);
> +                       if (error != 0)
> +                               goto fail;
> +                       TARFS_DPF(ZIO, "%s: req %zu+%zu got %zu+%zu\n", __func__,
> +                           (size_t)zio->ipos, bsize,
> +                           (size_t)zio->ipos, bsize - auio.uio_resid);
> +                       zib.src = ibuf;
> +                       zib.size = bsize - auio.uio_resid;
> +                       zib.pos = 0;
> +               }
> +               MPASS(zib.pos <= zib.size);
> +               if (zib.pos == zib.size) {
> +                       TARFS_DPF(ZIO, "%s: end of file after i %zu o %zu\n", __func__,
> +                           (size_t)zio->ipos, (size_t)zio->opos);
> +                       goto fail;
> +               }
> +               if (zio->opos < off) {
> +                       /* to be discarded */
> +                       zob.size = min(off - zio->opos, len);
> +                       zob.pos = 0;
> *** 3111 LINES SKIPPED ***
>
>
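
A footnote on the "XXX maybe do a binary search instead" comment in
tarfs_zread_zstd() quoted above: the forward scan looks for the last index
entry whose output offset is not beyond the requested offset. Below is a
minimal userland sketch of the same lookup done as a binary search. The
names struct frame_idx and frame_idx_find() are hypothetical and not part
of the patch, and the sketch assumes the index entries are sorted by
ascending output offset, which should hold if entries are only ever
appended as decompression moves forward.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* off_t */

/* Mirrors the shape of struct tarfs_idx in the patch (illustrative copy). */
struct frame_idx {
	off_t i;	/* input offset of the frame in the compressed file */
	off_t o;	/* output offset of the frame in the decompressed data */
};

/*
 * Hypothetical helper: return the position of the last entry whose
 * output offset is <= off.  Assumes nidx >= 1, idx[0].o <= off, and
 * entries sorted by ascending output offset.
 */
static unsigned int
frame_idx_find(const struct frame_idx *idx, unsigned int nidx, off_t off)
{
	unsigned int lo = 0, hi = nidx;	/* the answer lies in [lo, hi) */

	assert(nidx > 0 && idx[0].o <= off);
	while (hi - lo > 1) {
		unsigned int mid = lo + (hi - lo) / 2;

		if (idx[mid].o <= off)
			lo = mid;	/* entry mid starts at or before off */
		else
			hi = mid;	/* entry mid starts after off */
	}
	return (lo);
}
```

With an index of N known frames this takes O(log N) comparisons instead of
O(N). Whether that matters in practice depends on how many frames a typical
archive has, so treat it as an idea to measure rather than a recommendation.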

--000000000000931c1e05f3cfae44
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir="auto"><div>Wow, cool, thank you so much! It feels like Christmas again. &gt;:-) I see this being immediately useful for anyone building a custom system to replace the uzip+ufs combo, or other similar methods for creating read-only compressed storage containers!</div>
<div dir="auto"><br></div>
<div dir="auto">Just curious, have you done any performance testing? Something like &quot;worldstone&quot;, but with /usr/src mounted off a tar archive vs. &quot;normal&quot; UFS, would be interesting to see.</div>
<div dir="auto"><br></div>
<div dir="auto">Also, has any, even cursory, security audit been done on the tar processing routines? Of course, with the functionality being opt-in, the onus is on the user to make sure only tars obtained from trusted sources are used, and in a way that protects tar file content from modification by unprivileged users. However, it won&#39;t protect us from FreeBSD looking bad in the public eye if some high-profile institutional user of FreeBSD is breached by exploiting some vulnerability in this code a few years down the line, when it hits a RELENG branch.</div>
<div dir="auto"><br></div>
<div dir="auto">At the very least, a big, fat warning could be added to the man page to notify the user that the code is somewhat fresh and not on par quality-wise with something like UFS or ZFS, plus some tips on best practices for reducing exposure when tarfs is used (nosuid mount, proper tar file permissions, trusted sources, etc.).</div>
<div dir="auto"><br></div>
<div dir="auto">This is of course all hypothetical, but given the history of buffer/integer overflows etc. in handling user-supplied data in simple syscalls operating on structures 1-2 orders of magnitude smaller and less complex, I find it unlikely that fresh-off-the-mill tar code won&#39;t have any. Perhaps some automated fuzzing approach could be employed to see whether the kernel can be crashed by giving it a slightly corrupted but otherwise valid tar file? If Juniper sponsored the development of this feature, I suspect they are hardly the least interested party when it comes to making sure that using it won&#39;t compromise the security of their products. Pure speculation on my part, of course, but pretty reasonable at that.</div>
<div dir="auto"><br></div>
<div dir="auto">Anyhow, just my few Canadian cents on the topic, while it is fresh. Thanks again to everyone involved in making this available. I look forward to getting my hands on it as soon as I get back from FOSDEM, if not sooner.</div>
<div dir="auto"><br></div>
<div dir="auto">-Max</div>
<div dir="auto"><br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr">On Thu, Feb 2, 2023, 6:20 PM Dag-Erling Smørgrav &lt;<a href="mailto:des@freebsd.org" rel="noreferrer noreferrer noreferrer" target="_blank">des@freebsd.org</a>&gt; wrote:<br></div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The branch main has been updated by des:<br>
<br>
URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3D69d94f4c7608e4150=
5996559367450706e91fbb8" rel=3D"noreferrer noreferrer noreferrer noreferrer=
 noreferrer" target=3D"_blank">https://cgit.FreeBSD.org/src/commit/?id=3D69=
d94f4c7608e41505996559367450706e91fbb8</a><br>
<br>
commit 69d94f4c7608e41505996559367450706e91fbb8<br>
Author:=C2=A0 =C2=A0 =C2=A0Dag-Erling Sm=C3=B8rgrav &lt;des@FreeBSD.org&gt;=
<br>
AuthorDate: 2023-02-02 17:18:41 +0000<br>
Commit:=C2=A0 =C2=A0 =C2=A0Dag-Erling Sm=C3=B8rgrav &lt;des@FreeBSD.org&gt;=
<br>
CommitDate: 2023-02-02 17:19:29 +0000<br>
<br>
=C2=A0 =C2=A0 Add tarfs, a filesystem backed by tarballs.<br>
<br>
=C2=A0 =C2=A0 Sponsored by:=C2=A0 =C2=A0Juniper Networks, Inc.<br>
=C2=A0 =C2=A0 Sponsored by:=C2=A0 =C2=A0Klara, Inc.<br>
=C2=A0 =C2=A0 Reviewed by:=C2=A0 =C2=A0 pauamma, imp<br>
=C2=A0 =C2=A0 Differential Revision:=C2=A0 <a href=3D"https://reviews.freeb=
sd.org/D37753" rel=3D"noreferrer noreferrer noreferrer noreferrer noreferre=
r" target=3D"_blank">https://reviews.freebsd.org/D37753</a><br>;
---<br>
=C2=A0etc/mtree/BSD.tests.dist=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 =C2=
=A0 2 +<br>
=C2=A0share/man/man5/Makefile=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 =C2=
=A0 1 +<br>
=C2=A0share/man/man5/tarfs.5=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=
=A0 103 ++++<br>
=C2=A0sys/conf/files=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0|=C2=A0 =C2=A0 4 +<br>
=C2=A0sys/conf/options=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0|=C2=A0 =C2=A0 4 +<br>
=C2=A0sys/fs/tarfs/tarfs.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=
=C2=A0 254 +++++++++<br>
=C2=A0sys/fs/tarfs/tarfs_dbg.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 =C2=
=A065 +++<br>
=C2=A0sys/fs/tarfs/tarfs_io.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 727=
 +++++++++++++++++++++++<br>
=C2=A0sys/fs/tarfs/tarfs_subr.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 603 ++++=
++++++++++++++++<br>
=C2=A0sys/fs/tarfs/tarfs_vfsops.c=C2=A0 =C2=A0 =C2=A0 | 1173 ++++++++++++++=
++++++++++++++++++++++++<br>
=C2=A0sys/fs/tarfs/tarfs_vnops.c=C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 642 ++++=
+++++++++++++++++<br>
=C2=A0sys/kern/subr_witness.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=A0 =C2=
=A0 6 +<br>
=C2=A0sys/modules/Makefile=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|=
=C2=A0 =C2=A0 1 +<br>
=C2=A0sys/modules/tarfs/Makefile=C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 =C2=A023=
 +<br>
=C2=A0tests/sys/fs/Makefile=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2=
=A0 =C2=A0 1 +<br>
=C2=A0tests/sys/fs/tarfs/Makefile=C2=A0 =C2=A0 =C2=A0 |=C2=A0 =C2=A010 +<br=
>
=C2=A0tests/sys/fs/tarfs/mktar.c=C2=A0 =C2=A0 =C2=A0 =C2=A0|=C2=A0 238 ++++=
++++<br>
=C2=A0tests/sys/fs/tarfs/tarfs_test.sh |=C2=A0 =C2=A054 ++<br>
=C2=A018 files changed, 3911 insertions(+)<br>
<br>
diff --git a/etc/mtree/BSD.tests.dist b/etc/mtree/BSD.tests.dist<br>
index 0d05ecaf06fc..b4b18997b7f9 100644<br>
--- a/etc/mtree/BSD.tests.dist<br>
+++ b/etc/mtree/BSD.tests.dist<br>
@@ -757,6 +757,8 @@<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0fs<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0fusefs<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0..<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 tarfs<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ..<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0tmpfs<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0..<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0..<br>
diff --git a/share/man/man5/Makefile b/share/man/man5/Makefile<br>
index 2d49d981c2f9..f6e91e4ed00b 100644<br>
--- a/share/man/man5/Makefile<br>
+++ b/share/man/man5/Makefile<br>
@@ -70,6 +70,7 @@ MAN=3D=C2=A0 acct.5 \<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 style.Makefile.5 \<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 style.mdoc.5 \<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 sysctl.conf.5 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0tarfs.5 \<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 tmpfs.5 \<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 unionfs.5<br>
<br>
diff --git a/share/man/man5/tarfs.5 b/share/man/man5/tarfs.5<br>
new file mode 100644<br>
index 000000000000..b25131c323c1<br>
--- /dev/null<br>
+++ b/share/man/man5/tarfs.5<br>
@@ -0,0 +1,103 @@<br>
+.\&quot;-<br>
+.\&quot; SPDX-License-Identifier: BSD-2-Clause<br>
+.\&quot;<br>
+.\&quot; Copyright (c) 2022 Klara, Inc.<br>
+.\&quot;<br>
+.\&quot; Redistribution and use in source and binary forms, with or withou=
t<br>
+.\&quot; modification, are permitted provided that the following condition=
s<br>
+.\&quot; are met:<br>
+.\&quot; 1. Redistributions of source code must retain the above copyright=
<br>
+.\&quot;=C2=A0 =C2=A0 notice, this list of conditions and the following di=
sclaimer.<br>
+.\&quot; 2. Redistributions in binary form must reproduce the above copyri=
ght<br>
+.\&quot;=C2=A0 =C2=A0 notice, this list of conditions and the following di=
sclaimer in the<br>
+.\&quot;=C2=A0 =C2=A0 documentation and/or other materials provided with t=
he distribution.<br>
+.\&quot;<br>
+.\&quot; THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS&=
#39;&#39; AND<br>
+.\&quot; ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,=
 THE<br>
+.\&quot; IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULA=
R PURPOSE<br>
+.\&quot; ARE DISCLAIMED.=C2=A0 IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTOR=
S BE LIABLE<br>
+.\&quot; FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONS=
EQUENTIAL<br>
+.\&quot; DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE=
 GOODS<br>
+.\&quot; OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPT=
ION)<br>
+.\&quot; HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRAC=
T, STRICT<br>
+.\&quot; LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN=
 ANY WAY<br>
+.\&quot; OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILI=
TY OF<br>
+.\&quot; SUCH DAMAGE.<br>
+.\&quot;<br>
+.Dd February 2, 2023<br>
+.Dt TARFS 5<br>
+.Os<br>
+.Sh NAME<br>
+.Nm tarfs<br>
+.Nd tarball filesystem<br>
+.Sh SYNOPSIS<br>
+To compile this driver into the kernel, place the following line in<br>
+your kernel configuration file:<br>
+.Bd -ragged -offset indent<br>
+.Cd &quot;options TARFS&quot;<br>
+.Ed<br>
+.Pp<br>
+Alternatively, to load the driver as a module at boot time, place the<br>
+following line in<br>
+.Xr loader.conf 5 :<br>
+.Bd -literal -offset indent<br>
+tarfs_load=3D&quot;YES&quot;<br>
+.Ed<br>
+.Sh DESCRIPTION<br>
+The<br>
+.Nm<br>
+driver implementes a read-only filesystem backed by a<br>
+.Xr tar 5<br>
+file.<br>
+Currently, only POSIX archives, optionally compressed with<br>
+.Xr zstd 1 ,<br>
+are supported.<br>
+.Pp<br>
+The preferred I/O size for<br>
+.Nm<br>
+filesystems can be adjusted using the<br>
+.Va vfs.tarfs.ioshift<br>
+sysctl setting and tunable.<br>
+Setting it to 0 will reset it to its default value.<br>
+Note that changes to this setting only apply to filesystems mounted<br>
+after the change.<br>
+.Sh DIAGNOSTICS<br>
+If enabled by the<br>
+.Dv TARFS_DEBUG<br>
+kernel option, the<br>
+.Va vfs.tarfs.debug<br>
+sysctl setting can be used to control debugging output from the<br>
+.Nm<br>
+driver.<br>
+Debugging output for individual sections of the driver can be enabled<br>
+by adding together the relevant values from the table below.<br>
+.Bl -column Value Description<br>
+.It 0x01 Ta Memory allocations<br>
+.It 0x02 Ta Checksum calculations<br>
+.It 0x04 Ta Filesystem operations (vfsops)<br>
+.It 0x08 Ta Path lookups<br>
+.It 0x10 Ta File operations (vnops)<br>
+.It 0x20 Ta General I/O<br>
+.It 0x40 Ta Decompression<br>
+.It 0x80 Ta Decompression index<br>
+.It 0x100 Ta Sparse file mapping<br>
+.El<br>
+.Sh SEE ALSO<br>
+.Xr tar 1 ,<br>
+.Xr zstd 1 ,<br>
+.Xr fstab 5 ,<br>
+.Xr tar 5 ,<br>
+.Xr mount 8 ,<br>
+.Xr sysctl 8<br>
+.Sh HISTORY<br>
+.An -nosplit<br>
+The<br>
+.Nm<br>
+driver was developed by<br>
+.An Stephen J. Kiernan Aq Mt stevek@FreeBSD.org<br>
+and<br>
+.An Dag-Erling Sm=C3=B8rgrav Aq Mt des@FreeBSD.org<br>
+for Juniper Networks and Klara Systems.<br>
+This manual page was written by<br>
+.An Dag-Erling Sm=C3=B8rgrav Aq Mt des@FreeBSD.org<br>
+for Juniper Networks and Klara Systems.<br>
diff --git a/sys/conf/files b/sys/conf/files<br>
index 6cb4abcd9223..08966a9b46e4 100644<br>
--- a/sys/conf/files<br>
+++ b/sys/conf/files<br>
@@ -3615,6 +3615,10 @@ fs/smbfs/smbfs_smb.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0optional smbfs<br>
=C2=A0fs/smbfs/smbfs_subr.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 optional smbf=
s<br>
=C2=A0fs/smbfs/smbfs_vfsops.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 optional smbfs<br>
=C2=A0fs/smbfs/smbfs_vnops.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0optional smbf=
s<br>
+fs/tarfs/tarfs_io.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 optional tarf=
s compile-with &quot;${NORMAL_C} -I$S/contrib/zstd/lib/freebsd&quot;<br>
+fs/tarfs/tarfs_subr.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 optional tarfs<br>
+fs/tarfs/tarfs_vfsops.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 optional tarfs<br>
+fs/tarfs/tarfs_vnops.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0optional tarfs<br>
=C2=A0fs/udf/osta.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 optional udf<br>
=C2=A0fs/udf/udf_iconv.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0opt=
ional udf_iconv<br>
=C2=A0fs/udf/udf_vfsops.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 optional=
 udf<br>
diff --git a/sys/conf/options b/sys/conf/options<br>
index 1f5003507539..3b2be66ba602 100644<br>
--- a/sys/conf/options<br>
+++ b/sys/conf/options<br>
@@ -265,6 +265,7 @@ NULLFS=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
opt_dontuse.h<br>
=C2=A0PROCFS=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0opt_dontuse.h<br>
=C2=A0PSEUDOFS=C2=A0 =C2=A0 =C2=A0 =C2=A0opt_dontuse.h<br>
=C2=A0SMBFS=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 opt_dontuse.h<br>
+TARFS=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 opt_dontuse.h<br>
=C2=A0TMPFS=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 opt_dontuse.h<br>
=C2=A0UDF=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 opt_dontuse.h<br>
=C2=A0UNIONFS=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 opt_do=
ntuse.h<br>
@@ -273,6 +274,9 @@ ZFS=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0opt_dontuse.h<br>
=C2=A0# Pseudofs debugging<br>
=C2=A0PSEUDOFS_TRACE opt_pseudofs.h<br>
<br>
+# Tarfs debugging<br>
+TARFS_DEBUG=C2=A0 =C2=A0 opt_tarfs.h<br>
+<br>
=C2=A0# In-kernel GSS-API<br>
=C2=A0KGSSAPI=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 opt_kg=
ssapi.h<br>
=C2=A0KGSSAPI_DEBUG=C2=A0 opt_kgssapi.h<br>
diff --git a/sys/fs/tarfs/tarfs.h b/sys/fs/tarfs/tarfs.h<br>
new file mode 100644<br>
index 000000000000..dffd60ee6d8a<br>
--- /dev/null<br>
+++ b/sys/fs/tarfs/tarfs.h<br>
@@ -0,0 +1,254 @@<br>
+/*-<br>
+ * SPDX-License-Identifier: BSD-2-Clause<br>
+ *<br>
+ * Copyright (c) 2013 Juniper Networks, Inc.<br>
+ * Copyright (c) 2022-2023 Klara, Inc.<br>
+ *<br>
+ * Redistribution and use in source and binary forms, with or without<br>
+ * modification, are permitted provided that the following conditions<br>
+ * are met:<br>
+ * 1. Redistributions of source code must retain the above copyright<br>
+ *=C2=A0 =C2=A0 notice, this list of conditions and the following disclaim=
er.<br>
+ * 2. Redistributions in binary form must reproduce the above copyright<br=
>
+ *=C2=A0 =C2=A0 notice, this list of conditions and the following disclaim=
er in the<br>
+ *=C2=A0 =C2=A0 documentation and/or other materials provided with the dis=
tribution.<br>
+ *<br>
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS&#39;&#=
39; AND<br>
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE<b=
r>
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURP=
OSE<br>
+ * ARE DISCLAIMED.=C2=A0 IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE L=
IABLE<br>
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENT=
IAL<br>
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS=
<br>
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)<b=
r>
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STR=
ICT<br>
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY W=
AY<br>
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF<=
br>
+ * SUCH DAMAGE.<br>
+ */<br>
+<br>
+#ifndef=C2=A0 =C2=A0 =C2=A0 =C2=A0 _FS_TARFS_TARFS_H_<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 _FS_TARFS_TARFS_H_<br>
+<br>
+#ifndef _KERNEL<br>
+#error Should only be included by kernel<br>
+#endif<br>
+<br>
+MALLOC_DECLARE(M_TARFSMNT);<br>
+MALLOC_DECLARE(M_TARFSNODE);<br>
+MALLOC_DECLARE(M_TARFSNAME);<br>
+<br>
+#ifdef SYSCTL_DECL<br>
+SYSCTL_DECL(_vfs_tarfs);<br>
+#endif<br>
+<br>
+struct componentname;<br>
+struct mount;<br>
+struct vnode;<br>
+<br>
+/*<br>
+ * Internal representation of a tarfs file system node.<br>
+ */<br>
+struct tarfs_node {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TAILQ_ENTRY(tarfs_node) entries;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TAILQ_ENTRY(tarfs_node) dirents;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct mtx=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0lock;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct vnode=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 *vnode;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_mount=C2=A0 =C2=A0 =C2=A0 *tmp;<br=
>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0enum vtype=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0type;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ino_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ino;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 offset;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0size;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0physize;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0char=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 *name;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0namelen;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* Node attributes */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0uid_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 uid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0gid_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 gid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0mode_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0mode;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned int=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0flags;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0nlink_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 nlink;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct timespec=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 atime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct timespec=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 mtime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct timespec=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 ctime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct timespec=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 birthtime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned long=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 gen;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* Block map */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0nblk;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_blk=C2=A0 =C2=A0 =C2=A0 =C2=A0 *bl=
k;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_node=C2=A0 =C2=A0 =C2=A0 =C2=A0*pa=
rent;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0union {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* VDIR */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0struct {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0TAILQ_HEAD(, tarfs_node) dirhead;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0off_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 lastcookie;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0struct tarfs_node=C2=A0 =C2=A0 =C2=A0 =C2=A0*lastnode;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} dir;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* VLNK */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0struct {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0char=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 *name;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0namelen;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} link;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* VBLK or VCHR */<=
br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0dev_t=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 rdev;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* VREG */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_node=
=C2=A0 =C2=A0 =C2=A0 =C2=A0*other;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0};<br>
+};<br>
+<br>
+/*<br>
+ * Entry in sparse file block map.<br>
+ */<br>
+struct tarfs_blk {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t=C2=A0 =C2=A0 i;=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0/* input (physical) offset */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t=C2=A0 =C2=A0 o;=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0/* output (logical) offset */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0l;=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0/* length */<br>
+};<br>
+<br>
+/*<br>
+ * Decompression buffer.<br>
+ */<br>
+#define TARFS_ZBUF_SIZE 1048576<br>
+struct tarfs_zbuf {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0u_char=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
buf[TARFS_ZBUF_SIZE];<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
off; /* offset of contents */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
len; /* length of contents */<br>
+};<br>
+<br>
+/*<br>
+ * Internal representation of a tarfs mount point.<br>
+ */<br>
+struct tarfs_mount {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TAILQ_HEAD(, tarfs_node) allnodes;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct mtx=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0allnode_lock;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_node=C2=A0 =C2=A0 =C2=A0 =C2=A0*ro=
ot;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct vnode=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 *vp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct mount=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 *vfs;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ino_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ino;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct unrhdr=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0*ino_unr;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0iosize;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0nblocks;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0nfiles;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0time_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0mtime; /* default mtime for directories */<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_zio=C2=A0 =C2=A0 =C2=A0 =C2=A0 *zi=
o;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct vnode=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 *znode;<br>
+};<br>
+<br>
+struct tarfs_zio {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_mount=C2=A0 =C2=A0 =C2=A0 *tmp;<br=
>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* decompression state */<br>
+#ifdef ZSTDIO<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_zstd=C2=A0 =C2=A0 =C2=A0 =C2=A0*zs=
td; /* decompression state (zstd) */<br>
+#endif<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ipos; /* current input position */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 opos; /* current output position */<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* index of compression frames */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned int=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0curidx; /* current index position*/<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned int=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0nidx; /* number of index entries */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned int=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0szidx; /* index capacity */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_idx { off_t i, o; } *idx;<br>
+};<br>
+<br>
+struct tarfs_fid {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0u_short=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 len;=C2=A0 =C2=A0/* length of data in bytes */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0u_short=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 data0; /* force alignment */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ino_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ino;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned long=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 gen;<br>
+};<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_NODE_LOCK(tnp) \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_lock(&amp;(tnp)-&gt;lock)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_NODE_UNLOCK(tnp) \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_unlock(&amp;(tnp)-&gt;lock)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_ALLNODES_LOCK(tmp) \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_lock(&amp;(tmp)-&gt;allnode_lock)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_ALLNODES_UNLOCK(tmp) \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0mtx_unlock(&amp;(tmp)-&gt;allnode_lock)<br>
+<br>
+/*<br>
+ * Data and metadata within tar files are aligned on 512-byte boundaries,<=
br>
+ * to match the block size of the magnetic tapes they were originally<br>
+ * intended for.<br>
+ */<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_BSHIFT=C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 9<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_BLOCKSIZE=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0(size_t)(1U &lt;&lt; TARFS_BSHIFT)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_BLKOFF(l)=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0((l) % TARFS_BLOCKSIZE)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_BLKNUM(l)=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0((l) &gt;&gt; TARFS_BSHIFT)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_SZ2BLKS(sz)=C2=A0 =C2=A0 =C2=A0 =
=C2=A0(((sz) + TARFS_BLOCKSIZE - 1) / TARFS_BLOCKSIZE)<br>
+<br>
+/*<br>
+ * Our preferred I/O size.<br>
+ */<br>
+extern unsigned int tarfs_ioshift;<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_IOSHIFT_MIN=C2=A0 =C2=A0 =C2=A0 =
=C2=A0TARFS_BSHIFT<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_IOSHIFT_DEFAULT=C2=A0 =C2=A0PAGE_=
SHIFT<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_IOSHIFT_MAX=C2=A0 =C2=A0 =C2=A0 =
=C2=A0PAGE_SHIFT<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_ROOTINO=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0((ino_t)3)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_ZIOINO=C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 ((ino_t)4)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_MININO=C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 ((ino_t)65535)<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_COOKIE_DOT=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 0<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_COOKIE_DOTDOT=C2=A0 =C2=A0 =C2=A0=
1<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_COOKIE_EOF=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 OFF_MAX<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_ZIO_NAME=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 &quot;.tar&quot;<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_ZIO_NAMELEN=C2=A0 =C2=A0 =C2=A0 =
=C2=A0(sizeof(TARFS_ZIO_NAME) - 1)<br>
+<br>
+extern struct vop_vector tarfs_vnodeops;<br>
+<br>
+static inline<br>
+struct tarfs_mount *<br>
+MP_TO_TARFS_MOUNT(struct mount *mp)<br>
+{<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(mp !=3D NULL &amp;&amp; mp-&gt;mnt_data !=
=3D NULL);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (mp-&gt;mnt_data);<br>
+}<br>
+<br>
+static inline<br>
+struct tarfs_node *<br>
+VP_TO_TARFS_NODE(struct vnode *vp)<br>
+{<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(vp !=3D NULL &amp;&amp; vp-&gt;v_data !=
=3D NULL);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (vp-&gt;v_data);<br>
+}<br>
+<br>
+int=C2=A0 =C2=A0 tarfs_alloc_node(struct tarfs_mount *tmp, const char *nam=
e,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0size_t namelen, enum vtype type, =
off_t off, size_t sz,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0time_t mtime, uid_t uid, gid_t gi=
d, mode_t mode,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned int flags, const char *l=
inkname, dev_t rdev,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_node *parent, struct=
 tarfs_node **node);<br>
+int=C2=A0 =C2=A0 tarfs_load_blockmap(struct tarfs_node *tnp, size_t realsi=
ze);<br>
+void=C2=A0 =C2=A0tarfs_dump_tree(struct tarfs_node *tnp);<br>
+void=C2=A0 =C2=A0tarfs_free_node(struct tarfs_node *tnp);<br>
+struct tarfs_node *<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0tarfs_lookup_dir(struct tarfs_node *tnp, off_t =
cookie);<br>
+struct tarfs_node *<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0tarfs_lookup_node(struct tarfs_node *tnp, struc=
t tarfs_node *f,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0struct componentname *cnp);<br>
+void=C2=A0 =C2=A0tarfs_print_node(struct tarfs_node *tnp);<br>
+int=C2=A0 =C2=A0 tarfs_read_file(struct tarfs_node *tnp, size_t len, struc=
t uio *uiop);<br>
+<br>
+int=C2=A0 =C2=A0 tarfs_io_init(struct tarfs_mount *tmp);<br>
+int=C2=A0 =C2=A0 tarfs_io_fini(struct tarfs_mount *tmp);<br>
+int=C2=A0 =C2=A0 tarfs_io_read(struct tarfs_mount *tmp, bool raw,<br>
+=C2=A0 =C2=A0 struct uio *uiop);<br>
+ssize_t=C2=A0 =C2=A0 =C2=A0 =C2=A0 tarfs_io_read_buf(struct tarfs_mount *t=
mp, bool raw,<br>
+=C2=A0 =C2=A0 void *buf, off_t off, size_t len);<br>
+unsigned int<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0tarfs_strtofflags(const char *str, char **end);=
<br>
+<br>
+#endif /* _FS_TARFS_TARFS_H_ */<br>
diff --git a/sys/fs/tarfs/tarfs_dbg.h b/sys/fs/tarfs/tarfs_dbg.h<br>
new file mode 100644<br>
index 000000000000..45d11d679719<br>
--- /dev/null<br>
+++ b/sys/fs/tarfs/tarfs_dbg.h<br>
@@ -0,0 +1,65 @@<br>
+/*-<br>
+ * SPDX-License-Identifier: BSD-2-Clause<br>
+ *<br>
+ * Copyright (c) 2013 Juniper Networks, Inc.<br>
+ * Copyright (c) 2022 Klara, Inc.<br>
+ *<br>
+ * Redistribution and use in source and binary forms, with or without<br>
+ * modification, are permitted provided that the following conditions<br>
+ * are met:<br>
+ * 1. Redistributions of source code must retain the above copyright<br>
+ *=C2=A0 =C2=A0 notice, this list of conditions and the following disclaim=
er.<br>
+ * 2. Redistributions in binary form must reproduce the above copyright<br=
>
+ *=C2=A0 =C2=A0 notice, this list of conditions and the following disclaim=
er in the<br>
+ *=C2=A0 =C2=A0 documentation and/or other materials provided with the dis=
tribution.<br>
+ *<br>
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS&#39;&#=
39; AND<br>
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE<b=
r>
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURP=
OSE<br>
+ * ARE DISCLAIMED.=C2=A0 IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE L=
IABLE<br>
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENT=
IAL<br>
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS=
<br>
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)<b=
r>
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STR=
ICT<br>
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY W=
AY<br>
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF<=
br>
+ * SUCH DAMAGE.<br>
+ */<br>
+<br>
+#ifndef=C2=A0 =C2=A0 =C2=A0 =C2=A0 _FS_TARFS_TARFS_DBG_H_<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 _FS_TARFS_TARFS_DBG_H_<br>
+<br>
+#ifndef _KERNEL<br>
+#error Should only be included by kernel<br>
+#endif<br>
+<br>
+#ifdef TARFS_DEBUG<br>
+extern int tarfs_debug;<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_ALLOC=C2=A0 =C2=A0 =C2=A0 =
=C2=A00x01<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_CHECKSUM=C2=A0 =C2=A0 0x02<=
br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_FS=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 0x04<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_LOOKUP=C2=A0 =C2=A0 =C2=A0 =
0x08<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_VNODE=C2=A0 =C2=A0 =C2=A0 =
=C2=A00x10<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_IO=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 0x20<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_ZIO=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A00x40<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_ZIDX=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 0x80<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DEBUG_MAP=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A00x100<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF(category, fmt, ...)=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0\<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0do {=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((tarfs_debug &a=
mp; TARFS_DEBUG_##category) !=3D 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0printf(fmt, ## __VA_ARGS__);=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} while (0)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF_IFF(category, cond, fmt, ...)=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0\<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0do {=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((cond)=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&amp;=
&amp; (tarfs_debug &amp; TARFS_DEBUG_##category) !=3D 0)=C2=A0 =C2=A0 =C2=
=A0\<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0printf(fmt, ## __VA_ARGS__);=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} while (0)<br>
+#else<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF(category, fmt, ...)<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF_IFF(category, cond, fmt, ...)=
<br>
+#endif<br>
+<br>
+#endif /* _FS_TARFS_TARFS_DBG_H_ */<br>
diff --git a/sys/fs/tarfs/tarfs_io.c b/sys/fs/tarfs/tarfs_io.c<br>
new file mode 100644<br>
index 000000000000..b957ac11ff51<br>
--- /dev/null<br>
+++ b/sys/fs/tarfs/tarfs_io.c<br>
@@ -0,0 +1,727 @@<br>
+/*-<br>
+ * SPDX-License-Identifier: BSD-2-Clause<br>
+ *<br>
+ * Copyright (c) 2013 Juniper Networks, Inc.<br>
+ * Copyright (c) 2022-2023 Klara, Inc.<br>
+ *<br>
+ * Redistribution and use in source and binary forms, with or without<br>
+ * modification, are permitted provided that the following conditions<br>
+ * are met:<br>
+ * 1. Redistributions of source code must retain the above copyright<br>
+ *=C2=A0 =C2=A0 notice, this list of conditions and the following disclaim=
er.<br>
+ * 2. Redistributions in binary form must reproduce the above copyright<br=
>
+ *=C2=A0 =C2=A0 notice, this list of conditions and the following disclaim=
er in the<br>
+ *=C2=A0 =C2=A0 documentation and/or other materials provided with the dis=
tribution.<br>
+ *<br>
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS&#39;&#=
39; AND<br>
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE<b=
r>
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURP=
OSE<br>
+ * ARE DISCLAIMED.=C2=A0 IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE L=
IABLE<br>
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENT=
IAL<br>
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS=
<br>
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)<b=
r>
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STR=
ICT<br>
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY W=
AY<br>
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF<=
br>
+ * SUCH DAMAGE.<br>
+ */<br>
+<br>
+#include &quot;opt_tarfs.h&quot;<br>
+#include &quot;opt_zstdio.h&quot;<br>
+<br>
+#include &lt;sys/param.h&gt;<br>
+#include &lt;sys/systm.h&gt;<br>
+#include &lt;sys/counter.h&gt;<br>
+#include &lt;sys/bio.h&gt;<br>
+#include &lt;sys/buf.h&gt;<br>
+#include &lt;sys/malloc.h&gt;<br>
+#include &lt;sys/mount.h&gt;<br>
+#include &lt;sys/sysctl.h&gt;<br>
+#include &lt;sys/uio.h&gt;<br>
+#include &lt;sys/vnode.h&gt;<br>
+<br>
+#ifdef ZSTDIO<br>
+#define ZSTD_STATIC_LINKING_ONLY<br>
+#include &lt;contrib/zstd/lib/zstd.h&gt;<br>
+#endif<br>
+<br>
+#include &lt;fs/tarfs/tarfs.h&gt;<br>
+#include &lt;fs/tarfs/tarfs_dbg.h&gt;<br>
+<br>
+#ifdef TARFS_DEBUG<br>
+SYSCTL_NODE(_vfs_tarfs, OID_AUTO, zio, CTLFLAG_RD, 0,<br>
+=C2=A0 =C2=A0 &quot;Tar filesystem decompression layer&quot;);<br>
+COUNTER_U64_DEFINE_EARLY(tarfs_zio_inflated);<br>
+SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, inflated, CTLFLAG_RD,<br>
+=C2=A0 =C2=A0 &amp;tarfs_zio_inflated, &quot;Amount of compressed data inf=
lated.&quot;);<br>
+COUNTER_U64_DEFINE_EARLY(tarfs_zio_consumed);<br>
+SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, consumed, CTLFLAG_RD,<br>
+=C2=A0 =C2=A0 &amp;tarfs_zio_consumed, &quot;Amount of compressed data con=
sumed.&quot;);<br>
+COUNTER_U64_DEFINE_EARLY(tarfs_zio_bounced);<br>
+SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, bounced, CTLFLAG_RD,<br>
+=C2=A0 =C2=A0 &amp;tarfs_zio_bounced, &quot;Amount of decompressed data bo=
unced.&quot;);<br>
+<br>
+static int<br>
+tarfs_sysctl_handle_zio_reset(SYSCTL_HANDLER_ARGS)<br>
+{<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0unsigned int tmp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0int error;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0tmp =3D 0;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if ((error =3D SYSCTL_OUT(req, &amp;tmp, sizeof=
(tmp))) !=3D 0)<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (error);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (req-&gt;newptr !=3D NULL) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((error =3D SYSC=
TL_IN(req, &amp;tmp, sizeof(tmp))) !=3D 0)<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0return (error);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0counter_u64_zero(ta=
rfs_zio_inflated);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0counter_u64_zero(ta=
rfs_zio_consumed);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0counter_u64_zero(ta=
rfs_zio_bounced);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (0);<br>
+}<br>
+<br>
+SYSCTL_PROC(_vfs_tarfs_zio, OID_AUTO, reset,<br>
+=C2=A0 =C2=A0 CTLTYPE_UINT | CTLFLAG_MPSAFE | CTLFLAG_RW,<br>
+=C2=A0 =C2=A0 NULL, 0, tarfs_sysctl_handle_zio_reset, &quot;IU&quot;,<br>
+=C2=A0 =C2=A0 &quot;Reset compression counters.&quot;);<br>
+#endif<br>
+<br>
+MALLOC_DEFINE(M_TARFSZSTATE, &quot;tarfs zstate&quot;, &quot;tarfs decompr=
ession state&quot;);<br>
+MALLOC_DEFINE(M_TARFSZBUF, &quot;tarfs zbuf&quot;, &quot;tarfs decompressi=
on buffers&quot;);<br>
+<br>
+#define XZ_MAGIC=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(ui=
nt8_t[]){ 0xfd, 0x37, 0x7a, 0x58, 0x5a }<br>
+#define ZLIB_MAGIC=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(uint8_t=
[]){ 0x1f, 0x8b, 0x08 }<br>
+#define ZSTD_MAGIC=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(uint8_t=
[]){ 0x28, 0xb5, 0x2f, 0xfd }<br>
+<br>
+#ifdef ZSTDIO<br>
+struct tarfs_zstd {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ZSTD_DStream *zds;<br>
+};<br>
+#endif<br>
+<br>
+/* XXX review use of curthread / uio_td / td_cred */<br>
+<br>
+/*<br>
+ * Reads from the tar file according to the provided uio.=C2=A0 If the arc=
hive<br>
+ * is compressed and raw is false, reads the decompressed stream;<br>
+ * otherwise, reads directly from the original file.=C2=A0 Returns 0 on su=
ccess<br>
+ * and a positive errno value on failure.<br>
+ */<br>
+int<br>
+tarfs_io_read(struct tarfs_mount *tmp, bool raw, struct uio *uiop)<br>
+{<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0void *rl =3D NULL;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t off =3D uiop-&gt;uio_offset;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t len =3D uiop-&gt;uio_resid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0int error;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (raw || tmp-&gt;znode =3D=3D NULL) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rl =3D vn_rangelock=
_rlock(tmp-&gt;vp, off, off + len);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D vn_lock(t=
mp-&gt;vp, LK_SHARED);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (error =3D=3D 0)=
 {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0error =3D VOP_READ(tmp-&gt;vp, uiop,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0IO_DIRECT | IO_NODELOCKED,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0uiop-&gt;uio_td-&gt;td_ucred);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0VOP_UNLOCK(tmp-&gt;vp);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vn_rangelock_unlock=
(tmp-&gt;vp, rl);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} else {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D vn_lock(t=
mp-&gt;znode, LK_EXCLUSIVE);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (error =3D=3D 0)=
 {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0error =3D VOP_READ(tmp-&gt;znode, uiop,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0IO_DIRECT | IO_NODELOCKED,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0uiop-&gt;uio_td-&gt;td_ucred);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0VOP_UNLOCK(tmp-&gt;znode);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(IO, &quot;%s(%zu, %zu) =3D %d (resid =
%zd)\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(size_t)off, len, error, uiop-&gt=
;uio_resid);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (error);<br>
+}<br>
+<br>
+/*<br>
+ * Reads from the tar file into the provided buffer.=C2=A0 If the archive =
is<br>
+ * compressed and raw is false, reads the decompressed stream; otherwise,<=
br>
+ * reads directly from the original file.=C2=A0 Returns the number of byte=
s<br>
+ * read on success, 0 on EOF, and a negative errno value on failure.<br>
+ */<br>
+ssize_t<br>
+tarfs_io_read_buf(struct tarfs_mount *tmp, bool raw,<br>
+=C2=A0 =C2=A0 void *buf, off_t off, size_t len)<br>
+{<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct uio auio;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct iovec aiov;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ssize_t res;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0int error;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (len =3D=3D 0) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(IO, &quot=
;%s(%zu, %zu) null\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(size=
_t)off, len);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (0);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0aiov.iov_base =3D buf;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0aiov.iov_len =3D len;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0auio.uio_iov =3D &amp;aiov;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0auio.uio_iovcnt =3D 1;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0auio.uio_offset =3D off;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0auio.uio_segflg =3D UIO_SYSSPACE;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0auio.uio_rw =3D UIO_READ;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0auio.uio_resid =3D len;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0auio.uio_td =3D curthread;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D tarfs_io_read(tmp, raw, &amp;auio);<b=
r>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (error !=3D 0) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(IO, &quot=
;%s(%zu, %zu) error %d\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(size=
_t)off, len, error);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (-error);<br=
>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0res =3D len - auio.uio_resid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (res =3D=3D 0 &amp;&amp; len !=3D 0) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(IO, &quot=
;%s(%zu, %zu) eof\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(size=
_t)off, len);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} else {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(IO, &quot=
;%s(%zu, %zu) read %zd | %*D\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(size=
_t)off, len, res,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(int)=
(res &gt; 8 ? 8 : res), (uint8_t *)buf, &quot; &quot;);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (res);<br>
+}<br>
+<br>
+#ifdef ZSTDIO<br>
+static void *<br>
+tarfs_zstate_alloc(void *opaque, size_t size)<br>
+{<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0(void)opaque;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (malloc(size, M_TARFSZSTATE, M_WAITOK));=
<br>
+}<br>
+#endif<br>
+<br>
+#ifdef ZSTDIO<br>
+static void<br>
+tarfs_zstate_free(void *opaque, void *address)<br>
+{<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0(void)opaque;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0free(address, M_TARFSZSTATE);<br>
+}<br>
+#endif<br>
+<br>
+#ifdef ZSTDIO<br>
+static ZSTD_customMem tarfs_zstd_mem =3D {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0tarfs_zstate_alloc,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0tarfs_zstate_free,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0NULL,<br>
+};<br>
+#endif<br>
+<br>
+/*<br>
+ * Updates the decompression frame index, recording the current input and<=
br>
+ * output offsets in a new index entry, and growing the index if<br>
+ * necessary.<br>
+ */<br>
+static void<br>
+tarfs_zio_update_index(struct tarfs_zio *zio, off_t i, off_t o)<br>
+{<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (++zio-&gt;curidx &gt;=3D zio-&gt;nidx) {<br=
>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (++zio-&gt;nidx =
&gt; zio-&gt;szidx) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zio-&gt;szidx *=3D 2;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zio-&gt;idx =3D realloc(zio-&gt;idx,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0zio-&gt;szidx * sizeof(*zio-&gt;idx),<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0M_TARFSZSTATE, M_ZERO | M_WAITOK);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0TARFS_DPF(ALLOC, &quot;%s: resized zio index\n&quot;, __func__);<=
br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zio-&gt;idx[zio-&gt=
;curidx].i =3D i;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zio-&gt;idx[zio-&gt=
;curidx].o =3D o;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(ZIDX, &qu=
ot;%s: index %u =3D i %zu o %zu\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zio-&=
gt;curidx, (size_t)zio-&gt;idx[zio-&gt;curidx].i,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(size=
_t)zio-&gt;idx[zio-&gt;curidx].o);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(zio-&gt;idx[zio-&gt;curidx].i =3D=3D i);<=
br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(zio-&gt;idx[zio-&gt;curidx].o =3D=3D o);<=
br>
+}<br>
+<br>
+/*<br>
+ * VOP_ACCESS for zio node.<br>
+ */<br>
+static int<br>
+tarfs_zaccess(struct vop_access_args *ap)<br>
+{<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct vnode *vp =3D ap-&gt;a_vp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_zio *zio =3D vp-&gt;v_data;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_mount *tmp =3D zio-&gt;tmp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0accmode_t accmode =3D ap-&gt;a_accmode;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0int error =3D EPERM;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (accmode =3D=3D VREAD) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D vn_lock(t=
mp-&gt;vp, LK_SHARED);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (error =3D=3D 0)=
 {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0error =3D VOP_ACCESS(tmp-&gt;vp, accmode, ap-&gt;a_cred, ap-&gt;a=
_td);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0VOP_UNLOCK(tmp-&gt;vp);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(ZIO, &quot;%s(%d) =3D %d\n&quot;, __f=
unc__, accmode, error);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (error);<br>
+}<br>
+<br>
+/*<br>
+ * VOP_GETATTR for zio node.<br>
+ */<br>
+static int<br>
+tarfs_zgetattr(struct vop_getattr_args *ap)<br>
+{<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct vattr va;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct vnode *vp =3D ap-&gt;a_vp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_zio *zio =3D vp-&gt;v_data;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_mount *tmp =3D zio-&gt;tmp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct vattr *vap =3D ap-&gt;a_vap;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0int error =3D 0;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0VATTR_NULL(vap);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D vn_lock(tmp-&gt;vp, LK_SHARED);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (error =3D=3D 0) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D VOP_GETAT=
TR(tmp-&gt;vp, &amp;va, ap-&gt;a_cred);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0VOP_UNLOCK(tmp-&gt;=
vp);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (error =3D=3D 0)=
 {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_type =3D VREG;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_mode =3D va.va_mode;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_nlink =3D 1;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_gid =3D va.va_gid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_uid =3D va.va_uid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_fsid =3D vp-&gt;v_mount-&gt;mnt_stat.f_fsid.val[0];<br=
>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_fileid =3D TARFS_ZIOINO;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_size =3D zio-&gt;idx[zio-&gt;nidx - 1].o;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_blocksize =3D vp-&gt;v_mount-&gt;mnt_stat.f_iosize;<br=
>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_atime =3D va.va_atime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_ctime =3D va.va_ctime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_mtime =3D va.va_mtime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_birthtime =3D tmp-&gt;root-&gt;birthtime;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0vap-&gt;va_bytes =3D va.va_bytes;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(ZIO, &quot;%s() =3D %d\n&quot;, __fun=
c__, error);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0return (error);<br>
+}<br>
+<br>
+#ifdef ZSTDIO<br>
+/*<br>
+ * VOP_READ for zio node, zstd edition.<br>
+ */<br>
+static int<br>
+tarfs_zread_zstd(struct tarfs_zio *zio, struct uio *uiop)<br>
+{<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0void *ibuf =3D NULL, *obuf =3D NULL, *rl =3D NU=
LL;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct uio auio;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct iovec aiov;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_mount *tmp =3D zio-&gt;tmp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct tarfs_zstd *zstd =3D zio-&gt;zstd;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0struct thread *td =3D curthread;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ZSTD_inBuffer zib;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ZSTD_outBuffer zob;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t zsize;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t ipos, opos;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t ilen, olen;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t zerror;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0off_t off =3D uiop-&gt;uio_offset;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t len =3D uiop-&gt;uio_resid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t resid =3D uiop-&gt;uio_resid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t bsize;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0int error;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0bool reset =3D false;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* do we have to rewind? */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (off &lt; zio-&gt;opos) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0while (zio-&gt;curi=
dx &gt; 0 &amp;&amp; off &lt; zio-&gt;idx[zio-&gt;curidx].o)<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zio-&gt;curidx--;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0reset =3D true;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* advance to the nearest index entry */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (off &gt; zio-&gt;opos) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* XXX maybe do a b=
inary search instead */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0while (zio-&gt;curi=
dx &lt; zio-&gt;nidx - 1 &amp;&amp;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0off &=
gt;=3D zio-&gt;idx[zio-&gt;curidx + 1].o) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zio-&gt;curidx++;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0reset =3D true;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* reset the decompression stream if needed */<=
br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (reset) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zio-&gt;ipos =3D zi=
o-&gt;idx[zio-&gt;curidx].i;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zio-&gt;opos =3D zi=
o-&gt;idx[zio-&gt;curidx].o;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0ZSTD_resetDStream(z=
std-&gt;zds);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(ZIDX, &qu=
ot;%s: skipping to index %u =3D i %zu o %zu\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zio-&=
gt;curidx, (size_t)zio-&gt;ipos, (size_t)zio-&gt;opos);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} else {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(ZIDX, &qu=
ot;%s: continuing at i %zu o %zu\n&quot;, __func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(size=
_t)zio-&gt;ipos, (size_t)zio-&gt;opos);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/*<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * Set up a temporary buffer for compressed dat=
a.=C2=A0 Use the size<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * recommended by the zstd library; this is usu=
ally 128 kB, but<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * just in case, make sure it&#39;s a multiple =
of the page size and no<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * larger than MAXBSIZE.<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0bsize =3D roundup(ZSTD_CStreamOutSize(), PAGE_S=
IZE);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (bsize &gt; MAXBSIZE)<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0bsize =3D MAXBSIZE;=
<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ibuf =3D malloc(bsize, M_TEMP, M_WAITOK);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0zib.src =3D NULL;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0zib.size =3D 0;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0zib.pos =3D 0;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/*<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * Set up the decompression buffer.=C2=A0 If th=
e target is not in<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * kernel space, we will have to set up a bounc=
e buffer.<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 *<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * TODO: to avoid using a bounce buffer, map de=
stination pages<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 * using vm_fault_quick_hold_pages().<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(zio-&gt;opos &lt;=3D off);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(uiop-&gt;uio_iovcnt =3D=3D 1);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(uiop-&gt;uio_iov-&gt;iov_len &gt;=3D len)=
;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (uiop-&gt;uio_segflg =3D=3D UIO_SYSSPACE) {<=
br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zob.dst =3D uiop-&g=
t;uio_iov-&gt;iov_base;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} else {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0TARFS_DPF(ALLOC, &q=
uot;%s: allocating %zu-byte bounce buffer\n&quot;,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__fun=
c__, len);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0zob.dst =3D obuf =
=3D malloc(len, M_TEMP, M_WAITOK);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0zob.size =3D len;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0zob.pos =3D 0;<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* lock tarball */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0rl =3D vn_rangelock_rlock(tmp-&gt;vp, zio-&gt;i=
pos, OFF_MAX);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D vn_lock(tmp-&gt;vp, LK_SHARED);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (error !=3D 0) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto fail_unlocked;=
<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0/* check size */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0error =3D vn_getsize_locked(tmp-&gt;vp, &amp;zs=
ize, td-&gt;td_ucred);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (error !=3D 0) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto fail;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (zio-&gt;ipos &gt;=3D zsize) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* beyond EOF */<br=
>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto fail;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0while (resid &gt; 0) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (zib.pos =3D=3D =
zib.size) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0/* request data from the underlying file */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0aiov.iov_base =3D ibuf;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0aiov.iov_len =3D bsize;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0auio.uio_iov =3D &amp;aiov;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0auio.uio_iovcnt =3D 1;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0auio.uio_offset =3D zio-&gt;ipos;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0auio.uio_segflg =3D UIO_SYSSPACE;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0auio.uio_rw =3D UIO_READ;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0auio.uio_resid =3D aiov.iov_len;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0auio.uio_td =3D td;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0error =3D VOP_READ(tmp-&gt;vp, &amp;auio,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0IO_DIRECT | IO_NODELOCKED,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0td-&gt;td_ucred);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0if (error !=3D 0)<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto fail;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0TARFS_DPF(ZIO, &quot;%s: req %zu+%zu got %zu+%zu\n&quot;, __func_=
_,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0(size_t)zio-&gt;ipos, bsize,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0(size_t)zio-&gt;ipos, bsize - auio.uio_resid);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zib.src =3D ibuf;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zib.size =3D bsize - auio.uio_resid;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zib.pos =3D 0;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0MPASS(zib.pos &lt;=
=3D zib.size);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (zib.pos =3D=3D =
zib.size) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0TARFS_DPF(ZIO, &quot;%s: end of file after i %zu o %zu\n&quot;, _=
_func__,<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0(size_t)zio-&gt;ipos, (size_t)zio-&gt;opos);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0goto fail;<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (zio-&gt;opos &l=
t; off) {<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0/* to be discarded */<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zob.size =3D min(off - zio-&gt;opos, len);<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0zob.pos =3D 0;<br>
*** 3111 LINES SKIPPED ***<br>
<br>
</blockquote></div></div></div>

--000000000000931c1e05f3cfae44--
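
The quoted tarfs_zread_zstd() hunk walks zio->idx linearly to find the
nearest index entry and carries the note "XXX maybe do a binary search
instead". A minimal sketch of what such a lookup could look like follows.
It is not part of the commit. The struct idx_entry type and the idx_find()
name are hypothetical stand-ins for the real index type, and the sketch
assumes the entries are sorted by ascending uncompressed offset (.o) and
that there is at least one entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one entry of zio->idx: .i is the offset in
 * the compressed tarball, .o the matching offset in the uncompressed
 * stream.  The field names follow the quoted diff; the type is assumed.
 */
struct idx_entry {
	uint64_t i;
	uint64_t o;
};

/*
 * Return the largest k such that idx[k].o <= off, or 0 if off precedes
 * every entry.  This is the entry the two linear loops in the quoted
 * code converge on.  Assumes nidx > 0 and idx sorted by ascending .o.
 */
static unsigned int
idx_find(const struct idx_entry *idx, unsigned int nidx, uint64_t off)
{
	unsigned int lo, hi, mid;

	assert(nidx > 0);
	lo = 0;
	hi = nidx - 1;
	while (lo < hi) {
		/* Take the upper middle so that lo = mid always advances. */
		mid = lo + (hi - lo + 1) / 2;
		if (idx[mid].o <= off)
			lo = mid;
		else
			hi = mid - 1;
	}
	return (lo);
}
```

The sketch covers only the lookup. A caller would still have to decide
whether the decompression stream needs a reset, for instance when the
returned index differs from zio->curidx or when off lies before zio->opos,
as the quoted code does.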


