Date:      Fri, 5 Apr 2024 08:13:18 -0600
From:      alan somers <asomers@gmail.com>
To:        Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc:        Alan Somers <asomers@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: SEEK_HOLE at EOF
Message-ID:  <CAOtMX2g2VxffUn0jGmc=BtcTP753-ake8nZgqCWXYUKN7JfqrA@mail.gmail.com>
In-Reply-To: <202404051354.435Ds1KX086243@critter.freebsd.dk>
References:  <CAOtMX2gaHkH7gRT1OWTNpZEcr13%2BiozicmUDZ1hEapT6oiXiuQ@mail.gmail.com> <202404050543.4355hDcS009860@critter.freebsd.dk> <CAOtMX2hfxQNrk1iPtq6snYnt0EzK_ffXm5b1TnkTLCYKgW6j3A@mail.gmail.com> <202404051354.435Ds1KX086243@critter.freebsd.dk>

On Fri, Apr 5, 2024 at 7:54 AM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>
> --------
> Alan Somers writes:
> > On Thu, Apr 4, 2024 at 11:43 PM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>
> > > Just two minor quibbles:
> > >
> > > If the file position is EOF, then you /are/ "beyond the end of the file"
> > > because a read(2) would not be able to return any data.
> >
> > Do you distinguish between "at EOF" and "beyond EOF"?  And does it not
> > trouble you that calling SEEK_HOLE from the beginning of the "virtual
> > hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
> > beginning of any real hole will return the current offset?
>
> EOF is where the file ends and there's no "hole" there, because there
> is no more file on the other side of that "hole".
>
> When you stand on a cliff, the ocean is not "a hole in the landscape",
> it's where the landscape ends.

Except there is a hole at EOF, a virtual hole.  The draft spec
specifically says "all seekable files shall have a virtual hole
starting at the current size of the file".
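
To see which interpretation a given kernel follows, one can probe it
directly.  This is only a minimal sketch in Python: the helper names
seek_hole and probe_eof are made up for illustration, and it assumes a
platform where os.SEEK_HOLE is defined.

```python
import errno
import os


def seek_hole(fd, offset):
    """Return the offset of the next hole at or after `offset`,
    or None if lseek(2) fails with ENXIO."""
    try:
        return os.lseek(fd, offset, os.SEEK_HOLE)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return None
        raise


def probe_eof(path):
    """Return (size, SEEK_HOLE result from offset 0, SEEK_HOLE result at EOF)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        return size, seek_hole(fd, 0), seek_hole(fd, size)
    finally:
        os.close(fd)
```

The third element of the tuple is the interesting one: it is None where
the kernel answers ENXIO at EOF, and the file size where it reports the
virtual hole there.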

>
> > > And returning ENXIO is more informative than returning the size of the
> > > file, since it atomically tells you that there are no more holes.
> >
> > Ahh, that's a good point.  It's the first point I've heard in favor of
> > this option.  Are you aware of any applications that need to know
> > that?
>
> No, but that should not get in the way of good syscall architecture :-)
>
> It might be useful for archivers which try to be smart about sparse files.

I imagine that most archivers would work like this:
ofs = 0
loop {
    let start = lseek(fd, ofs, SEEK_DATA);
    if ENXIO {
        // No more data regions
        break
    }
    // Search from start, not ofs: from inside a hole, SEEK_HOLE
    // would just return the same offset.
    let end = lseek(fd, start, SEEK_HOLE);
    // Thanks to the virtual hole, we should never have ENXIO here
    assert!(!ENXIO)
    copy(fd, start, end - start, ...)
    ofs = end
}
truncate(output_file, fd.fsize)

Since archivers really only care about data regions, not holes, I
don't think that they would usually call SEEK_HOLE at EOF.
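
For anyone who wants to run that loop, here is a minimal sketch of it in
Python.  It is an illustration, not any real archiver's code: the names
data_extents and copy_sparse and the chunk size are made up, and it
assumes a platform that defines os.SEEK_DATA and os.SEEK_HOLE.  The hole
search starts from the data offset that SEEK_DATA returned, since
searching from an offset inside a hole would return that same offset.

```python
import errno
import os


def data_extents(fd):
    """Return a list of (start, end) byte ranges that contain data."""
    extents = []
    ofs = 0
    while True:
        try:
            start = os.lseek(fd, ofs, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                break  # no more data regions
            raise
        # start lies inside the file, so this call finds either a real
        # hole or the virtual hole at EOF (barring concurrent truncation).
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        ofs = end
    return extents


def copy_sparse(src_fd, dst_fd, chunk=1 << 16):
    """Copy only the data regions, then extend dst to the source size."""
    size = os.fstat(src_fd).st_size
    for start, end in data_extents(src_fd):
        pos = start
        while pos < end:
            buf = os.pread(src_fd, min(chunk, end - pos), pos)
            if not buf:
                break  # the file shrank while we were copying
            written = 0
            while written < len(buf):
                written += os.pwrite(dst_fd, buf[written:], pos + written)
            pos += len(buf)
    # Recreate any trailing hole by setting the final length.
    os.ftruncate(dst_fd, size)
```

Note that SEEK_HOLE is only ever called from an offset that SEEK_DATA
just returned, so it is never called at EOF.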

>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.


