Date:      Fri, 5 Apr 2024 07:23:20 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        alan somers <asomers@gmail.com>
Cc:        Poul-Henning Kamp <phk@phk.freebsd.dk>, Alan Somers <asomers@freebsd.org>,  FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: SEEK_HOLE at EOF
Message-ID:  <CAM5tNy5PJ=zy5CGHHe0Zzefy16kB_8Fhj=wosCWZSApZDxGC%2BA@mail.gmail.com>
In-Reply-To: <CAOtMX2g2VxffUn0jGmc=BtcTP753-ake8nZgqCWXYUKN7JfqrA@mail.gmail.com>
References:  <CAOtMX2gaHkH7gRT1OWTNpZEcr13%2BiozicmUDZ1hEapT6oiXiuQ@mail.gmail.com> <202404050543.4355hDcS009860@critter.freebsd.dk> <CAOtMX2hfxQNrk1iPtq6snYnt0EzK_ffXm5b1TnkTLCYKgW6j3A@mail.gmail.com> <202404051354.435Ds1KX086243@critter.freebsd.dk> <CAOtMX2g2VxffUn0jGmc=BtcTP753-ake8nZgqCWXYUKN7JfqrA@mail.gmail.com>

On Fri, Apr 5, 2024 at 7:13 AM alan somers <asomers@gmail.com> wrote:
>
> On Fri, Apr 5, 2024 at 7:54 AM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
> >
> > --------
> > Alan Somers writes:
> > > On Thu, Apr 4, 2024 at 11:43 PM Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
> >
> > > > Just two minor quibbles:
> > > >
> > > > If the file position is EOF, then you /are/ "beyond the end of the file"
> > > > because a read(2) would not be able to return any data.
> > >
> > > Do you distinguish between "at EOF" and "beyond EOF"?
As a bit of an aside, NFSv4.2 does differentiate between "at EOF"
and "beyond EOF" for its Seek operation.
The fun part is that Linux did not implement what is in the RFC and shipped
to many users before the "bug" was noticed (and afaik it still does not
conform to the RFC). As such, there are now two ways to do it: the RFC way or
the Linux way. Selecting between them is what the sysctl
vfs.nfsd.linux42server does.

> > >  And does it not
> > > trouble you that calling SEEK_HOLE from the beginning of the "virtual
> > > hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
> > > beginning of any real hole will return the current offset?
> >
> > EOF is where the file ends and there's no "hole" there, because there's
> > no more file on the other side of that "hole".
> >
> > When you stand on a cliff, the ocean is not "a hole in the landscape",
> > it's where the landscape ends.
>
> Except there is a hole at EOF, a virtual hole.  The draft spec
> specifically says "all seekable files shall have a virtual hole
> starting at the current size of the file".
I think that they used the term "virtual" to indicate this is not a real hole
and I think it was a good idea, since it allows file systems that do not
support holes to support SEEK_DATA.

However, I still believe that conforming to the Austin Group draft is
preferable.
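
To make the "virtual hole" concrete, here is a minimal sketch in C of the two behaviours that are not in dispute in this thread: on a file with no real holes, SEEK_HOLE from offset 0 lands on the file size, and SEEK_DATA at the file size fails with ENXIO. The helper names (make_dense_file, first_hole, no_more_data) are hypothetical, and the sketch assumes a writable /tmp and a libc that exposes SEEK_HOLE/SEEK_DATA (with _GNU_SOURCE on glibc). It deliberately does not probe SEEK_HOLE at EOF, since that is the case being debated.

```c
#define _GNU_SOURCE   /* needed on glibc for SEEK_DATA/SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an already-unlinked temp file containing
 * `len` non-zero bytes, i.e. a file with no real holes.
 * Returns an open descriptor, or -1 on error. */
static int make_dense_file(size_t len)
{
    char path[] = "/tmp/seekhole-XXXXXX";   /* assumes /tmp is writable */
    char block[4096];
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                 /* file goes away when fd is closed */
    memset(block, 'x', sizeof(block));
    while (len > 0) {
        size_t n = len < sizeof(block) ? len : sizeof(block);
        if (write(fd, block, n) != (ssize_t)n) {
            close(fd);
            return -1;
        }
        len -= n;
    }
    return fd;
}

/* Offset of the first hole at or after `from`. For a dense file this is
 * the file size, whether or not the file system supports real holes. */
static off_t first_hole(int fd, off_t from)
{
    return lseek(fd, from, SEEK_HOLE);
}

/* Returns 1 if SEEK_DATA from `from` fails with ENXIO, meaning there is
 * no data at or after that offset; returns 0 otherwise. */
static int no_more_data(int fd, off_t from)
{
    errno = 0;
    return lseek(fd, from, SEEK_DATA) == (off_t)-1 && errno == ENXIO;
}
```

Calling first_hole(fd, 0) on a descriptor from make_dense_file(10000) should give 10000, and no_more_data(fd, 10000) should be true. That is all a file system without hole support has to provide.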

rick

>
> >
> > > > And returning ENXIO is more informative than returning the size of the
> > > > file, since it atomically tells you that there are no more holes.
> > >
> > > Ahh, that's a good point.  It's the first point I've heard in favor of
> > > this option.  Are you aware of any applications that need to know
> > > that?
> >
> > No, but that should not get in the way of good syscall architecture :-)
> >
> > It might be useful for archivers which try to be smart about sparse files.
>
> I imagine that most archivers would work like this:
> ofs = 0
> loop {
>     let start = lseek(fd, ofs, SEEK_DATA);
>     if ENXIO {
>         // No more data regions
>         break
>     }
>     let end = lseek(fd, ofs, SEEK_HOLE);
>     assert!(!ENXIO) // thanks to the virtual hole, we should never have ENXIO here
>     copy(fd, start, end - start, ...)
>     ofs = end
> }
> truncate(output_file, fd.fsize)
>
> Since archivers really only care about data regions, not holes, I
> don't think that they would usually call SEEK_HOLE at EOF.
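
For illustration, the archiver loop quoted above might look roughly like this in C. This is a sketch under stated assumptions, not how any particular archiver does it: sparse_copy, copy_range and scratch_fd are hypothetical names, both descriptors are assumed to refer to regular files, and the input is assumed not to change during the copy. One deliberate difference from the pseudocode is that the SEEK_HOLE search starts from the data offset just found (`start`), so that a hole at the current offset cannot stall the loop.

```c
#define _GNU_SOURCE   /* needed on glibc for SEEK_DATA/SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open an already-unlinked scratch file in /tmp. */
static int scratch_fd(void)
{
    char path[] = "/tmp/sparsecopy-XXXXXX";  /* assumes /tmp is writable */
    int fd = mkstemp(path);

    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Copy `len` bytes at offset `off` from `in` to the same offset in `out`.
 * Short writes are treated as errors to keep the sketch small. */
static int copy_range(int in, int out, off_t off, off_t len)
{
    char buf[65536];

    while (len > 0) {
        size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pread(in, buf, want, off);

        if (n <= 0)
            return -1;
        if (pwrite(out, buf, (size_t)n, off) != n)
            return -1;
        off += n;
        len -= n;
    }
    return 0;
}

/* Copy only the data regions of `in` to `out`, then set the output
 * length so that a trailing hole is preserved. Returns 0 or -1. */
static int sparse_copy(int in, int out)
{
    struct stat st;
    off_t ofs = 0;

    if (fstat(in, &st) < 0)
        return -1;
    for (;;) {
        off_t start = lseek(in, ofs, SEEK_DATA);
        if (start == (off_t)-1) {
            if (errno == ENXIO)
                break;            /* no more data regions */
            return -1;
        }
        /* Thanks to the virtual hole at EOF, this should not fail with
         * ENXIO when `start` points at data. */
        off_t end = lseek(in, start, SEEK_HOLE);
        if (end == (off_t)-1)
            return -1;
        if (copy_range(in, out, start, end - start) < 0)
            return -1;
        ofs = end;
    }
    return ftruncate(out, st.st_size);
}
```

Note that SEEK_HOLE is only ever called from an offset that SEEK_DATA just returned, so this loop never asks for a hole at EOF. On a file system without hole support the loop degrades to a single copy of the whole file.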
>
> >
> > --
> > Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> > phk@FreeBSD.ORG         | TCP/IP since RFC 956
> > FreeBSD committer       | BSD since 4.3-tahoe
> > Never attribute to malice what can adequately be explained by incompetence.
>


