Date:      Mon, 1 Feb 2016 11:22:18 -0800
From:      Maxim Sobolev <sobomax@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-fs@freebsd.org, Kirk McKusick <mckusick@mckusick.com>
Subject:   Re: Inconsistency between lseek(SEEK_HOLE) and lseek(SEEK_DATA)
Message-ID:  <CAH7qZftsv_0ersqexJ0fTnSQexe4WvpMLnF6X9bj_wX6q9Ewfw@mail.gmail.com>
In-Reply-To: <20160201182257.GN91220@kib.kiev.ua>
References:  <CAH7qZfuZNZ%2BJDPC4D1sjXj2tFxZKBiYVyTp-Y3UUUoq9er%2BWYQ@mail.gmail.com> <20160201165648.GM91220@kib.kiev.ua> <CAH7qZfvcpBo%2BvDho4GeNYWh6N83sebUi-DSG9--T%2BnxQiLhJ1A@mail.gmail.com> <20160201182257.GN91220@kib.kiev.ua>

Well, it still seems quite obscure. At the very least, the lseek(2)
manual page needs to reflect that. Right now it says:

ERRORS
[...]
     [ENXIO]            For SEEK_DATA, there are no more data regions past
                        the supplied offset.  For SEEK_HOLE, there are no
                        more holes past the supplied offset.

That is not true: SEEK_HOLE returns st_size, not ENXIO, when there are no
more holes past the supplied offset. It is also interesting that an empty
file is somehow a special case as well: both SEEK_HOLE and SEEK_DATA
return -1 on it. Anybody who programs to that document would probably
get as confused as I did.
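The behaviour described above can be checked with a short probe. This is a minimal sketch, not code from FreeBSD: `make_data_file()` and `probe()` are hypothetical helper names, and the `SEEK_DATA`/`SEEK_HOLE` fallbacks assume the values 3 and 4 that both FreeBSD and Linux headers use. The expected results assume a filesystem that treats EOF as an implicit hole, as UFS and ZFS do per this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes SEEK_DATA/SEEK_HOLE in glibc's <unistd.h> */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* assumption: value used by FreeBSD and Linux headers */
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4 /* assumption: value used by FreeBSD and Linux headers */
#endif

/* Hypothetical helper: create an already-unlinked temporary file holding
 * `len` bytes of data (len may be 0). Returns the fd, or -1 on error. */
static int make_data_file(size_t len)
{
    char path[] = "/tmp/seekprobe.XXXXXX";
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path); /* the file lives on until fd is closed */
    for (size_t i = 0; i < len; i++) {
        if (write(fd, "x", 1) != 1) {
            close(fd);
            return -1;
        }
    }
    return fd;
}

/* Hypothetical helper: run lseek(fd, off, whence) and report errno
 * through *err (0 when lseek succeeds). */
static off_t probe(int fd, off_t off, int whence, int *err)
{
    off_t r = lseek(fd, off, whence);

    *err = (r == -1) ? errno : 0;
    return r;
}
```

On such a filesystem, a 4-byte data-only file should report SEEK_HOLE from offset 0 as 4 (st_size) and fail with ENXIO from offset 4, while an empty file should fail both SEEK_HOLE and SEEK_DATA at offset 0 with ENXIO.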

However, having said that, our cousin Linux behaves the same way, i.e. it
returns EOF+1 from SEEK_HOLE on a data-only file and -1 from SEEK_DATA when
no data remains, and does the same for empty files, so at least we are
consistent with it.
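Since FreeBSD and Linux agree, code that walks a file's data regions can rely on that shared convention rather than on the man page wording. The sketch below is one way to do it, not code from either project; `count_data_segments()` is a hypothetical name, and the `SEEK_DATA`/`SEEK_HOLE` fallbacks assume the values 3 and 4 used by both systems' headers. The loop ends when SEEK_DATA reports ENXIO and needs no special case for EOF, because SEEK_HOLE treats EOF as a hole.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* assumption: value used by FreeBSD and Linux headers */
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4 /* assumption: value used by FreeBSD and Linux headers */
#endif

/*
 * Hypothetical helper: count the data regions of an open file by
 * alternating SEEK_DATA and SEEK_HOLE.
 *  - SEEK_DATA fails with ENXIO once no data remains; for an empty file
 *    this happens immediately at offset 0.
 *  - SEEK_HOLE from an offset inside the file does not fail, because EOF
 *    counts as an implicit hole, so a data-only file is one region
 *    [0, st_size).
 * Returns the region count, or -1 on any other error. Moves the file offset.
 */
static long count_data_segments(int fd)
{
    long n = 0;
    off_t off = 0;

    for (;;) {
        off_t data = lseek(fd, off, SEEK_DATA);
        if (data == -1)
            return errno == ENXIO ? n : -1;

        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == -1)
            return -1; /* unexpected: data before EOF implies a hole by EOF */

        n++;
        off = hole; /* hole > data, so the loop always makes progress */
    }
}
```

With this convention a data-only file counts as one region and an empty file as zero; callers that care about the file offset should save and restore it around the call.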

-Max

On Mon, Feb 1, 2016 at 10:22 AM, Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Mon, Feb 01, 2016 at 09:17:49AM -0800, Maxim Sobolev wrote:
> > Here it is:
> >
> > The expected outcome is return code 0; the failure condition is
> > lseek() returning 4 (i.e. sizeof(int)) rather than -1.
> >
> > ------
> > #include <sys/types.h>
> > #include <stdint.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > int main(void)
> > {
> >         char fname[] = "/tmp/temp.XXXXXX";
> >         int fd;
> >         off_t hole;
> >
> >         /* mkstemp() creates and opens the file in one step, avoiding
> >          * the race inherent in mktemp() followed by open(). */
> >         fd = mkstemp(fname);
> >         if (fd == -1) {
> >             exit (1);
> >         }
> >         /* Write sizeof(int) bytes so the file is all data, no holes. */
> >         if (write(fd, &fd, sizeof(fd)) != (ssize_t)sizeof(fd)) {
> >             exit (1);
> >         }
> >         /* Expected: -1 (no hole). Failure: st_size, i.e. 4. */
> >         hole = lseek(fd, 0, SEEK_HOLE);
> >         close(fd);
> >         unlink(fname);
> >         if (hole >= 0) {
> >             fprintf(stderr, "lseek() returned %jd, not -1\n",
> >                 (intmax_t)hole);
> >             exit (1);
> >         }
> >         exit (0);
> > }
> > ------
> I tested your program on both UFS and ZFS, and the behaviour is
> identical: lseek(SEEK_HOLE) points to the end of the file. In fact, when
> I did the UFS implementation, I most likely considered this case and
> tested ZFS compatibility, because the case is handled explicitly. Look
> at lines 2193-2197 in kern/vfs_vnops.c:vn_bmap_seekhole(), esp. the
> comment.
>
> For me, the results of the test are reasonable.  There is no data
> after EOF, and the idea of an 'implicit hole' after EOF is quite
> intuitive.
>
> >
> >
> > On Mon, Feb 1, 2016 at 8:56 AM, Konstantin Belousov <kostikbel@gmail.com>
> > wrote:
> >
> > > On Mon, Feb 01, 2016 at 07:57:40AM -0800, Maxim Sobolev wrote:
> > > > Hi,
> > > >
> > > > I've noticed that lseek() behaves inconsistently with regard to the
> > > > SEEK_HOLE and SEEK_DATA operations. SEEK_HOLE on a data-only file
> > > > returns st_size (i.e. EOF + 1), while SEEK_DATA on a hole-only file
> > > > returns -1 and sets errno to ENXIO. The latter seems to be the
> > > > documented way to indicate that the file has no more data sections
> > > > past this point.
> > > >
> > > > My first idea was that somehow most files have a hole attached to
> > > > their end to fill up the FS block, but that does not seem to be the
> > > > case. Trying to SEEK_HOLE past the end of any of those data-only
> > > > files produces an error (i.e. lseek(fd, st_size, SEEK_HOLE) == -1).
> > > >
> > > > In short, for some reason I cannot get a proper ENXIO from
> > > > SEEK_HOLE. What is currently returned implies that there is a 1-byte
> > > > hole attached to each file past its EOF, and that does not smell
> > > > right.
> > > >
> > > > All tests are done on UFS, fairly recent 11-current.
> > > >
> > >
> > > There are no 'hole-only' files on UFS: the last byte of a UFS file
> > > must be populated, either by an allocated fragment if the last byte is
> > > in the direct blocks range, or by a full block if in the indirect
> > > range.
> > >
> > > Please show an exact minimal test case which reproduces what you
> > > consider the bug, with a comment about the expected outcome at the
> > > failing location.
> > >
> > >
>
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAH7qZftsv_0ersqexJ0fTnSQexe4WvpMLnF6X9bj_wX6q9Ewfw>