Date:      Sun, 22 May 2016 15:56:33 -0700
From:      Conrad Meyer <cem@FreeBSD.org>
To:        Dirk Engling <erdgeist@erdgeist.org>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: read(2) and thus bsdiff is limited to 2^31 bytes
Message-ID:  <CAG6CVpWb7nvX%2BLFpLizkSx8Y-deXfXiWi=rL56iGZ71YPhmLbw@mail.gmail.com>
In-Reply-To: <b2515cae-b75d-66e9-4207-3cf100ab3ab0@erdgeist.org>
References:  <b2515cae-b75d-66e9-4207-3cf100ab3ab0@erdgeist.org>

On Sun, May 22, 2016 at 1:54 PM, Dirk Engling <erdgeist@erdgeist.org> wrote:
> When trying to bsdiff two DVD images, I noticed it failing due to
> read(2) returning EINVAL to the tool. man 2 read says, this would only
> happen for a negative value for fildes, which clearly was not true.

Actually, it's documented at the very bottom of the first list in the
ERRORS section:

ERRORS
     The read(), readv(), pread() and preadv() system calls will succeed
     unless:
...
     [EINVAL]           The value nbytes is greater than INT_MAX.

It does seem silly to me, given that nbytes is a size_t.  I think it
should error only if nbytes is greater than SSIZE_MAX; on platforms
where size_t is wider than int (e.g. amd64) it shouldn't error for
nbytes in (INT_MAX, SSIZE_MAX].

As far as I can tell, this INT_MAX behavior is not required by POSIX.

> After more digging I found that read internally wraps a single call to
> readv, preparing a temporary struct iovec. man 2 readv in turn says that
> it will fail with EINVAL, if
>
> The sum of the iov_len values in the iov array overflowed a 32-bit integer.
>
> I saw the same behaviour on a linux system, so I kind of assume there is
> a standard that allows read(2) doing that. Still I think that
>
> 1) the man page must be corrected to match this behaviour, or
> 2) the read(2) syscall must wrap multiple calls to readv
>
> However, the http://www.daemonology.net/bsdiff/ page claims that:
>
> Providing that off_t is defined properly, bsdiff and bspatch support
> files of up to 2^61-1 = 2Ei-1 bytes.
>
> which I could not confirm on any system. I could easily fix this by
> using mmap instead of read to get pointers to file contents.
>
> Now, where should I start?

I think read(2) could be fixed to not exhibit this behavior.  Or you
could change the application to loop, issuing reads of INT_MAX bytes
or fewer.

Best,
Conrad


