From: Dirk Engling
To: freebsd-hackers@freebsd.org
Subject: read(2) and thus bsdiff is limited to 2^31 bytes
Date: Sun, 22 May 2016 22:54:30 +0200

When trying to bsdiff two DVD images, I noticed it failing because read(2) returned EINVAL to the tool. man 2 read says this should only happen when fildes is negative, which was clearly not the case.

After more digging, I found that read internally wraps a single call to readv, preparing a temporary struct iovec. man 2 readv in turn says that it will fail with EINVAL if

    The sum of the iov_len values in the iov array overflowed a 32-bit integer.
I saw the same behaviour on a Linux system, so I assume some standard permits read(2) to behave this way. Still, I think that either

1) the man page must be corrected to match this behaviour, or
2) the read(2) syscall must wrap multiple calls to readv.

However, the http://www.daemonology.net/bsdiff/ page claims that:

    Providing that off_t is defined properly, bsdiff and bspatch support
    files of up to 2^61-1 = 2Ei-1 bytes.

I could not confirm this on any system. I could easily fix bsdiff by using mmap instead of read to get pointers to the file contents. Now, where should I start?

erdgeist
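The mmap approach mentioned above can be sketched as follows, assuming a 64-bit address space large enough to map a whole DVD image. `map_file` is a hypothetical helper, not code from bsdiff: it maps the input read-only, so no read(2) call, and hence no per-call size limit, is involved.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file at `path` read-only.
 * On success stores the file length in *len and returns a pointer to the
 * contents (release it with munmap(ptr, *len)); returns NULL on error.
 * Empty files are treated as an error here, because mmap(2) rejects a
 * zero-length mapping. */
static const unsigned char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (uintmax_t)st.st_size > SIZE_MAX) {
        close(fd);              /* empty, or too large for this address space */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

A caller would use the returned pointer and length in place of a malloc'd buffer filled by read, and call munmap(ptr, len) when done.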