From: Conrad Meyer
Reply-To: cem@FreeBSD.org
To: Dirk Engling
Cc: FreeBSD Hackers
Date: Sun, 22 May 2016 15:56:33 -0700
Subject: Re: read(2) and thus bsdiff is limited to 2^31 bytes

On Sun, May 22, 2016 at 1:54 PM, Dirk Engling wrote:
> When trying to bsdiff two DVD images, I noticed it failing due to
> read(2) returning EINVAL to the tool. man 2 read says this would only
> happen for a negative value for fildes, which clearly was not true.

Actually, it's documented at the very bottom of the first section:

ERRORS
     The read(), readv(), pread() and preadv() system calls will succeed
     unless:
     ...
     [EINVAL]  The value nbytes is greater than INT_MAX.

It does seem silly to me, given that nbytes is a size_t. I think it should
error if nbytes is greater than SSIZE_MAX, but on platforms where size_t is
larger than int (e.g. amd64) it shouldn't error for nbytes in the range
(INT_MAX, SSIZE_MAX]. As far as I can tell, this INT_MAX behavior is not
required by POSIX.

> After more digging I found that read internally wraps a single call to
> readv, preparing a temporary struct iovec.
> man 2 readv in turn says that it will fail with EINVAL if
>
>   The sum of the iov_len values in the iov array overflowed a 32-bit integer.
>
> I saw the same behaviour on a Linux system, so I kind of assume there is
> a standard that allows read(2) to do that. Still, I think that
>
> 1) the man page must be corrected to match this behaviour, or
> 2) the read(2) syscall must wrap multiple calls to readv.
>
> However, the http://www.daemonology.net/bsdiff/ page claims that:
>
>   Providing that off_t is defined properly, bsdiff and bspatch support
>   files of up to 2^61-1 = 2Ei-1 bytes.
>
> which I could not confirm on any system. I could easily fix this by
> using mmap instead of read to get pointers to file contents.
>
> Now, where should I start?

I think read(2) could be fixed to not exhibit this behavior. Alternatively,
you could change the application to loop, issuing reads of INT_MAX bytes
or fewer.

Best,
Conrad
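The "loop INT_MAX or smaller reads" workaround might look roughly like the sketch below. It assumes a POSIX system; `read_full` is a hypothetical helper name, not part of libc or of bsdiff. Each read(2) call is capped at INT_MAX bytes so no single call can hit the documented EINVAL case. The loop also handles short reads, EINTR, and end of file.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_full: hypothetical helper, not a libc or bsdiff API.
 * Reads up to `len` bytes from `fd` into `buf` using read(2) calls of at
 * most INT_MAX bytes each. Returns the number of bytes read (fewer than
 * `len` only if end of file was reached), or -1 with errno set on error.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	if (len > (size_t)SSIZE_MAX) {
		errno = EINVAL;	/* total would not fit in the ssize_t result */
		return -1;
	}

	while (done < len) {
		size_t chunk = len - done;

		if (chunk > (size_t)INT_MAX)
			chunk = INT_MAX;	/* stay within the nbytes <= INT_MAX limit */

		ssize_t n = read(fd, p + done, chunk);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal; retry */
			return -1;
		}
		if (n == 0)
			break;			/* end of file */
		done += (size_t)n;		/* short reads are fine; keep going */
	}
	return (ssize_t)done;
}
```

A tool such as bsdiff could call a helper like this in place of a single large read(). Capping each chunk at INT_MAX is deliberately conservative. It works whether or not the kernel is later changed to accept sizes up to SSIZE_MAX. The mmap approach mentioned above avoids the loop entirely, but it requires the input to be a regular, mappable file.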