Date: Mon, 23 May 2016 17:31:19 +0300
From: Konstantin Belousov
To: freebsd-hackers@freebsd.org
Subject: Re: read(2) and thus bsdiff is limited to 2^31 bytes
Message-ID: <20160523143119.GV89104@kib.kiev.ua>
In-Reply-To: <20160523133842.GA17056@britannica.bec.de>

On Mon, May 23, 2016 at 03:38:42PM +0200, Joerg Sonnenberger wrote:
> On Mon, May 23, 2016 at 02:36:58PM +0200, Dirk Engling wrote:
> > On 23.05.16 14:21, Joerg Sonnenberger wrote:
> >
> > > Atomic meaning in this context that the read can be observed either
> > > completely or not at all. This still doesn't mean that read must
> > > execute the full size. Other cases for short read/writes are socket,
> > > pipes etc.
> >
> > On linux I found read() returning a short read, however I wonder if any
> > user land application developer ever expects a read from local file to
> > yield a short read and continue reading. Maybe I should scan base system
> > sources for all occurrences of read.
>
> They have to. Consider a signal interrupting the read.

FreeBSD ensures, at least for some filesystems, that reads are atomic
WRT writes, by your definition of atomic. Previously, it was (mostly)
ensured by keeping an exclusive vnode lock around VOP_WRITE and a shared
vnode lock around VOP_READ. Then ZFS was changed to keep only a shared
lock on write, but supposedly there was internal range locking that
prevented a read from starting while a write was in progress on an
intersecting range. Then UFS was modified to sometimes split read/write
requests into smaller VOP calls and drop the vnode lock between them.
This was done to prevent recursing into VM/VFS on page faults during
uiomove(9) from VOPs. As compensation, VFS-level rangelocks were
introduced for UFS only.
And then, quite recently, ZFS was changed to operate in the same chunked
mode as UFS and, implicitly, the same VFS rangelocks are now applied to
each read and write request on both UFS and ZFS.

But none of the local filesystems allows signals to interrupt these
operations. A pending signal never results in a short read or write on
UFS or ZFS (or on msdosfs). It might be allowed for NFS by a mount
option.
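
For callers that cannot rely on these filesystem-specific guarantees
(sockets, pipes, possibly NFS), the usual defensive pattern is a loop
that retries on short reads and on EINTR. Below is a minimal sketch, not
code taken from bsdiff or the base system. The name read_full() is
hypothetical, and capping each request at INT_MAX is an assumption made
only to stay under the 2^31-byte limit named in the subject line.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libc): keep calling read(2) until
 * len bytes have arrived, EOF is reached, or a real error occurs.
 * Returns the number of bytes read (less than len only at EOF),
 * or -1 with errno set on error.
 */
static ssize_t
read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	while (done < len) {
		size_t want = len - done;
		ssize_t n;

		/* Assumption: keep each request below 2^31 bytes. */
		if (want > INT_MAX)
			want = INT_MAX;

		n = read(fd, p + done, want);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted before any data; retry */
			return (-1);
		}
		if (n == 0)
			break;			/* EOF */
		done += (size_t)n;		/* short read: loop for the rest */
	}
	return ((ssize_t)done);
}
```

A caller would use read_full(fd, buf, size) in place of a bare read()
and treat a return value smaller than size as end of file.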