Date:      Mon, 23 May 2016 17:31:19 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: read(2) and thus bsdiff is limited to 2^31 bytes
Message-ID:  <20160523143119.GV89104@kib.kiev.ua>
In-Reply-To: <20160523133842.GA17056@britannica.bec.de>
References:  <b2515cae-b75d-66e9-4207-3cf100ab3ab0@erdgeist.org> <20160522225414.GB24398@britannica.bec.de> <154dab43060.11208cdfd132112.2616144627831899155@nextbsd.org> <20160522231203.GB25503@britannica.bec.de> <154db353935.dd5e87c1133922.4370692881788049491@nextbsd.org> <20160523122131.GC8747@britannica.bec.de> <5a607409-1b98-8944-b1f2-4422b1d28248@erdgeist.org> <20160523133842.GA17056@britannica.bec.de>

On Mon, May 23, 2016 at 03:38:42PM +0200, Joerg Sonnenberger wrote:
> On Mon, May 23, 2016 at 02:36:58PM +0200, Dirk Engling wrote:
> > On 23.05.16 14:21, Joerg Sonnenberger wrote:
> > 
> > > Atomic meaning in this context that the read can be observed either
> > > completely or not at all. This still doesn't mean that read must
> > > execute the full size. Other cases for short read/writes are socket,
> > > pipes etc.
> > 
> > On linux I found read() returning a short read, however I wonder if any
> > user land application developer ever expects a read from local file to
> > yield a short read and continue reading. Maybe I should scan base system
> > sources for all occurrences of read.
> 
> They have to. Consider a signal interrupting the read.

FreeBSD ensures, at least for some filesystems, that reads are atomic
WRT writes, by your definition of atomic.  Previously, this was (mostly)
ensured by holding an exclusive vnode lock around VOP_WRITE and a shared
vnode lock around VOP_READ.

Then ZFS was changed to hold only a shared lock on write, but supposedly
it had internal range locking that prevented a read from starting while
a write was in progress on an intersecting range.

Then UFS was modified to sometimes split read/write requests into smaller
VOP calls and drop the vnode lock between them.  This was done to prevent
recursing into VM/VFS on page faults during uiomove(9) from VOPs.
To compensate, VFS-level rangelocks were introduced for UFS only.

And then, quite recently, ZFS was changed to operate in the same chunked
mode as UFS and, implicitly, the same VFS rangelocks are currently
applied to each read and write request on both UFS and ZFS.

But none of the local filesystems allow signals to interrupt these
operations. A pending signal never results in a short read or write
on UFS, ZFS, or msdosfs. It might be allowed for NFS
by a mount option.
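
Still, portable userland code cannot rely on any of this, since short
reads are normal for pipes, sockets and possibly NFS. A minimal sketch
of the usual retry loop follows. The name read_full() and the 1 GiB
READ_CHUNK_MAX cap are assumptions made for illustration, not anything
from libc or the base system; the cap only keeps each request well
below 2^31 bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-call cap, chosen to stay well below 2^31 bytes. */
#define READ_CHUNK_MAX ((size_t)1 << 30)

/*
 * read_full() is a hypothetical helper, not a libc function.
 * It keeps calling read(2) until len bytes have arrived, EOF is
 * reached, or an error other than EINTR occurs.
 * Returns the number of bytes read (less than len only at EOF),
 * or -1 on error, in which case bytes already read are not reported.
 * Callers are assumed to pass len <= SSIZE_MAX.
 */
static ssize_t
read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	while (done < len) {
		size_t want = len - done;
		ssize_t n;

		if (want > READ_CHUNK_MAX)
			want = READ_CHUNK_MAX;
		n = read(fd, p + done, want);
		if (n > 0)
			done += (size_t)n;	/* short read: keep going */
		else if (n == 0)
			break;			/* EOF */
		else if (errno != EINTR)
			return (-1);		/* real error */
		/* EINTR: interrupted before any data arrived, retry */
	}
	return ((ssize_t)done);
}
```

A caller would use read_full(fd, buf, size) in place of a single
read(fd, buf, size) and treat a result smaller than size as EOF.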


