Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 14 Jun 2019 01:08:15 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org>, Alan Somers <asomers@freebsd.org>, Brooks Davis <brooks@freebsd.org>
Subject:   Re: RFC: should the copy_file_range() syscall be atomic?
Message-ID:  <20190613220815.GB8697@kib.kiev.ua>
In-Reply-To: <YQXPR01MB3128EA1CB774995DD5C774FCDDEF0@YQXPR01MB3128.CANPRD01.PROD.OUTLOOK.COM>
References:  <YQXPR01MB3128EA1CB774995DD5C774FCDDEF0@YQXPR01MB3128.CANPRD01.PROD.OUTLOOK.COM>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Jun 13, 2019 at 09:44:01PM +0000, Rick Macklem wrote:
> When I first wrote the copy_file_range() syscall, the updating of the file was atomic,
> because I locked both vnodes while doing it.
> kib@ quickly pointed out that this could introduce a LOR/deadlock because two
> userland threads could concurrently do the syscall with the file arguments reversed.
> 
> It turns out that locking two VREG vnodes concurrently to do this isn't easy and
> would require the implementation of non-blocking versions of:
>   vn_rangelock_rlock() and vn_rangelock_wlock()
>   - I am not sure how difficult doing this is, but I'll admit I'd rather not do this.
I will help you with this.  It should be relatively easy, even if requires
some code re-shuffling.

> 
> Also, having the vnodes locked precludes use of VOP_IOCTL(..FIOSEEKDATA/
> FIOSEEKHOLE..) calls to find holes in the byte range of the file being copied from.
If you lock ranges, you still can do ioctl.  But in fact it might be better
to extract wrapper for FIOSEEK* into kernel function, and use it in the
ioctl handler and in the copy syscall.

> 
> Without the vnodes locked, it is possible for other threads to write to either of the
> files concurrently with the copy_file_range(), resulting in an indeterminate results.
> (cp(1) has never guaranteed atomic copying of files, so is it needed in this syscall?)
> In summary, doing the syscall non-atomically has the advantages of:
> - vn_rdwr() takes care of the vnode locking issues, so any changes w.r.t. locking
>   wouldn't require changes to this syscall's code.
> - VOP_IOCTL() could be used to find holes.
> - No new rangelock primitives need to be added to the system.
> - If there were some combination of file system/storage unit where copying
>   non-overlapping byte ranges concurrently could result in better performance,
>   then that could be done. (An atomic syscall would serialize them.)
> 
> The advantage of an atomic syscall would be consistency with write(2) and read(2)
> behaviour.
> 
> The following comments are copied from phabricator:
> kib@ - So you relock range for each chunk ? This defeats the purpose of the range locking. Should copy_file_range() be atomic WRT other reads and writes ?
> 
> asomers@ - That has an unfortunate side effect: copy_file_range is no longer atomic if you drop the vnode locks and range locks in the middle. It would be possible for two copy_file_range operations to proceed concurrently, leaving the file in a state where each of the operations was partially successful. A better solution would be to add rangelock_trywlock and rangelock_tryrlock. I don't think it would be very hard (testing them, however, could be).
> 
> I don't see anything in the Linux man page w.r.t. atomicity, so I am now asking what
> others think?
> (I'll admit I'm biased towards non-atomic, since I have already coded it and can use
>  the VOP_IOCTL() calls to find the holes in the input file, but...)

AFAIK, linux does not provide atomicity for file reads and writes at all,
even for normal reads and writes.  I do not want to read Linux code to
confirm this.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190613220815.GB8697>