From owner-freebsd-fs@freebsd.org Thu Jun 13 22:08:24 2019 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A932715BE8C6 for ; Thu, 13 Jun 2019 22:08:24 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 149C7705E2; Thu, 13 Jun 2019 22:08:23 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x5DM8F52050195 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Fri, 14 Jun 2019 01:08:18 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x5DM8F52050195 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x5DM8F4g050192; Fri, 14 Jun 2019 01:08:15 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 14 Jun 2019 01:08:15 +0300 From: Konstantin Belousov To: Rick Macklem Cc: "freebsd-fs@freebsd.org" , Alan Somers , Brooks Davis Subject: Re: RFC: should the copy_file_range() syscall be atomic? Message-ID: <20190613220815.GB8697@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.12.0 (2019-05-25) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Jun 2019 22:08:24 -0000 On Thu, Jun 13, 2019 at 09:44:01PM +0000, Rick Macklem wrote: > When I first wrote the copy_file_range() syscall, the updating of the file was atomic, > because I locked both vnodes while doing it. > kib@ quickly pointed out that this could introduce a LOR/deadlock because two > userland threads could concurrently do the syscall with the file arguments reversed. > > It turns out that locking two VREG vnodes concurrently to do this isn't easy and > would require the implementation of non-blocking versions of: > vn_rangelock_rlock() and vn_rangelock_wlock() > - I am not sure how difficult doing this is, but I'll admit I'd rather not do this. I will help you with this. It should be relatively easy, even if requires some code re-shuffling. > > Also, having the vnodes locked precludes use of VOP_IOCTL(..FIOSEEKDATA/ > FIOSEEKHOLE..) calls to find holes in the byte range of the file being copied from. If you lock ranges, you still can do ioctl. But in fact it might be better to extract wrapper for FIOSEEK* into kernel function, and use it in the ioctl handler and in the copy syscall. > > Without the vnodes locked, it is possible for other threads to write to either of the > files concurrently with the copy_file_range(), resulting in an indeterminate results. > (cp(1) has never guaranteed atomic copying of files, so is it needed in this syscall?) > In summary, doing the syscall non-atomically has the advantages of: > - vn_rdwr() takes care of the vnode locking issues, so any changes w.r.t. locking > wouldn't require changes to this syscall's code. > - VOP_IOCTL() could be used to find holes. > - No new rangelock primitives need to be added to the system. > - If there were some combination of file system/storage unit where copying > non-overlapping byte ranges concurrently could result in better performance, > then that could be done. (An atomic syscall would serialize them.) > > The advantage of an atomic syscall would be consistency with write(2) and read(2) > behaviour. > > The following comments are copied from phabricator: > kib@ - So you relock range for each chunk ? This defeats the purpose of the range locking. Should copy_file_range() be atomic WRT other reads and writes ? > > asomers@ - That has an unfortunate side effect: copy_file_range is no longer atomic if you drop the vnode locks and range locks in the middle. It would be possible for two copy_file_range operations to proceed concurrently, leaving the file in a state where each of the operations was partially successful. A better solution would be to add rangelock_trywlock and rangelock_tryrlock. I don't think it would be very hard (testing them, however, could be). > > I don't see anything in the Linux man page w.r.t. atomicity, so I am now asking what > others think? > (I'll admit I'm biased towards non-atomic, since I have already coded it and can use > the VOP_IOCTL() calls to find the holes in the input file, but...) AFAIK, linux does not provide atomicity for file reads and writes at all, even for normal reads and writes. I do not want to read Linux code to confirm this.