Date: Thu, 13 Jun 2019 21:44:01 +0000 From: Rick Macklem <rmacklem@uoguelph.ca> To: "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org> Cc: "kib@freebsd.org" <kib@FreeBSD.org>, Alan Somers <asomers@freebsd.org>, Brooks Davis <brooks@freebsd.org> Subject: RFC: should the copy_file_range() syscall be atomic? Message-ID: <YQXPR01MB3128EA1CB774995DD5C774FCDDEF0@YQXPR01MB3128.CANPRD01.PROD.OUTLOOK.COM>
next in thread | raw e-mail | index | archive | help
When I first wrote the copy_file_range() syscall, the updating of the file = was atomic, because I locked both vnodes while doing it. kib@ quickly pointed out that this could introduce a LOR/deadlock because t= wo userland threads could concurrently do the syscall with the file arguments = reversed. It turns out that locking two VREG vnodes concurrently to do this isn't eas= y and would require the implementation of non-blocking versions of: vn_rangelock_rlock() and vn_rangelock_wlock() - I am not sure how difficult doing this is, but I'll admit I'd rather no= t do this. Also, having the vnodes locked precludes use of VOP_IOCTL(..FIOSEEKDATA/ FIOSEEKHOLE..) calls to find holes in the byte range of the file being copi= ed from. Without the vnodes locked, it is possible for other threads to write to eit= her of the files concurrently with the copy_file_range(), resulting in an indeterminat= e results. (cp(1) has never guaranteed atomic copying of files, so is it needed in thi= s syscall?) In summary, doing the syscall non-atomically has the advantages of: - vn_rdwr() takes care of the vnode locking issues, so any changes w.r.t. l= ocking wouldn't require changes to this syscall's code. - VOP_IOCTL() could be used to find holes. - No new rangelock primitives need to be added to the system. - If there were some combination of file system/storage unit where copying non-overlapping byte ranges concurrently could result in better performan= ce, then that could be done. (An atomic syscall would serialize them.) The advantage of an atomic syscall would be consistency with write(2) and r= ead(2) behaviour. The following comments are copied from phabricator: kib@ - So you relock range for each chunk ? This defeats the purpose of the= range locking. Should copy_file_range() be atomic WRT other reads and writ= es ? asomers@ - That has an unfortunate side effect: copy_file_range is no longe= r atomic if you drop the vnode locks and range locks in the middle. It woul= d be possible for two copy_file_range operations to proceed concurrently, l= eaving the file in a state where each of the operations was partially succe= ssful. A better solution would be to add rangelock_trywlock and rangelock_t= ryrlock. I don't think it would be very hard (testing them, however, could = be). I don't see anything in the Linux man page w.r.t. atomicity, so I am now as= king what others think? (I'll admit I'm biased towards non-atomic, since I have already coded it an= d can use the VOP_IOCTL() calls to find the holes in the input file, but...) rick
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR01MB3128EA1CB774995DD5C774FCDDEF0>