Date: Mon, 10 Nov 2025 01:02:54 -0800
From: Rick Macklem <rick.macklem@gmail.com>
To: Don Lewis <truckman@freebsd.org>
Cc: Ronald Klop <ronald-lists@klop.ws>, "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: RFC: Should copy_file_range(2) return after a few seconds?
Message-ID: <CAM5tNy7vEb_QA5XpBqJ5LWP_A=7_OOt6skXMNwDawhDJXJ8FoA@mail.gmail.com>
In-Reply-To: <tkrat.b03871bab7986d98@FreeBSD.org>
References: <CAM5tNy4cpC0a_Bgngi_wJt_h_FwoVnDT5c3ozr7b4O_M0Kx5pA@mail.gmail.com>
 <2100145914.14642.1762672441817@localhost>
 <CAM5tNy6-=BqcwpsC==QemJua70taAYFYB0=4P3LaO53TKoiy8Q@mail.gmail.com>
 <tkrat.b03871bab7986d98@FreeBSD.org>

On Mon, Nov 10, 2025 at 12:15 AM Don Lewis <truckman@freebsd.org> wrote:
>
> On 9 Nov, Rick Macklem wrote:
> > On Sat, Nov 8, 2025 at 11:14 PM Ronald Klop <ronald-lists@klop.ws> wrote:
> >>
> >>
> >> From: Rick Macklem <rick.macklem@gmail.com>
> >> Date: 9 November 2025 00:23
> >> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
> >> Cc: Peter 'PMc' Much <pmc@citylink.dinoex.sub.org>
> >> Subject: RFC: Should copy_file_range(2) return after a few seconds?
> >>
> >> Hi,
> >>
> >> Peter Much reported a problem on the freebsd-fs@ mailing
> >> list on Oct. 21 under the Subject: "Why does rangelock_enqueue()
> >> hang for hours?".
> >>
> >> The problem was that he had a copy_file_range(2) copying
> >> between a large NFS file and a local file that was taking 2hrs.
> >> While this copy_file_range(2) was in progress, it was holding
> >> a rangelock for the entire output file, causing another process
> >> trying to read the output file to hang, waiting for the rangelock.
> >>
> >> Since copy_file_range(2) is not in any standard (just trying to
> >> emulate the Linux one), there is no definitive answer w.r.t.
> >> should it hold rangelocks. However, that is how it is currently
> >> coded and I, personally, think it is appropriate to do so.
> >>
> >> Having a copy_file_range(2) syscall take two hours is
> >> definitely an unusual case, but it does seem that it is
> >> excessive?
> >>
> >> Peter tried a quick patch I gave him that limited the
> >> copy_file_range(2) to 1sec and it fixed the problem
> >> he was observing.
> >>
> >> Which brings me to the question...
> >> Should copy_file_range(2) be time limited?
> >> And, if the answer to this is "yes", how long do
> >> you think the time limit should be?
> >> (1sec, 2-5sec or ??)
> >>
> >> Note that the longer you allow copy_file_range(2)
> >> to continue, the more efficient it will be.
> >>
> >> Thanks in advance for any comments, rick
> >>
> >> ________________________________
> >>
> >>
> >>
> >> Why is this locking needed?
> >> AFAIK Unix has advisory locking, so if you read a file somebody
> >> else is writing the result is your own problem. It is up to the
> >> applications to adhere to the locking.
> >> Is this a lock different than file locking from user space?
> > Yes. A rangelock is used for a byte range during a read(2) or
> > write(2) to ensure that they are serialized. This is a POSIX
> > requirement. (See this post by kib@ in the original email
> > discussion. https://lists.freebsd.org/archives/freebsd-fs/2025-October/004704.html)
> >
> > Since there is no POSIX standard for copy_file_range(), it could
> > be argued that range locking isn't required for copy_file_range(),
> > but that makes it inconsistent with read(2)/write(2) behaviour.
> > (I, personally, am more comfortable with a return after N sec
> > than removing the range locking, but that's just my opinion.)
> >
> > rick
> >
> >> Why can’t this tail a file that is being written by
> >> copy_file_range if none of the applications request a lock?
> >
> Since writes don't go backwards, it would seem to make sense to advance
> the start of the range lock as the copy proceeds.

The current code does the rangelock above the VOP layer and, for ZFS,
if block cloning is enabled, the entire copy happens all at once and
fairly quickly (it's copy on write as I understand it).

I can't recall for certain, but I think the rangelock must be acquired
before the vnode lock(s), so I don't think moving it to below the VOP
layer is practical?

rick

> As long as the read
> position + length is before the write position, there is no reason to
> block the read. Running "cat outfile" would look a lot like tail -f
> because cat would only see the new data because it would temporarily
> block if it ever caught up with the copy.
>
> tail is a bit funky, though. If the size of the destination file is
> updated periodically during the copy, tail could return early with an
> earlier part of the file. If the size is updated immediately to the
> final size, then tail will wait for the copy to complete, but will
> output the true end of the file.
>
> What about backups?
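
For illustration only, the overlap rule Don describes above (a read need
not block while its position + length is before the write position) can
be sketched as a small userland model. This is not the kernel's
rangelock code and does not use any FreeBSD API; struct copy_lock,
copy_lock_advance() and read_must_wait() are hypothetical names made up
for this sketch, and it assumes the copy writes strictly forward.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the write lock held by an in-progress copy. */
struct copy_lock {
	int64_t start;	/* first byte still locked; moves forward */
	int64_t end;	/* one past the last byte the copy will write */
};

/* Record that "copied" more bytes were written; shrink the locked range. */
static void
copy_lock_advance(struct copy_lock *cl, int64_t copied)
{
	cl->start += copied;
	if (cl->start > cl->end)
		cl->start = cl->end;	/* copy finished, nothing left locked */
}

/* A read of [off, off + len) waits only if it overlaps [start, end). */
static bool
read_must_wait(const struct copy_lock *cl, int64_t off, int64_t len)
{
	if (len <= 0 || cl->start >= cl->end)
		return (false);		/* empty read or empty lock */
	return (off < cl->end && off + len > cl->start);
}
```

In this model a reader that stays behind the copy never waits, and one
that catches up waits only until the next copy_lock_advance() call. It
says nothing about where the lock would live relative to the VOP layer
or the vnode locks, which is the practical concern raised in the reply.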
