Date:      Thu, 23 Oct 2025 17:49:27 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>, Bakul Shah <bakul@iitbombay.org>, freebsd-fs@freebsd.org
Subject:   Re: Why does rangelock_enqueue() hang for hours?
Message-ID:  <aPpAdzESTgGoiYh0@kib.kiev.ua>
In-Reply-To: <CAM5tNy7mFNo4=7C7GqdXeivyYTFkZ5ebR=PkHoYUF-g2zyKKuw@mail.gmail.com>
References:  <aPeEVSamcdKSoF5N@disp.intra.daemon.contact> <CAM5tNy6n5QJpRpkCJFHK=_5dWT=EZqkjztAfZHdqREKYSby55g@mail.gmail.com> <BC3B5D56-7DAD-4090-A25F-91640CFEE28D@iitbombay.org> <CAM5tNy5od=uNAy4ysXSjXBgV1M38mAJvXqtOVuTeaS_JKZe_PQ@mail.gmail.com> <aPn6IyIG5BWy5sHu@disp.intra.daemon.contact> <CAM5tNy7mFNo4=7C7GqdXeivyYTFkZ5ebR=PkHoYUF-g2zyKKuw@mail.gmail.com>


On Thu, Oct 23, 2025 at 07:21:56AM -0700, Rick Macklem wrote:
> On Thu, Oct 23, 2025 at 2:54 AM Peter 'PMc' Much
> <pmc@citylink.dinoex.sub.org> wrote:
> >
> > On Wed, Oct 22, 2025 at 08:52:00AM -0700, Rick Macklem wrote:
> > ! On Tue, Oct 21, 2025 at 7:50 AM Bakul Shah <bakul@iitbombay.org> wrote:
> > ! >
> > ! > I didn't read this thread before commenting on the forum where Peter
> > ! > first raised this issue. Adding the relevant part of my comment here:
> > ! > +---
> > ! > By git blame cat.c we find it was added on 2023-07-08 in commit 8113cc8276.
> > ! > git log 8113cc8276 says
> > ! >   cat: use copy_file_range(2) with fallback to previous behavior
> > ! >
> > ! >   This allows to use special filesystem features like server-side
> > ! >   copying on NFS 4.2 or block cloning on OpenZFS 2.2.
> > ! >
> > ! > Maybe it should check that these conditions are met? That is, both files should be
> > ! > remote or both files should be local for it to be really worth it. In any case IMHO
> > ! > this should not be the default behavior. Still, it should not hang....
> > ! Peter, you could try the attached trivial patch (untested).
> > !
> > ! I'm not sure if this is a reasonable thing to do, but at least you can report
> > ! back to let us know if it fixes your problem?
> >
> >
> > Hi Rick,
> >
> > >   I tested the patch. And I did something more, like
> > trying to update my linux installation (which was unpleasant
> > and didn't fully succeed) and have a look there. See below.
> >
> > The patch helps. Things on the writing side now look like this:
> >
> > ...
> > 1.409706711 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> > 1.216986006 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> > 1.219576946 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 393216 (0x60000)
> > 1.025836739 copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 262144 (0x40000)
> > ...
> >
> > More interesting, the read access runs immediately, it does not wait
> > for that one second to find a gap.
> I suspect that was because you did it after the first copy_file_range() call.
> The 2nd and subsequent calls would not start at offset 0, so the rangelock
> would not start at offset 0 either.
> 
> >
> > But, I am still wondering: why do we do this? And then I found,
> > Linux (6.12.38+kali-amd64) does not do it:
> >
> >
> > $ strace cp XX XY
> > ...
> > copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0^Z
> > [1]+ Stopped                  strace cp XX XY
> > $ cp XY XZ
> > $
> >
> > This does not block. And it does not split the copy_file_range()
> > into chunks. FreeBSD 14.3 does block at this point.
> As you probably already know, there is no standard for copy_file_range(2).
> When I did it, the aim was to be Linux compatible, but I guess it is
> no surprise that it isn't 100% compatible. (The Linux copy_file_range(2)
> is a moving target. It started out as a libc function and then its semantics
> changed significantly at some Linux version. I've forgotten which version.
> Prior to that version, a copy_file_range() with a len argument that went
> past EOF was not allowed, if I recall correctly?)
> 
> Range locking is required for read/write (I'm fairly sure that is in the POSIX
> standard for them). When I did copy_file_range(2) for FreeBSD others
> (I don't recall who) thought that it should do range locking to be consistent
> with read/write, which made sense to me.

See https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
2.9.7 Thread Interactions with Regular File Operations

[List of file I/O functions, including read() and write()]

If two threads each call one of these functions, each call shall either
see all of the specified effects of the other call, or none of them.
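
To make the rule concrete, here is a minimal sketch of a userland check
(not from the standard or from the FreeBSD tree; the function name
count_torn_reads, the block size, and the A/B patterns are my own
assumptions). A writer thread overwrites one block alternately with all
'A' and all 'B' bytes while the main thread reads the same range. A read
that returns a mix of both bytes has seen part of a write, which is what
the paragraph above forbids. I would expect the torn count to stay at 0
on a kernel and filesystem that implement this rule, but that is not
verified here.

```python
import os
import tempfile
import threading

BLOCK = 1 << 20  # 1 MiB per write; large enough that a reader can overlap it


def count_torn_reads(path, rounds=50):
    """Return (reads, torn): full-block reads done, and how many of them
    contained bytes from two different writes."""
    patterns = [b"A" * BLOCK, b"B" * BLOCK]
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.pwrite(fd, patterns[0], 0)  # make sure the file already has BLOCK bytes
    done = threading.Event()

    def writer():
        # Each pwrite() replaces the whole block with a single byte value.
        # (A short write is assumed not to happen on a regular file.)
        for i in range(rounds):
            os.pwrite(fd, patterns[i % 2], 0)
        done.set()

    reads = torn = 0
    t = threading.Thread(target=writer)
    t.start()
    try:
        while True:
            finished = done.is_set()
            buf = os.pread(fd, BLOCK, 0)
            reads += 1
            # A uniform buffer consists only of its first byte.
            if len(buf) == BLOCK and buf.count(buf[:1]) != BLOCK:
                torn += 1
            if finished:
                break
    finally:
        t.join()
        os.close(fd)
    return reads, torn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        reads, torn = count_torn_reads(os.path.join(d, "f"))
        print(f"{reads} reads, {torn} torn")
```

The same reasoning is why a reader blocks behind a long-running
copy_file_range(): the reader must see either all of that call's effects
or none of them.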

> 
> I will ask on freebsd-current@ (few read freebsd-fs@) to see what the
> consensus is w.r.t. this. (I suspect the "return after 1sec" is preferred
> over disabling range locking, but we'll see.) I will also run some tests
> on the Linux system I have, to confirm what their semantics are for
> a recent Linux kernel. (Don't expect to see the post for a little while.)
> 
> rick
> 
> >
> > BTW: this is another one of my creepy use-cases: freeze some
> > job and forget about it - and if it happens to use cp somewhere,
> > then all other reads traversing the concerned file (e.g. backup)
> > would also freeze. And then after a week we wonder why we do not
> > have backups.
> >
> > rgds,
> > PMc
> >
> >
> > ! >
> > ! > > On Oct 21, 2025, at 6:28 AM, Rick Macklem <rick.macklem@gmail.com> wrote:
> > ! > >
> > ! > > On Tue, Oct 21, 2025 at 6:09 AM Peter 'PMc' Much
> > ! > > <pmc@citylink.dinoex.sub.org> wrote:
> > ! > >>
> > ! > >>
> > ! > >> This is 14.3-RELEASE.
> > ! > >>
> > ! > >> I am copying a file from a NFSv4 share to a local filesystem. This
> > ! > >> takes a couple of hours.
> > ! > >>
> > ! > >> In the meantime I want to read that partially copied file. This is
> > ! > >> not possible. The reading process locks up in rangelock_enqueue(),
> > ! > >> unkillable(!), and only after the first slow copy has completed, it
> > ! > >> will do its job.
> > ! > >>
> > ! > >> Even if I do the first copy to stdout with redirect to file, the
> > ! > >> same problem happens. I.e.:
> > ! > >>
> > ! > >> $ cat /nfsshare/File > /localfs/File &
> > ! > >> $ cat /localfs/File  --> HANGS unkillable
> > ! > > This is caused by "cat" using copy_file_range(2), where the
> > ! > > system call is taking a long time.
> > ! > >
> > ! > > The version done below makes "cat" not use copy_file_range(2).
> > ! > > (copy_file_range(2) is interruptible, but that stops the file copy.)
> > ! > > It also has a "return after 1sec" option.
> > ! > > Maybe that option should be exposed to userland and used by
> > ! > > "cat", "cp" and friends at least when enabled by a command line
> > ! > > option. (I'll admit looking at a file while it is being copied is a bit odd?)
> > ! > > The whole idea behind range-lock is to prevent a read/write syscall
> > ! > > from seeing a partial write. It just happens that the "write" takes a long
> > ! > > time in this case.
> > ! > >
> > ! > > Do others have thoughts on this? rick
> > ! > >
> > ! > >>
> > ! > >> Only if I introduce another process, the tie is avoided:
> > ! > >>
> > ! > >> $ cat /nfsshare/File | cat > /localfs/File &
> > ! > >> $ cat /localfs/File  --> WORKS
> > ! > >>
> > ! > >> I very much doubt that this is how it should be.
> > ! > >>
> > ! > >> Also, if I try to get some information about the supposed operation
> > ! > >> of this "rangelock" feature, search engines point me to a
> > ! > >> "rangelock(9)" manpage on man.freebsd.org, but that page doesn't
> > ! > >> seem to exist. :(
> > ! > >>
> > ! > >
> > ! >
> >
> >
> 

