Date:      Wed, 23 Sep 2020 11:52:43 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>, Konstantin Belousov <kib@freebsd.org>
Subject:   Re: RFC: copy_file_range(3)
Message-ID:  <CAOtMX2gSc8EF-GCeiDhq3zmQzSXicb2haT_RzvG4XosgrH0Ugg@mail.gmail.com>
In-Reply-To: <YTBPR01MB39666626FF10803E5D4EF3D2DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
References:  <CAOtMX2iFZZpoj+ap21rrju4hJoip6ZoyxEiCB8852NeH7DAN0Q@mail.gmail.com> <YTBPR01MB39666188FC89399B0D632FE8DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM> <CAOtMX2gMYdcx0CUC1Mky3ETFr1JkBbYzn17i11axSW=HRTL7OA@mail.gmail.com> <YTBPR01MB3966C1D4D10BE836B37955F5DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM> <CAOtMX2jHMRD0Hno03f2dqjJToR152u8d-_40GM_+BvNPkN_smA@mail.gmail.com> <YTBPR01MB3966BA18F43F7B6353171E67DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM> <YTBPR01MB39666626FF10803E5D4EF3D2DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>

On Wed, Sep 23, 2020 at 9:08 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Rick Macklem wrote:
> >Alan Somers wrote:
> >[lots of stuff snipped]
> >>1) In order to quickly respond to a signal, a program must use a modest
> >>len with copy_file_range
> >For the programs you have mentioned, I think the only signal handling
> >would be termination (<ctrl>C or SIGTERM if you prefer).
> >I'm not sure what is a reasonable response time for this.
> >I'd like to hear comments from others?
> >- 1sec, less than 1sec, a few seconds, ...
> >
> >> 2) If a hole is larger than len, that will cause
> >> vn_generic_copy_file_range to truncate the output file to the middle of
> >> the hole.  Then, in the next invocation, truncate it again to a larger
> >> size.
> >> 3) The result is a file that is not as sparse as the original.
> >Yes. So, the trick is to use the largest "len" you can live with, given
> >how long you are willing to wait for signal processing.
> >
> >> For example, on UFS:
> >> $ truncate -s 1g sparsefile
> >Not a very interesting sparse file. I wrote a little program to create
> >one.
> >> $ cp sparsefile sparsefile2
> >> $ du -sh sparsefile*
> >>  96K sparsefile
> >>  32M sparsefile2
> Btw, this happens because, at least for UFS (not sure about other file
> systems), if you grow a file's size via VOP_SETATTR() of size, it allocates
> a block at the new EOF, even though no data has been written there.
> --> This results in one block being allocated at the end of the range used
>     for a copy_file_range() call, if that file offset is within a hole.
>     --> The larger the "len" argument, the less frequently it will occur.
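
(A quick way to see the behaviour Rick describes, as an illustrative sketch
rather than one of the attached programs: grow an empty file with
ftruncate(2), which the file system services as a VOP_SETATTR() of size, and
look at st_blocks. The "setattr_demo" file name and the 1 Mbyte size are
arbitrary; run it on the UFS mount.)

#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct stat sb;
	int fd;

	if ((fd = open("setattr_demo", O_RDWR | O_CREAT | O_TRUNC, 0644)) == -1)
		err(1, "open");
	if (ftruncate(fd, 1024 * 1024) == -1)	/* grow to 1 Mbyte, writing nothing */
		err(1, "ftruncate");
	if (fstat(fd, &sb) == -1)
		err(1, "fstat");
	/* Per the description above, UFS should report a block allocated at
	 * the new EOF; st_blocks counts 512-byte blocks.  Other file systems
	 * may report zero. */
	printf("size %jd, blocks %jd\n",
	    (intmax_t)sb.st_size, (intmax_t)sb.st_blocks);
	return (0);
}
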
>
> >>
> >> My idea for a userland wrapper would solve this problem by using
> >> SEEK_HOLE/SEEK_DATA to copy holes in their entirety, and use
> >> copy_file_range for everything else with a modest len.  Alternatively,
> >> we could eliminate the need for the wrapper by enabling copy_file_range
> >> for every file system, and making vn_generic_copy_file_range
> >> interruptible, so copy_file_range can be called with large len without
> >> penalizing signal handling performance.
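
(For concreteness, here is a rough sketch of that wrapper loop.  It is
hypothetical, not the actual copya program or the cp patch: it assumes
lseek(2) with SEEK_DATA/SEEK_HOLE plus copy_file_range(2), uses a 2 Mbyte
chunk like the copya test below, and keeps error handling to a minimum.
The sparse_copy() name and CHUNK value are illustrative.)

#include <sys/param.h>	/* MIN() */
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

#define	CHUNK	(2 * 1024 * 1024)	/* "modest len": 2 Mbytes, as in copya */

static int
sparse_copy(int infd, int outfd)
{
	off_t data, hole, insize, off, inoff, outoff;
	ssize_t copied;

	if ((insize = lseek(infd, 0, SEEK_END)) == -1)
		return (-1);
	off = 0;
	while (off < insize) {
		/* Find the next data segment and the hole that ends it. */
		if ((data = lseek(infd, off, SEEK_DATA)) == -1)
			break;		/* only a trailing hole remains */
		if ((hole = lseek(infd, data, SEEK_HOLE)) == -1)
			return (-1);
		/* Recreate the hole [off, data) by just extending the output. */
		if (data > off && ftruncate(outfd, data) == -1)
			return (-1);
		/* Copy the data segment [data, hole) in CHUNK-sized pieces. */
		off = data;
		while (off < hole) {
			inoff = outoff = off;
			copied = copy_file_range(infd, &inoff, outfd, &outoff,
			    MIN((size_t)(hole - off), CHUNK), 0);
			if (copied <= 0)	/* a real copy would retry on EINTR */
				return (copied < 0 ? -1 : 0);
			off += copied;
		}
	}
	/* Preserve a hole at the end of the file, if any. */
	return (ftruncate(outfd, insize));
}

int
main(int argc, char **argv)
{
	int infd, outfd;

	if (argc != 3)
		errx(1, "usage: %s infile outfile", argv[0]);
	if ((infd = open(argv[1], O_RDONLY)) == -1 ||
	    (outfd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
		err(1, "open");
	if (sparse_copy(infd, outfd) == -1)
		err(1, "sparse_copy");
	return (0);
}
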
> >
> >Well, I ran some quick benchmarks using the attached programs, plus "cp"
> >both before and with your copy_file_range() patch.
> >copya - Does what I think your plan is above, with a limit of 2Mbytes for
> >"len".
> >copyb - Just uses copy_file_range() with 128Mbytes for "len".
> >
> >I first created the sparse file with createsparse.c. It is admittedly a
> >worst case, creating alternating holes and data blocks of the minimum size
> >supported by the file system. (I ran it on a UFS file system created with
> >defaults, so the minimum hole size is 32Kbytes.)
> >The file is 1Gbyte in size with an Allocation size of 524576 ("ls -ls").
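
(The attached createsparse.c is not reproduced in the archive.  A minimal
sketch of such a generator, assuming the 32 Kbyte UFS allocation size
mentioned above and a 1 Gbyte target, could look like this; the 'x' fill
byte and the exact layout are illustrative.)

#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define	BLKSIZE		(32 * 1024)	/* minimum UFS hole size from above */
#define	FILESIZE	((off_t)1 << 30)	/* 1 Gbyte */

int
main(int argc, char **argv)
{
	char buf[BLKSIZE];
	off_t off;
	int fd;

	if (argc != 2)
		errx(1, "usage: %s filename", argv[0]);
	if ((fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
		err(1, "open");
	memset(buf, 'x', sizeof(buf));
	/* Write a 32K data block, then skip 32K to leave a hole. */
	for (off = 0; off < FILESIZE; off += 2 * BLKSIZE)
		if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
			err(1, "pwrite");
	/* Extend to the full size so the file ends in a hole. */
	if (ftruncate(fd, FILESIZE) == -1)
		err(1, "ftruncate");
	return (0);
}
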
> >
> >I then ran copya, copyb, old-cp and new-cp. For NFS, I redid the mount
> >before each copy to avoid data caching in the client.
> >Here's what I got:
> >                   Elapsed time    #RPCs                   Allocation size ("ls -ls" on server)
> >NFSv4.2
> >copya              39.7sec        16384copy+32768seek      524576
> >copyb              10.2sec        104copy                  524576
> When I ran the tests I had vfs.nfs.maxcopyrange set to 128Mbytes on the
> server. However it was still the default of 10Mbytes on the client, so this
> test run used 10Mbytes per Copy. (I wondered why it did 104 Copies?)
> With both set to 128Mbytes I got:
> copyb               10.0sec        8copy                    524576
> >old-cp             21.9sec        16384read+16384write     1048864
> >new-cp             10.5sec        1024copy                 524576
> >
> >NFSv4.1
> >copya              21.8sec        16384read+16384write     1048864
> >copyb              21.0sec        16384read+16384write     1048864
> >old-cp             21.8sec        16384read+16384write     1048864
> >new-cp             21.4sec        16384read+16384write     1048864
> >
> >Local on the UFS file system
> >copya               9.2sec        n/a                      524576
> This turns out to be just variability in the test. I get 7.9sec->9.2sec
> for runs of all three of copya, copyb and new-cp for UFS.
> I think it is caching related, since I wasn't unmounting/remounting the
> UFS file system between test runs.
> >copyb               8.0sec        n/a                      524576
> >old-cp             15.9sec        n/a                      1048864
> >new-cp              7.9sec        n/a                      524576
> >
> >So, for a NFSv4.2 mount, using SEEK_DATA/SEEK_HOLE is definitely
> >a performance hit, due to all the RPC rtts.
> >Your patched "cp" does fine, although a larger "len" reduces the
> >RPC count against the server.
> >All variants using copy_file_range() retain the holes.
> >
> >For NFSv4.1, it (not surprisingly) doesn't matter, since only NFSv4.2
> >supports SEEK_DATA/SEEK_HOLE and VOP_COPY_FILE_RANGE().
> >
> >For UFS, everything using copy_file_range() works pretty well and
> >retains the holes.
>
> >Although "copya" is guaranteed to retain the holes, it does run noticeably
> >slower than the others. Not sure why? Do the extra SEEK_DATA/SEEK_HOLE
> >syscalls cost that much?
> Ignore this. It was just variability in the test runs.
>
> >The limitation of not using SEEK_DATA/SEEK_HOLE is that you will not
> >retain holes that straddle the byte range copied by two subsequent
> >copy_file_range(2) calls.
> This statement is misleading. These holes are partially retained, but there
> will be a block allocated (at least for UFS) at the boundary, due to the
> property of growing a file via VOP_SETATTR(size) as noted above.
>
> >--> This can be minimized by using a large "len", but that large "len"
> >      results in slower response to signal handling.
> I'm going to play with "len" today and come up with some numbers
> w.r.t. signal handling response time vs. the copy_file_range() "len"
> argument.
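
(One hypothetical way to take that measurement, not one of the attached
programs: schedule a signal to arrive while a single large copy_file_range(2)
call is in flight and see how long the call keeps running.  SIGALRM stands in
for SIGINT/SIGTERM, and the 0.1 second timer below is arbitrary.)

#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;

static void
handler(int sig)
{
	(void)sig;
	got_signal = 1;
}

int
main(int argc, char **argv)
{
	struct sigaction sa;
	struct itimerval itv;
	struct timespec start, end;
	ssize_t ret;
	size_t len;
	int infd, outfd;

	if (argc != 4)
		errx(1, "usage: %s infile outfile len", argv[0]);
	len = strtoull(argv[3], NULL, 0);
	if ((infd = open(argv[1], O_RDONLY)) == -1 ||
	    (outfd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1)
		err(1, "open");

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;		/* deliberately no SA_RESTART */
	sigaction(SIGALRM, &sa, NULL);
	memset(&itv, 0, sizeof(itv));
	itv.it_value.tv_usec = 100000;		/* signal fires 0.1sec into the copy */
	setitimer(ITIMER_REAL, &itv, NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	ret = copy_file_range(infd, NULL, outfd, NULL, len, 0);
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* The gap between 0.1sec and the total call time is roughly how long
	 * the signal had to wait for this "len". */
	printf("copy_file_range returned %zd, signal %s, call took %.3f sec\n",
	    ret, got_signal ? "delivered" : "still pending",
	    (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9);
	return (0);
}
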
>
> >I've attached the little programs, so you can play with them.
> >(Maybe try different sparse schemes/sizes? It might be fun to
> > make the holes/blocks some random multiple of hole size up
> > to a limit?)
> >
> >rick
> >ps: In case he isn't reading hackers these days, I've added kib@
> >      as a cc. He might know why UFS is 15% slower when SEEK_HOLE/
> >      SEEK_DATA is used.
>

So it sounds like your main point is that for file systems with special
support, copy_file_range(2) is more efficient for many sparse files than
SEEK_HOLE/SEEK_DATA.  Sure, I buy that.  And secondarily, you don't see any
reason not to increase the len argument in commands like cp up to somewhere
around 128 MB, delaying signal handling for about 1 second on a typical
desktop (maybe set it lower on embedded arches).  And you think it's fine
to allow copy_file_range on devfs, as long as the len argument is clipped
at some finite value.  If we make all of those changes, are there any other
reasons why the write/read fallback path would be needed?
-Alan
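
(For reference, the read/write fallback under discussion has roughly this
shape.  The sketch is hypothetical rather than the actual cp(1) patch; the
copy_fd() helper name, the 128 Mbyte clip on len and the list of errno values
that trigger the fallback are illustrative assumptions.)

#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

#define	MAXLEN	(128 * 1024 * 1024)	/* clip "len" at 128 Mbytes, per the discussion */

/* Copy infd to outfd, preferring copy_file_range(2) and dropping back to a
 * plain read/write loop only if the kernel or file system rejects it. */
static int
copy_fd(int infd, int outfd)
{
	char buf[64 * 1024];
	ssize_t ncopy, nread, nwritten, done;
	bool use_cfr = true;

	for (;;) {
		if (use_cfr) {
			ncopy = copy_file_range(infd, NULL, outfd, NULL,
			    MAXLEN, 0);
			if (ncopy == 0)
				return (0);		/* EOF */
			if (ncopy > 0)
				continue;
			/* Which errors should trigger the fallback is the
			 * open question; this list is illustrative only. */
			if (errno != ENOSYS && errno != EINVAL &&
			    errno != EXDEV && errno != EOPNOTSUPP)
				return (-1);
			use_cfr = false;
		}
		if ((nread = read(infd, buf, sizeof(buf))) == 0)
			return (0);			/* EOF */
		if (nread < 0)
			return (-1);
		for (done = 0; done < nread; done += nwritten)
			if ((nwritten = write(outfd, buf + done,
			    nread - done)) < 0)
				return (-1);
	}
}
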


