Date:      Wed, 23 Sep 2020 15:08:08 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Alan Somers <asomers@freebsd.org>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>, Konstantin Belousov <kib@FreeBSD.org>
Subject:   Re: RFC: copy_file_range(3)
Message-ID:  <YTBPR01MB39666626FF10803E5D4EF3D2DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <YTBPR01MB3966BA18F43F7B6353171E67DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
References:  <CAOtMX2iFZZpoj%2Bap21rrju4hJoip6ZoyxEiCB8852NeH7DAN0Q@mail.gmail.com> <YTBPR01MB39666188FC89399B0D632FE8DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM> <CAOtMX2gMYdcx0CUC1Mky3ETFr1JkBbYzn17i11axSW=HRTL7OA@mail.gmail.com> <YTBPR01MB3966C1D4D10BE836B37955F5DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>, <CAOtMX2jHMRD0Hno03f2dqjJToR152u8d-_40GM_%2BBvNPkN_smA@mail.gmail.com>, <YTBPR01MB3966BA18F43F7B6353171E67DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>

Rick Macklem wrote:
>Alan Somers wrote:
>[lots of stuff snipped]
>>1) In order to quickly respond to a signal, a program must use a modest len with
>>copy_file_range
>For the programs you have mentioned, I think the only signal handling would
>be termination (<ctrl>C or SIGTERM if you prefer).
>I'm not sure what is a reasonable response time for this.
>I'd like to hear comments from others?
>- 1sec, less than 1sec, a few seconds, ...
>
>> 2) If a hole is larger than len, that will cause vn_generic_copy_file_range to
>> truncate the output file to the middle of the hole.  Then, in the next invocation,
>> truncate it again to a larger size.
>> 3) The result is a file that is not as sparse as the original.
>Yes. So, the trick is to use the largest "len" you can live with, given how long you
>are willing to wait for signal processing.
>
>> For example, on UFS:
>> $ truncate -s 1g sparsefile
>Not a very interesting sparse file. I wrote a little program to create one.
>> $ cp sparsefile sparsefile2
>> $ du -sh sparsefile*
>>  96K sparsefile
>>  32M sparsefile2
Btw, this happens because, at least for UFS (not sure about other file
systems), if you grow a file's size via VOP_SETATTR() of size, it allocates a
block at the new EOF, even though no data has been written there.
--> This results in one block being allocated at the end of the range used
    for a copy_file_range() call, if that file offset is within a hole.
    --> The larger the "len" argument, the less frequently it will occur.
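
One way to observe this from userland is sketched below. It is only an
illustration under assumptions: it assumes ftruncate(2) ends up setting the
size via VOP_SETATTR(), and the helper name grow_and_count_blocks() is made
up here. It extends an empty file without writing any data and reports
st_blocks. Per the explanation above a nonzero count would be expected on
UFS; other file systems may well report 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) "path", extend it to "newsize"
 * bytes without writing any data, and report how many blocks the file
 * system says are allocated (st_blocks, 512-byte units on FreeBSD).
 * Returns 0 on success, -1 on error.
 */
static int
grow_and_count_blocks(const char *path, off_t newsize, blkcnt_t *blocks)
{
	struct stat sb;
	int fd;

	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
	if (fd == -1)
		return (-1);
	/* Grow the file by setting its size only; no write() is done. */
	if (ftruncate(fd, newsize) == -1 || fstat(fd, &sb) == -1) {
		close(fd);
		return (-1);
	}
	close(fd);
	assert(sb.st_size == newsize);
	*blocks = sb.st_blocks;
	return (0);
}
```

Calling it with, say, a 1Mbyte size on a freshly created file and printing
the returned block count shows whether the file system allocated anything
at the new EOF.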

>>
>> My idea for a userland wrapper would solve this problem by using
>> SEEK_HOLE/SEEK_DATA to copy holes in their entirety, and use copy_file_range for
>> everything else with a modest len.  Alternatively, we could eliminate the need for
>> the wrapper by enabling copy_file_range for every file system, and making
>> vn_generic_copy_file_range interruptible, so copy_file_range can be called with
>> large len without penalizing signal handling performance.
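
The wrapper idea can be sketched roughly as follows. This is an illustration
under assumptions, not any program from this thread: copy_sparse() and
CHUNK_LEN are hypothetical names, 2Mbytes is just one choice of "modest" len,
and the file system is assumed to support SEEK_DATA/SEEK_HOLE (where it does
not, lseek(2) may fail or report the whole file as data).

```c
#define _GNU_SOURCE	/* glibc needs this for copy_file_range()/SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define	CHUNK_LEN	(2 * 1024 * 1024)	/* assumed "modest" len */

/*
 * Hypothetical wrapper: copy infd to outfd, skipping holes found with
 * SEEK_DATA/SEEK_HOLE and copying each data region with copy_file_range()
 * in pieces of at most CHUNK_LEN bytes.  Returns 0 on success, -1 on error.
 */
static int
copy_sparse(int infd, int outfd)
{
	off_t data, hole, end, inoff, outoff;
	ssize_t n;
	size_t len;

	end = lseek(infd, 0, SEEK_END);
	if (end == -1)
		return (-1);
	for (hole = 0; hole < end; ) {
		/* Find the next data region at or after the last hole. */
		data = lseek(infd, hole, SEEK_DATA);
		if (data == -1) {
			if (errno == ENXIO)	/* only a hole is left */
				break;
			return (-1);
		}
		hole = lseek(infd, data, SEEK_HOLE);
		if (hole == -1)
			return (-1);
		assert(hole > data);
		/* Copy [data, hole) at the same offset in the output. */
		inoff = outoff = data;
		while (inoff < hole) {
			len = (hole - inoff > CHUNK_LEN) ? CHUNK_LEN :
			    (size_t)(hole - inoff);
			n = copy_file_range(infd, &inoff, outfd, &outoff,
			    len, 0);
			if (n <= 0)	/* error or unexpected EOF */
				return (-1);
		}
	}
	/* Set the final size once, so a trailing hole is not written out. */
	return (ftruncate(outfd, end));
}
```

Since the offsets are passed by pointer, the file offsets of the two
descriptors are left alone, and a caller could check for a pending signal
between the CHUNK_LEN sized calls.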
>
>Well, I ran some quick benchmarks using the attached programs, plus "cp" both
>before and with your copy_file_range() patch.
>copya - Does what I think your plan is above, with a limit of 2Mbytes for "len".
>copyb - Just uses copy_file_range() with 128Mbytes for "len".
>
>I first created the sparse file with createsparse.c. It is admittedly a worst case,
>creating alternating holes and data blocks of the minimum size supported by
>the file system. (I ran it on a UFS file system created with defaults, so the minimum
>hole size is 32Kbytes.)
>The file is 1Gbyte in size with an Allocation size of 524576 ("ls -ls").
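
The attachment itself is not reproduced here. A minimal sketch of how such a
file might be generated is shown below; it is an assumption about the
approach, not the actual createsparse.c. The name create_alternating() is
made up, and starting with a data block rather than a hole is an arbitrary
choice. Whether the skipped ranges really end up as holes depends on the
file system and on "blk" matching its minimum hole size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical generator: create "path" with a size of "total" bytes,
 * writing every other "blk"-sized block and skipping the ones in between,
 * so that the skipped blocks can be left as holes by the file system.
 * Returns 0 on success, -1 on error.
 */
static int
create_alternating(const char *path, off_t total, size_t blk)
{
	char *buf;
	off_t off;
	size_t n;
	int fd, ret = -1;

	assert(blk > 0);
	buf = malloc(blk);
	if (buf == NULL)
		return (-1);
	memset(buf, 'x', blk);		/* non-zero data for the data blocks */
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd == -1)
		goto done;
	/* Write blocks 0, 2, 4, ...; blocks 1, 3, 5, ... are never written. */
	for (off = 0; off < total; off += 2 * (off_t)blk) {
		n = (total - off < (off_t)blk) ? (size_t)(total - off) : blk;
		if (pwrite(fd, buf, n, off) != (ssize_t)n)
			goto done;
	}
	/* Make sure the size is "total" even if the file ends in a hole. */
	if (ftruncate(fd, total) == 0)
		ret = 0;
done:
	if (fd != -1)
		close(fd);
	free(buf);
	return (ret);
}
```

For the case described above, the call would be something like
create_alternating("sparsefile", (off_t)1 << 30, 32768).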
>
>I then ran copya, copyb, old-cp and new-cp. For NFS, I redid the mount before
>each copy to avoid data caching in the client.
>Here's what I got:
>          Elapsed time  #RPCs                   Allocation size ("ls -ls" on server)
>NFSv4.2
>copya     39.7sec       16384copy+32768seek     524576
>copyb     10.2sec       104copy                 524576
When I ran the tests I had vfs.nfs.maxcopyrange set to 128Mbytes on the
server. However it was still the default of 10Mbytes on the client,
so this test run used 10Mbytes per Copy. (I wondered why it did 104 Copies?)
With both set to 128Mbytes I got:
copyb      10.0sec       8copy                   524576
>old-cp    21.9sec       16384read+16384write    1048864
>new-cp    10.5sec       1024copy                524576
>
>NFSv4.1
>copya     21.8sec       16384read+16384write    1048864
>copyb     21.0sec       16384read+16384write    1048864
>old-cp    21.8sec       16384read+16384write    1048864
>new-cp    21.4sec       16384read+16384write    1048864
>
>Local on the UFS file system
>copya     9.2sec        n/a                     524576
This turns out to be just variability in the test. I get 7.9sec->9.2sec
for runs of all three of copya, copyb and new-cp for UFS.
I think it is caching related, since I wasn't unmounting/remounting the
UFS file system between test runs.
>copyb     8.0sec        n/a                     524576
>old-cp    15.9sec       n/a                     1048864
>new-cp    7.9sec        n/a                     524576
>
>So, for an NFSv4.2 mount, using SEEK_DATA/SEEK_HOLE is definitely
>a performance hit, due to all the RPC RTTs.
>Your patched "cp" does fine, although a larger "len" reduces the
>RPC count against the server.
>All variants using copy_file_range() retain the holes.
>
>For NFSv4.1, it (not surprisingly) doesn't matter, since only NFSv4.2
>supports SEEK_DATA/SEEK_HOLE and VOP_COPY_FILE_RANGE().
>
>For UFS, everything using copy_file_range() works pretty well and
>retains the holes.

>Although "copya" is guaranteed to retain the holes, it does run noticeably
>slower than the others. Not sure why? Do the extra SEEK_DATA/SEEK_HOLE
>syscalls cost that much?
Ignore this. It was just variability in the test runs.

>The limitation of not using SEEK_DATA/SEEK_HOLE is that you will not
>retain holes that straddle the byte range copied by two subsequent
>copy_file_range(2) calls.
This statement is misleading. These holes are partially retained, but there
will be a block allocated (at least for UFS) at the boundary, due to the property of
growing a file via VOP_SETATTR(size) as noted above.

>--> This can be minimized by using a large "len", but that large "len"
>      results in slower response to signal handling.
I'm going to play with "len" today and come up with some numbers
w.r.t. signal handling response time vs the copy_file_range() "len" argument.
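
For reference, the kind of loop where this trade-off shows up can be sketched
as below. It is a sketch under assumptions, not one of the test programs: the
names stop_requested, install_term_handlers() and copy_chunked() are made up,
and the handler only sets a flag that is polled between calls. If a single
copy_file_range() call cannot be interrupted, the worst-case delay before
the flag is noticed is roughly the time one call takes to copy "len" bytes.

```c
#define _GNU_SOURCE	/* glibc needs this for copy_file_range() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested;

static void
on_term(int sig)
{
	(void)sig;
	stop_requested = 1;	/* only set a flag; the copy loop polls it */
}

/* Catch SIGINT and SIGTERM without SA_RESTART.  Returns 0 or -1. */
static int
install_term_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_term;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) == -1 ||
	    sigaction(SIGTERM, &sa, NULL) == -1)
		return (-1);
	return (0);
}

/*
 * Copy "src" to "dst" with at most "len" bytes per copy_file_range() call,
 * stopping early if a termination signal was seen between calls.
 * Returns the number of bytes copied, or -1 on error.
 */
static off_t
copy_chunked(const char *src, const char *dst, size_t len)
{
	off_t total = 0;
	ssize_t n;
	int infd, outfd;

	assert(len > 0);
	infd = open(src, O_RDONLY);
	if (infd == -1)
		return (-1);
	outfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (outfd == -1) {
		close(infd);
		return (-1);
	}
	while (!stop_requested) {
		/* NULL offsets: use and advance the descriptors' offsets. */
		n = copy_file_range(infd, NULL, outfd, NULL, len, 0);
		if (n == -1 && errno == EINTR)
			continue;	/* go back and re-check the flag */
		if (n == -1) {
			total = -1;
			break;
		}
		if (n == 0)		/* EOF on the input file */
			break;
		total += n;
	}
	close(infd);
	close(outfd);
	return (total);
}
```

Timing how long such a loop takes to return after a <ctrl>C, for a few
values of "len", is one way to get the numbers mentioned above.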

>I've attached the little programs, so you can play with them.
>(Maybe try different sparse schemes/sizes? It might be fun to
> make the holes/blocks some random multiple of hole size up
> to a limit?)
>
>rick
>ps: In case he isn't reading hackers these days, I've added kib@
>      as a cc. He might know why UFS is 15% slower when
>      SEEK_HOLE/SEEK_DATA is used.

rick

-Alan


