Date:      Wed, 23 Sep 2020 01:18:18 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Alan Somers <asomers@freebsd.org>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>, Konstantin Belousov <kib@FreeBSD.org>
Subject:   Re: RFC: copy_file_range(3)
Message-ID:  <YTBPR01MB3966BA18F43F7B6353171E67DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAOtMX2jHMRD0Hno03f2dqjJToR152u8d-_40GM_%2BBvNPkN_smA@mail.gmail.com>
References:  <CAOtMX2iFZZpoj%2Bap21rrju4hJoip6ZoyxEiCB8852NeH7DAN0Q@mail.gmail.com> <YTBPR01MB39666188FC89399B0D632FE8DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM> <CAOtMX2gMYdcx0CUC1Mky3ETFr1JkBbYzn17i11axSW=HRTL7OA@mail.gmail.com> <YTBPR01MB3966C1D4D10BE836B37955F5DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>, <CAOtMX2jHMRD0Hno03f2dqjJToR152u8d-_40GM_%2BBvNPkN_smA@mail.gmail.com>

Alan Somers wrote:
[lots of stuff snipped]
> 1) In order to quickly respond to a signal, a program must use a modest len with
> copy_file_range
For the programs you have mentioned, I think the only signal handling would
be termination (<ctrl>C or SIGTERM if you prefer).
I'm not sure what a reasonable response time for this is.
I'd like to hear comments from others.
- 1sec, less than 1sec, a few seconds, ...

> 2) If a hole is larger than len, that will cause vn_generic_copy_file_range to
> truncate the output file to the middle of the hole.  Then, in the next invocation,
> truncate it again to a larger size.
> 3) The result is a file that is not as sparse as the original.
Yes. So, the trick is to use the largest "len" you can live with, given how long you
are willing to wait for signal processing.

> For example, on UFS:
> $ truncate -s 1g sparsefile
Not a very interesting sparse file. I wrote a little program to create one.
> $ cp sparsefile sparsefile2
> $ du -sh sparsefile*
>  96K sparsefile
>  32M sparsefile2
>
> My idea for a userland wrapper would solve this problem by using
> SEEK_HOLE/SEEK_DATA to copy holes in their entirety, and use copy_file_range for
> everything else with a modest len.  Alternatively, we could eliminate the need for
> the wrapper by enabling copy_file_range for every file system, and making
> vn_generic_copy_file_range interruptible, so copy_file_range can be called with
> large len without penalizing signal handling performance.

Well, I ran some quick benchmarks using the attached programs, plus "cp" both
before and after applying your copy_file_range() patch.
copya - Does what I think your plan is above, with a limit of 2Mbytes for "len".
copyb - Just uses copy_file_range() with 128Mbytes for "len".

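Since copya.c is not reproduced in this message, the following is only a
minimal sketch of the approach described for copya, assuming a system that
provides SEEK_DATA/SEEK_HOLE and copy_file_range(2). The names sparse_copy()
and CHUNK are made up for illustration; this is not the program that produced
the timings in this message.

```c
#define _GNU_SOURCE	/* glibc needs this for copy_file_range/SEEK_HOLE; harmless on FreeBSD */
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define	CHUNK	(2 * 1024 * 1024)	/* the 2Mbyte "len" limit mentioned for copya */

/*
 * Hypothetical helper: copy infd to outfd, skipping holes found with
 * SEEK_DATA/SEEK_HOLE and copying each data region in pieces of at most
 * CHUNK bytes. Returns 0 on success, -1 with errno set on failure.
 */
static int
sparse_copy(int infd, int outfd)
{
	struct stat st;
	off_t data, hole, inoff, outoff;
	ssize_t n;
	size_t len;

	if (fstat(infd, &st) < 0)
		return (-1);
	data = 0;
	while (data < st.st_size) {
		/* Find the next data region; ENXIO means only a hole is left. */
		data = lseek(infd, data, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break;
			return (-1);
		}
		hole = lseek(infd, data, SEEK_HOLE);
		if (hole < 0)
			return (-1);
		/* Copy [data, hole) to the same offsets in the output file. */
		inoff = outoff = data;
		while (inoff < hole) {
			len = (hole - inoff > CHUNK) ? CHUNK : (size_t)(hole - inoff);
			/* With non-NULL offsets the call advances inoff/outoff itself. */
			n = copy_file_range(infd, &inoff, outfd, &outoff, len, 0);
			if (n < 0)
				return (-1);
			if (n == 0)
				break;	/* unexpected EOF: the input shrank */
		}
		data = hole;
	}
	/* Extend the output over any trailing hole. */
	return (ftruncate(outfd, st.st_size));
}
```

Each data region costs two lseek(2) calls on top of the copy calls, which is
consistent with the 32768 seeks reported for copya over NFSv4.2.
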
I first created the sparse file with createsparse.c. It is admittedly a worst case,
creating alternating holes and data blocks of the minimum size supported by
the file system. (I ran it on a UFS file system created with defaults, so the minimum
hole size is 32Kbytes.)
The file is 1Gbyte in size with an Allocation size of 524576 ("ls -ls").

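For anyone who wants to check sparseness from a program rather than with
"ls -ls" or "du", a small sketch using stat(2) follows. file_allocation() is
a hypothetical name, and the code assumes st_blocks counts 512-byte units,
which holds on FreeBSD and Linux but is not guaranteed elsewhere.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report a file's logical size and allocated bytes.
 * Assumes st_blocks is in 512-byte units (true on FreeBSD and Linux).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
file_allocation(const char *path, off_t *size, off_t *alloc)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return (-1);
	*size = st.st_size;
	*alloc = (off_t)st.st_blocks * 512;
	return (0);
}
```

A copy has kept its holes if its allocated byte count stays close to that of
the original, rather than growing toward the logical size.
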
I then ran copya, copyb, old-cp and new-cp. For NFS, I redid the mount before
each copy to avoid data caching in the client.
Here's what I got:
            Elapsed time   #RPCs                   Allocation size ("ls -ls" on server)
NFSv4.2
copya       39.7sec        16384copy+32768seek     524576
copyb       10.2sec        104copy                 524576
old-cp      21.9sec        16384read+16384write    1048864
new-cp      10.5sec        1024copy                524576

NFSv4.1
copya       21.8sec        16384read+16384write    1048864
copyb       21.0sec        16384read+16384write    1048864
old-cp      21.8sec        16384read+16384write    1048864
new-cp      21.4sec        16384read+16384write    1048864

Local on the UFS file system
copya       9.2sec         n/a                     524576
copyb       8.0sec         n/a                     524576
old-cp      15.9sec        n/a                     1048864
new-cp      7.9sec         n/a                     524576

So, for a NFSv4.2 mount, using SEEK_DATA/SEEK_HOLE is definitely
a performance hit, due to all the RPC rtts.
Your patched "cp" does fine, although a larger "len" reduces the
RPC count against the server.
All variants using copy_file_range() retain the holes.

For NFSv4.1, it (not surprisingly) doesn't matter, since only NFSv4.2
supports SEEK_DATA/SEEK_HOLE and VOP_COPY_FILE_RANGE().

For UFS, everything using copy_file_range() works pretty well and
retains the holes.
Although "copya" is guaranteed to retain the holes, it does run noticeably
slower than the others. Not sure why? Do the extra SEEK_DATA/SEEK_HOLE
syscalls cost that much?

The limitation of not using SEEK_DATA/SEEK_HOLE is that you will not
retain holes that straddle the boundary between the byte ranges copied
by two consecutive copy_file_range(2) calls.
--> This can be minimized by using a large "len", but that large "len"
      results in slower response to signal handling.

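To make that trade-off concrete, here is a rough sketch of a copy loop that
only looks for a termination signal between copy_file_range() calls, so the
response time is bounded by roughly how long one call of "len" bytes takes.
It assumes SIGINT/SIGTERM are the only signals of interest;
copy_interruptible(), on_signal() and install_handlers() are hypothetical
names, not anything from cp(1) or the attached programs.

```c
#define _GNU_SOURCE	/* glibc needs this for copy_file_range; harmless on FreeBSD */
#include <sys/types.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;	/* set by on_signal() */

static void
on_signal(int sig)
{
	(void)sig;
	got_signal = 1;
}

/* Route SIGINT and SIGTERM to on_signal(). */
static void
install_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGINT, &sa, NULL);
	sigaction(SIGTERM, &sa, NULL);
}

/*
 * Copy from infd to outfd, starting at their current file offsets, in
 * pieces of at most "len" bytes. The signal flag is checked only between
 * calls, so a larger "len" means fewer calls but a slower reaction.
 * Returns the bytes copied, or -1 with errno set (EINTR when a signal
 * stopped the copy).
 */
static off_t
copy_interruptible(int infd, int outfd, size_t len)
{
	off_t total = 0;
	ssize_t n;

	while (!got_signal) {
		n = copy_file_range(infd, NULL, outfd, NULL, len, 0);
		if (n < 0)
			return (-1);
		if (n == 0)
			return (total);	/* EOF */
		total += n;
	}
	errno = EINTR;
	return (-1);
}
```

With the 128Mbyte "len" used by copyb the flag would be examined rarely; with
a much smaller "len" it is examined often, but holes larger than "len" may
not survive the copy.
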
I've attached the little programs, so you can play with them.
(Maybe try different sparse schemes/sizes? It might be fun to
 make the holes/blocks some random multiple of hole size up
 to a limit?)

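One possible way to try that, as a sketch only: the function below alternates
holes and data runs whose lengths are random multiples of a caller-supplied
unit. create_random_sparse() and its parameters are hypothetical; on FreeBSD
the unit would presumably come from fpathconf(fd, _PC_MIN_HOLE_SIZE), as in
createsparse.c, but it is passed in here so the sketch does not depend on
that constant being available.

```c
#define _GNU_SOURCE	/* for pwrite/random on glibc; harmless on FreeBSD */
#include <sys/types.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical variant of createsparse: alternate holes and data runs, each
 * a random multiple (1..maxmult) of "unit" bytes, until "total" bytes.
 * Returns 0 on success, -1 on failure.
 */
static int
create_random_sparse(int fd, off_t total, long unit, int maxmult, unsigned seed)
{
	char *buf;
	off_t pos, run;
	ssize_t n;

	if (unit <= 0 || maxmult <= 0)
		return (-1);
	buf = malloc((size_t)unit);
	if (buf == NULL)
		return (-1);
	memset(buf, 'a', (size_t)unit);
	srandom(seed);
	pos = 0;
	while (pos < total) {
		/* Leave a hole of 1..maxmult units by skipping ahead. */
		pos += (off_t)(random() % maxmult + 1) * unit;
		if (pos >= total)
			break;
		/* Then write a data run of 1..maxmult units, clipped to total. */
		run = (off_t)(random() % maxmult + 1) * unit;
		if (run > total - pos)
			run = total - pos;
		for (; run > 0; run -= n, pos += n) {
			n = pwrite(fd, buf, (size_t)(run < unit ? run : unit), pos);
			if (n <= 0) {
				free(buf);
				return (-1);
			}
		}
	}
	free(buf);
	/* Make sure a trailing hole still counts toward the file size. */
	return (ftruncate(fd, total));
}
```

Using a fixed seed keeps the layout reproducible, so the same file can be
recreated for each copy variant being timed.
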
rick
ps: In case he isn't reading hackers these days, I've added kib@
      as a cc. He might know why UFS is 15% slower when SEEK_HOLE/
      SEEK_DATA is used.


-Alan

Attachment: copyb.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/param.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <err.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	int i, infd, outfd;
	struct stat inst, outst;
	off_t seekdata, seekhole, xfer, offpos;
	size_t len;
	ssize_t rsiz;
	char cp;

	if (argc != 3)
		errx(1, "Usage: copyb <infile> <outfile>");
	infd = open(argv[1], O_RDONLY, 0);
	if (infd < 0)
		err(1, "can't open %s", argv[1]);
	outfd = open(argv[2], O_CREAT | O_RDWR, 0644);
	if (outfd < 0)
		err(1, "can't create %s", argv[2]);

	/* Now, copy infd to outfd. */
	do {
		seekdata = copy_file_range(infd, NULL, outfd, NULL,
		    128 * 1024 * 1024, 0);
	} while (seekdata > 0);
}

Attachment: createsparse.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/param.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <err.h>
#include <unistd.h>

static char outbuf[1024 * 1024];

int
main(int argc, char *argv[])
{
	int i, outfd;
	struct stat inst, outst;
	off_t seekdata, seekhole, xfer, offpos;
	size_t len;
	ssize_t rsiz;
	char cp;
	long holesiz;

	if (argc != 2)
		errx(1, "Usage: createsparse <infile>");
	/* Fill in inbuf with the alphabet over and over and over again. */
	cp = 'a';
	for (i = 0; i < sizeof(outbuf); i++) {
		outbuf[i] = cp++;
		if (cp > 'z')
			cp = 'a';
	}
	outfd = open(argv[1], O_CREAT | O_RDWR, 0644);
	if (outfd < 0)
		err(1, "can't open %s", argv[1]);

	holesiz = fpathconf(outfd, _PC_MIN_HOLE_SIZE);
	if (holesiz <= 0)
		err(1, "Can't get min hole size");
	/* Create the sparse file. */
	xfer = 1024 * 1024 * 1024;
printf("xfer=%jd\n", (intmax_t)xfer);
	for (offpos = 0; offpos < xfer; offpos += 2 * holesiz) {
printf("xfer=%jd offpos=%jd\n", (intmax_t)xfer, (intmax_t)offpos);
		lseek(outfd, holesiz, SEEK_CUR);
		write(outfd, outbuf, holesiz);
	}
}


