Date: Wed, 23 Sep 2020 01:24:24 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Alan Somers <asomers@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>, Konstantin Belousov <kib@FreeBSD.org>
Subject: Re: RFC: copy_file_range(3)
Message-ID: <YTBPR01MB3966DE95F967F892EA2738EFDD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <YTBPR01MB3966BA18F43F7B6353171E67DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
References: <CAOtMX2iFZZpoj%2Bap21rrju4hJoip6ZoyxEiCB8852NeH7DAN0Q@mail.gmail.com> <YTBPR01MB39666188FC89399B0D632FE8DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM> <CAOtMX2gMYdcx0CUC1Mky3ETFr1JkBbYzn17i11axSW=HRTL7OA@mail.gmail.com> <YTBPR01MB3966C1D4D10BE836B37955F5DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>, <CAOtMX2jHMRD0Hno03f2dqjJToR152u8d-_40GM_%2BBvNPkN_smA@mail.gmail.com>, <YTBPR01MB3966BA18F43F7B6353171E67DD380@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
Oh, and I set
vfs.nfs.maxcopyrange=134217728
on the server.

The current default is only 10Mbytes, but I think 128Mbytes
is a more reasonable setting.

rick
ps: The server and client are only somewhat old Dell Latitude 6420
laptops, so the tests were not done on server grade hardware.


________________________________________
From: owner-freebsd-hackers@freebsd.org <owner-freebsd-hackers@freebsd.org> on behalf of Rick Macklem <rmacklem@uoguelph.ca>
Sent: Tuesday, September 22, 2020 9:18 PM
To: Alan Somers
Cc: FreeBSD Hackers; Konstantin Belousov
Subject: Re: RFC: copy_file_range(3)

Alan Somers wrote:
[lots of stuff snipped]
> 1) In order to quickly respond to a signal, a program must use a modest len with
> copy_file_range
For the programs you have mentioned, I think the only signal handling would
be termination (<ctrl>C or SIGTERM if you prefer).
I'm not sure what a reasonable response time is for this.
I'd like to hear comments from others.
- 1sec, less than 1sec, a few seconds, ...

> 2) If a hole is larger than len, that will cause vn_generic_copy_file_range to
> truncate the output file to the middle of the hole. Then, in the next invocation,
> truncate it again to a larger size.
> 3) The result is a file that is not as sparse as the original.
Yes. So, the trick is to use the largest "len" you can live with, given how long you
are willing to wait for signal processing.

> For example, on UFS:
> $ truncate -s 1g sparsefile
Not a very interesting sparse file. I wrote a little program to create one.
> $ cp sparsefile sparsefile2
> $ du -sh sparsefile*
> 96K sparsefile
> 32M sparsefile2
>
> My idea for a userland wrapper would solve this problem by using
> SEEK_HOLE/SEEK_DATA to copy holes in their entirety, and use copy_file_range for
> everything else with a modest len. Alternatively, we could eliminate the need for
> the wrapper by enabling copy_file_range for every file system, and making
> vn_generic_copy_file_range interruptible, so copy_file_range can be called with
> large len without penalizing signal handling performance.

Well, I ran some quick benchmarks using the attached programs, plus "cp" both
before and with your copy_file_range() patch.
copya - Does what I think your plan is above, with a limit of 2Mbytes for "len".
copyb - Just uses copy_file_range() with 128Mbytes for "len".

I first created the sparse file with createsparse.c. It is admittedly a worst case,
creating alternating holes and data blocks of the minimum size supported by
the file system. (I ran it on a UFS file system created with defaults, so the minimum
hole size is 32Kbytes.)
The file is 1Gbyte in size with an Allocation size of 524576 ("ls -ls").

I then ran copya, copyb, old-cp and new-cp. For NFS, I redid the mount before
each copy to avoid data caching in the client.
Here's what I got:

          Elapsed time   #RPCs                   Allocation size ("ls -ls" on server)
NFSv4.2
copya     39.7sec        16384copy+32768seek     524576
copyb     10.2sec        104copy                 524576
old-cp    21.9sec        16384read+16384write    1048864
new-cp    10.5sec        1024copy                524576

NFSv4.1
copya     21.8sec        16384read+16384write    1048864
copyb     21.0sec        16384read+16384write    1048864
old-cp    21.8sec        16384read+16384write    1048864
new-cp    21.4sec        16384read+16384write    1048864

Local on the UFS file system
copya     9.2sec         n/a                     524576
copyb     8.0sec         n/a                     524576
old-cp    15.9sec        n/a                     1048864
new-cp    7.9sec         n/a                     524576

So, for a NFSv4.2 mount, using SEEK_DATA/SEEK_HOLE is definitely
a performance hit, due to all the RPC rtts.
Your patched "cp" does fine, although a larger "len" reduces the
RPC count against the server.
All variants using copy_file_range() retain the holes.

For NFSv4.1, it (not surprisingly) doesn't matter, since only NFSv4.2
supports SEEK_DATA/SEEK_HOLE and VOP_COPY_FILE_RANGE().

For UFS, everything using copy_file_range() works pretty well and
retains the holes.
Although "copya" is guaranteed to retain the holes, it does run noticeably
slower than the others. Not sure why. Do the extra SEEK_DATA/SEEK_HOLE
syscalls cost that much?

The limitation of not using SEEK_DATA/SEEK_HOLE is that you will not
retain holes that straddle the byte range copied by two subsequent
copy_file_range(2) calls.
--> This can be minimized by using a large "len", but that large "len"
    results in slower response to signal handling.

I've attached the little programs, so you can play with them.
(Maybe try different sparse schemes/sizes? It might be fun to
make the holes/blocks some random multiple of hole size up
to a limit?)

rick
ps: In case he isn't reading hackers these days, I've added kib@
    as a cc. He might know why UFS is 15% slower when SEEK_HOLE/
    SEEK_DATA is used.


-Alan
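
For readers without the attachments, here is a rough sketch of the
"wrapper" idea discussed above: find each data region with
SEEK_DATA/SEEK_HOLE and copy only that region with copy_file_range(2),
at most "chunk" bytes per call. This is not the attached copya program.
The name sparse_copy() and its "chunk" parameter are made up for
illustration, and it assumes a system that provides copy_file_range(2)
and SEEK_DATA/SEEK_HOLE (for example FreeBSD 13 or a recent Linux/glibc).

```c
#define _GNU_SOURCE	/* Linux/glibc: expose copy_file_range, SEEK_DATA, SEEK_HOLE */
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original mail): copy infd to outfd,
 * skipping holes, using at most "chunk" bytes per copy_file_range() call.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
sparse_copy(int infd, int outfd, size_t chunk)
{
	struct stat sb;
	off_t data, hole, inoff, outoff;
	ssize_t n;

	assert(chunk > 0);
	if (fstat(infd, &sb) == -1)
		return (-1);
	/* Size the output up front so a trailing hole stays a hole. */
	if (ftruncate(outfd, sb.st_size) == -1)
		return (-1);

	hole = 0;
	while (hole < sb.st_size) {
		/* Start of the next data region at or after "hole". */
		data = lseek(infd, hole, SEEK_DATA);
		if (data == -1) {
			if (errno == ENXIO)
				break;	/* nothing but a hole up to EOF */
			return (-1);
		}
		/* End of that data region (EOF counts as a hole). */
		hole = lseek(infd, data, SEEK_HOLE);
		if (hole == -1)
			return (-1);

		/* Copy [data, hole) in pieces of at most "chunk" bytes. */
		inoff = outoff = data;
		while (inoff < hole) {
			size_t want = (size_t)(hole - inoff);

			if (want > chunk)
				want = chunk;
			/* Both offsets are advanced by the call. */
			n = copy_file_range(infd, &inoff, outfd, &outoff,
			    want, 0);
			if (n == -1)
				return (-1);
			if (n == 0)
				return (0);	/* source shrank; stop here */
		}
	}
	return (0);
}
```

A call such as sparse_copy(infd, outfd, 2 * 1024 * 1024) would mirror
the 2Mbyte limit mentioned for copya. A smaller "chunk" means the
program gets back to userland more often, at the price of more calls,
and the two extra seeks per data region are what show up as additional
RPCs in the NFSv4.2 numbers above.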
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YTBPR01MB3966DE95F967F892EA2738EFDD380>