Date: Sun, 20 Sep 2020 19:40:16 -0600
From: Alan Somers <asomers@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: RFC: copy_file_range(3)
Message-ID: <CAOtMX2jHMRD0Hno03f2dqjJToR152u8d-_40GM_%2BBvNPkN_smA@mail.gmail.com>
In-Reply-To: <YTBPR01MB3966C1D4D10BE836B37955F5DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
References: <CAOtMX2iFZZpoj%2Bap21rrju4hJoip6ZoyxEiCB8852NeH7DAN0Q@mail.gmail.com>
 <YTBPR01MB39666188FC89399B0D632FE8DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>
 <CAOtMX2gMYdcx0CUC1Mky3ETFr1JkBbYzn17i11axSW=HRTL7OA@mail.gmail.com>
 <YTBPR01MB3966C1D4D10BE836B37955F5DD3D0@YTBPR01MB3966.CANPRD01.PROD.OUTLOOK.COM>

On Sun, Sep 20, 2020 at 5:14 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Alan Somers wrote:
> >On Sun, Sep 20, 2020 at 9:58 AM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >>Alan Somers wrote:
> >>>copy_file_range(2) is nifty, but it has a few sharp edges:
> >>>1) Certain file systems don't support it, necessitating a write/read based
> >>>fallback
> >>>2) It doesn't handle sparse files as well as SEEK_HOLE/SEEK_DATA
> >>>3) It's slightly tricky to both efficiently deal with holes and also
> >>>promptly respond to signals
> >>>
> >>>These problems aren't terribly hard, but it seems to me like most
> >>>applications that use copy_file_range would share the exact same
> >>>solutions.  In particular, I'm thinking about cp(1), dd(1), and
> >>>install(8).  Those three could benefit from sharing a userland wrapper that
> >>>handles the above problems.
> >>>
> >>>Should we add such a wrapper to libc?  If so, what should it be called, and
> >>>should it be public or just private to /usr/src?
> >>There has been a discussion on src-committers which I suggested should
> >>be taken to a public mailing list.
> >>
> >>The basic question is...
> >>Whether or not the copy_file_range(2) syscall should be compatible with
> >>the Linux one.
> >>When I did the syscall, I tried to make it Linux-compatible, arguing that
> >>Linux is now a de-facto standard.
> >>The Linux syscall only works on regular files, which is why Alan's patch for
> >>cp required a "fallback to the old way" for VCHR files like /dev/null.
> >>
> >>He is considering a wrapper in libc to provide FreeBSD specific semantics,
> >>which I have no problem with, so long as the naming and man page make
> >>it clear that it is not compatible with the Linux syscall.
> >>(Personally, I'd prefer a wrapper in libc to making the actual syscall non-Linux
> >> compatible, but that is just mho.)
> >>
> >>Hopefully this helps clarify what Alan is asking, rick
> >
> >I don't think the two questions are equivalent.  I think that
> >copy_file_range(2) ought to work on character devices.  Separately, even
> >if it does, I think a userland wrapper would still be useful.  It would
> >still be able to handle sparse files more efficiently than the
> >kernel-based vn_generic_copy_file_range.
> I saw this also stated in your #2 above, but wonder why you think a wrapper
> would handle holes more efficiently.
> vn_generic_copy_file_range() does look for holes via SEEK_DATA/SEEK_HOLE
> just like a wrapper would and retains them as far as possible. It also looks
> for blocks of all zero bytes for file systems that do not support SEEK_DATA/
> SEEK_HOLE (like NFS versions prior to 4.2) and creates holes for these in
> the output file.
> --> The only cases that I am aware of where the holes are not retained are:
>     - When the min holesize for the output file is larger than that of the
>       input file.
>     - When the hole straddles the byte range specified for the syscall.
>       (Or when the hole straddles two copy_file_range(2) syscalls, if you
>       prefer.)
>
> If you are copying the entire file and do not care how long the syscall
> takes (which also implies how long it will take for a termination signal
> like <ctrl>C to be handled), the most efficient usage is to specify
> a "len" argument equal to UINT64_MAX.
> --> This will usually copy the whole file in one gulp, although it is not
>     guaranteed to copy everything, given the Linux semantics definition
>     of it (an NFSv4.2 server can simply choose to copy less, for example).
>     --> This allows the kernel to use whatever block size works efficiently
>         and does not require an allocation of a large userspace buffer for
>         the data, nor that the data be copied to/from userspace.
>
> The problems with doing the whole file in one gulp are:
> - A large file can take quite a while and any signal won't be processed until
>   the gulp is done.
>   --> If you wrote a program that allocated a 100Gbyte buffer and then
>       copied a file using read(2)/write(2) with a size of 100Gbytes in a loop,
>       you'd end up with the same result.
> - As kib@ noted, if the input file never reports EOF (as /dev/zero does),
>   then the "one gulp" wouldn't end until storage is exhausted on the
>   output file(s) device and <ctrl>C wouldn't stop it (since it is one big
>   syscall).
>   --> As such, I suggested that, if the syscall is extended to allow VCHR,
>       the "len" argument be clipped at "K Mbytes" for that case, to
>       avoid filling the storage device before being able to <ctrl>C out
>       of it.
> I suppose the answer for #3 is...
> - smaller "len" allows for quicker response to signals
> but
> - smaller "len" results in less efficient use of the syscall.
>
> Your patch for "cp" seemed fine, but used a small "len" and, as such,
> made the use of copy_file_range(2) less efficient.
>
> All I see the wrapper doing is handling the VCHR case (if the syscall remains
> as it is now and returns EINVAL to be compatible with Linux) and making
> some rather arbitrary choice w.r.t. how big "len" should be.
> --> Choosing an appropriate "len" might better be left to the specific use
>     case, I think?
>
> In summary, it's mostly whether VCHR gets handled by the syscall or a
> wrapper?
>

1) In order to quickly respond to a signal, a program must use a modest len
with copy_file_range
2) If a hole is larger than len, that will cause vn_generic_copy_file_range
to truncate the output file to the middle of the hole.  Then, in the next
invocation, it will truncate the file again to a larger size.
3) The result is a file that is not as sparse as the original.

For example, on UFS:
$ truncate -s 1g sparsefile
$ cp sparsefile sparsefile2
$ du -sh sparsefile*
 96K sparsefile
 32M sparsefile2

My idea for a userland wrapper would solve this problem by using
SEEK_HOLE/SEEK_DATA to copy holes in their entirety, and using
copy_file_range for everything else with a modest len.  Alternatively, we
could eliminate the need for the wrapper by enabling copy_file_range for
every file system, and making vn_generic_copy_file_range interruptible, so
copy_file_range can be called with a large len without penalizing signal
handling performance.

-Alan
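
The wrapper idea above could be sketched roughly as follows. This is only an
illustration, written in Python rather than C for brevity; it is not the
proposed libc interface. The names sparse_copy, copy_range and CHUNK are
hypothetical, and the 1 MiB value for the "modest len" is an arbitrary
assumption. The sketch sizes the destination up front so that it starts as one
large hole, walks the source with SEEK_DATA/SEEK_HOLE, and copies only the data
regions in CHUNK-sized steps, so no hole is ever split by len. It falls back to
pread/pwrite where copy_file_range is missing or refuses the file.

```python
import errno
import os

CHUNK = 1 << 20  # assumed "modest len": 1 MiB per call, so signals are seen often

# errno values treated here as "copy_file_range cannot handle this file"
_FALLBACK_ERRNOS = (errno.ENOSYS, errno.EXDEV, errno.EINVAL, errno.EOPNOTSUPP)


def copy_range(src_fd, dst_fd, offset, length):
    """Copy `length` bytes at `offset` from src_fd to dst_fd in CHUNK steps."""
    use_cfr = hasattr(os, "copy_file_range")
    while length > 0:
        want = min(CHUNK, length)
        n = None
        if use_cfr:
            try:
                n = os.copy_file_range(src_fd, dst_fd, want, offset, offset)
            except OSError as e:
                if e.errno not in _FALLBACK_ERRNOS:
                    raise
                use_cfr = False  # fall back to read/write from here on
        if n is None:
            buf = os.pread(src_fd, want, offset)
            n = len(buf)
            done = 0
            while done < n:
                done += os.pwrite(dst_fd, buf[done:], offset + done)
        if n == 0:
            break  # source ended early (e.g. it was truncated meanwhile)
        offset += n
        length -= n


def sparse_copy(src_path, dst_path):
    """Copy a regular file, writing only its data regions."""
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(src_fd).st_size
            os.ftruncate(dst_fd, size)  # destination starts as one big hole
            if not hasattr(os, "SEEK_DATA"):
                copy_range(src_fd, dst_fd, 0, size)  # no hole info: copy it all
                return
            pos = 0
            while pos < size:
                try:
                    data = os.lseek(src_fd, pos, os.SEEK_DATA)
                except OSError as e:
                    if e.errno == errno.ENXIO:
                        break  # nothing but a hole up to EOF
                    raise
                hole = os.lseek(src_fd, data, os.SEEK_HOLE)
                copy_range(src_fd, dst_fd, data, hole - data)
                pos = hole
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

Calling sparse_copy("sparsefile", "sparsefile2") would then copy the file from
the example. How sparse the result is still depends on the destination file
system's minimum hole size.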