From: Alan Somers
Date: Sat, 22 Jun 2019 11:28:03 -0600
Subject: Re: RFC: What should a copy_file_range(2) syscall do by default?
To: Rick Macklem
Cc: "freebsd-fs@freebsd.org", Sean Fagan

On Sat, Jun 22, 2019 at 10:02 AM Rick Macklem wrote:
>
> Hi,
>
> sef@ made this comment on Phabricator. I don't believe Phabricator is the correct
> place for "big picture" discussions, so I'm posting it here (I'm assuming sef@ doesn't
> mind, since the Phabricator comments are public).
> sef@ wrote:
> > This much work in the kernel for what //should// be user-space makes me twitchy...
> > but there is lots of precedent for it, so I obviously have to get with the times.
> >
> > I've done a quick review of the code; it seems most of the complexity is in the
> > hole-detection. I'm also annoyed that Linux used size_t for the amount to copy, when
> > off_t would have been more appropriate. But not much to do about that now.
> >
> > Having a default implementation means that user-space can't fall back if it's not
> > supported, and do it better (e.g., parallel I/O). Should we also have a pathconf for
> > the feature?
> >
> > WRT your question on -fs, I have no objections to this working cross-filesystem,
> > although I think I might ask to have a flag to fail in that case.
>
> Well, all I am interested in is a system call/VOP call so the NFSv4.2 client can do
> a file copy locally on the NFS server instead of doing Reads/Writes across the wire.
> The current code has gotten fairly complex, so I'll try and ask "how complex" this
> syscall/VOP call should be?
>
> The range of variants I can think of are:
> 0) - Don't do it at all.
> 1) - The syscall could just do a VOP_COPY_FILE_RANGE() and return whatever error
>      it returns.
>      --> This implies an error return for all file systems for now, with support for
>          NFSv4.2 mounts being added later (FreeBSD13 hopefully).

This option would require applications or the C library to fall back to
a copy loop. While doable, nothing in userland would be able to
range-lock the file, making the copy loop non-atomic. So the
in-kernel copy is superior.

> 2) - The syscall could fall back on a simple copy loop, but not try to deal with holes.
>      --> The Linux man page mentions using copy_file_range(2) in a loop with
>          lseek(SEEK_DATA)/lseek(SEEK_HOLE) for sparse files. This suggests that
>          the Linux fallback code doesn't try to handle holes.

Same problem as 1. Or if you do the copy loop in-kernel without handling
holes, it would waste CPU time and expand sparse files, which isn't good
either.

> 3) - The current patch which tries to handle holes and copy the entire byte range
>      in one call.

Definitely the best option, despite its complexity. I would argue
that the complexity calls for a robust test suite, rather than
abandoning the feature.

-Alan
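
For illustration only, here is a rough sketch of what the userland fallback
mentioned under variants 1 and 2 might look like: a copy loop that uses
lseek(SEEK_DATA)/lseek(SEEK_HOLE) to skip holes. It is not taken from the
patch under review. Python is used for brevity, and the function name
sparse_copy and its chunk parameter are hypothetical. It assumes a platform
where Python exposes os.SEEK_DATA and os.SEEK_HOLE.

```python
import errno
import os


def sparse_copy(src_fd, dst_fd, chunk=1 << 16):
    """Copy src_fd to dst_fd, skipping holes; returns the bytes actually copied.

    Hypothetical helper: no range lock is held, so the copy is not atomic.
    """
    size = os.fstat(src_fd).st_size
    copied = 0
    offset = 0
    while offset < size:
        try:
            # Find the next region that contains data at or after offset.
            data_start = os.lseek(src_fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                break  # only a trailing hole remains
            raise
        # Find where that data region ends (the next hole, or EOF).
        hole_start = os.lseek(src_fd, data_start, os.SEEK_HOLE)
        pos = data_start
        while pos < hole_start:
            buf = os.pread(src_fd, min(chunk, hole_start - pos), pos)
            if not buf:
                break  # source shrank underneath us
            done = 0
            while done < len(buf):
                # pwrite may write fewer bytes than requested; retry the rest.
                done += os.pwrite(dst_fd, buf[done:], pos + done)
            pos += len(buf)
            copied += len(buf)
        offset = hole_start
    # Set the final length so a trailing hole is preserved in the destination.
    os.ftruncate(dst_fd, size)
    return copied
```

Nothing in this loop stops another process from writing to the source between
the lseek and pread calls, which is the non-atomicity noted above. On a file
system that does not report holes, the whole file is treated as one data
region, so the loop degrades to a plain copy.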