From: Rick Macklem
To: Sean Eric Fagan
CC: "freebsd-fs@freebsd.org"
Subject: Re: RFC: What should a copy_file_range(2) syscall do by default?
Date: Sat, 22 Jun 2019 23:10:20 +0000
In-Reply-To: <20190622223517.6DF6514BC0@kithrup.com>

Sean Eric Fagan wrote:
>>Well, all I am interested in is a system call/VOP call so the NFSv4.2
>>client can do a file copy locally on the NFS server instead of doing
>>Reads/Writes across the wire.
>>The current code has gotten fairly complex, so I'll try and ask "how
>>complex" this syscall/VOP call should be?
>
>In a previous life, I was responsible for one of the file copy libraries, so
>this is something I do have experience with. (I find the copy-range syscall
>interesting; AFP had a command to copy an entire hierarchy on the server.)
>
>> --> The Linux man page mentions using copy_file_range(2) in a loop with
>>     lseek(SEEK_DATA)/lseek(SEEK_HOLE) for sparse files. This suggests that
>>     the Linux fallback code doesn't try to handle holes.
>
>As far as I can tell, correct; instead, the copy routine looks for holes in
>user space, and copies the non-holes.
For NFSv4.2, the client can do SEEK_DATA/SEEK_HOLE against the server,
although it does imply extra RPC RTTs.

>>Linux discussions have talked about improved performance for local file
>>systems based on a reduced # of system calls, but I have not seen any data
>>to show what, if any, performance improvement has been observed. (The slow
>>hardware I have to test on won't be useful for performance evaluation.)
>
>My experience shows that it's minimal, if all it will be copying is a single
>file. There would have to be a lot of system calls, and a *lot* of syscall
>overhead, for that to hold sway -- and they're also doing the checks for
>holes, which may end up increasing the number of system calls for them by a
>significant amount. I'm still skeptical.
Yes, my hunch is the same.
However, I do expect a performance improvement for NFS (at least for large
files), due to savings w.r.t. RPC RTTs and avoiding data going
server->client->server.
I suspect avoiding the kernel/userspace transitions may help w.r.t. fuse, too.

>Alan mentioned locking, which does buy you something, but it also means
>*locking the file while it is being copied*. Which, for large files, is not
>so great. I also don't think you can call any large copy atomic, unless
>you're using a single transaction for the entire copy.
I tried posting w.r.t. atomicity and didn't get a lot of responses. However,
although kib@ didn't exactly say it should be the case, he did point out that
FreeBSD has traditionally ensured atomicity of file updates for syscalls and
felt that was a good thing. As such, I've done the range locking of both files
and created new primitives to do that while avoiding deadlock.

If others have opinions w.r.t. atomicity of file data updates within this
syscall, please post to either that thread or this one.

>Anyway: I don't have a big objection to it, other than putting a lot of work
>into a system call, but as I said I'm clearly a couple decades behind on that
>sentiment :).
Thanks for your comments. However, you didn't seem to indicate your preferred
alternative?
I, personally, don't care, but would like to find out what the "collective"
thinks,
rick.
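
For anyone following the thread who has not seen it, the user-space loop that
the Linux man page describes (copy_file_range(2) combined with
lseek(SEEK_DATA)/lseek(SEEK_HOLE)) might look roughly like the sketch below.
It uses Python's os wrappers around the same calls only to keep it short. The
names sparse_copy and _copy_range are made up for illustration, the
pread/pwrite fallback is an assumption about what a caller might do when
copy_file_range is unavailable, and none of this says anything about how a
FreeBSD syscall would or should behave.

```python
import errno
import os

# errno values for which this sketch falls back to a plain read/write copy
_FALLBACK = (errno.ENOSYS, errno.EXDEV, errno.EINVAL, errno.EOPNOTSUPP)

def _copy_range(src, dst, count, offset):
    """Copy up to count bytes at the same offset in both files."""
    if hasattr(os, "copy_file_range"):
        try:
            return os.copy_file_range(src, dst, count, offset, offset)
        except OSError as e:
            if e.errno not in _FALLBACK:
                raise
    # Assumed fallback: bounce the data through user space.
    buf = os.pread(src, count, offset)
    return os.pwrite(dst, buf, offset)

def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy only the data regions of src, leaving holes unwritten in dst."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        pos = 0
        while pos < size:
            try:
                data = os.lseek(src, pos, os.SEEK_DATA)  # next data region
            except OSError as e:
                if e.errno == errno.ENXIO:  # nothing but a hole up to EOF
                    break
                raise
            hole = os.lseek(src, data, os.SEEK_HOLE)     # end of that region
            pos = data
            while pos < hole:
                n = _copy_range(src, dst, min(chunk, hole - pos), pos)
                if n == 0:       # source shrank underneath us; stop copying
                    size = pos
                    break
                pos += n
        os.ftruncate(dst, size)  # set final length, keeping any trailing hole
    finally:
        os.close(src)
        os.close(dst)
```

Note that every data region costs two lseek calls plus at least one copy call,
which is the extra syscall count (and, over NFSv4.2, the extra RPC RTTs)
discussed above.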