From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-fs@freebsd.org
CC: Sean Fagan, Alan Somers
Subject: RFC: What should a copy_file_range(2) syscall do by default?
Date: Sat, 22 Jun 2019 16:01:57 +0000

Hi,

sef@ made this comment on phabricator. I don't believe phabricator is the
correct place for "big picture" discussions, so I'm posting it here (I'm
assuming sef@ doesn't mind, since the phabricator comments are public).

sef@ wrote:
> This much work in the kernel for what //should// be user-space makes me
> twitchy... but there is lots of precedent for it, so I obviously have to
> get with the times.
>
> I've done a quick review of the code; it seems most of the complexity is
> in the hole detection. I'm also annoyed that linux used size_t for the
> amount to copy, when off_t would have been more appropriate. But not much
> to do about that now.
>
> Having a default implementation means that user-space can't fall back if
> it's not supported, and do it better (e.g., parallel I/O). Should we also
> have a pathconf for the feature?
>
> WRT your question on -fs, I have no objections to this working
> cross-filesystem, although I think I might ask to have a flag to fail in
> that case.

Well, all I am interested in is a system call/VOP call so the NFSv4.2
client can do a file copy locally on the NFS server instead of doing
Reads/Writes across the wire.

The current code has gotten fairly complex, so I'll try and ask "how
complex" this syscall/VOP call should be?
The range of variants I can think of are:
0) - Don't do it at all.
1) - The syscall could just do a VOP_COPY_FILE_RANGE() and return whatever
     error it returns.
     --> This implies an error return for all file systems for now, with
         support for NFSv4.2 mounts being added later (FreeBSD13 hopefully).
2) - The syscall could fall back on a simple copy loop, but not try to deal
     with holes.
     --> The Linux man page mentions using copy_file_range(2) in a loop with
         lseek(SEEK_DATA)/lseek(SEEK_HOLE) for sparse files. This suggests
         that the Linux fallback code doesn't try to handle holes.
3) - The current patch which tries to handle holes and copy the entire byte
     range in one call.

As sef@ mentions, there is also the question of handling copying across
multiple file systems. I asked about this before and I only got the one
response, which was "do it". I have seen a discussion of adding cross-mount
to the syscall for Linux, but I don't know if/when the Linux one might
support that. (They have not created a "flag" option for this, as far as
I've seen.)
It happens without additional complexity for #2 and #3 above.

Linux discussions have talked about improved performance for local file
systems based on reduced # of system calls, but I have not seen any data to
show what, if any, performance improvement has been observed. (The slow
hardware I have to test on won't be useful for performance evaluation.)

So, what do others think w.r.t. the above?

rick
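
For illustration only, here is roughly what the user-space side looks like
when the syscall may simply return an error (variant 1) and the application
supplies its own simple copy loop that ignores holes (the same shape as the
variant 2 fallback, just outside the kernel). This is a minimal Python
sketch, not code from the patch. The names copy_range, copy_loop, CHUNK and
UNSUPPORTED are made up for the example, and the set of errno values treated
as "not supported" is an assumption. It uses Python's os.copy_file_range
wrapper where the platform provides one.

```python
import errno
import os

CHUNK = 1 << 20  # bytes per read; an arbitrary size chosen for illustration

# errno values treated here as "the kernel will not do this copy for us".
# Which errors a real implementation returns is an assumption of this sketch.
UNSUPPORTED = {errno.ENOSYS, errno.EXDEV, errno.EINVAL, errno.EOPNOTSUPP}


def copy_loop(fd_in, off_in, fd_out, off_out, length):
    """Plain read/write loop; holes are read as zeros and written as data."""
    copied = 0
    while copied < length:
        buf = os.pread(fd_in, min(CHUNK, length - copied), off_in + copied)
        if not buf:  # end of the input file
            break
        done = 0
        while done < len(buf):  # pwrite may write fewer bytes than requested
            done += os.pwrite(fd_out, buf[done:], off_out + copied + done)
        copied += len(buf)
    return copied


def copy_range(fd_in, off_in, fd_out, off_out, length):
    """Try copy_file_range first; finish with copy_loop if it is refused."""
    sys_copy = getattr(os, "copy_file_range", None)  # absent on some platforms
    copied = 0
    while sys_copy is not None and copied < length:
        try:
            n = sys_copy(fd_in, fd_out, length - copied,
                         off_in + copied, off_out + copied)
        except OSError as exc:
            if exc.errno not in UNSUPPORTED:
                raise
            sys_copy = None  # switch to the user-space loop for the rest
            continue
        if n == 0:  # end of the input file
            return copied
        copied += n
    return copied + copy_loop(fd_in, off_in + copied,
                              fd_out, off_out + copied, length - copied)
```

Both functions return the number of bytes actually copied, which can be less
than the requested length when the input file ends first. A sparse-aware
caller would instead walk the file with lseek(SEEK_DATA)/lseek(SEEK_HOLE)
and call copy_range once per data extent.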