From: Rick Macklem
To: Konstantin Belousov, Jilles Tjoelker
CC: "freebsd-current@FreeBSD.org", Alan Somers
Subject: Re: should a copy_file_range(2) syscall be interrupted via a signal
Date: Fri, 5 Jul 2019 20:59:23 +0000

Konstantin Belousov wrote:
>On Fri, Jul 05, 2019 at 07:30:54PM +0200, Jilles Tjoelker wrote:
>> On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
>> > I have been working on a Linux compatible copy_file_range(2) syscall
>> > (the current code can be found at https://reviews.freebsd.org/D20584).
>>
>> > One outstanding issue is how it should deal with signals. Right now, I
>> > have vn_start_write() without PCATCH, so that it won't be interrupted
>> > by a signal, but I notice that vn_write() {i.e. the write syscall} does
>> > have PCATCH on vn_start_write() and so does vn_rdwr() when it is
>> > called without IO_NODELOCKED.
>>
>> A regular write() is only interruptible when writing to a terminal,
>> pseudo-terminal master, pipe, socket, or, under certain conditions, a
>> file on an NFS intr mount. Therefore, applications may not have the code
>> to resume interrupted writes to regular files gracefully.
Yes, agreed. Since this syscall only works on VREG vnodes, the only weird
cases are NFS (and maybe fuse). I'll let asomers@ address the fuse situation.
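For illustration, here is the kind of resume logic Jilles says applications may not have. This is a minimal sketch, not code from the thread, and write_all() is a hypothetical helper name: a short return from write(2) is handled by advancing the buffer pointer and shrinking the count, and an EINTR with nothing written is simply retried.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes of buf to fd.
 * A short return from write(2) (e.g. a signal arrived after some data
 * was written) is handled by advancing the pointer and shrinking the
 * count; EINTR means nothing was written, so the call is repeated.
 * Returns 0 on success, -1 with errno set on any other error.
 */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before writing anything */
			return (-1);
		}
		p += n;			/* partial success: adjust the pointer... */
		len -= (size_t)n;	/* ...and the count, then resume */
	}
	return (0);
}
```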
>>
>> > I am thinking that copy_file_range(2) should do this also.
>> > However, if it returns an error, it is impossible for the caller to
>> > know how much of the data range got copied.
>>
>> A regular write() returns partial success if interrupted by a signal
>> when it has already written something. Therefore, the application can
>> resume the operation by adjusting pointers and counts.
>>
>> Something similar applies to "deterministic" errors like [EFBIG] where
>> the first call will write as far as possible (if this is not nothing)
>> successfully and the next attempt will return the error.
>>
>> > What do you think the copy_file_range(2) code should do?
>>
>> I'm not sure it should actually be done, but the need for adjusting
>> pointers and counts could be avoided with a little extra kernel and libc
>> code. The system call would receive an additional argument pointing to
>> an off_t that indicates how many bytes previous calls have already
>> written. A libc wrapper would initialize this to 0. With this, the
>> system call can be restarted automatically after a signal.
>>
>> In any case, [EINTR] and the internal ERESTART must not be returned
>> unless it is safe to repeat the call with the same (direct) arguments.
Well, since the copy_file_range(2) syscall is allowed to return fewer bytes
copied than requested, and this doesn't mean EOF, it seems that returning a
short count when a signal arrives would achieve the result of allowing an
application to call it again. (Basically, it must be used in a loop until
all the bytes of the range have been copied, since returning fewer bytes
copied than requested is a normal outcome.)

>BTW, if the syscall is made interruptible, it should be made cancellable?
Not sure what you mean by "cancellable". If you mean "terminated by a signal
where there has been no change to the output file", then that could only
easily be done by returning EINTR before any data has been copied.
If you mean something else, then I'd need to know what that is.
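A minimal sketch of the calling loop described above, assuming the copy_file_range(2) prototype provided by Linux (glibc needs _GNU_SOURCE) and later by FreeBSD 13.0; copy_whole_file() is a hypothetical helper, not the code under review. It treats a short return as normal progress, stops early only on a return of 0 (EOF on the input), and retries EINTR on the assumption, following Jilles's rule, that EINTR is only returned when nothing was copied.

```c
#define _GNU_SOURCE		/* glibc: expose copy_file_range(2) */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the whole of "from" into "to".
 * A return smaller than the requested length is a normal outcome, not
 * EOF, so the call is repeated until the range is done.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t
copy_whole_file(const char *from, const char *to)
{
	struct stat sb;
	off_t inoff = 0, outoff = 0, len;
	ssize_t n;
	int infd, outfd, error = 0;

	if ((infd = open(from, O_RDONLY)) < 0)
		return (-1);
	if (fstat(infd, &sb) < 0 ||
	    (outfd = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		error = errno;
		close(infd);
		errno = error;
		return (-1);
	}
	len = sb.st_size;
	while (inoff < len) {
		/* The kernel advances inoff and outoff by the bytes copied. */
		n = copy_file_range(infd, &inoff, outfd, &outoff,
		    (size_t)(len - inoff), 0);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* assumed: nothing was copied */
			error = errno;
			break;
		}
		if (n == 0)
			break;			/* EOF: input shrank meanwhile */
	}
	close(infd);
	close(outfd);
	if (error != 0) {
		errno = error;
		return (-1);
	}
	return ((ssize_t)inoff);
}
```

Passing explicit offsets means a retry after EINTR repeats the call with exactly the same arguments, which is the case Jilles says must be safe.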
>I think that PCATCH commonly used for vn_start_write(9) is not the best
>decision. It is safe in the sense explained by Jilles, since its interruption
>only happens at the very beginning of the syscall, but it contradicts the
>tradition of write(2) to the local fs being not interruptible.
>
>I suggest to not make the syscall interruptible by default, and perhaps
>only allow it with a flag. Then you would need to explain that the
>syscall is only interruptible between VOPs, it is up to fs to decide if
>the VOP_READ/VOP_WRITE is interruptible (e.g. devfs and nfs).
This is how it is coded now (not interruptible). The one thing I have
noticed is that a copy_file_range() can take a long time (about 2min for
2Gbytes on the old hardware I test on). That is a long delay when you hit
<ctrl>C on an application copying a large file. ("cp" and "dd" also take
2min for 2Gbytes, so it isn't a bug in copy_file_range(2). It just
introduces a long delay in response to <ctrl>C.)

rick
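One application-side way to bound the <ctrl>C delay described above, sketched under assumptions: request at most a fixed chunk per call and check a flag set by a SIGINT handler between calls. COPY_CHUNK (16 Mbytes) is an arbitrary hypothetical value, and copy_file_chunked() and install_sigint_handler() are hypothetical names. Because a non-interruptible call still runs to completion, the latency becomes roughly the time to copy one chunk rather than the whole file.

```c
#define _GNU_SOURCE		/* glibc: expose copy_file_range(2) */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_CHUNK	(16 * 1024 * 1024)	/* hypothetical cap per call */

static volatile sig_atomic_t stop_requested;

static void
on_sigint(int sig)
{
	(void)sig;
	stop_requested = 1;	/* only set a flag; the copy loop checks it */
}

/* Install the handler without SA_RESTART. */
static int
install_sigint_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	return (sigaction(SIGINT, &sa, NULL));
}

/*
 * Copy "from" to "to" in chunks of at most COPY_CHUNK bytes, checking
 * stop_requested between calls.  Returns bytes copied, or -1 with errno
 * set (EINTR if the copy was abandoned because of SIGINT).
 */
static ssize_t
copy_file_chunked(const char *from, const char *to)
{
	ssize_t total = 0, n;
	int infd, outfd, saved;

	if ((infd = open(from, O_RDONLY)) < 0)
		return (-1);
	if ((outfd = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		saved = errno;
		close(infd);
		errno = saved;
		return (-1);
	}
	for (;;) {
		if (stop_requested) {
			errno = EINTR;
			total = -1;
			break;
		}
		/* NULL offsets: use and advance each descriptor's offset. */
		n = copy_file_range(infd, NULL, outfd, NULL, COPY_CHUNK, 0);
		if (n < 0 && errno == EINTR)
			continue;	/* re-check the flag, then retry */
		if (n < 0) {
			total = -1;
			break;
		}
		if (n == 0)
			break;		/* EOF on the input */
		total += n;
	}
	saved = errno;
	close(infd);
	close(outfd);
	errno = saved;
	return (total);
}
```

With the default SIGINT disposition, chunking alone already helps: the signal is acted on when the current call returns, so the process exits after at most one chunk instead of after the whole file.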