From: Rick Macklem
To: Konstantin Belousov
CC: Jilles Tjoelker , "freebsd-current@FreeBSD.org" , Alan Somers
Subject: Re: should a copy_file_range(2) syscall be interrupted via a signal
Date: Sun, 7 Jul 2019 21:09:24 +0000
References: <20190705173054.GA30404@stack.nl> <20190705174848.GG47193@kib.kiev.ua> , <20190705211309.GI47193@kib.kiev.ua>
In-Reply-To: <20190705211309.GI47193@kib.kiev.ua>

Konstantin Belousov wrote:
>On Fri, Jul 05, 2019 at 08:59:23PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Fri, Jul 05, 2019 at 07:30:54PM +0200, Jilles Tjoelker wrote:
>> >> On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
>> >> > I have been working on a Linux compatible copy_file_range(2) syscall
>> >> > (the current code can be found at https://reviews.freebsd.org/D20584).
>> >>
>> >> > One outstanding issue is how it should deal with signals. Right now, I
>> >> > have vn_start_write() without PCATCH, so that it won't be interrupted
>> >> > by a signal, but I notice that vn_write() {ie. write syscall } does
>> >> > have PCATCH on vn_start_write() and so does vn_rdwr() when it is
>> >> > called without IO_NODELOCKED.
>> >>
>> >> A regular write() is only interruptible when writing to a terminal,
>> >> pseudo-terminal master, pipe, socket, or, under certain conditions, a
>> >> file on an NFS intr mount. Therefore, applications may not have the code
>> >> to resume interrupted writes to regular files gracefully.
>> Yes, agreed. Since this syscall only works on VREG vnodes, the only weird cases
>> are NFS (and maybe fuse). I'll let asomers@ address the fuse situation.
>>
>> >>
>> >> > I am thinking that copy_file_range(2) should do this also.
>> >> > However, if it returns an error, it is impossible for the caller to
>> >> > know how much of the data range got copied.
>> >>
>> >> A regular write() returns partial success if interrupted by a signal
>> >> when it has already written something. Therefore, the application can
>> >> resume the operation by adjusting pointers and counts.
>> >>
>> >> Something similar applies to "deterministic" errors like [EFBIG] where
>> >> the first call will write as far as possible (if this is not nothing)
>> >> successfully and the next attempt will return the error.
>> >>
>> >> > What do you think the copy_file_range(2) code should do?
>> >>
>> >> I'm not sure it should actually be done, but the need for adjusting
>> >> pointers and counts could be avoided with a little extra kernel and libc
>> >> code. The system call would receive an additional argument pointing to
>> >> an off_t that indicates how many bytes previous calls have already
>> >> written. A libc wrapper would initialize this to 0. With this, the
>> >> system call can be restarted automatically after a signal.
>> >>
>> >> In any case, [EINTR] and the internal ERESTART must not be returned
>> >> unless it is safe to repeat the call with the same (direct) arguments.
>> Well, since the copy_file_range(2) syscall is allowed to return fewer bytes copied
>> than requested and this doesn't mean EOF, it seems that doing that would
>> achieve the result of allowing an application to call it again.
>> (Basically, it must be used in a loop until the bytes of the range have been copied,
>> since returning fewer bytes copied than requested is a normal outcome.)
>>
>> >BTW, if the syscall is made interruptible, it should be made cancellable ?
>> Not sure what you mean by "cancellable"? If you mean "terminated by a signal
>> where there has been no change to the output file, then that could only easily be
>> done by returning EINTR before any data has been copied.
>> If you mean something else, then I'd need to know what that is?
>See pthread_setcancelstate(3) for start, but the POSIX 1003.1-2017
>2.9.5 Thread Cancellation is the definitive spec, including the quite
>readable overview.
Ok, thanks. That explains why cancellation of NFSv4.2 Copy operations is
defined the way it is.

>>
>> >I think that PCATCH commonly used for vn_start_write(9) is not the best
>> >decision. It is safe in the sense explained by Jilles, since its interruption
>> >only happens at the very beginning of the syscall, but it contradict to the
>> >tradition of write(2) to the local fs being not interruptible.
>> >
>> >I suggest to not make the syscall interruptible by default, and perhaps
>> >only allow it with a flag. Then you would need to explain that the
>> >syscall is only interruptible between VOPs, it is up to fs to decide if
>> >the VOP_READ/VOP_WRITE is interruptible (e.g. devfs and nfs).
>> This is how it is coded now. The one thing I have noticed is that a
>> copy_file_range() can take a long time (about 2min for 2Gbytes on the old hardware
>> I test on). This seems like a long delay for Ctrl-C when you do that to an application
>> copying a large file. ("cp" and "dd" also take 2min for 2Gbytes, so it isn't a bug
>> in copy_file_range(2). It just introduces a long delay in response to Ctrl-C.)
>That long delay is inconvenience but not something that we should spent
>too much time trying to fix. We cause the same delay if program does a
>write(2) of several GB, or when very large process like firefox dumps
>core.
Well, I am happy to leave the patch the way it is now, where EINTR/ERESTART
is returned only if the VOP_xxx() call for the underlying file system has
returned it (such as on an NFS mount with the "intr" option).

Thanks, rick
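
P.S. For anyone following the thread, here is a rough userland sketch of the
calling loop described above: because copy_file_range(2) may return fewer
bytes than requested without that meaning EOF, the caller just keeps calling
until the range is done or a call returns 0. The helper names
copy_range_fully() and copy_file() are hypothetical and are not part of
D20584. The sketch also assumes that EINTR is only ever returned when it is
safe to repeat the call, as discussed above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes copy_file_range(2) on Linux/glibc */
#endif
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to len bytes from infd to outfd, using
 * (and advancing) each descriptor's own file offset.  Returns the number
 * of bytes copied, which is less than len only if the input hit EOF,
 * or -1 with errno set on a real error.
 */
static ssize_t
copy_range_fully(int infd, int outfd, size_t len)
{
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = copy_file_range(infd, NULL, outfd, NULL, len - done, 0);
		if (n > 0)
			done += (size_t)n;	/* short copy is normal: go again */
		else if (n == 0)
			break;			/* input is at EOF */
		else if (errno != EINTR)
			return (-1);		/* real error */
		/* Assumption: on EINTR it is safe to simply retry. */
	}
	return ((ssize_t)done);
}

/*
 * Hypothetical usage: copy all of src to dst, creating or truncating dst.
 * Returns 0 on success, -1 on failure.
 */
static int
copy_file(const char *src, const char *dst)
{
	int infd, outfd, ret = -1;
	ssize_t n;

	if ((infd = open(src, O_RDONLY)) < 0)
		return (-1);
	outfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (outfd >= 0) {
		/* Ask for 1 MiB at a time until a pass copies nothing (EOF). */
		while ((n = copy_range_fully(infd, outfd, 1 << 20)) > 0)
			;
		ret = (n == 0) ? 0 : -1;
		close(outfd);
	}
	close(infd);
	return (ret);
}
```

With the offset pointers passed as NULL, the kernel advances the file
offsets itself, so the loop needs no pointer or count adjustment beyond
tracking how much is left. If an error is returned after part of the range
has been copied, the caller only learns that from the file offsets.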