From: Rick Macklem <rmacklem@uoguelph.ca>
To: "freebsd-fs@freebsd.org"
CC: "kib@freebsd.org", Alan Somers, Brooks Davis
Date: Thu, 13 Jun 2019 21:44:01 +0000
Subject: RFC: should the copy_file_range() syscall be atomic?

When I first wrote the copy_file_range() syscall, the updating of the file was atomic, because I locked both vnodes while doing it.
kib@ quickly pointed out that this could introduce a lock order reversal (LOR) and deadlock, because two userland threads could concurrently do the syscall with the file arguments reversed.

It turns out that locking two VREG vnodes concurrently to do this isn't easy and would require the implementation of non-blocking versions of:
  vn_rangelock_rlock() and vn_rangelock_wlock()
I am not sure how difficult that would be, but I'll admit I'd rather not do it.

Also, having the vnodes locked precludes the use of VOP_IOCTL(..FIOSEEKDATA/FIOSEEKHOLE..)
calls to find holes in the byte range of the file being copied from.

Without the vnodes locked, it is possible for other threads to write to either of the files concurrently with the copy_file_range(), resulting in indeterminate results.
(cp(1) has never guaranteed atomic copying of files, so is atomicity needed in this syscall?)

In summary, doing the syscall non-atomically has the advantages of:
- vn_rdwr() takes care of the vnode locking issues, so any changes w.r.t. locking wouldn't require changes to this syscall's code.
- VOP_IOCTL() could be used to find holes.
- No new rangelock primitives need to be added to the system.
- If there were some combination of file system/storage unit where copying non-overlapping byte ranges concurrently could result in better performance, then that could be done. (An atomic syscall would serialize them.)

The advantage of an atomic syscall would be consistency with write(2) and read(2) behaviour.

The following comments are copied from Phabricator:
kib@ - So you relock range for each chunk ? This defeats the purpose of the range locking. Should copy_file_range() be atomic WRT other reads and writes ?
asomers@ - That has an unfortunate side effect: copy_file_range is no longer atomic if you drop the vnode locks and range locks in the middle. It would be possible for two copy_file_range operations to proceed concurrently, leaving the file in a state where each of the operations was partially successful. A better solution would be to add rangelock_trywlock and rangelock_tryrlock. I don't think it would be very hard (testing them, however, could be).

I don't see anything in the Linux man page w.r.t. atomicity, so I am now asking: what do others think?
(I'll admit I'm biased towards non-atomic, since I have already coded it and can use the VOP_IOCTL() calls to find the holes in the input file, but...)

rick