From: Rick Macklem <rmacklem@uoguelph.ca>
To: J David
CC: Konstantin Belousov, freebsd-fs@freebsd.org
Subject: Re: Major issues with nfsv4
Date: Sat, 16 Jan 2021 22:57:37 +0000

J David wrote:
>On Thu, Jan 14, 2021 at 5:30 PM Rick Macklem wrote:
>> One thing to try (other than a FreeBSD13/head system, if possible)
>> is the "oneopenown" mount option.
>
>The odds of being able to run an unreleased version of FreeBSD on
>production servers are slim to none.
Well, the current release schedule has FreeBSD13 being released at the
end of March. If you set up a test system now/soon, you might be able
to determine if an upgrade to FreeBSD13 is justified when it is released?

>While trying to develop a reproduction, I think I have narrowed down
>what the problem is.
>There are no jails or nullfs involved here, just
>NFSv4.1.
>
>Window 1: (to track OpenOwner/Opens)
>
>while true; do date; nfsstat -E -c | fgrep -A1 OpenOwner; sleep 1; done
>
>Window 2:
>
>mount -o ro,nfsv4,minorversion=1,nosuid fileserver:/path/to/freesbd/root /mnt
>chroot /mnt
>
>(OpenOwner is now 1 and Opens is now 9.)
>
>Window 3:
>
>chroot /mnt
>(OpenOwner is now 2 and Opens is now 18.)
>ls
>(OpenOwner is now 3 and Opens is now 21.)
>ls
>(OpenOwner is now 4 and Opens is now 24.)
>ls
>(OpenOwner is now 5 and Opens is now 27.)
>ls
>(OpenOwner is now 6 and Opens is now 30.)
>ls
>(OpenOwner is now 7 and Opens is now 33.)
>bash
>while true; do ls | true; done
>(Allow about a minute to pass, hit CTRL-C. OpenOwner is now 4647 and
>Opens is now 13957)
>exit
>exit
>(OpenOwner is now 4647 and Opens is now 13952.)
Hmm. Not sure what files would get opened each time. NFSv4 Opens
only apply to regular files, so "ls" shouldn't result in Opens.

I may try this and see what the 3 files being Open'd are.
(Obviously something related to the chroot. But what?)

>Back in Window 2:
>
>exit
>(wait about 30 seconds, OpenOwner is now 0 and Opens is now 0.)
Yes, it could take a while to close all those opens.

>So it looks like the NFSv4 code can't let go of *any* Opens on a
>file/directory until *all* references to that file/directory are
>closed.
True for regular files, but not directories.

>If chroot is too much, "vi /mnt/etc/motd" in Window 2 and "cat
>/mnt/etc/motd" in Window 3 have the same effect, leaking one Open per
>cat instead of 3.
>You probably don't even need a FreeBSD install on
>the NFS mount; just hold a single file open in one window and
>open/close it repeatedly in another.
An NFSv4 Open is unique per open_owner/file pair, so the same file opened
by different processes (an open_owner represents a process unless you
use "oneopenown") results in separate NFSv4 Opens.
However, since the NFSv4 client cannot know which Open a VOP_CLOSE()
is associated with, due to file descriptor inheritance, none of the NFSv4
Opens can be closed until all FreeBSD open file descriptors for the file
are closed.
--> Just the way it is. It is not an unintended leak. They go away once
      all file descriptors get closed, so long as VOP_INACTIVE() gets
      called for the NFSv4 vnode.
--> It is the handling of deferred VOP_INACTIVE() calls that has
      changed for FreeBSD13.
However, none of the above seems unexpected, except maybe for why
"ls" in the chroot opens 3 regular files each time. I don't know what
chroot actually does for something like "ls". I'll look.

>Then I re-tested this with "-o
>ro,nfsv4,minorversion=1,nosuid,oneopenown." At least for this simple
>case, the problem did not occur with oneopenown set.
Yes.
For the oneopenown case, there will only be one NFSv4 Open for
each file opened.

>Are there downsides to the oneopenown flag other than breaking delegations?
Here's an example:
- One process running as J David opens a file for reading, which works since
  J David has read permissions on the file.
- Another process running as Warner opens the same file for writing, which
  works, since Warner has write access for the file.

Now, network partition the client from the server until the lease expires...
The client now gets a NFSERR_EXPIRED error, which forces it to retry the
open(s).

Without "oneopenown", the above FreeBSD opens resulted in 2 NFSv4
Opens, both of which probably reopen successfully (unless a chmod/chown/
setfacl on the file makes the reopen fail).

With "oneopenown", the above FreeBSD opens result in one NFSv4 Open
for reading/writing. A retry of this one Open might succeed, depending on
what the file permissions are. (I think the code uses the credentials for
Warner in this case, assuming that the credentials that opened it for
writing are more likely to succeed, but there are no guarantees.)

--> For normal operation, it should be fine. A network partition that
      results in NFSERR_EXPIRED is a worst case scenario, where all
      byte range locks will be lost and Opens may be lost.

For delegations, the story is similar, but happens routinely when
delegations are recalled by the server.
--> For delegation recall, the reopen is done using a special variant of Open called
      claim_delegate_current.
      A server should allow claim_delegate_current
      irrespective of what the file permissions are, but the original RFC 3530
      did not make this clear.
--> This is allowed by a FreeBSD NFSv4 server.
As such, the warning in the man page is mainly there for NFSv4 servers
where the reopen done at delegation recall time can fail, due to permission
checking.

There is also the fact that the case of "oneopenown+delegations" is not well
tested and the current FreeBSD client code will use separate open_owners
(violating the "one open owner" principle) when delegations are recalled.

Are delegations useful?
- Short answer, often not.

Delegations allow the client to do 2 things:
1 - More extensive file data caching, since the delegation guarantees that
      other clients will not be modifying the file.
      This is not exploited by current clients, as far as I know. I had something
      called Packrat, but it never made it into FreeBSD. More on Packrat below,
      for anyone interested.
2 - Do NFSv4 Opens locally in the client, avoiding the Open/Close RPCs.
      Since delegations are per file, this only helps if the same files get opened
      over and over and over again.
      That doesn't happen for many loads, from what
      I've seen.
      --> With delegations turned on, you can compare the counts for Opens
            vs LocalOpens in the "nfsstat -E -c" stats, to see how many Open/Close
            RPCs get saved.

Packrat: Some now bitrotted code in the subversion projects area that did
              whole file caching of small files in non-volatile storage on the client
              when a delegation was acquired for the file.
              This was intended for devices like laptops, with slow/flaky network
              connectivity.
              --> Recent FreeBSD changes might inspire me to resurrect this.
                    - FreeBSD has recently become more laptop friendly.
                    - nfs-over-tls provides an easy way to make NFSv4 mounts
                      from anywhere (as a laptop might) relatively secure.

rick

Thanks!
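
Appended sketch: the Open bookkeeping described earlier in this message (one
NFSv4 Open per open_owner/file pair, with none of them released until every
local descriptor for the file is closed) can be modeled roughly as below.
This is a toy model, not FreeBSD code: the class ToyNfsv4Client and its
methods are hypothetical names made up for illustration, and it releases
Opens immediately on the last close, whereas the real client waits for
VOP_INACTIVE() on the vnode.

```python
class ToyNfsv4Client:
    """Toy model of the client-side Open accounting (hypothetical names)."""

    def __init__(self, oneopenown=False):
        self.oneopenown = oneopenown
        self.opens = {}     # path -> set of open_owners holding an Open
        self.fd_count = {}  # path -> local descriptors still open

    def open(self, pid, path):
        # Assumption: with "oneopenown", one shared open_owner stands in
        # for every process; otherwise each process is its own open_owner.
        owner = "shared" if self.oneopenown else pid
        self.opens.setdefault(path, set()).add(owner)
        self.fd_count[path] = self.fd_count.get(path, 0) + 1

    def close(self, path):
        # The close carries no pid: the model cannot tell which Open it
        # belongs to, so nothing is released until the last descriptor
        # for the file goes away.
        if self.fd_count.get(path, 0) == 0:
            raise ValueError("file is not open: " + path)
        self.fd_count[path] -= 1
        if self.fd_count[path] == 0:
            del self.fd_count[path]
            del self.opens[path]

    def stats(self):
        # Returns (number of open_owners, number of Opens).
        owners = set()
        n_opens = 0
        for holders in self.opens.values():
            owners |= holders
            n_opens += len(holders)
        return len(owners), n_opens


def hold_and_churn(client, path, churn):
    # One long-lived holder ("vi") plus `churn` short-lived readers ("cat").
    client.open(1, path)
    for pid in range(2, 2 + churn):
        client.open(pid, path)
        client.close(path)
    return client.stats()


print(hold_and_churn(ToyNfsv4Client(), "/mnt/etc/motd", 100))
print(hold_and_churn(ToyNfsv4Client(oneopenown=True), "/mnt/etc/motd", 100))
```

In the model, the first call accumulates one Open per short-lived reader for
as long as the holder keeps the file open, while the second stays at a single
Open, which matches the behaviour reported in the thread with and without
"oneopenown".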