From owner-freebsd-fs@freebsd.org Thu Jan 14 22:30:50 2021 Return-Path: Delivered-To: freebsd-fs@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 492414D0228 for ; Thu, 14 Jan 2021 22:30:50 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from CAN01-QB1-obe.outbound.protection.outlook.com (mail-eopbgr660087.outbound.protection.outlook.com [40.107.66.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "GlobalSign Organization Validation CA - SHA256 - G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4DGzY50mHGz3LBx for ; Thu, 14 Jan 2021 22:30:48 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=BwsQylWb8ubkdP5CS7PP/iDi9qU6oFUWqKhV5zeGvym4Onx0x+6qjfoPVliw6begu9q0FA0j+rIYmIcqql5w8//G/44NM2LQ2HtyaFd2jbYgvmMz6roRXNjB/Ej47NH/k/64ko3tVIrhKwx8yS8Nk+L42nRWhNjNuifBAh+lPEDF8sP73kTtkXTzJE2BYRg0Nwo/c1pNYRmtppxTY8nYGRM5Gd3syM+f7VQ2/2fDw/n63ZN4VakolSqfxVpcziTdzqHlfxG5gqWvp4RHfC7rZYhBJFv+lFkhoemKTgBEp6cEd9NqyxzPamTe5zXALqSce0nwV+ey9CIe/8GN5ou/UA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=QlCff2JooXpJTfI4GMgLN+UKlKl7UldM//evEkyA2DM=; b=Fm61xpEvGsPWr2iSXouHtQ1ogVVmklrYQA/e0eAMHLP+L8JStampgOfBRcqYVVL9tihHHbA54jd0YMiOpFzfuE28DaRveBteqEyXl/aMhEZnZGzew97GefLLWmmGo7QoNe5N2D+TGbc409twPhJWwaDLYjvSvYgBgRsMSWzvZ3G4LRyT08TZLchmSEwQjNtuSQt0y9TfywUdynKeVhMU8vduTRsh4vk5vw/tzAniqdg7BCzGfpgyaYuRAFWDGfVK8106WGsjzOt3sc+Cfh41xfR89/9KPKt/Zqdxuj1WLHKo1v3Kp++KQbiv/ZpNBFxS4YmuBxXiQvJuOOfN6aIvSg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=uoguelph.ca; dmarc=pass action=none header.from=uoguelph.ca; dkim=pass header.d=uoguelph.ca; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=uoguelph.ca; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=QlCff2JooXpJTfI4GMgLN+UKlKl7UldM//evEkyA2DM=; b=YRqd1FcCrxmAQn8uJokQuwQSNWIwqey/d7GIjSX8T6MYhxM41iefSicZH4NZSf264A0hTp2EwdVXgAGQDDntK8OMVCjGWSrCFDpKl0w4reTKpFVkBFx6kzgDvDxwP55NxQjO6RzULr7hXYBADnzTYIWsLS8/xVixmBuSkxzjQGNrRKWU//n1vuYjGXRFOG0oe0UfsH59+IVP8IspiG01qe0OBIdYhSi5nIprj6ugucgEW4AScNPBnnmtHcdDaPwr7Ps12eMAqK4Xm37QJjL6uwueNF+yLqbdLfRvuxDcKBBaSur3wUi8ahw5JukO+xi3m+fZaIMIMTSJKAJ4urvdmQ== Received: from YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:c00:19::29) by YQXPR01MB2664.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:c00:44::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3763.9; Thu, 14 Jan 2021 22:30:47 +0000 Received: from YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM ([fe80::3d86:c7f9:bc4c:40c0]) by YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM ([fe80::3d86:c7f9:bc4c:40c0%6]) with mapi id 15.20.3763.010; Thu, 14 Jan 2021 22:30:47 +0000 From: Rick Macklem To: J David CC: Konstantin Belousov , "freebsd-fs@freebsd.org" Subject: Re: Major issues with nfsv4 Thread-Topic: Major issues with nfsv4 Thread-Index: AQHWzw/HDat+dHoH9kKG5K3Xpd53kqnxDteQgAFi0QCAABTa84AALLCAgAAVvcmAAAu0AIAA4wiAgAAI5gCAAO/IAIAAiDrmgABWOgCAAGv4AIAEv9nSgCyfuACAAIn3Cw== Date: Thu, 14 Jan 2021 22:30:47 +0000 Message-ID: References: , In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 9f142443-1b04-413c-4b30-08d8b8dc0a09 x-ms-traffictypediagnostic: YQXPR01MB2664: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: 5m5tOy1TSYrvP/eU/q33zXIYhYSMMIcDkOqYkzthSq7PgunLIGG/DrRGchUfXihqw83NwcrnyNnZjWgN3bfzRV1fq31iamHPx53wnBj5mj377GNv5tVypdDrFJFu/yOEP5IiW8vHzXXJihByinskYLXaFRkEwQ5FM38FwuVByuhyEQ8ryO8B7mrNJH4by74DxDt/Wk+bJveOugnnHywvfJDDkBx/KaxtlkH4/u0WjWnRPgL6jxFSlyPHOZ2KxLJtS183K65NtO5wTOTnd1ezuehHDvgx0NY5ZXcKEdJzAAOTHsSju6xeqBugN1AVuDmYFcecmjVBHk2jZD8ZrQpBAX9wyzofp/epNgVXuAWz9aMWodOK1Fh2JXINIe/UKCZp1Z1f0/mQ2SjPw5VRIg0I4w== x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM; PTR:; CAT:NONE; SFS:(376002)(39860400002)(136003)(366004)(396003)(346002)(5660300002)(66476007)(4326008)(54906003)(7696005)(186003)(91956017)(66446008)(64756008)(66946007)(6506007)(86362001)(76116006)(33656002)(66556008)(478600001)(83380400001)(8936002)(8676002)(9686003)(52536014)(2906002)(316002)(55016002)(786003)(71200400001)(6916009); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata: =?iso-8859-1?Q?//PSsuSV24CzNSvFQJf3pZQ41WXCzt3rk4NDPL+p6WGn9Md0n6iaBKqRKA?= =?iso-8859-1?Q?xErbeVyGFt/hqATaPqjfCHlcsytg5yW6XBjPt+bu24dtY8/d+6k6mXtnpI?= =?iso-8859-1?Q?4Bm6H9ZGrFcZLAhwJwZw6lQZB6Mgo8gxWnLEPQO1kkULZzSVmb5z7Bg76Q?= =?iso-8859-1?Q?f4xeUtIxfPvwuPwF1BxlOg3ZLX7Yci7VyaVMGQu/92rVPgOt/nnDuHMthw?= =?iso-8859-1?Q?C4Ya7PjSlChDuIASsAJEhnMkKHfv0nv5XUtU+Jrpa7WJ2/O5r+n8RSyJdp?= =?iso-8859-1?Q?iWBPc1V5hxkY+kkfprT31GpcEJs2yYkWe3rgYdGiOtdyDLND/7SkEzdjSC?= =?iso-8859-1?Q?eAVWUVehRG/05J3404Cg1jmyytIA+5RDNLBzlIQKL0PscNoTCX1cEx5fb/?= =?iso-8859-1?Q?KqdiRp/KWNLodD3m2Iz/mw0JyXT1Rb1RBgtsJMrPjeUujNvN8QDUmKhSMU?= =?iso-8859-1?Q?E4zk9H9SMeoy3yE1uSqKrnaIX+8wSavMR6dR2E/2Ojh4zXG9kshXOV5X9K?= =?iso-8859-1?Q?N3xm/TckOlCVG7waKHZ4n43DoAH6ck/6tcLx3P0sH8V+wtD5Jh6R67Rqb7?= =?iso-8859-1?Q?ikqeA1s7Dem7JE/v1qxBtw4L0AZktrP8w3/r6PepTOa3eh4FIOtSAB7WQ0?= =?iso-8859-1?Q?BFWOF5VD6bB0RjHhp7MD1S0Q3M3SB/ZRUZehnxc0elUHEz2a9+JTZi2aBD?= =?iso-8859-1?Q?tr4Z7nkKVZC0W14zw4XOkQw8++Rw/QdyWnsFOGoy4+c81kd7UEUaUf14ca?= =?iso-8859-1?Q?DZMixWq+UDpdzBEpmyQYDXzWQj45OmoP6JHPnIvRqkYrpHyJ27Imk7O8HY?= =?iso-8859-1?Q?JO9U2Ht7R3Bcq/Ai/pektzKB4GDmeeQFDD9DKkjGHRoBRrTWhdxJ1Kjr1e?= =?iso-8859-1?Q?V5jaPcM80R4VDjwlhEyd6gx9iS4V8BQB+OugmsZfDbHY/QFtlNqr0QM0Ts?= =?iso-8859-1?Q?2VijlmBfp4n19CDDMeg6BtGm9/R9/mJW+RPn74K8TUpsCkREfACv1lt5KP?= =?iso-8859-1?Q?PxvniE3a3FlUNgCzygUBfnxGr6FkGfiAGie+ZOjo6FiG8cmV/fmJ4wa9qk?= =?iso-8859-1?Q?bt+oVtKZoFFEnBi6kscA4cU=3D?= x-ms-exchange-transport-forked: True Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: uoguelph.ca X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM X-MS-Exchange-CrossTenant-Network-Message-Id: 9f142443-1b04-413c-4b30-08d8b8dc0a09 X-MS-Exchange-CrossTenant-originalarrivaltime: 14 Jan 2021 22:30:47.3222 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: be62a12b-2cad-49a1-a5fa-85f4f3156a7d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: ASYpXV7PoohejODQXdzrxm+J+BfKQxy9FK8Q8B5BaSr9sxNspYNKLs7YktowisvN6/8CCfhM6mFypszhPaz1Eg== X-MS-Exchange-Transport-CrossTenantHeadersStamped: YQXPR01MB2664 X-Rspamd-Queue-Id: 4DGzY50mHGz3LBx X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; dkim=pass header.d=uoguelph.ca header.s=selector1 header.b=YRqd1FcC; arc=pass (microsoft.com:s=arcselector9901:i=1); dmarc=pass (policy=none) header.from=uoguelph.ca; spf=pass (mx1.freebsd.org: domain of rmacklem@uoguelph.ca designates 40.107.66.87 as permitted sender) smtp.mailfrom=rmacklem@uoguelph.ca X-Spamd-Result: default: False [-6.00 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:40.107.0.0/16]; RCVD_COUNT_THREE(0.00)[3]; DKIM_TRACE(0.00)[uoguelph.ca:+]; DMARC_POLICY_ALLOW(-0.50)[uoguelph.ca,none]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FREEMAIL_TO(0.00)[gmail.com]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; RBL_DBL_DONT_QUERY_IPS(0.00)[40.107.66.87:from]; ARC_ALLOW(-1.00)[microsoft.com:s=arcselector9901:i=1]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:8075, ipnet:40.104.0.0/14, country:US]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; R_DKIM_ALLOW(-0.20)[uoguelph.ca:s=selector1]; FREEFALL_USER(0.00)[rmacklem]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; NEURAL_HAM_LONG(-1.00)[-1.000]; TAGGED_RCPT(0.00)[]; MIME_GOOD(-0.10)[text/plain]; SPAMHAUS_ZRD(0.00)[40.107.66.87:from:127.0.2.255]; DWL_DNSWL_LOW(-1.00)[uoguelph.ca:dkim]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[40.107.66.87:from]; RWL_MAILSPIKE_POSSIBLE(0.00)[40.107.66.87:from]; FREEMAIL_CC(0.00)[gmail.com,freebsd.org]; MAILMAN_DEST(0.00)[freebsd-fs] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Jan 2021 22:30:50 -0000 J David wrote:=0A= >On Wed, Dec 16, 2020 at 11:25 PM Rick Macklem wrote= :=0A= >> If you can do so when the "Opens" count has gone fairly high,=0A= >> please "sysctl vfs.deferred_inact" and let us know what that=0A= >> returns.=0A= >=0A= >$ sysctl vfs.deferred_inact=0A= >sysctl: unknown oid 'vfs.deferred_inact'=0A= >$ sysctl -a vfs | fgrep defer=0A= >$=0A= Yes. I did not realize how different FreeBSD12 is when compared with FreeBS= D13/head=0A= in this area.=0A= At a quick glance, I do not see where the syncer tries to vinactive() vnode= s where=0A= the VOP_INACTIVE() has been deferred.=0A= =0A= --> It is possible that this problem is fixed in FreeBSD13/head.=0A= Any chance you can test a FreeBSD13/head system?=0A= =0A= Kostik, does FreeBSD12 try to vinactive() deferred VOP_INACTIVE() vnodes vi= a the=0A= syncer?=0A= =0A= >Sorry for the delay in responding to this. I got my knuckles rapped=0A= >for allowing this to happen so much.=0A= >=0A= >It happened just now because some of the "use NFSv4.1" config leaked=0A= >out to a production machine, but not all of it. As a result, only the=0A= >read-only "job binary" filesystems were mounted with nullfs+nfsv4.1.=0A= >So it is unlikely to be related to creating files. Hopefully, that=0A= >narrows things down.=0A= >=0A= >$ sudo nfsstat -E -c=0A= >[...]=0A= > OpenOwner Opens LockOwner Locks Delegs LocalOwn=0A= > 37473 303469 0 0 1 0=0A= >[...]=0A= >=0A= >"nfscl: never fnd open" continues to appear regularly on=0A= >console/dmesg, even at the end of the reboot:=0A= Not sure what this implies. The message means that it cannot find=0A= a NFSv4 Open to Close.=0A= It may indicate something is broken in the client, but is not by itself, se= rious.=0A= =0A= >Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... d= one=0A= >Waiting (max 60 seconds) for system thread `bufspacedaemon-5' to stop... d= one=0A= >Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... d= one=0A= >Waiting (max 60 seconds) for system thread `bufspacedaemon-6' to stop... d= one=0A= >All buffers synced.=0A= >nfscl: never fnd open=0A= >nfscl: never fnd open=0A= >nfscl: never fnd open=0A= >nfscl: never fnd open=0A= >nfscl: never fnd open=0A= >nfscl: never fnd open=0A= >Uptime: 4d13h59m27s=0A= >Rebooting...=0A= >cpu_reset: Stopping other CPUs=0A= >---<>---=0A= >=0A= >It did not appear 300,000 times, though. More like a few times a day.=0A= >=0A= >Also, I set up an idle system with the NFSv4.1+nullfs config, as=0A= >requested. It has been up for 32 days and appears not to have leaked=0A= >anything. But it does also have a fistful of those "nfscl: never fnd=0A= >open" messages.=0A= >=0A= >There is also a third system in a test environment with the=0A= >nullfs+nfsv4.1 config. That system is up 34 days, has no exhibited=0A= >problems, and shows this:=0A= >=0A= > OpenOwner Opens LockOwner Locks Delegs LocalOwn=0A= > 342 15098 2 0 0 0=0A= >=0A= >That machine shows one "nfscl: never fnd open" in the dmesg.=0A= >=0A= >A fourth system has the NFSv4.1-no-nullfs config in production with=0A= >net.inet.ip.portrange.lowlast tweaked and a limit on simultaneous=0A= >jobs. That system had issues requiring a restart 18 days ago. It also=0A= >occasionally gets "nfscl: never fnd open" in the dmesg and has=0A= >relatively large Open numbers:=0A= >=0A= >As of right now:=0A= > OpenOwner Opens LockOwner Locks Delegs LocalOwn=0A= > 23214 46304 0 0 0 0=0A= >=0A= >The "OpenOwner" value on that system seems to swing dramatically,=0A= >ranging between 45,000 to 10,000 in just a few minutes. It appears to=0A= >correlate well to simultaneous jobs.=0A= This sounds normal, since an OpenOwner refers to a process on the client=0A= doing a file Open.=0A= =0A= > The "Opens" value goes up and=0A= >down a bit, but trends upward over time. However, when I found and=0A= >killed one long-running job and unmounted its filesystems, "Opens"=0A= >dropped 90% to around 4600. Note there are *no* nullfs mounts on that=0A= >system. So nullfs may not be a necessary component of the problem.=0A= This also sounds reasonable. The NFSv4 Opens can only be closed once=0A= the process doing the Open plus all chidren processes have closed the=0A= file.=0A= --> If a program is "lazy" and doesn't do closes, they won't=0A= happen until the process exits. And then children processes=0A= will also need to exit before it leaves zombie state.=0A= =0A= One thing to try (other than a FreeBSD13/head system, if possible)=0A= is the "oneopenown" mount option.=0A= --> It can only be used on NFSv4.1 mounts (not NFSv4.0) and=0A= makes the mount only use one OpenOwner for all Opens=0A= instead of a different one for each process doing an Open.=0A= --> This would reduce the number of Opens for the case=0A= where multiple processes open the same file.=0A= --> It also simplifies the search for an Open, since there=0A= is only one for each file.=0A= =0A= rick=0A= =0A= As a next step, I will try to create a fake job that opens a ton of=0A= files. Then I'll test it on the binary read-only nullfs+nfsv4.1=0A= mounts and on the system that runs nfsv4.1 directly.=0A= =0A= Thanks!=0A=