From: Rick Macklem <rmacklem@uoguelph.ca>
To: Daniel Engel, "freebsd-stable@freebsd.org"
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS failed error
Date: Mon, 9 Jul 2018 02:10:42 +0000
Daniel Engel wrote:
[stuff snipped]
>I traced the commits that Rick has made since that thread and merged them from 'head'
>into 'stable':
>
> 'svnlite checkout http://svn.freebsd.org/base/release/11.1.0/'
> 'svnlite merge -c 332790 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 333508 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 333579 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 333580 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 333592 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 333645 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 333766 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 334396 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 334492 http://svn.freebsd.org/base/head'
> 'svnlite merge -c 327674 http://svn.freebsd.org/base/head'

Yes, you have all the commits to head related to the 4.1 server that might
affect the ESXi client, plus a few others that should be harmless and that, I
think, don't affect ESXi client mounts. (Most of these will get MFC'd to
stable/11, but I haven't gotten around to it yet.)

The client problems that were in 6.5, and might still be in 6.7, that may
bite you are:
- The client does an OpenDowngrade with all the OPEN_SHARE_ACCESS and
  OPEN_SHARE_DENY bits set for something it calls a "drive lock". (Adding
  bits is supposed to be done via an Open/ClaimNull, not an OpenDowngrade.)
  I'd really like to know if this still happens with 6.7.
- Something about "directory modified too often" when deleting a bunch of
  files. (I have no idea what this one means, but apparently it was also
  seen with other NFSv4.1 servers.)
- Some warnings about "wrong reason for not issuing a delegation". I have a
  fix for this one in PR#226650, but they are just warnings and don't seem
  to matter much.

The rest of the really nasty stuff happens after a server reboot. The
recovery code seemed to be badly broken in the 6.5 client. (All sorts of fun
stuff, like the client looping doing ExchangeID operations forever, VM
crashes...)

>That completely fixed the connection instability, but the NFS share was still mounting
>read-only with a RECLAIM_COMPLETE error.
>So, I manually applied the first patch
>from the previous thread and everything started working:
>
> --- fs/nfsserver/nfs_nfsdserv.c.savrecl  2018-02-10 20:34:31.166445000 -0500
> +++ fs/nfsserver/nfs_nfsdserv.c  2018-02-10 20:36:07.947490000 -0500
> @@ -4226,10 +4226,9 @@ nfsrvd_reclaimcomplete(struct nfsrv_desc
>                 goto nfsmout;
>         }
>         NFSM_DISSECT(tl, uint32_t *, NFSX_UNSIGNED);
> +       nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
>         if (*tl == newnfs_true)
> -               nd->nd_repstat = NFSERR_NOTSUPP;
> -       else
> -               nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
> +               nd->nd_repstat = 0;

I think this patch is ok to use, since no other extant client does a
ReclaimComplete with "one_fs == true". It does kinda violate the RFC.
The problem is that FreeBSD exports a hierarchy of file systems, so telling
the server that just one of them has been reclaimed is useless. (This hack
simply assumes the client meant to say "one_fs == false".)

There was also a case (I think it was after a server reboot) where the client
would do one of these after already doing a ReclaimComplete with
"one_fs == false". That is definitely bogus, since the "one_fs == false"
operation means all file systems have been reclaimed; without the above hack
the server would reply NFS4ERR_COMPLETE_ALREADY.

Anyhow, once I get some packet traces from Andreas for 6.7, I'll try to
figure out how to handle at least some of the outstanding issues.

Good luck with it, rick
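To make the effect of the patch on nfsrvd_reclaimcomplete() easier to follow, its before/after control flow can be modeled in plain C. This is a minimal sketch, not FreeBSD code: struct client_state, check_reclaim_complete() and the two reclaim_complete_*() functions are hypothetical, and check_reclaim_complete() is an assumed simplification of nfsrv_checkreclaimcomplete() (the first call marks reclaim complete for all file systems, a repeat is an error). Only the status names come from RFC 5661.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reply status names from RFC 5661; the enumerator values here are arbitrary,
 * only the distinction between them matters for this sketch. */
enum nfsstat4 { NFS4_OK, NFS4ERR_NOTSUPP, NFS4ERR_COMPLETE_ALREADY };

/* Hypothetical per-client state: has a ReclaimComplete covering all
 * file systems been received yet? */
struct client_state {
    bool reclaim_done;
};

/* Assumed simplification of nfsrv_checkreclaimcomplete(): the first call
 * marks reclaim as complete for every file system; a repeat is an error. */
enum nfsstat4 check_reclaim_complete(struct client_state *cl)
{
    assert(cl != NULL);
    if (cl->reclaim_done)
        return NFS4ERR_COMPLETE_ALREADY;
    cl->reclaim_done = true;
    return NFS4_OK;
}

/* Before the patch: a request with one_fs == true is refused outright. */
enum nfsstat4 reclaim_complete_unpatched(struct client_state *cl, bool one_fs)
{
    if (one_fs)
        return NFS4ERR_NOTSUPP;
    return check_reclaim_complete(cl);
}

/* After the patch: always run the global check, then force success when
 * one_fs == true, i.e. treat the request as one_fs == false and ignore a
 * duplicate. */
enum nfsstat4 reclaim_complete_patched(struct client_state *cl, bool one_fs)
{
    enum nfsstat4 status = check_reclaim_complete(cl);
    if (one_fs)
        status = NFS4_OK;
    return status;
}
```

With the patched variant, a client that sends one_fs == false and then one_fs == true gets NFS4_OK both times, because the second call's NFS4ERR_COMPLETE_ALREADY is overwritten; the unpatched variant rejects the one_fs == true request with NFS4ERR_NOTSUPP.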
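The OpenDowngrade issue in the list of ESXi client problems comes down to a downgrade being used to add share bits. A simplified server-side check is sketched below, assuming the rule reduces to "a downgrade may only remove share bits, never add them"; the RFC's actual constraint is more detailed than this. The share-bit values are the OPEN4_SHARE_ACCESS_* / OPEN4_SHARE_DENY_* constants from RFC 5661, while struct open_state and open_downgrade() are hypothetical and not taken from the FreeBSD server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Share bits as defined in RFC 5661 (delegation "want" bits ignored). */
#define OPEN4_SHARE_ACCESS_READ  0x1u
#define OPEN4_SHARE_ACCESS_WRITE 0x2u
#define OPEN4_SHARE_ACCESS_BOTH  0x3u
#define OPEN4_SHARE_DENY_NONE    0x0u
#define OPEN4_SHARE_DENY_READ    0x1u
#define OPEN4_SHARE_DENY_WRITE   0x2u
#define OPEN4_SHARE_DENY_BOTH    0x3u

/* Hypothetical per-open state held by the server. */
struct open_state {
    uint32_t access;
    uint32_t deny;
};

/* True if `bits` contains nothing beyond what `held` already has. */
static bool is_subset(uint32_t bits, uint32_t held)
{
    return (bits & ~held) == 0;
}

/* Simplified OpenDowngrade: apply the new bits only if they drop bits.
 * Returning false stands in for an error reply (e.g. NFS4ERR_INVAL);
 * the open's state is left unchanged in that case. */
bool open_downgrade(struct open_state *op, uint32_t access, uint32_t deny)
{
    assert(op != NULL);
    if (!is_subset(access, op->access) || !is_subset(deny, op->deny))
        return false;
    op->access = access;
    op->deny = deny;
    return true;
}
```

Under this model, an open held for READ with DENY_NONE that receives a "downgrade" to ACCESS_BOTH/DENY_BOTH (the "drive lock" pattern) is rejected, whereas going from ACCESS_BOTH/DENY_WRITE to ACCESS_READ/DENY_NONE is accepted.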