From: NAGY Andreas
To: Rick Macklem, Daniel Engel, freebsd-stable@freebsd.org
Subject: RE: NFS 4.1 RECLAIM_COMPLETE FS failed error
Date: Mon, 9 Jul 2018 06:46:58 +0000

Hi!

Sorry, I have not forgotten the traces, but I have had no time so far, and since I am currently setting up several servers on the system I don't want to break anything by running tests. I will send them as soon as I have finished my current work, which will be the end of this week at the earliest.

As I am currently setting up/cloning 80 VMs that are stored on the NFS datastore, I can only report that the setup performs well and seems to be stable. The only thing that happened twice while working with ZFS snapshots/clones was that the ESXi host lost its connection to the NFS datastore. I don't know whether it happened while creating or deleting a clone, but the only way to recover was to restart nfsd or to switch over HAST/CARP; in both cases no VM crashed.

Br,
Andi

-----Original Message-----
From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Rick Macklem
Sent: Monday, 9 July 2018 04:11
To: Daniel Engel; freebsd-stable@freebsd.org
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS failed error

Daniel Engel wrote:
[stuff snipped]
>I traced the commits that Rick has made since that thread and merged them from 'head'
>into 'stable':
>
>  'svnlite checkout http://svn.freebsd.org/base/release/11.1.0/'
>  'svnlite merge -c 332790 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 333508 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 333579 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 333580 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 333592 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 333645 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 333766 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 334396 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 334492 http://svn.freebsd.org/base/head'
>  'svnlite merge -c 327674 http://svn.freebsd.org/base/head'

Yes, you have all the commits to head related to the 4.1 server that might affect the ESXi client, plus a bunch that should be harmless but that I don't think affect the ESXi client mounts. (Most of these will get MFC'd to stable/11, but I haven't gotten around to it yet.)

The issues that might still be in 6.7 (they were in 6.5) and may bite you are:
- The client does an OpenDowngrade with all OPEN_SHARE_ACCESS and
  OPEN_SHARE_DENY bits set for something it calls a "drive lock".
  (Adding bits is supposed to be done via an Open/ClaimNull and not
  OpenDowngrade.) I'd really like to know if this still happens for 6.7.
- Something about "directory modified too often" when deleting a bunch
  of files. (I have no idea what this one means, but apparently it was
  seen for other NFSv4.1 servers.)
- Some warnings about "wrong reason for not issuing a delegation". I have
  a fix for this one in PR#226650, but they are just warnings and don't
  seem to matter much.

The rest of the really nasty stuff happens after a server reboot. The recovery code seemed to be badly broken in the 6.5 client. (All sorts of fun stuff like the client looping doing ExchangeID operations forever. VM crashes...)

>That completely fixed the connection instability, but the NFS share was still mounting
>read-only with a RECLAIM_COMPLETE error. So, I manually applied the first patch
>from the previous thread and everything started working:
>
> --- fs/nfsserver/nfs_nfsdserv.c.savrecl 2018-02-10 20:34:31.166445000 -0500
> +++ fs/nfsserver/nfs_nfsdserv.c 2018-02-10 20:36:07.947490000 -0500
> @@ -4226,10 +4226,9 @@ nfsrvd_reclaimcomplete(struct nfsrv_desc
>                  goto nfsmout;
>          }
>          NFSM_DISSECT(tl, uint32_t *, NFSX_UNSIGNED);
> +        nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
>          if (*tl == newnfs_true)
> -                nd->nd_repstat = NFSERR_NOTSUPP;
> -        else
> -                nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
> +                nd->nd_repstat = 0;

I think this patch is ok to use, since no other extant client does a ReclaimComplete with "one_fs == true". It does kinda violate the RFC.

The problem is that FreeBSD exports a hierarchy of file systems, and telling the server that one of them has been reclaimed is useless. (This hack just assumes the client meant to say "one_fs == false".) There was also a case (I think it was after a server reboot) where the client would do one of these after doing a ReclaimComplete with "one_fs == false". That is definitely bogus (the server would reply NFS4ERR_COMPLETE_ALREADY without the above hack), since the "one_fs == false" operation means all file systems have been reclaimed.

Anyhow, once I get some packet traces from Andreas for 6.7, I'll try and figure out how to handle at least some of the outstanding issues.

Good luck with it, rick
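
To make the effect of the ReclaimComplete patch discussed above easier to follow, here is a minimal sketch in plain C. It is not the nfsd code: struct client, check_reclaim_complete(), reclaim_complete_orig() and reclaim_complete_patched() are hypothetical names, and check_reclaim_complete() is only an assumed, simplified stand-in for nfsrv_checkreclaimcomplete() (it records that the client has finished reclaiming and reports an error on a repeat). The status values are the ones defined in RFC 5661.

```c
#include <stdbool.h>

/* Status codes as defined in RFC 5661. */
enum {
    NFS4_OK = 0,
    NFS4ERR_NOTSUPP = 10004,
    NFS4ERR_COMPLETE_ALREADY = 10054
};

/* Hypothetical per-client state: has a ReclaimComplete been recorded? */
struct client {
    bool reclaim_done;
};

/*
 * Assumed stand-in for nfsrv_checkreclaimcomplete(): the first call
 * records completion, any later call reports that it was already done.
 */
static int check_reclaim_complete(struct client *c)
{
    if (c->reclaim_done)
        return NFS4ERR_COMPLETE_ALREADY;
    c->reclaim_done = true;
    return NFS4_OK;
}

/* Behaviour before the patch: one_fs == true is rejected outright. */
static int reclaim_complete_orig(struct client *c, bool one_fs)
{
    if (one_fs)
        return NFS4ERR_NOTSUPP;
    return check_reclaim_complete(c);
}

/*
 * Behaviour after the patch: completion is always recorded, and a
 * request with one_fs == true is answered with success regardless of
 * what the check returned.
 */
static int reclaim_complete_patched(struct client *c, bool one_fs)
{
    int status = check_reclaim_complete(c);

    if (one_fs)
        status = NFS4_OK;
    return status;
}
```

In this model, a fresh client that sends one_fs == true gets NFS4ERR_NOTSUPP from reclaim_complete_orig() and NFS4_OK from reclaim_complete_patched(). A client that first sends one_fs == false and then one_fs == true gets NFS4_OK both times from the patched version, which corresponds to the post-reboot case described above. Under the same assumption about the stand-in, a one_fs == true request also marks the client as done, so a later one_fs == false request would receive NFS4ERR_COMPLETE_ALREADY.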