Date:      Mon, 9 Jul 2018 06:46:58 +0000
From:      NAGY Andreas <Andreas.Nagy@frequentis.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>, Daniel Engel <daniel@ftml.net>, "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Subject:   RE: NFS 4.1 RECLAIM_COMPLETE FS failed error
Message-ID:  <D890568E1D8DD044AA846C56245166780124B66858@vie196nt>
In-Reply-To: <YTXPR0101MB0959ED2C40E04C2B2B034779DD440@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM>
References:  <1531087387.2543270.1433935616.5272EA26@webmail.messagingengine.com> <YTXPR0101MB0959ED2C40E04C2B2B034779DD440@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM>

Hi! Sorry, I did not forget the traces, but I have had no time so far, and as
I am currently setting up several servers on the system I don't want to break
anything by performing tests. I will send them as soon as I have finished my
current work, which will be at the end of this week at the earliest.

As I am currently setting up/cloning 80 VMs that are stored on the NFS
datastore, I can report that the setup performs well and seems to be stable.
The only thing that happened twice while working with ZFS snapshots/clones was
that the ESXi host lost the connection to the NFS datastore. I don't know
whether it was while creating or deleting a clone, but the only way to recover
from this was to restart nfsd or to switch over HAST/CARP; in either case no
VM crashed.

Br,
Andi



-----Original Message-----
From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Rick Macklem
Sent: Monday, 9 July 2018 04:11
To: Daniel Engel <daniel@ftml.net>; freebsd-stable@freebsd.org
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS failed error

Daniel Engel wrote:
[stuff snipped]
>I traced the commits that Rick has made since that thread and merged them
>from 'head' into 'stable':
>
>    'svnlite checkout http://svn.freebsd.org/base/release/11.1.0/'
>    'svnlite merge -c 332790 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333508 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333579 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333580 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333592 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333645 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 333766 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 334396 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 334492 http://svn.freebsd.org/base/head'
>    'svnlite merge -c 327674 http://svn.freebsd.org/base/head'
Yes, you have all the commits to head related to the 4.1 server that might
affect the ESXi client, plus a bunch that should be harmless but that I don't
think affect the ESXi client mounts. (Most of these will get MFC'd to
stable/11, but I haven't gotten around to it yet.)

The ones that might be in 6.7 (they were in 6.5) that may bite you are:
- The client does an OpenDowngrade with all OPEN_SHARE_ACCESS and
  OPEN_SHARE_DENY bits set for something it calls a "drive lock".
  (Adding bits is supposed to be done via an Open/ClaimNull, not an
  OpenDowngrade.) I'd really like to know whether this still happens in 6.7.
- Something about "directory modified too often" when deleting a bunch of
  files. (I have no idea what this one means, but apparently it has been
  seen with other NFSv4.1 servers.)
- Some warnings about "wrong reason for not issuing a delegation". I have a
  fix for this one in PR#226650, but they are just warnings and don't seem
  to matter much.

The rest of the really nasty stuff happens after a server reboot. The
recovery code seemed to be badly broken in the 6.5 client. (All sorts of fun
stuff, like the client looping doing ExchangeID operations forever, VM
crashes...)

>That completely fixed the connection instability, but the NFS share was
>still mounting read-only with a RECLAIM_COMPLETE error.  So, I manually
>applied the first patch from the previous thread and everything started
>working:
>
>    --- fs/nfsserver/nfs_nfsdserv.c.savrecl     2018-02-10 20:34:31.166445000 -0500
>    +++ fs/nfsserver/nfs_nfsdserv.c     2018-02-10 20:36:07.947490000 -0500
>    @@ -4226,10 +4226,9 @@ nfsrvd_reclaimcomplete(struct nfsrv_desc
>            goto nfsmout;
>        }
>        NFSM_DISSECT(tl, uint32_t *, NFSX_UNSIGNED);
>    +   nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
>        if (*tl == newnfs_true)
>    -           nd->nd_repstat = NFSERR_NOTSUPP;
>    -   else
>    -           nd->nd_repstat = nfsrv_checkreclaimcomplete(nd);
>    +           nd->nd_repstat = 0;
I think this patch is ok to use, since no other extant client does a
ReclaimComplete with "one_fs == true". It does kinda violate the RFC.
The problem is that FreeBSD exports a hierarchy of file systems, and telling
the server that one of them has been reclaimed is useless. (This hack just
assumes the client meant to say "one_fs == false".) There was also a case (I
think it was after a server reboot) where the client would do one of these
after doing a ReclaimComplete with "one_fs == false", and that is definitely
bogus (the server would reply NFS4ERR_ALREADY_COMPLETE without the above
hack), since the "one_fs == false" operation means all file systems have
been reclaimed.
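To make the effect of the hack concrete, here is a simplified standalone model
of the ReclaimComplete decision before and after the patch. This is NOT the
kernel code: the constant values, the `already_done` flag, and the
`check_reclaim_complete()` stub are all illustrative stand-ins for
`nfsrv_checkreclaimcomplete()` and the real nfsstat values.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the real nfsstat values. */
#define NFS_OK                    0
#define NFSERR_NOTSUPP            10004
#define NFSERR_ALREADY_COMPLETE   10054

/*
 * Stand-in for nfsrv_checkreclaimcomplete(): fails if the client has
 * already done a "one_fs == false" ReclaimComplete for this session.
 */
static int
check_reclaim_complete(bool already_done)
{
	return (already_done ? NFSERR_ALREADY_COMPLETE : NFS_OK);
}

/* Before the patch: any "one_fs == true" request is rejected. */
static int
reclaim_complete_old(bool one_fs, bool already_done)
{
	if (one_fs)
		return (NFSERR_NOTSUPP);
	return (check_reclaim_complete(already_done));
}

/*
 * After the patch: the check runs first, then "one_fs == true" is forced
 * to success -- the server pretends the client said "one_fs == false",
 * so a redundant per-fs ReclaimComplete after a global one no longer
 * draws an ALREADY_COMPLETE error.
 */
static int
reclaim_complete_new(bool one_fs, bool already_done)
{
	int stat = check_reclaim_complete(already_done);

	if (one_fs)
		stat = NFS_OK;
	return (stat);
}
```

The point of the ordering in the patched version is that the bogus client
sequence (a "one_fs == true" request after a global reclaim) now succeeds
instead of failing, while "one_fs == false" handling is unchanged.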

Anyhow, once I get some packet traces from Andreas for 6.7, I'll try and
figure out how to handle at least some of the outstanding issues.

Good luck with it, rick



