Date: Thu, 11 Feb 2021 14:32:03 -0700
From: Alan Somers <asomers@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: NFS delegations don't expire after unmounting client
Message-ID: <CAOtMX2jQCRsUPaGw2uVb8XuguNnFzmdUt1OpPM8C7riE5Q%2BbfQ@mail.gmail.com>
In-Reply-To: <YQXPR0101MB0968EC580D4F4006E155AC9DDD8C9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References: <CAOtMX2h_2zCNpyzOs=SzuohRvLgga=Eip-LJ-7QjJBvwmueLXg@mail.gmail.com> <YQXPR0101MB0968EC580D4F4006E155AC9DDD8C9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
On Thu, Feb 11, 2021 at 2:07 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Alan Somers wrote:
> >I have several Linux 5.9.15 clients mounting NFS 4.1 served from a FreeBSD
> >12.2-RELEASE server. Today, most of those clients' mounts hung, and their
> >dmesg displayed "nfs: server XXX not responding, still trying". But one
> >client kept running fine. nfsdumpstate on the server showed that that
> >client, and that one only, had 2 delegations. It also had 1 OpenOwner, 1
> >Open, and the CB flags set. It was the only client that had CB set. On
> >the theory that its delegation callbacks weren't working, I tried
> >unmounting all of its NFS shares. That worked, but to my surprise
> >nfsdumpstate showed no change! I could see that the lease time recorded in
> >/var/run/nfs-stablerestart was 120s, and I must've waited about 30m in all
> >before disabling delegations, unmounting everything, and returning to NFS
> >v3. So my questions are, what can cause a delegation to linger around long
> >after it should've expired, and what else can I do to debug this problem if
> >it recurs?
> The FreeBSD NFSv4 server implements "courtesy locks" (my idea, but someone
> else coined the term for it), where a lock is not thrown away until both the
> lease has expired and a conflicting lock request is received from another
> client.
> --> In this case, that would be an Open of the file from another client.
> The idea is to avoid loss of lock state when there is a network partitioning
> that exceeds the lease duration.
>

Ahh, so maybe the stale delegation was a red herring! That would make sense,
especially because the client with the stale delegation was mounting a
different share than at least one of the hung clients.

>
> When a client dismounts, it should tell the server it is done with the
> open/lock state by doing a DestroyClientID operation.
> (SetClientID/SetClientIDConfirm for 4.0)
> --> If the Linux client did this, then it sounds like something is broken in
> the server, but my hunch is that the Linux client did not do this.
> If you can capture packets during a dismount, you should be able to look
> at them in wireshark and see if the DestroyClientID happened.
>
> There is also the nfsrevoke command, which is supposed to be able to
> get rid of client lock state, but I'll admit I haven't tested it in like a
> decade;-)
>

Well, it looks like it works. When I tried it, the delegation disappeared from
nfsdumpstate's output. That did not resolve the hang, however, so the
delegation was probably a red herring. I guess I'll have to roll up my sleeves
and start tcpdumping. Sigh.

Thanks for the tips.
-Alan
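The "courtesy lock" rule Rick describes can be modeled in a few lines. This is a toy sketch, not FreeBSD's nfsd code: the names (ClientState, lease_expired, should_discard) are hypothetical, and treating "expired" as strictly more than one lease period since the last renewal is an assumption. It shows why, with the 120s lease from this thread, state can still be present after roughly 30 minutes if no other client ever asks for a conflicting Open.

```python
from dataclasses import dataclass

# Toy model of the "courtesy lock" policy: a client's open/lock/delegation
# state is discarded only when BOTH its lease has expired AND another client
# makes a conflicting request. Hypothetical names, not FreeBSD's.


@dataclass
class ClientState:
    client_id: str
    last_renewal: float  # seconds; time of the client's last lease renewal


def lease_expired(state: ClientState, now: float, lease_seconds: float) -> bool:
    """True once no renewal has been seen for more than one lease period
    (the exact boundary is an assumption of this sketch)."""
    return now - state.last_renewal > lease_seconds


def should_discard(state: ClientState, now: float, lease_seconds: float,
                   conflicting_request: bool) -> bool:
    """Courtesy-lock rule: an expired lease alone is not enough."""
    return lease_expired(state, now, lease_seconds) and conflicting_request


if __name__ == "__main__":
    lease = 120.0                       # lease time seen in /var/run/nfs-stablerestart
    idle = ClientState("linux-client", last_renewal=0.0)
    thirty_min = 30 * 60.0
    # Lease long expired, but nobody else opened the file: state lingers.
    print(should_discard(idle, thirty_min, lease, conflicting_request=False))
    # A conflicting Open from another client arrives: state may be discarded.
    print(should_discard(idle, thirty_min, lease, conflicting_request=True))
```

Run as a script, it prints False and then True. Keeping the two conditions separate makes the design trade-off visible: a client cut off by a network partition longer than the lease keeps its locks unless someone actually needs them.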
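Once a dismount has been captured, the check Rick suggests comes down to asking whether each client ever sent DESTROY_CLIENTID, which is operation number 57 in the NFSv4.1 specification (RFC 5661). The sketch below assumes the capture has already been decoded into (client address, NFSv4 operation number) pairs, for example by reading it in Wireshark. That extraction step is not shown, and the helper name clients_missing_destroy_clientid is hypothetical.

```python
from typing import Iterable, Set, Tuple

# NFSv4.1 operation number for DESTROY_CLIENTID (RFC 5661).
OP_DESTROY_CLIENTID = 57


def clients_missing_destroy_clientid(
        ops: Iterable[Tuple[str, int]]) -> Set[str]:
    """Return client addresses that sent NFSv4 operations but never
    DESTROY_CLIENTID.

    `ops` is a sequence of (client_address, nfsv4_opcode) pairs, assumed to
    have been extracted beforehand from a packet capture of the dismount.
    """
    seen: Set[str] = set()
    destroyed: Set[str] = set()
    for client, opcode in ops:
        seen.add(client)
        if opcode == OP_DESTROY_CLIENTID:
            destroyed.add(client)
    return seen - destroyed


if __name__ == "__main__":
    # Hypothetical decoded capture of two clients dismounting.
    capture = [
        ("10.0.0.5", 53),  # SEQUENCE
        ("10.0.0.5", 44),  # DESTROY_SESSION
        ("10.0.0.5", 57),  # DESTROY_CLIENTID
        ("10.0.0.7", 53),  # SEQUENCE
        ("10.0.0.7", 44),  # DESTROY_SESSION, but no DESTROY_CLIENTID
    ]
    print(clients_missing_destroy_clientid(capture))  # {'10.0.0.7'}
```

A client that appears in the result is one whose state the server would keep as courtesy state after the unmount, matching the lingering nfsdumpstate entries described above.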