Date:      Wed, 4 May 2022 17:53:27 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: nfs client's OpenOwner count increases without bounds
Message-ID:  <CAOtMX2hNp3%2B0Zs1jvpVAW07KLxStX0z-khZ4Y_-GaPnO%2BYkM5g@mail.gmail.com>
In-Reply-To: <YT3PR01MB97376472A2BAF2FA0643F4F2DDC39@YT3PR01MB9737.CANPRD01.PROD.OUTLOOK.COM>
References:  <CAOtMX2jX8gC8xEr%2BfsQjZz8YmWX6haQxRe_-Jr5RSTdw14jkFQ@mail.gmail.com> <YT3PR01MB97376472A2BAF2FA0643F4F2DDC39@YT3PR01MB9737.CANPRD01.PROD.OUTLOOK.COM>

On Wed, May 4, 2022 at 5:23 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> Alan Somers <asomers@freebsd.org> wrote:
> > I have a FreeBSD 13 (tested on both 13.0-RELEASE and 13.1-RC5) desktop
> > mounting /usr/home over NFS 4.2 from a 13.0-RELEASE server.  It
> > worked fine until a few weeks ago.  Now, the desktop's performance
> > slowly degrades.  It becomes less and less responsive until I restart
> > X after 2-3 days.  /var/log/Xorg.0.log shows plenty of entries like
> > "AT keyboard: client bug: event processing lagging behind by 112ms,
> > your system is too slow".  "top -S" shows that the busiest process is
> > nfscl.  A dtrace profile shows that nfscl is spending most of its time
> > in nfscl_cleanup_common, in the loop over all nfsclowner objects.
> > Running "nfsdumpstate" on the server shows thousands of OpenOwners for
> > that client, and < 10 for any other NFS client.  The OpenOwner count
> > increases by about 3000 per day.  And yet, "fstat" shows only a couple
> > hundred open files on the NFS file system.  Why are OpenOwners so
> > high?  Killing most of my desktop processes doesn't seem to make a
> > difference.  Restarting X does improve the perceived responsiveness,
> > though it does not change the number of OpenOwners.
> >
> > How can I figure out which process(es) are responsible for the
> > excessive OpenOwners?
> An OpenOwner represents a process on the client. The OpenOwner
> name is an encoding of pid + process startup time.
> However, I can't think of an easy way to get at the OpenOwner name.
>
> Now, why aren't they going away, hmm..
>
> I'm assuming the # of Opens is not large?
> (Openowners cannot go away until all associated opens
>  are closed.)

Oh, I didn't mention it, but yes, the number of Opens is large.  Right
now, for example, I have 7950 OpenOwners and 8277 Opens.

>
> Commit 1cedb4ea1a79 in main changed the semantics of this
> a little, to avoid a use-after-free bug. However, it is dated
> Feb. 25, 2022 and is not in 13.0, so I don't think it could
> be the culprit.
>
> Essentially, the function called nfscl_cleanupkext() should call
> nfscl_procdoesntexist(), which returns true after the process has
> exited and when that is the case, calls nfscl_cleanup_common().
> --> nfscl_cleanup_common() will either get rid of the openowner or,
>       if there are still children with open file descriptors, mark it "defunct"
>       so it can be free'd once the children close the file.
>
> It could be that X is now somehow creating a long chain of processes
> where the children inherit a file descriptor and that delays the cleanup
> indefinitely?
> Even then, everything should get cleaned up once you kill off X?
> (It might take a couple of seconds after killing all the processes off.)
>
> Another possibility is that the "nfscl" thread is wedged somehow.
> It is the one that will call nfscl_cleanupkext() once/sec. If it never
> gets called, the openowners will never go away.
>
> Being old fashioned, I'd probably try to figure this out by adding
> some printf()s to nfscl_cleanupkext() and nfscl_cleanup_common().

dtrace shows that nfscl_cleanupkext() is getting called at about 0.6 Hz.
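
For reference, the cleanup behaviour Rick describes above can be modelled
roughly as follows.  This is only a toy sketch in Python, not the kernel
code: the OpenOwner class, its fields, and cleanup_pass() are hypothetical
names, and it assumes that a "defunct" owner is freed on a later pass once
its last open has been closed.

```python
from dataclasses import dataclass


@dataclass
class OpenOwner:
    """Toy stand-in for a client openowner (hypothetical fields)."""
    pid: int
    start_time: int
    open_count: int = 0
    defunct: bool = False


def cleanup_pass(owners, live_procs):
    """Model one periodic cleanup pass.

    live_procs is a set of (pid, start_time) pairs for processes that
    still exist.  An owner whose process has exited is dropped if it has
    no opens left; otherwise it is marked defunct and kept until a later
    pass finds its open count at zero.
    """
    kept = []
    for owner in owners:  # linear scan over every owner, as in the thread
        exited = (owner.pid, owner.start_time) not in live_procs
        if exited and owner.open_count == 0:
            continue  # nothing references it any more: free it
        if exited:
            owner.defunct = True  # opens remain (e.g. inherited fds)
        kept.append(owner)
    return kept
```

In this model an owner only lingers when its process is gone but some
open is still outstanding, which is the situation the "long chain of
processes" theory would produce.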


>
> To avoid the problem, you can probably just use the "oneopenown"
> mount option. With that option, only one openowner is used for
> all opens. (Having separate openowners for each process was needed
> for NFSv4.0, but not NFSv4.1/4.2.)
>
> > Or is it just a red herring and I shouldn't
> > worry?
> Well, you can probably avoid the problem by using the "oneopenown"
> mount option.

Ok, I'm trying that now.  After unmounting and remounting NFS,
"nfsstat -cE" reports 1 OpenOwner and 11 Opens.  But on the server,
"nfsdumpstate" still reports thousands.  Will those go away
eventually?
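
To illustrate why "oneopenown" collapses the client-side count, here is a
small Python sketch of how the number of openowners follows from the set
of opens.  It assumes, per Rick's description, that an owner is
identified by pid plus process start time; the tuple layout and the
count_openowners() helper are hypothetical and only meant as a model.

```python
def count_openowners(opens, oneopenown=False):
    """Count distinct openowners for a list of opens.

    Each open is a (pid, start_time, path) tuple.  Without oneopenown,
    every distinct (pid, start_time) pair gets its own owner; with it,
    all opens share a single owner.
    """
    owners = set()
    for pid, start_time, _path in opens:
        key = "shared" if oneopenown else (pid, start_time)
        owners.add(key)
    return len(owners)


opens = [
    (101, 1000, "/usr/home/a"),
    (101, 1000, "/usr/home/b"),  # same process, second open
    (202, 1500, "/usr/home/c"),
    (101, 2000, "/usr/home/d"),  # pid reused by a later process
]
print(count_openowners(opens))                   # per-process owners
print(count_openowners(opens, oneopenown=True))  # single shared owner
```

The reused pid in the example shows why the start time is part of the
owner name: without it, two unrelated processes would share an owner.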

>
> Thanks for reporting this, rick
> ps: And, yes, large numbers of openowners will slow things down,
>       since the code ends up doing linear scans of them all in a linked
>       list in various places.
>
> -Alan
>


