Date:      Wed, 4 May 2022 23:23:00 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Alan Somers <asomers@freebsd.org>, FreeBSD Stable ML <stable@freebsd.org>
Subject:   Re: nfs client's OpenOwner count increases without bounds
Message-ID:  <YT3PR01MB97376472A2BAF2FA0643F4F2DDC39@YT3PR01MB9737.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAOtMX2jX8gC8xEr%2BfsQjZz8YmWX6haQxRe_-Jr5RSTdw14jkFQ@mail.gmail.com>
References:  <CAOtMX2jX8gC8xEr%2BfsQjZz8YmWX6haQxRe_-Jr5RSTdw14jkFQ@mail.gmail.com>

Alan Somers <asomers@freebsd.org> wrote:
> I have a FreeBSD 13 (tested on both 13.0-RELEASE and 13.1-RC5) desktop
> mounting /usr/home over NFS 4.2 from a 13.0-RELEASE server.  It
> worked fine until a few weeks ago.  Now, the desktop's performance
> slowly degrades.  It becomes less and less responsive until I restart
> X after 2-3 days.  /var/log/Xorg.0.log shows plenty of entries like
> "AT keyboard: client bug: event processing lagging behind by 112ms,
> your system is too slow".  "top -S" shows that the busiest process is
> nfscl.  A dtrace profile shows that nfscl is spending most of its time
> in nfscl_cleanup_common, in the loop over all nfsclowner objects.
> Running "nfsdumpstate" on the server shows thousands of OpenOwners for
> that client, and < 10 for any other NFS client.  The OpenOwner count
> increases by about 3000 per day.  And yet, "fstat" shows only a couple
> hundred open files on the NFS file system.  Why are OpenOwners so
> high?  Killing most of my desktop processes doesn't seem to make a
> difference.  Restarting X does improve the perceived responsiveness,
> though it does not change the number of OpenOwners.
>
> How can I figure out which process(es) are responsible for the
> excessive OpenOwners?
An OpenOwner represents a process on the client. The OpenOwner
name is an encoding of pid + process startup time.
However, I can't think of an easy way to get at the OpenOwner name.
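To illustrate what "an encoding of pid + process startup time" buys, here is a minimal Python sketch that packs both values into an opaque byte string. The byte layout (little-endian 32-bit pid, 64-bit seconds, 32-bit microseconds) and the helper names are hypothetical, chosen only for the example; the thread does not describe the actual format the FreeBSD client uses. Including the start time keeps two processes that happen to reuse the same pid from sharing an OpenOwner.

```python
import struct

# Hypothetical layout, NOT FreeBSD's real one: pid (u32), start seconds (s64),
# start microseconds (u32), little-endian with no padding -> 16 bytes.
OWNER_FMT = "<IqI"


def make_openowner_name(pid: int, start_sec: int, start_usec: int) -> bytes:
    """Pack a pid and process start time into an opaque owner name."""
    return struct.pack(OWNER_FMT, pid, start_sec, start_usec)


def parse_openowner_name(name: bytes) -> tuple:
    """Recover (pid, start_sec, start_usec) from a name built above."""
    return struct.unpack(OWNER_FMT, name)


# The same pid reused by a later process yields a different owner name.
first = make_openowner_name(4242, 1651700000, 0)
reused = make_openowner_name(4242, 1651786400, 500)
print(first != reused)  # True
```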

Now, why aren't they going away? Hmm...

I'm assuming the number of Opens is not large?
(Openowners cannot go away until all associated opens
 are closed.)

Commit 1cedb4ea1a79 in main changed the semantics of this
a little, to avoid a use-after-free bug. However, it is dated
Feb. 25, 2022 and is not in 13.0, so I don't think it could
be the culprit.

Essentially, nfscl_cleanupkext() should call
nfscl_procdoesntexist(), which returns true once the process has
exited; when it does, nfscl_cleanupkext() calls nfscl_cleanup_common().
--> nfscl_cleanup_common() will either get rid of the openowner or,
      if there are still children with open file descriptors, mark it "defunct"
      so it can be freed once the children close the file.
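The decision described above can be modeled in a few lines. This is a simplified Python model, not the kernel code: the OpenOwner class, its open count, and the set of live (pid, start time) pairs are assumptions standing in for the client's real data structures, and the function names only loosely mirror nfscl_procdoesntexist() and nfscl_cleanup_common().

```python
from dataclasses import dataclass


@dataclass
class OpenOwner:
    pid: int
    start_time: int
    opens: int = 0         # Opens still held, e.g. via fds inherited by children
    defunct: bool = False  # process exited but Opens remain


def proc_doesnt_exist(owner: OpenOwner, live_procs: set) -> bool:
    """True once the (pid, start_time) pair no longer names a running process."""
    return (owner.pid, owner.start_time) not in live_procs


def cleanup_common(owner: OpenOwner) -> bool:
    """Return True if the owner can be freed now; otherwise mark it defunct."""
    if owner.opens == 0:
        return True
    owner.defunct = True
    return False


def cleanup_pass(owners: list, live_procs: set) -> list:
    """One cleanup pass over every owner; returns the survivors."""
    survivors = []
    for owner in owners:
        if owner.defunct or proc_doesnt_exist(owner, live_procs):
            if cleanup_common(owner):
                continue  # freed
        survivors.append(owner)
    return survivors


owners = [OpenOwner(100, 1), OpenOwner(101, 2, opens=1), OpenOwner(102, 3)]
live = {(102, 3)}                    # only pid 102 is still running
owners = cleanup_pass(owners, live)  # 100 freed, 101 marked defunct
owners[0].opens = 0                  # the child finally closes the file
owners = cleanup_pass(owners, live)  # now 101 is freed as well
print([o.pid for o in owners])       # [102]
```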

It could be that X is now somehow creating a long chain of processes
where the children inherit a file descriptor, and that delays the cleanup
indefinitely?
Even then, everything should get cleaned up once you kill off X?
(It might take a couple of seconds after killing all the processes off.)

Another possibility is that the "nfscl" thread is wedged somehow.
It is the one that calls nfscl_cleanupkext() once per second. If it never
gets called, the openowners will never go away.
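A rough way to see why a wedged "nfscl" thread would let the count grow without bound: the tick-based simulation below assumes a fixed number of processes exit (with no Opens left) each second and that each pass of the thread frees all of their openowners. The function and its parameters are hypothetical; the real timing and per-pass behavior are more involved.

```python
def leftover_openowners(ticks: int, exits_per_tick: int, wedged_at=None) -> int:
    """Count openowners left after `ticks` one-second ticks.

    exits_per_tick: processes exiting each second, with no Opens left.
    wedged_at: tick at which the cleanup thread stops running (None = never).
    """
    leftover = 0
    for tick in range(ticks):
        leftover += exits_per_tick  # owners of processes that just exited
        if wedged_at is None or tick < wedged_at:
            leftover = 0  # the once-per-second pass frees all of them
    return leftover


print(leftover_openowners(10, 5))               # 0
print(leftover_openowners(10, 5, wedged_at=4))  # 30
```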

Being old-fashioned, I'd probably try to figure this out by adding
some printf()s to nfscl_cleanupkext() and nfscl_cleanup_common().

To avoid the problem, you can probably just use the "oneopenown"
mount option. With that option, only one openowner is used for
all opens. (Having separate openowners for each process was needed
for NFSv4.0, but not for NFSv4.1/4.2.)

> Or is it just a red herring and I shouldn't
> worry?
Well, you can probably avoid the problem by using the "oneopenown"
mount option.

Thanks for reporting this, rick
ps: And, yes, large numbers of openowners will slow things down,
      since the code ends up doing linear scans of them all in a linked
      list in various places.
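To put the linear-scan cost in numbers: at the roughly 3000 new OpenOwners per day Alan reports, three days gives about 3000 * 3 = 9000 entries, and any walk that has to visit all of them takes 9000 steps. The sketch below uses a hand-rolled singly linked list in Python as a stand-in for the kernel's list; the Node class and find() helper are illustrative only, not the client's actual code.

```python
class Node:
    """One list entry holding an openowner id (stand-in for the real struct)."""
    __slots__ = ("owner", "next")

    def __init__(self, owner, next_node=None):
        self.owner = owner
        self.next = next_node


def build_list(n: int):
    """Build a list of n owners by inserting each new one at the head."""
    head = None
    for owner in range(n):
        head = Node(owner, head)
    return head


def find(head, owner):
    """Linear search; returns (node or None, number of entries visited)."""
    visited = 0
    node = head
    while node is not None:
        visited += 1
        if node.owner == owner:
            return node, visited
        node = node.next
    return None, visited


head = build_list(3000 * 3)      # ~3 days at ~3000 OpenOwners/day
print(find(head, 8999)[1])       # 1: newest entry is at the head
print(find(head, 0)[1])          # 9000: oldest entry is at the tail
print(find(head, -1))            # (None, 9000): a miss visits every entry
```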

> -Alan



