Date:      Thu, 14 Jan 2021 22:30:47 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Major issues with nfsv4
Message-ID:  <YQXPR0101MB0968C6331C1C1F33E18523C2DDA80@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CABXB=RTLogtoFi+tAyUHii+WFCQtj1qFjbiz2CQC8whNYEBy2Q@mail.gmail.com>
References:  <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN+o2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D+N9mXdVn+--qLQ@mail.gmail.com> <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9Q9GAhNHbXGbKy7@kib.kiev.ua> <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9UDArKjUqJVS035@kib.kiev.ua> <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com> <YQXPR0101MB0968B120A417AF69CEBB6A12DDC80@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9aGwshgh7Cwiv8p@kib.kiev.ua> <CABXB=RTFSAEZvp+moiF+rE9vpEjJVacLYa6G=yP641f9oHJ1zw@mail.gmail.com> <YQXPR0101MB09681D2CB8FBD5DDE907D5A5DDC40@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RTLogtoFi+tAyUHii+WFCQtj1qFjbiz2CQC8whNYEBy2Q@mail.gmail.com>

J David wrote:
>On Wed, Dec 16, 2020 at 11:25 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> If you can do so when the "Opens" count has gone fairly high,
>> please "sysctl vfs.deferred_inact" and let us know what that
>> returns.
>
>$ sysctl vfs.deferred_inact
>sysctl: unknown oid 'vfs.deferred_inact'
>$ sysctl -a vfs | fgrep defer
>$
Yes. I did not realize how different FreeBSD12 is when compared with FreeBSD13/head
in this area.
At a quick glance, I do not see where the syncer tries to vinactive() vnodes where
the VOP_INACTIVE() has been deferred.

--> It is possible that this problem is fixed in FreeBSD13/head.
       Any chance you can test a FreeBSD13/head system?

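A quick way to check what a given test box is running, and whether its
kernel even has that oid ("sysctl -d" prints the oid's description
without its value):

$ freebsd-version -ku
$ sysctl -d vfs.deferred_inact
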
Kostik, does FreeBSD12 try to vinactive() deferred VOP_INACTIVE() vnodes via the
syncer?

>Sorry for the delay in responding to this.  I got my knuckles rapped
>for allowing this to happen so much.
>
>It happened just now because some of the "use NFSv4.1" config leaked
>out to a production machine, but not all of it. As a result, only the
>read-only "job binary" filesystems were mounted with nullfs+nfsv4.1.
>So it is unlikely to be related to creating files. Hopefully, that
>narrows things down.
>
>$ sudo nfsstat -E -c
>[...]
>  OpenOwner    Opens  LockOwner    Locks   Delegs  LocalOwn
>    37473   303469      0      0      1      0
>[...]
>
>"nfscl: never fnd open" continues to appear regularly on
>console/dmesg, even at the end of the reboot:
Not sure what this implies. The message means that it cannot find
an NFSv4 Open to Close.
It may indicate something is broken in the client, but it is not, by itself, serious.

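If you want an actual count of those messages rather than eyeballing
the console, you can grep the kernel message buffer (the string below
is copied from your output):

$ dmesg | grep -c 'nfscl: never fnd open'
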
>Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
>Waiting (max 60 seconds) for system thread `bufspacedaemon-5' to stop... done
>Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
>Waiting (max 60 seconds) for system thread `bufspacedaemon-6' to stop... done
>All buffers synced.
>nfscl: never fnd open
>nfscl: never fnd open
>nfscl: never fnd open
>nfscl: never fnd open
>nfscl: never fnd open
>nfscl: never fnd open
>Uptime: 4d13h59m27s
>Rebooting...
>cpu_reset: Stopping other CPUs
>---<<BOOT>>---
>
>It did not appear 300,000 times, though.  More like a few times a day.
>
>Also, I set up an idle system with the NFSv4.1+nullfs config, as
>requested. It has been up for 32 days and appears not to have leaked
>anything. But it does also have a fistful of those "nfscl: never fnd
>open" messages.
>
>There is also a third system in a test environment with the
>nullfs+nfsv4.1 config. That system is up 34 days, has exhibited no
>problems, and shows this:
>
>  OpenOwner    Opens  LockOwner    Locks   Delegs  LocalOwn
>     342    15098      2      0      0      0
>
>That machine shows one "nfscl: never fnd open" in the dmesg.
>
>A fourth system has the NFSv4.1-no-nullfs config in production with
>net.inet.ip.portrange.lowlast tweaked and a limit on simultaneous
>jobs.  That system had issues requiring a restart 18 days ago. It also
>occasionally gets "nfscl: never fnd open" in the dmesg and has
>relatively large Open numbers:
>
>As of right now:
>  OpenOwner    Opens  LockOwner    Locks   Delegs  LocalOwn
>    23214    46304      0      0      0      0
>
>The "OpenOwner" value on that system seems to swing dramatically,=0A=
>ranging between 45,000 to 10,000 in just a few minutes. It appears to=0A=
>correlate well to simultaneous jobs.=0A=
This sounds normal, since an OpenOwner refers to a process on the client
doing a file Open.

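If you want to watch how the counts track the job load over time, a
trivial polling loop over nfsstat does it (the interval is arbitrary;
the grep keys on the header line in your output above):

$ while :; do date; nfsstat -E -c | grep -A 1 OpenOwner; sleep 60; done
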
> The "Opens" value goes up and=0A=
>down a bit, but trends upward over time. However, when I found and=0A=
>killed one long-running job and unmounted its filesystems, "Opens"=0A=
>dropped 90% to around 4600. Note there are *no* nullfs mounts on that=0A=
>system.  So nullfs may not be a necessary component of the problem.=0A=
This also sounds reasonable. The NFSv4 Opens can only be closed once
the process doing the Open and all of its child processes have closed the
file.
--> If a program is "lazy" and doesn't do closes, they won't
       happen until the process exits. And then its child processes
       will also need to exit before it leaves the zombie state.

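To find out which processes are actually holding files open on a given
mount, fstat(1) can restrict its listing to one file system (the mount
point below is just a placeholder):

$ fstat -f /mnt/jobs

"procstat -f <pid>" will then show the individual descriptors for any
process that looks suspicious.
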
One thing to try (other than a FreeBSD13/head system, if possible)
is the "oneopenown" mount option.
--> It can only be used on NFSv4.1 mounts (not NFSv4.0) and
       makes the mount only use one OpenOwner for all Opens
       instead of a different one for each process doing an Open.
       --> This would reduce the number of Opens for the case
             where multiple processes open the same file.
       --> It also simplifies the search for an Open, since there
             is only one for each file.

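For example (the server name and paths here are placeholders), the
mount would look something like:

# mount -t nfs -o nfsv4,minorversion=1,oneopenown server:/export /mnt/jobs

or, in /etc/fstab:

server:/export  /mnt/jobs  nfs  rw,nfsv4,minorversion=1,oneopenown  0  0
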
rick

>As a next step, I will try to create a fake job that opens a ton of
>files.  Then I'll test it on the binary read-only nullfs+nfsv4.1
>mounts and on the system that runs nfsv4.1 directly.
>
>Thanks!
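If it helps, a fake job like that can be as simple as a shell loop
that holds descriptors open until it is killed. A rough sketch (the
directory, file names, and count are all made up, and the files must
already exist on the mount):

#!/bin/sh
# Open N files on the mount under test and hold them, with no
# explicit closes; "Opens" in "nfsstat -E -c" should climb by about N.
DIR=${1:-/mnt/jobs}
N=${2:-500}
i=0
while [ "$i" -lt "$N" ]; do
    # tail -f keeps a descriptor open on each file until killed
    tail -f "$DIR/file$i" >/dev/null 2>&1 &
    i=$((i + 1))
done
echo "holding $N opens; kill this script's children to release them"
wait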


