Date: Thu, 14 Jan 2021 08:50:34 -0500
From: J David <j.david.lists@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Konstantin Belousov <kostikbel@gmail.com>,
    "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: Major issues with nfsv4
Message-ID: <CABXB=RTLogtoFi+tAyUHii+WFCQtj1qFjbiz2CQC8whNYEBy2Q@mail.gmail.com>
In-Reply-To: <YQXPR0101MB09681D2CB8FBD5DDE907D5A5DDC40@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References: <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
    <CABXB=RSyN+o2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com>
    <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
    <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D+N9mXdVn+--qLQ@mail.gmail.com>
    <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
    <X9Q9GAhNHbXGbKy7@kib.kiev.ua>
    <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
    <X9UDArKjUqJVS035@kib.kiev.ua>
    <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com>
    <YQXPR0101MB0968B120A417AF69CEBB6A12DDC80@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
    <X9aGwshgh7Cwiv8p@kib.kiev.ua>
    <CABXB=RTFSAEZvp+moiF+rE9vpEjJVacLYa6G=yP641f9oHJ1zw@mail.gmail.com>
    <YQXPR0101MB09681D2CB8FBD5DDE907D5A5DDC40@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

On Wed, Dec 16, 2020 at 11:25 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
> If you can do so when the "Opens" count has gone fairly high,
> please "sysctl vfs.deferred_inact" and let us know what that
> returns.

$ sysctl vfs.deferred_inact
sysctl: unknown oid 'vfs.deferred_inact'
$ sysctl -a vfs | fgrep defer
$

Sorry for the delay in responding to this. I got my knuckles rapped
for allowing this to happen so much.

It happened just now because some of the "use NFSv4.1" config leaked
out to a production machine, but not all of it. As a result, only the
read-only "job binary" filesystems were mounted with nullfs+nfsv4.1,
so it is unlikely to be related to creating files. Hopefully, that
narrows things down.

$ sudo nfsstat -E -c
[...]
OpenOwner     Opens LockOwner     Locks    Delegs  LocalOwn
    37473    303469         0         0         1         0
[...]

"nfscl: never fnd open" continues to appear regularly on the
console/dmesg, even at the tail end of a reboot, after all buffers
have synced:

Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-5' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-6' to stop... done
All buffers synced.
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
Uptime: 4d13h59m27s
Rebooting...
cpu_reset: Stopping other CPUs
---<<BOOT>>---

It did not appear 300,000 times, though; more like a few times a day.

Also, I set up an idle system with the NFSv4.1+nullfs config, as
requested. It has been up for 32 days and appears not to have leaked
anything, but it does also have a fistful of those "nfscl: never fnd
open" messages.

There is also a third system in a test environment with the
nullfs+nfsv4.1 config. That system has been up 34 days, has exhibited
no problems, and shows this:

OpenOwner     Opens LockOwner     Locks    Delegs  LocalOwn
      342     15098         2         0         0         0

That machine shows one "nfscl: never fnd open" in the dmesg.

A fourth system runs the NFSv4.1-no-nullfs config in production, with
net.inet.ip.portrange.lowlast tweaked and a limit on simultaneous
jobs. That system had issues requiring a restart 18 days ago. It also
occasionally gets "nfscl: never fnd open" in the dmesg and has
relatively large Open numbers. As of right now:

OpenOwner     Opens LockOwner     Locks    Delegs  LocalOwn
    23214     46304         0         0         0         0

The "OpenOwner" value on that system swings dramatically, ranging
between 45,000 and 10,000 in just a few minutes. It appears to
correlate well with the number of simultaneous jobs. The "Opens" value
goes up and down a bit, but trends upward over time. However, when I
found and killed one long-running job and unmounted its filesystems,
"Opens" dropped 90%, to around 4600.

Note that there are *no* nullfs mounts on that system, so nullfs may
not be a necessary component of the problem.

As a next step, I will try to create a fake job that opens a ton of
files; a rough sketch is in the P.S. below. Then I'll test it on the
binary read-only nullfs+nfsv4.1 mounts and on the system that runs
nfsv4.1 directly.

Thanks!
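
P.S. Here is roughly what I have in mind for the fake job, as an
untested sketch rather than anything I've run yet. It opens every
regular file in a directory named on the command line (one of the
nfsv4.1 mounts), holds the descriptors while "Opens" can be checked
with nfsstat -E -c, then closes them all and pauses again, to show
whether the count drops at close(2) or only at process exit. MAXOPEN
and the file name are placeholders; kern.maxfilesperproc would have
to be at least that high for it to get that far.

/* fakejob.c: hold a pile of NFS opens, then release them on demand. */
#include <sys/types.h>

#include <dirent.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define MAXOPEN 100000	/* placeholder; needs kern.maxfilesperproc >= this */

static int fds[MAXOPEN];

int
main(int argc, char *argv[])
{
	DIR *dirp;
	struct dirent *dp;
	int n = 0;

	if (argc != 2)
		errx(1, "usage: %s directory", argv[0]);
	if (chdir(argv[1]) == -1)
		err(1, "chdir %s", argv[1]);
	if ((dirp = opendir(".")) == NULL)
		err(1, "opendir %s", argv[1]);

	/* Open every regular file in the directory and hold it open. */
	while ((dp = readdir(dirp)) != NULL && n < MAXOPEN) {
		if (dp->d_type != DT_REG)
			continue;
		if ((fds[n] = open(dp->d_name, O_RDONLY)) == -1) {
			warn("open %s", dp->d_name);	/* e.g. EMFILE */
			break;
		}
		n++;
	}
	closedir(dirp);
	printf("%d files held open; check nfsstat -E -c, then hit enter\n", n);
	(void)getchar();

	/* Does "Opens" drop here, or only when the process exits? */
	while (n > 0)
		close(fds[--n]);
	printf("all closed; check nfsstat -E -c again, then hit enter\n");
	(void)getchar();
	return (0);
}

Build with "cc -o fakejob fakejob.c" and run it against one of the
read-only mounts while watching nfsstat in another window. If "Opens"
climbs with the open count but doesn't come back down after the closes
(or the exit), that should reproduce the leak without the real jobs.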