Date: Sat, 26 Dec 2020 23:10:01 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: J David <j.david.lists@gmail.com>, Konstantin Belousov <kostikbel@gmail.com>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: Major issues with nfsv4
Message-ID: <YQXPR0101MB096897CA4344DFDC8D22DFE9DDDB0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <YQXPR0101MB09681D2CB8FBD5DDE907D5A5DDC40@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References: <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <CABXB=RSyN+o2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com>
 <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D+N9mXdVn+--qLQ@mail.gmail.com>
 <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <X9Q9GAhNHbXGbKy7@kib.kiev.ua>
 <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <X9UDArKjUqJVS035@kib.kiev.ua>
 <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com>
 <YQXPR0101MB0968B120A417AF69CEBB6A12DDC80@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <X9aGwshgh7Cwiv8p@kib.kiev.ua>
 <CABXB=RTFSAEZvp+moiF+rE9vpEjJVacLYa6G=yP641f9oHJ1zw@mail.gmail.com>
 <YQXPR0101MB09681D2CB8FBD5DDE907D5A5DDC40@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
Although you have not posted the value for vfs.deferred_inact, if that
value has become relatively large when the problem occurs, it might
support this theory w.r.t. how this could happen:

Two processes in different jails do "stat()" or similar on the same file
in the NFS file system at basically the same time.
--> They both get shared locked nullfs vnodes, both of which hold shared
    locks on the same lowervp (the NFS client one).
--> They both do vput() on these nullfs vnodes concurrently.
    If both call vput_final() concurrently, I think both could have the
        VOP_LOCK(vp, LK_UPGRADE | LK_INTERLOCK | LK_NOWAIT)
    at line #3147 fail, since this will call null_lock() for both nullfs
    vnodes and then both null_lock() calls will do
        VOP_LOCK(lvp, flags);
    at line #705.
--> The call fails for both processes, since the other one still holds
    the shared lock on the NFS client vnode.

If I have this right, then both processes end up calling vdefer_inactive()
for the upper nullfs vnodes.  (A toy user-space sketch of this upgrade
race is appended after the quoted thread below.)

If this is what is happening, then when does the VOP_INACTIVE() get called
for the lowervp?  I see vfs_deferred_inactive() in sys/kern/vfs_subr.c,
but I do not know when/how it gets called.

Hopefully Kostik can evaluate/correct this theory?

rick

________________________________________
From: owner-freebsd-fs@freebsd.org <owner-freebsd-fs@freebsd.org> on behalf of Rick Macklem <rmacklem@uoguelph.ca>
Sent: Wednesday, December 16, 2020 11:25 PM
To: J David; Konstantin Belousov
Cc: freebsd-fs@freebsd.org
Subject: Re: Major issues with nfsv4

If you can do so when the "Opens" count has gone fairly high, please
"sysctl vfs.deferred_inact" and let us know what that returns.

rick

________________________________________
From: J David <j.david.lists@gmail.com>
Sent: Sunday, December 13, 2020 10:51 PM
To: Konstantin Belousov
Cc: Rick Macklem; freebsd-fs@freebsd.org
Subject: Re: Major issues with nfsv4

CAUTION: This email originated from outside of the University of Guelph.
Do not click links or open attachments unless you recognize the sender
and know the content is safe. If in doubt, forward suspicious emails to
IThelp@uoguelph.ca

On Sun, Dec 13, 2020 at 4:25 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> Nullfs with -o nocache (default for NFS mounts) should not cache vnodes.
> So it is more likely a local load that has 130k files open. Of course,
> it is the OP who can answer the question.

This I can rule out; there is no visible correlation between "Opens" and
the number of files open on the system.  Just finishing a test right now,
and:

$ sudo nfsstat -E -c | fgrep -A1 OpenOwner
OpenOwner    Opens LockOwner    Locks   Delegs LocalOwn
     4678    36245        15        6        0        0
$ sudo fstat | wc -l
    2698
$ ps Haxlww | wc -l
    1012

The value of Opens increases consistently over time.  Killing the
processes causing this behavior *did not* reduce the number of OpenOwner
or Opens.  Unmounting the nullfs mounts (after the processes were gone)
*did*:

$ sudo nfsstat -E -c | fgrep -A1 OpenOwner
OpenOwner    Opens LockOwner    Locks   Delegs LocalOwn
      130       41         0        0        0        0

Mutex contention was observed this time, but once it was apparent that
"Opens" was increasing over time, I didn't let the test get to the point
of disrupting activities.  This test ended at Opens = 36589, which is
well short of the previous 130,000+.  It is possible that mutex
contention becomes an issue once system CPU resources are exhausted.

More about the results of the latest test after the data is analyzed.
After that's done, I'll attempt Rick's patch.
In the long run, we would definitely like to get delegation to work.
Baby steps!

Thanks!
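As a rough illustration of the upgrade race described at the top of this
message, here is a small user-space sketch.  It is not the kernel
lockmgr/nullfs code; it only models one assumed rule: a shared-to-exclusive
upgrade attempted with LK_NOWAIT can succeed only when the caller is the
sole shared holder of the lower vnode's lock, so two concurrent shared
holders both fail the upgrade and both fall back to deferring the inactive
processing.  The file name, the barriers used to force the overlap, and
the build line are illustrative assumptions, not details taken from this
thread.

/*
 * upgrade_race.c -- toy model of two "jailed processes" that each hold
 * a shared lock on the same lower (NFS client) vnode and each try a
 * no-wait upgrade to exclusive, as vput_final() does with
 * VOP_LOCK(vp, LK_UPGRADE | LK_INTERLOCK | LK_NOWAIT).
 *
 * The upgrade rule modelled here is an assumption for illustration, not
 * the real lockmgr code: the upgrade succeeds only if the caller is the
 * only shared holder.
 *
 * Hypothetical build line:  cc -pthread -o upgrade_race upgrade_race.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int shared_holders;		/* shared holds on the one lower vnode */
static pthread_barrier_t both_hold_shared;	/* both threads hold the shared lock */
static pthread_barrier_t both_attempted;	/* both threads have tried the upgrade */

/* Take the lock shared (always succeeds in this toy model). */
static void
lock_shared(void)
{
	atomic_fetch_add(&shared_holders, 1);
}

static void
unlock_shared(void)
{
	atomic_fetch_sub(&shared_holders, 1);
}

/*
 * Try to upgrade shared -> exclusive without sleeping.  Succeeds only
 * when the caller is the sole shared holder (the count goes 1 -> 0 and
 * the caller "owns" the lock exclusively).
 */
static bool
try_upgrade_nowait(void)
{
	int sole = 1;

	return (atomic_compare_exchange_strong(&shared_holders, &sole, 0));
}

static void *
jail_process(void *arg)
{
	const char *name = arg;
	bool upgraded;

	/* stat() left this "process" with a shared lock on the lower vnode. */
	lock_shared();

	/* Both "processes" reach vput_final() at basically the same time. */
	pthread_barrier_wait(&both_hold_shared);

	/* The LK_UPGRADE | LK_NOWAIT attempt. */
	upgraded = try_upgrade_nowait();

	/* Keep the shared hold until the other side has also tried. */
	pthread_barrier_wait(&both_attempted);

	if (upgraded) {
		printf("%s: upgrade succeeded, VOP_INACTIVE() could run now\n", name);
	} else {
		printf("%s: upgrade failed, vdefer_inactive()\n", name);
		unlock_shared();
	}
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_barrier_init(&both_hold_shared, NULL, 2);
	pthread_barrier_init(&both_attempted, NULL, 2);
	pthread_create(&t1, NULL, jail_process, "jail A");
	pthread_create(&t2, NULL, jail_process, "jail B");
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return (0);
}

With the overlap forced, both threads print "upgrade failed,
vdefer_inactive()", which is the situation in which vfs.deferred_inact
would be expected to grow.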