Date:      Mon, 14 Dec 2020 15:52:19 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>, Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Major issues with nfsv4
Message-ID:  <YQXPR0101MB0968BBA4054DB9D5D648FE4ADDC70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CABXB=RSpfiU3R1JuLU_DE60SARs0rkPVROPLewJFjBwMXRnbSw@mail.gmail.com>
References:  <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com> <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9Q9GAhNHbXGbKy7@kib.kiev.ua> <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9UDArKjUqJVS035@kib.kiev.ua> <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com> <YQXPR0101MB0968B120A417AF69CEBB6A12DDC80@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9aGwshgh7Cwiv8p@kib.kiev.ua> <CABXB=RTFSAEZvp%2BmoiF%2BrE9vpEjJVacLYa6G=yP641f9oHJ1zw@mail.gmail.com> <CABXB=RTn9NC3PE-QyNLmaKUvAWtYtdN_39Nks5i05_VxWpbhRw@mail.gmail.com>, <CABXB=RSpfiU3R1JuLU_DE60SARs0rkPVROPLewJFjBwMXRnbSw@mail.gmail.com>

Hope you don't mind a top post...
It's interesting that the leak of opens does not correlate
well with load, but it doesn't give me any insight into what
might be causing the leak.
--> Is there something that is always running that accesses
       files in the nullfs mount?

If you can set up a system with no jobs running...
--> Watch it to see if there is a leak happening.
--> Try the two cases of creating a bunch of files
       and opening a bunch of files that already exist,
       and then closing the files for both cases.

       Both file creation and opening existing files
       use the NFSv4 Open operation, but they follow
       quite different code paths through the VFS.
       --> If the leak occurs on one but not the other,
              it would narrow things down.
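The two cases above could be driven by a small script like the following. It is a minimal sketch, assuming Python 3 on the NFS client and a directory inside the nullfs mount; the function names, the file-name prefix and the count of 1000 are hypothetical choices. Both cases hold all the files open at once and only then close them, mirroring the sequence described above, so the OpenOwner/Opens counters from "nfsstat -E -c" can be compared before and after each run.

```python
import os
import sys
import tempfile


def create_files(directory, count, prefix="leaktest"):
    """Create `count` new files, keep them all open, then close them all."""
    fds, paths = [], []
    try:
        for i in range(count):
            path = os.path.join(directory, f"{prefix}{i}")
            # O_EXCL makes each open a genuine creation, never a reopen.
            fds.append(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644))
            paths.append(path)
    finally:
        for fd in fds:
            os.close(fd)
    return paths


def open_existing(paths):
    """Open each already-existing file read-only, then close them all."""
    fds = []
    try:
        for path in paths:
            fds.append(os.open(path, os.O_RDONLY))
    finally:
        for fd in fds:
            os.close(fd)
    return len(fds)


if __name__ == "__main__":
    # Hypothetical usage: pass a directory inside the nullfs mount.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    created = create_files(target, 1000)
    print("created and closed:", len(created))
    print("reopened and closed:", open_existing(created))
```

Use a fresh directory for each run: because creation uses O_EXCL, a second run in the same directory fails with FileExistsError. Large counts may also run into the per-process descriptor limit (ulimit -n).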

Good luck with it, rick

________________________________________
From: J David <j.david.lists@gmail.com>
Sent: Monday, December 14, 2020 10:21 AM
To: Konstantin Belousov
Cc: Rick Macklem; freebsd-fs@freebsd.org
Subject: Re: Major issues with nfsv4



TLDR: The values of OpenOwner and Opens have a statistically
significant correlation to the passage of time and are statistically
independent of the number of currently running jobs (jails),
processes, or threads.

3,173 samples were collected over approximately twelve hours,
containing the following values (five-number summary in parentheses:
min 1Q median 3Q max):

- nfsstat -E -c OpenOwner (137 1405 2380 3541 4693)
- nfsstat -E -c Opens (49 10479 18229 27732 36589)
- # of active Jobs (1 50 50 50 51)
- # of Job processes (1 117 117 117 121)
- # of Job threads (1 519 521 525 533)
- # of nfscl Threads (48 53 53 53 55)
- Total # of processes on system (149 260 261 264 280)
- Total # of threads on system (481 996 1001 1005 1023)
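For reference, a five-number summary like those above can be computed as below. This is a sketch only: the post does not say which quartile convention was used, so the inclusive method of Python's statistics.quantiles is an assumption and may differ slightly from the figures listed.

```python
import statistics


def five_number_summary(samples):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(samples)
    # ASSUMPTION: "inclusive" quartiles; other conventions give slightly
    # different Q1/Q3 values on the same data.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]
```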

OpenOwner and Opens are the dependent variables. The remaining values
and the sample sequence number (N) are independent variables.

The following table shows the adjusted R-squared values of linear
regressions using each combination of the independent and dependent
variables. While R-squared is not always the best measure of goodness
of fit, it is easy to understand, and given the type of data and the
relationship sought, its use here is both accurate and illustrative.

                 OpenOwner        Opens
N                0.9369           0.9310
NTestEnd*        0.9962           0.9979
Jobs             0.2461           0.0324
JobProcs         0.0225           0.0285
JobThreads       0.0921           0.1060
NfsclThreads     0.0072           0.0000
SysProcs         0.0325           0.0376
SysThreads       0.1003           0.1145

*Because the test ended at sample 3156, NTestEnd reflects the
regressions of OpenOwner and Opens vs. sample sequence number for
samples 1-3156 only.
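The adjusted R-squared for a single-predictor least-squares fit can be computed as in this sketch (plain Python, no statistics package). The function name is hypothetical, and this is not necessarily how the figures above were produced. With one predictor, adjusted R-squared is 1 - (1 - R^2)(n - 1)/(n - 2).

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, adjusted R-squared) for a single predictor.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the plain R-squared.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjust for one predictor (p = 1): 1 - (1 - R^2)(n - 1)/(n - p - 1).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2
```

Passing the sample numbers as x and the Opens column as y corresponds to the N row; truncating both lists to the first 3,156 samples corresponds to NTestEnd. Under a linear model, the returned slope estimates how many Opens accumulate per sample interval.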

The results strongly indicate that both OpenOwner and Opens are highly
correlated with time. No other regression demonstrates a statistically
significant correlation. Opens and OpenOwner are also highly
correlated with each other (adjusted R-squared = 0.9957).

The high correlation and strong linear relationship with time suggest
this is caused by something that is both roughly constant over time
and largely independent of system activity measures based on process
counts.

It may be worth re-doing this test, capturing the rest of the
"nfsstat -E -c" statistics about operations as well as counts of open
files.  Finding a strong correlation might help narrow down the causal
action, which would hopefully make it possible to independently
reproduce and/or fix this.
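A re-run that records all of the client counters could use a sampler along these lines. It is a sketch under an unverified assumption about the output layout: that "nfsstat -E -c" prints blocks consisting of a row of counter names followed by a row of integers, one per name. Names that appear in more than one block keep their first value. The function names and the 10-second interval are hypothetical choices.

```python
import json
import subprocess
import time


def parse_nfsstat(text):
    """Map counter names to values from `nfsstat -E -c` style output.

    ASSUMPTION: counters appear as a row of names followed by a row of
    integers with one value per name; all other lines are ignored.
    """
    counters = {}
    rows = [line.split() for line in text.splitlines()]
    for names, values in zip(rows, rows[1:]):
        if (names and len(names) == len(values)
                and all(v.isdigit() for v in values)
                and not any(n.isdigit() for n in names)):
            for name, value in zip(names, values):
                # Keep the first occurrence if a name repeats in a later block.
                counters.setdefault(name, int(value))
    return counters


def sample_forever(interval_s=10.0):
    """Print one JSON line of all parsed client counters per interval."""
    while True:
        out = subprocess.run(["nfsstat", "-E", "-c"], capture_output=True,
                             text=True, check=True).stdout
        snapshot = {"time": int(time.time()), **parse_nfsstat(out)}
        print(json.dumps(snapshot), flush=True)
        time.sleep(interval_s)
```

Redirecting the output of sample_forever() to a file gives one JSON object per sample, which can be lined up with the job and process counts afterwards; stop it with Ctrl-C.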

A couple of questions about that:

1) Is there a way to get the total number of currently-open files more
efficiently than enumerating them?  (E.g., "fstat | wc -l" and "fstat
-m | wc -l" are slow and resource-intensive.)

2) If so, is there a way to do that on a per-process basis?

Thanks!


