Date:      Sat, 16 Jan 2021 22:57:37 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Major issues with nfsv4
Message-ID:  <YQXPR0101MB0968F66CC35FFAD5767929FFDDA60@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CABXB=RQG_hR%2BVzC2mSiwK-9h8sEAQA5xrx2tjiKsPngoOMqxFQ@mail.gmail.com>
References:  <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com> <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9Q9GAhNHbXGbKy7@kib.kiev.ua> <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9UDArKjUqJVS035@kib.kiev.ua> <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com> <YQXPR0101MB0968B120A417AF69CEBB6A12DDC80@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9aGwshgh7Cwiv8p@kib.kiev.ua> <CABXB=RTFSAEZvp%2BmoiF%2BrE9vpEjJVacLYa6G=yP641f9oHJ1zw@mail.gmail.com> <YQXPR0101MB09681D2CB8FBD5DDE907D5A5DDC40@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RTLogtoFi%2BtAyUHii%2BWFCQtj1qFjbiz2CQC8whNYEBy2Q@mail.gmail.com> <YQXPR0101MB0968C6331C1C1F33E18523C2DDA80@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <CABXB=RQG_hR%2BVzC2mSiwK-9h8sEAQA5xrx2tjiKsPngoOMqxFQ@mail.gmail.com>

J David wrote:
>On Thu, Jan 14, 2021 at 5:30 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> One thing to try (other than a FreeBSD13/head system, if possible)
>> is the "oneopenown" mount option.
>
>The odds of being able to run an unreleased version of FreeBSD on
>production servers are slim to none.
Well, the current release schedule has FreeBSD13 being released at the
end of March. If you set up a test system now/soon, you might be able
to determine if an upgrade to FreeBSD13 is justified when it is released?

>While trying to develop a reproduction, I think I have narrowed down
>what the problem is.  There are no jails or nullfs involved here, just
>NFSv4.1.
>
>Window 1: (to track OpenOwner/Opens)
>
>while true; do date; nfsstat -E -c | fgrep -A1 OpenOwner; sleep 1; done
>
>Window 2:
>
>mount -o ro,nfsv4,minorversion=1,nosuid fileserver:/path/to/freesbd/root /mnt
>chroot /mnt
>
>(OpenOwner is now 1 and Opens is now 9.)
>
>Window 3:
>
>chroot /mnt
>(OpenOwner is now 2 and Opens is now 18.)
>ls
>(OpenOwner is now 3 and Opens is now 21.)
>ls
>(OpenOwner is now 4 and Opens is now 24.)
>ls
>(OpenOwner is now 5 and Opens is now 27.)
>ls
>(OpenOwner is now 6 and Opens is now 30.)
>ls
>(OpenOwner is now 7 and Opens is now 33.)
>bash
>while true; do ls | true; done
>(Allow about a minute to pass, hit CTRL-C.  OpenOwner is now 4647 and
>Opens is now 13957)
>exit
>exit
>(OpenOwner is now 4647 and Opens is now 13952.)
Hmm. Not sure what files would get opened each time. NFSv4 Opens
only apply to regular files, so "ls" shouldn't result in Opens.

I may try this and see what the 3 files being Open'd are.
(Obviously something related to the chroot. But what?)

>Back in Window 2:
>
>exit
>(wait about 30 seconds, OpenOwner is now 0 and Opens is now 0.)
Yes, it could take a while to close all those opens.

>So it looks like the NFSv4 code can't let go of *any* Opens on a
>file/directory until *all* references to that file/directory are
>closed.
True for regular files, but not directories.

>If chroot is too much, "vi /mnt/etc/motd" in Window 2 and "cat
>/mnt/etc/motd" in Window 3 have the same effect, leaking one Open per
>cat instead of 3.  You probably don't even need a FreeBSD install on
>the NFS mount; just hold a single file open in one window and
>open/close it repeatedly in another.
An NFSv4 Open is unique for open_owner/file, so the same file opened
by different processes (an openowner represents a process unless you
use "oneopenown") results in separate NFSv4 Opens.
However, since the NFSv4 client cannot know which Open a VOP_CLOSE()
is associated with, due to file descriptor inheritance, none of the NFSv4
Opens can be closed until all FreeBSD open file descriptors for the file
are closed.
--> Just the way it is. It is not an unintended leak. They go away once
      all file descriptors get closed, so long as the VOP_INACTIVE() gets
      called for the NFSv4 vnode.
      --> It is the handling of deferred VOP_INACTIVE() calls that has
             changed for FreeBSD13.
However, none of the above seems unexpected, except maybe for why
"ls" in the chroot opens 3 regular files each time. I don't know what
chroot actually does for something like "ls"? I'll look.

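The simpler reproduction quoted above (hold one file open, then open and
close it repeatedly from other processes) can be sketched as a small script.
This is only a sketch: the function name hold_and_reopen and the default
iteration count are made up for illustration, and the OpenOwner/Opens
counters still have to be watched separately with the "nfsstat -E -c" loop
from Window 1.

```sh
#!/bin/sh
# Sketch: hold a file open on one descriptor while other processes
# open and close the same file repeatedly.
# hold_and_reopen is a hypothetical helper name, not an existing tool.
hold_and_reopen() {
    file=$1
    count=${2:-100}

    # Refuse early if the file cannot be read.
    [ -r "$file" ] || return 1

    # Keep one read-only descriptor open for the whole run
    # (plays the role of "vi /mnt/etc/motd" in Window 2).
    exec 3<"$file"

    i=0
    while [ "$i" -lt "$count" ]; do
        # Each cat is a separate process that opens and closes the file
        # (plays the role of "cat /mnt/etc/motd" in Window 3).
        cat "$file" >/dev/null || { exec 3<&-; return 1; }
        i=$((i + 1))
    done

    echo "reopened $file $i times while holding fd 3"

    # Close the held descriptor. Per the explanation above, the NFSv4
    # Opens can only go away once no descriptors for the file remain.
    exec 3<&-
}
```

For example, "hold_and_reopen /mnt/etc/motd 1000" run against the NFSv4.1
mount, with the Window 1 loop running alongside, would be expected to show
Opens climbing during the loop and dropping some time after it returns.
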
>Then I re-tested this with "-o
>ro,nfsv4,minorversion=1,nosuid,oneopenown."  At least for this simple
>case, the problem did not occur with oneopenown set.
Yes. For the oneopenown case, there will only be one NFSv4 Open for
each file opened.

>Are there downsides to the oneopenown flag other than breaking delegations?
Here's an example:
- One process running as J David opens a file for reading, which works since
  J David has read permissions on the file.
- Another process running as Warner opens the same file for writing, which
  works, since Warner has write access for the file.

Now, network partition the client from the server until the lease expires...
The client now gets an NFSERR_EXPIRED error, which forces it to retry the
open(s).

Without "oneopenown", the above FreeBSD opens result in 2 NFSv4
Opens, both of which probably reopen successfully (unless a chmod/chown/
setfacl on the file makes the reopen fail).

With "oneopenown", the above FreeBSD opens result in one NFSv4 Open
for reading/writing. A retry of this one Open might succeed, depending on
what the file permissions are. (I think the code uses the credentials for
Warner in this case, assuming that the credentials that opened it for
writing are more likely to succeed, but there are no guarantees.)

--> For normal operation, it should be fine. A network partition that
      results in NFSERR_EXPIRED is a worst case scenario, where all
      byte range locks will be lost and Opens may be lost.

For delegations, the story is similar, but happens routinely when
delegations are recalled by the server.
--> For delegation recall, the reopen is done using a special variant of Open called
       claim_delegate_current. A server should allow claim_delegate_current
       irrespective of what the file permissions are, but the original RFC3530
       did not make this clear.
       --> Is allowed by a FreeBSD NFSv4 server.
As such, the warning in the man page is mainly there for NFSv4 servers
where the reopen done at delegation recall time can fail, due to permission
checking.

There is also the fact that the case of "oneopenown+delegations" is not well
tested and the current FreeBSD client code will use separate open_owners
(violating the "one open owner" principle) when delegations are recalled.

Are delegations useful?
- Short answer, often not.

Delegations allow the client to do 2 things:
1 - More extensive file data caching, since the delegation guarantees that
      other clients will not be modifying the file.
      This is not exploited by current clients, as far as I know. I had something
      called Packrat, but it never made it into FreeBSD. More on Packrat below,
      for anyone interested.
2 - Do NFSv4 Opens locally in the client, avoiding the Open/Close RPCs.
      Since delegations are per file, this only helps if the same files get opened
      over and over and over again. Doesn't happen for many loads, from what
      I've seen.
      --> With delegations turned on, you can compare the counts for Opens
            vs LocalOpens in the "nfsstat -E -c" stats, to see how many Open/Close
            RPCs get saved.

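The Opens vs LocalOpens comparison above can be turned into a rough
percentage. A minimal sketch, assuming the two counts have been copied by
hand from the "nfsstat -E -c" output; the helper name local_open_pct is
hypothetical, and reading LocalOpens / (Opens + LocalOpens) as the share of
opens handled without an Open RPC is an interpretation, not something
nfsstat reports directly.

```sh
#!/bin/sh
# local_open_pct is a hypothetical helper, not part of nfsstat.
# Arguments: the Opens count and the LocalOpens count, copied by hand
# from the "nfsstat -E -c" output.
local_open_pct() {
    opens=$1
    local_opens=$2
    total=$((opens + local_opens))

    # Avoid dividing by zero when nothing has been opened yet.
    if [ "$total" -eq 0 ]; then
        echo "no opens recorded"
        return 0
    fi

    # Integer percentage of opens that were done locally under a delegation.
    echo "$((100 * local_opens / total))% of opens were local"
}
```

A low percentage on a given load would be consistent with the observation
that the same files rarely get opened over and over again.
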
Packrat: Some now bitrotted code in the subversion projects area that did
              whole file caching of small files in non-volatile storage on the client
              when a delegation was acquired for the file.
              This was intended for devices like laptops, with slow/flaky network
              connectivity.
              --> Recent FreeBSD changes might inspire me to resurrect this.
                    - FreeBSD has recently become more laptop friendly.
                    - nfs-over-tls provides an easy way to make NFSv4 mounts
                      from anywhere (as a laptop might) relatively secure.

rick

Thanks!


