Date:      Sat, 12 Dec 2020 03:40:55 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Major issues with nfsv4
Message-ID:  <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com>
References:  <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com> <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com>

J David wrote:
>On Fri, Dec 11, 2020 at 6:28 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> I am afraid I know nothing about nullfs and jails. I suspect it will be
>> something related to when file descriptors in the NFS client mount
>> get closed.
>
>What does NFSv4 do differently than NFSv3 that might upset a low-level
>consumer like nullfs?
The opens, for one. When a file is opened, it finds its way to VOP_OPEN().
--> For NFSv3, all it does is some client-side cache consistency checks.
--> For NFSv4, it must acquire or update an NFSv4 Open, which is a form
       of lock that is acquired/updated by an Open operation in an RPC.
       Then the client stores this locking info in a structure in a linked list
       off of the mount point.
       Once all file descriptors for the vnode are closed, then, and only
       then, can a Close operation be done against the server and the linked
       list data structure be freed.
       --> Does having nullfs between the file descriptors and the NFS vnodes
              for the same file affect when the v_usecount decrements to 0 on
              the NFS vnode?
              I don't know, but if it delays it, then these linked list structures
              will not be freed as soon and might accumulate.
              --> The more structures, the longer the linked list and the more
                     overhead/CPU will be used processing them.
       The fact that processes are spending a long time in exit() might
       be a hint that there are a large number of these NFSv4 Opens to deal with
       when files are being closed implicitly during exit.

       As I mentioned, "nfsstat -c -E" will tell you how many Opens there
       are under the "OpenOwners ..." line.
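
The bookkeeping described above can be pictured with a toy model in C. This is
only a sketch under stated assumptions, not the FreeBSD NFS client code: the
names open_rec, mount_opens, open_file and release_file are hypothetical, and
the real structures and locking are more involved. It shows the two properties
that matter here: a record is freed only when its last reference goes away, and
every lookup walks the list, so the cost grows with the number of records that
are still held.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one NFSv4 Open record (not the kernel's struct). */
struct open_rec {
	struct open_rec *next;
	int file_id;	/* which file this Open belongs to */
	int usecount;	/* references still holding the file open */
};

/* Hypothetical stand-in for the list hanging off the mount point. */
struct mount_opens {
	struct open_rec *head;
	size_t count;	/* number of records currently on the list */
};

/*
 * Open: walk the list looking for an existing record (linear cost),
 * otherwise allocate a new one. Returns NULL if allocation fails.
 */
struct open_rec *
open_file(struct mount_opens *m, int file_id)
{
	struct open_rec *r;

	for (r = m->head; r != NULL; r = r->next) {
		if (r->file_id == file_id) {
			r->usecount++;
			return (r);
		}
	}
	r = malloc(sizeof(*r));
	if (r == NULL)
		return (NULL);
	r->file_id = file_id;
	r->usecount = 1;
	r->next = m->head;
	m->head = r;
	m->count++;
	return (r);
}

/*
 * Release one reference. The record is unlinked and freed only when the
 * last reference goes away; that is the point where a Close would be sent.
 * Returns 1 if the record was freed, 0 otherwise.
 */
int
release_file(struct mount_opens *m, int file_id)
{
	struct open_rec **pp, *r;

	for (pp = &m->head; (r = *pp) != NULL; pp = &r->next) {
		if (r->file_id != file_id)
			continue;
		if (--r->usecount > 0)
			return (0);
		*pp = r->next;
		free(r);
		m->count--;
		return (1);
	}
	return (0);
}
```

In this model, anything that delays the final release_file() call for a file
(the open question for nullfs) leaves its record on the list, and every later
open or release then has a longer list to walk.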

>> Well, NFSv3 is not going away any time soon, so if you don't need
>> any of the additional features it offers...
>
>If we did not want the additional features, we definitely would not be
>attempting this.
>
>> a user would have to run their own custom hacked
>> userland NFS client. Although doable, I have never heard of it being done.
>
>Alex beat me to libnfs.
And you have users running jobs in this environment who would want to
maliciously access the NFS server? (Other than reverting to NFSv3, allowing
clients to use non-reserved port numbers is probably your other choice, from
what I can see. Whatever the interaction between nullfs and the NFSv4 mount
is, it probably won't be fixed quickly, if ever.)

>What about this as a stopgap measure?
>
>> How explosive would adding SO_REUSEADDR to the NFS client be?  It's
>> not a full solution, but it would handle the TIME_WAIT side of the
>> issue.
>
>The kernel NFS networking code is confusing to me.  I can't even
>figure out where/how NFSv4 binds a client socket to know if it's
>possible.  (Pretty sure the code in sys/nfs/krpc_subr.c is not it.)
It's done in the kernel RPC code, found in the sys/rpc directory.
Mostly in clnt_rc.c and clnt_vc.c.
If there is a timeout for an RPC (slow server, network problem, ...),
the code in clnt_rc.c will create a new TCP connection. The old
connection could easily still be around.
As such, I do not believe that SO_REUSEADDR or SO_REUSEPORT
is feasible.
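
For readers unfamiliar with the option being discussed, this is roughly what
setting SO_REUSEADDR looks like on an ordinary userland TCP socket. It is an
illustration only, not a proposed change: the kernel RPC code uses in-kernel
socket interfaces rather than these system calls, and the helper names
make_reuseaddr_socket and has_reuseaddr are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Create a TCP socket with SO_REUSEADDR set. Returns the fd, or -1 on error. */
int
make_reuseaddr_socket(void)
{
	int s, on = 1;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		return (-1);
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Report whether SO_REUSEADDR is set on s: 1 if set, 0 if not, -1 on error. */
int
has_reuseaddr(int s)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(s, SOL_SOCKET, SO_REUSEADDR, &val, &len) < 0)
		return (-1);
	return (val != 0);
}
```

The option has to be set before bind(2) to have any effect on that bind, which
is why the question of where the client socket gets bound matters.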

rick

Thanks!


