Date: Sat, 12 Dec 2020 03:40:55 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: J David <j.david.lists@gmail.com>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: Major issues with nfsv4
Message-ID: <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com>
References: <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com> <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com>

J David wrote:
>On Fri, Dec 11, 2020 at 6:28 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> I am afraid I know nothing about nullfs and jails. I suspect it will be
>> something related to when file descriptors in the NFS client mount
>> get closed.
>
>What does NFSv4 do differently than NFSv3 that might upset a low-level
>consumer like nullfs?
The opens, for one. When a file is opened, it finds its way to VOP_OPEN().
--> For NFSv3, all it does is some client side cache consistency checks.
--> For NFSv4, it must acquire or update an NFSv4 Open, which is a form
      of lock that is acquired/updated by an Open operation in an RPC.
      Then the client stores this locking info in a structure in a linked list
      off of the mount point.
      Once all file descriptors for the vnode are closed, then, and only
      then, can a Close operation be done against the server and the linked
      list data structure be free'd.
      --> Does having nullfs between the file descriptors and the NFS vnodes
            for the same file affect when the v_usecount decrements to 0 on
            the NFS vnode?
            I don't know, but if it delays it, then these linked list structures
            will not be free'd as soon and might accumulate.
            --> The more structures, the longer the linked list and the more
                  overhead/cpu will be used processing them.
      The fact that processes are spending a long time in exit() might
      be a hint that there are a large # of these NFSv4 Opens to deal with
      when files are being closed implicitly during exit.

      As I mentioned, "nfsstat -c -E" will tell you how many Opens there
      are under the "OpenOwners ..." line.

>> Well, NFSv3 is not going away any time soon, so if you don't need
>> any of the additional features it offers...
>
>If we did not want the additional features, we definitely would not be
>attempting this.
>
>> a user would have to run their own custom hacked
>> userland NFS client. Although doable, I have never heard of it being done.
>
>Alex beat me to libnfs.
And you have users that would want to maliciously access the NFS server
running jobs on this environment? (Other than reverting to NFSv3, allowing
clients to use non-reserved port#s is probably your other choice, from what
I can see. Whatever the interaction between nullfs and the NFSv4 mount
is, it probably won't be fixed quickly, if ever.)

>What about this as a stopgap measure?
>
>> How explosive would adding SO_REUSEADDR to the NFS client be? It's
>> not a full solution, but it would handle the TIME_WAIT side of the
>> issue.
>
>The kernel NFS networking code is confusing to me. I can't even
>figure out where/how NFSv4 binds a client socket to know if it's
>possible. (Pretty sure the code in sys/nfs/krpc_subr.c is not it.)
It's done in the kernel RPC code, found in the sys/rpc directory.
Mostly in clnt_rc.c and clnt_vc.c.
If there is a timeout for an RPC (slow server, network problem,...),
the code in clnt_rc.c will create a new TCP connection. The old
connection could easily still be around.
As such, I do not believe that SO_REUSEADDR or SO_REUSEPORT
is feasible.

rick

Thanks!

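The concern in the reply about Opens accumulating on the per-mount linked list can be illustrated with a toy model. This is only a sketch under stated assumptions, not the FreeBSD NFS client code: the class ToyMount, its methods, the held_by_upper_layer flag, and the step counter are all hypothetical names made up for illustration. It assumes each open and close has to walk the list to find its record, and that a record is only dropped when nothing above it still holds a reference. Whether nullfs actually delays that is exactly what the message says is unknown.

```python
class ToyMount:
    """Toy stand-in for a mount point with a list of per-file Open records."""

    def __init__(self):
        self.opens = []   # models the linked list hanging off the mount point
        self.steps = 0    # number of list entries examined so far

    def _find(self, fh):
        # Linear walk, as with a plain linked list.
        for rec in self.opens:
            self.steps += 1
            if rec["fh"] == fh:
                return rec
        return None

    def open(self, fh):
        rec = self._find(fh)
        if rec is None:
            rec = {"fh": fh, "usecount": 0}
            self.opens.append(rec)
        rec["usecount"] += 1

    def close(self, fh, held_by_upper_layer=False):
        # Hypothetical: an upper layer holding a reference keeps the
        # use count from reaching 0, so the record is never released.
        rec = self._find(fh)
        if held_by_upper_layer:
            return
        rec["usecount"] -= 1
        if rec["usecount"] == 0:
            self.opens.remove(rec)


def run(n, delayed):
    """Open and close n distinct files; return (records left, entries examined)."""
    m = ToyMount()
    for fh in range(n):
        m.open(fh)
        m.close(fh, held_by_upper_layer=delayed)
    return len(m.opens), m.steps


if __name__ == "__main__":
    print("prompt release :", run(1000, delayed=False))
    print("delayed release:", run(1000, delayed=True))
```

In this model, prompt release keeps the list empty between files, so the work grows linearly with the number of files, while delayed release leaves every record on the list and the total work grows with the square of the number of files. On a real client, the count reported by "nfsstat -c -E" is the thing to watch.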
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90>