Date: Tue, 30 Jun 2020 16:20:45 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Benjamin Kaduk <bjkfbsd@gmail.com>, Rick Macklem <rmacklem@FreeBSD.org>
Cc: src-committers <src-committers@freebsd.org>, "svn-src-projects@freebsd.org" <svn-src-projects@freebsd.org>
Subject: Re: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Message-ID: <QB1PR01MB336441A427B14216A4A20384DD6F0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAJ5_RoDe=_s2LZociYXTmdVOP%2BLJDA5HJ7jZkKr7LChffbaH8w@mail.gmail.com>
References: <202006301449.05UEnq2x072917@repo.freebsd.org>, <CAJ5_RoDe=_s2LZociYXTmdVOP%2BLJDA5HJ7jZkKr7LChffbaH8w@mail.gmail.com>

Benjamin Kaduk wrote:
>On Tue, Jun 30, 2020 at 7:49 AM Rick Macklem <rmacklem@freebsd.org> wrote:
>Author: rmacklem
>Date: Tue Jun 30 14:49:51 2020
>New Revision: 362798
>URL: https://svnweb.freebsd.org/changeset/base/362798
>
>Log:
>  Testing when a server does not respond to TLS handshake records exposed
>  a couple of problems, since the daemon would be in SSL_connect() for 6 minutes.
>
>  - When the upcall timed out and was retried, the RPCTLS_SYSC_CLSOCKET syscall
>    was broken and did not return an error upon a retry. It allocated a file
>    descriptor for a NULL socket.
>  - The socket structure in the kernel could be free'd while the daemon was
>    still using it in SSL_connect().
>  - Adjust the timeout a retry count so that upcalls are only attempted once
>    with a 10minute timeout.
>
>
>10 minutes seems really long! It sounds from the description like the upcall
>so that userspace can run SSL_connect() was taking 6 minutes, and you needed
>10 minutes so as to be longer than the 6 minutes that is "out of your control"?
Well, I think a long timeout here is ok, since a timeout indicates a broken daemon.
(The upcalls to the local daemon should be reliable and cannot safely be redone.
 In a perfect world, the upcall mechanism would be "exactly once" instead of
 "at least once". I think an upcall might fail when the mbuf pool in the kernel
 is exhausted, but that should be rare.)

>I feel like there should be some sockopts available to get the SSL_connect()
>timeout down, so that the upcall timeout doesn't need to be so long, either.
Yes, 6 minutes does seem like a long time. I only discovered this yesterday when
I simulated a server that did not respond to handshake records.

I haven't yet dug into the OpenSSL code to see if there is a way to adjust this
timeout.
I also do not know what a good timeout value for SSL_connect() might be,
even if the daemon can override the default.

In practice, this should only happen when trying to do an NFS mount on
a broken server which responds to the "STARTTLS" Null RPC, but does not
do the handshake.
Having the mount attempt stuck for 6 minutes before failing is not that serious
a problem, imho.
(When systems boot after something like a power failure, delays in getting NFS
 mounts done, due to the NFS server/network needing to be up, are fairly
 normal. The "-b" option to put the mount attempt in background has been
 around for a long time for this.)

If you happen to know how to set a timeout for SSL_connect() in the OpenSSL
library, I would be interested in hearing that.

rick

-Ben
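
Note on the sockopt idea raised in the quoted text: one possible approach, sketched here under assumptions and not taken from the commit, is to set SO_RCVTIMEO and SO_SNDTIMEO on the socket before the daemon calls SSL_connect(). These options bound each individual blocking read or write, not the handshake as a whole, so a server that goes silent would make the underlying read fail with EAGAIN after the timeout. How SSL_connect() reports that failure has not been checked against the daemon in question. The helper names `set_io_timeout` and `silent_peer_times_out` are hypothetical, and the demonstration uses a plain socketpair instead of a TLS connection.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: bound every blocking read and write on fd to
 * `seconds`. Returns 0 on success, -1 (with errno set) on failure.
 * In a TLS client this would be called before SSL_connect().
 */
static int
set_io_timeout(int fd, int seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);
	return (0);
}

/*
 * Demonstration with a peer that never sends anything, standing in for
 * a server that ignores handshake records. Returns 1 if the blocking
 * recv() gave up with EAGAIN/EWOULDBLOCK, 0 if it did not, and -1 if
 * the sockets could not be set up.
 */
static int
silent_peer_times_out(int seconds)
{
	int sv[2], ret;
	char c;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (set_io_timeout(sv[0], seconds) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	/* sv[1] stays open but silent, so this blocks until the timeout. */
	n = recv(sv[0], &c, 1, 0);
	/* Record the result before close() can overwrite errno. */
	ret = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) ? 1 : 0;
	close(sv[0]);
	close(sv[1]);
	return (ret);
}
```

Calling `silent_peer_times_out(1)` blocks for about one second and then reports that the read timed out. Because the limit applies per I/O call, a peer that trickles data slowly could still keep a handshake alive longer than the configured value, so an overall deadline would need separate handling.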