Date: Wed, 1 Jul 2020 15:50:11 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Benjamin Kaduk <bjkfbsd@gmail.com>, Rick Macklem <rmacklem@freebsd.org>, src-committers <src-committers@freebsd.org>, "svn-src-projects@freebsd.org" <svn-src-projects@freebsd.org>
Subject: Re: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Message-ID: <20200701225011.GH58278@kduck.mit.edu>
In-Reply-To: <QB1PR01MB336412382A4903F74CD28F69DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
References: <202006301449.05UEnq2x072917@repo.freebsd.org> <CAJ5_RoDe=_s2LZociYXTmdVOP%2BLJDA5HJ7jZkKr7LChffbaH8w@mail.gmail.com> <QB1PR01MB336441A427B14216A4A20384DD6F0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <20200630163340.GN58278@kduck.mit.edu> <QB1PR01MB3364FE7A60B953C2D730E6F3DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <QB1PR01MB33642D5CC58DF44548BB1911DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <20200701022040.GE58278@kduck.mit.edu> <QB1PR01MB336412382A4903F74CD28F69DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>

On Wed, Jul 01, 2020 at 10:47:19PM +0000, Rick Macklem wrote:
> Benjamin Kaduk wrote:
> >On Wed, Jul 01, 2020 at 01:23:50AM +0000, Rick Macklem wrote:
> >> Rick Macklem wrote:
> >> >Benjamin Kaduk wrote:
> >> >>On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:
> >> >>> If you happen to know how to set a timeout for SSL_connect() in the openssl
> >> >>> library, I would be interested in hearing that.
> >> >>
> >> >>As it happens, I took a look before I wrote the initial note, and there
> >> >>doesn't seem to be any intrinsic TLS (not DTLS) handshake timeouts in
> >> >>libssl itself; I expect this is actually just the (kernel's!) TCP timeout.
> >> >>So you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have a
> >> >>reference already) and using setsockopt() to set the timeout(s).
> >> >Interesting. The test case I simulated did not close the TCP socket used by
> >> >SSL_connect(). The server just replied to the STARTTLS Null RPC, but did not
> >> >call SSL_accept(), so the server side just isn't playing "handshake".
> >> >"netstat -a" showed the connection as ESTABLISHED.
> >> >During debugging, I also used the trick of putting:
> >> >    while (1)
> >> >        sleep(1);
> >> >right after the SSL_connect() call and, when watching it via "ps",
> >> >it would switch from "sbwait" to "nanoslp" after 6 minutes and
> >> >a syslog() call showed that SSL_connect() had returned -1.
> >> >
> >> >So, if the TCP connection was "established", what caused the SSL_connect()
> >> >to return with an error (-1) after 6 minutes?
> >> >
> >> >Now, there is a 6 minute idle timeout in the RPC code for TCP where it,
> >> >by default, closes the connection when there is 6 minutes without any
> >> >activity. (I have to look if waiting for a reply for the upcall implies
> >> >"no activity" and if this also happens for AF_LOCAL sockets, which is
> >> >what the upcalls use.)
> >> Ok, I figured out what is happening for this test.
> >> It is the 6 minute idle timeout, but it occurs at the server end, where the NFS server
> >> end shuts down the TCP connection.
> >
> >Ah, that makes sense.
> >
> >> Now, the client cannot assume all servers will do this.
> >
> >Right.
> >
> >> I'm going to try playing around with doing a shutdown of the socket on the
> >> client end after a shorter timeout on the upcall and see if that can get
> >> SSL_connect() to return with a failure in the daemon.
> >>
> >> >Now, if that happens, a SIGPIPE would be posted to the daemon, which
> >> >is SIG_IGN'd by the daemon. But maybe the SIGPIPE somehow causes
> >> >SSL_connect() to return -1 by making the syscall it is doing (read/recv on the
> >> >TCP socket sitting in sbwait) return EINTR, or something like that?
> >> Ignore this "theory". It was bunk.
> >
> >Non-ignored signals would cause SSL_connect() to return, but ignored ones
> >should be wholly ignored, yes.
> >
> >> >I can change this 6minute timeout to see if that affects it.
> >> Can't be changed, since it is at the server end of the TCP connection.
> >
> >Can't you set a client-side (e.g., read) timeout, though?
> Well, in this case it would be the read (or recv or ??) that is done inside the
> SSL_connect().
>
> The timer I can control is the one that I had set to 10minutes, which times out
> the upcall RPC to the userland daemon. I had set it to 10minutes so the
> SSL_connect() would time out first, but now that I know that won't always happen..
> This timer is now set to 15sec and after it times out, the kernel code does a
> soshutdown(so, SHUT_RD) in the client, which seems to be sufficient to get
> SSL_connect() to return an error.
>
> This seems sufficient and works ok for the testing I've done.

I don't think what you ended up with is wrong, to be clear.
But, you have an SSL* as input to SSL_connect(), and you can call
SSL_get_fd() on that SSL*, which will give you a socket fd that you can
call setsockopt() on, if you're so inclined.  The SSL_connect() abstraction
barrier is not leak-proof :)

> 15sec is pretty arbitrary, but I figure a timeout on the order of seconds is
> reasonable for RPC upcalls to the local daemon. (I'd guess that taking even
> 1sec to do an upcall would indicate something is broken.)
> If others feel 15sec isn't an appropriate timeout, feel free to comment.
> (Note that this timeout should only happen when something is broken, like
> the server that does a "STARTTLS" reply but does not do a TLS handshake.)

Understood.

Thanks,

Ben
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20200701225011.GH58278>