Date:      Wed, 1 Jul 2020 22:47:19 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Benjamin Kaduk <kaduk@mit.edu>
Cc:        Benjamin Kaduk <bjkfbsd@gmail.com>, Rick Macklem <rmacklem@freebsd.org>, src-committers <src-committers@freebsd.org>, "svn-src-projects@freebsd.org" <svn-src-projects@freebsd.org>
Subject:   Re: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Message-ID:  <QB1PR01MB336412382A4903F74CD28F69DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <20200701022040.GE58278@kduck.mit.edu>
References:  <202006301449.05UEnq2x072917@repo.freebsd.org> <CAJ5_RoDe=_s2LZociYXTmdVOP%2BLJDA5HJ7jZkKr7LChffbaH8w@mail.gmail.com> <QB1PR01MB336441A427B14216A4A20384DD6F0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <20200630163340.GN58278@kduck.mit.edu> <QB1PR01MB3364FE7A60B953C2D730E6F3DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <QB1PR01MB33642D5CC58DF44548BB1911DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>, <20200701022040.GE58278@kduck.mit.edu>

Benjamin Kaduk wrote:
>On Wed, Jul 01, 2020 at 01:23:50AM +0000, Rick Macklem wrote:
>> Rick Macklem wrote:
>> >Benjamin Kaduk wrote:
>> >>On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:
>> >>> If you happen to know how to set a timeout for SSL_connect() in the openssl
>> >>> library, I would be interested in hearing that.
>> >>
>> >>As it happens, I took a look before I wrote the initial note, and there
>> >>doesn't seem to be any intrinsic TLS (not DTLS) handshake timeouts in
>> >>libssl itself; I expect this is actually just the (kernel's!) TCP timeout.
>> >>So you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have a
>> >>reference already) and using setsockopt() to set the timeout(s).
>> >Interesting. The test case I simulated did not close the TCP socket used by
>> >SSL_connect(). The server just replied to the STARTTLS Null RPC, but did not
>> >call SSL_accept(), so the server side just isn't playing "handshake".
>> >"netstat -a" showed the connection as ESTABLISHED.
>> >During debugging, I also used the trick of putting:
>> >    while (1)
>> >        sleep(1);
>> >right after the SSL_connect() call and, when watching it via "ps",
>> >it would switch from "sbwait" to "nanoslp" after 6 minutes and
>> >a syslog() call showed that SSL_connect() had returned -1.
>> >
>> >So, if the TCP connection was "established", what caused the SSL_connect()
>> >to return with an error (-1) after 6 minutes?
>> >
>> >Now, there is a 6 minute idle timeout in the RPC code for TCP where it,
>> >by default, closes the connection when there is 6 minutes without any
>> >activity. (I have to look if waiting for a reply for the upcall implies
>> >"no activity" and if this also happens for AF_LOCAL sockets, which is
>> >what the upcalls use.)
>> Ok, I figured out what is happening for this test.
>> It is the 6 minute idle timeout, but it occurs at the server end, where the NFS server
>> end shuts down the TCP connection.
>
>Ah, that makes sense.
>
>> Now, the client cannot assume all servers will do this.
>
>Right.
>
>> I'm going to try playing around with doing a shutdown of the socket on the
>> client end after a shorter timeout on the upcall and see if that can get
>> SSL_connect() to return with a failure in the daemon.
>>
>> >Now, if that happens, a SIGPIPE would be posted to the daemon, which
>> >is SIG_IGN'd by the daemon. But maybe the SIGPIPE somehow causes
>> >SSL_connect() to return -1 by making the syscall it is doing (read/recv on the
>> >TCP socket sitting in sbwait) return EINTR, or something like that?
>> Ignore this "theory". It was bunk.
>
>Non-ignored signals would cause SSL_connect() to return, but ignored ones
>should be wholly ignored, yes.
>
>> >I can change this 6minute timeout to see if that affects it.
>> Can't be changed, since it is at the server end of the TCP connection.
>
>Can't you set a client-side (e.g., read) timeout, though?
Well, in this case it would be the read (or recv or ??) that is done inside the
SSL_connect().

The timer I can control is the one that I had set to 10 minutes, which times out
the upcall RPC to the userland daemon. I had set it to 10 minutes so that
SSL_connect() would time out first, but now I know that won't always happen.
This timer is now set to 15 sec and, after it times out, the kernel code does a
soshutdown(so, SHUT_RD) in the client, which seems to be sufficient to get
SSL_connect() to return an error.

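A userland analogue of that "time out, then shut down the read side" logic
might look like the sketch below. It is not the kernel code: the function
name, the done_fd descriptor (standing in for "the upcall has replied") and
the use of poll() are all assumptions made for illustration. The point it
shows is that after shutdown(fd, SHUT_RD) a read on that socket returns
end-of-file instead of blocking, which is presumably what makes the read
inside SSL_connect() give up.

```c
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical userland analogue of the kernel-side logic: wait up to
 * timeout_ms for done_fd to become readable (standing in for "the
 * upcall replied"). If it does not, shut down the read side of tls_fd
 * so that a read on it returns EOF instead of hanging.
 * Returns 1 if the upcall completed, 0 if it timed out and the socket
 * was shut down, -1 on error.
 */
int
wait_upcall_or_shutdown(int done_fd, int tls_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = done_fd, .events = POLLIN };
	int n;

	n = poll(&pfd, 1, timeout_ms);
	if (n == -1)
		return (-1);
	if (n > 0)
		return (1);
	/* Timed out: force readers of tls_fd to see end-of-file. */
	if (shutdown(tls_fd, SHUT_RD) == -1)
		return (-1);
	return (0);
}
```

With the 15 sec value above this would be called as
wait_upcall_or_shutdown(done_fd, tls_fd, 15000).
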
This seems sufficient and works ok for the testing I've done.

15 sec is pretty arbitrary, but I figure a timeout on the order of seconds is
reasonable for RPC upcalls to the local daemon. (I'd guess that taking even
1 sec to do an upcall would indicate something is broken.)
If others feel 15 sec isn't an appropriate timeout, feel free to comment.
(Note that this timeout should only happen when something is broken, like
 a server that does a "STARTTLS" reply but does not do a TLS handshake.)

Thanks for the comments, rick

to return an error.
-Ben

> (A comment in the krpc code mentions a 5minute timeout in the client,
>  but I don't see that in the code?)
>
> >When you've got upcalls and library functions both talking to sockets it
> >can get interesting.
> >
> >Thanks for the comments, rick
>
> Correcting myself, rick
>
> -Ben
>



