From owner-svn-src-projects@freebsd.org Wed Jul 1 02:20:46 2020
Date: Tue, 30 Jun 2020 19:20:40 -0700
From: Benjamin Kaduk
To: Rick Macklem
Cc: Benjamin Kaduk, Rick Macklem, src-committers, "svn-src-projects@freebsd.org"
Subject: Re: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Message-ID: <20200701022040.GE58278@kduck.mit.edu>
References: <202006301449.05UEnq2x072917@repo.freebsd.org> <20200630163340.GN58278@kduck.mit.edu>

On Wed, Jul 01, 2020 at 01:23:50AM +0000, Rick Macklem wrote:
> Rick Macklem wrote:
> >Benjamin Kaduk wrote:
> >>On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:
> >>> If you happen to know how to set a timeout for SSL_connect() in the openssl
> >>> library, I would be interested in hearing that.
> >>
> >>As it happens, I took a look before I wrote the initial note, and there
> >>don't seem to be any intrinsic TLS (not DTLS) handshake timeouts in
> >>libssl itself; I expect this is actually just the (kernel's!) TCP timeout.
> >>So you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have a
> >>reference already) and using setsockopt() to set the timeout(s).
> >Interesting. The test case I simulated did not close the TCP socket used by
> >SSL_connect(). The server just replied to the STARTTLS Null RPC, but did not
> >call SSL_accept(), so the server side just isn't playing "handshake".
> >"netstat -a" showed the connection as ESTABLISHED.
> >During debugging, I also used the trick of putting:
> >  while (1)
> >    sleep(1);
> >right after the SSL_connect() call and, when watching it via "ps",
> >it would switch from "sbwait" to "nanoslp" after 6 minutes and
> >a syslog() call showed that SSL_connect() had returned -1.
> >
> >So, if the TCP connection was "established", what caused the SSL_connect()
> >to return with an error (-1) after 6 minutes?
> >
> >Now, there is a 6-minute idle timeout in the RPC code for TCP where it,
> >by default, closes the connection when there are 6 minutes without any
> >activity. (I have to look if waiting for a reply for the upcall implies "no activity" and if
> >this also happens for AF_LOCAL sockets, which is what the upcalls use.)
> Ok, I figured out what is happening for this test.
> It is the 6-minute idle timeout, but it occurs at the server end, where the NFS server
> end shuts down the TCP connection.

Ah, that makes sense.

> Now, the client cannot assume all servers will do this.

Right.

> I'm going to try playing around with doing a shutdown of the socket on the
> client end after a shorter timeout on the upcall and see if that can get
> SSL_connect() to return with a failure in the daemon.
>
> >Now, if that happens, a SIGPIPE would be posted to the daemon, which
> >is SIG_IGN'd by the daemon.
> >But maybe the SIGPIPE somehow causes
> >SSL_connect() to return -1 by making the syscall it is doing (read/recv on the
> >TCP socket sitting in sbwait) return EINTR, or something like that?
> Ignore this "theory". It was bunk.

Non-ignored signals could cause SSL_connect() to return (the interrupted read
fails with EINTR unless the handler was installed with SA_RESTART), but
ignored ones should be wholly ignored, yes.

> >I can change this 6-minute timeout to see if that affects it.
> Can't be changed, since it is at the server end of the TCP connection.

Can't you set a client-side (e.g., read) timeout, though?

-Ben

> (A comment in the krpc code mentions a 5-minute timeout in the client,
> but I don't see that in the code?)
>
> >When you've got upcalls and library functions both talking to sockets it
> >can get interesting.
> >
> >Thanks for the comments, rick
>
> Correcting myself, rick
>
> -Ben
>