Date: Tue, 30 Jun 2020 09:33:40 -0700
From: Benjamin Kaduk
To: Rick Macklem
Cc: Benjamin Kaduk, Rick Macklem, src-committers, "svn-src-projects@freebsd.org"
Subject: Re: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Message-ID: <20200630163340.GN58278@kduck.mit.edu>
References: <202006301449.05UEnq2x072917@repo.freebsd.org>

On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:
> Benjamin Kaduk wrote:
> >On Tue, Jun 30, 2020 at 7:49 AM Rick Macklem wrote:
> >Author: rmacklem
> >Date: Tue Jun 30 14:49:51 2020
> >New Revision: 362798
> >URL: https://svnweb.freebsd.org/changeset/base/362798
> >
> >Log:
> >  Testing when a server does not respond to TLS handshake records exposed
> >  a couple of problems, since the daemon would be in SSL_connect() for 6 minutes.
> >
> >  - When the upcall timed out and was retried, the RPCTLS_SYSC_CLSOCKET syscall
> >    was broken and did not return an error upon a retry. It allocated a file
> >    descriptor for a NULL socket.
> >  - The socket structure in the kernel could be free'd while the daemon was
> >    still using it in SSL_connect().
> >  - Adjust the timeout a retry count so that upcalls are only attempted once
> >    with a 10minute timeout.
> >
> >
> >10 minutes seems really long! It sounds from the description like the upcall so that
> >userspace can run SSL_connect() was taking 6 minutes, and you needed 10 minutes so
> >as to be longer than the 6 minutes that is "out of your control"?
> Well, I think a long timeout here is ok, since a timeout indicates a broken daemon.
> (The upcalls to the local daemon should be reliable and cannot safely be redone.
> In a perfect world, the upcall mechanism would be "exactly once" instead of
> "at least once". I think an upcall might fail when the mbuf pool in the kernel
> is exhausted, but that should be rare.)
>
> >I feel like there should be some sockopts available to get the SSL_connect() timeout
> >down, so that the upcall timeout doesn't need to be so long, either.
> Yes, 6 minutes does seem like a long time. I only discovered this yesterday when
> I simulated a server that did not respond to handshake records.
>
> I haven't yet dug into the openssl code to see if there is a way to adjust this
> timeout.
> I also do not know what a good timeout value for SSL_connect() might be,
> even if the daemon can override the default.
>
> In practice, this should only happen when trying to do an NFS mount on
> a broken server which responds to the "STARTTLS" Null RPC, but does not
> do the handshake.
> Having the mount attempt stuck for 6minutes before failing is not that serious
> a problem, imho.
> (When systems boot after something like a power failure, delays getting NFS
> mounts done, due to the NFS server/network needing to be up, is fairly
> normal. The "-b" option to put the mount attempt in background has been
> around for a long time for this.)
>
> If you happen to know how to set a timeout for SSL_connect() in the openssl
> library, I would be interested in hearing that.

As it happens, I took a look before I wrote the initial note, and there
don't seem to be any intrinsic TLS (not DTLS) handshake timeouts in libssl
itself; I expect this is actually just the (kernel's!) TCP timeout. So
you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have a
reference already) and using setsockopt() to set the timeout(s).

-Ben
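
To make the setsockopt() suggestion concrete, here is a minimal sketch. The message does not say which socket options to use, so SO_RCVTIMEO and SO_SNDTIMEO are an assumption, and set_io_timeout() is a hypothetical helper name, not something from the commit or from libssl. In the daemon the descriptor would presumably come from SSL_get_fd(ssl), and the call would be made before SSL_connect().

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not from the original message or the commit):
 * bound how long a blocking read or write on fd may wait.
 * Assumption: SO_RCVTIMEO/SO_SNDTIMEO are the options of interest.
 * Returns 0 on success, -1 on failure with errno set by setsockopt().
 */
int
set_io_timeout(int fd, time_t seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;

	/* Limit how long a blocked recv()/read() waits for data from the peer. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);

	/* Limit how long a blocked send()/write() waits for buffer space. */
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
		return (-1);

	return (0);
}
```

With something like set_io_timeout(SSL_get_fd(ssl), 60) in place, a blocked read during the handshake should fail once the timeout expires rather than waiting on the TCP stack, and SSL_connect() would then be expected to return an error that the daemon has to handle. What a good value is remains the open question raised earlier in the thread.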