From owner-svn-src-projects@freebsd.org Wed Jul 1 01:23:54 2020 Return-Path: Delivered-To: svn-src-projects@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 3859D35A73F for ; Wed, 1 Jul 2020 01:23:54 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from CAN01-QB1-obe.outbound.protection.outlook.com (mail-eopbgr660061.outbound.protection.outlook.com [40.107.66.61]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "GlobalSign Organization Validation CA - SHA256 - G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49xNm90TSNz4BnK; Wed, 1 Jul 2020 01:23:52 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=HZHCo1Gfz749WGJjRZdkUmvL0dSu6lXotzWF5A8QF//lOWawrv+JaGEXrE9bWY/ucE7jY2C39X5vK/mDxJm/n0RnFsMDPaIYiGWx2DnAn9ZyBVRTSzeGd9zGNtJAJC0yXjksIGce+eQbgC7CsoAnTJj48dvjwb8SRxNXrWKXbzT8jmsjZqPRWd1Y7LLzAkFg0uSZBC6bv9gx+0s8wKEheamzcbDc/G3kv5C+cZqIlk9w6tZwhVek3+Z57xw005tE15KNtFjK1Tar/fLa+0WrScT5IvxmQGIi7D8vzmrd8qFn6K6bUnVtitG/kqhDCuqj5WKUd+Pl44WobC+M3lrkdA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ozr9pB6zgv7DLg4hWE0f5QBtT3J4oR1Y4QbJyTZT18U=; b=EqePFfvSyJznboavWrVBicBBk4H3T/Pj7DAEkZYoy4hslnrikrjEnc1G1tukil8Dox0xsIzu4Gs/iesVQSo/Hv6bdaIm4XYWX60d5jC5BCapIpTOHEBFRq5RfvohLARwgbIdkoc6gc+hfsgTHqoYwzo4cWGkzWoj4HpV4fdOnoubqZN7sUxyyipseNhWE5EZl+/j3xL+N9noF/WT2CqxXn9tt0qTPzmJw84NV5mTJ+y+4iJr++qDxZTyenEA9vNa6EPFj2+MqRg7NhoONCxCqxcd6rFpMDg51XTKnqGDdMb78xU07gG0+pPGJBnjaW+yJ9fYHMWsSgAgkPbgk8E3LA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=uoguelph.ca; dmarc=pass action=none header.from=uoguelph.ca; dkim=pass header.d=uoguelph.ca; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=uoguelph.ca; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ozr9pB6zgv7DLg4hWE0f5QBtT3J4oR1Y4QbJyTZT18U=; b=JvHzFsorzuHaJB+D4VYdm6FKx+OyjaWKw93SkL46vn+7OIQ2IAo6qt4BRYZPabQZh4BXTWjkglOYSHioFwUafpT5kk2DjD5nA+5RzJliYC1i10rIX1KeswRXk/YNAlQz3cOryBY2+bIOk5B9CkvEsdd+sFdQmsQr78VMJR9VN8DWhcMPUM9Y/ubFVcvEWxlotDv/GEA32IhJwhabOyNsun+LUyl6Csx0v95wTN2EUG+OQJYKRWE0yGmE7ZGV1rxKtKtgRY6x8VkezYjT7u0LwP2J6kTc+bCHAIe6RG/DHGeMOmaSbARBrL8tHVagOMWlF5hziD6n3NI/hPXY7kOjFA== Received: from QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:c00:38::14) by YQXPR01MB3463.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:c00:44::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3131.20; Wed, 1 Jul 2020 01:23:50 +0000 Received: from QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM ([fe80::60f3:4ca2:8a4a:1e91]) by QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM ([fe80::60f3:4ca2:8a4a:1e91%7]) with mapi id 15.20.3131.028; Wed, 1 Jul 2020 01:23:50 +0000 From: Rick Macklem To: Benjamin Kaduk CC: Benjamin Kaduk , Rick Macklem , src-committers , "svn-src-projects@freebsd.org" Subject: Re: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls Thread-Topic: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls Thread-Index: AQHWTu29MoFCP0YO4kaq4Sj4aYWIUajxUeOAgAAARmyAAAhnAIAAg0Q8gAANlPs= Date: Wed, 1 Jul 2020 01:23:50 +0000 Message-ID: References: <202006301449.05UEnq2x072917@repo.freebsd.org> , <20200630163340.GN58278@kduck.mit.edu>, In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: ca60a95d-369d-4ba6-4d10-08d81d5d693a x-ms-traffictypediagnostic: YQXPR01MB3463: x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-forefront-prvs: 04519BA941 x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: +dg6EQ4GXaYmq0BhvOtUObazblbiHuKsPvrrYXuNOKOpDtOUiDtXXCuJILKiOxnNtHYP0YCEPFxEWwvESbM02EWRMpgB5YMvVTjGu5cGZcrBcG5b1KqUy8vdGt5cz5w6gEmLgaybo7ihsYJTO8t7DZ8htbNmubqqJG4e/jm4ULJjFlhbEeio6RpTX5w5b7IthdYOO0iedNCp8ez0pMnGF/SirR4YAyS4u7lx3zQXTQYGDScSLCUpyE6UzB2tsJWHDcoc9ohtIj7mgK0zV4qijkKMvVAs6oeiRP5aKU4gb2TC7UKFXsu4g5eLdPVwa5uW+Z/XMitZ+db/Ozkz5Vvn+VTXG38zyrrFbTt+roOTI20dMYyImVWCmgg0W9CwsgSb9t8eR0Po/xWhBJaG9JzA8A== x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM; PTR:; CAT:NONE; SFTY:; SFS:(136003)(366004)(39860400002)(346002)(396003)(376002)(786003)(5660300002)(966005)(316002)(2906002)(52536014)(478600001)(6506007)(86362001)(4326008)(54906003)(64756008)(6916009)(66446008)(9686003)(33656002)(7696005)(55016002)(83380400001)(8676002)(186003)(2940100002)(71200400001)(66946007)(91956017)(66476007)(76116006)(66556008)(8936002); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata: 5HJv2hcW2BbM/PV83g3ynUM99cbEsQNjef6EkRId5wcGsO0P0PvX1QWQL1dB1d+9T07LdiNn3lgHccKsi+9/2J1cFXv0pDDcX51lwk28pecqpbXiOOZbMw2/Nd7sn8nuuB5w/D+gBh9Ew1Lq8CefH/Q4LROfu8JGEsZiLu5f5d98bLGkN27UJJW2hPylDAzziK6/Z2G3xPt+ZQRJNmpZjU37Ln41UbT4WQfrqzcRy7a5QD0FqesbZJjfT+hpdzrl20/FYnb9iTC+tYfdK9qaskqU6YhTU5kdkGIYABWic9sNJKg9r9mMUNX9yRfSxK3DwznvuUbMvYTurR4T7/0Tgpdc1P0L9c2HlLY38rVN2TL/zHpLCYsfKhjwAAiCn5zAJhToPsL1LlfNiM5wAeWpR5o0h7n2Fhme7giRIsXp6wx5DZDy7T4DfL92XV1bnZbOuE7wwVmV204aV2SmYoHssfNgy2a0HIENr2xk+hKqeaRSpxKJ3bUYmhwgWd65xBDij+z7cN3vaW87XWT40lf17NFPEXzKMZG2LnwL9elZrOs= x-ms-exchange-transport-forked: True Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: uoguelph.ca X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM X-MS-Exchange-CrossTenant-Network-Message-Id: ca60a95d-369d-4ba6-4d10-08d81d5d693a X-MS-Exchange-CrossTenant-originalarrivaltime: 01 Jul 2020 01:23:50.6788 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: be62a12b-2cad-49a1-a5fa-85f4f3156a7d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: GPcCY1G5LfltHhUUyISHSqQmn7gUNe6iembeQlNa2x75zjZQjgPzshq9n2+hBI8Hqjy6G1cM2H5Bt2OrCKDJTw== X-MS-Exchange-Transport-CrossTenantHeadersStamped: YQXPR01MB3463 X-Rspamd-Queue-Id: 49xNm90TSNz4BnK X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=uoguelph.ca header.s=selector1 header.b=JvHzFsor; dmarc=none; spf=pass (mx1.freebsd.org: domain of rmacklem@uoguelph.ca designates 40.107.66.61 as permitted sender) smtp.mailfrom=rmacklem@uoguelph.ca X-Spamd-Result: default: False [-4.88 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.999]; R_DKIM_ALLOW(-0.20)[uoguelph.ca:s=selector1]; RWL_MAILSPIKE_POSSIBLE(0.00)[40.107.66.61:from]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:40.107.0.0/16]; NEURAL_HAM_LONG(-1.02)[-1.015]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[uoguelph.ca]; RCPT_COUNT_FIVE(0.00)[5]; DWL_DNSWL_LOW(-1.00)[uoguelph.ca:dkim]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[uoguelph.ca:+]; NEURAL_HAM_SHORT(-0.36)[-0.364]; RCVD_IN_DNSWL_NONE(0.00)[40.107.66.61:from]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; ASN(0.00)[asn:8075, ipnet:40.104.0.0/14, country:US]; FREEMAIL_CC(0.00)[gmail.com,freebsd.org]; ARC_ALLOW(-1.00)[microsoft.com:s=arcselector9901:i=1] X-BeenThere: svn-src-projects@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: "SVN commit messages for the src " projects" tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Jul 2020 01:23:54 -0000 Rick Macklem wrote:=0A= >Benjamin Kaduk wrote:=0A= >>On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:=0A= >>> Benjamin Kaduk wrote:=0A= >>> >On Tue, Jun 30, 2020 at 7:49 AM Rick Macklem > wrote:=0A= >>> >Author: rmacklem=0A= >>> >Date: Tue Jun 30 14:49:51 2020=0A= >>> >New Revision: 362798=0A= >>> >URL: https://svnweb.freebsd.org/changeset/base/362798=0A= >>> >=0A= >>> >Log:=0A= >>> > Testing when a server does not respond to TLS handshake records expo= sed=0A= >>> > a couple of problems, since the daemon would be in SSL_connect() for= 6 minutes.=0A= >>> >=0A= >>> > - When the upcall timed out and was retried, the RPCTLS_SYSC_CLSOCKE= T syscall=0A= >>> > was broken and did not return an error upon a retry. It allocated = a file=0A= >>> > descriptor for a NULL socket.=0A= >>> > - The socket structure in the kernel could be free'd while the daemo= n was=0A= >>> > still using it in SSL_connect().=0A= >>> > - Adjust the timeout a retry count so that upcalls are only attempte= d once=0A= >>> > with a 10minute timeout.=0A= >>> >=0A= >>> >=0A= >>> >10 minutes seems really long! It sounds from the description like the= upcall so >>that=0A= >>> >userspace can run SSL_connect() was taking 6 minutes, and you needed 1= 0 >>>minutes so=0A= >>> >as to be longer than the 6 minutes that is "out of your control"?=0A= >>> Well, I think a long timeout here is ok, since a timeout indicates a br= oken daemon.=0A= >>> (The upcalls to the local daemon should be reliable and cannot safely b= e redone.=0A= >>> In a perfect world, the upcall mechanism would be "exactly once" inste= ad of=0A= >>> "at least once". I think an upcall might fail when the mbuf pool in th= e kernel=0A= >>> is exhausted, but that should be rare.)=0A= >>>=0A= >>> >I feel like there should be some sockopts available to get the SSL_con= nect() timeout=0A= >>> >down, so that the upcall timeout doesn't need to be so long, either.= =0A= >>> Yes, 6 minutes does seem like a long time. I only discovered this yeste= rday when=0A= >>> I simulated a server that did not respond to handshake records.=0A= >>>=0A= >>> I haven't yet dug into the openssl code to see if there is a way to adj= ust this=0A= >>> timeout.=0A= >>> I also do not know what a good timeout value for SSL_connect() might be= ,=0A= >>> even if the daemon can override the default.=0A= >>>=0A= >>> In practice, this should only happen when trying to do an NFS mount on= =0A= >>> a broken server which responds to the "STARTTLS" Null RPC, but does not= =0A= >>> do the handshake.=0A= >>> Having the mount attempt stuck for 6minutes before failing is not that = serious=0A= >>> a problem, imho.=0A= >>> (When systems boot after something like a power failure, delays getting= NFS=0A= >>> mounts done, due to the NFS server/network needing to be up, is fairly= =0A= >>> normal. The "-b" option to put the mount attempt in background has bee= n=0A= >>> around for a long time for this.)=0A= >>>=0A= >>> If you happen to know how to set a timeout for SSL_connect() in the ope= nssl=0A= >>> library, I would be interested in hearing that.=0A= >>=0A= >>As it happens, I took a look before I wrote the initial note, and there= =0A= >>doesn't seem to be any intrinsic TLS (not DTLS) handshake timeouts in=0A= >>libssl itself; I expect this is actually just the (kernel's!) TCP timeout= .=0A= >>So you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have = a=0A= >>reference already) and using setsockopt() to set the timeout(s).=0A= >Interesting. The test case I simulated did not close the TCP socket used b= y=0A= >SSL_connect(). The server just replied to the STARTTLS Null RPC, but did n= ot=0A= >call SSL_accept(), so the server side just isn't playing "handshake".=0A= >"netstat -a" showed the connection as ESTABLISHED.=0A= >During debugging, I also used the trick of putting:=0A= > while (1)=0A= > sleep(1);=0A= >right after the SSL_connect() call and, when watching it via "ps",=0A= >it would switch from "sbwait" to "nanoslp" after 6 minutes and=0A= >a syslog() call showed that SSL_connect() had returned -1.=0A= >=0A= >So, if the TCP connection was "established", what caused the SSL_connect()= =0A= >to return with an error (-1) after 6 minutes?=0A= >=0A= >Now, there is a 6 minute idle timeout in the RPC code for TCP where it,=0A= >by default, closes the connection when there is 6 minutes without any=0A= >activity. (I have to look if waiting for a reply for the upcall implies "n= o activity" and >if=0A= >this also happens for AF_LOCAL sockets, which is what the upcalls use.)=0A= Ok, I figured out what is happening for this test.=0A= It is the 6 minute idle timeout, but it occurs at the server end, where the= NFS server=0A= end shuts down the TCP connection.=0A= Now, the client cannot assume all servers will do this.=0A= I'm going to try playing around with doing a shutdown of the socket on the= =0A= client end after a shorter timeout on the upcall and see if that can get=0A= SSL_connect() to return with a failure in the daemon.=0A= =0A= >Now, if that happens, a SIGPIPE would be posted to the daemon, which=0A= >is SIG_IGN'd by the daemon. But maybe the SIGPIPE somehow causes=0A= >SSL_connect() to return -1 by making the syscall it is doing (read/recv on= the=0A= >TCP socket sitting in sbwait) return EINTR, or something like that?=0A= Ignore this "theory". It was bunk.=0A= =0A= >I can change this 6minute timeout to see if that affects it.=0A= Can't be changed, since it is at the server end of the TCP connection.=0A= (A comment in the krpc code mentions a 5minute timeout in the client,=0A= but I don't see that in the code?)=0A= =0A= >When you've got upcalls and library functions both talking to sockets it= =0A= >can get interesting.=0A= >=0A= >Thanks for the comments, rick=0A= =0A= Correcting myself, rick=0A= =0A= -Ben=0A= =0A=