From nobody Wed Aug 10 14:32:53 2022 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4M2sq96mVdz4Ywm4; Wed, 10 Aug 2022 14:32:53 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4M2sq96Fq8z3wxy; Wed, 10 Aug 2022 14:32:53 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1660141973; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=inUdTv1XmUERfkORzH+ob1kkQvhaQIfIoBmNlsuoiaY=; b=r8hLxqInq79FIDCyiHdf4UmNmE/IbBCEueedjoUWGnQpEGT6F100pX6NtYwkbmbBz/ltdz 3OSHGwwfcX+/D1MSUmWTY6m8iksHSs8Yhe3qZkSrcstNNOc4iCajFq3O+Jbm/K/dw7FEhU Xi7ZOq0x6GDdfOuG1huDLp8mC/3hpW3ohRoPN2Lgn47cx7kFkEtUIq36dt/CJ65zZ/1pYe OPdWDL/mrT/1CwK/EjLvWPlgLojpDnqM18SahJcIexSYSeOd4kPx9YWszh6lWera3FJNFN 8j+0+G7HmA1KMgAy2Z01REOLGrB2UDXPs6hCIeJD1FmVXSONBXaZ8pRxsSiJlw== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4M2sq95GkDzFJS; Wed, 10 Aug 2022 14:32:53 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 27AEWrev050085; Wed, 10 Aug 2022 14:32:53 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 27AEWroU050084; Wed, 10 Aug 2022 14:32:53 GMT (envelope-from git) Date: Wed, 10 Aug 2022 14:32:53 GMT Message-Id: <202208101432.27AEWroU050084@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Gleb Smirnoff Subject: git: d88eb4654f37 - main - tcp: address a wire level race with 2 ACKs at the end of TCP handshake List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: glebius X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: d88eb4654f372d0451139a1dbf525a8f2cad1cf8 Auto-Submitted: auto-generated ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1660141973; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=inUdTv1XmUERfkORzH+ob1kkQvhaQIfIoBmNlsuoiaY=; b=S9+fOWBYOBFPZqaxaKhsozKcdaAgZXGWbaqDqbBdtzXRkqmtCsPAU1acOQTJeCz65nKv5C 7pH/EQS9exULqw4sPqpGW3ibR+pGQjIISlGhx3fdLZ4SIJ23nWExqb3SqoBJ+3aY13Ty72 NlMyH+zSJVvdmNI3MgKeKPoIlSj5SpYBUFlvLL2Wh6q5cEY4xxKHP8gZTXTEGtMrFYTSor mjHMmn7i36uY9xZiycL4K/FLTvzZCW3yPf/is+JEExx0fzGajKnRQ9f9j7/OtMtlMLqdGz +d45fztiPeMw9SYFN3xyHID56+nAZlfpnjxzupbHy+UNQEceclTb4eGKpkq0Wg== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1660141973; a=rsa-sha256; cv=none; b=OA9DxFuKy8fnwQqH6Pu4qmBuTh6FfPAkVb20zdUSDEwfHLwDrYUBiDmB0g+eYKrGljqI9K jqFnPM3tUlUxXsA2Pi/0GZdHfnB8w0jUfOCw43wnmDynpX1iDRzZC7ytEQh+5sXoLhCSa1 +CKaJeLT4n1JAfM0zXbSTdVEBeRp1R31X7f4ryWtikBGLQuOfpFNS22CDoF6jEAmWSupzv 4mHYXbyJgdXMVsdYL9kWY5zd81H40VHAE7h4tC16snbxP4JvjYwHQU1WvvYIpdHqS49SZL xFAzhRqnIkcB9Nx7VQQrT1OExutNrOG9m5JVIQShQV2RKFG8r6Uog2PGRUaY+A== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N The branch main has been updated by glebius: URL: https://cgit.FreeBSD.org/src/commit/?id=d88eb4654f372d0451139a1dbf525a8f2cad1cf8 commit d88eb4654f372d0451139a1dbf525a8f2cad1cf8 Author: Gleb Smirnoff AuthorDate: 2022-08-10 14:32:37 +0000 Commit: Gleb Smirnoff CommitDate: 2022-08-10 14:32:37 +0000 tcp: address a wire level race with 2 ACKs at the end of TCP handshake Imagine we are in SYN-RCVD state and two ACKs arrive at the same time, both valid, e.g. coming from the same host and with valid sequence. First packet would locate the listening socket in the inpcb database, write-lock it and start expanding the syncache entry into a socket. Meanwhile second packet would wait on the write lock of the listening socket. First packet will create a new ESTABLISHED socket, free the syncache entry and unlock the listening socket. Second packet would call into syncache_expand(), but this time it will fail as there is no syncache entry. Second packet would generate RST, effectively resetting the remote connection. It seems to me, that it is impossible to solve this problem with just rearranging locks, as the race happens at a wire level. To solve the problem, for an ACK packet arrived on a listening socket, that failed syncache lookup, perform a second non-wildcard lookup right away. That lookup may find the new born socket. Otherwise, we indeed send RST. Tested by: kp Reviewed by: tuexen, rrs PR: 265154 Differential revision: https://reviews.freebsd.org/D36066 --- sys/netinet/tcp_input.c | 45 +++++++++++++++++++++++++++++++-------------- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/sys/netinet/tcp_input.c b/sys/netinet/tcp_input.c index a4647297c521..8aed90be69a0 100644 --- a/sys/netinet/tcp_input.c +++ b/sys/netinet/tcp_input.c @@ -833,8 +833,9 @@ tcp_input_with_port(struct mbuf **mp, int *offp, int proto, uint16_t port) * PCB, be it a listening one or a synchronized one. The packet * shall not modify its state. */ - lookupflag = (thflags & (TH_ACK|TH_SYN)) == TH_SYN ? - INPLOOKUP_RLOCKPCB : INPLOOKUP_WLOCKPCB; + lookupflag = INPLOOKUP_WILDCARD | + ((thflags & (TH_ACK|TH_SYN)) == TH_SYN ? + INPLOOKUP_RLOCKPCB : INPLOOKUP_WLOCKPCB); findpcb: #ifdef INET6 if (isipv6 && fwd_tag != NULL) { @@ -857,13 +858,12 @@ findpcb: inp = in6_pcblookup(&V_tcbinfo, &ip6->ip6_src, th->th_sport, &next_hop6->sin6_addr, next_hop6->sin6_port ? ntohs(next_hop6->sin6_port) : - th->th_dport, INPLOOKUP_WILDCARD | lookupflag, - m->m_pkthdr.rcvif); + th->th_dport, lookupflag, m->m_pkthdr.rcvif); } } else if (isipv6) { inp = in6_pcblookup_mbuf(&V_tcbinfo, &ip6->ip6_src, - th->th_sport, &ip6->ip6_dst, th->th_dport, - INPLOOKUP_WILDCARD | lookupflag, m->m_pkthdr.rcvif, m); + th->th_sport, &ip6->ip6_dst, th->th_dport, lookupflag, + m->m_pkthdr.rcvif, m); } #endif /* INET6 */ #if defined(INET6) && defined(INET) @@ -889,13 +889,12 @@ findpcb: inp = in_pcblookup(&V_tcbinfo, ip->ip_src, th->th_sport, next_hop->sin_addr, next_hop->sin_port ? ntohs(next_hop->sin_port) : - th->th_dport, INPLOOKUP_WILDCARD | lookupflag, - m->m_pkthdr.rcvif); + th->th_dport, lookupflag, m->m_pkthdr.rcvif); } } else inp = in_pcblookup_mbuf(&V_tcbinfo, ip->ip_src, - th->th_sport, ip->ip_dst, th->th_dport, - INPLOOKUP_WILDCARD | lookupflag, m->m_pkthdr.rcvif, m); + th->th_sport, ip->ip_dst, th->th_dport, lookupflag, + m->m_pkthdr.rcvif, m); #endif /* INET */ /* @@ -904,6 +903,11 @@ findpcb: * XXX MRT Send RST using which routing table? */ if (inp == NULL) { + if (rstreason != 0) { + /* We came here after second (safety) lookup. */ + MPASS((lookupflag & INPLOOKUP_WILDCARD) == 0); + goto dropwithreset; + } /* * Log communication attempts to ports that are not * in use. @@ -1095,13 +1099,26 @@ findpcb: goto dropunlock; } else if (rstreason == 0) { /* - * No syncache entry or ACK was not - * for our SYN/ACK. Send a RST. + * No syncache entry, or ACK was not for our + * SYN/ACK. Do our protection against double + * ACK. If peer sent us 2 ACKs, then for the + * first one syncache_expand() successfully + * converted syncache entry into a socket, + * while we were waiting on the inpcb lock. We + * don't want to sent RST for the second ACK, + * so we perform second lookup without wildcard + * match, hoping to find the new socket. If + * the ACK is stray indeed, rstreason would + * hint the above code that the lookup was a + * second attempt. + * * NB: syncache did its own logging * of the failure cause. */ + INP_WUNLOCK(inp); rstreason = BANDLIM_RST_OPENPORT; - goto dropwithreset; + lookupflag &= ~INPLOOKUP_WILDCARD; + goto findpcb; } tfo_socket_result: if (so == NULL) { @@ -1390,7 +1407,7 @@ tfo_socket_result: * to upgrade the lock, because calling convention for stacks is * write-lock on PCB. If upgrade fails, drop the SYN. */ - if (lookupflag == INPLOOKUP_RLOCKPCB && INP_TRY_UPGRADE(inp) == 0) + if ((lookupflag & INPLOOKUP_RLOCKPCB) && INP_TRY_UPGRADE(inp) == 0) goto dropunlock; tp->t_fb->tfb_tcp_do_segment(m, th, so, tp, drop_hdrlen, tlen, iptos);