From owner-dev-commits-src-main@freebsd.org Mon Sep 6 21:26:45 2021 Return-Path: Delivered-To: dev-commits-src-main@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 9DE1667BB23; Mon, 6 Sep 2021 21:26:45 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4H3M0j42MFz3C2r; Mon, 6 Sep 2021 21:26:45 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 6A23326051; Mon, 6 Sep 2021 21:26:45 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 186LQjPC009509; Mon, 6 Sep 2021 21:26:45 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 186LQjOm009508; Mon, 6 Sep 2021 21:26:45 GMT (envelope-from git) Date: Mon, 6 Sep 2021 21:26:45 GMT Message-Id: <202109062126.186LQjOm009508@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: "Alexander V. Chernikov" Subject: git: 936f4a42fa2a - main - lltable: do not require prefix lookup when checking lle allocation rules. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: melifaro X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 936f4a42fa2a23d21f8f14a8c33627a8207b4b3b Auto-Submitted: auto-generated X-BeenThere: dev-commits-src-main@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Commit messages for the main branch of the src repository List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Sep 2021 21:26:45 -0000 The branch main has been updated by melifaro: URL: https://cgit.FreeBSD.org/src/commit/?id=936f4a42fa2a23d21f8f14a8c33627a8207b4b3b commit 936f4a42fa2a23d21f8f14a8c33627a8207b4b3b Author: Alexander V. Chernikov AuthorDate: 2021-09-03 11:48:36 +0000 Commit: Alexander V. Chernikov CommitDate: 2021-09-06 21:03:22 +0000 lltable: do not require prefix lookup when checking lle allocation rules. With the new FIB_ALGO infrastructure, nearly all subsystems use fib[46]_lookup() functions, which provides lockless lookups. A number of places remains that uses old-style lookup functions, that still requires RIB read lock to return the result. One of such places is arp processing code. FIB_ALGO implementation makes some tradeoffs, resulting in (relatively) prolonged periods of holding RIB_WLOCK. If the lock is held and datapath competes for it, the RX ring may get blocked, ending in traffic delays and losses. As currently arp processing is performed directly in the interrupt handler, handling ARP replies triggers the problem descibed above when the amount of ARP replies is high. To be more specific, prior to creating new ARP entry, routing lookup for the entry address in interface fib is executed. The following conditions are the verified: 1. If lookup returns an empty result, or the resulting prefix is non-directly-reachable, failure is returned. The only exception are host routes w/ gateway==address. 2. If the routing lookup returns different interface and non-host route, we want to support the use case of having multiple interfaces with the same prefix. In fact, the current code just checks if the returned prefix covers target address (always true) and effectively allow allocating ARP entries for any directly-reachable prefix, regardless of its interface. Change the code to perform the following: 1) use fib4_lookup() to get the nexthop, instead of requesting exact prefix. 2) Rewrite first condition check using nexthop flags (1:1 match) 3) Rewrite second condition to check for interface addresses matching target address on the input interface. Differential Revision: https://reviews.freebsd.org/D31824 Reviewed by: ae MFC after: 1 week PR: 257965 --- sys/netinet/in.c | 73 ++++++++++++++++++-------------------------------------- 1 file changed, 23 insertions(+), 50 deletions(-) diff --git a/sys/netinet/in.c b/sys/netinet/in.c index f9b46b414007..ce03c0b6d4cb 100644 --- a/sys/netinet/in.c +++ b/sys/netinet/in.c @@ -64,6 +64,7 @@ __FBSDID("$FreeBSD$"); #include #include +#include #include #include #include @@ -1358,48 +1359,32 @@ in_lltable_free_entry(struct lltable *llt, struct llentry *lle) static int in_lltable_rtcheck(struct ifnet *ifp, u_int flags, const struct sockaddr *l3addr) { - struct rt_addrinfo info; - struct sockaddr_in rt_key, rt_mask; - struct sockaddr rt_gateway; - int rt_flags; + struct nhop_object *nh; + struct in_addr addr; KASSERT(l3addr->sa_family == AF_INET, ("sin_family %d", l3addr->sa_family)); - bzero(&rt_key, sizeof(rt_key)); - rt_key.sin_len = sizeof(rt_key); - bzero(&rt_mask, sizeof(rt_mask)); - rt_mask.sin_len = sizeof(rt_mask); - bzero(&rt_gateway, sizeof(rt_gateway)); - rt_gateway.sa_len = sizeof(rt_gateway); + addr = ((const struct sockaddr_in *)l3addr)->sin_addr; - bzero(&info, sizeof(info)); - info.rti_info[RTAX_DST] = (struct sockaddr *)&rt_key; - info.rti_info[RTAX_NETMASK] = (struct sockaddr *)&rt_mask; - info.rti_info[RTAX_GATEWAY] = (struct sockaddr *)&rt_gateway; - - if (rib_lookup_info(ifp->if_fib, l3addr, NHR_REF, 0, &info) != 0) + nh = fib4_lookup(ifp->if_fib, addr, 0, NHR_NONE, 0); + if (nh == NULL) return (EINVAL); - rt_flags = info.rti_flags; - /* * If the gateway for an existing host route matches the target L3 * address, which is a special route inserted by some implementation * such as MANET, and the interface is of the correct type, then * allow for ARP to proceed. */ - if (rt_flags & RTF_GATEWAY) { - if (!(rt_flags & RTF_HOST) || !info.rti_ifp || - info.rti_ifp->if_type != IFT_ETHER || - (info.rti_ifp->if_flags & (IFF_NOARP | IFF_STATICARP)) != 0 || - memcmp(rt_gateway.sa_data, l3addr->sa_data, + if (nh->nh_flags & NHF_GATEWAY) { + if (!(nh->nh_flags & NHF_HOST) || nh->nh_ifp->if_type != IFT_ETHER || + (nh->nh_ifp->if_flags & (IFF_NOARP | IFF_STATICARP)) != 0 || + memcmp(nh->gw_sa.sa_data, l3addr->sa_data, sizeof(in_addr_t)) != 0) { - rib_free_info(&info); return (EINVAL); } } - rib_free_info(&info); /* * Make sure that at least the destination address is covered @@ -1408,35 +1393,23 @@ in_lltable_rtcheck(struct ifnet *ifp, u_int flags, const struct sockaddr *l3addr * on one interface and the corresponding outgoing packet leaves * another interface. */ - if (!(rt_flags & RTF_HOST) && info.rti_ifp != ifp) { - const char *sa, *mask, *addr, *lim; - const struct sockaddr_in *l3sin; + if ((nh->nh_ifp != ifp) && (nh->nh_flags & NHF_HOST) == 0) { + struct in_ifaddr *ia = (struct in_ifaddr *)ifaof_ifpforaddr(l3addr, ifp); + struct in_addr dst_addr, mask_addr; - mask = (const char *)&rt_mask; - /* - * Just being extra cautious to avoid some custom - * code getting into trouble. - */ - if ((info.rti_addrs & RTA_NETMASK) == 0) + if (ia == NULL) return (EINVAL); - sa = (const char *)&rt_key; - addr = (const char *)l3addr; - l3sin = (const struct sockaddr_in *)l3addr; - lim = addr + l3sin->sin_len; - - for ( ; addr < lim; sa++, mask++, addr++) { - if ((*sa ^ *addr) & *mask) { -#ifdef DIAGNOSTIC - char addrbuf[INET_ADDRSTRLEN]; + /* + * ifaof_ifpforaddr() returns _best matching_ IFA. + * It is possible that ifa prefix does not cover our address. + * Explicitly verify and fail if that's the case. + */ + dst_addr = IA_SIN(ia)->sin_addr; + mask_addr.s_addr = htonl(ia->ia_subnetmask); - log(LOG_INFO, "IPv4 address: \"%s\" " - "is not on the network\n", - inet_ntoa_r(l3sin->sin_addr, addrbuf)); -#endif - return (EINVAL); - } - } + if (!IN_ARE_MASKED_ADDR_EQUAL(dst_addr, addr, mask_addr)) + return (EINVAL); } return (0);