From nobody Mon Jun 9 18:55:42 2025 X-Original-To: freebsd-net@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4bGLjL39B5z5yLWt for ; Mon, 09 Jun 2025 18:55:50 +0000 (UTC) (envelope-from paul@redbarn.org) Received: from util.redbarn.org (util.redbarn.org [24.104.150.222]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "*.redbarn.org", Issuer "RapidSSL TLS RSA CA G1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4bGLjK3Xfxz3Hd0 for ; Mon, 09 Jun 2025 18:55:49 +0000 (UTC) (envelope-from paul@redbarn.org) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=redbarn.org header.s=util header.b=fxP1HM8c; spf=pass (mx1.freebsd.org: domain of paul@redbarn.org designates 24.104.150.222 as permitted sender) smtp.mailfrom=paul@redbarn.org; dmarc=pass (policy=reject) header.from=redbarn.org Received: from family.redbarn.org (family.redbarn.org [IPv6:2001:559:8000:cd::5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "*.redbarn.org", Issuer "RapidSSL TLS RSA CA G1" (not verified)) by util.redbarn.org (Postfix) with ESMTPS id 7EE58160B98 for ; Mon, 09 Jun 2025 18:55:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=redbarn.org; s=util; t=1749495342; bh=xM4RwVuejZHfWC3tS7N45ulvUkm9VOFR0U5m8Jat638=; h=From:To:Subject:Date:In-Reply-To:References; b=fxP1HM8cDXJysA5GR0iDm6KX20Os6QvnxAVc3Q/U1KH+kf6rPcJ2W1+zp5Cij87TW PqOkcncBpJ7dc58jcOEAXtmhIjyfAr2fc3LFL1j5EZmjQYOKwiB5scUIessI+6dQve tor+aImSnVegZ2odXz59wiEbKx5Pq1b8x2nvgIdQ= Received: from localhost.localnet (dhcp-188.access.rits.tisf.net [24.104.150.188]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by family.redbarn.org (Postfix) with ESMTPSA id 5B0C79; Mon, 09 Jun 2025 18:55:42 +0000 (UTC) From: Paul Vixie To: freebsd-net@freebsd.org Subject: Re: fibnum3 Date: Mon, 09 Jun 2025 18:55:42 +0000 Message-ID: <2779555.uZKlY2gecq@localhost> Organization: FW In-Reply-To: References: <2029458.jZfb76A358@localhost> List-Id: Networking and TCP/IP with FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-net List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-net@FreeBSD.org MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Spamd-Result: default: False [-3.40 / 15.00]; DWL_DNSWL_LOW(-1.00)[redbarn.org:dkim]; NEURAL_HAM_LONG(-0.97)[-0.973]; NEURAL_HAM_MEDIUM(-0.94)[-0.936]; CTE_CASE(0.50)[]; MID_RHS_NOT_FQDN(0.50)[]; DMARC_POLICY_ALLOW(-0.50)[redbarn.org,reject]; NEURAL_HAM_SHORT(-0.49)[-0.493]; R_SPF_ALLOW(-0.20)[+ip4:24.104.150.0/24]; R_DKIM_ALLOW(-0.20)[redbarn.org:s=util]; MIME_GOOD(-0.10)[text/plain]; DKIM_TRACE(0.00)[redbarn.org:+]; RCVD_TLS_ALL(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; FREEFALL_USER(0.00)[paul]; HAS_ORG_HEADER(0.00)[]; TO_DN_NONE(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[]; MISSING_XM_UA(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[freebsd-net@freebsd.org]; RCVD_VIA_SMTP_AUTH(0.00)[]; MLMMJ_DEST(0.00)[freebsd-net@freebsd.org]; ASN(0.00)[asn:33651, ipnet:24.104.150.0/24, country:US]; MIME_TRACE(0.00)[0:+] X-Rspamd-Queue-Id: 4bGLjK3Xfxz3Hd0 X-Spamd-Bar: --- On Monday, June 9, 2025 5:19:53 PM UTC Mark Johnston wrote: > On Mon, Jun 09, 2025 at 04:42:55AM +0000, Paul Vixie wrote: > > ... > I've made a number of inline comments. tyvm! this is precisely the kind of review i'd hoped for. see inline. > > - so->so_fibnum = head->so_fibnum; > > + if ((so->so_fibnum = head->so_fibnum) == 0) > > + so->so_fibnum = fibnum; an accept()'ed socket [which this is] inherits a fibnum from its listen() socket which inherits from its creating process. the former can be overridden with setsockopt(SO_FIB) and the latter by setfib(). here, we are making this inherited or crafted [whichever it was] fibnum a fallback, so that in the case of a TCP SYN we can preserve the interface's fibnum [in the mbuf header] as the accept() socket's fibnum. i anticipate further complexity in the future, such that there may someday be more than than one fallback priority. so: > The double assignment looks odd to me. It'd be neater to write: > > so->so_fibnum == head->so_fibnum != 0 ? head->so_fibnum : fibnum; it would not be future proof. as written, the extra memory write will be coalesced in the cache's write buffer, so any performance loss would only occur on processors made before ~Y2K. if we're to neaten it, i'd say: if (head->so_fibnum != 0) so->so_fibnum = head->so_fibnum; else so->so_fibnum = fibnum; ...so as to support future more-levels fallbacking. i dislike the read before write at least as much as you dislike the write after write, because the "if" will emit some kind of conditional branch in the machine code [as would your proposed trinary conditional), which then subjects us to branch prediction and occasional pipeline stalls. write coalescence is simply faster on processors made after ~Y2K. but i'll adapt as shown, optimizing for clarity and future- proofing. i'll also make the corresponding change to tcp_input.c. > > -ifa_ifwithaddr(const struct sockaddr *addr) > > +ifa_ifwithaddr_getfib(const struct sockaddr *addr, uint16_t *fibnum) > > @@ -1834,17 +1834,23 @@ ifa_ifwithaddr(const struct sockaddr *addr) > > done: > > + if (fibnum != NULL && ifp != NULL) > > + *fibnum = ifp->if_fib; > > Rather than complicating this function to add an additional return > value, why not let the caller get the fib number directly? They can > just get ifa->ifa_ifp->if_fib. i wasn't willing to retrieve it outside the NET_EPOCH lock, so i changed the call chain to pull it inside a NET_EPOCH_ASSERT() block. > Similarly, I think adding this extra helper is unnecessary. Callers > which want the FIB number can use ifa_ifwithaddr() and extract it > directly. That's how in6_pcbbind_avail() handles fetching ifaddr flags, > for instance. i did not do a general audit to see if other potential race conditions already exist. if you're sure this is a norm we should be living with, i'll make the change you propose. > > --- a/sys/netinet/in_pcb.c > > +++ b/sys/netinet/in_pcb.c > > @@ -646,7 +646,8 @@ in_pcballoc(struct socket *so, struct inpcbinfo > > *pcbinfo)> > > inp->inp_pcbinfo = pcbinfo; > > inp->inp_socket = so; > > inp->inp_cred = crhold(so->so_cred); > > > > - inp->inp_inc.inc_fibnum = so->so_fibnum; > > + /* The FIB number is in both inp_inc and inp_socket; synchronize. */ > > + inp->inp_inc.inc_fibnum = inp->inp_socket->so_fibnum; > > so == inp->inp_socket, so this part of the change is redundant. nice. i'm loathe to reference "so" after its value has been copied. i realize that C doesn't have a borrow checker but i've been treating pointer copies as if they were destructive, and in many cases adding something like "so = NULL" after the copy to ensure that they are. i'll remove the comment but the code is correct. i'd like to see the so_cred dereference also made from inp->inp_socket if that wouldn't break anything. > > @@ -741,6 +743,8 @@ in_pcbbind(struct inpcb *inp, struct sockaddr_in *sin, > > int flags,> > > } > > if (anonport) > > > > inp->inp_flags |= INP_ANONPORT; > > > > + /* The FIB number is in both inp_inc and inp_socket; synchronize. */ > > + sosetfib(inp->inp_socket, inp->inp_inc.inc_fibnum); > > Why is there no such call in the IPv6 case? an oversight -- thanks. > > static int > > in_pcbbind_avail(struct inpcb *inp, const struct in_addr laddr, > > - ifa_ifwithaddr_check((const struct sockaddr *)&sin) == 0) > > + ifa_ifwithaddr_check_getfib((const struct sockaddr *)&sin, > > + fibnum) == 0) > > This excludes the IN_MULTICAST() case--is that intentional? i don't know. the multicast source address isn't checked against the interface table. if that's an oversight i'll fix that and subsequently retrieve the interface FIB. in the spirit of minimalism i only altered the case where an interface is known to possess the local address. > > + lookupflags, fibnum, cred); > > Indentation on continuing lines should be by four spaces. will fix. thanks for your investment of time in this project. -- Paul Vixie