Date: Sun, 12 Jan 2025 07:17:48 +0000 From: Paul Vixie <paul@redbarn.org> To: Mark Johnston <markj@freebsd.org> Cc: freebsd-net@freebsd.org Subject: Re: per-FIB socket binding Message-ID: <2841433.BEx9A2HvPv@localhost> In-Reply-To: <Z4Khe2IfKp6QVH_n@nuc> References: <7772475.EvYhyI6sBW@dhcp-151.access.rits.tisf.net> <3330519.aeNJFYEL58@localhost> <Z4Khe2IfKp6QVH_n@nuc>
next in thread | previous in thread | raw e-mail | index | archive | help
On Saturday, January 11, 2025 4:51:07 PM UTC Mark Johnston wrote: > On Sat, Jan 11, 2025 at 06:25:22AM +0000, Paul Vixie wrote: > > On Monday, January 6, 2025 3:56:55 PM UTC Mark Johnston wrote: > > > On Fri, Dec 27, 2024 at 08:48:48AM +0000, Paul Vixie wrote: > > ... > >=20 > > x =3D y || z; > >=20 > > so if this is forbidden by today's freebsd kernel rules, please educate > > me. i know that GCC and CLang will optimize down to the same instruction > > sequence for either, but i prefer the shorter form since in this rare > > case it is clearer. >=20 > We are not super consistent about it, but style(9) does prescribe > explicit comparisons, i.e., "if (count !=3D 0)" rather than "if (count)". > In any case, I'd add a comment, since that assignment is a bit subtle. i'll add this comment: /* ref. ISO/IEC 9899:2023 =A7 6.5.15 */ =2E..and let the reviewers sort it out. > > ... the SYN|ACK will always use the FIB from the interface where > > the SYN arrived (this is in tcp_syncache.c). >=20 > This isn't clear to me. The initial SYN will create a syncache entry in = the >=20 > if (tp->t_state =3D=3D TCPS_LISTEN && SOLISTENING(so)) >=20 > case in tcp_input_with_port(). In this case we define inc.inc_fibnum =3D > so->so_fibnum, i.e., the FIB of the listening socket. Then, > syncache_add() copies inc to sc_inc, so sc_inc.inc_fibnum for the > syncache entry comes from the listening socket, and syncache_respond() > sets the SYN|ACK mbuf FIB with M_SETFIB(m, sc->sc_inc.inc_fibnum). What > am I missing? (Yes, I should actually do an experiement to check the > behaviour.) this is exactly my understanding of that code. the FIB for the SYN|ACK will= be=20 the one from the interface who received the SYN. it's only later after we=20 receive the ACK that completes the 3-way handshake that we will begin to us= e=20 the FIB of the socket. in terms of your earlier question, this means if we'= re=20 going to drop a SYN because it came in on the wrong interface (some form of= =20 uRPF) then the firewall module (PF or IPFW) will be doing that to the SYN w= ell=20 before a syncache entry is ever created. this seems to imply that the chang= es=20 i'm proposing won't be able to permit connections to complete that would ha= ve=20 not have completed -- which you correctly predicted could be surprising. the kernel implementations of uRPF-like features are unaffected. /* * net.inet.ip.rfc1122_strong_es: the address matches, veri= fy * that the packet arrived via the correct interface. */ if (__predict_false(strong_es && ia->ia_ifp !=3D ifp)) { IPSTAT_INC(ips_badaddr); goto bad; } we're not changing the inbound packet's FIB. /* * net.inet.ip.source_address_validation: drop incoming * packets that pretend to be ours. */ if (V_ip_sav && !(ifp->if_flags & IFF_LOOPBACK) && __predict_false(in_localip_fib(ip->ip_src, ifp->if_fib)= )) =20 { IPSTAT_INC(ips_badaddr); goto bad; } unless i misunderstand, this only rejects packets if the source address is = the=20 same as one of the addresses of the interface it arrives on, whereas the=20 documentation made me expect that a source address which was any address of= =20 any interface would be the trigger. i think my proposal doesn't change this. therefore my focus was on the verrevpath, versrcreach, and antispoof featur= es=20 of IPFW; and the antispoof feature of PF. in those tools, the inbound SYN=20 would be stopped before it ever reached the SYN cache. what i then observed is that the only change resulting from my proposal wou= ld=20 be to the transmission of subsequent segments, which would use the interfac= e's=20 =46IB (if nonzero) rather than the socket's FIB (if zero). if upstream uRPF= =20 would have dropped these segments because they're coming from the wrong pla= ce,=20 then indeed failure would result, and even FIN and RST could not be=20 transmitted, which would mean the socket would die a slow death by eventual= =20 timeout. my proposal would prevent this, but i think any surprise in this c= ase=20 would be a welcome one. if actual segments could not be transmitted because= =20 the socket's FIB did not contain a route back to the connection's source, t= hen=20 failure would be more immediate but the resulting surprise (if any) no less= =20 positive. i hope that if that analysis is correct no-one will demand a socket option = or=20 sysctl. to be effective at fixing path symmetry for multihoming, this logic= =20 has to be the default. =2D-- having finished with accept(), i'm now testing the bind() and bindat()=20 changes, which ought to be effective for any UDP responder who uses a separ= ate=20 socket for each interface address it speaks through, which certainly includ= es=20 all NTP and DNS servers i'm aware of, and may also help QUIC. i've begun to wish for a setfib_fd(2) which would let sshd promote the FIB= =20 from an inbound connection to be process-wide (after fork() before exec().)= =20 that way "netstat -rn" would show the routing table actually being used by = the=20 stdin and stdout of that shell. or perhaps we could just rename SO_SETFIB t= o=20 be SO_FIB and allow getsockopt() to return the FIB? (i'm not sure why it wa= s=20 "set only" in the current implementation.) opinions welcomed, as before.=20 =2D-=20 Paul Vixie
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2841433.BEx9A2HvPv>