Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 12 Jan 2025 07:17:48 +0000
From:      Paul Vixie <paul@redbarn.org>
To:        Mark Johnston <markj@freebsd.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: per-FIB socket binding
Message-ID:  <2841433.BEx9A2HvPv@localhost>
In-Reply-To: <Z4Khe2IfKp6QVH_n@nuc>
References:  <7772475.EvYhyI6sBW@dhcp-151.access.rits.tisf.net> <3330519.aeNJFYEL58@localhost> <Z4Khe2IfKp6QVH_n@nuc>

next in thread | previous in thread | raw e-mail | index | archive | help
On Saturday, January 11, 2025 4:51:07 PM UTC Mark Johnston wrote:
> On Sat, Jan 11, 2025 at 06:25:22AM +0000, Paul Vixie wrote:
> > On Monday, January 6, 2025 3:56:55 PM UTC Mark Johnston wrote:
> > > On Fri, Dec 27, 2024 at 08:48:48AM +0000, Paul Vixie wrote:
> > ...
> >=20
> > 	x =3D y || z;
> >=20
> > so if this is forbidden by today's freebsd kernel rules, please educate
> > me. i know that GCC and CLang will optimize down to the same instruction
> > sequence for either, but i prefer the shorter form since in this rare
> > case it is clearer.
>=20
> We are not super consistent about it, but style(9) does prescribe
> explicit comparisons, i.e., "if (count !=3D 0)" rather than "if (count)".
> In any case, I'd add a comment, since that assignment is a bit subtle.

i'll add this comment:

/* ref. ISO/IEC 9899:2023 =A7 6.5.15 */

=2E..and let the reviewers sort it out.

> > ... the SYN|ACK will always use the FIB from the interface where
> > the SYN arrived (this is in tcp_syncache.c).
>=20
> This isn't clear to me.  The initial SYN will create a syncache entry in =
the
>=20
>   if (tp->t_state =3D=3D TCPS_LISTEN && SOLISTENING(so))
>=20
> case in tcp_input_with_port().  In this case we define inc.inc_fibnum =3D
> so->so_fibnum, i.e., the FIB of the listening socket.  Then,
> syncache_add() copies inc to sc_inc, so sc_inc.inc_fibnum for the
> syncache entry comes from the listening socket, and syncache_respond()
> sets the SYN|ACK mbuf FIB with M_SETFIB(m, sc->sc_inc.inc_fibnum).  What
> am I missing?  (Yes, I should actually do an experiement to check the
> behaviour.)

this is exactly my understanding of that code. the FIB for the SYN|ACK will=
 be=20
the one from the interface who received the SYN. it's only later after we=20
receive the ACK that completes the 3-way handshake that we will begin to us=
e=20
the FIB of the socket. in terms of your earlier question, this means if we'=
re=20
going to drop a SYN because it came in on the wrong interface (some form of=
=20
uRPF) then the firewall module (PF or IPFW) will be doing that to the SYN w=
ell=20
before a syncache entry is ever created. this seems to imply that the chang=
es=20
i'm proposing won't be able to permit connections to complete that would ha=
ve=20
not have completed -- which you correctly predicted could be surprising.

the kernel implementations of uRPF-like features are unaffected.

                /*
                 * net.inet.ip.rfc1122_strong_es: the address matches, veri=
fy
                 * that the packet arrived via the correct interface.
                 */
                if (__predict_false(strong_es && ia->ia_ifp !=3D ifp)) {
                        IPSTAT_INC(ips_badaddr);
                        goto bad;
                }

we're not changing the inbound packet's FIB.

                /*
                 * net.inet.ip.source_address_validation: drop incoming
                 * packets that pretend to be ours.
                 */
                if (V_ip_sav && !(ifp->if_flags & IFF_LOOPBACK) &&
                    __predict_false(in_localip_fib(ip->ip_src, ifp->if_fib)=
))                                        =20
	{
                        IPSTAT_INC(ips_badaddr);
                        goto bad;
                }

unless i misunderstand, this only rejects packets if the source address is =
the=20
same as one of the addresses of the interface it arrives on, whereas the=20
documentation made me expect that a source address which was any address of=
=20
any interface would be the trigger. i think my proposal doesn't change this.

therefore my focus was on the verrevpath, versrcreach, and antispoof featur=
es=20
of IPFW; and the antispoof feature of PF. in those tools, the inbound SYN=20
would be stopped before it ever reached the SYN cache.

what i then observed is that the only change resulting from my proposal wou=
ld=20
be to the transmission of subsequent segments, which would use the interfac=
e's=20
=46IB (if nonzero) rather than the socket's FIB (if zero). if upstream uRPF=
=20
would have dropped these segments because they're coming from the wrong pla=
ce,=20
then indeed failure would result, and even FIN and RST could not be=20
transmitted, which would mean the socket would die a slow death by eventual=
=20
timeout. my proposal would prevent this, but i think any surprise in this c=
ase=20
would be a welcome one. if actual segments could not be transmitted because=
=20
the socket's FIB did not contain a route back to the connection's source, t=
hen=20
failure would be more immediate but the resulting surprise (if any) no less=
=20
positive.

i hope that if that analysis is correct no-one will demand a socket option =
or=20
sysctl. to be effective at fixing path symmetry for multihoming, this logic=
=20
has to be the default.

=2D--

having finished with accept(), i'm now testing the bind() and bindat()=20
changes, which ought to be effective for any UDP responder who uses a separ=
ate=20
socket for each interface address it speaks through, which certainly includ=
es=20
all NTP and DNS servers i'm aware of, and may also help QUIC.

i've begun to wish for a setfib_fd(2) which would let sshd promote the FIB=
=20
from an inbound connection to be process-wide (after fork() before exec().)=
=20
that way "netstat -rn" would show the routing table actually being used by =
the=20
stdin and stdout of that shell. or perhaps we could just rename SO_SETFIB t=
o=20
be SO_FIB and allow getsockopt() to return the FIB? (i'm not sure why it wa=
s=20
"set only" in the current implementation.) opinions welcomed, as before.=20

=2D-=20
Paul Vixie





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2841433.BEx9A2HvPv>