Date: Tue, 18 Feb 2025 05:16:07 +0000 From: Paul Vixie <paul@redbarn.org> To: freebsd-net@freebsd.org, Mark Johnston <markj@freebsd.org> Cc: Santiago Martinez <sm@codenetworks.net>, Jamie Landeg-Jones <jamie@catflap.org> Subject: fibnum2.diff (Re: per-FIB socket binding) Message-ID: <2522290.jE0xQCEvom@localhost> In-Reply-To: <Z3v9R1YnDX9AWMUm@nuc> References: <7772475.EvYhyI6sBW@dhcp-151.access.rits.tisf.net> <38589000.XM6RcZxFsP@dhcp-151.access.rits.tisf.net> <Z3v9R1YnDX9AWMUm@nuc>
next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format. --nextPart2449487.NG923GbCHz Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="UTF-8" this is the second fibnum patch, which replaces (doesn't add to) the first.= =20 some blanks/tabs/margins lint was incidentally fixed, a few comments were=20 added, the API of several existing functions was changed, and some wrappers= =20 were added to others. as explained inline below, this handles both TCP and = UDP=20 listeners now. i did not add an SO_FIB operator nor shim SO_SETFIB since th= ose=20 are independent of this socket-related work, needed to get path symmetry fo= r=20 shell-related listeners like sshd. comments, questions, and especially testing results would be very welcome. vixie =2D-- On Monday, January 6, 2025 3:56:55 PM UTC Mark Johnston wrote: > On Fri, Dec 27, 2024 at 08:48:48AM +0000, Paul Vixie wrote: > > On Tuesday, December 24, 2024 3:34:45 AM UTC Santiago Martinez wrote: > > > here=E2=80=99s another user of fibs. Each of our servers have multipl= e fibs and > > > jails with fibs. I like the proposed. > >=20 > > On Tuesday, December 24, 2024 5:06:32 AM UTC Jamie Landeg-Jones wrote: > > > I like that. I isolate 5 seperate networks by assigning a fib to each > > > interface, and was initially surprised that I had to jump through ipfw > > > hoops to get it to work properly, in fact at the end of my ipfw rules > > > for these interfaces, just to guarantee no leaking, ... > > >=20 > > > So, yes, I agree that it's crocky, and your proposal is how I origina= lly > > > expected it to work, and indeed, I can so no reason for it not to work > > > that way, but am prepared to be enlightened if anyone else has an > > > opinion on this. > >=20 > > Groovy. See attached patch. This is just for TCP since I have no way to > > test SCTP and I think UDP will have to be handled at the application > > layer. There are two one line changes here. i was wrong, UDP doesn't need application layer changes. this patch is long= er=20 than the first one, since it (1) does TCP in the way mark johnston suggeste= d,=20 (2) also handles UDP, and (3) is tested against 14.2-P1 now although that=20 didn't change the patch. for what it's worth SCTP looks simple but i really= =20 think it should be patched later by someone who can test it. (On Monday, January 6, 2025 3:56:55 PM UTC Mark Johnston wrote:) > One side effect of the patch is that a service listening in FIB 0 that > has no route to the source address of a connection attempt from a > different FIB would previously not accept such a connection, but now it > will. Perhaps that's drastic enough to warrant a sysctl and/or sockopt > to control this behaviour. i hope not. moving from unintentional configuration-dependent failure to=20 unintentional configuration-dependent success is not a change that freebsd= =20 usually optionalizes. > It would be better to pass the fibnum to solisten_clone() and assign it > there. Otherwise, the value of so_fibnum will be wrong for a short > window during which the socket is passed to MAC and other hooks, which > might have some surprising effects. done. =2D-=20 Paul Vixie --nextPart2449487.NG923GbCHz Content-Disposition: attachment; filename="fibnum2.diff" Content-Transfer-Encoding: 7Bit Content-Type: text/x-patch; charset="x-UTF_8J"; name="fibnum2.diff" diff --git a/sys/kern/uipc_socket.c b/sys/kern/uipc_socket.c index 84d817c9ada1..5021e135e752 100644 --- a/sys/kern/uipc_socket.c +++ b/sys/kern/uipc_socket.c @@ -988,13 +988,13 @@ SYSCTL_TIMEVAL_SEC(_kern_ipc, OID_AUTO, sooverinterval, CTLFLAG_RW, * accept(2), the protocol has two options: * 1) Call legacy sonewconn() function, which would call protocol attach * method, same as used for socket(2). - * 2) Call solisten_clone(), do attach that is specific to a cloned connection, - * and then call solisten_enqueue(). + * 2) Call solisten_clone(), do an attach that is specific to a cloned + * connection, and then call solisten_enqueue(). * * Note: the ref count on the socket is 0 on return. */ struct socket * -solisten_clone(struct socket *head) +solisten_clone(struct socket *head, int fibnum) { struct sbuf descrsb; struct socket *so; @@ -1161,7 +1161,8 @@ solisten_clone(struct socket *head) SO_LINGER | SO_OOBINLINE | SO_NOSIGPIPE); so->so_linger = head->so_linger; so->so_state = head->so_state; - so->so_fibnum = head->so_fibnum; + /* note, C has short-circuit evaluation. */ + so->so_fibnum = head->so_fibnum || fibnum; so->so_proto = head->so_proto; so->so_cred = crhold(head->so_cred); #ifdef MAC @@ -1198,7 +1199,7 @@ sonewconn(struct socket *head, int connstatus) { struct socket *so; - if ((so = solisten_clone(head)) == NULL) + if ((so = solisten_clone(head, head->so_fibnum)) == NULL) return (NULL); if (so->so_proto->pr_attach(so, 0, NULL) != 0) { diff --git a/sys/net/if.c b/sys/net/if.c index edc7d8376bbf..da2041c7c980 100644 --- a/sys/net/if.c +++ b/sys/net/if.c @@ -1812,11 +1812,11 @@ ifa_free(struct ifaddr *ifa) ((const struct sockaddr_dl *)(a1))->sdl_alen) == 0)) /* - * Locate an interface based on a complete address. + * Locate an interface based on a complete address, + * and optionally retrieve its FIB number. */ -/*ARGSUSED*/ struct ifaddr * -ifa_ifwithaddr(const struct sockaddr *addr) +ifa_ifwithaddr_getfib(const struct sockaddr *addr, uint16_t *fibnum) { struct ifnet *ifp; struct ifaddr *ifa; @@ -1841,17 +1841,23 @@ ifa_ifwithaddr(const struct sockaddr *addr) } ifa = NULL; done: + if (ifp != NULL && fibnum != NULL) + *fibnum = ifp->if_fib; return (ifa); } +/* + * Test for existence of an interface having this complete address, + * optionally retrieving its FIB number. + */ int -ifa_ifwithaddr_check(const struct sockaddr *addr) +ifa_ifwithaddr_check_getfib(const struct sockaddr *addr, uint16_t *fibnum) { struct epoch_tracker et; int rc; NET_EPOCH_ENTER(et); - rc = (ifa_ifwithaddr(addr) != NULL); + rc = (ifa_ifwithaddr_getfib(addr, fibnum) != NULL); NET_EPOCH_EXIT(et); return (rc); } diff --git a/sys/net/if_var.h b/sys/net/if_var.h index 74692e916558..4ebbb6982c10 100644 --- a/sys/net/if_var.h +++ b/sys/net/if_var.h @@ -528,13 +528,20 @@ int ifa_add_loopback_route(struct ifaddr *, struct sockaddr *); int ifa_del_loopback_route(struct ifaddr *, struct sockaddr *); int ifa_switch_loopback_route(struct ifaddr *, struct sockaddr *); -struct ifaddr *ifa_ifwithaddr(const struct sockaddr *); -int ifa_ifwithaddr_check(const struct sockaddr *); +struct ifaddr *ifa_ifwithaddr_getfib(const struct sockaddr *, uint16_t *); +int ifa_ifwithaddr_check_getfib(const struct sockaddr *, + uint16_t *); +static inline struct ifaddr *ifa_ifwithaddr(const struct sockaddr *sa) { + return (ifa_ifwithaddr_getfib(sa, NULL)); +} +static inline int ifa_ifwithaddr_check(const struct sockaddr *sa) { + return (ifa_ifwithaddr_check_getfib(sa, NULL)); +} struct ifaddr *ifa_ifwithbroadaddr(const struct sockaddr *, int); struct ifaddr *ifa_ifwithdstaddr(const struct sockaddr *, int); struct ifaddr *ifa_ifwithnet(const struct sockaddr *, int, int); struct ifaddr *ifa_ifwithroute(int, const struct sockaddr *, - const struct sockaddr *, u_int); + const struct sockaddr *, u_int); struct ifaddr *ifaof_ifpforaddr(const struct sockaddr *, if_t); int ifa_preferred(struct ifaddr *, struct ifaddr *); diff --git a/sys/netinet/in_pcb.c b/sys/netinet/in_pcb.c index f6904690deab..f2591b387d50 100644 --- a/sys/netinet/in_pcb.c +++ b/sys/netinet/in_pcb.c @@ -691,7 +691,7 @@ in_pcbbind(struct inpcb *inp, struct sockaddr_in *sin, struct ucred *cred) return (EINVAL); anonport = sin == NULL || sin->sin_port == 0; error = in_pcbbind_setup(inp, sin, &inp->inp_laddr.s_addr, - &inp->inp_lport, cred); + &inp->inp_lport, &inp->inp_inc.inc_fibnum, cred); if (error) return (error); if (in_pcbinshash(inp) != 0) { @@ -870,11 +870,11 @@ in_pcb_lport(struct inpcb *inp, struct in_addr *laddrp, u_short *lportp, * calling in_pcbinshash(), or they can just use the resulting * port and address to authorise the sending of a once-off packet. * - * On error, the values of *laddrp and *lportp are not changed. + * On error, the values of *laddrp, *lportp and *fibnum are not changed. */ int in_pcbbind_setup(struct inpcb *inp, struct sockaddr_in *sin, in_addr_t *laddrp, - u_short *lportp, struct ucred *cred) + u_short *lportp, uint16_t *fibnum, struct ucred *cred) { struct socket *so = inp->inp_socket; struct inpcbinfo *pcbinfo = inp->inp_pcbinfo; @@ -945,8 +945,10 @@ in_pcbbind_setup(struct inpcb *inp, struct sockaddr_in *sin, in_addr_t *laddrp, * to any endpoint address, local or not. */ if ((inp->inp_flags & INP_BINDANY) == 0 && - ifa_ifwithaddr_check((struct sockaddr *)sin) == 0) + ifa_ifwithaddr_check_getfib((struct sockaddr *)sin, + fibnum) == 0) { return (EADDRNOTAVAIL); + } } laddr = sin->sin_addr; if (lport) { diff --git a/sys/netinet/in_pcb.h b/sys/netinet/in_pcb.h index 4844bbee3b54..fef196020a03 100644 --- a/sys/netinet/in_pcb.h +++ b/sys/netinet/in_pcb.h @@ -667,7 +667,7 @@ void in_pcbpurgeif0(struct inpcbinfo *, struct ifnet *); int in_pcballoc(struct socket *, struct inpcbinfo *); int in_pcbbind(struct inpcb *, struct sockaddr_in *, struct ucred *); int in_pcbbind_setup(struct inpcb *, struct sockaddr_in *, in_addr_t *, - u_short *, struct ucred *); + u_short *, uint16_t *, struct ucred *); int in_pcbconnect(struct inpcb *, struct sockaddr_in *, struct ucred *, bool); int in_pcbconnect_setup(struct inpcb *, struct sockaddr_in *, in_addr_t *, diff --git a/sys/netinet/tcp_input.c b/sys/netinet/tcp_input.c index 83f85a50ed40..6264948201cf 100644 --- a/sys/netinet/tcp_input.c +++ b/sys/netinet/tcp_input.c @@ -597,7 +597,7 @@ int tcp6_input(struct mbuf **mp, int *offp, int proto) { - return(tcp6_input_with_port(mp, offp, proto, 0)); + return (tcp6_input_with_port(mp, offp, proto, 0)); } #endif /* INET6 */ @@ -1057,7 +1057,9 @@ tcp_input_with_port(struct mbuf **mp, int *offp, int proto, uint16_t port) } inc.inc_fport = th->th_sport; inc.inc_lport = th->th_dport; - inc.inc_fibnum = so->so_fibnum; + + /* note, C has short-circuit evaluation. */ + inc.inc_fibnum = so->so_fibnum || m->m_pkthdr.fibnum; /* * Check for an existing connection attempt in syncache if @@ -1498,7 +1500,7 @@ tcp_autorcvbuf(struct mbuf *m, struct tcphdr *th, struct socket *so, int tcp_input(struct mbuf **mp, int *offp, int proto) { - return(tcp_input_with_port(mp, offp, proto, 0)); + return (tcp_input_with_port(mp, offp, proto, 0)); } static void @@ -4141,7 +4143,7 @@ tcp_compute_pipe(struct tcpcb *tp) tp->sackhint.sack_bytes_rexmit - tp->sackhint.sacked_bytes); } else { - return((*tp->t_fb->tfb_compute_pipe)(tp)); + return ((*tp->t_fb->tfb_compute_pipe)(tp)); } } diff --git a/sys/netinet/tcp_syncache.c b/sys/netinet/tcp_syncache.c index 15244a61d8da..0601871fda08 100644 --- a/sys/netinet/tcp_syncache.c +++ b/sys/netinet/tcp_syncache.c @@ -803,7 +803,7 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m) * as they would have been set up if we had created the * connection when the SYN arrived. */ - if ((so = solisten_clone(lso)) == NULL) + if ((so = solisten_clone(lso, sc->sc_inc.inc_fibnum)) == NULL) goto allocfail; #ifdef MAC mac_socketpeer_set_from_mbuf(m, so); diff --git a/sys/netinet/udp_usrreq.c b/sys/netinet/udp_usrreq.c index 7329600ecc79..1e73e6315123 100644 --- a/sys/netinet/udp_usrreq.c +++ b/sys/netinet/udp_usrreq.c @@ -1218,7 +1218,7 @@ udp_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *addr, inp->inp_vflag &= ~INP_IPV6; } INP_HASH_WLOCK(pcbinfo); - error = in_pcbbind_setup(inp, &src, &laddr.s_addr, &lport, + error = in_pcbbind_setup(inp, &src, &laddr.s_addr, &lport, NULL, td->td_ucred); INP_HASH_WUNLOCK(pcbinfo); if ((flags & PRUS_IPV6) != 0) diff --git a/sys/sys/socketvar.h b/sys/sys/socketvar.h index f1407b7ea237..4b6c1ae2a2c9 100644 --- a/sys/sys/socketvar.h +++ b/sys/sys/socketvar.h @@ -520,7 +520,7 @@ int solisten_proto_check(struct socket *so); bool solisten_enqueue(struct socket *, int); int solisten_dequeue(struct socket *, struct socket **, int); struct socket * - solisten_clone(struct socket *); + solisten_clone(struct socket *, int fibnum); struct socket * sonewconn(struct socket *head, int connstatus); struct socket * --nextPart2449487.NG923GbCHz--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2522290.jE0xQCEvom>