Date:      Thu, 6 Feb 2025 11:58:07 -0800
From:      Gregory Shapiro <gshapiro@freebsd.org>
To:        freebsd-net@freebsd.org
Subject:   bird2 netlink switch causing mbuf exhaustion and reboot
Message-ID:  <555hzkf4hdqr3d357bs4awilg2qpolfeyc245qzzutamnzwqwk@jfjb5htjuwcq>

I have a handful of FreeBSD 14.2 machines acting as BGP routers using bird2. Three of them would drop BGP connections at least once a week; the others would either stay stable or drop off sporadically. Through a painful root-cause exercise, I've narrowed the cause to the bird2 port's switch from a routing socket to netlink. On the routers that failed most often, I've switched from bird2 to bird2-rtsock, and the problem has disappeared.

I don't know the rtsock/netlink source well enough to debug deeper, but I'm more than happy to go back to the netlink version to help debug if anyone is interested in tackling the problem.

I'll give some details on the router that failed most often to whet your appetite.

The problem will start with this appearing in the logs:

Jan 29 03:42:08 pesto kernel: [zone: mbuf] kern.ipc.nmbufs limit reached

After that, messages like these appear every 30-40 seconds until I intervene:

Jan 29 03:51:35 pesto kernel: sonewconn: pcb 0xfffff8002c198540 ([::]:179 (proto 6)): Listen queue overflow: 13 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 502, jail 0
Jan 29 03:52:38 pesto kernel: sonewconn: pcb 0xfffff8002c198540 ([::]:179 (proto 6)): Listen queue overflow: 13 already in queue awaiting acceptance (17 occurrences), euid 0, rgid 502, jail 0

After a while, overflows on the IPv4 listener start appearing too:

Jan 29 04:49:28 pesto kernel: sonewconn: pcb 0xfffff8002c198000 (0.0.0.0:179 (proto 6)): Listen queue overflow: 13 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 502, jail 0
Jan 29 04:49:48 pesto kernel: sonewconn: pcb 0xfffff8002c198540 ([::]:179 (proto 6)): Listen queue overflow: 13 already in queue awaiting acceptance (36 occurrences), euid 0, rgid 502, jail 0
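To see how the overflows build up per listener over time, the sonewconn lines above can be tallied mechanically. This is a minimal Python sketch, assuming the message format matches the lines quoted above and assuming each message's "(N occurrences)" counts only the overflows since the previous message for that socket (not verified against the kernel source). `parse_overflow` and `summarize` are hypothetical helpers, not part of any FreeBSD tool.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Assumption: the kernel message format matches the sonewconn lines quoted
# in this report; other FreeBSD versions may word it differently.
SONEWCONN_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"sonewconn: pcb (?P<pcb>0x[0-9a-f]+) "
    r"\((?P<addr>.+?):(?P<port>\d+) \(proto (?P<proto>\d+)\)\): "
    r"Listen queue overflow: (?P<queued>\d+) already in queue awaiting acceptance "
    r"\((?P<count>\d+) occurrences\)"
)


@dataclass
class Overflow:
    stamp: str        # syslog timestamp, e.g. "Jan 29 03:51:35"
    listener: str     # listening address and port, e.g. "[::]:179"
    queued: int       # connections already waiting in the accept queue
    occurrences: int  # overflow count reported by this one message


def parse_overflow(line: str) -> Optional[Overflow]:
    """Return an Overflow for a sonewconn log line, or None for any other line."""
    m = SONEWCONN_RE.search(line)
    if m is None:
        return None
    return Overflow(
        stamp=m["stamp"],
        listener=f"{m['addr']}:{m['port']}",
        queued=int(m["queued"]),
        occurrences=int(m["count"]),
    )


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the reported overflows per listener.

    Assumes each message reports only the overflows since the previous
    message for that socket, so summing them is meaningful.
    """
    totals: Dict[str, int] = {}
    for line in lines:
        event = parse_overflow(line)
        if event is not None:
            totals[event.listener] = totals.get(event.listener, 0) + event.occurrences
    return totals
```

Feeding it the syslog file (for example, iterating over `/var/log/messages`) gives per-listener totals that can be lined up against the time of the `kern.ipc.nmbufs limit reached` message.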

When I intervene, connections to bird2 via birdc hang, as do attempts to shut it down (/usr/local/etc/rc.d/bird stop). If I try to look at the routing table (netstat -rnf inet6), it starts to list and then the system reboots itself (likely a kernel crash, but nothing is logged between the sonewconn messages and the boot). After the reboot, things are good for a few days.

My naive guess is that the netlink code leaks mbufs, and that listing the routing table with netstat in this state triggers a kernel crash. Besides switching back to the netlink-based bird2, if anyone is interested in debugging, I can run a debug kernel to catch the stack trace.
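One way to test the leak hypothesis before the limit is hit would be to sample mbuf usage periodically while the netlink-based bird2 runs and check whether it only ever climbs. This is a minimal Python sketch under two assumptions: that the first line of `netstat -m` has the `current/cache/total` form noted in the comment, and that its total column is a rough proxy for what `kern.ipc.nmbufs` caps. The helper names, sample count, interval, and growth threshold are hypothetical choices, not anything bird2 or FreeBSD provides.

```python
import re
import subprocess
import time
from typing import List, Optional, Tuple

# Assumption: the first line of FreeBSD `netstat -m` has the form
# "<current>/<cache>/<total> mbufs in use (current/cache/total)".
MBUF_LINE_RE = re.compile(r"^(\d+)/(\d+)/(\d+) mbufs in use", re.MULTILINE)


def parse_mbufs_in_use(netstat_m_output: str) -> Optional[Tuple[int, int, int]]:
    """Return (current, cache, total) from `netstat -m` output, or None."""
    m = MBUF_LINE_RE.search(netstat_m_output)
    if m is None:
        return None
    current, cache, total = (int(g) for g in m.groups())
    return current, cache, total


def fraction_of_limit(total: int, nmbufs: int) -> float:
    """Share of kern.ipc.nmbufs used, treating the total column as a rough proxy."""
    return total / nmbufs


def steadily_growing(samples: List[int], min_increase: int) -> bool:
    """True if every sample exceeds the one before and overall growth >= min_increase."""
    if len(samples) < 2:
        return False
    increasing = all(later > earlier for earlier, later in zip(samples, samples[1:]))
    return increasing and samples[-1] - samples[0] >= min_increase


def _run(*argv: str) -> str:
    """Run a command and return its stdout, raising if it fails."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def monitor(samples: int = 6, interval_s: int = 600, min_increase: int = 1000) -> None:
    """Sample mbuf usage on a FreeBSD host; the default values are arbitrary."""
    currents: List[int] = []
    for i in range(samples):
        parsed = parse_mbufs_in_use(_run("netstat", "-m"))
        if parsed is None:
            raise RuntimeError("unexpected `netstat -m` output")
        current, _cache, total = parsed
        limit = int(_run("sysctl", "-n", "kern.ipc.nmbufs"))
        currents.append(current)
        print(f"current={current} total={total} nmbufs={limit} "
              f"({fraction_of_limit(total, limit):.1%} of limit)")
        if i + 1 < samples:
            time.sleep(interval_s)
    if steadily_growing(currents, min_increase):
        print("mbuf usage rose on every sample, which is consistent with a leak")
```

Run on the router, `monitor()` prints six samples ten minutes apart. Steady growth alone would not prove a netlink leak, but comparing a run under bird2 with one under bird2-rtsock could help narrow it down.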


System details:

FreeBSD pesto.gshapiro.net 14.2-RELEASE FreeBSD 14.2-RELEASE releng/14.2-n269506-c8918d6c7412 GENERIC amd64

bird2-rtsock-2.16.1 (stable; bird2-2.16.1, which uses netlink instead of rtsock, causes the issue)

Since this is a router, it runs nothing besides routing (bird2), networking (tailscale, vxlan, wireguard), and OS daemons (syslog, ntpd for log accuracy, cron, devd, getty for the console).

The only sysctl tuning done:

net.fibs=2
net.add_addr_allfibs=1

As for system configuration, here is my rc.conf (I'm happy to share the redacted values with those working on this issue):

accounting_enable="YES"
bird_enable="YES"
cloned_interfaces="lo1 vxlan0 vxlan1"
create_args_vxlan0="vxlanid _redacted_ vxlanlocal _redacted_ vxlanremote _redacted_ vxlanttl 255 tunnelfib 1"
create_args_vxlan1="vxlanid _redacted_ vxlanlocal _redacted_ vxlanremote _redacted_ vxlanttl 255 tunnelfib 1"
defaultrouter="_redacted_"
defaultrouter_fib1="${defaultrouter}"
devd_flags="-q"
firewall_enable="YES"
firewall_logging="YES"
firewall_quiet="YES"
firewall_type="/etc/ipfw.rules"
fsck_y_enable="YES"
gateway_enable="YES"
hostname="pesto.gshapiro.net"
ifconfig_lo1_descr="Announced"
ifconfig_lo1_ipv6="inet6 _redacted_ prefer_source"
ifconfig_lo1_alias0="inet6 _redacted_"
ifconfig_vtnet0="inet _redacted_"
ifconfig_vtnet0_descr="HYEHOST Uplink"
ifconfig_vtnet0_ipv6="inet6 _redacted_ no_prefer_iface"
ifconfig_vtnet1_descr="F4IX"
ifconfig_vtnet1_ipv6="inet6 _redacted_ no_prefer_iface"
ifconfig_vxlan0="inet _redacted_"
ifconfig_vxlan0_descr="BGPExchange"
ifconfig_vxlan0_ipv6="inet6 _redacted_ no_prefer_iface"
ifconfig_vxlan1_descr="HNIX BGP Tunnel"
ifconfig_vxlan1_ipv6="inet6 _redacted_ no_prefer_iface"
ipv6_defaultrouter="_redacted_"
ipv6_defaultrouter_fib1="${ipv6_defaultrouter}"
ipv6_gateway_enable="YES"
ipv6_route_bgp6="_redacted_ -iface vtnet0"
ipv6_route_gw6="_redacted_ -iface vtnet0 -fib 0,1"
ipv6_static_routes="gw6 bgp6"
ntpd_enable="YES"
ntpd_sync_on_start="YES"
qemu_guest_agent_enable="YES"
qemu_guest_agent_flags="-d -v -l /var/log/qemu-ga.log"
sshd_enable="YES"
syslogd_flags="-ss"
tailscaled_enable="YES"
tailscaled_exitnode_enable="YES"
tailscaled_up_args="--accept-dns=false --timeout=30s"
wireguard_enable="YES"
wireguard_interfaces="wg0"

And wg0.conf:

# Route64
[Interface]
PrivateKey = _redacted_
Address = _redacted_, _redacted_
Table = off
PostUp = /sbin/ifconfig %i description "Route64 Upstream"
PostUp = /sbin/ifconfig %i inet6 auto_linklocal
PostUp = /sbin/ifconfig %i inet6 no_prefer_iface
PostUp = /sbin/ifconfig %i tunnelfib 1

[Peer]
PublicKey = _redacted_
AllowedIPs = ::/0, 0.0.0.0/0
Endpoint = _redacted_
PersistentKeepAlive = 30