Date:      Mon, 01 Jul 2024 16:23:00 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 280074] em(4) temporarily hangs and re-enables TX CSUM if a bridge it's a member of is modified
Message-ID:  <bug-280074-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280074

            Bug ID: 280074
           Summary: em(4) temporarily hangs and re-enables TX CSUM if a
                    bridge it's a member of is modified
           Product: Base System
           Version: 14.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: freebsd@kumba.dev

There seems to be a possible regression in the base system em(4) driver in
FreeBSD 14.x, where, if it is in a bridge membership with other interfaces
(specifically epair(4)), removal of another interface from that membership
causes the em(4) interface to briefly become unresponsive and at least TX
checksum offloading gets turned back on.

On one of my FreeBSD appliances, an Intel NUC8i5BEH, there is a single em(4)
interface 'em0' (Intel I219-V CNP(6), devid 0x15be), and this system runs a
single jail for squid.  The em0 interface is assigned at system start to a
bridge(4) interface 'bridge0', and it has lro, tso, rxcsum, rxcsum6, txcsum,
txcsum6, and all vlan flags disabled.  The 'ifconfig em0' output looks like
this (some data is redacted for privacy):

> em0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
>         options=88<VLAN_MTU,VLAN_HWCSUM>
>         ether aa:bb:cc:dd:ee:ff
>         inet x.x.x.x netmask 0xffffff00 broadcast x.x.x.255
>         inet6 fe80::xxxx:xxxx:xxxx:xxxx%em0 prefixlen 64 scopeid 0x1
>         media: Ethernet autoselect (1000baseT <full-duplex>)
>         status: active
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

If, from the above initial state, I start the squid jail manually, everything
works fine, and the SSH connection remains responsive while the jail init
routines are run, including adding the jail's epair(4) interface, 'epair0a',
to the bridge0 membership.

It is when shutting that jail down and removing epair0a from the bridge0
membership that I notice that the SSH connection will become unresponsive for
~10s or more.  Most of the time, it will recover, but sometimes, the
connection will get dropped and I have to reconnect.  I also notice that ONLY
txcsum and txcsum6 get re-enabled on the em0 interface, even though ONLY the
epair0a interface gets removed from the bridge membership.  Here's what
'ifconfig em0' looks like after the fact:

> em0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
>         options=4c00022<TXCSUM,JUMBO_MTU,TXCSUM_IPV6,HWSTATS,MEXTPG>
>         ether aa:bb:cc:dd:ee:ff
>         inet x.x.x.x netmask 0xffffff00 broadcast x.x.x.255
>         media: Ethernet autoselect (1000baseT <full-duplex>)
>         status: active
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

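
The two 'options=' lines above can be compared mechanically rather than by
eye.  The following is only a minimal sketch: the function names if_caps and
caps_diff are made up for illustration, and it assumes the 'options=' line
always has the shape shown in the ifconfig output in this report.

```shell
#!/bin/sh
# Hypothetical helpers (not part of FreeBSD): compare the capability lists
# found in two saved copies of 'ifconfig <if>' output.

# Read ifconfig output on stdin and print the names from the "options=" line,
# one per line, sorted.  The "nd6 options=" line is skipped because the
# pattern only allows whitespace before "options=".
if_caps() {
    sed -n 's/^[[:space:]]*options=[0-9a-fA-F]*<\(.*\)>.*$/\1/p' |
        tr ',' '\n' | LC_ALL=C sort
}

# caps_diff BEFORE_TEXT AFTER_TEXT
# Print "+NAME" for capabilities only in AFTER_TEXT and "-NAME" for
# capabilities only in BEFORE_TEXT.
caps_diff() {
    before_caps=$(printf '%s\n' "$1" | if_caps)
    after_caps=$(printf '%s\n' "$2" | if_caps)
    for cap in $after_caps; do
        printf '%s\n' "$before_caps" | grep -qxF -- "$cap" || echo "+$cap"
    done
    for cap in $before_caps; do
        printf '%s\n' "$after_caps" | grep -qxF -- "$cap" || echo "-$cap"
    done
}

# Example use on a live system (assumes em0 exists):
#   before=$(ifconfig em0)
#   ... modify the bridge ...
#   after=$(ifconfig em0)
#   caps_diff "$before" "$after"
```

Fed the two outputs in this report, this would list HWSTATS, JUMBO_MTU,
MEXTPG, TXCSUM, and TXCSUM_IPV6 as added, and VLAN_HWCSUM and VLAN_MTU as
removed.
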
I have manually confirmed that the interface becomes unresponsive and
txcsum/txcsum6 are re-enabled when issuing this command:
> ifconfig bridge0 deletem epair0a
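
For anyone trying to reproduce this, the manual check can be wrapped in a
small script.  This is a sketch under assumptions, not a tested reproducer:
the function names and the IFCONFIG override variable are hypothetical, the
interface names are the ones from this report, and it should only be run as
root on a test system.

```shell
#!/bin/sh
# Hypothetical sketch: snapshot the "options=" line of a physical NIC, remove
# one member from a bridge with 'ifconfig <bridge> deletem <member>', then
# snapshot again and report whether the NIC's options changed.
# IFCONFIG is an assumed override hook so the command can be stubbed out.
IFCONFIG=${IFCONFIG:-ifconfig}

# Print the capability line (not the "nd6 options=" line) of interface $1.
options_line() {
    "$IFCONFIG" "$1" | sed -n 's/^[[:space:]]*\(options=.*\)$/\1/p'
}

# check_bridge_removal PHYS BRIDGE MEMBER
# Returns 0 if PHYS kept its options, 1 if they changed, and 2 if the options
# could not be read or the deletem command failed.
check_bridge_removal() {
    phys=$1 bridge=$2 member=$3
    before=$(options_line "$phys")
    [ -n "$before" ] || return 2
    "$IFCONFIG" "$bridge" deletem "$member" || return 2
    after=$(options_line "$phys")
    [ -n "$after" ] || return 2
    if [ "$before" = "$after" ]; then
        echo "unchanged: $after"
        return 0
    fi
    echo "before: $before"
    echo "after: $after"
    return 1
}

# Example (names taken from this report):
#   check_bridge_removal em0 bridge0 epair0a
```

The script only compares the capability flags; the ~10s of unresponsiveness
would still have to be observed separately, for example from an SSH session
or a running ping.
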
I'm not sure why modifying the bridge and removing an epair(4) member affects
the em(4) member.  I think it may be doing a short HW reset of the interface
or something, and that's why TX checksum offloading gets turned back on.  I am
not sure why the other flags flip state as well, especially JUMBO_MTU (system
MTU is set to 9000 at boot, so ::shrug::).

I also compiled net/intel-em-kmod from ports and rebooted the system so it'd
load that driver instead of the base system em(4) driver.  Using that module,
I can add and remove the jail's epair(4) interface without consequence from
bridge0.  SSH remains responsive in both cases, and checksum offloading stays
disabled.  So it seems like it's some kind of an issue with the base system
driver only.  That driver uses iflib(4) now, while the out-of-tree driver
doesn't, so it could be something related to iflib causing the problem?

I don't know how long this issue may have existed.  Prior to FreeBSD
14.0-RELEASE, I used net/intel-em-kmod almost exclusively on this system, and
only switched to the base system driver when upgrading to 14.0, since that
driver will be more up-to-date now that Intel itself no longer develops the
out-of-tree driver.  This is when I started noticing this issue on this
system.  I also noticed the issue on another appliance I have that used the
em(4) driver and ran a jail, but had an Intel 82583V chipset instead, so I
don't think it's a hardware erratum issue w/ the I219-V.  That other system
has since been upgraded to something that uses igb(4) now, and no longer has
this problem.

In any event, I am going to look at replacing this NUC appliance with
something newer (it's almost 5 years old), and which uses better GbE hardware.
If someone can figure out possible fixes to em(4) before then, I can try them
out, but no promises once I update the affected hardware to something else.

-- 
You are receiving this mail because:
You are the assignee for the bug.
