Date:      Tue, 25 Mar 2014 13:16:14 +0100
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Christopher Forgeron <csforgeron@gmail.com>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <906D7DF8-DD6E-4501-B3ED-42EF728241F4@hostpoint.ch>
In-Reply-To: <1973302314.2516695.1395710288222.JavaMail.root@uoguelph.ca>
References:  <1973302314.2516695.1395710288222.JavaMail.root@uoguelph.ca>

On 25.03.2014, at 02:18, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Christopher Forgeron wrote:
>>
>> This is regarding the TSO patch that Rick suggested earlier. (With
>> many thanks for his time and suggestion)
>>
>> As I mentioned earlier, it did not fix the issue on a 10.0 system. It
>> did make it less of a problem on 9.2, but either way, I think it's
>> not needed, and shouldn't be considered as a patch for testing/etc.
>>
>> Patching TSO to anything other than a max value (and by default the
>> code gives it IP_MAXPACKET) is confusing the matter, as the packet
>> length ultimately needs to be adjusted for many things on the fly
>> like TCP Options, etc. Using static header sizes won't be a good
>> idea.
>>
> If you look at tcp_output(), you'll notice that it doesn't do TSO if
> there are any options. That way it knows that the TCP/IP header is
> just hdrlen.
>
> If you don't limit the TSO packet (including TCP/IP and ethernet headers)
> to 64K, then the "ix" driver can't send them, which is the problem
> you guys are seeing.
>
> There are other ways to fix this problem, but they all may introduce
> issues that reducing if_hw_tsomax by a small amount does not.
> For example, m_defrag() could be modified to use 4K pagesize clusters,
> but this might introduce memory fragmentation problems. (I observed
> what I think are memory fragmentation problems when I switched NFS
> to use 4K pagesize clusters for large I/O messages.)
>
> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
> error replies), then that is the size that if_hw_tsomax can be set
> to (just can't change IP_MAXPACKET, but that is defined for other
> things). (It just happens that IP_MAXPACKET is what if_hw_tsomax
> defaults to. It has no other effect w.r.t. TSO.)
>
>>
>> Additionally, it seems that setting nic TSO will/may be ignored by
>> code like this in sys/netinet/tcp_output.c:
>>

Is this confirmed or still an 'it seems'? Have you actually seen a
tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax, or was this
just speculation because the values are stored in different places?
(Sorry if you already stated this in another email; it's currently hard
to keep track of all the information.)
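
To put numbers on why a tp->t_tsomax above if_hw_tsomax would matter,
here is a small user-space C sketch of the clamp from the tcp_output()
excerpt quoted further down, combined with the 64K limit Rick describes.
The 14-byte Ethernet header, the 4-byte VLAN tag, the function names and
the choice to treat 64K as an inclusive limit are all assumptions made
for illustration; none of this is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, not taken from the thread: an untagged Ethernet
 * header plus one 802.1Q VLAN tag. */
#define ETHER_HDR_BYTES	14u
#define VLAN_TAG_BYTES	 4u
#define FRAME_LIMIT	65536u	/* "64K", assumed to be an inclusive limit */

/* Same clamp as the tcp_output() excerpt: cap the payload so that
 * payload + TCP/IP header does not exceed tsomax. */
static uint32_t
clamp_tso_len(uint32_t len, uint32_t tsomax, uint32_t hdrlen)
{
	assert(tsomax >= hdrlen);
	if (len > tsomax - hdrlen)
		len = tsomax - hdrlen;
	return (len);
}

/* Bytes handed to the driver: payload + TCP/IP header + link header. */
static uint32_t
tso_frame_bytes(uint32_t len, uint32_t hdrlen)
{
	return (len + hdrlen + ETHER_HDR_BYTES + VLAN_TAG_BYTES);
}

/* Nonzero if the largest send allowed by tsomax still fits the limit. */
static int
tsomax_fits(uint32_t tsomax, uint32_t hdrlen)
{
	uint32_t len = clamp_tso_len(UINT32_MAX, tsomax, hdrlen);

	return (tso_frame_bytes(len, hdrlen) <= FRAME_LIMIT);
}
```

Under these assumptions and a 40-byte hdrlen (no TCP options),
tsomax_fits(65535, 40) is 0 because the frame would be 65553 bytes,
while tsomax_fits(65518, 40) is 1 because the frame is exactly 65536
bytes.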

Anyway, this dtrace one-liner should be a good test of whether other
values appear in tp->t_tsomax:

# dtrace -n 'fbt::tcp_output:entry / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 / { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'

Remember to adjust the value in the condition to whatever you're
currently expecting. The value seems to be 0 for new connections,
probably when tcp_mss() has not been called yet. That seems normal, so
I have excluded that case too. The one-liner also prints a kernel stack
trace whenever it sees an unexpected value.
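
For anyone who would rather put the same check into their own debugging
code, the predicate can be written in plain C roughly as below. This is
only a sketch: struct tcpcb_model and both function names are made up
for illustration and stand in for the real struct tcpcb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one struct tcpcb field used here. */
struct tcpcb_model {
	uint32_t t_tsomax;
};

/* Mirrors the DTrace predicate: 0 is treated as "not set yet" and the
 * expected value is treated as normal; anything else is flagged. */
static int
tsomax_unexpected(const struct tcpcb_model *tp, uint32_t expected)
{
	assert(tp != NULL);
	return (tp->t_tsomax != 0 && tp->t_tsomax != expected);
}

/* Counts how many of n sampled connections carry an unexpected value. */
static size_t
count_unexpected(const struct tcpcb_model *tps, size_t n, uint32_t expected)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (tsomax_unexpected(&tps[i], expected))
			hits++;
	return (hits);
}
```

As with the one-liner, the expected value passed in has to match
whatever if_hw_tsomax is currently set to.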


> Yes, but I don't know why.
> The only conjecture I can come up with is that another net driver is
> stacked above "ix" and the setting for if_hw_tsomax doesn't propagate
> up. (If you look at the commit log message for r251296, the intent
> of adding if_hw_tsomax was to allow device drivers to set a smaller
> tsomax than IP_MAXPACKET.)
>
> Are you using any of the "stacked" network device drivers like
> lagg? I don't even know what the others all are?
> Maybe someone else can list them?

I guess the most obvious are lagg and vlan (and probably carp on FreeBSD
9.x or older).

At Jack's request, we've eliminated lagg and vlan from the picture,
which leaves plain ixgbe interfaces with nothing stacked on top of
them. And we can still reproduce the problem.
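
For reference, Rick's propagation conjecture can be modelled as in the
C sketch below. It is a hypothetical user-space model, not the kernel's
struct ifnet: the struct, its fields and both functions are invented
for illustration. It shows how a stacked interface left at the default
would hide a smaller limit set by the driver underneath, which cannot
explain a setup where nothing is stacked on the ixgbe interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_TSOMAX	65535u	/* IP_MAXPACKET, the default per the thread */

/* Hypothetical model of an interface; not the kernel's struct ifnet. */
struct ifnet_model {
	uint32_t hw_tsomax;			/* limit this layer advertises */
	const struct ifnet_model *lower;	/* interface underneath, or NULL */
};

/* What TCP would see if it only consulted the top interface. */
static uint32_t
tsomax_top_only(const struct ifnet_model *ifp)
{
	assert(ifp != NULL);
	return (ifp->hw_tsomax);
}

/* What TCP would see if every layer's limit were honored. */
static uint32_t
tsomax_propagated(const struct ifnet_model *ifp)
{
	uint32_t lowest;

	assert(ifp != NULL);
	lowest = ifp->hw_tsomax;
	for (ifp = ifp->lower; ifp != NULL; ifp = ifp->lower)
		if (ifp->hw_tsomax < lowest)
			lowest = ifp->hw_tsomax;
	return (lowest);
}
```

With an ix model set to 65518 underneath a lagg model left at
DEFAULT_TSOMAX, tsomax_top_only() on the lagg returns 65535 while
tsomax_propagated() returns 65518.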


Markus


>
> rick
>>
>> 10.0 Code:
>>
>> if (len > tp->t_tsomax - hdrlen) {      /* !! */
>>         len = tp->t_tsomax - hdrlen;    /* !! */
>>         sendalot = 1;
>> }
>>
>> I've put debugging here, set the nic's max TSO as per Rick's patch
>> (set to, say, 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
>> It's being set someplace else, and thus our attempts to set TSO on
>> the nic may be in vain.
>>
>> It may have mattered more in 9.2, as I see the code doesn't use
>> tp->t_tsomax in some locations, and may actually default to what the
>> nic is set to.
>>
>> The NIC may still win, I didn't walk through the code to confirm, it
>> was enough to suggest to me that setting TSO wouldn't fix this
>> issue.
>>
>> However, this is still a TSO related issue, it's just not one related
>> to the setting of TSO's max size.
>>
>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a single
>> packet over IP_MAXPACKET in 1 hour of runtime. I'll let it go a bit
>> longer to increase confidence in this assertion, but I don't want to
>> waste time on this when I could be logging problem packets on a
>> system with TSO enabled.
>>
>> Comments are very welcome..
>>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>



