Date:      Tue, 25 Mar 2014 23:11:17 +0100
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Garrett Wollman <wollman@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, Christopher Forgeron <csforgeron@gmail.com>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <1573EFCE-EFCF-4ABF-A1A5-77714B56F9F1@hostpoint.ch>
In-Reply-To: <2042344654.506796.1395783960030.JavaMail.root@uoguelph.ca>
References:  <2042344654.506796.1395783960030.JavaMail.root@uoguelph.ca>


On 25.03.2014, at 22:46, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Markus Gebert wrote:
>>
>> On 25.03.2014, at 02:18, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>>> Christopher Forgeron wrote:
>>>>
>>>> This is regarding the TSO patch that Rick suggested earlier. (With
>>>> many thanks for his time and suggestion)
>>>>
>>>> As I mentioned earlier, it did not fix the issue on a 10.0 system.
>>>> It did make it less of a problem on 9.2, but either way, I think
>>>> it's not needed, and shouldn't be considered as a patch for
>>>> testing/etc.
>>>>
>>>> Patching TSO to anything other than a max value (and by default
>>>> the code gives it IP_MAXPACKET) is confusing the matter, as the
>>>> packet length ultimately needs to be adjusted for many things on
>>>> the fly like TCP Options, etc. Using static header sizes won't be
>>>> a good idea.
>>>>
>>> If you look at tcp_output(), you'll notice that it doesn't do TSO
>>> if there are any options. That way it knows that the TCP/IP header
>>> is just hdrlen.
>>>
>>> If you don't limit the TSO packet (including TCP/IP and ethernet
>>> headers) to 64K, then the "ix" driver can't send them, which is the
>>> problem you guys are seeing.
>>>
>>> There are other ways to fix this problem, but they all may
>>> introduce issues that reducing if_hw_tsomax by a small amount does
>>> not. For example, m_defrag() could be modified to use 4K pagesize
>>> clusters, but this might introduce memory fragmentation problems.
>>> (I observed what I think are memory fragmentation problems when I
>>> switched NFS to use 4K pagesize clusters for large I/O messages.)
>>>
>>> If setting IP_MAXPACKET to 65518 fixes the problem (no more EFBIG
>>> error replies), then that is the size that if_hw_tsomax can be set
>>> to (just can't change IP_MAXPACKET, but that is defined for other
>>> things). (It just happens that IP_MAXPACKET is what if_hw_tsomax
>>> defaults to. It has no other effect w.r.t. TSO.)
>>>
>>>>
>>>> Additionally, it seems that setting nic TSO will/may be ignored by
>>>> code like this in sys/netinet/tcp_output.c:
>>>>
>>
>> Is this confirmed or still an 'it seems'? Have you actually seen a
>> tp->t_tsomax value in tcp_output() bigger than if_hw_tsomax or was
>> this just speculation because the values are stored in different
>> places? (Sorry if you already stated this in another email, it's
>> currently hard to keep track of all the information.)
>>
>> Anyway, this dtrace one-liner should be a good test if other values
>> appear in tp->t_tsomax:
>>
>> # dtrace -n 'fbt::tcp_output:entry
>>     / args[0]->t_tsomax != 0 && args[0]->t_tsomax != 65518 /
>>     { printf("unexpected tp->t_tsomax: %i\n", args[0]->t_tsomax); stack(); }'
>>
>> Remember to adjust the value in the condition to whatever you're
>> currently expecting. The value seems to be 0 for new connections,
>> probably when tcp_mss() has not been called yet. So that seems
>> normal and I have excluded that case too. This will also print a
>> kernel stack trace in case it sees an unexpected value.
>>
>>> Yes, but I don't know why.
>>> The only conjecture I can come up with is that another net driver
>>> is stacked above "ix" and the setting for if_hw_tsomax doesn't
>>> propagate up. (If you look at the commit log message for r251296,
>>> the intent of adding if_hw_tsomax was to allow device drivers to
>>> set a smaller tsomax than IP_MAXPACKET.)
>>>
>>> Are you using any of the "stacked" network device drivers like
>>> lagg? I don't even know what the others all are?
>>> Maybe someone else can list them?
>>
>> I guess the most obvious are lagg and vlan (and probably carp on
>> FreeBSD 9.x or older).
>>
>> On request from Jack, we've eliminated lagg and vlan from the
>> picture, which gives us plain ixgbe interfaces with no stacked
>> interfaces on top of them. And we can still reproduce the problem.
>>
> This was related to the "did if_hw_tsomax set tp->t_tsomax to the
> same value?" question. Since you reported that my patch that set
> if_hw_tsomax in the driver didn't fix the problem, that suggests
> that tp->t_tsomax isn't being set to if_hw_tsomax from the driver,
> but we don't know why?

Jack asked us to remove lagg/vlans at the very beginning of this thread,
and when we had done that, the problem was still there. So my answer was
not related to your recent patch. I wanted to clarify that we have been
testing with ixgbe only for quite some time and that stacked interfaces
could not be a source of problems in our test scenario.

We started testing your patch that sets if_hw_tsomax yesterday. So far
I have it running on two systems along with some printfs and the dtrace
one-liner that watches over tp->t_tsomax in tcp_output(). We haven't had
any problems with these two servers, and the dtrace probe never fired,
so for now it looks like tp->t_tsomax always gets set from if_hw_tsomax.
But it's too soon to draw a conclusion; it may take days to trigger the
problem again. It might also be fixed with your patch.

I'm booting more systems with the test kernel and I will be watching
all of them with dtrace to see if I find an occurrence where
tp->t_tsomax is off. I hope that with more systems, I'll have an answer
more quickly.

But digging around the code, I still don't see how tp->t_tsomax could
not have been set from if_hw_tsomax when there are no stacked
interfaces...


Markus


> rick
>
>>
>> Markus
>>
>>>
>>> rick
>>>>
>>>> 10.0 Code:
>>>>
>>>> if (len > tp->t_tsomax - hdrlen) { !!
>>>>         len = tp->t_tsomax - hdrlen; !!
>>>>         sendalot = 1;
>>>> }
>>>>
>>>> I've put debugging here, set the nic's max TSO as per Rick's patch
>>>> (set to say 32k), and have seen that tp->t_tsomax == IP_MAXPACKET.
>>>> It's being set someplace else, and thus our attempts to set TSO on
>>>> the nic may be in vain.
>>>>
>>>> It may have mattered more in 9.2, as I see the code doesn't use
>>>> tp->t_tsomax in some locations, and may actually default to what
>>>> the nic is set to.
>>>>
>>>> The NIC may still win, I didn't walk through the code to confirm,
>>>> it was enough to suggest to me that setting TSO wouldn't fix this
>>>> issue.
>>>>
>>>> However, this is still a TSO related issue, it's just not one
>>>> related to the setting of TSO's max size.
>>>>
>>>> A 10.0-STABLE system with tso disabled on ix0 doesn't have a
>>>> single packet over IP_MAXPACKET in 1 hour of runtime. I'll let it
>>>> go a bit longer to increase confidence in this assertion, but I
>>>> don't want to waste time on this when I could be logging problem
>>>> packets on a system with TSO enabled.
>>>>
>>>> Comments are very welcome..
>>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to
>>> "freebsd-net-unsubscribe@freebsd.org"
>>>
>>



