Date:      Tue, 16 Apr 2019 18:26:28 +0300
From:      Slava Shwartsman <slavash@FreeBSD.org>
To:        "Andrey V. Elsukov" <bu7cher@yandex.ru>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, slavash@mellanox.com
Subject:   Re: svn commit: r341586 - head/sys/dev/mlx5/mlx5_en
Message-ID:  <a5ad5cb6-e1ae-2603-87ca-1d48b753e067@FreeBSD.org>
In-Reply-To: <a1ff0879-abae-25c5-9350-809186d2cf85@yandex.ru>
References:  <201812051425.wB5EP38T004562@repo.freebsd.org> <a1ff0879-abae-25c5-9350-809186d2cf85@yandex.ru>



On 16-Apr-19 17:39, Andrey V. Elsukov wrote:
> On 05.12.2018 17:25, Slava Shwartsman wrote:
>> Author: slavash
>> Date: Wed Dec  5 14:25:03 2018
>> New Revision: 341586
>> URL: https://svnweb.freebsd.org/changeset/base/341586
>>
>> Log:
>>    mlx5en: Implement backpressure indication.
>>    
>>    The backpressure indication is implemented using an unlimited rate type of
>>    mbuf send tag. When the upper layers, typically the socket layer, have obtained such
>>    a tag, they can then query the destination driver queue for the current
>>    amount of space available in the send queue.
>>    
>>    A single mbuf send tag may be referenced multiple times and a refcount has been added
>>    to the mlx5e_priv structure to track its usage. Because the send tag resides
>>    in the mlx5e_channel structure, there is no need to wait for refcounts to reach
>>    zero until the mlx5en(4) driver is detached. The channels structure is persistent
>>    during the lifetime of the mlx5en(4) driver it belongs to and can therefore be accessed
>>    without any need for synchronization.
>>    
>>    The mlx5e_snd_tag structure was extended to contain a type field, because there are now
>>    two different tag types that end up in the driver and need to be distinguished.
>>    
>>    Submitted by:   hselasky@
>>    Approved by:    hselasky (mentor)
>>    MFC after:      1 week
>>    Sponsored by:   Mellanox Technologies
>> @@ -587,27 +609,33 @@ mlx5e_xmit(struct ifnet *ifp, struct mbuf *mb)
>>   	struct mlx5e_sq *sq;
>>   	int ret;
>>   
>> -	sq = mlx5e_select_queue(ifp, mb);
>> -	if (unlikely(sq == NULL)) {
>> -#ifdef RATELIMIT
>> -		/* Check for route change */
>> -		if (mb->m_pkthdr.snd_tag != NULL &&
>> -		    mb->m_pkthdr.snd_tag->ifp != ifp) {
>> +	if (mb->m_pkthdr.snd_tag != NULL) {
>> +		sq = mlx5e_select_queue_by_send_tag(ifp, mb);
>> +		if (unlikely(sq == NULL)) {
>> +			/* Check for route change */
>> +			if (mb->m_pkthdr.snd_tag->ifp != ifp) {
>> +				/* Free mbuf */
>> +				m_freem(mb);
>> +
>> +				/*
>> +				 * Tell upper layers about route
>> +				 * change and to re-transmit this
>> +				 * packet:
>> +				 */
>> +				return (EAGAIN);
>> +			}
> 
> Hi,
> 
> I just discovered something strange and found that this commit is the
> cause.
> The test system has an mlx5en 100G interface. It has two vlans: vlan500 and
> vlan100.
> Via vlan500 it receives some packet flows. Then it routes these packets
> into vlan100.
> But packets are dropped in mlx5e_xmit() with EAGAIN error code.
> 
> # dtrace -n 'fbt::ip6_output:return {printf("%d", arg1);}'
> dtrace: description 'fbt::ip6_output:return ' matched 1 probe
> CPU     ID                    FUNCTION:NAME
>   23  54338                ip6_output:return 35
>   16  54338                ip6_output:return 35
>   21  54338                ip6_output:return 35
>   22  54338                ip6_output:return 35
>   24  54338                ip6_output:return 35
>   23  54338                ip6_output:return 35
>   14  54338                ip6_output:return 35
> ^C
> 
> # dtrace -n 'fbt::mlx5e_xmit:return {printf("%d", arg1);}'
> dtrace: description 'fbt::mlx5e_xmit:return ' matched 1 probe
> CPU     ID                    FUNCTION:NAME
>   16  69030                mlx5e_xmit:return 35
>   23  69030                mlx5e_xmit:return 35
>   26  69030                mlx5e_xmit:return 35
>   25  69030                mlx5e_xmit:return 35
>   24  69030                mlx5e_xmit:return 35
>   21  69030                mlx5e_xmit:return 35
>   26  69030                mlx5e_xmit:return 35
> ^C
> 
> The kernel config is GENERIC.
> 13.0-CURRENT #9 r345758+82f3d57(svn_head)-dirty
> 
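
For reference, the EAGAIN seen in the traces matches the route-change branch in the quoted hunk. Below is a minimal user-space sketch of that one branch only. The types model_ifnet, model_snd_tag and model_mbuf and the helper model_route_change_check() are hypothetical stand-ins, not the real kernel definitions, and the rest of mlx5e_xmit() is not modeled because the hunk does not show it. On FreeBSD, EAGAIN is errno 35, which is the value in the dtrace output above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ifnet, struct m_snd_tag and struct mbuf. */
struct model_ifnet { int index; };
struct model_snd_tag { struct model_ifnet *ifp; };
struct model_mbuf { struct model_snd_tag *snd_tag; };

/*
 * Models only the branch visible in the quoted diff:
 * a send tag is attached, no send queue was found for it
 * (sq_found == 0), and the tag belongs to a different interface.
 * Returns EAGAIN in that case; returns 0 for every other case,
 * which here just means "not modeled", not "transmit succeeded".
 */
static int
model_route_change_check(const struct model_ifnet *ifp,
    const struct model_mbuf *mb, int sq_found)
{
	if (mb->snd_tag != NULL && !sq_found) {
		/* Check for route change */
		if (mb->snd_tag->ifp != ifp) {
			/* The driver frees the mbuf here (m_freem). */
			return (EAGAIN);
		}
	}
	return (0);
}
```

Calling model_route_change_check() with a tag whose ifp differs from the transmitting interface and sq_found set to 0 returns EAGAIN. With the same interface, or with no tag attached, it returns 0.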

Hi Andrey,

Thanks for letting us know about this regression.
I would like to try to reproduce this issue in house.

Can you please share the exact steps to reproduce it?
- Can I reproduce the issue with a back-to-back (B2B) setup?
- Which route command did you use to set up routing between the VLANs?
- Which application are you using to generate the traffic?


Slava


