Date:      Sat, 23 Jan 2016 16:34:46 -0200
From:      "Marcus Cenzatti" <cenzatti@hush.com>
To:        "Luigi Rizzo" <rizzo@iet.unipi.it>, freebsd-net@freebsd.org
Subject:   Re: Chelsio T520-SO-CR low performance (netmap tested) for RX
Message-ID:  <20160123183447.4B150A0126@smtp.hushmail.com>
In-Reply-To: <CA+hQ2+g4qzCZFsQ9meQh8uWacWSOz4RDDtw0AnPzJ4+E5-9Ymg@mail.gmail.com>
References:  <20160123053428.2091EA0121@smtp.hushmail.com> <20160123154052.GA4574@ox> <20160123171300.0F448A0121@smtp.hushmail.com> <CA+hQ2+g4kU4LA4PexRPBv7z49ZWh-mDqdpw18SeoYaBueHyjZg@mail.gmail.com> <20160123174840.32B1DA0121@smtp.hushmail.com> <CA+hQ2+g4qzCZFsQ9meQh8uWacWSOz4RDDtw0AnPzJ4+E5-9Ymg@mail.gmail.com>



On 1/23/2016 at 4:00 PM, "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
>
>On Sat, Jan 23, 2016 at 9:48 AM, Marcus Cenzatti <cenzatti@hush.com> wrote:
>>
>>
>> On 1/23/2016 at 3:35 PM, "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
>>>
>>>On Sat, Jan 23, 2016 at 9:12 AM, Marcus Cenzatti <cenzatti@hush.com> wrote:
>>>>
>>>>
>>>> On 1/23/2016 at 1:40 PM, "Navdeep Parhar" <nparhar@gmail.com> wrote:
>>>>>
>>>>>On Sat, Jan 23, 2016 at 03:34:27AM -0200, Marcus Cenzatti wrote:
>>>>>> hello,
>>>>>>
>>>>>> I am testing a chelsio t520-so-cr connected to a Intel card with ix(4)
>>>>>> driver, I can get the ncxl0 interface to transmit at 14Mpps to another
>>>>>> chelsio or to a Intel card. However I can only get 800Kpps-1Mpps for
>>>>>> RX tests from both chelsio or Intel.
>>>>>>
>>>>>> I have test with both FreeBSD 11 and FreeBSD 10.3-PRERELEASE.
>>>>>>
>>>>>> I tested it untuned first and later I have applied tuning
>>>>>> recommendations I found on BSDRP[1] website. Results still ranging
>>>>>> from 800Kpps to 1Mpps for RX.
>>>>>>
>>>>>> Tests are done w/ with pkt-gen in netmap mode on ncxl interface with
>>>>>> both IP address and MAC address source/dest.
>>>>>
>>>>>The ncxl interfaces have their own MAC addresses.  Make sure the sender
>>>>>uses the MAC of the receiver's ncxl interface as the destination MAC.
>>>>>(netmap's pkt-gen -f tx transmits L2 broadcasts by default).
>>>>>
>>>>>Check for PAUSE frames coming out of the receiver (sysctl dev.cxl | grep
>>>>>tx_pause).  If it's receiving frames on netmap interface the tx_pause
>>>>>counter should not move.
>>>>>
>>>>>Regards,
>>>>>Navdeep
>>>>>
>>>>
>>>> hello,
>>>>
>>>> yes, MAC addresses are correct, I did the tests again and tx_pause won't move, here is the full transcript for the tests:
>>>>
>>>> ===> BOX #1 CHELSIO
>>>>
>>>> chelsio# ifconfig -v ncxl0
>>>> ncxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>         ether 00:07:43:33:8d:c1
>>>>         inet 10.1.1.2 netmask 0xffffff00 broadcast 10.1.1.255
>>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>         media: Ethernet 10Gbase-SR <full-duplex>
>>>>         status: active
>>>>
>>>> chelsio# ifconfig -v cxl0
>>>> cxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>         options=ec00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>>>         ether 00:07:43:33:8d:c0
>>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>         media: Ethernet 10Gbase-SR <full-duplex>
>>>>         status: active
>>>>         plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
>>>>         vendor: FINISAR CORP. PN: FTLX8571D3BCL-FC SN: AL1073K DATE: 2011-06-28
>>>>         module temperature: 42.79 C Voltage: 3.23 Volts
>>>>         RX: 0.53 mW (-2.74 dBm) TX: 0.48 mW (-3.12 dBm)
>>>>
>>>> chelsio# ./pkt-gen -i ncxl0 -f rx -d 00:07:43:33:8d:c1 -s 00:07:e9:44:d2:ba
>>>> 311.132189 main [1715] interface is ncxl0
>>>> 311.132447 extract_ip_range [291] range is 0.0.0.0:90 to 0.0.0.0:90
>>>> 311.132472 extract_ip_range [291] range is 0.0.0.0:7 to 0.0.0.0:7
>>>
>>>
>>>wait, the lower case -s and -d are for IP addresses,
>>>you need to use -S and -D for the MAC addresses.
>>>This way you are sending broadcasts, which likely
>>>means that the chelsio is replicating the packets
>>>to both the netmap and the regular port and the
>>>latter (which perhaps comes first) is likely
>>>dropping packets.
>>>
>>>cheers
>>>luigi
>>
>> woops, my bad, yes probably we had some drop, with -S and -D now I get 1.2Mpps.
>>
>> curiously, I have always used -s/-d with IP addresses on ix-ix testing this is why I never noticed the case, since ix always received 14Mpps, but you probably explained it since ix has one single deviceport per wire, hence the different behavior
>>
>> performance stills very low when compared to TX and to what is expected
>
>ok so next we can try and see what else is going on.
>please check the following:
>a) are you connected through a switch ? if so, try to send
>  out some packets through the ncxl0 port (using pkt-gen
>  and its native MAC address) so the switch can learn the
>  address and does not need to replicate traffic on all
>  ports (which generally is done at a limited rate).
>b) see if using different packet sizes (say 256, 512, 1024, 1500
>  passed as the -l option to pkt-gen) affects the rx rate.
>  If the rate does not change (except for 1500 bytes)
>  it may be a problem with interrupt moderation
>
>c) use progressively increasing packet rates on the sender,
>  using -R xxxx (start at 500000 packets per second,
>  and then go up) until the receiver cannot sustain the
>  tx rate.
>
>d) use a smaller batch size on the receiver (-b XXX, use
>  values such as 2, 4, 8, 16...) and see if things improve.
>  Smaller batch sizes make pkt-gen check the NIC more often
>  thus overcoming possible problems with interrupt moderation.
>
>Let us know the outcome. Depending on what you see there
>are several possible explanations.
>

Ok, revisiting the summary:
- TX host = Intel ix (host 1)
- RX host = Chelsio T520 (host 2)
- Simple topology: host1 and host2 are directly connected, Intel port 0 (ix0) to Chelsio port 0 (ncxl0).

Test results:

=> Batch 1: packet length

TX at 256 bytes = 4.46Mpps/TX and 889Kpps/RX
TX at 512 bytes = 2.33Mpps/TX and 888Kpps/RX, 9.3Gbps on TX side according to pkt-gen
TX at 1024 bytes = 1.19Mpps/TX and 889Kpps/RX, 9.3Gbps on TX
TX at 1500 bytes = 816Kpps/TX and 816Kpps/RX, 9.8Gbps on TX
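
To put these numbers in context, here is a rough sketch of the theoretical 10GbE line rate for each packet length. It assumes the pkt-gen length excludes the 4-byte CRC, so each frame costs 24 extra bytes on the wire (CRC + preamble + inter-frame gap); the function name and constants are mine, for illustration only.

```python
# Per-frame wire overhead beyond the pkt-gen length (assumption):
# 4 bytes CRC + 8 bytes preamble/SFD + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 24
LINK_BPS = 10_000_000_000  # 10GbE


def line_rate_pps(pkt_len, link_bps=LINK_BPS):
    """Maximum packets per second the wire can carry at this length."""
    return link_bps / ((pkt_len + WIRE_OVERHEAD_BYTES) * 8)


# (packet length in bytes, measured RX in pps) from the results above
measured_rx = [(256, 889_000), (1024, 889_000), (1500, 816_000)]

for pkt_len, rx_pps in measured_rx:
    limit = line_rate_pps(pkt_len)
    print(f"{pkt_len:5d} bytes: line rate {limit / 1e6:5.2f} Mpps, "
          f"RX {rx_pps / 1e3:.0f} Kpps "
          f"({100 * rx_pps / limit:.0f}% of line rate)")
```

Under that assumption the TX figures sit at line rate for every size, while RX stays near 889Kpps until the wire itself becomes the limit at 1500 bytes. That looks like a per-packet ceiling on the receiver rather than a bandwidth ceiling.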

=> Batch 2: rates

-R 500000 / TX Speed: 499.99 Kpps Bandwidth: 240.00 Mbps (raw 336.00 Mbps) / RX 499Kpps
-R 700000 / TX Speed: 699.96 Kpps Bandwidth: 335.98 Mbps (raw 470.38 Mbps) / RX 699Kpps
-R 900000 / TX Speed: 899.98 Kpps Bandwidth: 431.99 Mbps (raw 604.78 Mbps) / RX 888Kpps

Reached the same limit as in batch #1.
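
As a sanity check on the bandwidth figures pkt-gen printed, this small sketch converts pps to Mbps. It assumes the default pkt-gen packet length of 60 bytes (no -l was given) and 24 bytes of wire overhead per frame for the "raw" number; both are my assumptions, but they reproduce the values above to within rounding.

```python
PKT_LEN = 60        # assumed pkt-gen default length in bytes (no -l given)
WIRE_OVERHEAD = 24  # assumed CRC + preamble + inter-frame gap, in bytes


def bandwidth_mbps(pps, pkt_len=PKT_LEN):
    """Bandwidth counting only the packet bytes."""
    return pps * pkt_len * 8 / 1e6


def raw_bandwidth_mbps(pps, pkt_len=PKT_LEN):
    """Bandwidth counting the packet plus per-frame wire overhead."""
    return pps * (pkt_len + WIRE_OVERHEAD) * 8 / 1e6


# TX rates reported by pkt-gen for -R 500000, 700000 and 900000
for pps in (499_990, 699_960, 899_980):
    print(f"{pps / 1e3:7.2f} Kpps -> {bandwidth_mbps(pps):7.2f} Mbps "
          f"(raw {raw_bandwidth_mbps(pps):7.2f} Mbps)")
```

At 60 bytes the wire can carry about 14.88Mpps, so the roughly 889Kpps RX limit is only about 6% of line rate.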

=> Batch 3: RX batch sizes, default pkt-gen packet length and fixed 900000 rate

-r 2 / TX 899.98Kpps / RX 672Kpps
-r 4 / TX 899.98Kpps / RX 713Kpps
-r 8 / TX 899.98Kpps / RX 889Kpps
-r 16 / TX 899.98Kpps / RX 889Kpps

Results make sense for rates below the max, but smaller batch sizes did not improve RX... they only degraded it.
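
For anyone who wants to repeat the sweep, here is a minimal sketch that only prints the pkt-gen command lines for the three batches (packet length, rate, RX batch size). It follows the -l, -R, -b, -S and -D options mentioned earlier in this thread. The interface names and MAC addresses are copied from the transcript above, and I am assuming 00:07:e9:44:d2:ba is the sender's ix0 MAC; the helper names are hypothetical.

```python
import shlex

PKT_GEN = "./pkt-gen"
TX_IF, RX_IF = "ix0", "ncxl0"
DST_MAC = "00:07:43:33:8d:c1"  # ncxl0 MAC from the ifconfig output
SRC_MAC = "00:07:e9:44:d2:ba"  # assumed to be the sender's ix0 MAC


def tx_cmd(pkt_len=None, rate=None):
    """Sender command line; -S/-D set the source/destination MAC."""
    cmd = [PKT_GEN, "-i", TX_IF, "-f", "tx", "-S", SRC_MAC, "-D", DST_MAC]
    if pkt_len is not None:
        cmd += ["-l", str(pkt_len)]
    if rate is not None:
        cmd += ["-R", str(rate)]
    return shlex.join(cmd)


def rx_cmd(burst=None):
    """Receiver command line; -b sets the batch (burst) size."""
    cmd = [PKT_GEN, "-i", RX_IF, "-f", "rx"]
    if burst is not None:
        cmd += ["-b", str(burst)]
    return shlex.join(cmd)


# Batch 1: packet lengths; Batch 2: rates; Batch 3: RX batch sizes
for pkt_len in (256, 512, 1024, 1500):
    print(tx_cmd(pkt_len=pkt_len))
for rate in (500_000, 700_000, 900_000):
    print(tx_cmd(rate=rate))
for burst in (2, 4, 8, 16):
    print(rx_cmd(burst=burst), "#", tx_cmd(rate=900_000))
```

Nothing is executed here; the script just builds the strings so each run can be pasted on the right host.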
