Date: Sat, 23 Jan 2016 16:34:46 -0200
From: "Marcus Cenzatti" <cenzatti@hush.com>
To: "Luigi Rizzo" <rizzo@iet.unipi.it>, freebsd-net@freebsd.org
Subject: Re: Chelsio T520-SO-CR low performance (netmap tested) for RX
Message-ID: <20160123183447.4B150A0126@smtp.hushmail.com>
In-Reply-To: <CA+hQ2+g4qzCZFsQ9meQh8uWacWSOz4RDDtw0AnPzJ4+E5-9Ymg@mail.gmail.com>
References: <20160123053428.2091EA0121@smtp.hushmail.com> <20160123154052.GA4574@ox> <20160123171300.0F448A0121@smtp.hushmail.com> <CA+hQ2+g4kU4LA4PexRPBv7z49ZWh-mDqdpw18SeoYaBueHyjZg@mail.gmail.com> <20160123174840.32B1DA0121@smtp.hushmail.com> <CA+hQ2+g4qzCZFsQ9meQh8uWacWSOz4RDDtw0AnPzJ4+E5-9Ymg@mail.gmail.com>
On 1/23/2016 at 4:00 PM, "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
>
>On Sat, Jan 23, 2016 at 9:48 AM, Marcus Cenzatti <cenzatti@hush.com> wrote:
>>
>>
>> On 1/23/2016 at 3:35 PM, "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
>>>
>>>On Sat, Jan 23, 2016 at 9:12 AM, Marcus Cenzatti <cenzatti@hush.com> wrote:
>>>>
>>>>
>>>> On 1/23/2016 at 1:40 PM, "Navdeep Parhar" <nparhar@gmail.com> wrote:
>>>>>
>>>>>On Sat, Jan 23, 2016 at 03:34:27AM -0200, Marcus Cenzatti wrote:
>>>>>> hello,
>>>>>>
>>>>>> I am testing a chelsio t520-so-cr connected to a Intel card with ix(4)
>>>>>> driver, I can get the ncxl0 interface to transmit at 14Mpps to another
>>>>>> chelsio or to a Intel card. However I can only get 800Kpps-1Mpps for
>>>>>> RX tests from both chelsio or Intel.
>>>>>>
>>>>>> I have test with both FreeBSD 11 and FreeBSD 10.3-PRERELEASE.
>>>>>>
>>>>>> I tested it untuned first and later I have applied tuning
>>>>>> recommendations I found on BSDRP[1] website. Results still ranging
>>>>>> from 800Kpps to 1Mpps for RX.
>>>>>>
>>>>>> Tests are done w/ with pkt-gen in netmap mode on ncxl interface with
>>>>>> both IP address and MAC address source/dest.
>>>>>
>>>>>The ncxl interfaces have their own MAC addresses. Make sure the sender
>>>>>uses the MAC of the receiver's ncxl interface as the destination MAC.
>>>>>(netmap's pkt-gen -f tx transmits L2 broadcasts by default).
>>>>>
>>>>>Check for PAUSE frames coming out of the receiver (sysctl dev.cxl | grep
>>>>>tx_pause). If it's receiving frames on netmap interface the tx_pause
>>>>>counter should not move.
>>>>>
>>>>>Regards,
>>>>>Navdeep
>>>>>
>>>>
>>>> hello,
>>>>
>>>> yes, MAC addresses are correct, I did the tests again and tx_pause won't move, here is the full transcript for the tests:
>>>>
>>>> ===> BOX #1 CHELSIO
>>>>
>>>> chelsio# ifconfig -v ncxl0
>>>> ncxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>         ether 00:07:43:33:8d:c1
>>>>         inet 10.1.1.2 netmask 0xffffff00 broadcast 10.1.1.255
>>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>         media: Ethernet 10Gbase-SR <full-duplex>
>>>>         status: active
>>>>
>>>> chelsio# ifconfig -v cxl0
>>>> cxl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>         options=ec00bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>>>         ether 00:07:43:33:8d:c0
>>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>         media: Ethernet 10Gbase-SR <full-duplex>
>>>>         status: active
>>>>         plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
>>>>         vendor: FINISAR CORP. PN: FTLX8571D3BCL-FC SN: AL1073K DATE: 2011-06-28
>>>>         module temperature: 42.79 C Voltage: 3.23 Volts
>>>>         RX: 0.53 mW (-2.74 dBm) TX: 0.48 mW (-3.12 dBm)
>>>>
>>>> chelsio# ./pkt-gen -i ncxl0 -f rx -d 00:07:43:33:8d:c1 -s 00:07:e9:44:d2:ba
>>>> 311.132189 main [1715] interface is ncxl0
>>>> 311.132447 extract_ip_range [291] range is 0.0.0.0:90 to 0.0.0.0:90
>>>> 311.132472 extract_ip_range [291] range is 0.0.0.0:7 to 0.0.0.0:7
>>>
>>>
>>>wait, the lower case -s and -d are for IP addresses,
>>>you need to use -S and -D for the MAC addresses.
>>>This way you are sending broadcasts, which likely
>>>means that the chelsio is replicating the packets
>>>to both the netmap and the regular port and the
>>>latter (which perhaps comes first) is likely
>>>dropping packets.
>>>
>>>cheers
>>>luigi
>>
>> woops, my bad, yes probably we had some drop, with -S and -D now I get 1.2Mpps.
>>
>> curiously, I have always used -s/-d with IP addresses on ix-ix testing this is why I never noticed the case, since ix always received 14Mpps, but you probably explained it since ix has one single deviceport per wire, hence the different behavior
>>
>> performance stills very low when compared to TX and to what is expected
>
>ok so next we can try and see what else is going on.
>please check the following:
>a) are you connected through a switch ? if so, try to send
>   out some packets through the ncxl0 port (using pkt-gen
>   and its native MAC address) so the switch can learn the
>   address and does not need to replicate traffic on all
>   ports (which generally is done at a limited rate).
>b) see if using different packet sizes (say 256, 512, 1024, 1500
>   passed as the -l option to pkt-gen) affects the rx rate.
>   If the rate does not change (except for 1500 bytes)
>   it may be a problem with interrupt moderation
>
>c) use progressively increasing packet rates on the sender,
>   using -R xxxx (start at 500000 packets per second,
>   and then go up until the receiver cannot sustain the
>   tx rate.
>
>d) use a smaller batch size on the receiver (-b XXX, use
>   values such as 2, 4, 8, 16...) and see if things improve.
>   Smaller batch sizes make pkt-gen check the NIC more often
>   thus overcoming possible problems with interrupt moderation.
>
>Let us know the outcome. Depending on what you see there
>are several possible explanations.
>

Ok, revisiting the summary

- TX host = Intel ix (host 1)
- RX host = Chelsio T520 (host 2)
- Simple topology host1==host2 directly connected intel port 0 (ix0) w/ chelsio port 0 (ncxl0).
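
For reference, the sweeps suggested in (b), (c) and (d) can be written down as pkt-gen command lines. The sketch below only builds the command strings and does not run anything. The helper names are hypothetical, and it assumes that 00:07:e9:44:d2:ba (the second address on the earlier pkt-gen command line) is the MAC of the sending ix0 port. The flags are the ones already discussed in this thread: -D/-S for MAC addresses, -l for packet length, -R for rate, -b for batch size.

```python
# Sketch only: builds pkt-gen command lines for the suggested sweeps.
# Nothing is executed. Helper names are hypothetical.

RX_MAC = "00:07:43:33:8d:c1"  # ncxl0 on the Chelsio box (from ifconfig above)
TX_MAC = "00:07:e9:44:d2:ba"  # assumed to be the MAC of the sender's ix0 port


def tx_cmd(ifname, pkt_len=None, rate=None):
    """Sender side: upper-case -D/-S set the destination/source MAC."""
    cmd = ["./pkt-gen", "-i", ifname, "-f", "tx", "-D", RX_MAC, "-S", TX_MAC]
    if pkt_len is not None:
        cmd += ["-l", str(pkt_len)]   # packet length in bytes
    if rate is not None:
        cmd += ["-R", str(rate)]      # target rate in packets per second
    return " ".join(cmd)


def rx_cmd(ifname, batch=None):
    """Receiver side: optionally limit the batch size with -b."""
    cmd = ["./pkt-gen", "-i", ifname, "-f", "rx"]
    if batch is not None:
        cmd += ["-b", str(batch)]
    return " ".join(cmd)


if __name__ == "__main__":
    for pkt_len in (256, 512, 1024, 1500):      # check (b)
        print(tx_cmd("ix0", pkt_len=pkt_len))
    for rate in (500000, 700000, 900000):       # check (c)
        print(tx_cmd("ix0", rate=rate))
    for batch in (2, 4, 8, 16):                 # check (d)
        print(rx_cmd("ncxl0", batch=batch))
```
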

Test results:

=> Batch 1 packet len
TX at 256 bytes = 4.46Mpps/TX and 889Kpps/RX
TX at 512 bytes = 2.33Mpps/TX and 888Kpps/RX, 9.3Gbps on TX side according to pkt-gen
TX at 1024 bytes = 1.19Mpps/TX and 889Kpps/RX, 9.3Gbps on TX
TX at 1500 bytes = 816Kpps/TX and 816Kpps/RX, 9.8Gbps on TX

=> Batch 2 rates
-R 500000 / TX Speed: 499.99 Kpps Bandwidth: 240.00 Mbps (raw 336.00 Mbps) / RX 499Kpps
-R 700000 / TX Speed: 699.96 Kpps Bandwidth: 335.98 Mbps (raw 470.38 Mbps) / RX 699Kpps
-R 900000 / TX Speed: 899.98 Kpps Bandwidth: 431.99 Mbps (raw 604.78 Mbps) / RX 888Kpps

reached the same limit as in batch #1.

=> Batch 3 RX batch sizes, default pkt-gen packet len and fixed 900000 rate
-r 2 / TX 899.98Kpps / RX 672Kpps
-r 4 / TX 899.98Kpps / RX 713Kpps
-r 8 / TX 899.98Kpps / RX 889Kpps
-r 16 / TX 899.98Kpps / RX 889Kpps

Results make sense for rates below the max, but smaller batch sizes did not improve RX... they only degraded it.
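
As a sanity check on the packet-length numbers, here is a rough sketch of the 10GbE line-rate arithmetic. It assumes that the pkt-gen -l length excludes the 4-byte CRC, so each packet costs 24 extra bytes on the wire (CRC + preamble/SFD + inter-frame gap); that is consistent with the "raw" bandwidth pkt-gen prints above (336 Mbps at 500 Kpps is 84 bytes per 60-byte packet). It also treats the 2.33Mpps row as the 512-byte run, which is an assumption. The function and variable names are made up for illustration.

```python
# Rough line-rate arithmetic for a 10 Gbit/s Ethernet link.
# Assumption: pkt-gen's -l excludes the CRC, so wire cost per packet is
# pkt_len + 4 (CRC) + 8 (preamble/SFD) + 12 (inter-frame gap).

LINK_BPS = 10_000_000_000
WIRE_OVERHEAD = 4 + 8 + 12  # bytes added on the wire per packet


def line_rate_pps(pkt_len, link_bps=LINK_BPS, overhead=WIRE_OVERHEAD):
    """Maximum packets per second the link can carry at this length."""
    return link_bps / ((pkt_len + overhead) * 8)


# (TX pps, RX pps) as reported in batch 1; the 512 key is assumed.
measured = {
    256: (4.46e6, 889e3),
    512: (2.33e6, 888e3),
    1024: (1.19e6, 889e3),
    1500: (816e3, 816e3),
}

if __name__ == "__main__":
    for pkt_len, (tx, rx) in measured.items():
        limit = line_rate_pps(pkt_len)
        print(f"len {pkt_len:5d}: line rate {limit / 1e6:5.2f} Mpps, "
              f"TX {tx / 1e6:5.2f} Mpps, RX {rx / 1e6:5.3f} Mpps "
              f"({100 * rx / limit:5.1f}% of line rate)")
```

Under these assumptions the sender sits at or very near line rate for every length, while the receiver stays at roughly 889Kpps regardless of packet length, so the limit looks like a per-packet one on the RX side rather than a bandwidth one.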
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160123183447.4B150A0126>