Date: Tue, 2 Jun 2015 08:40:04 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: Blake Caldwell <caldweba@colorado.edu>
Cc: Luigi Rizzo <rizzo@iet.unipi.it>, Hans Petter Selasky <hps@selasky.org>, Oded Shanoon <odeds@mellanox.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: netmap and mlx4 driver status (linux)
Message-ID: <CAJ-Vmona1z2f_ANeCjJhcpqE7zyGrbYRkCuf6D6E63am-ZD4GA@mail.gmail.com>
In-Reply-To: <F98706D0-A1F6-4C8C-967B-EC9A8C5BA862@colorado.edu>
References: <3010CFE2-66B7-416B-92DE-C1B669CC33BE@colorado.edu> <555C9F30.8070405@selasky.org> <CA%2BhQ2%2BjM9bvSQ8rjB=8ikZ-DtuVqYzj84MAGRTd75UJX0_Ur0g@mail.gmail.com> <C5A3C8CB-C3FE-46B7-A1BF-2C48D978358B@colorado.edu> <F98706D0-A1F6-4C8C-967B-EC9A8C5BA862@colorado.edu>
Hi,

You'll likely want to poke the Linux Mellanox driver maintainer for some help.

-adrian

On 1 June 2015 at 17:08, Blake Caldwell <caldweba@colorado.edu> wrote:
> Wondering if those experienced with other netmap drivers might be able to comment on what is limiting performance of mlx4. It seems that the reason pkt-gen is only getting 2.4Mpps with mlx4 40G is that pkt-gen is saturating a core. This clearly shouldn’t be the case as evidenced by netmap papers (14.8Mpps at 900 MHz core). As would be expected, the output from ‘perf top’ shows that sender_body and poll() are the largest userspace CPU hogs (measured in % of samples, over 24 cpus)
>
>  29.65%  [netmap]      [k] netmap_poll
>  12.47%  [mlx4_en]     [k] mlx4_netmap_txsync
>   8.69%  libc-2.19.so  [.] poll
>   6.15%  pkt-gen       [.] sender_body
>   2.26%  [kernel]      [k] local_clock
>   2.12%  [kernel]      [k] context_tracking_user_exit
>   1.87%  [kernel]      [k] select_estimate_accuracy
>   1.81%  [kernel]      [k] system_call
> ….
>   1.24%  [netmap]      [k] nm_txsync_prologue
> ….
>   0.63%  [mlx4_en]     [k] mlx4_en_arm_cq
>   0.61%  [kernel]      [k] account_user_time
>
>
> Furthermore it appears from annotating the code in pkt-gen.c with utilization, about 50% of sender_body is spent on this line while iterating through the rings:
> https://github.com/caldweba/netmap/blob/master/examples/pkt-gen.c#L1091
>     if (nm_ring_empty(txring))
>
> Does this mean it is waiting for free slots most of the time and increasing from 8 rings might help?
>
> Here are the current module parameters in case they shed light on the issue. Also, netmap config kernel messages are shown below.
>
> Thanks in advance.
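For readers following the nm_ring_empty() question above, here is a toy Python model of the TX ring indices. It is only a sketch under stated assumptions: the TxRing class and the send_burst() helper are made up for illustration and are not part of netmap. The one idea borrowed from netmap is that a ring looks "empty" to userspace when cur == tail. The 512-slot ring and 512-packet burst match the figures in the pkt-gen and dmesg output quoted further down in this thread; the starting state (all but one slot free) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TxRing:
    """Toy stand-in for the index fields of a netmap TX ring (hypothetical class)."""
    num_slots: int
    cur: int   # next slot userspace would fill
    tail: int  # first slot userspace may NOT touch yet

    def empty(self) -> bool:
        # Same idea as nm_ring_empty(): nothing is available when cur == tail.
        return self.cur == self.tail

    def space(self) -> int:
        # Slots userspace may fill right now, allowing for wrap-around.
        return (self.tail - self.cur) % self.num_slots


def send_burst(ring: TxRing, burst: int) -> int:
    """Queue up to `burst` packets; returns how many slots were actually used."""
    n = min(burst, ring.space())
    ring.cur = (ring.cur + n) % ring.num_slots
    return n


# 512-slot ring, assumed to start with all but one slot free.
ring = TxRing(num_slots=512, cur=0, tail=511)
sent = send_burst(ring, 512)   # one burst consumes every available slot
stalled = ring.empty()         # stays true until tail moves forward again

# Pretend the NIC completed 256 slots and a txsync advanced tail.
ring.tail = (ring.tail + 256) % ring.num_slots
refilled = send_burst(ring, 512)
```

In this model the sender finds the ring empty whenever it outruns the completions, which would be consistent with samples piling up on the nm_ring_empty() check. It does not by itself say whether using more rings would help.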
>
> /sys/module/netmap/parameters/adaptive_io: 0
> /sys/module/netmap/parameters/admode: 0
> /sys/module/netmap/parameters/bridge_batch: 1024
> /sys/module/netmap/parameters/buf_curr_num: 163840
> /sys/module/netmap/parameters/buf_curr_size: 2048
> /sys/module/netmap/parameters/buf_num: 163840
> /sys/module/netmap/parameters/buf_size: 2048
> /sys/module/netmap/parameters/default_pipes: 0
> /sys/module/netmap/parameters/flags: 0
> /sys/module/netmap/parameters/fwd: 0
> /sys/module/netmap/parameters/generic_mit: 100000
> /sys/module/netmap/parameters/generic_rings: 1
> /sys/module/netmap/parameters/generic_ringsize: 1024
> /sys/module/netmap/parameters/if_curr_num: 100
> /sys/module/netmap/parameters/if_curr_size: 1024
> /sys/module/netmap/parameters/if_num: 100
> /sys/module/netmap/parameters/if_size: 1024
> /sys/module/netmap/parameters/mitigate: 1
> /sys/module/netmap/parameters/mmap_unreg: 0
> /sys/module/netmap/parameters/no_pendintr: 1
> /sys/module/netmap/parameters/no_timestamp: 0
> /sys/module/netmap/parameters/priv_buf_num: 4098
> /sys/module/netmap/parameters/priv_buf_size: 2048
> /sys/module/netmap/parameters/priv_if_num: 1
> /sys/module/netmap/parameters/priv_if_size: 1024
> /sys/module/netmap/parameters/priv_ring_num: 4
> /sys/module/netmap/parameters/priv_ring_size: 20480
> /sys/module/netmap/parameters/ring_curr_num: 200
> /sys/module/netmap/parameters/ring_curr_size: 36864
> /sys/module/netmap/parameters/ring_num: 200
> /sys/module/netmap/parameters/ring_size: 36864
> /sys/module/netmap/parameters/txsync_retry: 2
> /sys/module/netmap/parameters/verbose: 0
>
>
>> On May 28, 2015, at 12:47 AM, Blake Caldwell <caldweba@colorado.edu> wrote:
>>
>> Hello,
>>
>> I have made necessary tweaks to the mlx4 patches for a successful build on Linux 3.13.11 (Ubuntu 14.04) and enabled the driver in the Linux build system. See:
>> https://github.com/caldweba/netmap.git for my additional commits.
>>
>> Without any core modifications to the mlx4 netmap driver, I am actually getting reduced performance, 2.5 Mpps on a 40G port. I’m interested in improving the performance of this driver, but as I’m new to netmap and even these drivers, some assistance would be welcome. As Luigi mentioned, the Mellanox developer documentation seems to be a stumbling point. Would anyone from Mellanox be able to lend some expertise?
>>
>> It would appear mlx4_netmap_txsync() is the place to focus optimization, and the comments Luigi put in will be helpful. Although I’m a little confused on the remaining work for mlx4_netmap_tx_config (marked TODO). See https://github.com/caldweba/netmap/blob/master/LINUX/mlx4_netmap_linux.h for Luigi’s current mlx4_netmap_txsync() code.
>>
>> Below is my output from pkt-gen and from ethtool on the device.
>>
>> Best regards,
>> Blake
>>
>> -----
>> $ sudo build-apps/pkt-gen -i p2p1 -f tx -n 500111222 -l 60 -w 5
>> 060.428278 main [1649] interface is p2p1
>> 060.428770 extract_ip_range [287] range is 10.0.0.1:0 to 10.0.0.1:0
>> 060.428782 extract_ip_range [287] range is 10.1.0.1:0 to 10.1.0.1:0
>> 060.875064 main [1840] mapped 334980KB at 0x7fd1f04d5000
>> Sending on netmap:p2p1: 8 queues, 1 threads and 1 cpus.
>> 10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
>> 060.875151 main [1924] Sending 512 packets every 0.000000000 s
>> 060.875158 main [1926] Wait 5 secs for phy reset
>> 065.875244 main [1928] Ready...
>> 065.875276 nm_open [456] overriding ifname p2p1 ringid 0x0 flags 0x1
>> 065.914805 sender_body [1014] start, fd 4 main_fd 3
>> 065.958284 sender_body [1083] drop copy
>> 066.915788 main_thread [1446] 2468560 pps (2471088 pkts in 1001024 usec)
>> 067.916827 main_thread [1446] 2476292 pps (2478865 pkts in 1001039 usec)
>> 068.917815 main_thread [1446] 2476261 pps (2478708 pkts in 1000988 usec)
>> 069.918864 main_thread [1446] 2476232 pps (2478827 pkts in 1001048 usec)
>> 070.919902 main_thread [1446] 2476031 pps (2478604 pkts in 1001039 usec)
>> 071.920920 main_thread [1446] 2476304 pps (2478825 pkts in 1001018 usec)
>> 072.921896 main_thread [1446] 2476349 pps (2478766 pkts in 1000976 usec)
>> 073.922948 main_thread [1446] 2476327 pps (2478932 pkts in 1001052 usec)
>> 074.923924 main_thread [1446] 2476301 pps (2478715 pkts in 1000975 usec)
>> 075.924903 main_thread [1446] 2476257 pps (2478681 pkts in 1000979 usec)
>> 076.925918 main_thread [1446] 2476195 pps (2478708 pkts in 1001015 usec)
>> 077.926970 main_thread [1446] 2476242 pps (2478849 pkts in 1001053 usec)
>>
>> dmesg:
>> [52591.017469] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)
>> [52591.017621] mlx4_en 0000:04:00.0: registered PHC clock
>> [52591.017780] mlx4_en 0000:04:00.0: Activating port:1
>> [52591.023552] mlx4_en: eth0: Using 192 TX rings
>> [52591.023554] mlx4_en: eth0: Using 8 RX rings
>> [52591.023556] mlx4_en: eth0: frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [52591.040585] mlx4_en: eth0: Initializing port
>> [52591.040732] 779.121252 [2720] netmap_attach success for eth0 tx 8/512 rx 8/1024 queues/slots
>> [52591.060580] mlx4_en 0000:04:00.0: Activating port:2
>> [52591.068337] mlx4_en: eth1: Using 192 TX rings
>> [52591.068340] mlx4_en: eth1: Using 8 RX rings
>> [52591.068342] mlx4_en: eth1: frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [52591.085696] mlx4_en: eth1: Initializing port
>> [52591.085867] 779.166352 [2720] netmap_attach success for eth1 tx 8/512 rx 8/1024 queues/slots
>> [52591.960730] mlx4_en: eth0: Link Up
>> [52593.029536] systemd-udevd[50993]: renamed network interface eth0 to p2p1
>> [52593.061736] systemd-udevd[50996]: renamed network interface eth1 to rename28
>> [52624.680481] mlx4_en: p2p1: frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [52624.834109] 812.888289 [ 473] mlx4_en_tx_irq XXXXXXXXX tx_irq 0 unexpected, ignoring
>>
>> [55436.322304] 622.179577 [ 665] mlx4_netmap_config using only 8 out of 192 tx queues
>> [55436.339688] 622.196947 [ 672] mlx4_netmap_config txr 8 txd 512 bufsize 32768 -- rxr 8 rxd 1024 act 1024 bufsize 16384
>> [55436.361877] 622.219119 [ 124] mlx4_netmap_reg setting netmap mode for eth0 to ON
>> [55436.379345] 622.236575 [ 127] mlx4_netmap_reg unloading eth0
>> [55436.485781] 622.342926 [ 163] mlx4_netmap_reg loading eth0
>> [55436.501124] mlx4_en: p2p1: frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [55436.517514] 622.374635 [ 628] mlx4_netmap_rx_config stride 16 possible frags 1 descsize 0 DS_SIZE 16
>> [55436.536462] 622.393570 [ 648] mlx4_netmap_rx_config ring 0 done
>> [55436.551746] 622.408842 [ 648] mlx4_netmap_rx_config ring 1 done
>> [55436.567111] 622.424194 [ 628] mlx4_netmap_rx_config stride 16 possible frags 1 descsize 0 DS_SIZE 16
>> [55436.586261] 622.443330 [ 648] mlx4_netmap_rx_config ring 2 done
>> [55436.601844] 622.458900 [ 648] mlx4_netmap_rx_config ring 3 done
>> [55436.617525] 622.474569 [ 648] mlx4_netmap_rx_config ring 4 done
>> [55436.633057] 622.490089 [ 648] mlx4_netmap_rx_config ring 5 done
>> [55436.648376] 622.505396 [ 648] mlx4_netmap_rx_config ring 6 done
>> [55436.780501] 622.637414 [ 165] mlx4_netmap_reg start_port returns 0
>> [55436.796403] mlx4_en: p2p1: Link Down
>> [55437.755281] mlx4_en: p2p1: Link Up
>>
>>
>> $ ethtool p2p1
>> Settings for p2p1:
>> Supported ports: [ TP ]
>> Supported link modes: 10000baseT/Full
>> Supported pause frame use: No
>> Supports auto-negotiation: No
>> Advertised link modes: 10000baseT/Full
>> Advertised pause frame use: No
>> Advertised auto-negotiation: No
>> Speed: 40000Mb/s
>> Duplex: Full
>> Port: Twisted Pair
>> PHYAD: 0
>> Transceiver: internal
>> Auto-negotiation: off
>> MDI-X: Unknown
>> Cannot get wake-on-lan settings: Operation not permitted
>> Current message level: 0x00000014 (20)
>> link ifdown
>> Link detected: yes
>>
>> $ ethtool -i p2p1
>> driver: mlx4_en
>> version: 2.2-1 (Feb 2014)
>> firmware-version: 2.30.3200
>> bus-info: 0000:04:00.0
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: no
>> supports-register-dump: no
>> supports-priv-flags: yes
>>
>> ethtool -g p2p1
>> Ring parameters for p2p1:
>> Pre-set maximums:
>> RX: 8192
>> RX Mini: 0
>> RX Jumbo: 0
>> TX: 8192
>> Current hardware settings:
>> RX: 1024
>> RX Mini: 0
>> RX Jumbo: 0
>> TX: 512
>>
>> Coalesce parameters for p2p1:
>> Adaptive RX: on TX: off
>> stats-block-usecs: 0
>> sample-interval: 0
>> pkt-rate-low: 400000
>> pkt-rate-high: 450000
>>
>> rx-usecs: 16
>> rx-frames: 44
>> rx-usecs-irq: 0
>> rx-frames-irq: 0
>>
>> tx-usecs: 16
>> tx-frames: 16
>> tx-usecs-irq: 0
>> tx-frames-irq: 0
>>
>> rx-usecs-low: 0
>> rx-frame-low: 0
>> tx-usecs-low: 0
>> tx-frame-low: 0
>>
>> rx-usecs-high: 128
>> rx-frame-high: 0
>> tx-usecs-high: 0
>> tx-frame-high: 0
>>
>> $ sudo ethtool -k p2p1
>> Features for p2p1:
>> rx-checksumming: on
>> tx-checksumming: on
>> tx-checksum-ipv4: on
>> tx-checksum-ip-generic: off [fixed]
>> tx-checksum-ipv6: on
>> tx-checksum-fcoe-crc: off [fixed]
>> tx-checksum-sctp: off [fixed]
>> scatter-gather: on
>> tx-scatter-gather: on
>> tx-scatter-gather-fraglist: off [fixed]
>> tcp-segmentation-offload: on
>> tx-tcp-segmentation: on
>> tx-tcp-ecn-segmentation: off [fixed]
>> tx-tcp6-segmentation: on
>> udp-fragmentation-offload: off [fixed]
>> generic-segmentation-offload: on
>> generic-receive-offload: on
>> large-receive-offload: off [fixed]
>> rx-vlan-offload: on
>> tx-vlan-offload: on
>> ntuple-filters: off [fixed]
>> receive-hashing: on
>> highdma: on [fixed]
>> rx-vlan-filter: on [fixed]
>> vlan-challenged: off [fixed]
>> tx-lockless: off [fixed]
>> netns-local: off [fixed]
>> tx-gso-robust: off [fixed]
>> tx-fcoe-segmentation: off [fixed]
>> tx-gre-segmentation: off [fixed]
>> tx-ipip-segmentation: off [fixed]
>> tx-sit-segmentation: off [fixed]
>> tx-udp_tnl-segmentation: off [fixed]
>> tx-mpls-segmentation: off [fixed]
>> fcoe-mtu: off [fixed]
>> tx-nocache-copy: on
>> loopback: off
>> rx-fcs: off [fixed]
>> rx-all: off [fixed]
>> tx-vlan-stag-hw-insert: off [fixed]
>> rx-vlan-stag-hw-parse: off [fixed]
>> rx-vlan-stag-filter: off [fixed]
>> l2-fwd-offload: off [fixed]
>>
>>
>>> On May 20, 2015, at 9:18 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>>>
>>> hi all,
>>>
>>> the mlx4 netmap patch (for linux only) was something i did long
>>> ago when i had some mellanox hardware available, but no documentation
>>> so i had to resort to interpreting what the linux driver did.
>>>
>>> At the time i had the following performance (on PCIe v2 bus):
>>>
>>> 10G ports: tx/rx at about 7 Mpps with 64 byte packets
>>> could saturate the link with 192 or 256 byte packets
>>>
>>> 40G ports: tx/rx at about 11 Mpps with 64 byte packets
>>> max 28 Gbit/s even with 1500 byte frames
>>>
>>> I don't know if the limited performance was due to bus,
>>> firmware or lack of documentation, anyways this is not
>>> something i can or want to deal with.
>>>
>>> My understanding is that Mellanox does not release programming
>>> documentation, so the only way to have native netmap support
>>> for that card would be to have Mellanox work on that and
>>> provide a suitable patch.
>>>
>>> I do not expect more than a week's work (the typical extra
>>> code in each driver is about 500 lines, and very simple)
>>> for someone with access to documentation. Also, the patch
>>> for FreeBSD and Linux is typically very similar so once we
>>> have a driver for one, the other would be trivial.
>>>
>>> It would be of course great to add Mellanox to the list of
>>> devices with native netmap support, together with Chelsio
>>> and Intel.
>>>
>>> Perhaps Hans (who may have contacts) can talk to the right
>>> people and figure out. On my side, I am happy to give directions
>>> on what needs to be done and import any patch that should
>>> be made available.
>>>
>>> cheers
>>> luigi
>>>
>>> On Wed, May 20, 2015 at 4:50 PM, Hans Petter Selasky <hps@selasky.org> wrote:
>>> On 05/20/15 16:13, Blake Caldwell wrote:
>>> Hello,
>>>
>>> I noticed that the mlx4_en patch for netmap is in LINUX/wip-patches, so it is not enabled in the normal build process. I’m curious about the status of mlx4 support?
>>>
>>> If additional work on the patches is needed, are there any details as to what the issues were?
>>>
>>> Any info would be great! Thanks in advance!
>>>
>>>
>>> Hi Blake,
>>>
>>> The MLX4 driver is being actively maintained in -stable and -current. Regarding netmap support for the FreeBSD MLX4 en driver, I'm not sure. Maybe Oded knows, CC'ed? Do you have a link for the patch you are referring to?
>>>
>>> Is there any particular use-case you are interested in?
>>>
>>> --HPS
>>>
>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>>
>>>
>>>
>>> --
>>> -----------------------------------------+-------------------------------
>>> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione
>>> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
>>> TEL +39-050-2217533 . via Diotisalvi 2
>>> Mobile +39-338-6809875 . 56122 PISA (Italy)
>>> -----------------------------------------+-------------------------------
>>
>
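To put the packet rates quoted in this thread in perspective, the sketch below computes the theoretical maximum packet rate for minimum-size Ethernet frames. The per-frame overhead (8 bytes of preamble/SFD, 12 bytes of inter-frame gap) and the 4-byte FCS are standard Ethernet figures. The function name max_pps and the constant names are illustrative only, and the 60-byte length is taken from the -l 60 passed to pkt-gen above, on the assumption that it excludes the FCS.

```python
PREAMBLE_SFD = 8      # bytes on the wire before each frame
INTERFRAME_GAP = 12   # minimum idle gap between frames, in byte times
FCS = 4               # frame check sequence (assumed not counted in -l 60)


def max_pps(link_bps: float, frame_len_no_fcs: int) -> float:
    """Upper bound on packets per second for a link speed and frame length."""
    wire_bytes = frame_len_no_fcs + FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return link_bps / (wire_bytes * 8)


# Rates reported in the thread, in packets per second (rounded).
reported = {
    "pkt-gen on mlx4 40G (Blake)": 2_476_000,
    "40G ports, earlier test (Luigi)": 11_000_000,
}

line_rate_10g = max_pps(10e9, 60)
line_rate_40g = max_pps(40e9, 60)

print(f"10G line rate: {line_rate_10g / 1e6:.2f} Mpps")
print(f"40G line rate: {line_rate_40g / 1e6:.2f} Mpps")
for label, pps in reported.items():
    print(f"{label}: {100 * pps / line_rate_40g:.1f}% of 40G line rate")
```

The 10G result is the familiar 14.88 Mpps figure, which is presumably what the "14.8Mpps" in the June 1 message refers to. Under these assumptions both 40G measurements in the thread sit well below line rate for 64-byte packets.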