Date:      Sun, 24 Jan 2016 02:28:29 -0200
From:      "Marcus Cenzatti" <cenzatti@hush.com>
To:        "Luigi Rizzo" <rizzo@iet.unipi.it>
Cc:        freebsd-net@freebsd.org, "Navdeep Parhar" <nparhar@gmail.com>
Subject:   Re: solved: Re: Chelsio T520-SO-CR low performance (netmap tested) for RX
Message-ID:  <20160124042830.3D674A0128@smtp.hushmail.com>
In-Reply-To: <CA%2BhQ2%2Bg7_haaXLFjMuG00ANsUkFdyGzFQyjT4NYVBmPY-vECBg@mail.gmail.com>



On 1/24/2016 at 1:10 AM, "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
>
>Thanks for re-running the experiments.
>
>I am changing the subject so that in the archives it is clear
>that the chelsio card works fine.
>
>Overall the tests confirm that whenever you hit the host stack you are bound
>to the poor performance of the latter. The problem does not appear using intel
>as a receiver because on the intel card netmap mode disables the host stack.
>
>More comments on the experiments:
>
>The only meaningful test is the one where you use the DMAC of the ncxl0 port:
>
>    SENDER: ./pkt-gen -i ix0 -f tx -S 00:07:e9:44:d2:ba -D 00:07:43:33:8d:c1
>
>in the other experiment you transmit broadcast frames and hit the network stack.
>ARP etc do not matter since tx and rx are directly connected.
>
>On the receiver you do not need to specify addresses:
>
>    RECEIVER: ./pkt-gen -i ncxl0 -f rx
>
>The numbers in netstat are clearly rounded, so 15M is probably 14.88M
>(line rate), and 3.7M that you see correctly represents the difference
>between incoming and received packets.
>
>The fact that you see drops may be related to the NIC being unable to
>replenish the queue fast enough, which in turn may be a hardware or a
>software (netmap) issue.
>You may try experimenting with shorter batches on the receive side
>(say, -b 64 or less) and see if you have better results.
>
>A short batch replenishes the rx queue more frequently, but it is
>not a conclusive experiment because there is an optimization in
>the netmap poll code which, as an unintended side effect, replenishes
>the queue less often than it should.
>For a conclusive experiment you should grab the netmap code from
>github.com/luigirizzo/netmap and use pkt-gen-b which
>uses busy wait and works around the poll "optimization"
>
>thanks again for investigating the issue.
>
>cheers
>luigi
>

so, as a summary: with the IP test on the intel card, netmap disables the host stack, while on chelsio netmap does not disable the host stack and packets get injected into the host, so the only reliable test on chelsio cards is the MAC-based one?

yes, I am already running github's netmap code, so let's try the busy-wait pkt-gen-b:

intel# netmap-master/examples/pkt-gen-b -i ix0 -f tx -S 00:07:e9:44:d2:ba -D 00:07:43:33:8d:c1
626.695437 main [1930] interface is ix0
626.695477 main [2050] running on 1 cpus (have 8)
626.695514 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
626.695524 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
626.800887 main [2148] mapped 334980KB at 0x801800000
Sending on netmap:ix0: 8 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:07:e9:44:d2:ba -> 00:07:43:33:8d:c1)
626.800959 main [2233] Sending 512 packets every  0.000000000 s
626.800965 main [2235] Wait 2 secs for phy reset
628.801494 main [2237] Ready...
628.801746 sender_body [1211] start, fd 3 main_fd 3
628.837383 sender_body [1293] drop copy
629.802349 main_thread [1720] 14.122 Mpps (14.131 Mpkts 6.783 Gbps in 1000633 usec) 415.72 avg_batch 0 min_space
630.803494 main_thread [1720] 14.503 Mpps (14.520 Mpkts 6.970 Gbps in 1001144 usec) 457.64 avg_batch 99999 min_space
631.804491 main_thread [1720] 14.474 Mpps (14.489 Mpkts 6.954 Gbps in 1000997 usec) 427.45 avg_batch 99999 min_space
632.807500 main_thread [1720] 14.430 Mpps (14.474 Mpkts 6.947 Gbps in 1003009 usec) 470.69 avg_batch 99999 min_space
633.808488 main_thread [1720] 14.455 Mpps (14.470 Mpkts 6.945 Gbps in 1000988 usec) 442.18 avg_batch 99999 min_space
(...)
976.810270 sender_body [1334] pending tx tail 477 head 530 on ring 3
976.810300 sender_body [1334] pending tx tail 1241 head 1293 on ring 5
977.283393 main_thread [1720] 7.848 Mpps (8.249 Mpkts 3.959 Gbps in 1051008 usec) 473.06 avg_batch 99999 min_space
Sent 5019797634 packets 301187858040 bytes 11178464 events 60 bytes each in 348.01 seconds.
Speed: 14.424 Mpps Bandwidth: 6.924 Gbps (raw 9.693 Gbps). Average batch: 449.06 pkts
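
for reference, a small sketch (plain Python, not part of pkt-gen) of how the figures in the summary line above seem to relate to each other. The 24 bytes of per-frame wire overhead (4 CRC + 8 preamble + 12 inter-frame gap) and the 10 Gbps link speed are assumptions on my side, but they reproduce the "raw 9.693 Gbps" printed above and the 14.88M line rate mentioned in the quoted mail. The function names are made up for illustration:

```python
# Sketch only: relates pkt-gen's "Speed", "Bandwidth" and "raw" figures.
# Assumptions (not taken from the pkt-gen source): frames are 60 bytes without
# CRC, and "raw" adds 24 bytes per frame (4 CRC + 8 preamble + 12 inter-frame gap).
FRAME_BYTES = 60
WIRE_OVERHEAD_BYTES = 24


def bandwidth_gbps(mpps, frame_bytes=FRAME_BYTES):
    """Payload bandwidth in Gbps for a given packet rate in Mpps."""
    return mpps * 1e6 * frame_bytes * 8 / 1e9


def raw_gbps(mpps, frame_bytes=FRAME_BYTES):
    """Bandwidth on the wire in Gbps, including the per-frame overhead."""
    return mpps * 1e6 * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8 / 1e9


def line_rate_mpps(link_gbps=10.0, frame_bytes=FRAME_BYTES):
    """Maximum packet rate in Mpps on a link of the given speed."""
    return link_gbps * 1e9 / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8) / 1e6


print(f"bandwidth: {bandwidth_gbps(14.424):.3f} Gbps")  # 6.924
print(f"raw:       {raw_gbps(14.424):.3f} Gbps")        # 9.693
print(f"line rate: {line_rate_mpps():.2f} Mpps")        # 14.88
```

so by this arithmetic the sender at 14.424 Mpps is at about 97% of line rate.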


chelsio# ./pkt-gen-b -i ncxl0 -f rx
785.659290 main [1930] interface is ncxl0
785.659337 main [2050] running on 1 cpus (have 4)
785.659477 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
785.659496 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
785.718707 main [2148] mapped 334980KB at 0x801800000
Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
785.718784 main [2235] Wait 2 secs for phy reset
787.729197 main [2237] Ready...
787.729449 receiver_body [1412] reading from netmap:ncxl0 fd 3 main_fd 3
788.730089 main_thread [1720] 11.159 Mpps (11.166 Mpkts 5.360 Gbps in 1000673 usec) 205.89 avg_batch 0 min_space
789.730588 main_thread [1720] 11.164 Mpps (11.169 Mpkts 5.361 Gbps in 1000500 usec) 183.54 avg_batch 0 min_space
790.734224 main_thread [1720] 11.172 Mpps (11.213 Mpkts 5.382 Gbps in 1003636 usec) 198.84 avg_batch 0 min_space
^C791.140853 sigint_h [404] received control-C on thread 0x801406800
791.742841 main_thread [1720] 4.504 Mpps (4.542 Mpkts 2.180 Gbps in 1008617 usec) 179.62 avg_batch 0 min_space
Received 38091031 packets 2285461860 bytes 196774 events 60 bytes each in 3.41 seconds.
Speed: 11.166 Mpps Bandwidth: 5.360 Gbps (raw 7.504 Gbps). Average batch: 193.58 pkts

same results... same numbers on netstat too
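
to compare runs without the last, interrupted interval, the per-second lines can be averaged with a small script. This is only a sketch: the regular expression is written against the main_thread lines as they appear in this mail, the K/G unit prefixes are an assumption (only "Mpps" and plain "pps" show up here), and parse_rates / steady_state_mpps are names I made up:

```python
import re

# Matches e.g. "main_thread [1720] 11.159 Mpps (" or "main_thread [1720] 0.000 pps (".
RATE_RE = re.compile(r"main_thread \[\d+\] (?P<rate>[\d.]+) (?P<unit>[KMG]?)pps ")
SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_rates(log_text):
    """Return the per-interval packet rates (in pps) found in pkt-gen output."""
    rates = []
    for line in log_text.splitlines():
        match = RATE_RE.search(line)
        if match:
            rates.append(float(match.group("rate")) * SCALE[match.group("unit")])
    return rates


def steady_state_mpps(log_text):
    """Mean rate in Mpps, ignoring the final (interrupted) interval."""
    rates = parse_rates(log_text)[:-1]
    return sum(rates) / len(rates) / 1e6


# The per-second lines of the run above, copied from this mail.
SAMPLE = """\
788.730089 main_thread [1720] 11.159 Mpps (11.166 Mpkts 5.360 Gbps in 1000673 usec) 205.89 avg_batch 0 min_space
789.730588 main_thread [1720] 11.164 Mpps (11.169 Mpkts 5.361 Gbps in 1000500 usec) 183.54 avg_batch 0 min_space
790.734224 main_thread [1720] 11.172 Mpps (11.213 Mpkts 5.382 Gbps in 1003636 usec) 198.84 avg_batch 0 min_space
791.742841 main_thread [1720] 4.504 Mpps (4.542 Mpkts 2.180 Gbps in 1008617 usec) 179.62 avg_batch 0 min_space
"""

print(f"{steady_state_mpps(SAMPLE):.3f} Mpps")  # 11.165 Mpps
```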

chelsio# ./pkt-gen-b -b 64 -i ncxl0 -f rx
522.430459 main [1930] interface is ncxl0
522.430507 main [2050] running on 1 cpus (have 4)
522.430644 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
522.430662 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
522.677743 main [2148] mapped 334980KB at 0x801800000
Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
522.677822 main [2235] Wait 2 secs for phy reset
524.698114 main [2237] Ready...
524.698373 receiver_body [1412] reading from netmap:ncxl0 fd 3 main_fd 3
525.699118 main_thread [1720] 10.958 Mpps (10.966 Mpkts 5.264 Gbps in 1000765 usec) 61.84 avg_batch 0 min_space
526.700108 main_thread [1720] 11.086 Mpps (11.097 Mpkts 5.327 Gbps in 1000991 usec) 61.06 avg_batch 0 min_space
527.705650 main_thread [1720] 11.166 Mpps (11.227 Mpkts 5.389 Gbps in 1005542 usec) 61.91 avg_batch 0 min_space
528.707113 main_thread [1720] 11.090 Mpps (11.107 Mpkts 5.331 Gbps in 1001463 usec) 61.34 avg_batch 0 min_space
529.707617 main_thread [1720] 10.847 Mpps (10.853 Mpkts 5.209 Gbps in 1000504 usec) 62.51 avg_batch 0 min_space
^C530.556309 sigint_h [404] received control-C on thread 0x801406800
530.709133 main_thread [1720] 9.166 Mpps (9.180 Mpkts 4.406 Gbps in 1001516 usec) 62.92 avg_batch 0 min_space
Received 64430028 packets 3865801680 bytes 1041000 events 60 bytes each in 5.86 seconds.
Speed: 10.999 Mpps Bandwidth: 5.279 Gbps (raw 7.391 Gbps). Average batch: 61.89 pkts

chelsio# ./pkt-gen-b -b 48 -i ncxl0 -f rx
962.590603 main [1930] interface is ncxl0
962.590651 main [2050] running on 1 cpus (have 4)
962.590791 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
962.590810 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
962.840889 main [2148] mapped 334980KB at 0x801800000
Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
962.840963 main [2235] Wait 2 secs for phy reset
964.848016 main [2237] Ready...
964.848279 receiver_body [1412] reading from netmap:ncxl0 fd 3 main_fd 3
965.849263 main_thread [1720] 10.314 Mpps (10.325 Mpkts 4.956 Gbps in 1001020 usec) 47.93 avg_batch 0 min_space
966.855171 main_thread [1720] 10.322 Mpps (10.383 Mpkts 4.984 Gbps in 1005908 usec) 47.93 avg_batch 0 min_space
967.857352 main_thread [1720] 10.602 Mpps (10.625 Mpkts 5.100 Gbps in 1002182 usec) 46.42 avg_batch 0 min_space
968.858268 main_thread [1720] 10.343 Mpps (10.353 Mpkts 4.969 Gbps in 1000916 usec) 47.62 avg_batch 0 min_space
^C969.524538 sigint_h [404] received control-C on thread 0x801406800
969.895765 main_thread [1720] 6.588 Mpps (6.835 Mpkts 3.281 Gbps in 1037497 usec) 47.94 avg_batch 0 min_space
Received 48520680 packets 2911240800 bytes 1020880 events 60 bytes each in 4.68 seconds.
Speed: 10.376 Mpps Bandwidth: 4.981 Gbps (raw 6.973 Gbps). Average batch: 47.53 pkts

chelsio# ./pkt-gen-b -b 32 -i ncxl0 -f rx
338.251691 main [1930] interface is ncxl0
338.251741 main [2050] running on 1 cpus (have 4)
338.251878 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
338.251897 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
338.494886 main [2148] mapped 334980KB at 0x801800000
Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
338.494967 main [2235] Wait 2 secs for phy reset
340.501849 main [2237] Ready...
340.502099 receiver_body [1412] reading from netmap:ncxl0 fd 3 main_fd 3
341.502529 main_thread [1720] 9.044 Mpps (9.048 Mpkts 4.343 Gbps in 1000462 usec) 31.99 avg_batch 0 min_space
342.503257 main_thread [1720] 9.784 Mpps (9.792 Mpkts 4.700 Gbps in 1000728 usec) 31.82 avg_batch 0 min_space
343.504752 main_thread [1720] 10.071 Mpps (10.086 Mpkts 4.841 Gbps in 1001495 usec) 31.76 avg_batch 0 min_space
344.533756 main_thread [1720] 9.046 Mpps (9.309 Mpkts 4.468 Gbps in 1029004 usec) 31.99 avg_batch 0 min_space
345.534754 main_thread [1720] 11.161 Mpps (11.172 Mpkts 5.363 Gbps in 1000998 usec) 31.58 avg_batch 0 min_space
346.535754 main_thread [1720] 9.262 Mpps (9.271 Mpkts 4.450 Gbps in 1001000 usec) 31.93 avg_batch 0 min_space
347.536755 main_thread [1720] 10.169 Mpps (10.179 Mpkts 4.886 Gbps in 1001001 usec) 31.74 avg_batch 0 min_space
348.537256 main_thread [1720] 9.896 Mpps (9.901 Mpkts 4.752 Gbps in 1000501 usec) 31.79 avg_batch 0 min_space
349.538757 main_thread [1720] 8.997 Mpps (9.011 Mpkts 4.325 Gbps in 1001501 usec) 31.99 avg_batch 0 min_space
350.548161 main_thread [1720] 9.709 Mpps (9.800 Mpkts 4.704 Gbps in 1009404 usec) 31.83 avg_batch 0 min_space
351.548888 main_thread [1720] 6.208 Mpps (6.212 Mpkts 2.982 Gbps in 1000727 usec) 31.98 avg_batch 0 min_space
^C354.108457 sigint_h [404] received control-C on thread 0x801406800
354.588907 main_thread [1720] 0.000 pps (0.000 pkts 0.000 bps in 1010653 usec) 0.00 avg_batch 0 min_space
Received 103781019 packets 6226861140 bytes 3259138 events 60 bytes each in 13.61 seconds.
Speed: 7.627 Mpps Bandwidth: 3.661 Gbps (raw 5.126 Gbps). Average batch: 31.84 pkts


chelsio# ./pkt-gen-b -b 16 -i ncxl0 -f rx
369.154719 main [1930] interface is ncxl0
369.154770 main [2050] running on 1 cpus (have 4)
369.154931 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
369.154950 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
369.213861 main [2148] mapped 334980KB at 0x801800000
Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
369.213934 main [2235] Wait 2 secs for phy reset
371.287288 main [2237] Ready...
371.287548 receiver_body [1412] reading from netmap:ncxl0 fd 3 main_fd 3
391.613739 main_thread [1720] 1.754 Mpps (1.756 Mpkts 842.694 Mbps in 1001124 usec) 16.00 avg_batch 0 min_space
392.614736 main_thread [1720] 7.671 Mpps (7.678 Mpkts 3.686 Gbps in 1000998 usec) 16.00 avg_batch 0 min_space
393.615737 main_thread [1720] 7.890 Mpps (7.898 Mpkts 3.791 Gbps in 1001000 usec) 16.00 avg_batch 0 min_space
394.618238 main_thread [1720] 7.417 Mpps (7.435 Mpkts 3.569 Gbps in 1002501 usec) 16.00 avg_batch 0 min_space
395.622739 main_thread [1720] 7.812 Mpps (7.847 Mpkts 3.767 Gbps in 1004501 usec) 16.00 avg_batch 0 min_space
396.623737 main_thread [1720] 7.176 Mpps (7.184 Mpkts 3.448 Gbps in 1000998 usec) 16.00 avg_batch 0 min_space
397.624737 main_thread [1720] 8.188 Mpps (8.196 Mpkts 3.934 Gbps in 1001000 usec) 16.00 avg_batch 0 min_space
398.625238 main_thread [1720] 6.984 Mpps (6.988 Mpkts 3.354 Gbps in 1000501 usec) 16.00 avg_batch 0 min_space
399.626238 main_thread [1720] 8.535 Mpps (8.544 Mpkts 4.101 Gbps in 1001000 usec) 16.00 avg_batch 0 min_space
400.628241 main_thread [1720] 7.943 Mpps (7.959 Mpkts 3.820 Gbps in 1002003 usec) 16.00 avg_batch 0 min_space
401.639007 main_thread [1720] 6.890 Mpps (6.964 Mpkts 3.343 Gbps in 1010766 usec) 16.00 avg_batch 0 min_space
402.641242 main_thread [1720] 3.720 Mpps (3.728 Mpkts 1.790 Gbps in 1002235 usec) 16.00 avg_batch 0 min_space
403.674984 main_thread [1720] 0.000 pps (0.000 pkts 0.000 bps in 1033742 usec) 0.00 avg_batch 0 min_space
^C404.054679 sigint_h [404] received control-C on thread 0x801406800
404.713489 main_thread [1720] 0.000 pps (0.000 pkts 0.000 bps in 1038505 usec) 0.00 avg_batch 0 min_space
Received 82176988 packets 4930619280 bytes 5136173 events 60 bytes each in 12.71 seconds.
Speed: 6.464 Mpps Bandwidth: 3.103 Gbps (raw 4.344 Gbps). Average batch: 16.00 pkts

chelsio# ./pkt-gen-b -b 8 -i ncxl0 -f rx
425.948206 main [1930] interface is ncxl0
425.948257 main [2050] running on 1 cpus (have 4)
425.948416 extract_ip_range [367] range is 10.0.0.1:0 to 10.0.0.1:0
425.948435 extract_ip_range [367] range is 10.1.0.1:0 to 10.1.0.1:0
426.007359 main [2148] mapped 334980KB at 0x801800000
Receiving from netmap:ncxl0: 2 queues, 1 threads and 1 cpus.
426.007441 main [2235] Wait 2 secs for phy reset
428.027495 main [2237] Ready...
456.499220 main_thread [1720] 4.701 Mpps (4.703 Mpkts 2.258 Gbps in 1000463 usec) 8.00 avg_batch 24 min_space
457.505129 main_thread [1720] 4.710 Mpps (4.738 Mpkts 2.274 Gbps in 1005909 usec) 8.00 avg_batch 24 min_space
458.505221 main_thread [1720] 4.705 Mpps (4.705 Mpkts 2.258 Gbps in 1000092 usec) 8.00 avg_batch 24 min_space
459.506715 main_thread [1720] 4.774 Mpps (4.782 Mpkts 2.295 Gbps in 1001495 usec) 8.00 avg_batch 21 min_space
460.509489 main_thread [1720] 4.961 Mpps (4.974 Mpkts 2.388 Gbps in 1002773 usec) 8.00 avg_batch 16 min_space
461.510218 main_thread [1720] 4.987 Mpps (4.990 Mpkts 2.395 Gbps in 1000729 usec) 8.00 avg_batch 16 min_space
462.511226 main_thread [1720] 4.931 Mpps (4.936 Mpkts 2.369 Gbps in 1001008 usec) 8.00 avg_batch 16 min_space
^C462.865617 sigint_h [404] received control-C on thread 0x801406800
463.519966 main_thread [1720] 1.837 Mpps (1.853 Mpkts 889.275 Mbps in 1008741 usec) 8.00 avg_batch 23 min_space
Received 36200232 packets 2172013920 bytes 4525050 events 60 bytes each in 7.48 seconds.
Speed: 4.840 Mpps Bandwidth: 2.323 Gbps (raw 3.253 Gbps). Average batch: 8.00 pkts

so, the smaller the batch, the lower the performance.
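
to put the six receiver runs side by side, here is a sketch that recomputes the rate, the average batch and the event rate from the "Received ..." summary lines above (numbers copied verbatim from this mail; the summarize helper is a name made up for illustration). Note that the overall averages for -b 32 and -b 16 include intervals that report 0.000 pps, so they understate the per-second rates seen while traffic was arriving:

```python
# (batch label, packets, events, seconds) from the "Received ..." lines above.
RUNS = [
    ("default", 38091031, 196774, 3.41),
    ("64", 64430028, 1041000, 5.86),
    ("48", 48520680, 1020880, 4.68),
    ("32", 103781019, 3259138, 13.61),
    ("16", 82176988, 5136173, 12.71),
    ("8", 36200232, 4525050, 7.48),
]


def summarize(runs):
    """Derive rate, packets per event and events per second for each run."""
    rows = []
    for label, packets, events, seconds in runs:
        rows.append({
            "batch": label,
            "mpps": packets / seconds / 1e6,
            "avg_batch": packets / events,
            "kevents_per_s": events / seconds / 1e3,
        })
    return rows


for row in summarize(RUNS):
    print(f"batch {row['batch']:>7}: {row['mpps']:6.3f} Mpps, "
          f"{row['avg_batch']:7.2f} pkts/event, "
          f"{row['kevents_per_s']:6.1f} Kevents/s")
```

the event rate goes from about 58K events per second with the default batch to about 605K with -b 8, while the packet rate drops from about 11.2 to 4.8 Mpps.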

did you expect some other behaviour?

thank you very much again



