Date:      Sun, 29 Jun 2008 11:26:25 -0400
From:      Paul <paul@gtcomm.net>
To:        Ingo Flaschberger <if@xip.at>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, andrew@modulus.org
Subject:   Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
Message-ID:  <4867A9A1.9070507@gtcomm.net>
In-Reply-To: <alpine.LFD.1.10.0806291255480.7208@filebunker.xip.at>
References:  <4867420D.7090406@gtcomm.net> <alpine.LFD.1.10.0806291255480.7208@filebunker.xip.at>

Polling makes no difference. It uses the CPUs in a slightly different
way, but the pps rate is similar.
I tried different HZ settings, and I edited kern_poll so I could have a
burst max of 8000. Polling doesn't help any more. The only thing
I noticed is that it lowers packet latency when the CPU is idle
(i.e. not many pps going through).
The hardware is a dual Opteron 2212 with a dual-port Intel PCI Express
server NIC.
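
As a rough sanity check on the polling settings, here is a minimal Python sketch of the per-interface polling budget. It assumes a simplified model in which each clock tick polls an interface once and takes at most burst_max packets, so the ceiling is HZ * burst_max. The function name is made up for illustration, HZ = 4000 is the value from the quoted message below, and idle-time polling and per-pass chunking are ignored.

```python
def polling_pps_ceiling(hz: int, burst_max: int) -> int:
    """Upper bound on packets/sec for one interface under the simplified model:
    one poll pass per clock tick, at most burst_max packets per pass."""
    return hz * burst_max


HZ = 4000               # from the quoted message ("HZ is 4000")
BURST_MAX = 8000        # the edited kern_poll burst max mentioned above
OBSERVED_PPS = 400_000  # approximate forwarding ceiling reported in this thread

ceiling = polling_pps_ceiling(HZ, BURST_MAX)
headroom = ceiling / OBSERVED_PPS

if __name__ == "__main__":
    print(f"polling ceiling: {ceiling} pps, {headroom:.0f}x the observed rate")
```

Under that model the poll budget sits far above the observed rate, which would fit polling making no difference here: the limit looks more like per-packet CPU cost than the number of packets a poll pass is allowed to take.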

em0: Flow control watermarks high = 30720 low = 29220
em0: tx_int_delay = 66, tx_abs_int_delay = 66
em0: rx_int_delay = 64, rx_abs_int_delay = 98
em0: fifo workaround = 0, fifo_reset_count = 0


          input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    384700 32541   23851406        457     0      24876     0
    385543 25403   23903670        459     0      24910     0
    383906 24245   23802188        435     0      23738     0
    383656 25182   23786676        427     0      23128     0


em0: tx_int_delay = 66, tx_abs_int_delay = 66
em0: rx_int_delay = 0, rx_abs_int_delay = 98
em0: fifo workaround = 0, fifo_reset_count = 0

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    393787 11217   24414800        461     0      25012     0
    390227 16909   24194076        439     0      23776     0
    389938 15321   24176158        433     0      23506     0
    388685 19562   24098474        449     0      24370     0
    392908 11242   24360300        465     0      25234     0
    387329 19426   24014402        440     0      23938     0
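
To put the two samples above side by side, here is a small Python sketch that averages the input columns and estimates a drop ratio. It assumes netstat's errs column counts frames dropped in addition to those in the packets column, so offered load is packets + errs; the rx_int_delay labels come from the sysctl output printed before each sample, and the names RX_DELAY_64, RX_DELAY_0 and summarize are only for illustration.

```python
# Input-side rows copied from the two netstat samples above:
# (packets, errs, bytes) per one-second interval on em0.
RX_DELAY_64 = [
    (384700, 32541, 23851406),
    (385543, 25403, 23903670),
    (383906, 24245, 23802188),
    (383656, 25182, 23786676),
]
RX_DELAY_0 = [
    (393787, 11217, 24414800),
    (390227, 16909, 24194076),
    (389938, 15321, 24176158),
    (388685, 19562, 24098474),
    (392908, 11242, 24360300),
    (387329, 19426, 24014402),
]


def summarize(rows):
    """Average the per-second counters and estimate the share of dropped frames."""
    packets = sum(r[0] for r in rows)
    errs = sum(r[1] for r in rows)
    nbytes = sum(r[2] for r in rows)
    return {
        "avg_pps": packets / len(rows),
        "avg_errs": errs / len(rows),
        # Assumption: errs are frames lost on top of the counted packets.
        "drop_ratio": errs / (packets + errs),
        "bytes_per_packet": nbytes / packets,
    }


if __name__ == "__main__":
    for label, rows in (("rx_int_delay=64", RX_DELAY_64), ("rx_int_delay=0", RX_DELAY_0)):
        s = summarize(rows)
        print(f"{label}: {s['avg_pps']:.0f} pps, {s['avg_errs']:.0f} errs/s, "
              f"{s['drop_ratio']:.1%} dropped, {s['bytes_per_packet']:.1f} bytes/pkt")
```

With only four and six one-second samples this is an indication rather than a measurement, but it makes the change in the errs column easier to compare than reading the raw rows.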


Ingo Flaschberger wrote:
> Dear Paul,
>
> tried interface polling?
>
> what hardware system? how are the nic's connected?
>
> Kind regards,
>         ingo flaschberger
>
> geschaeftsleitung
> ---------------------------
> netstorage-crossip-flat:fee
> powered by
> crossip communications gmbh
> ---------------------------
> sebastian kneipp gasse 1
> a-1020 wien
> fix: +43-1-726 15 22-217
> fax: +43-1-726 15 22-111
> ---------------------------
> On Sun, 29 Jun 2008, Paul wrote:
>
>> This is just a question, but who can get more than 400k pps forwarding
>> performance?
>> I have tested FreeBSD 6/7/8 so far with many different configs (all
>> using Intel PCIe NICs and SMP).
>> FreeBSD 7-stable/8 (current) seem to be the fastest and always hit this
>> ceiling of 400k pps. As soon as it hits that I get errors galore:
>> received no buffers, missed packets, rx overruns. It's because 'em0
>> taskq' is at 90% CPU or so.
>> Now, while this is happening I have two CPUs 100% idle, and the
>> other two CPUs are about 60%/20%.
>> So why in the world can't it use more CPUs? Simple test setup:
>> packet generator on em0
>> destination out em1
>> have to have ip forwarding and fastforwarding on (fastforward 
>> definitely makes a big difference, another 100kpps or so, without it 
>> can barely hit 300k)
>> Packets are TCP, randomized sources, randomized ports for src and 
>> dst, single destination ip.
>> I even tried the yandex driver in FBSD6 but it could barely even get 
>> 200k pps and it had a lot of weird issues, and fbsd6 couldn't hit 
>> 400k pps by itself.
>> I am not using polling, that seems to make no difference, i tried 
>> that too.
>> So question. What can I do for more performance (SMP)?  Are there any 
>> good kernel options?
>> If I disable ip forwarding i can do 750kpps with no errors because 
>> it's not going anywhere..em0 taskq cpu usage is less than half of 
>> what it is when it's forwarding.  so obviously the issue is somewhere 
>> in the forwarding path and fastforwarding greatly helps!! see below.
>> forwarding off:
>>           input          (em0)           output
>>  packets  errs      bytes    packets  errs      bytes colls
>>   757223     0   46947830          1     0        226     0
>>   753551     0   46720166          1     0        178     0
>>   756359     0   46894262          1     0        178     0
>>   757570     0   46969344          1     0        178     0
>>   753724     0   46730830          1     0        178     0
>>   745372     0   46213130          1     0        178     0
>>
>>
>> (I had to slow down the packet generation to about 420-430kpps)
>> forwarding on:
>>          input          (em0)           output
>>  packets  errs      bytes    packets  errs      bytes colls
>>   285918 151029   17726936        460     0      25410     0
>>   284929 146151   17665602        417     0      22642     0
>>   284253 147000   17623690        442     0      23884     0
>>   285438 147765   17697160        448     0      24316     0
>>   286582 147171   17768088        456     0      24748     0
>>   287194 147088   17806032        422     0      22912     0
>>   285812 141713   17720348        440     0      23884     0
>>   284958 137579   17667412        457     0      25104     0
>>
>> fastforwarding on:
>>
>>         input          (em0)           output
>>  packets  errs      bytes    packets  errs      bytes colls
>>   399795 22790   24787310        459     0      25130     0
>>   397425 25254   24640354        434     0      23560     0
>>   403223 26937   24999830        431     0      23452     0
>>   396587 21431   24588398        467     0      25288     0
>>   400970 25776   24860144        459     0      24910     0
>>   397819 23657   24664782        432     0      23452     0
>>   406222 27418   25185768        432     0      23506     0
>>   406718 12407   25216520        461     0      25018     0
>>
>> PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>  11 root     171 ki31     0K    64K CPU1   1  29:24 100.00% {idle: cpu1}
>>  11 root     171 ki31     0K    64K RUN    0  28:46 100.00% {idle: cpu0}
>>  11 root     171 ki31     0K    64K CPU3   3  24:32 84.62% {idle: cpu3}
>>   0 root     -68    0     0K   128K CPU2   2  12:59 84.13% {em0 taskq}
>>   0 root     -68    0     0K   128K -      3   2:12 19.92% {em1 taskq}
>>  11 root     171 ki31     0K    64K RUN    2  19:46 19.63% {idle: cpu2}
>>
>>
>>
>> Well if anything.. at least it's a good show of the difference 
>> fastforwarding makes!! :)
>> I have
>> options         NO_ADAPTIVE_MUTEXES     ## Improve routing performance?
>> options         STOP_NMI                # Stop CPUS using NMI instead of IPI
>> no IPV6
>> no firewall loaded
>> no netgraph
>> HZ is 4000
>> em driver is 4096 on receive buffers
>> using VLAN devices (em1 output)
>> Tested on Xeon and Opteron processor
>> Don't have exact results.
>> Above results are dual opteron 2212 with freebsd current
>> FreeBSD 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Jun 28 23:37:39 CDT 2008
>> Well I'm curious of the results of others..
>>
>> Thanks for reading!! :)
>>
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>



