Date: Fri, 19 Aug 2011 17:21:08 +0900
From: Takuya ASADA <syuu@dokukino.com>
To: net@freebsd.org
Subject: Re: Multiqueue support for bpf
Message-ID: <CALG4x-Vvu=LsRjdvaz19+_QTr2uAqcY514OxA3dy=L+nY-qV5g@mail.gmail.com>
In-Reply-To: <CALG4x-VFC0yJK_dB9Z+DoBvBv1FGjOuVYWd=jtTBs0FeArjALg@mail.gmail.com>
References: <CALG4x-VwhLmnh+Rq0T8zdzp=yMD8o_WQ64_eqzc_dEhF-_mrGA@mail.gmail.com>
 <2AB05A3E-BDC3-427D-B4A7-ABDDFA98D194@dudu.ro>
 <0BB87D28-3094-422D-8262-5FA0E40BFC7C@dudu.ro>
 <CALG4x-VFC0yJK_dB9Z+DoBvBv1FGjOuVYWd=jtTBs0FeArjALg@mail.gmail.com>
Any comments or suggestions?

2011/8/18 Takuya ASADA <syuu@dokukino.com>:
> 2011/8/16 Vlad Galu <dudu@dudu.ro>:
>> On Aug 16, 2011, at 11:50 AM, Vlad Galu wrote:
>>> On Aug 16, 2011, at 11:13 AM, Takuya ASADA wrote:
>>>> Hi all,
>>>>
>>>> I implemented multiqueue support for bpf, and I'd like to present it for review.
>>>> This is a Google Summer of Code project; the goal is to support
>>>> multiqueue network interfaces in BPF and to provide interfaces for
>>>> multithreaded packet processing using BPF.
>>>> Modern high-performance NICs have multiple receive/send queues and an
>>>> RSS feature, which allows packets to be processed concurrently on
>>>> multiple processors.
>>>> The main purpose of the project is to support such hardware and to
>>>> benefit from the parallelism.
>>>>
>>>> This provides the following new APIs:
>>>> - queue filter for each bpf descriptor (bpf ioctl)
>>>>     - BIOCENAQMASK    Enable the multiqueue filter on the descriptor
>>>>     - BIOCDISQMASK    Disable the multiqueue filter on the descriptor
>>>>     - BIOCSTRXQMASK   Set the mask bit for the specified RX queue
>>>>     - BIOCCRRXQMASK   Clear the mask bit for the specified RX queue
>>>>     - BIOCGTRXQMASK   Get the mask bit for the specified RX queue
>>>>     - BIOCSTTXQMASK   Set the mask bit for the specified TX queue
>>>>     - BIOCCRTXQMASK   Clear the mask bit for the specified TX queue
>>>>     - BIOCGTTXQMASK   Get the mask bit for the specified TX queue
>>>>     - BIOCSTOTHERMASK Set the mask bit for packets not tied to any queue
>>>>     - BIOCCROTHERMASK Clear the mask bit for packets not tied to any queue
>>>>     - BIOCGTOTHERMASK Get the mask bit for packets not tied to any queue
>>>>
>>>> - generic interface for getting hardware queue information from the
>>>>   NIC driver (socket ioctl)
>>>>     - SIOCGIFQLEN        Get interface RX/TX queue length
>>>>     - SIOCGIFRXQAFFINITY Get interface RX queue affinity
>>>>     - SIOCGIFTXQAFFINITY Get interface TX queue affinity
>>>>
>>>> The patch for -CURRENT is here; right now it only supports igb(4),
>>>> ixgbe(4), and mxge(4):
>>>> http://www.dokukino.com/mq_bpf_20110813.diff
>>>>
>>>> And below is a performance benchmark:
>>>>
>>>> ====
>>>> I implemented benchmark programs based on
>>>> bpfnull(//depot/projects/zcopybpf/utils/bpfnull/).
>>>>
>>>> test_sqbpf measures bpf throughput on one thread, without using the multiqueue APIs.
>>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_sqbpf/test_sqbpf.c
>>>>
>>>> test_mqbpf is a multithreaded version of test_sqbpf, using the multiqueue APIs.
>>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_mqbpf/test_mqbpf.c
>>>>
>>>> I benchmarked with six conditions:
>>>> - benchmark1 only reads bpf, doesn't write packets anywhere
>>>> - benchmark2 writes packets to memory (mfs)
>>>> - benchmark3 writes packets to hdd (zfs)
>>>> - benchmark4 only reads bpf, doesn't write packets anywhere, with zerocopy
>>>> - benchmark5 writes packets to memory (mfs), with zerocopy
>>>> - benchmark6 writes packets to hdd (zfs), with zerocopy
>>>>
>>>> From the benchmark results, I can say the performance is increased by
>>>> using mq_bpf on 10GbE, but not on GbE.
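To make the queue-filter ioctls quoted above concrete, here is a minimal usage sketch that attaches one descriptor to a single RX queue. It is illustration only, not code from the patch: the interface name "ix0" is just an example, and the argument conventions assumed here (no argument for BIOCENAQMASK, a plain uint32_t queue index for BIOCSTRXQMASK) may not match the actual definitions in the diff.

/*
 * Minimal sketch: attach one bpf descriptor to a single RX queue with
 * the new queue-filter ioctls.  Illustration only, not code from the
 * patch -- the argument conventions (no argument for BIOCENAQMASK, a
 * uint32_t queue index for BIOCSTRXQMASK) and the interface name "ix0"
 * are assumptions.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct ifreq ifr;
	uint32_t rxq = 0;	/* accept packets from RX queue 0 only */
	int fd;

	if ((fd = open("/dev/bpf", O_RDONLY)) == -1)
		err(1, "open(/dev/bpf)");

	/* Bind the descriptor to the interface as usual. */
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, "ix0", sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1)
		err(1, "BIOCSETIF");

	/* Enable the per-descriptor multiqueue filter... */
	if (ioctl(fd, BIOCENAQMASK, NULL) == -1)
		err(1, "BIOCENAQMASK");
	/* ...and set the mask bit for RX queue 0. */
	if (ioctl(fd, BIOCSTRXQMASK, &rxq) == -1)
		err(1, "BIOCSTRXQMASK");

	printf("descriptor now sees only RX queue %u traffic\n", rxq);
	close(fd);
	return (0);
}

TX queues and the "other" mask work the same way through BIOCSTTXQMASK and BIOCSTOTHERMASK. A real program would first ask the driver how many queues exist and which CPU services each one via SIOCGIFQLEN and SIOCGIFRXQAFFINITY; their argument layouts are not shown in this mail, so they are left out of the sketch.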
>>>>
>>>> * Throughput benchmark
>>>> - Test environment
>>>>   - FreeBSD node
>>>>     CPU: Core i7 X980 (12 threads)
>>>>     MB: ASUS P6X58D Premium (Intel X58)
>>>>     NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>>>     NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>>>   - Linux node
>>>>     CPU: Core 2 Quad (4 threads)
>>>>     MB: GIGABYTE GA-G33-DS3R (Intel G33)
>>>>     NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>>>     NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>>>
>>>> iperf was used to generate network traffic, with the following options:
>>>> - Linux node: iperf -c [IP] -i 10 -t 100000 -P12
>>>> - FreeBSD node: iperf -s
>>>> # 12 threads, TCP
>>>>
>>>> The following sysctl parameter was changed:
>>>> sysctl -w net.bpf.maxbufsize=1048576
>>>
>>> Thank you for your work! You may want to increase that (4x/8x) and
>>> rerun the test, though.
>>
>> More, actually. Your current buffer is easily filled.
>
> Hi,
>
> I measured performance again with maxbufsize = 268435456 and multiple
> CPU configurations; here are the results.
> It seems the performance on 10GbE is a bit unstable and does not scale
> linearly as CPUs/queues are added.
> Maybe it depends on some system parameter, but I haven't figured out
> the answer.
>
> In any case, multithreaded BPF performance is higher than
> single-threaded BPF in every configuration.
>
> * Test environment
>   - FreeBSD node
>     CPU: Core i7 X980 (12 threads)
>     # Tested in 1-, 2-, 4- and 6-core configurations (each core runs
>       2 threads with HT)
>     MB: ASUS P6X58D Premium (Intel X58)
>     NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
>
>   - Linux node
>     CPU: Core 2 Quad (4 threads)
>     MB: GIGABYTE GA-G33-DS3R (Intel G33)
>     NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
>
>   - iperf
>     Linux node: iperf -c [IP] -i 10 -t 100000 -P16
>     FreeBSD node: iperf -s
>     # 16 threads, TCP
>   - system parameters
>     net.bpf.maxbufsize=268435456
>     hw.ixgbe.num_queues=[n queues]
>
> * 2 threads, 2 queues
>   - iperf throughput
>     iperf only: 8.845Gbps
>     test_mqbpf: 5.78Gbps
>     test_sqbpf: 6.89Gbps
>   - test program throughput
>     test_mqbpf: 4526.863414 Mbps
>     test_sqbpf: 762.452475 Mbps
>   - received/dropped
>     test_mqbpf:
>       45315011 packets received (BPF)
>       9646958 packets dropped (BPF)
>     test_sqbpf:
>       56216145 packets received (BPF)
>       49765127 packets dropped (BPF)
>
> * 4 threads, 4 queues
>   - iperf throughput
>     iperf only: 3.03Gbps
>     test_mqbpf: 2.49Gbps
>     test_sqbpf: 2.57Gbps
>   - test program throughput
>     test_mqbpf: 2420.195051 Mbps
>     test_sqbpf: 430.774870 Mbps
>   - received/dropped
>     test_mqbpf:
>       19601503 packets received (BPF)
>       0 packets dropped (BPF)
>     test_sqbpf:
>       22803778 packets received (BPF)
>       18869653 packets dropped (BPF)
>
> * 8 threads, 8 queues
>   - iperf throughput
>     iperf only: 5.80Gbps
>     test_mqbpf: 4.42Gbps
>     test_sqbpf: 4.30Gbps
>   - test program throughput
>     test_mqbpf: 4242.314913 Mbps
>     test_sqbpf: 1291.719866 Mbps
>   - received/dropped
>     test_mqbpf:
>       34996953 packets received (BPF)
>       361947 packets dropped (BPF)
>     test_sqbpf:
>       35738058 packets received (BPF)
>       24749546 packets dropped (BPF)
>
> * 12 threads, 12 queues
>   - iperf throughput
>     iperf only: 9.31Gbps
>     test_mqbpf: 8.06Gbps
>     test_sqbpf: 5.67Gbps
>   - test program throughput
>     test_mqbpf: 8089.242472 Mbps
>     test_sqbpf: 5754.910665 Mbps
>   - received/dropped
>     test_mqbpf:
>       73783957 packets received (BPF)
>       9938 packets dropped (BPF)
>     test_sqbpf:
>       49434479 packets received (BPF)
>       0 packets dropped (BPF)
>
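For a rough picture of why test_mqbpf scales while test_sqbpf drops packets, its per-queue structure looks something like the sketch below: one reader thread per RX queue, each pinned to the CPU serving its queue and reading from its own descriptor. This is a simplified illustration rather than the actual test program: the queue count and the queue-to-CPU mapping are hard-coded placeholders (a real program would query SIOCGIFQLEN and SIOCGIFRXQAFFINITY, whose argument layouts are not shown in this mail), and the ioctl argument conventions are the same assumptions as in the earlier sketch.

/*
 * Simplified per-queue reader in the spirit of test_mqbpf: one thread
 * per RX queue, pinned to the CPU that services that queue.  NOT the
 * actual test program -- the queue count and the queue->CPU mapping
 * are placeholders (really from SIOCGIFQLEN/SIOCGIFRXQAFFINITY), and
 * error handling is minimal.
 */
#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define	NQUEUES		12	/* placeholder: really from SIOCGIFQLEN */
#define	IFNAME		"ix0"	/* example interface */

/* Placeholder queue->CPU mapping: really from SIOCGIFRXQAFFINITY. */
static int
rxq_cpu(int qidx)
{
	return (qidx);
}

static void *
worker(void *arg)
{
	int qidx = (int)(intptr_t)arg;
	uint32_t q = qidx;
	struct ifreq ifr;
	cpuset_t mask;
	u_int blen;
	char *buf;
	int fd;

	/* Pin this thread to the CPU that services its RX queue. */
	CPU_ZERO(&mask);
	CPU_SET(rxq_cpu(qidx), &mask);
	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(mask), &mask) == -1)
		err(1, "cpuset_setaffinity");

	/* One bpf descriptor per thread, filtered to this thread's queue. */
	if ((fd = open("/dev/bpf", O_RDONLY)) == -1)
		err(1, "open(/dev/bpf)");
	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, IFNAME, sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1)
		err(1, "BIOCSETIF");
	if (ioctl(fd, BIOCENAQMASK, NULL) == -1)
		err(1, "BIOCENAQMASK");
	if (ioctl(fd, BIOCSTRXQMASK, &q) == -1)
		err(1, "BIOCSTRXQMASK");

	/* bpf requires reads of exactly the configured buffer size. */
	if (ioctl(fd, BIOCGBLEN, &blen) == -1)
		err(1, "BIOCGBLEN");
	if ((buf = malloc(blen)) == NULL)
		err(1, "malloc");
	while (read(fd, buf, blen) > 0)
		;	/* walk the bpf headers and count packets here */
	return (NULL);
}

int
main(void)
{
	pthread_t tid[NQUEUES];
	int i;

	for (i = 0; i < NQUEUES; i++)
		pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
	for (i = 0; i < NQUEUES; i++)
		pthread_join(tid[i], NULL);
	return (0);
}

Build with something like "cc -o mqreader mqreader.c -lpthread" on a kernel with the patch applied. Because each thread only sees its own queue's share of the traffic, the per-descriptor buffers fill far more slowly, which is consistent with the low drop counts test_mqbpf shows even in the 12-queue run above.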
