From owner-freebsd-net@FreeBSD.ORG Fri Aug 19 08:21:49 2011
From: Takuya ASADA
Date: Fri, 19 Aug 2011 17:21:08 +0900
To: net@freebsd.org
Subject: Re: Multiqueue support for bpf
List-Id: Networking and TCP/IP with FreeBSD

Any comments or suggestions?

2011/8/18 Takuya ASADA :
> 2011/8/16 Vlad Galu :
>> On Aug 16, 2011, at 11:50 AM, Vlad Galu wrote:
>>> On Aug 16, 2011, at 11:13 AM, Takuya ASADA wrote:
>>>> Hi all,
>>>>
>>>> I have implemented multiqueue support for bpf and would like to
>>>> present it for review.
>>>> This is a Google Summer of Code project; the goal is to support
>>>> multiqueue network interfaces in BPF and to provide interfaces for
>>>> multithreaded packet processing using BPF.
>>>> Modern high-performance NICs have multiple receive/send queues and
>>>> an RSS feature, which allows packets to be processed concurrently
>>>> on multiple processors.
>>>> The main purpose of the project is to support such hardware and to
>>>> take advantage of this parallelism.
>>>>
>>>> This provides the following new APIs:
>>>> - queue filter for each bpf descriptor (bpf ioctl)
>>>>   - BIOCENAQMASK    Enable the multiqueue filter on the descriptor
>>>>   - BIOCDISQMASK    Disable the multiqueue filter on the descriptor
>>>>   - BIOCSTRXQMASK   Set the mask bit for the specified RX queue
>>>>   - BIOCCRRXQMASK   Clear the mask bit for the specified RX queue
>>>>   - BIOCGTRXQMASK   Get the mask bit for the specified RX queue
>>>>   - BIOCSTTXQMASK   Set the mask bit for the specified TX queue
>>>>   - BIOCCRTXQMASK   Clear the mask bit for the specified TX queue
>>>>   - BIOCGTTXQMASK   Get the mask bit for the specified TX queue
>>>>   - BIOCSTOTHERMASK Set the mask bit for packets not tied to any queue
>>>>   - BIOCCROTHERMASK Clear the mask bit for packets not tied to any queue
>>>>   - BIOCGTOTHERMASK Get the mask bit for packets not tied to any queue
>>>>
>>>> - generic interface for getting hardware queue information from the
>>>>   NIC driver (socket ioctl)
>>>>   - SIOCGIFQLEN         Get the interface RX/TX queue length
>>>>   - SIOCGIFRXQAFFINITY  Get the interface RX queue affinity
>>>>   - SIOCGIFTXQAFFINITY  Get the interface TX queue affinity
>>>>
>>>> A patch for -CURRENT is here; right now it only supports igb(4),
>>>> ixgbe(4), and mxge(4):
>>>> http://www.dokukino.com/mq_bpf_20110813.diff
>>>>
>>>> And below is a performance benchmark:
>>>>
>>>> ====
>>>> I implemented benchmark programs based on
>>>> bpfnull (//depot/projects/zcopybpf/utils/bpfnull/).
>>>>
>>>> test_sqbpf measures bpf throughput in a single thread, without using
>>>> the multiqueue APIs.
>>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_sqbpf/test_sqbpf.c
>>>>
>>>> test_mqbpf is a multithreaded version of test_sqbpf that uses the
>>>> multiqueue APIs.
>>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_mqbpf/test_mqbpf.c
>>>>
>>>> I benchmarked under six conditions:
>>>> - benchmark1 only reads from bpf and does not write packets anywhere
>>>> - benchmark2 writes packets to memory (mfs)
>>>> - benchmark3 writes packets to hdd (zfs)
>>>> - benchmark4 only reads from bpf and does not write packets anywhere, with zerocopy
>>>> - benchmark5 writes packets to memory (mfs), with zerocopy
>>>> - benchmark6 writes packets to hdd (zfs), with zerocopy
>>>>
>>>> From the benchmark results, I can say that performance improves with
>>>> mq_bpf on 10GbE, but not on GbE.
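To make the per-descriptor queue filter concrete, here is a minimal sketch of
how the ioctls listed above might be used to bind one bpf descriptor to a
single RX queue. The third-argument conventions for the new ioctls (no data
for BIOCENAQMASK, a u_int queue index passed by pointer for BIOCSTRXQMASK)
are assumptions on my part; the patch above is authoritative.

/*
 * Sketch only: open one bpf descriptor that sees traffic from a single
 * RX queue of the given interface.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <net/bpf.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int
open_queue_reader(const char *ifname, u_int rxq)
{
	struct ifreq ifr;
	int fd;

	if ((fd = open("/dev/bpf", O_RDONLY)) < 0)
		return (-1);

	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) < 0)
		goto fail;

	/* Turn on the multiqueue filter, then accept only RX queue rxq. */
	if (ioctl(fd, BIOCENAQMASK, NULL) < 0 ||
	    ioctl(fd, BIOCSTRXQMASK, &rxq) < 0)
		goto fail;

	return (fd);
fail:
	close(fd);
	return (-1);
}

A program would then read(2) from one such descriptor per thread; TX queues
and the "other" mask would presumably work the same way through
BIOCSTTXQMASK and BIOCSTOTHERMASK.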
>>>>
>>>> * Throughput benchmark
>>>> - Test environment
>>>>  - FreeBSD node
>>>>    CPU: Core i7 X980 (12 threads)
>>>>    MB: ASUS P6X58D Premium (Intel X58)
>>>>    NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>>>    NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>>>  - Linux node
>>>>    CPU: Core 2 Quad (4 threads)
>>>>    MB: GIGABYTE GA-G33-DS3R (Intel G33)
>>>>    NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>>>    NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>>>
>>>> iperf was used to generate network traffic, with the following options:
>>>>  - Linux node: iperf -c [IP] -i 10 -t 100000 -P12
>>>>  - FreeBSD node: iperf -s
>>>>  # 12 threads, TCP
>>>>
>>>> The following sysctl parameter was changed:
>>>>  sysctl -w net.bpf.maxbufsize=1048576
>>>
>>> Thank you for your work! You may want to increase that (4x/8x) and
>>> rerun the test, though.
>>
>> More, actually. Your current buffer is easily filled.
>
> Hi,
>
> I measured performance again with maxbufsize = 268435456 and multiple
> CPU configurations; here are the results.
> It seems that performance on 10GbE is a bit unstable and does not
> scale linearly as CPUs/queues are added.
> Maybe it depends on some system parameter, but I have not figured out
> the answer yet.
>
> In any case, multithreaded BPF performance is higher than
> single-threaded BPF performance in every case.
>
> * Test environment
>  - FreeBSD node
>    CPU: Core i7 X980 (12 threads)
>    # Tested in 1-, 2-, 4-, and 6-core configurations (each core has 2
>    # threads using HT)
>    MB: ASUS P6X58D Premium (Intel X58)
>    NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
>
>  - Linux node
>    CPU: Core 2 Quad (4 threads)
>    MB: GIGABYTE GA-G33-DS3R (Intel G33)
>    NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
>
>  - iperf
>    Linux node: iperf -c [IP] -i 10 -t 100000 -P16
>    FreeBSD node: iperf -s
>    # 16 threads, TCP
>  - system parameters
>    net.bpf.maxbufsize=268435456
>    hw.ixgbe.num_queues=[n queues]
>
> * 2 threads, 2 queues
>  - iperf throughput
>    iperf only: 8.845 Gbps
>    test_mqbpf: 5.78 Gbps
>    test_sqbpf: 6.89 Gbps
>  - test program throughput
>    test_mqbpf: 4526.863414 Mbps
>    test_sqbpf: 762.452475 Mbps
>  - received/dropped
>    test_mqbpf:
>      45315011 packets received (BPF)
>      9646958 packets dropped (BPF)
>    test_sqbpf:
>      56216145 packets received (BPF)
>      49765127 packets dropped (BPF)
>
> * 4 threads, 4 queues
>  - iperf throughput
>    iperf only: 3.03 Gbps
>    test_mqbpf: 2.49 Gbps
>    test_sqbpf: 2.57 Gbps
>  - test program throughput
>    test_mqbpf: 2420.195051 Mbps
>    test_sqbpf: 430.774870 Mbps
>  - received/dropped
>    test_mqbpf:
>      19601503 packets received (BPF)
>      0 packets dropped (BPF)
>    test_sqbpf:
>      22803778 packets received (BPF)
>      18869653 packets dropped (BPF)
>
> * 8 threads, 8 queues
>  - iperf throughput
>    iperf only: 5.80 Gbps
>    test_mqbpf: 4.42 Gbps
>    test_sqbpf: 4.30 Gbps
>  - test program throughput
>    test_mqbpf: 4242.314913 Mbps
>    test_sqbpf: 1291.719866 Mbps
>  - received/dropped
>    test_mqbpf:
>      34996953 packets received (BPF)
>      361947 packets dropped (BPF)
>    test_sqbpf:
>      35738058 packets received (BPF)
>      24749546 packets dropped (BPF)
>
> * 12 threads, 12 queues
>  - iperf throughput
>    iperf only: 9.31 Gbps
>    test_mqbpf: 8.06 Gbps
>    test_sqbpf: 5.67 Gbps
>  - test program throughput
>    test_mqbpf: 8089.242472 Mbps
>    test_sqbpf: 5754.910665 Mbps
>  - received/dropped
>    test_mqbpf:
>      73783957 packets received (BPF)
>      9938 packets dropped (BPF)
>    test_sqbpf:
>      49434479 packets received (BPF)
>      0 packets dropped (BPF)
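For reference, the general shape of a test_mqbpf-style reader is sketched
below (the real benchmark source is linked earlier in the thread): one
thread per RX queue, each pinned to a CPU before entering its read loop.
The 1:1 queue-to-CPU mapping, the fixed queue count, the interface name
"ix0", and the buffer size are simplifying assumptions; SIOCGIFQLEN and
SIOCGIFRXQAFFINITY are the intended way to discover the real values.
open_queue_reader() is the helper from the earlier sketch.

#include <sys/types.h>
#include <sys/cpuset.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

int open_queue_reader(const char *, u_int);	/* helper from the sketch above */

static void *
queue_reader(void *arg)
{
	u_int q = (u_int)(uintptr_t)arg;
	char buf[65536];	/* real code sizes this via BIOCGBLEN */
	cpuset_t mask;
	int fd;

	/* Pin this thread to CPU q (assumed 1:1 queue-to-CPU mapping). */
	CPU_ZERO(&mask);
	CPU_SET(q, &mask);
	cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(mask), &mask);

	if ((fd = open_queue_reader("ix0", q)) < 0)
		return (NULL);
	while (read(fd, buf, sizeof(buf)) > 0)
		;	/* count or store captured packets here */
	close(fd);
	return (NULL);
}

int
main(void)
{
	pthread_t tid[12];
	u_int q, nq = 12;	/* ideally taken from SIOCGIFQLEN */

	for (q = 0; q < nq; q++)
		pthread_create(&tid[q], NULL, queue_reader, (void *)(uintptr_t)q);
	for (q = 0; q < nq; q++)
		pthread_join(tid[q], NULL);
	return (0);
}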