Date: Fri, 24 Jan 2014 15:56:17 +0100
From: Vincenzo Maffione <v.maffione@gmail.com>
To: Wang Weidong <wangweidong1@huawei.com>
Cc: facoltà <giuseppe.lettieri73@gmail.com>, Giuseppe Lettieri <g.lettieri@iet.unipi.it>, Luigi Rizzo <rizzo@iet.unipi.it>, net@freebsd.org
Subject: Re: netmap: I got some troubles with netmap
Message-ID: <CA%2B_eA9hOzQiOWKvHOiKjY4kjxmerMWp=MhtF_vbr8t-q4V732g@mail.gmail.com>
In-Reply-To: <52E1E272.8060009@huawei.com>
References: <52D74E15.1040909@huawei.com> <CA%2BhQ2%2BjBhSyHwFsFo%2BzH-EuJEkKEcyc6YBH%2BfnEHi=Y27FyWyQ@mail.gmail.com> <92C7725B-B30A-4A19-925A-A93A2489A525@iet.unipi.it> <52D8A5E1.9020408@huawei.com> <52DD1914.7090506@iet.unipi.it> <52E1E272.8060009@huawei.com>
[-- Attachment #1 --]
2014/1/24 Wang Weidong <wangweidong1@huawei.com>

> On 2014/1/20 20:39, Giuseppe Lettieri wrote:
> > Hi Wang,
> >
> > OK, you are using the netmap support in the upstream qemu git. That
> > does not yet include all our modifications, some of which are very
> > important for high throughput with VALE. In particular, the upstream
> > qemu does not include the batching improvements in the
> > frontend/backend interface, and it does not include the "map ring"
> > optimization of the e1000 frontend. Please find attached a gzipped
> > patch that contains all of our qemu code. The patch is against the
> > latest upstream master (commit 1cf892ca).
> >
> > Please ./configure the patched qemu with the following option, in
> > addition to any other option you may need:
> >
> > --enable-e1000-paravirt --enable-netmap \
> > --extra-cflags=-I/path/to/netmap/sys/directory
> >
> > Note that --enable-e1000-paravirt is needed to enable the "map ring"
> > optimization in the e1000 frontend, even if you are not going to use
> > the e1000-paravirt device.
> >
> > Now you should be able to rerun your tests. I am also attaching a
> > README file that describes some more tests you may want to run.
> >
>
> Hello,
> Yes, I patch the qemu-netmap-bc767e701.patch to the qemu, download the
> 20131019-tinycore-netmap.hdd.
> And I do some test that:
>
> 1. I use the bridge below:
> qemu-system-x86_64 -m 2048 -boot c -net nic -net bridge,br=br1 -hda
> /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0
> test between two vms.
> br1 without device.
> Use pktgen, I got the 237.95 kpps.
> Use the netserver/netperf I got the speed 1037M bits/sec with TCP_STREAM.
> The max speed is up to 1621M.
> Use the netserver/netperf I got the speed 3296/s with TCP_RR
> Use the netserver/netperf I got the speed 234M/86M bits/sec with UDP_STREAM
>
> When I add a device from host to the br1, the speed is 159.86 kpps.
> Use the netserver/netperf I got the speed 720M bits/sec with TCP_STREAM.
> The max speed is up to 1000M.
> Use the netserver/netperf I got the speed 3556/s with TCP_RR
> Use the netserver/netperf I got the speed 181M/181M bits/sec with
> UDP_STREAM
>
> What do you think of these data?
>

You are using the old/deprecated QEMU command line syntax (-net), so
honestly it's not clear to me what kind of network configuration you
are running. Please use our scripts "launch-qemu.sh" and
"prep-taps.sh", as described in the README.images file (attached).
Alternatively, use the syntax shown in the following examples

(#1)
qemu-system-x86_64 archdisk.qcow -enable-kvm -device virtio-net-pci,netdev=mynet -netdev tap,ifname=tap01,id=mynet,script=no,downscript=no -smp 2

(#2)
qemu-system-x86_64 archdisk.qcow -enable-kvm -device e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2

so that it's clear to us what network frontend (e.g. emulated NIC) and
network backend (e.g. netmap, tap, vde, etc.) you are using.
In example #1 we are using virtio-net as frontend and tap as backend,
while in example #2 we are using e1000 as frontend and netmap as
backend.
Also consider giving more than one core (e.g. -smp 2) to each guest, to
mitigate receiver livelock problems.

>
> 2. I use the vale below:
> qemu-system-x86_64 -m 2048 -boot c -net nic -net netmap,vale0:0 -hda
> /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0
>

The same applies here: it's not clear what you are using. I guess each
guest has an e1000 device and is connected to a different port of the
same VALE switch (e.g. vale0:0 and vale0:1)?

> Test with 2 vms from the same host
> vale0 without device.
> I use the pkt-gen, the speed is 938 Kpps
>

You should get ~4 Mpps with e1000 frontend + netmap backend on a
reasonably good machine. Make sure you have ./configure'd QEMU with
--enable-e1000-paravirt.

> I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 195M/195M,
> then add -- -m 8, I only got 1.07M/1.07M.
> When use the smaller msg size, the speed will smaller?
>

If you use e1000 with netperf (without pkt-gen), your performance is
doomed to be horrible. Use e1000-paravirt (as a frontend) instead if
you are interested in netperf experiments.
Also consider that the point of using the "-- -m8" option is to
experiment with high packet rates, so what you should measure here is
not the throughput in Mbps but the packet rate: netperf reports the
number of packets sent and received, so you can obtain the packet rate
by dividing by the running time.
The throughput in Mbps is uninteresting here: if you want high bulk
throughput, don't use "-- -m 8" and leave the defaults. Using
virtio-net in this case will help because of the TSO offloadings.

cheers
Vincenzo

>
> with vale-ctl -a vale0:eth2,
> use pkt-gen, the speed is 928 Kpps
> I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 209M/208M,
> then add -- -m 8, I only got 1.06M/1.06M.
>
> with vale-ctl -h vale0:eth2,
> use pkt-gen, the speed is 928 Kpps
> I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 192M/192M,
> then add -- -m 8, I only got 1.06M/1.06M.
>
> Test with 2 vms form two host,
> I only can test it by vale-ctl -h vale0:eth2 and set eth2 into promisc
> use pkt-gen with the default params, the speed is about 750 Kpps
> use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 160M/160M
> Is this right?
>
> 3. I can't use the l2 utils.
> When I do the "sudo l2open -t eth0 l2recv[l2send], I got that "l2open
> ioctl(TUNSETIFF...): Invalid argument"
> and "use l2open -r eth0 l2recv", wait a moment (only several seconds), I
> got the result:
> TEST-RESULT: 0.901 kpps 1pkts
> select/read=100.00 err=0
>
> And I can't find the l2 utils from the net? Is it implemented by your team?
>
> All of them is tested on vms.
>
> Cheers.
> Wang
>
> >
> > Cheers,
> > Giuseppe
> >
> > On 17/01/2014 04:39, Wang Weidong wrote:
> >> On 2014/1/16 18:24, facoltà wrote:
> [...]
--
Vincenzo Maffione

[-- Attachment #2 --]
EXPERIMENTING WITH NETMAP, VALE AND FAST QEMU
---------------------------------------------

To ease experiments with Netmap, the VALE switch and our Qemu
enhancements we have prepared a couple of bootable images (linux and
FreeBSD). You can find them on the netmap page

    http://info.iet.unipi.it/~luigi/netmap/

where you can also look at more recent versions of this file.

Below are step-by-step instructions on experiments you can run with
these images. The two main versions are

    picobsd.hdd   -> FreeBSD HEAD (netmap + VALE)
    tinycore.hdd  -> Linux (qemu + netmap + VALE)

Booting the image
-----------------

For all experiments you need to copy the image on a USB stick and boot
a PC with it. Alternatively, you can use the image with VirtualBox,
Qemu or other emulators, as an example

    qemu-system-x86_64 -hda IMAGE_FILE -m 1G -machine accel=kvm ...

(remove 'accel=kvm' if your host does not support kvm).
The images do not install anything on the hard disk.

Both systems have preloaded drivers for a number of network cards
(including the intel 10 Gbit ones) with netmap extensions. The VALE
switch is also available (it is part of the netmap module). ssh, scp
and a few other utilities are also included.

FreeBSD image:

  + the OS boots directly in console mode, you can switch between
    terminals with ALT-Fn. The password for the 'root' account is
    'setup'

  + if you are connected to a network, you can use

        dhclient em0    # or other interface name

    to obtain an IP address and external connectivity.

Linux image:

  + in addition to the netmap/VALE modules, the KVM kernel module is
    also preloaded.

  + the boot-loader gives you two main options (each with a variant to
    delay boot in case you have slow devices):

    + "Boot TinyCore" boots in an X11 environment as user 'tc'. You
      can create a few terminals using the icon at the bottom. You can
      use "sudo -s" to get root access.
      In case no suitable video card is available/detected, it falls
      back to command line mode.

    + "Boot Core (command line only)" boots in console mode with
      virtual terminals. You're automatically logged in as user 'tc'.
      To log in on the other terminals use the same username (no
      password required).

  + The system should automatically recognize the existing ethernet
    devices, and load the appropriate netmap-capable device drivers
    when available. Interfaces are configured through DHCP when
    possible.

General test recommendations
----------------------------

NOTE: The tests outlined in the following sections can generate very
high packet rates, and some hardware misconfiguration problems may
prevent you from achieving maximum speed. Common problems are:

  + slow link autonegotiation.
    Our programs typically wait 2-4 seconds for link negotiation to
    complete, but some NIC/switch combinations are much slower. In
    this case you should increase the delay (pkt-gen has the -w XX
    option for that) or possibly force the link speed and duplex mode
    on both sides.

    Check the link speed to make sure there are no negotiation
    problems, and that you see the expected speed.

        ethtool IFNAME     # on linux
        ifconfig IFNAME    # on FreeBSD

  + ethernet flow control.
    If the receiving port is slow (often the case in the presence of
    multicast/broadcast traffic, or also unicast if you are sending to
    non-netmap receivers), it will generate ethernet flow control
    frames that throttle down the sender.

    We recommend disabling BOTH RX and TX ethernet flow control on
    BOTH sender and receiver. On Linux this can be done with ethtool:

        ethtool -A IFNAME tx off rx off

    whereas on FreeBSD there are device-specific sysctls

        sysctl dev.ix.0.queue0.flow_control=0

  + CPU power saving.
    The CPU governor on linux, or the equivalent in FreeBSD, tends to
    throttle down the clock rate, reducing performance. Unlike other
    similar systems, netmap does not have busy-wait loops, so the CPU
    load is generally low and this can trigger the clock slowdown.
    Make sure that ALL CPUs run at maximum speed by disabling the
    dynamic frequency-scaling mechanisms.

        cpufreq-set -gperformance     # on linux
        sysctl dev.cpu.0.freq=3401    # on FreeBSD.

  + wrong MAC address.
    netmap does not put the NIC in promiscuous mode, so unless the
    application does it, the NIC will only receive broadcast traffic
    or unicast directed to its own MAC address.

STANDARD SOCKET TESTS
---------------------

For most socket-based experiments you can use the "netperf" tool
installed on the system (version 2.6.0). Be careful to use a matching
version for the other netperf endpoint (e.g. netserver) when running
tests between different machines.

Interesting experiments are:

    netperf -H x.y.z.w -tTCP_STREAM          # test TCP throughput
    netperf -H x.y.z.w -tTCP_RR              # test latency
    netperf -H x.y.z.w -tUDP_STREAM -- -m8   # test UDP throughput with short packets

where x.y.z.w is the host running "netserver".

RAW SOCKET AND TAP TESTS
------------------------

For experiments with raw sockets and tap devices you can use the l2
utilities (l2open, l2send, l2recv) installed on the system. With these
utilities you can send/receive custom network packets to/from raw
sockets or tap file descriptors.

The receiver can be run with one of the following commands

    l2open -r IFNAME l2recv    # receive from a raw socket attached to IFNAME
    l2open -t IFNAME l2recv    # receive from a file descriptor opened on the tap IFNAME

The receiver process will wait indefinitely for the first packet and
then keep receiving as long as packets keep coming. When the flow
stops (after a 2-second timeout) the process terminates and prints the
received packet rate and packet count.

To run the sender in an easy way, you can use the script l2-send.sh in
the home directory. This script defines several shell variables that
can be manually changed to customize the test (see the comments in the
script itself).

As an example, you can test configurations with Virtual Machines
attached to host tap devices bridged together.
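The receiver behaviour described in this section (wait for the first
packet, keep counting, stop once the flow has been silent for 2
seconds) can be sketched as follows. This is not the l2recv source: the
function name is hypothetical, it works on a list of arrival timestamps
instead of a real socket, and the way the rate is computed (intervals
between the first and last counted packet) is an assumption.

```python
def summarize_flow(arrival_times, idle_timeout=2.0):
    """Count packets until the first gap longer than idle_timeout.

    arrival_times: increasing arrival timestamps, in seconds.
    Returns (packet_count, packets_per_second). The rate is the number
    of inter-packet intervals divided by the time between the first and
    the last counted packet (an assumption, not l2recv's formula).
    """
    count = 0
    first = last = None
    for t in arrival_times:
        if last is not None and t - last > idle_timeout:
            break  # the flow stopped: ignore anything after the gap
        if first is None:
            first = t
        last = t
        count += 1
    duration = (last - first) if count > 1 else 0.0
    rate = (count - 1) / duration if duration > 0 else 0.0
    return count, rate


if __name__ == "__main__":
    # Made-up input: 1001 packets spaced 1 ms apart, then a straggler
    # that arrives long after the 2-second idle timeout.
    times = [i * 0.001 for i in range(1001)] + [6.0]
    print(summarize_flow(times))
```

With a single packet the sketch reports a rate of zero, since there is
no interval to measure; a real tool may handle that case differently.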
Tests using the Linux in-kernel pktgen
--------------------------------------

To use the Linux in-kernel packet generator, you can use the script
"linux-pktgen.sh" in the home directory. The pktgen creates a kernel
thread for each hardware TX queue of a given NIC.

By manually changing the script shell variable definitions you can
change the test configuration (e.g. addresses in the generated
packet). Please change the "NCPU" variable to match the number of CPUs
on your machine. The script has an argument which specifies the number
of NIC queues (i.e. kernel threads) to use minus one.

For example:

    ./linux-pktgen.sh 2    # Uses 3 NIC queues

When the script terminates, it prints the per-queue rates and the
total rate achieved.

NETMAP AND VALE EXPERIMENTS
---------------------------

For most experiments with netmap you can use the "pkt-gen" command (do
not confuse it with the Linux in-kernel pktgen), which has a large
number of options to send and receive traffic (also on TAP devices).

pkt-gen normally generates UDP traffic for a specific IP address,
using the broadcast MAC address.

Netmap testing with network interfaces
--------------------------------------

Remember that you need a netmap-capable driver in order to use netmap
on a specific NIC. Currently supported drivers are e1000, e1000e,
ixgbe, igb. For updated information please visit
http://info.iet.unipi.it/~luigi/netmap/

Before running pkt-gen, make sure that the link is up.

Run pkt-gen on an interface called "IFNAME":

    pkt-gen -i IFNAME -f tx    # run a pkt-gen sender
    pkt-gen -i IFNAME -f rx    # run a pkt-gen receiver

pkt-gen without arguments will show other options, e.g.

  + -w sec    modifies the wait time for link negotiation
  + -l len    modifies the packet size
  + -d, -s    set the IP destination/source addresses and ports
  + -D, -S    set the MAC destination/source addresses

and more.
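To put the packet size option (-l) in perspective, the sketch below
computes the theoretical maximum packet rate of an Ethernet link for a
given frame length. It is not part of netmap or pkt-gen: the function
name is hypothetical, and it assumes that the length given to -l
excludes the 4-byte CRC and that every frame costs 20 extra bytes on
the wire (preamble, start-of-frame delimiter and inter-frame gap).

```python
# Hypothetical helper, not part of netmap or pkt-gen.
ETH_CRC = 4        # frame check sequence, in bytes
ETH_OVERHEAD = 20  # preamble (7) + SFD (1) + inter-frame gap (12), in bytes


def max_packet_rate(link_bps, frame_len):
    """Upper bound on packets per second for a given frame length.

    link_bps:  link speed in bits per second.
    frame_len: frame length in bytes without the CRC (assumed to be
               what pkt-gen's -l option sets).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    bits_on_wire = (frame_len + ETH_CRC + ETH_OVERHEAD) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for length in (60, 1514):
        rate = max_packet_rate(10e9, length)
        print(f"{length:5d} bytes -> {rate / 1e6:.2f} Mpps")
```

Under these assumptions a 10 Gbit/s link tops out at roughly 14.88 Mpps
with 60-byte frames, which is the kind of rate the recommendations
above are meant to make reachable.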
Testing the VALE switch
-----------------------

To use the VALE switch instead of physical ports you only need to
change the interface name in the pkt-gen command. As an example, on a
single machine, you can run senders and receivers on multiple ports of
a VALE switch as follows (run the commands in separate terminals to
see the output)

    pkt-gen -ivale0:01 -ftx    # run a sender on the port 01 of the switch vale0
    pkt-gen -ivale0:02 -frx    # receiver on the port 02 of same switch
    pkt-gen -ivale0:03 -ftx    # another sender on the port 03

The VALE switches and ports are created (and destroyed) on the fly.

Transparent connection of physical ports to the VALE switch
-----------------------------------------------------------

It is also possible to use a network device as a port of a VALE
switch. You can do this with the following command:

    vale-ctl -h vale0:eth0    # attach interface "eth0" to the "vale0" switch

To detach an interface from a bridge:

    vale-ctl -d vale0:eth0    # detach interface "eth0" from the "vale0" switch

These operations can be issued at any moment.

Tests with our modified QEMU
----------------------------

The Linux image also contains our modified QEMU, with the VALE backend
and the "e1000-paravirt" frontend (a paravirtualized e1000 emulation).

After you have booted the image on a physical machine (so you can
exploit KVM), you can boot the same image a second time (recursively)
with QEMU. Therefore, you can run all the tests above also from within
the virtual machine environment.

To make VM testing easier, the home directory contains some useful
scripts to set up and launch VMs on the physical machine.

  + "prep-taps.sh"
    creates and sets up two permanent tap interfaces ("tap01" and
    "tap02") and a Linux in-kernel bridge. The tap interfaces are then
    bridged together on the same bridge. The bridge interface ("br0")
    is given the address 10.0.0.200/24.
    This setup can be used to make two VMs communicate through the
    host bridge, or to test the speed of a linux switch using l2open.

  + "unprep-taps.sh"
    undoes the above setup.

  + "launch-qemu.sh"
    can be used to run QEMU virtual machines. It takes up to three
    arguments, as in the examples below:

      + The first argument can be "qemu" or "kvm", depending on
        whether we want to use the standard QEMU binary translation
        or the hardware virtualization acceleration.

      + The second argument can be "--tap", "--netuser" or "--vale",
        and tells QEMU what network backend to use: a tap device, the
        QEMU user networking (slirp), or a VALE switch port.

      + When the second argument is "--tap" or "--vale", the third
        argument specifies an index (e.g. "01", "02", etc.) which
        tells QEMU what tap device or VALE port to use as backend.

    You can manually modify the script to set the shell variables
    that select the type of emulated device (e.g. e1000,
    virtio-net-pci, ...) and related options (ioeventfd, virtio vhost,
    e1000 mitigation, ...).

    The default setup has an "e1000" device with interrupt mitigation
    disabled.

You can try the paravirtualized e1000 device ("e1000-paravirt") or the
"virtio-net" device to get better performance. However, bear in mind
that these paravirtualized devices don't have netmap support (whereas
the standard e1000 does have netmap support).

Examples:

    # Run a kvm VM attached to the port 01 of a VALE switch
    ./launch-qemu.sh kvm --vale 01

    # Run a kvm VM attached to the port 02 of the same VALE switch
    ./launch-qemu.sh kvm --vale 02

    # Run a kvm VM attached to the tap called "tap01"
    ./launch-qemu.sh kvm --tap 01

    # Run a kvm VM attached to the tap called "tap02"
    ./launch-qemu.sh kvm --tap 02

Guest-to-guest tests
--------------------

If you run two VMs attached to the same switch (which can be a Linux
bridge or a VALE switch), you can run guest-to-guest experiments.

All the tests reported in the previous sections are possible (normal
sockets, raw sockets, pkt-gen, ...), independently of the backend used.
In the following examples we assume that:

  + Each VM has an ethernet interface called "eth0".

  + The interface of the first VM is given the IP 10.0.0.1/24.

  + The interface of the second VM is given the IP 10.0.0.2/24.

  + The Linux bridge interface "br0" on the host is given the
    IP 10.0.0.200/24.

Examples:

    [1] ### Test UDP short packets over traditional sockets ###
        # On the guest 10.0.0.2 run
            netserver
        # On the guest 10.0.0.1 run
            netperf -H10.0.0.2 -tUDP_STREAM -- -m8

    [2] ### Test UDP short packets with pkt-gen ###
        # On the guest 10.0.0.2 run
            pkt-gen -ieth0 -frx
        # On the guest 10.0.0.1 run
            pkt-gen -ieth0 -ftx

    [3] ### Test guest-to-guest latency ###
        # On the guest 10.0.0.2 run
            netserver
        # On the guest 10.0.0.1 run
            netperf -H10.0.0.2 -tTCP_RR

Note that you can use pkt-gen inside a VM only if the emulated
ethernet device is supported by netmap. The default emulated device is
"e1000", which has netmap support. If you try to run pkt-gen on an
unsupported device, pkt-gen will not work, reporting that it is unable
to register the interface.

Guest-to-host tests (follows from the previous section)
-------------------------------------------------------

If you run only one VM on your host machine, you can measure the
network performance between the VM and the host machine. In this case
the experiment setup depends on the backend you are using.

With the tap backend, you can use the bridge interface "br0" as a
communication endpoint. You can run normal/raw sockets experiments,
but you cannot use pkt-gen on the "br0" interface, since the Linux
bridge interface is not supported by netmap.
Examples with the tap backend:

    [1] ### Test TCP throughput over traditional sockets ###
        # On the host run
            netserver
        # on the guest 10.0.0.1 run
            netperf -H10.0.0.200 -tTCP_STREAM

    [2] ### Test UDP short packets with pkt-gen and l2 ###
        # On the host run
            l2open -r br0 l2recv
        # On the guest 10.0.0.1 run (xx:yy:zz:ww:uu:vv is the
        # "br0" hardware address)
            pkt-gen -ieth0 -ftx -d10.0.0.200:7777 -Dxx:yy:zz:ww:uu:vv

With the VALE backend you can perform only UDP tests, since we don't
have a netmap application which implements a TCP endpoint: pkt-gen
generates UDP packets. As a communication endpoint on the host, you
can use a virtual VALE port opened on the fly by a pkt-gen instance.

Examples with the VALE backend:

    [1] ### Test UDP short packets ###
        # On the host run
            pkt-gen -ivale0:99 -frx
        # On the guest 10.0.0.1 run
            pkt-gen -ieth0 -ftx

    [2] ### Test UDP big packets (receiver on the guest) ###
        # On the guest 10.0.0.1 run
            pkt-gen -ieth0 -frx
        # On the host run
            pkt-gen -ivale0:99 -ftx -l1460
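For the short-packet UDP tests above, the figure of interest is the
packet rate rather than the throughput in Mbps. As a minimal sketch,
the rate is the number of messages reported for the run divided by the
test duration. The function names and the sample numbers below are
hypothetical and are not netperf output.

```python
def packet_rate(packets, elapsed_seconds):
    """Packets per second from a packet count and a test duration."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return packets / elapsed_seconds


def throughput_mbps(packets, payload_bytes, elapsed_seconds):
    """Payload throughput in Mbit/s for the same run, for comparison."""
    return packet_rate(packets, elapsed_seconds) * payload_bytes * 8 / 1e6


if __name__ == "__main__":
    # Made-up numbers: 1,500,000 messages of 8 bytes in a 10-second run.
    received = 1_500_000
    print(f"{packet_rate(received, 10.0) / 1e3:.1f} kpps")
    print(f"{throughput_mbps(received, 8, 10.0):.2f} Mbit/s")
```

The same run looks respectable when read as a packet rate and tiny when
read as Mbit/s, which is why the Mbps column says little about tests
that use 8-byte messages.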
