Date:      Fri, 24 Jan 2014 15:56:17 +0100
From:      Vincenzo Maffione <v.maffione@gmail.com>
To:        Wang Weidong <wangweidong1@huawei.com>
Cc:        facoltà <giuseppe.lettieri73@gmail.com>, Giuseppe Lettieri <g.lettieri@iet.unipi.it>, Luigi Rizzo <rizzo@iet.unipi.it>, net@freebsd.org
Subject:   Re: netmap: I got some troubles with netmap
Message-ID:  <CA+_eA9hOzQiOWKvHOiKjY4kjxmerMWp=MhtF_vbr8t-q4V732g@mail.gmail.com>
In-Reply-To: <52E1E272.8060009@huawei.com>
References:  <52D74E15.1040909@huawei.com> <CA+hQ2+jBhSyHwFsFo+zH-EuJEkKEcyc6YBH+fnEHi=Y27FyWyQ@mail.gmail.com> <92C7725B-B30A-4A19-925A-A93A2489A525@iet.unipi.it> <52D8A5E1.9020408@huawei.com> <52DD1914.7090506@iet.unipi.it> <52E1E272.8060009@huawei.com>


[-- Attachment #1 --]
2014/1/24 Wang Weidong <wangweidong1@huawei.com>

> On 2014/1/20 20:39, Giuseppe Lettieri wrote:
> > Hi Wang,
> >
> > OK, you are using the netmap support in the upstream qemu git. That does
> not yet include all our modifications, some of which are very important for
> high throughput with VALE. In particular, the upstream qemu does not
> include the batching improvements in the frontend/backend interface, and it
> does not include the "map ring" optimization of the e1000 frontend. Please
> find attached a gzipped patch that contains all of our qemu code. The patch
> is against the latest upstream master (commit 1cf892ca).
> >
> > Please ./configure the patched qemu with the following option, in
> addition to any other option you may need:
> >
> > --enable-e1000-paravirt --enable-netmap \
> > --extra-cflags=-I/path/to/netmap/sys/directory
> >
> > Note that --enable-e1000-paravirt is needed to enable the "map ring"
> optimization in the e1000 frontend, even if you are not going to use the
> e1000-paravirt device.
> >
> > Now you should be able to rerun your tests. I am also attaching a README
> file that describes some more tests you may want to run.
> >
>
> Hello,


> Yes, I applied qemu-netmap-bc767e701.patch to qemu and downloaded the
> 20131019-tinycore-netmap.hdd.
> Then I ran some tests:
>
> 1. I use the bridge below:
> qemu-system-x86_64 -m 2048 -boot c -net nic -net bridge,br=br1 -hda
> /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0
> test between two vms.
> br1 without device.
> Use pktgen, I got the 237.95 kpps.
> Use the netserver/netperf I got the speed 1037M bits/sec with TCP_STREAM.
> The max speed is up to 1621M.
> Use the netserver/netperf I got the speed 3296/s with TCP_RR
> Use the netserver/netperf I got the speed 234M/86M bits/sec with UDP_STREAM
>
> When I add a device from host to the br1, the speed is 159.86 kpps.
> Use the netserver/netperf I got the speed 720M bits/sec with TCP_STREAM.
> The max speed is up to 1000M.
> Use the netserver/netperf I got the speed 3556/s with TCP_RR
> Use the netserver/netperf I got the speed 181M/181M bits/sec with
> UDP_STREAM
>
> What do you think of these data?
>

You are using the old/deprecated QEMU command line syntax (-net), so
honestly it's not clear to me what kind of network configuration you are
running.

Please use our scripts "launch-qemu.sh" and "prep-taps.sh", as described
in the README.images file (attached).
Alternatively, use a syntax like in the following examples

(#1)   qemu-system-x86_64 archdisk.qcow -enable-kvm -device
virtio-net-pci,netdev=mynet -netdev
tap,ifname=tap01,id=mynet,script=no,downscript=no -smp 2
(#2)   qemu-system-x86_64 archdisk.qcow -enable-kvm -device
e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev
netmap,ifname=vale0:01,id=mynet -smp 2

so that it's clear to us which network frontend (e.g. emulated NIC) and
network backend (e.g. netmap, tap, vde, etc.) you are using.
In example #1 we are using virtio-net as frontend and tap as backend, while
in example #2 we are using e1000 as frontend and netmap as backend.
Also consider giving more than one core (e.g. -smp 2) to each guest, to
mitigate receiver livelock problems.


>
> 2. I use the vale below:
> qemu-system-x86_64 -m 2048 -boot c -net nic -net netmap,vale0:0 -hda
> /home/wwd/tinycores/20131019-tinycore-netmap.hdd -enable-kvm -vnc :0
>

Same here, it's not clear what you are using. I guess each guest has
an e1000 device and is connected to a different port of the same VALE
switch (e.g. vale0:0 and vale0:1)?

> Test with 2 vms from the same host
> vale0 without device.
> I use the pkt-gen, the speed is 938 Kpps
>

You should get ~4 Mpps with the e1000 frontend + netmap backend on a
reasonably good machine. Make sure you have ./configure'd QEMU with
--enable-e1000-paravirt.
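
For reference, that configure invocation (taken from Giuseppe's message
above; the include path is a placeholder to fill in):

./configure --enable-e1000-paravirt --enable-netmap \
    --extra-cflags=-I/path/to/netmap/sys/directory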


> I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed 195M/195M,
> then adding -- -m 8, I only got 1.07M/1.07M.
> When using a smaller msg size, will the speed be smaller?
>

If you use e1000 with netperf (without pkt-gen), your performance is doomed
to be horrible. Use e1000-paravirt (as a frontend) instead if you are
interested in netperf experiments.
Also consider that the point of using the "-- -m 8" option is to experiment
with high packet rates, so what you should measure here is not the
throughput in Mbps, but the packet rate: netperf reports the number of
packets sent and received, so you can obtain the packet rate by dividing by
the running time.
The throughput in Mbps is uninteresting; if you want high bulk throughput,
just don't use "-- -m 8" and leave the defaults.
Using virtio-net in this case will help because of TSO offloading.
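
For instance (a sketch; the exact output columns depend on your netperf
version, so check them on your system):

netperf -H 10.0.0.2 -t UDP_STREAM -l 10 -- -m 8
# if netperf reports e.g. 2500000 messages sent OK over the 10 s run,
# the transmit packet rate is 2500000 / 10 = 250 kpps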

cheers
  Vincenzo


>
> with vale-ctl -a vale0:eth2,
> use pkt-gen, the speed is 928 Kpps
> I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 209M/208M,
> then add -- -m 8, I only got 1.06M/1.06M.
>
> with vale-ctl -h vale0:eth2,
> use pkt-gen, the speed is 928 Kpps
> I use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 192M/192M,
> then add -- -m 8, I only got 1.06M/1.06M.
>
> Test with 2 vms from two hosts,
> I only can test it by vale-ctl -h vale0:eth2 and set eth2 into promisc
> use pkt-gen with the default params, the speed is about 750 Kpps
> use netperf -H 10.0.0.2 -t UDP_STREAM, I got the speed is 160M/160M
> Is this right?
>

> 3. I can't use the l2 utils.
> When I do "sudo l2open -t eth0 l2recv[l2send]", I got "l2open
> ioctl(TUNSETIFF...): Invalid argument",
> and with "l2open -r eth0 l2recv", after waiting a moment (only several
> seconds), I got the result:
> TEST-RESULT: 0.901 kpps 1pkts
> select/read=100.00 err=0
>
> And I can't find the l2 utils on the net. Were they implemented by your team?
>
> All of them are tested on VMs.
>
> Cheers.
> Wang
>
>
> >
> > Cheers,
> > Giuseppe
> >
> > Il 17/01/2014 04:39, Wang Weidong ha scritto:
> >> On 2014/1/16 18:24, facoltà wrote:
> [...]
> >>
> >>
> >
> >
>
>
>


-- 
Vincenzo Maffione

[-- Attachment #2 --]
	EXPERIMENTING WITH NETMAP, VALE AND FAST QEMU
	---------------------------------------------

To ease experiments with netmap, the VALE switch and our QEMU enhancements,
we have prepared a couple of bootable images (Linux and FreeBSD).
You can find them on the netmap page

	http://info.iet.unipi.it/~luigi/netmap/

where you can also look at more recent versions of this file.

Below are step-by-step instructions on experiments you can run
with these images. The two main versions are

	picobsd.hdd	-> FreeBSD HEAD (netmap + VALE)
	tinycore.hdd	-> Linux (qemu + netmap + VALE)      

Booting the image
-----------------
For all experiments you need to copy the image onto a USB stick
and boot a PC with it. Alternatively, you can use the image
with VirtualBox, QEMU or other emulators, for example

    qemu-system-x86_64 -hda IMAGE_FILE -m 1G -machine accel=kvm ...

(remove 'accel=kvm' if your host does not support kvm).
The images do not install anything on the hard disk.
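
To write the image to a USB stick on Linux, something like the following
works (the device name /dev/sdX is a placeholder; double-check it, since
dd overwrites the target device):

    dd if=tinycore.hdd of=/dev/sdX bs=1M conv=fsync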

Both systems have preloaded drivers for a number of network cards
(including the Intel 10 Gbit ones) with netmap extensions.
The VALE switch is also available (it is part of the netmap module).
ssh, scp and a few other utilities are also included.

FreeBSD image:

  + the OS boots directly in console mode, you can switch
    between terminals with ALT-Fn.
    The password for the 'root' account is 'setup'

  + if you are connected to a network, you can use
    	dhclient em0 # or other interface name
    to obtain an IP address and external connectivity.

Linux image:

  + in addition to the netmap/VALE modules, the KVM kernel module
    is also preloaded.

  + the boot-loader gives you two main options (each with
    a variant to delay boot in case you have slow devices):

    + "Boot TinyCore"
      boots in an X11 environment as user 'tc'.
      You can create a few terminals using the icon at the
      bottom. You can use "sudo -s" to get root access.
      In case no suitable video card is available/detected,
      it falls back to command line mode.

    + "Boot Core (command line only)"
      boots in console mode with virtual terminals.
      You're automatically logged in as user 'tc'.
      To log in on the other terminals, use the same username
      (no password required).

  + The system should automatically recognize the existing ethernet
    devices, and load the appropriate netmap-capable device drivers
    when available.  Interfaces are configured through DHCP when possible.


General test recommendations
----------------------------
NOTE: The tests outlined in the following sections can generate very high
packet rates, and some hardware misconfiguration problems may prevent
you from achieving maximum speed.
Common problems are:

+ slow link autonegotiation.
  Our programs typically wait 2-4 seconds for
  link negotiation to complete, but some NIC/switch combinations
  are much slower. In this case you should increase the delay
  (pkt-gen has the -w XX option for that) or possibly force
  the link speed and duplex mode on both sides.

  Check the link speed to make sure there are no negotiation
  problems, and that you see the expected speed.

    ethtool IFNAME	# on linux
    ifconfig IFNAME	# on FreeBSD

+ ethernet flow control.
  If the receiving port is slow (often the case in the presence
  of multicast/broadcast traffic, or also with unicast if you are
  sending to non-netmap receivers), it will generate ethernet
  flow control frames that throttle down the sender.

  We recommend disabling BOTH RX and TX ethernet flow control
  on BOTH the sender and the receiver.
  On Linux this can be done with ethtool:

    ethtool -A IFNAME tx off rx off

  whereas on FreeBSD there are device-specific sysctls

	sysctl dev.ix.0.queue0.flow_control=0

+ CPU power saving.
  The CPU governor on Linux, or its equivalent on FreeBSD, tends to
  throttle down the clock rate, reducing performance.
  Unlike other similar systems, netmap does not have busy-wait
  loops, so the CPU load is generally low and this can trigger
  the clock slowdown.

  Make sure that ALL CPUs run at maximum speed by disabling the
  dynamic frequency-scaling mechanisms.

    cpufreq-set -gperformance	# on linux

    sysctl dev.cpu.0.freq=3401	# on FreeBSD.

+ wrong MAC address.
  netmap does not put the NIC in promiscuous mode, so unless the
  application does it, the NIC will only receive broadcast traffic or
  unicast directed to its own MAC address.
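
  For example, to direct pkt-gen traffic at a specific receiver, set
  the destination MAC to the receiver's own (the address below is a
  placeholder):

    pkt-gen -i eth1 -f tx -D aa:bb:cc:dd:ee:ff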


STANDARD SOCKET TESTS
---------------------
For most socket-based experiments you can use the "netperf" tool installed
on the system (version 2.6.0). Be careful to use a matching version for
the other netperf endpoint (e.g. netserver) when running tests between
different machines.

Interesting experiments are:

    netperf -H x.y.z.w -tTCP_STREAM  # test TCP throughput
    netperf -H x.y.z.w -tTCP_RR      # test latency
    netperf -H x.y.z.w -tUDP_STREAM -- -m8  # test UDP throughput with short packets

where x.y.z.w is the host running "netserver".
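
On that host, simply start the daemon first:

    netserver    # listens on netperf's default control port (12865)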


RAW SOCKET AND TAP TESTS
------------------------
For experiments with raw sockets and tap devices you can use the l2
utilities (l2open, l2send, l2recv) installed on the system.
With these utilities you can send/receive custom network packets
to/from raw sockets or tap file descriptors.

The receiver can be run with one of the following commands

    l2open -r IFNAME l2recv     # receive from a raw socket attached to IFNAME
    l2open -t IFNAME l2recv     # receive from a file descriptor opened on the tap IFNAME

The receiver process will wait indefinitely for the first packet
and then keep receiving as long as packets keep coming. When the
flow stops (after a 2-second timeout) the process terminates and
prints the received packet rate and packet count.

To run the sender in an easy way, you can use the script l2-send.sh
in the home directory. This script defines several shell variables
that can be manually changed to customize the test (see
the comments in the script itself).
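
Purely as an illustration, the variables are of this kind (the names
here are hypothetical; the script's own comments list the real ones):

    IF=tap01        # interface or tap device to send on
    LEN=60          # packet length in bytes
    COUNT=1000000   # number of packets to send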

As an example, you can test configurations with Virtual
Machines attached to host tap devices bridged together.


Tests using the Linux in-kernel pktgen
--------------------------------------
To use the Linux in-kernel packet generator, you can use the
script "linux-pktgen.sh" in the home directory.
The pktgen creates a kernel thread for each hardware TX queue
of a given NIC.

By manually changing the script shell variable definitions you
can change the test configuration (e.g. addresses in the generated
packets). Please change the "NCPU" variable to match the number
of CPUs on your machine. The script takes one argument: the number
of NIC queues (i.e. kernel threads) to use, minus one.

For example:

    ./linux-pktgen.sh 2  # Uses 3 NIC queues

When the script terminates, it prints the per-queue rates and
the total rate achieved.


NETMAP AND VALE EXPERIMENTS
---------------------------

For most experiments with netmap you can use the "pkt-gen" command
(do not confuse it with the Linux in-kernel pktgen), which has a large
number of options to send and receive traffic (also on TAP devices).

pkt-gen normally generates UDP traffic directed to a specific IP address,
using the broadcast MAC address as destination.

Netmap testing with network interfaces
--------------------------------------

Remember that you need a netmap-capable driver in order to use
netmap on a specific NIC. Currently supported drivers are e1000,
e1000e, ixgbe, igb. For updated information please visit
http://info.iet.unipi.it/~luigi/netmap/
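
A quick sanity check that the netmap module is loaded is to look for
its device node:

    ls /dev/netmap    # should exist once the module is loaded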

Before running pkt-gen, make sure that the link is up.

Run pkt-gen on an interface called "IFNAME":

    pkt-gen -i IFNAME -f tx  # run a pkt-gen sender
    pkt-gen -i IFNAME -f rx  # run a pkt-gen receiver

pkt-gen without arguments will show other options, e.g.
  + -w sec	modifies the wait time for link negotiation
  + -l len	modifies the packet size
  + -d, -s	set the IP destination/source addresses and ports
  + -D, -S	set the MAC destination/source addresses

and more.
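
For example, combining these options (addresses are placeholders):

    pkt-gen -i eth1 -f tx -w 10 -l 60 -d 10.0.0.2:7777 -D ff:ff:ff:ff:ff:ff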

Testing the VALE switch
------------------------

To use the VALE switch instead of physical ports you only need
to change the interface name in the pkt-gen command.
As an example, on a single machine, you can run senders and receivers
on multiple ports of a VALE switch as follows (run the commands in
separate terminals to see the output)

    pkt-gen -ivale0:01 -ftx  # run a sender on the port 01 of the switch vale0
    pkt-gen -ivale0:02 -frx  # receiver on the port 02 of same switch
    pkt-gen -ivale0:03 -ftx  # another sender on the port 03

The VALE switches and ports are created (and destroyed) on the fly.


Transparent connection of physical ports to the VALE switch
-----------------------------------------------------------

It is also possible to use a network device as a port of a VALE
switch. You can do this with the following command:

    vale-ctl -h vale0:eth0  # attach interface "eth0" to the "vale0" switch

To detach an interface from a bridge:

    vale-ctl -d vale0:eth0  # detach interface "eth0" from the "vale0" switch

These operations can be issued at any moment.
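
For example, once a NIC is attached, traffic sent on a virtual port of
the same switch can reach the wire (interface names are placeholders):

    vale-ctl -h vale0:eth0      # attach the NIC to the switch
    pkt-gen -i vale0:01 -f tx   # frames from this port can be forwarded to eth0
    vale-ctl -d vale0:eth0      # detach when done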


Tests with our modified QEMU
----------------------------

The Linux image also contains our modified QEMU, with the VALE backend and
the "e1000-paravirt" frontend (a paravirtualized e1000 emulation).

After you have booted the image on a physical machine (so you can exploit
KVM), you can boot the same image a second time (recursively) with QEMU.
Therefore, you can run all the tests above also from within the virtual
machine environment.

To make VM testing easier, the home directory contains some
useful scripts to set up and launch VMs on the physical machine.

+ "prep-taps.sh"
  creates and sets up two permanent tap interfaces ("tap01" and "tap02")
  and a Linux in-kernel bridge. The tap interfaces are then bridged
  together on the same bridge. The bridge interface ("br0"), is given
  the address 10.0.0.200/24.

  This setup can be used to make two VMs communicate through the
  host bridge, or to test the speed of a Linux switch using
  l2open (a sketch of the equivalent commands appears at the end
  of this section).

+ "unprep-taps.sh"
  undoes the above setup.

+ "launch-qemu.sh"
  can be used to run QEMU virtual machines. It takes four arguments:

    + The first argument can be "qemu" or "kvm", depending on
      whether we want to use the standard QEMU binary translation
      or the hardware virtualization acceleration.

    + The third argument can be "--tap", "--netuser" or "--vale",
      and tells QEMU what network backend to use: a tap device,
      the QEMU user networking (slirp), or a VALE switch port.

    + When the third argument is "--tap" or "--vale", the fourth
      argument specifies an index (e.g. "01", "02", etc..) which
      tells QEMU what tap device or VALE port to use as backend.

  You can manually modify the script to set the shell variables that
  select the type of emulated device (e.g. e1000, virtio-net-pci, ...)
  and related options (ioeventfd, virtio vhost, e1000 mitigation, ...).

  The default setup has an "e1000" device with interrupt mitigation
  disabled.

You can try the paravirtualized e1000 device ("e1000-paravirt")
or the "virtio-net" device to get better performance. However, bear
in mind that these paravirtualized devices don't have netmap support
(whereas the standard e1000 does).

Examples:

    # Run a kvm VM attached to the port 01 of a VALE switch
    ./launch-qemu.sh kvm --vale 01

    # Run a kvm VM attached to the port 02 of the same VALE switch
    ./launch-qemu.sh kvm --vale 02

    # Run a kvm VM attached to the tap called "tap01"
    ./launch-qemu.sh kvm --tap 01

    # Run a kvm VM attached to the tap called "tap02"
    ./launch-qemu.sh kvm --tap 02
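
For reference, the setup performed by "prep-taps.sh" amounts to
something like the following (a sketch assuming the brctl and ip
tools are available; the script itself is authoritative):

    ip tuntap add dev tap01 mode tap     # create the persistent taps
    ip tuntap add dev tap02 mode tap
    brctl addbr br0                      # create the bridge
    brctl addif br0 tap01                # bridge the taps together
    brctl addif br0 tap02
    ifconfig tap01 up
    ifconfig tap02 up
    ifconfig br0 10.0.0.200 netmask 255.255.255.0 up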


Guest-to-guest tests
--------------------

If you run two VMs attached to the same switch (which can be a Linux
bridge or a VALE switch), you can run guest-to-guest experiments.

All the tests reported in the previous sections are possible (normal
sockets, raw sockets, pkt-gen, ...), independently of the backend used.

In the following examples we assume that:

    + Each VM has an ethernet interface called "eth0".

    + The interface of the first VM is given the IP 10.0.0.1/24.

    + The interface of the second VM is given the IP 10.0.0.2/24.

    + The Linux bridge interface "br0" on the host is given the
      IP 10.0.0.200/24.
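
If the guests do not get these addresses automatically, you can assign
them manually inside each guest (a sketch):

    ifconfig eth0 10.0.0.1 netmask 255.255.255.0 up   # on the first VM
    ifconfig eth0 10.0.0.2 netmask 255.255.255.0 up   # on the second VM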

Examples:

    [1] ### Test UDP short packets over traditional sockets ###
        # On the guest 10.0.0.2 run
            netserver
        # on the guest 10.0.0.1 run
            netperf -H10.0.0.2 -tUDP_STREAM -- -m8

    [2] ### Test UDP short packets with pkt-gen ###
        # On the guest 10.0.0.2 run
            pkt-gen -ieth0 -frx
        # On the guest 10.0.0.1 run
            pkt-gen -ieth0 -ftx

    [3] ### Test guest-to-guest latency ###
        # On the guest 10.0.0.2 run
            netserver
        # On the guest 10.0.0.1 run
            netperf -H10.0.0.2 -tTCP_RR

Note that you can use pkt-gen inside a VM only if the emulated ethernet
device is supported by netmap. The default emulated device is
"e1000", which has netmap support.  If you try to run pkt-gen on
an unsupported device, pkt-gen will not work, reporting that it is
unable to register the interface.


Guest-to-host tests (follows from the previous section)
-------------------------------------------------------

If you run only one VM on your host machine, you can measure the
network performance between the VM and the host machine.  In this
case the experiment setup depends on the backend you are using.

With the tap backend, you can use the bridge interface "br0" as a
communication endpoint. You can run normal/raw sockets experiments,
but you cannot use pkt-gen on the "br0" interface, since the Linux
bridge interface is not supported by netmap.

Examples with the tap backend:

    [1] ### Test TCP throughput over traditional sockets ###
        # On the host run
            netserver
        # on the guest 10.0.0.1 run
            netperf -H10.0.0.200 -tTCP_STREAM

    [2] ### Test UDP short packets with pkt-gen and l2 ###
        # On the host run
            l2open -r br0 l2recv
        # On the guest 10.0.0.1 run (xx:yy:zz:ww:uu:vv is the
        # "br0" hardware address)
            pkt-gen -ieth0 -ftx -d10.0.0.200:7777 -Dxx:yy:zz:ww:uu:vv


With the VALE backend you can perform only UDP tests, since we don't have
a netmap application which implements a TCP endpoint: pkt-gen generates
UDP packets.
As a communication endpoint on the host, you can use a virtual VALE port
opened on the fly by a pkt-gen instance.

Examples with the VALE backend:

    [1] ### Test UDP short packets ###
        # On the host run
            pkt-gen -ivale0:99 -frx
        # On the guest 10.0.0.1 run
            pkt-gen -ieth0 -ftx

    [2] ### Test UDP big packets (receiver on the guest) ###
        # On the guest 10.0.0.1 run
            pkt-gen -ieth0 -frx
        # On the host run
            pkt-gen -ivale0:99 -ftx -l1460

