Date:      Tue, 6 Dec 2016 19:36:16 -0600
From:      Xiaoye Sun <Xiaoye.Sun@rice.edu>
To:        freebsd-net@freebsd.org
Subject:   Can netmap be more efficient when it just does bridging between NIC and Linux kernal?
Message-ID:  <CAJnByzh8ypkWYfXd8U5ACLKp1d_KcJjHBY740wUFnS1WKiEdfw@mail.gmail.com>

Hi,

I am wondering if there is a way to reduce the CPU usage of a netmap program
similar to the bridge.c example.

In my use case, I have a distributed application/framework (e.g. Spark or
Hadoop) running on a cluster of machines (each of the machines runs Linux
and has an Intel 10Gbps NIC). The application is both computation and
network intensive, so there are a lot of data transfers between machines. I
divide the data into two types (type 1 and type 2). Packets of type 1
data are sent through netmap (these packets don't go through the Linux
network stack). Packets of type 2 data are sent through the Linux network
stack. Both type 1 and type 2 data could be small or large.

My netmap program runs on all the machines in the cluster. It processes the
packets of type 1 data (creating, sending, and receiving them) and forwards
packets of type 2 data between the NIC and the kernel by swapping the
pointer to the NIC slot and the pointer to the kernel stack slot (similar
to the bridge.c example in the netmap repository).
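
To make the slot swapping concrete, here is a minimal sketch of the
zero-copy forwarding step. It is modeled loosely on what bridge.c does, but
mock_slot, mock_ring, MOCK_BUF_CHANGED and forward_zero_copy are
hypothetical stand-ins invented for illustration; they are not the real
definitions from the netmap headers, and no NIC or kernel is involved.

```c
#include <stdint.h>

/* Simplified stand-ins for netmap's slot and ring structures.
 * These are assumptions for illustration, NOT the real netmap types. */
#define MOCK_BUF_CHANGED 0x0002u    /* plays the role of a "buffer changed" flag */

struct mock_slot {
    uint32_t buf_idx;               /* index of the packet buffer */
    uint16_t len;                   /* packet length in bytes */
    uint16_t flags;
};

struct mock_ring {
    uint32_t num_slots;
    uint32_t head;                  /* first slot the program may use */
    uint32_t tail;                  /* first slot the program may NOT use */
    struct mock_slot *slot;
};

/* Slots available to the program: head .. tail-1, with wraparound. */
static uint32_t ring_space(const struct mock_ring *r)
{
    return (r->tail + r->num_slots - r->head) % r->num_slots;
}

static uint32_t ring_next(const struct mock_ring *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Forward up to `limit` packets from rx to tx by swapping buffer indices
 * instead of copying payloads. Returns the number of packets forwarded. */
static uint32_t forward_zero_copy(struct mock_ring *rx, struct mock_ring *tx,
                                  uint32_t limit)
{
    uint32_t n = ring_space(rx);
    uint32_t room = ring_space(tx);
    if (room < n)
        n = room;
    if (limit < n)
        n = limit;

    uint32_t j = rx->head, k = tx->head;
    for (uint32_t i = 0; i < n; i++) {
        struct mock_slot *rs = &rx->slot[j];
        struct mock_slot *ts = &tx->slot[k];

        uint32_t spare = ts->buf_idx;   /* tx slot's buffer goes back to rx */
        ts->buf_idx = rs->buf_idx;
        rs->buf_idx = spare;
        ts->len = rs->len;

        /* Mark both slots so the owner of the ring notices the new buffer. */
        ts->flags |= MOCK_BUF_CHANGED;
        rs->flags |= MOCK_BUF_CHANGED;

        j = ring_next(rx, j);
        k = ring_next(tx, k);
    }
    rx->head = j;                       /* release the consumed rx slots */
    tx->head = k;                       /* hand the filled tx slots over */
    return n;
}
```

The per-packet work is only a few loads and stores, which is why the cost
of the surrounding poll call matters so much when only one slot is
available per wakeup.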

With my netmap program running on the machines, for an application having
no type 1 data (so the netmap program behaves like a bridge that only does
slot pointer swapping), the total running time of the application is longer
than when no netmap program runs on the machines.

It seems to me that the netmap program either slows down the network
transfer for type 2 data, or it eats up too many CPU cycles and competes
with the application process. However, with my netmap program running,
iperf can still reach 10Gbps bandwidth, with the netmap program using
40-50% CPU (the netmap program is doing pointer swapping for the iperf
packets). I also found that after each poll returns, most of the time, the
program swaps just one pointer, so there is a lot of system call overhead.
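
To put the system call overhead in rough numbers, here is a
back-of-the-envelope sketch. It assumes full-size Ethernet frames of 1538
bytes on the wire (1500 bytes of payload plus 14 header, 4 FCS, 8 preamble
and 12 inter-frame gap); that frame size is an assumption about the iperf
traffic, not a measurement. The function names are made up for this
example.

```c
#include <stdint.h>

/* Packets per second on a saturated link, given the bytes each packet
 * occupies on the wire (integer division, so the result is rounded down). */
static uint64_t packets_per_second(uint64_t link_bps, uint64_t wire_bytes)
{
    return link_bps / (wire_bytes * 8);
}

/* poll() calls per second if every wakeup handles `batch` packets. */
static uint64_t polls_per_second(uint64_t link_bps, uint64_t wire_bytes,
                                 uint64_t batch)
{
    return packets_per_second(link_bps, wire_bytes) / batch;
}
```

Under these assumptions, polls_per_second(10000000000ULL, 1538, 1) gives
about 812,000 system calls per second when each poll handles a single
packet, while a batch of 64 packets per poll would bring that down to about
12,700.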

Can anybody help me diagnose the source of the problem, or is there a better
way to write such a program?

I am wondering if there is a way to tune the configuration so that the
netmap program won't take up too much extra CPU when it runs like the
bridge.c program.


Best,
Xiaoye


