From owner-freebsd-net@FreeBSD.ORG Thu Apr 19 20:11:30 2012
Date: Thu, 19 Apr 2012 22:05:37 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Luigi Rizzo
Cc: current@freebsd.org, net@freebsd.org
Subject: Re: Some performance measurements on the FreeBSD network stack
Message-ID: <4F907011.9080602@freebsd.org>
In-Reply-To: <20120419133018.GA91364@onelab2.iet.unipi.it>
References: <20120419133018.GA91364@onelab2.iet.unipi.it>

On 19.04.2012 15:30, Luigi Rizzo wrote:
> I have been running some performance tests on UDP sockets,
> using the netsend program in tools/tools/netrate/netsend
> and instrumenting the source code and the kernel to return at
> various points of the path. Here are some results which
> I hope you find interesting.

Jumping over the very interesting analysis...

> - the next expensive operation, consuming another 100ns,
> is the mbuf allocation in m_uiotombuf(). Nevertheless, the allocator
> seems to scale decently, at least with 4 cores. The copyin() is
> relatively inexpensive (not reported in the data below, but
> disabling it saves only 15-20ns for a short packet).
>
> I have not followed the details, but the allocator calls the zone
> allocator, there is at least one critical_enter()/critical_exit()
> pair, and the highly modular architecture invokes long chains of
> indirect function calls on both allocation and release.
>
> It might make sense to keep a small pool of mbufs attached to the
> socket buffer instead of going to the zone allocator.
> Or defer the actual encapsulation to the
> (*so->so_proto->pr_usrreqs->pru_send)() call, which is made inline anyway.

The UMA mbuf allocator is certainly not perfect, but it is rather good.
It has a per-CPU cache of mbufs that are very fast to allocate from.
Once that cache is exhausted it has to refill from the global pool,
which happens from time to time and shows up in the averages.

> - another big bottleneck is the route lookup in ip_output()
> (between entries 51 and 56). Not only does it eat another
> 100ns+ on an empty routing table, it also causes huge
> contention when multiple cores are involved.

This is indeed a big problem. I'm working (rough edges remain) on
changing the routing table locking to an rmlock (read-mostly lock),
which doesn't produce any lock contention or cache pollution for
readers. Also, skipping the per-route lock while the table read-lock
is held should help some more.
All in all this should give a massive gain in high-pps situations, at
the expense of costlier routing table changes. However, such changes
are rare to essentially nonexistent with a single default route. After
that the ARP table will get the same treatment, and the lock contention
points in the lower stack should be gone for good.

--
Andre
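PS: For illustration only, here is a rough sketch of the rmlock(9)
pattern I mean; it is not the actual patch, and rt_table_lookup_locked()
and rt_table_change_locked() are made-up placeholders standing in for
the real radix-trie operations:

/*
 * Sketch only, not the actual patch.  Per-packet route lookups take
 * the read side of an rmlock(9), which touches only a per-CPU tracker
 * and never dirties a shared lock cache line, so lookups from many
 * cores scale.  The rare routing table changes pay for that on the
 * write side.  The *_locked() functions below are placeholders.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/socket.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

#include <net/route.h>

static struct rmlock rt_table_rmlock;

static struct rtentry *
rt_table_lookup_locked(struct sockaddr *dst)
{

	/* Placeholder for the real radix-trie walk. */
	return (NULL);
}

static void
rt_table_change_locked(struct rtentry *rt)
{

	/* Placeholder for the real insert/delete. */
}

static void
rt_table_locking_init(void)
{

	rm_init(&rt_table_rmlock, "rt_table");
}

/*
 * Hot path, run for every packet in ip_output().  Readers do not
 * contend with each other at all.
 */
static struct rtentry *
rt_table_lookup(struct sockaddr *dst)
{
	struct rm_priotracker tracker;
	struct rtentry *rt;

	rm_rlock(&rt_table_rmlock, &tracker);
	rt = rt_table_lookup_locked(dst);
	rm_runlock(&rt_table_rmlock, &tracker);

	return (rt);
}

/*
 * Cold path.  Table changes are seldom to never with a single default
 * route, so making them costlier is an acceptable trade-off.
 */
static void
rt_table_change(struct rtentry *rt)
{

	rm_wlock(&rt_table_rmlock);
	rt_table_change_locked(rt);
	rm_wunlock(&rt_table_rmlock);
}

The point of the pattern is that the read side only touches per-CPU
state, while the write side has to synchronize with all CPUs, which is
exactly where we want the cost with a mostly-static table.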