Date: Mon, 7 Jul 2008 14:38:58 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Bruce Evans
Cc: FreeBSD Net, Andre Oppermann, Ingo Flaschberger, Paul
Subject: Re: Freebsd IP Forwarding performance (question, and some info)
 [7-stable, current, em, smp]

On Mon, 7 Jul 2008, Bruce Evans wrote:

>> (1) sendto() to a specific address and port on a socket that has been
>> bound to INADDR_ANY and a specific port.
>>
>> (2) sendto() to a specific address and port on a socket that has been
>> bound to a specific IP address (not INADDR_ANY) and a specific port.
>>
>> (3) send() on a socket that has been connect()'d to a specific IP
>> address and a specific port, and bound to INADDR_ANY and a specific
>> port.
>>
>> (4) send() on a socket that has been connect()'d to a specific IP
>> address and a specific port, and bound to a specific IP address (not
>> INADDR_ANY) and a specific port.
>>
>> The last of these should really be quite a bit faster than the first,
>> but I'd be interested in seeing specific measurements for each if
>> that's possible!
>
> Not sure if I understand networking well enough to set these up
> quickly.  Does netrate use one of (3) or (4) now?

(3) and (4) are effectively the same thing, I think, since connect(2)
should force the selection of a source IP address, but I think it's not
a bad idea to confirm that. :-)

The structure of the desired micro-benchmark here is basically:

int
main(int argc, char *argv[])
{
	struct sockaddr_in sin;
	int s;

	/* Parse command line arguments such as addresses and ports. */

	s = socket(PF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		err(-1, "socket");

	if (bind_desired) {
		/* Set up local sockaddr_in. */
		if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
			err(-1, "bind");
	}

	/* Set up destination sockaddr_in. */

	if (connect_desired) {
		if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
			err(-1, "connect");
	}

	while (appropriate_condition) {
		if (connect_desired) {
			if (send(s, buf, len, 0) < 0)
				errors++;
		} else {
			if (sendto(s, buf, len, 0,
			    (struct sockaddr *)&sin, sizeof(sin)) < 0)
				errors++;
		}
	}
}

> I can tell you vaguely about old results for netrate (send()) vs ttcp
> (sendto()).  send() is lighter weight of course, and this made a
> difference of 10-20%, but after further tuning the difference became
> smaller, which suggests that everything ends up waiting for something
> in common.
>
> Now I can measure cache misses better and hope that a simple count of
> cache misses will be a more reproducible indicator of significant
> bottlenecks than pps.  I got nowhere trying to reduce instruction
> counts, possibly because it would take avoiding 100's of instructions
> to get the same benefit as avoiding a single cache miss.

If you look at the design of the higher-performance UDP applications,
they will generally bind a specific IP address (perhaps every IP address
on the host, each with its own socket), and if they do sustained
communication with a specific endpoint they will use connect(2) rather
than providing an address for each send(2) system call to the kernel.
udp_output() makes the trade-offs there fairly clear: with the most
recent rev, the optimal case is one where connect(2) has been called,
allowing a single inpcb read lock and no global data structure access,
vs. an application calling sendto(2) for each system call with the local
binding remaining INADDR_ANY.

Middle ground applications, such as named(8), will force a local binding
using bind(2), but then still have to pass an address to each sendto(2).
In the future, this case will be further optimized in our code by using
a global read lock rather than a global write lock: we have to check for
collisions, but we don't actually have to reserve the new 4-tuple for
the UDP socket, as it's an ephemeral association rather than a
connect(2).
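A self-contained version of the sketch above, with the placeholders
filled in by assumed values (destination 192.0.2.1:9999, a fixed 32-byte
payload, one million sends, and a "connect" command-line argument
selecting the connect(2)+send(2) path instead of per-packet sendto(2)),
might look roughly like the following; these values are illustrative
only and do not come from the measurements in this thread:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct sockaddr_in dst;
	char buf[32];
	int i, s, use_connect;

	/* Hypothetical switch: "connect" selects connect(2)+send(2). */
	use_connect = (argc > 1 && strcmp(argv[1], "connect") == 0);

	s = socket(PF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		err(-1, "socket");

	/* Assumed destination; 192.0.2.1:9999 is only a placeholder. */
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9999);
	if (inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr) != 1)
		errx(-1, "inet_pton");

	memset(buf, 0, sizeof(buf));

	if (use_connect) {
		/* Fix the destination once; send(2) then carries no address. */
		if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0)
			err(-1, "connect");
	}

	for (i = 0; i < 1000000; i++) {
		if (use_connect) {
			if (send(s, buf, sizeof(buf), 0) < 0)
				warn("send");
		} else {
			/* Unconnected path: address passed to the kernel on every call. */
			if (sendto(s, buf, sizeof(buf), 0,
			    (struct sockaddr *)&dst, sizeof(dst)) < 0)
				warn("sendto");
		}
	}
	close(s);
	return (0);
}

Run without arguments this approximates case (1) above (the socket is
left unbound, so the local port is ephemeral); run with "connect" it
approximates case (3).  Adding an explicit bind(2) to a specific local
address and port before the loop would cover cases (2) and (4).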
Robert N M Watson
Computer Laboratory
University of Cambridge