Date: Mon, 7 Jul 2008 14:38:58 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Bruce Evans
Cc: FreeBSD Net, Andre Oppermann, Ingo Flaschberger, Paul
Subject: Re: Freebsd IP Forwarding performance (question, and some info)
 [7-stable, current, em, smp]

On Mon, 7 Jul 2008, Bruce Evans wrote:

>> (1) sendto() to a specific address and port on a socket that has been
>> bound to INADDR_ANY and a specific port.
>>
>> (2) sendto() to a specific address and port on a socket that has been
>> bound to a specific IP address (not INADDR_ANY) and a specific port.
>>
>> (3) send() on a socket that has been connect()'d to a specific IP
>> address and a specific port, and bound to INADDR_ANY and a specific
>> port.
>>
>> (4) send() on a socket that has been connect()'d to a specific IP
>> address and a specific port, and bound to a specific IP address (not
>> INADDR_ANY) and a specific port.
>>
>> The last of these should really be quite a bit faster than the first,
>> but I'd be interested in seeing specific measurements for each if
>> that's possible!
>
> Not sure if I understand networking well enough to set these up
> quickly.  Does netrate use one of (3) or (4) now?

(3) and (4) are effectively the same thing, I think, since connect(2)
should force the selection of a source IP address, but I think it's not
a bad idea to confirm that. :-)

The structure of the desired micro-benchmark here is basically:

int
main(int argc, char *argv[])
{
	struct sockaddr_in sin;
	int s;

	/* Parse command line arguments such as addresses and ports. */

	s = socket(PF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		err(-1, "socket");

	if (bind_desired) {
		/* Set up local sockaddr_in. */
		if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
			err(-1, "bind");
	}

	/* Set up destination sockaddr_in. */

	if (connect_desired) {
		if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
			err(-1, "connect");
	}

	while (appropriate_condition) {
		if (connect_desired) {
			if (send(s, buf, len, 0) < 0)
				errors++;
		} else {
			if (sendto(s, buf, len, 0,
			    (struct sockaddr *)&sin, sizeof(sin)) < 0)
				errors++;
		}
	}
}

> I can tell you vaguely about old results for netrate (send()) vs ttcp
> (sendto()).  send() is lighter weight of course, and this made a
> difference of 10-20%, but after further tuning the difference became
> smaller, which suggests that everything ends up waiting for something
> in common.
>
> Now I can measure cache misses better and hope that a simple count of
> cache misses will be a more reproducible indicator of significant
> bottlenecks than pps.  I got nowhere trying to reduce instruction
> counts, possibly because it would take avoiding 100's of instructions
> to get the same benefit as avoiding a single cache miss.

If you look at the design of the higher-performance UDP applications,
they will generally bind a specific IP address (perhaps every IP address
on the host, each with its own socket), and if they do sustained
communication with a specific endpoint they will use connect(2) rather
than providing an address for each send(2) system call to the kernel.
udp_output() makes the trade-offs there fairly clear: with the most
recent rev, the optimal case is one where connect(2) has been called,
allowing a single inpcb read lock and no global data structure access,
vs. an application calling sendto(2) for each system call with the local
binding remaining INADDR_ANY.

Middle ground applications, such as named(8), will force a local binding
using bind(2), but then still have to pass an address to each sendto(2).
In the future, this case will be further optimized in our code by using
a global read lock rather than a global write lock: we have to check for
collisions, but we don't actually have to reserve the new 4-tuple for
the UDP socket, as it's an ephemeral association rather than a
connect(2).
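A self-contained version of the sketch above, with the placeholders
filled in by assumed values (destination 192.0.2.1:9999, a fixed 32-byte
payload, one million sends, and a "connect" command-line argument
selecting the connect(2)+send(2) path instead of per-packet sendto(2)),
might look roughly like the following; these values are illustrative
only and do not come from the measurements in this thread:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct sockaddr_in dst;
	char buf[32];
	int i, s, use_connect;

	/* Hypothetical switch: "connect" selects connect(2)+send(2). */
	use_connect = (argc > 1 && strcmp(argv[1], "connect") == 0);

	s = socket(PF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		err(-1, "socket");

	/* Assumed destination; 192.0.2.1:9999 is only a placeholder. */
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9999);
	if (inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr) != 1)
		errx(-1, "inet_pton");

	memset(buf, 0, sizeof(buf));

	if (use_connect) {
		/* Fix the destination once; send(2) then carries no address. */
		if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0)
			err(-1, "connect");
	}

	for (i = 0; i < 1000000; i++) {
		if (use_connect) {
			if (send(s, buf, sizeof(buf), 0) < 0)
				warn("send");
		} else {
			/* Unconnected path: address passed to the kernel on every call. */
			if (sendto(s, buf, sizeof(buf), 0,
			    (struct sockaddr *)&dst, sizeof(dst)) < 0)
				warn("sendto");
		}
	}
	close(s);
	return (0);
}

Run without arguments this approximates case (1) above (the socket is
left unbound, so the local port is ephemeral); run with "connect" it
approximates case (3).  Adding an explicit bind(2) to a specific local
address and port before the loop would cover cases (2) and (4).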
Robert N M Watson
Computer Laboratory
University of Cambridge