From: Andre Oppermann
Date: Wed, 15 Feb 2006 22:01:06 +0100
To: Luigi Rizzo
Cc: Marcos Bedinelli, Max Laier, Julian Elischer, freebsd-net@freebsd.org
Subject: Re: Network performance in a dual CPU system

Luigi Rizzo wrote:
>
> On Wed, Feb 15, 2006 at 09:20:05PM +0100, Andre Oppermann wrote:
> ...
> > From my profiling with the Agilent tester there seem to be two areas where
> > the packet filters (ipfw in my test case) burn a lot of CPU per packet.
> > That is a) setup of lots of packet variables unconditionally at the entry
> > of ip_fw_chk() no matter whether they get looked at later or not, and b)
> > the switch() going through all the packet inspection options is for some
> > reason not optimized by the compiler and burns even more CPU. Some sort
> > of JIT (as in the new bpf code) which replaces the case testing and jumps
> > directly to the proper place in the switch statement would go a long way
> > toward making it much faster.
>
> i was expecting some overhead in the initial setting of
> variables but the cost of the switch() surprises me a bit.
> did you look at the assembly code produced, or otherwise
> could you explain a bit more how you think the switch
> affects performance ?
> Maybe one could make it cheaper through an indirect function call ?
> (in the end, instructions are already indexes for a jump table).

I didn't look at the generated assembly code as I can't read assembler.

In my testing (on a uniprocessor kernel) the peak forwarding rate on this
particular hardware with fastforwarding enabled dropped from 580 kpps
(no ipfw) to 476 kpps (ipfw, allow all) and further to 357 kpps (ipfw,
30 non-matching rules on IP address).

The per-packet counts of instructions, branches and the other CPU events
are as follows (every column except maxkpps is an average per packet):

                  maxkpps  instr.  branch  mispred  dcache  icfetch  icmiss
  fastfwd             580    2238     300      3.8    1429      812    0.06
  fastfwd+ipfw        476    2573     329     17.2    1721     1005    4.31
  fastfwd+ipfw30      357    3493     508     15.2    2129     1500    3.35

The setup of the packet variables happens only once per packet, no matter
how many rules are evaluated. The extra cost of the 30-rule case over the
allow-all case must therefore come from evaluating the rule micro-ops.

-- 
Andre