From owner-freebsd-net@FreeBSD.ORG Sat Mar 31 16:04:02 2007
Message-ID: <460E8663.9040309@elischer.org>
Date: Sat, 31 Mar 2007 09:03:47 -0700
From: Julian Elischer
To: Andre Oppermann
Cc: Luigi Rizzo, ipfw@freebsd.org, FreeBSD Net
In-Reply-To: <460E19EE.3020700@freebsd.org>
Subject: Re: IPFW update frequency
List-Id: Networking and TCP/IP with FreeBSD

Thanks for the information. The main thrust for me is to make it not hold
any locks during processing;
performance is secondary.

Andre Oppermann wrote:
> Julian Elischer wrote:
>> Luigi Rizzo wrote:
>>> On Fri, Mar 30, 2007 at 01:40:46PM -0700, Julian Elischer wrote:
>>>> I have been looking at the IPFW code recently, especially with
>>>> respect to locking. There are some things that could be done to
>>>> improve IPFW's behaviour when processing packets, but some of these
>>>> take a toll (there is always a toll) on the 'updating' side of
>>>> things.
>>>
>>> Certainly ipfw was not designed with SMP in mind. If you can tell us
>>> what your plan is to make the list lock-free (which one, the static
>>> or the dynamic ones?) maybe we can comment more.
>>>
>>> E.g. one option could be the usual trick of adding refcounts to the
>>> individual rules, and then using an array of pointers to them. While
>>> processing you grab a refcount to the array, and release it once done
>>> with the packet. If there is an addition or removal, you duplicate
>>> the array (which may be expensive for the large 20k rule sets
>>> mentioned), manipulate the copy, and then atomically swap the
>>> pointers to the head.
>>
>> This is pretty close. I know I've mentioned this to people several
>> times over the last year or so. The trick is to do it in a way that
>> the average packet doesn't need to take any locks to get in, while the
>> updater does more of the work. If you are willing to acquire a lock on
>> both starting and ending the run through the firewall it is easy; I
>> already have code to do that (see
>> http://www.freebsd.org/~julian/atomic_replace.c -- untested but
>> probably close). Doing it without requiring that each packet take
>> those locks, however, is a whole new level of problem.
>
> The locking overhead per packet in ipfw is by no means its limiting
> factor. Actually it's a very small part, and pretty much any work on it
> is wasted effort. The time would be much better spent optimizing the
> main rule loop of ipfw to speed things up.
> I was profiling ipfw early last year with an Agilent packet generator
> and hwpmc. In the meantime the packet forwarding path (w/o ipfw) has
> been improved, but relative to each other the numbers are still
> correct.
>
> Numbers pre-taskqueue improvements from early 2006:
>
>   fastfwd               580357 pps
>   fastfwd+pfil_pass     565477 pps (no rules, just pass packet on)
>   fastfwd+ipfw_allow    505952 pps (one rule)
>   fastfwd+ipfw_30rules  401768 pps (30 IP address non-matching rules)
>   fastfwd+pf_pass       476190 pps (one rule)
>   fastfwd+pf_30rules    342262 pps (30 IP address non-matching rules)
>
> The overhead per packet is big. Enabling ipfw, with the pfil/ipfw
> per-packet hooks and their indirect function calls, causes a loss of
> only about 15'000 pps (0.9%). On the other hand the first rule costs
> 12.9% and each additional rule 0.6%. All this is without any complex
> rules like table lookups, state tracking, etc.
>
>                            idle     fastfwd  fastfwd+     fastfwd+
>                                              ipfw_allow   ipfw_30rules
>   cycles               2596685731 2598214743 2597973265   2596702381
>   cpu-clk-unhalted        7824023 2582240847 2518187670   2483904362
>   instructions            2317535 1324655330 1492363346   2026009148
>   branches                 316786  174329367  191263118    294700024
>   branch-mispredicts        19757    2235749   10003461      8848407
>   dc-access               1417532  829159482  998427224   1235192770
>   dc-refill-from-l2          2124    4767395    4346738      4548311
>   dc-refill-from-system        89     803102     819658       654661
>   dtlb-l2-hit                 626   10435843    9304448     12352018
>   dtlb-miss                   129     255493     130998       112644
>   ic-fetch                 804423  471138619  583149432    870371492
>   ic-miss                    2358      34831    2505198      1947943
>   itlb-l2-hit                   0         74         12           12
>   itlb-miss                    42         92         82           82
>   lock-cycles                  77        803        352          451
>   locked-instructions           4         19          2            4
>   lock-dc-access                6         20          6            7
>   lock-dc-miss                  0          0          0            0
>
> Hardware is a dual Opteron 852 at 2.6GHz on a Tyan 2882 mainboard with
> a dual Intel em network card plugged into a PCI64-133 slot. Packets
> are flowing from em0 -> em1.