Date: Sat, 31 Mar 2007 09:03:47 -0700
From: Julian Elischer <julian@elischer.org>
To: Andre Oppermann <andre@freebsd.org>
Cc: Luigi Rizzo <rizzo@icir.org>, ipfw@freebsd.org, FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: IPFW update frequency
Message-ID: <460E8663.9040309@elischer.org>
In-Reply-To: <460E19EE.3020700@freebsd.org>
References: <460D75CE.70804@elischer.org> <20070330145938.A88154@xorpc.icir.org> <460DA258.2030402@elischer.org> <460E19EE.3020700@freebsd.org>
Thanks for the information. The main thrust for me is to make it not hold any locks during processing; performance is second.

Andre Oppermann wrote:
> Julian Elischer wrote:
>> Luigi Rizzo wrote:
>>> On Fri, Mar 30, 2007 at 01:40:46PM -0700, Julian Elischer wrote:
>>>> I have been looking at the IPFW code recently, especially with
>>>> respect to locking.
>>>> There are some things that could be done to improve IPFW's behaviour
>>>> when processing packets, but some of these take a
>>>> toll (there is always a toll) on the 'updating' side of things.
>>>
>>> Certainly ipfw was not designed with SMP in mind. If you can tell us
>>> what your plan is to make the list lock-free
>>> (which one, the static or dynamic ones?), maybe we can comment more.
>>>
>>> E.g. one option could be the usual trick of adding refcounts to
>>> the individual rules, and then using an array of pointers to them.
>>> While processing you grab a refcount to the array, and release it once
>>> done with the packet. If there is an addition or removal, you duplicate
>>> the array (which may be expensive for the large 20k rules mentioned),
>>> manipulate the copy and then atomically swap the pointers to the head.
>>
>> This is pretty close. I know I've mentioned this to people several
>> times over the last year or so. The trick is to try to do it in a way
>> that the average packet doesn't need to take any locks to get in and
>> the updater does more work.
>> If you are willing to acquire a lock on both starting and ending
>> the run through the firewall it is easy.
>> (I already have code to do that.)
>> (See http://www.freebsd.org/~julian/atomic_replace.c; untested but
>> probably close.)
>> Doing it without requiring that each packet get those locks, however,
>> is a whole new level of problem.
>
> The locking overhead per packet in ipfw is by no means its limiting
> factor. Actually it's a very small part and pretty much any work on
> it is lost love.
> Time would be much better spent optimizing the
> main rule loop of ipfw to speed things up. I was profiling ipfw early
> last year with an Agilent packet generator and hwpmc. In the meantime
> the packet forwarding path (w/o ipfw) has been improved, but relative
> to each other the numbers are still correct.
>
> Numbers from early 2006, before the taskqueue improvements:
>
> fastfwd               580357 pps
> fastfwd+pfil_pass     565477 pps  (no rules, just pass packet on)
> fastfwd+ipfw_allow    505952 pps  (one rule)
> fastfwd+ipfw_30rules  401768 pps  (30 IP address non-matching rules)
> fastfwd+pf_pass       476190 pps  (one rule)
> fastfwd+pf_30rules    342262 pps  (30 IP address non-matching rules)
>
> The overhead per packet is big. Enabling of ipfw and the pfil/ipfw
> per packet and their indirect function calls cause a loss of only
> about 15'000 pps (0.9%). On the other hand the first rule costs 12.9%
> and each additional rule 0.6%. All this is without any complex rules
> like table lookups, state tracking, etc.
>
>                             idle     fastfwd  fastfwd+ipfw_allow  fastfwd+ipfw_30rules
> cycles                2596685731  2598214743          2597973265            2596702381
> cpu-clk-unhalted         7824023  2582240847          2518187670            2483904362
> instructions             2317535  1324655330          1492363346            2026009148
> branches                  316786   174329367           191263118             294700024
> branch-mispredicts         19757     2235749            10003461               8848407
> dc-access                1417532   829159482           998427224            1235192770
> dc-refill-from-l2           2124     4767395             4346738               4548311
> dc-refill-from-system         89      803102              819658                654661
> dtlb-l2-hit                  626    10435843             9304448              12352018
> dtlb-miss                    129      255493              130998                112644
> ic-fetch                  804423   471138619           583149432             870371492
> ic-miss                     2358       34831             2505198               1947943
> itlb-l2-hit                    0          74                  12                    12
> itlb-miss                     42          92                  82                    82
> lock-cycles                   77         803                 352                   451
> locked-instructions            4          19                   2                     4
> lock-dc-access                 6          20                   6                     7
> lock-dc-miss                   0           0                   0                     0
>
> Hardware is a dual Opteron 852 at 2.6GHz on a Tyan 2882 mainboard with
> a dual Intel em network card plugged into a PCI64-133 slot. Packets
> are flowing from em0 -> em1.