Date: Mon, 07 Jul 2008 11:11:37 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Bart Van Kerckhove <bart@it-ss.be>, Ingo Flaschberger <if@xip.at>, Paul <paul@gtcomm.net>
Subject: Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
Message-ID: <4871DDC9.6060706@freebsd.org>
In-Reply-To: <20080707095013.N63144@fledge.watson.org>
References: <4867420D.7090406@gtcomm.net>
 <4869B025.9080006@gtcomm.net>
 <486A7E45.3030902@gtcomm.net>
 <486A8F24.5010000@gtcomm.net>
 <486A9A0E.6060308@elischer.org>
 <486B41D5.3060609@gtcomm.net>
 <alpine.LFD.1.10.0807021052041.557@filebunker.xip.at>
 <486B4F11.6040906@gtcomm.net>
 <alpine.LFD.1.10.0807021155280.557@filebunker.xip.at>
 <486BC7F5.5070604@gtcomm.net>
 <20080703160540.W6369@delplex.bde.org>
 <486C7F93.7010308@gtcomm.net>
 <20080703195521.O6973@delplex.bde.org>
 <486D35A0.4000302@gtcomm.net>
 <alpine.LFD.1.10.0807041106591.19613@filebunker.xip.at>
 <486DF1A3.9000409@gtcomm.net>
 <alpine.LFD.1.10.0807041303490.20760@filebunker.xip.at>
 <486E65E6.3060301@gtcomm.net>
 <alpine.LFD.1.10.0807052356130.2145@filebunker.xip.at>
 <2d3001c8def1$f4309b90$020b000a@bartwrkstxp>
 <486FFF70.3090402@gtcomm.net>
 <20080706132148.E44832@fledge.watson.org>
 <4871D81B.8070507@freebsd.org>
 <20080707095013.N63144@fledge.watson.org>

Robert Watson wrote:
>
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Robert Watson wrote:
>>> Experience suggests that forwarding workloads see significant lock
>>> contention in the routing and transmit queue code. The former needs
>>> some kernel hacking to address in order to improve parallelism for
>>> routing lookups. The latter is harder to address given the hardware
>>> you're using: modern 10gbps cards frequently offer multiple transmit
>>> queues that can be used independently (which our cxgb driver
>>> supports), but 1gbps cards generally don't.
>>
>> Actually the routing code is not contended. The workload in a router is
>> mostly serialized without much opportunity for contention. With many
>> interfaces and any-to-any traffic patterns it may get some
>> contention. The locking overhead per packet is always there and has
>> some impact though.
>
> Yes, I don't see any real sources of contention until we reach the
> output code, which will run in the input if_em taskqueue threads, as the
> input path generates little or no contention if the packets are not
> destined for local delivery. I was a little concerned about mention of

The interface output was the second largest block after the cache misses
IIRC. The output part seems to have received only moderate attention and
detailed performance analysis compared to the interface input path. Most
network drivers do a write to the hardware for every packet sent, in
addition to other overhead that may be necessary for their transmit DMA
rings. That adds significant overhead compared to the RX path, where those
costs are amortized over a larger number of packets.

> degrading performance as firewall complexity grows -- I suspect there's
> a nice project for someone to do looking at why this is the case. I was
> under the impression that, in 7.x and later, we use rwlocks to protect
> firewall state, and that unless stateful firewall rules are used, these
> are locked read-only rather than writable...
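
To illustrate the transmit-side point above (one hardware write per packet sent, versus costs amortized over many packets), here is a minimal user-space sketch in C. It is not taken from if_em or any real driver: `struct tx_ring`, `RING_SIZE`, `ring_doorbell()` and the two `tx_*` functions are hypothetical names, and the "register write" is only a counter. It just counts how many simulated tail-register writes each strategy needs.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 256u  /* hypothetical descriptor ring size */

/* Hypothetical TX ring model; 'tail' stands in for a device tail register. */
struct tx_ring {
    unsigned int tail;              /* last index handed to the "hardware" */
    unsigned int next;              /* next free descriptor slot */
    unsigned long doorbell_writes;  /* number of simulated register writes */
};

/* Fill one descriptor slot (no hardware access in this model). */
static void ring_enqueue(struct tx_ring *r)
{
    r->next = (r->next + 1) % RING_SIZE;
}

/* Simulated register write: publish everything queued so far. */
static void ring_doorbell(struct tx_ring *r)
{
    r->tail = r->next;
    r->doorbell_writes++;
}

/* One doorbell write per packet, as described for most drivers' TX path. */
unsigned long tx_per_packet(struct tx_ring *r, size_t npkts)
{
    size_t i;

    for (i = 0; i < npkts; i++) {
        ring_enqueue(r);
        ring_doorbell(r);
    }
    return r->doorbell_writes;
}

/* One doorbell write per 'batch' packets, plus one for any remainder. */
unsigned long tx_batched(struct tx_ring *r, size_t npkts, size_t batch)
{
    size_t i, pending = 0;

    assert(batch > 0);
    for (i = 0; i < npkts; i++) {
        ring_enqueue(r);
        if (++pending == batch) {
            ring_doorbell(r);
            pending = 0;
        }
    }
    if (pending > 0)
        ring_doorbell(r);
    return r->doorbell_writes;
}
```

In this model, sending 64 packets costs 64 simulated register writes one at a time, but only 4 with a batch size of 16. Whether a real driver can batch like this depends on the hardware and on how packets arrive at the output path, which the sketch does not model.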

The overhead of just looking at the packet (twice) in ipfw or other
firewall packages is huge. The main loop of ipfw is a very large block of
code. Unless one implements compilation of firewall rules to native
machine code there is not much that can be done. With LLVM we will see
some very interesting opportunities in that area. Other than that the
ipfw instruction overhead per rule seems to be quite close to the optimum.
I'm not saying one shouldn't take a close look with a profiler to verify
this is actually the case.

-- 
Andre
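
As a rough illustration of why an interpreted ruleset costs more per packet as it grows, here is a minimal sketch in C of a first-match rule interpreter. It does not reproduce ipfw's actual data structures or main loop: `struct rule`, `struct pkt`, the opcodes and `ruleset_eval()` are hypothetical names invented for this example, and the default-deny fallthrough is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical match opcodes and actions; not ipfw's real instruction set. */
enum op { OP_PROTO, OP_DST_PORT };
enum action { ACT_ALLOW, ACT_DENY };

struct pkt {
    uint8_t  proto;     /* IP protocol number */
    uint16_t dst_port;  /* destination port, host byte order */
};

struct rule {
    enum op     op;      /* which packet field to compare */
    uint32_t    arg;     /* value to compare against */
    enum action action;  /* what to do on a match */
};

/*
 * Walk the rules in order; the first match wins. Every packet pays one
 * dispatch and one compare for each rule examined, so the cost grows with
 * the position of the matching rule. '*visited' reports how many rules
 * were examined.
 */
enum action ruleset_eval(const struct rule *rules, size_t nrules,
                         const struct pkt *p, size_t *visited)
{
    size_t i;

    assert(p != NULL && visited != NULL);
    for (i = 0; i < nrules; i++) {
        int match = 0;

        switch (rules[i].op) {
        case OP_PROTO:
            match = (p->proto == rules[i].arg);
            break;
        case OP_DST_PORT:
            match = (p->dst_port == rules[i].arg);
            break;
        }
        if (match) {
            *visited = i + 1;
            return rules[i].action;
        }
    }
    *visited = nrules;
    return ACT_DENY;  /* default action assumed for this sketch */
}
```

A packet that matches no rule walks the whole list, which is the per-packet cost that compiling the rules to native code would aim to shrink by removing the dispatch step. How much that would actually gain is something only a profiler can show.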