Date: Mon, 07 Jul 2008 13:13:19 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Paul <paul@gtcomm.net>
Subject: Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
Message-ID: <4871FA4F.40206@freebsd.org>
In-Reply-To: <20080707114538.K63144@fledge.watson.org>
References: <4867420D.7090406@gtcomm.net>
 <200806301944.m5UJifJD081781@lava.sentex.ca>
 <20080701004346.GA3898@stlux503.dsto.defence.gov.au>
 <alpine.LFD.1.10.0807010257570.19444@filebunker.xip.at>
 <20080701010716.GF3898@stlux503.dsto.defence.gov.au>
 <alpine.LFD.1.10.0807010308320.19444@filebunker.xip.at>
 <486986D9.3000607@monkeybrains.net>
 <48699960.9070100@gtcomm.net>
 <ea7b9c170806302005n2a66f592h2127f87a0ba2c6d2@mail.gmail.com>
 <20080701033117.GH83626@cdnetworks.co.kr>
 <ea7b9c170806302050p2a3a5480t29923a4ac2d7c852@mail.gmail.com>
 <4869ACFC.5020205@gtcomm.net>
 <4869B025.9080006@gtcomm.net>
 <486A7E45.3030902@gtcomm.net>
 <486A8F24.5010000@gtcomm.net>
 <486A9A0E.6060308@elischer.org>
 <486B41D5.3060609@gtcomm.net>
 <4871E85C.8090907@freebsd.org>
 <20080707114538.K63144@fledge.watson.org>

Robert Watson wrote:
>
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Distributing the interrupts and taskqueues among the available CPUs
>> gives concurrent forwarding with bi- or multi-directional traffic. All
>> incoming traffic from any particular interface is still serialized
>> though.
>
> ... although not on multiple input queue-enabled hardware and drivers.
> While I've really only focused on local traffic performance with my
> 10gbps Chelsio setup, it should be possible to do packet forwarding from
> multiple input queues using that hardware and driver today.
>
> I'll update the netisr2 patches, which allow work to be pushed to
> multiple CPUs from a single input queue. However, these necessarily
> take a cache miss or two on packet header data in order to break out the
> packets from the input queue into flows that can be processed
> independently without ordering constraints, so if those cache misses on
> header data are a big part of the performance of a configuration, load
> balancing in this manner may not help. What would be neat is if the
> cards without multiple input queues could still tag receive descriptors
> with a flow identifier generated from the IP/TCP/etc layers that could
> be used for work placement.

The cache miss is really the elephant in the room.

If the network card supports multiple RX rings with separate interrupts
and a stable hash-based distribution (one that includes IP and port,
source and destination), the rings can be bound to different CPUs. It is
very important to maintain packet order for flows that go through the
router; otherwise TCP and VoIP will suffer.

-- 
Andre
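
To illustrate why a stable hash over IP+port source and destination keeps each flow in order, here is a minimal sketch. It is not taken from any driver or from the netisr2 patches: the function names `flow_hash` and `pick_rx_ring` are hypothetical, the sketch handles IPv4 only, and CRC32 stands in for whatever hash a given card implements in hardware. The only point is that every packet of one flow lands on the same ring, and so on the same CPU, while different flows can spread across rings.

```python
import ipaddress
import struct
import zlib


def flow_hash(src_ip, dst_ip, src_port, dst_port, proto=6):
    """Hash the flow tuple into a 32-bit value (illustrative, IPv4 only).

    CRC32 is an assumption made for this sketch; real hardware may use
    a different hash function.
    """
    key = struct.pack(
        "!4s4sHHB",
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
        proto,
    )
    return zlib.crc32(key)


def pick_rx_ring(flow, n_rings):
    """Map a flow tuple to one of n_rings RX rings.

    The mapping depends only on the tuple, so all packets of a flow are
    queued on the same ring and their relative order is preserved.
    """
    return flow_hash(*flow) % n_rings


if __name__ == "__main__":
    flows = [
        ("10.0.0.1", "192.0.2.10", 40000, 80),
        ("10.0.0.1", "192.0.2.10", 40001, 80),
        ("10.0.0.2", "192.0.2.20", 5060, 5060, 17),  # UDP, e.g. VoIP signalling
    ]
    for flow in flows:
        print(flow, "-> ring", pick_rx_ring(flow, 4))
```

With one ring bound to each CPU, ordering within a flow is then a property of the ring assignment itself, and no cross-CPU resequencing is needed.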