Date: Wed, 24 Apr 2013 01:34:22 +0200
From: Kajetan Staszkiewicz <vegeta@tuxpowered.net>
To: freebsd-net@freebsd.org
Subject: Re: pf performance?
Message-ID: <201304240134.22740.vegeta@tuxpowered.net>
In-Reply-To: <5176E5C1.9090601@soe.ucsc.edu>
References: <5176E5C1.9090601@soe.ucsc.edu>

On Tuesday, 23 April 2013 at 21:49:21, Erich Weiler wrote:
> Hello all,
>
> I have a question here about how FreeBSD (8.1-RELEASE-p13 specifically)
> behaves when acting as a firewall.  I understand the pf process is
> "giant locked" to a single CPU core when inspecting packets inbound and
> outbound.  I was wondering, how does that manifest when I look at "top
> -P" on the firewall?
>
> Right now I have a dual port Myricom 10G NIC (packets inbound on one
> interface and outbound on the other), and the mxge driver is
> "multiplexing" interrupt processing across all the CPU cores for speed.
> So, when the firewall is busy, I see all the cpu cores quite busy
> processing interrupts (like 70% or more CPU utilization).  But, all CPU
> work seems to be in interrupts.  I don't see anything, or *very* little,
> in system or user space for CPU utilization.  Should the pf process be
> using some CPU too?  If so, how could I tell that?  I'm trying to figure
> out if I'm limited by not having enough CPU to process the interrupts or
> not enough CPU to process the packet filtering process.  Right now it
> looks like interrupts but I'm not sure.

As far as I understand, pf processes packets inside the receiving network
card's interrupt handler, all the way up to sending the packet out via
another network card (at least in my case, where I use route-to targets,
which do the routing inside pf).

> The Myricom folks looked at our debugging info on the mxge driver and
> say that based on what they see, mxge is dropping packets because the
> host cannot pull packets out of the NIC buffer fast enough.  The host is
> using a four core Xeon X5677 3.46GHz CPU.  We're processing 140,000
> packets per second or so, and I see rates up to several gigabits per
> second, but all my research seems to indicate it can do better than
> that, and that we should not be dropping packets.  Or maybe the question
> is: why doesn't the host pull the packets from the NIC fast enough?  Is
> the CPU tied up doing something else?  Interrupts?

As for my performance issues, at first I noticed that I always had some
cores overloaded and some doing nothing. So I performed the following tuning:
- disabled HT on CPUs
- deferred netisr, and no NIC interrupts assigned to cores used by netisr
- each core gets only one interrupt

But this applies to NICs with just a single interrupt (so I have netisr on
cpu0 and 1, one NIC on cpu3, one NIC on cpu4); it might not help when you
have NICs that can load all cores.

Some more tips:
- use interrupt coalescing; if you already do, tune it to be more aggressive
- create states on *both* sides of your firewall; for me this cut loadavg by
  a factor of 2-3 on a machine with around 400 rules.
- keep the number of states low; I was surprised how many states were hanging
  in the "closing" state, which has quite a long default timeout.

How do you count the 140kpps value? One interface, both, in, out? I'd like
to relate this somehow to my values.

--
| pozdrawiam / greetings | powered by Debian, CentOS and FreeBSD |
| Kajetan Staszkiewicz   | jabber,email: vegeta()tuxpowered net  |
| Vegeta                 | www: http://vegeta.tuxpowered.net     |
`------------------------^---------------------------------------'
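
On the counting question: the same traffic can be reported as very different pps figures depending on which interface counters are summed. A minimal sketch in Python, assuming two snapshots of cumulative per-interface packet counters taken a known interval apart (for example, read by hand from netstat output). The `Sample` type, the `pps` helper and all counter values are hypothetical and only illustrate the arithmetic; they are not measurements from either firewall.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Cumulative packet counters for one interface at one instant."""
    ipkts: int  # packets received so far
    opkts: int  # packets sent so far


def pps(before: Sample, after: Sample, seconds: float) -> dict:
    """Return inbound, outbound and combined packet rates for one interface."""
    rx = (after.ipkts - before.ipkts) / seconds
    tx = (after.opkts - before.opkts) / seconds
    return {"in": rx, "out": tx, "both": rx + tx}


# Hypothetical counters taken 10 s apart on a firewall that forwards
# traffic in one direction only: in on the external NIC, out on the internal.
ext = pps(Sample(1_000_000, 500_000), Sample(2_400_000, 500_000), 10.0)
intf = pps(Sample(200_000, 3_000_000), Sample(200_000, 4_400_000), 10.0)

one_interface_in = ext["in"]                   # each forwarded packet counted once
all_counters = ext["both"] + intf["both"]      # each forwarded packet counted twice
```

With these made-up numbers, counting only inbound packets on one interface gives 140 kpps, while summing in and out over both interfaces gives 280 kpps for exactly the same load, which is why the method needs to be stated before two machines can be compared.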