Date:      Wed, 24 Apr 2013 01:34:22 +0200
From:      Kajetan Staszkiewicz <vegeta@tuxpowered.net>
To:        freebsd-net@freebsd.org
Subject:   Re: pf performance?
Message-ID:  <201304240134.22740.vegeta@tuxpowered.net>
In-Reply-To: <5176E5C1.9090601@soe.ucsc.edu>
References:  <5176E5C1.9090601@soe.ucsc.edu>

On Tuesday, 23 April 2013 at 21:49:21, Erich Weiler wrote:
> Hello all,
>
> I have a question here about how FreeBSD (8.1-RELEASE-p13 specifically)
> behaves when acting as a firewall.  I understand the pf process is
> "giant locked" to a single CPU core when inspecting packets inbound and
> outbound.  I was wondering, how does that manifest when I look at "top
> -P" on the firewall?
>
> Right now I have a dual port Myricom 10G NIC (packets inbound on one
> interface and outbound on the other), and the mxge driver is
> "multiplexing" interrupt processing across all the CPU cores for speed.
>   So, when the firewall is busy, I see all the cpu cores quite busy
> processing interrupts (like 70% or more CPU utilization).  But, all CPU
> work seems to be in interrupts.  I don't see anything, or *very* little,
> in system or user space for CPU utilization.  Should the pf process be
> using some CPU too?  If so, how could I tell that?  I'm trying to figure
> out if I'm limited by not having enough CPU to process the interrupts or
> not enough CPU to process the packet filtering process.  Right now it
> looks like interrupts but I'm not sure.

As far as I understand, pf processes packets in the receiving network
card's interrupt handler, all the way up to sending the packet out via
another network card (at least in my case, when using route-to targets,
which do the routing inside pf).

> The Myricom folks looked at our debugging info on the mxge driver and
> say that based on what they see, mxge is dropping packets because the
> host cannot pull packets out of the NIC buffer fast enough.  The host is
> using a four core Xeon X5677 3.46GHz CPU.  We're processing 140,000
> packets per second or so, and I see rates up to several gigabits per
> second, but all my research seems to indicate it can do better than
> that, and that we should not be dropping packets.  Or maybe the question
> is: why doesn't the host pull the packets from the NIC fast enough?  Is
> the CPU tied up doing something else?  Interrupts?

As for my performance issues, at first I noticed that I always had some
cores overloaded and some doing nothing. So I performed the following
tuning:
- disabled HT on CPUs
- deferred netisr and no NIC interrupts assigned to cores used by netisr
- each core gets only one interrupt
But this applies to NICs with just a single interrupt (so I have netisr
on cpu0 and cpu1, one NIC on cpu3, one NIC on cpu4); it might not help
when you have ones that can load all cores.

Some more tips:
- use interrupt coalescing; if you already do, tune it to be more
  aggressive
- create states on *both* sides of your firewall; for me this lowered
  loadavg 2-3 times on a machine with around 400 rules.
- keep the number of states low; I was surprised how many states were
  hanging in the "closing" state, which has quite a long default timeout.

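To see which states are piling up, the state table can be tallied by its
trailing state pair. The following is only a rough sketch: it assumes the
non-verbose output of `pfctl -s state`, where each entry ends in a
SRC:DST state pair, and the sample lines and the count_states name are
made up for illustration.

```python
from collections import Counter


def count_states(lines):
    """Tally pf state-table entries by their trailing SRC:DST state pair."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and the indented detail lines of verbose output.
        if not line.strip() or line[0].isspace():
            continue
        counts[line.split()[-1]] += 1
    return counts


# Hypothetical sample lines in the shape `pfctl -s state` prints.
sample = [
    "all tcp 192.0.2.10:80 <- 198.51.100.7:51514       ESTABLISHED:ESTABLISHED",
    "all tcp 192.0.2.10:80 <- 198.51.100.8:40022       FIN_WAIT_2:FIN_WAIT_2",
    "all tcp 192.0.2.10:80 <- 198.51.100.9:40100       FIN_WAIT_2:FIN_WAIT_2",
    "all udp 192.0.2.53:53 <- 198.51.100.7:33001       MULTIPLE:SINGLE",
]

if __name__ == "__main__":
    # Most common state pairs first.
    for state, n in count_states(sample).most_common():
        print(f"{n:6d}  {state}")
```

On a live firewall the same function could be fed the lines of
`pfctl -s state` read from standard input instead of the sample list.
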
How do you count the 140kpps value? One interface or both, inbound or
outbound? I'd like to relate this somehow to my values.
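
Why the counting method matters can be shown with some quick arithmetic.
This is a back-of-the-envelope sketch only: the clock speed, core count
and packet rate come from the original post, while the function names and
the assumption of pure forwarding with no drops (so a figure summed over
both interfaces and both directions counts every packet twice) are mine.

```python
CPU_HZ = 3.46e9         # Xeon X5677 clock, from the original post
CORES = 4               # core count, from the original post
REPORTED_PPS = 140_000  # packet rate, from the original post


def forwarded_pps(reported_pps, counted_both_sides):
    """Packets actually forwarded per second.

    Assumption: if the figure sums input and output over both interfaces,
    every forwarded packet was counted twice.
    """
    return reported_pps / 2 if counted_both_sides else reported_pps


def cycles_per_packet(pps, cores=CORES, hz=CPU_HZ):
    """CPU cycles available per packet across all cores at this rate."""
    return cores * hz / pps


if __name__ == "__main__":
    for both in (False, True):
        pps = forwarded_pps(REPORTED_PPS, both)
        print(f"counted twice: {both}, forwarded pps: {pps:.0f}, "
              f"cycles per packet: {cycles_per_packet(pps):.0f}")
```

Under these assumptions the budget is roughly 99,000 cycles per packet if
140kpps means forwarded packets, and twice that if it is a sum over both
sides.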

-- 
| pozdrawiam / greetings | powered by Debian, CentOS and FreeBSD |
|  Kajetan Staszkiewicz  | jabber,email: vegeta()tuxpowered net  |
|        Vegeta          | www: http://vegeta.tuxpowered.net     |
`------------------------^---------------------------------------'


