From: Kajetan Staszkiewicz
To: freebsd-net@freebsd.org
Subject: Re: pf performance?
Date: Wed, 24 Apr 2013 01:34:22 +0200

On Tuesday, 23 April 2013 at 21:49:21, Erich Weiler wrote:
> Hello all,
>
> I have a question here about how FreeBSD (8.1-RELEASE-p13 specifically)
> behaves when acting as a firewall. I understand the pf process is
> "giant locked" to a single CPU core when inspecting packets inbound and
> outbound. I was wondering, how does that manifest when I look at "top
> -P" on the firewall?
>
> Right now I have a dual port Myricom 10G NIC (packets inbound on one
> interface and outbound on the other), and the mxge driver is
> "multiplexing" interrupt processing across all the CPU cores for speed.
> So, when the firewall is busy, I see all the cpu cores quite busy
> processing interrupts (like 70% or more CPU utilization). But, all CPU
> work seems to be in interrupts. I don't see anything, or *very* little,
> in system or user space for CPU utilization. Should the pf process be
> using some CPU too? If so, how could I tell that? I'm trying to figure
> out if I'm limited by not having enough CPU to process the interrupts or
> not enough CPU to process the packet filtering process. Right now it
> looks like interrupts but I'm not sure.

As far as I understand, pf processes packets in the receiving network
card's interrupt handler, right up to sending the packet out via another
network card (at least in my case, where I use route-to targets, which do
the routing inside pf).

> The Myricom folks looked at our debugging info on the mxge driver and
> say that based on what they see, mxge is dropping packets because the
> host cannot pull packets out of the NIC buffer fast enough. The host is
> using a four core Xeon X5677 3.46GHz CPU. We're processing 140,000
> packets per second or so, and I see rates up to several gigabits per
> second, but all my research seems to indicate it can do better than
> that, and that we should not be dropping packets. Or maybe the question
> is: why doesn't the host pull the packets from the NIC fast enough? Is
> the CPU tied up doing something else? Interrupts?

As for my performance issues, at first I noticed that some cores were
always overloaded while others were doing nothing. So I performed the
following tuning:
- disabled HT on the CPUs
- deferred netisr, with no NIC interrupts assigned to the cores used by
  netisr
- each core gets only one interrupt

But this applies to NICs with just a single interrupt (so I have netisr on
cpu0 and 1, one NIC on cpu3, one NIC on cpu4); it might not help when you
have NICs that can load all cores.

Some more tips:
- use interrupt coalescing; if you already do, tune it to be more
  aggressive
- create states on *both* sides of your firewall; for me this lowered
  loadavg 2-3 times on a machine with around 400 rules
- keep the number of states low; I was surprised how many states were
  hanging in the "closing" state, which has quite a long default timeout

How do you count the 140kpps value? One interface, both, in, out? I'd like
to relate this somehow to my values.
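
To make the counting question concrete, here is a minimal sketch of how a
pps figure can be derived from two snapshots of an interface's cumulative
packet counters. The Counters class, the pps() function and all numbers are
hypothetical and for illustration only; I am assuming the counters come
from something like the Ipkts/Opkts columns of "netstat -i", sampled a
known number of seconds apart.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Cumulative packet counters of one interface at one point in time."""
    ipkts: int  # packets received so far
    opkts: int  # packets sent so far


def pps(before: Counters, after: Counters, seconds: float) -> dict:
    """Packets per second, per direction, between two snapshots."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    rx = (after.ipkts - before.ipkts) / seconds
    tx = (after.opkts - before.opkts) / seconds
    return {"in": rx, "out": tx, "both": rx + tx}


# Made-up snapshots of one NIC taken 10 seconds apart (not measured data).
t0 = Counters(ipkts=1_000, opkts=500)
t1 = Counters(ipkts=1_401_000, opkts=700_500)
rates = pps(t0, t1, 10.0)
print(rates)
```

The point of separating "in", "out" and "both" is that on a forwarding
firewall a forwarded packet is normally counted once as input on one NIC
and once as output on the other, so a figure summed over both interfaces
and both directions can be roughly twice the number of packets that
actually pass through pf.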

-- 
| pozdrawiam / greetings | powered by Debian, CentOS and FreeBSD |
|  Kajetan Staszkiewicz  | jabber,email: vegeta()tuxpowered net  |
|         Vegeta         | www: http://vegeta.tuxpowered.net     |
`------------------------^---------------------------------------'