Subject: Re: is polling still a thing?
From: Jim Thompson
Date: Tue, 27 Jan 2015 17:38:11 -0600
To: Antoine Beaupré
Cc: freebsd-net@freebsd.org, wishmaster
Message-Id: <6BB47230-9AB8-4F0B-843B-7C51330F8306@netgate.com>
In-Reply-To: <87pp9zc1wk.fsf@marcos.anarc.at>
References: <871tmgceup.fsf@marcos.anarc.at> <1422384769.867067950.y2iiuu53@frv34.fwdcdn.com> <87pp9zc1wk.fsf@marcos.anarc.at>
List-Id: Networking and TCP/IP with FreeBSD

> On Jan 27, 2015, at 4:08 PM, Antoine Beaupré wrote:
>
> On 2015-01-27 13:57:20, wishmaster wrote:
>> Have you consider to use netmap-based ipfw instead pf in DDoS mitigation? I think you should. And without any network ''haks'' like polling.
>
> My understanding of netmap was that it wasn't useful for packet
> forwarding, because its design is for transmitting packets directly to
> userland faster, whereas routers dataflow stay mostly in the router…

The problem is that the “data flow” in FreeBSD isn’t very fast. (I’d go so far as to say “broken”, but that’s throwing rocks.)

But as long as the window is already broken:

The rtentry locking is a good example of how the stack is broken. The lack of FIB caching is another issue, and packet-at-a-time-to-completion processing (no batching) is another.

So do ‘N’ packets’ worth of address lookups (ACLs, etc.) at a time, just like “Click” showed a decade ago (and where the polling mode was of use).
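
A minimal sketch of the batching point above, not FreeBSD or netmap code: it assumes a toy longest-prefix-match table and compares one lookup per packet against one lookup per distinct destination within a batch. The names RouteTable, forward_per_packet, forward_batched and the batch_size parameter are hypothetical and exist only for this illustration.

```python
import ipaddress
from itertools import islice


class RouteTable:
    """Toy longest-prefix-match table; counts lookups so the cost is visible."""

    def __init__(self, routes):
        # routes: mapping of CIDR prefix string -> next-hop name (illustrative only)
        self.routes = [(ipaddress.ip_network(p), hop) for p, hop in routes.items()]
        # Longest prefix first, so the first match is the most specific one.
        self.routes.sort(key=lambda r: r[0].prefixlen, reverse=True)
        self.lookups = 0

    def lookup(self, dst):
        self.lookups += 1
        addr = ipaddress.ip_address(dst)
        for net, hop in self.routes:
            if addr in net:
                return hop
        return None  # no matching route


def forward_per_packet(table, dsts):
    """Packet-at-a-time-to-completion: one table lookup for every packet."""
    return [table.lookup(d) for d in dsts]


def forward_batched(table, dsts, batch_size=8):
    """Pull up to batch_size packets, then look up each distinct destination once."""
    out = []
    it = iter(dsts)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return out
        # One lookup per distinct destination in this batch, reused for every
        # packet in the batch that shares it.
        hops = {d: table.lookup(d) for d in set(batch)}
        out.extend(hops[d] for d in batch)
```

Both functions return the same next hops in the same order; the difference shows up in table.lookups, which shrinks in the batched version whenever packets in a batch share a destination. How much that helps on real traffic depends on the flow mix, so treat this as an illustration of the idea rather than a measurement.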

But it’s trivial to build a packet forwarder (more L2 than L3, but all things are possible) using netmap (or DPDK) that smacks the FreeBSD (and Linux) stacks with a large stick.

The netmap code comes with a “bridge.c” example that is just that, a dead-simple bridge. Another example, “netmap-fwd”, runs at 14.88Mpps between two 10Gbps interfaces. (Neither pf nor the kernel-resident ipfw will come close; both are more than an order of magnitude slower.)

Here’s something a bit more than “dead simple”: https://github.com/caladri/brilter

This would be even faster if Juli would use one of the Lua JITs, e.g.: http://wingolog.org/archives/2014/09/02/high-performance-packet-filtering-with-pflua

And if you want to go ‘full tilt’, Click has run on top of netmap since 2012: https://github.com/kohler/click/commits/netmap (the code is in the master branch, too. Use master.)

As for the netmap-ipfw code, it’s 6.5Mpps to 10Mpps (later editions of the code): http://freebsd.1045724.n5.nabble.com/ipfw-meets-netmap-6-5-Mpps-in-userspace-td5734014.html

> I'm hesitant in switching back to ipfw, considering how nice the
> featureset and syntax of pf is. But if that's what's needed to restore
> sanity…

pf is sane? No, I don’t think so.
(Yes, it does say “pf” at the front of “pfSense”. So what? I mean, have you looked at the code?)

Turn off polling, unless you know you need it. You’ll know you “need it” if you start making changes to the stack.

There is a lot of “mystery meat” in most fields, and the field of computers / operating systems contains its share.

As a somewhat associated example, Intel says, “hyperthreading helps (networking) performance!” 6WIND says this too. FreeBSD developers say, “hyperthreading hurts performance”.

In the end, it depends on what is stalling the CPU. Hyper-threading is a trick to share the write pipes on the core, and traditional implementations of memcpy() will fill these pipes (call them buffers if you like).

And the stack does a lot of memcpy(). (I’m waiting for the yowls of “we zero-copy!”, because anyone who asserts this just hasn’t looked at the stack.) There are tricks: if your code interleaves access to the write pipes well, you’ll see more benefit. This really wants cache-aligned data structures, etc.

So, that’s just a long-winded “YMMV”.

Jim