Subject: Re: netmap-ipfw on em0 em1
From: Jim Thompson
Date: Mon, 4 May 2015 11:21:40 -0500
To: Barney Cordoba
Cc: Luigi Rizzo, "freebsd-net@freebsd.org"

While it is a true statement that “You can do anything in the kernel
that you can do in user space”, it is not a helpful statement. Yes, the
kernel is just a program.

In a similar way, “You can just pop it into any kernel and it works” is
also not helpful. It works, but it doesn’t work well, because of other
infrastructure issues.

Both of your statements reduce to the age-old “proof is left as an
exercise for the student”.

There is a lot of kernel infrastructure that is just plain crusty(*) and
which directly impedes performance in this area.

But there is plenty of cruft, Barney. Here are two threads which are
three years old, with the issues they point out still unresolved, and
multiple places where 100ns or more is lost:

https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033287.html
https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033351.html

100ns is death at 10Gbps with min-sized packets.
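
The arithmetic behind that claim can be sketched as follows. This is only
a back-of-the-envelope calculation, not a measurement: it assumes standard
Ethernet framing (a 64-byte minimum frame plus 8 bytes of preamble/SFD and
a 12-byte inter-frame gap), and the 3 GHz clock used for the cycle count
is a hypothetical figure chosen for illustration.

```python
# Per-packet time budget at Ethernet line rate with minimum-sized frames.
# Assumptions (not taken from this thread): standard Ethernet framing
# overhead, and a hypothetical 3 GHz core for the cycle estimate.

MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including the FCS
PREAMBLE_SFD_BYTES = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle time between frames


def wire_bits(frame_bytes=MIN_FRAME_BYTES):
    """Bits one frame occupies on the wire, including framing overhead."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8


def packets_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Maximum packet rate the link can deliver for this frame size."""
    return link_bps / wire_bits(frame_bytes)


def ns_per_packet(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Time available to handle each packet at line rate, in nanoseconds."""
    return wire_bits(frame_bytes) * 1e9 / link_bps


def cycles_per_packet(link_bps, cpu_hz, frame_bytes=MIN_FRAME_BYTES):
    """CPU clock cycles available per packet on a single core."""
    return wire_bits(frame_bytes) * cpu_hz / link_bps


if __name__ == "__main__":
    CPU_HZ = 3e9  # hypothetical clock rate, for illustration only
    for label, bps in (("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)):
        print(f"{label}: {packets_per_second(bps) / 1e6:.3f} Mpps, "
              f"{ns_per_packet(bps):.1f} ns/packet, "
              f"{cycles_per_packet(bps, CPU_HZ):.0f} cycles/packet at 3 GHz")
```

Under these assumptions a minimum-sized frame takes 672 bit times on the
wire, so a single 100ns stall costs roughly one and a half packet times
at 10Gbps.
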
quoting: http://luca.ntop.org/10g.pdf
---
“Taking as a reference a 10 Gbit/s link, the raw throughput is well below
the memory bandwidth of modern systems (between 6 and 8 GBytes/s for CPU
to memory, up to 5 GBytes/s on PCI-Express x16). However a 10Gbit/s
link can generate up to 14.88 million Packets Per Second (pps), which
means that the system must be able to process one packet every 67.2 ns.
This translates to about 200 clock cycles even for the faster CPUs, and
might be a challenge considering the per-packet overheads normally
involved by general-purpose operating systems. The use of large frames
reduces the pps rate by a factor of 20..50, which is great on end hosts
only concerned in bulk data transfer. Monitoring systems and traffic
generators, however, must be able to deal with worst case conditions.”

Forwarding and filtering must also be able to deal with worst case, and
nobody does well with kernel-based networking here.

https://github.com/gvnn3/netperf/blob/master/Documentation/Papers/ABSDCon2015Paper.pdf

10Gbps NICs are $200-$300 today, and they’ll be included on the
motherboard during the next hardware refresh. Broadwell-DE (Xeon-D) has
10G in the SoC, and others are coming.

10Gbps switches can be had at around $100/port. This is exactly the
point at which the adoption curve for 1Gbps Ethernet ramped over a
decade ago.

(*) A few more simple examples of cruft:

Why, in 2015, does the kernel have a “fast forwarding” option, and worse,
one that isn’t enabled by default? Shouldn’t “fast forwarding” be the
default?

Why, in 2015, does FreeBSD not ship with IPSEC enabled in GENERIC?
(Reason: each and every time this has come up in recent memory, someone
has pointed out that it impacts performance.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=128030)

Why, in 2015, does anyone think it’s acceptable for “fast forwarding” to
break IPSEC?

Why, in 2015, does anyone think it’s acceptable that the setkey(8) man
page documents, of all things, DES-CBC and HMAC-MD5 for an SA? That’s
some kind of sick joke, right? This completely flies in the face of
RFC 4835.

> On May 4, 2015, at 10:29 AM, Barney Cordoba via freebsd-net wrote:
>
> It's not faster than "wedging" into the if_input()s. It simply can't
> be. You're getting packets at interrupt time as soon as they're
> processed and there's no network stack involved, and you're able to
> receive and transmit without a process switch. At worst it's the same,
> without the extra plumbing. It's not rocket science to "bypass the
> network stack".
>
> The only advantage of bringing it into user space would be that it's
> easier to write threaded handlers for complex uses; but not as a
> firewall (which is the limit of the context of my comment). You can do
> anything in the kernel that you can do in user space. The reason a
> kernel module with if_input() hooks is better is that you can use the
> standard kernel without all of the netmap hacks. You can just pop it
> into any kernel and it works.
>
> BC
>
> On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo wrote:
>
> On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net <
> freebsd-net@freebsd.org> wrote:
>
>> Frankly I'm baffled by netmap. You can easily write a loadable kernel
>> module that moves packets from 1 interface to another and hook in the
>> firewall; why would you want to bring them up into user space? It's 1000s
>> of lines of unnecessary code.
>>
> Because it is much faster.
>
> The motivation for netmap-like solutions (that includes Intel's DPDK,
> PF_RING/DNA and several proprietary implementations) is speed: they
> bypass the entire network stack, and a good part of the device
> drivers, so you can access packets 10+ times faster.
>
> So things are actually the other way around: the 1000's of unnecessary
> lines of code (not really thousands, though) are those that you'd pay
> going through the standard network stack when you don't need any of
> its services.
>
> Going to userspace is just a side effect -- turns out to be easier to
> develop and run your packet processing code in userspace, but there
> are netmap clients (e.g. the VALE software switch) which run entirely
> in the kernel.
>
> cheers
> luigi
>
>> On Sunday, May 3, 2015 3:10 AM, Raimundo Santos wrote:
>>
>> Clarifying things for the sake of documentation:
>>
>> To use the host stack, append a ^ character after the name of the interface
>> you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.)
>>
>> Examples:
>>
>> "kipfw em0" does nothing useful.
>> "kipfw netmap:em0" disconnects the NIC from the usual data path, i.e.,
>> there are no host communications.
>> "kipfw netmap:em0 netmap:em0^" or "kipfw netmap:em0+" places the
>> netmap-ipfw rules between the NIC and the host stack entry point associated
>> (the IP addresses configured on it with ifconfig, ARP and RARP, etc...)
>> with the same NIC.
>>
>> On 10 November 2014 at 18:29, Evandro Nunes wrote:
>>
>>> dear professor luigi,
>>> i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and
>>> system using the rest, this system is a 8core at 2.4Ghz, but only one core
>>> is in use
>>> in this next round of tests, my NIC is now an avoton with igb(4) driver,
>>> currently with 4 queues per NIC (total 8 queues for kipfw bridge)
>>> i have read in your papers we should expect something similar to 1.48Mpps
>>> how can I benefit from the other CPUs which are completely idle? I tried
>>> CPU Affinity (cpuset) kipfw but system CPU usage follows userland kipfw so
>>> I could not set one CPU to userland while other for system
>>>
>>
>> All the papers talk about *generating* lots of packets, not *processing*
>> lots of packets. What this netmap example does is processing. If someone
>> really wants to use the host stack, the expected performance WILL BE worse
>> - what's the point of using a host stack bypassing tool/framework if
>> someone will end up using the host stack?
>>
>> And by generating, the papers usually mean: minimum sized UDP packets.
>>
>>>
>>> can you please enlighten?
>>>
>>
>> For everyone: read the manuals, read related and indicated materials
>> (papers, web sites, etc), and, as a last resort, read the code. With
>> netmap's code, it's easier than it sounds.
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
> --
> -----------------------------------------+-------------------------------
> Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
> http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
> TEL      +39-050-2217533               . via Diotisalvi 2
> Mobile   +39-338-6809875               . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------