Date: Fri, 8 Jan 2016 09:02:37 -0800
From: Adrian Chadd
To: Bruce Evans
Cc: Mark Delany, FreeBSD Net
Subject: Re: Does FreeBSD have sendmmsg or recvmmsg system calls?

On 8 January 2016 at 03:02, Bruce Evans wrote:
> On Fri, 8 Jan 2016, Adrian Chadd wrote:
>
>> On 7 January 2016 at 23:58, Mark Delany wrote:
>>>
>>> On 08Jan16, Bruce Evans allegedly wrote:
>>>>
>>>> If the NIC can't reach line rate
>>>
>>>> Network stack overheads are also enormous.
>>>
>>> Bruce makes some excellent points.
>>>
>>> I challenge anyone to get line rate UDP out of FBSD (or Linux) for a
>>> 1G NIC, let alone a 10G NIC, listening to a single port. It was exactly
>>> my frustration with UDP performance that led me down the path of
>>> *mmsg() and netmap.
>>>
>>> Frankly this is an opportunity for FBSD as UDP performance appears to
>>> be a neglected area.
>>
>> I'm there, on 16 threads.
>>
>> I'd rather we do it on two or three, as a lot of time is wasted in
>> producer/consumer locking. But yeah, 500k tx/rx should be doable per
>> CPU with only locking changes.

.. and I did mean "kernel producer/consumer locking changes."

> Line rate for 1 Gbps is about 1500 kpps (small packets).
>
> With I218V2 (em), I see enormous lock contention above 3 or 4 (user)
> threads, and 8 are slightly slower than 1. 1 doesn't saturate the NIC,
> and 2 is optimal.

The RSS support in -HEAD lets you get away with parallelising UDP streams very nicely.
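The numbers in this thread can be checked with some quick arithmetic. The sketch below computes the line rate behind Bruce's "about 1500 kpps" for 1 Gbps. It also works out how many CPUs would be needed if each one handled the ~500k pps estimated above. It assumes minimum-size 64-byte frames (including FCS) plus the usual 20 bytes per frame of preamble, start delimiter and inter-frame gap. The 500k figure is an estimate from this thread, not a measurement, and the helper names are made up for illustration.

```python
import math

# Assumed per-frame overhead on the wire, beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Maximum frames per second on a link for a given frame size (incl. FCS)."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def cpus_needed(link_bps: float, per_cpu_pps: float, frame_bytes: int = 64) -> int:
    """CPUs required if each one sustains per_cpu_pps packets per second."""
    return math.ceil(line_rate_pps(link_bps, frame_bytes) / per_cpu_pps)


if __name__ == "__main__":
    for link in (1e9, 10e9):
        print(f"{link / 1e9:.0f} Gbps: {line_rate_pps(link):,.0f} pps, "
              f"{cpus_needed(link, 500_000)} CPUs at 500k pps each")
```

Under those assumptions, 1 Gbps works out to roughly 1.49 Mpps. Three CPUs at 500k pps each would cover that, which lines up with the "two or three" mentioned above. At 10 Gbps it is ten times the packet rate, so about 30 CPUs.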
The framework is pretty simple (!):

* drivers ask the RSS code for the RSS config and RSS hash to use, and configure the hardware appropriately;
* the netisr input paths check whether an RSS hash is present and calculate it in software if required;
* v4/v6 reassembly is done (at the IP level, /not/ at the protocol level), and if the reassembled packet needs a new RSS hash / netisr reinjection, that'll happen;
* the PCB lookup code for listen sockets now allows one listen socket per RSS bucket - the RSS / PCBGROUPS code already extended the PCB code to have one PCB table per RSS bucket (as well as a global one).

So:

* userland code queries RSS for the CPU and RSS bucket setup;
* you then create one listen socket per RSS bucket, bind it to the local thread (if you want) and tell it "you're in RSS bucket X";
* .. and then, in the UDP case for locally bound sockets, the transmit/receive path doesn't need to modify the global PCB state, so locking stays per RSS bucket and scales linearly with the number of CPUs you have (until you hit the NIC queue limits).

https://github.com/erikarn/freebsd-rss/

and:

http://adrianchadd.blogspot.com/2014/06/hacking-on-receive-side-scaling-rss-on.html
http://adrianchadd.blogspot.com/2014/07/application-awareness-of-receive-side.html
http://adrianchadd.blogspot.com/2014/08/receive-side-scaling-figuring-out-how.html
http://adrianchadd.blogspot.com/2014/09/receive-side-scaling-testing-udp.html
http://adrianchadd.blogspot.com/2014/10/more-rss-udp-tests-this-time-on-dell.html

-adrian
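The receive-side flow described above can be modelled in a few lines of user-space Python. A packet's flow tuple is hashed, the hash picks an RSS bucket, the bucket maps to a CPU, and each bucket gets its own socket. This is a sketch under stated assumptions, not the FreeBSD kernel code.

* The hash is the Toeplitz function that RSS-capable NICs typically compute. It is keyed here with Microsoft's published RSS verification key.
* The input order (source address, destination address, source port, destination port) follows Microsoft's RSS specification.
* `hash_to_bucket` (mask off the low bits) and `bucket_to_cpu` (round-robin) are hypothetical stand-ins for whatever the kernel's RSS configuration actually reports to userland.

```python
import ipaddress
import struct
from collections import defaultdict

# Microsoft's published RSS verification key. Assumption: it stands in for
# whatever key the kernel's RSS config actually hands to the driver.
RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Software Toeplitz hash: XOR a sliding 32-bit key window per set input bit."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit window of the key that starts at bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input_v4(src: str, dst: str, sport=None, dport=None) -> bytes:
    """Hash input: src addr, dst addr, then (optionally) src/dst ports, big-endian."""
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data


def hash_to_bucket(rss_hash: int, bits: int) -> int:
    """Hypothetical: the low `bits` bits of the hash select the RSS bucket."""
    return rss_hash & ((1 << bits) - 1)


def bucket_to_cpu(bucket: int, ncpus: int) -> int:
    """Hypothetical: buckets are spread round-robin across CPUs."""
    return bucket % ncpus


if __name__ == "__main__":
    bits, ncpus = 3, 4  # hypothetical: 8 RSS buckets over 4 CPUs
    flows = [
        ("66.9.149.187", "161.142.100.80", 2794, 1766),
        ("199.92.111.2", "65.69.140.83", 14230, 4739),
        ("24.19.198.95", "12.22.207.184", 12898, 38024),
    ]
    # One list per bucket models "one socket per RSS bucket": each worker
    # only ever touches the flows that hash into its own bucket.
    per_bucket = defaultdict(list)
    for src, dst, sport, dport in flows:
        h = toeplitz_hash(RSS_KEY, rss_input_v4(src, dst, sport, dport))
        per_bucket[hash_to_bucket(h, bits)].append((src, sport))
    for bucket, members in sorted(per_bucket.items()):
        print(f"bucket {bucket} -> CPU {bucket_to_cpu(bucket, ncpus)}: {members}")
```

In this model, every packet of a given flow always lands in the same bucket. So a worker that owns one bucket's socket never has to share state with the other buckets, which is the property the per-bucket locking above relies on.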