From owner-freebsd-current@FreeBSD.ORG Mon Dec 2 11:36:48 2013
From: Sepherosa Ziehau
To: Oleg Moskalenko
Cc: Ermal Luçi, freebsd-net, Tim Kientzle, "freebsd-current@freebsd.org"
Date: Mon, 2 Dec 2013 19:36:45 +0800
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
References: <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Mon, Dec 2, 2013 at 12:29 PM, Oleg Moskalenko wrote:

> Sepherosa, while reading your description I noticed another long-standing
> problem for UDP application developers: the UDP sockets are always hashed
> with 2-tuple. But UDP sockets can be "connected", too, to a remote address,
> with connect(...)
>

The connected UDP sockets will be in connect hash, which is hashed using
faddr/laddr/fport/lport.  SO_REUSEPORT only affects wildcard sockets.

> function. Unfortunately, with 2-tuple hashing, that pattern is useless for
> large-scale applications: if a large number of UDP sockets on the same
> local port are "connected" to remote address, then the kernel have to go
> thru the long list of UDP sockets with the same hash value.
>
> If the connected UDP sockets would use 4-tuples, then it would be very
> helpful for the new generation of the UDP-based media applications. For
> example, servers which use DTLS protocol would become simpler and more
> efficient.
>

If you are talking about RSS, then igb, ixgbe and mxge (and may be other
drivers) support RSS extension (mxge is not using RSS, but still 4-tuple
hash), which will include UDP fport/lport into Toeplitz hash calculation.
Well, for fragments of a UDP datagram, if the ports are taken into
consideration the RSS hash will be different for leading fragment and rest
of the fragments; I think that's why MS didn't include ports for UDP.

Best Regards,
sephe

> Thanks
> Oleg
>
> On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau wrote:
>
>> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi wrote:
>>
>>> Well seems Dragonfly has some version of it already from commit [1].
>>>
>> The distribution algorithm was changed a little bit after initial commit
>> to gain more idle time (bnx(4) output has already been maxed out):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>>
>> Well, I also addressed a reasonable concern from nginx folks (I am not
>> quite sure about Linux's position on it; Linux original implementation of
>> SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
>> message):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>>
>> As about nginx, SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
>> dports; should be easier to be back ported to FreeBSD's ports.  I failed to
>> convince nginx folks to merge it into mainline and I am currently onto
>> other stuffs, will come back to them later.  If FreeBSD is going to
>> implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
>> mainline will be easier.
>>
>> I also put up a brief description of SO_REUSEPORT in dfly; may be useful
>> to you:
>> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>>
>> Best Regards,
>> sephe
>>
>>> In FreeBSD there is the framework for this with by defining PCBGROUP.
>>> Also the explanation of it at [2] and [3].
>>> It can achieve approximately the same features of SO_REUSEPORT of linux.
>>> The only thing missing is the marketing behind it and i think and better
>>> RSS support.
>>> By looking at dates the support is there before linux so all you guys
>>> looking for it can experiment with it.
>>>
>>> What i was trying to accomplish was something else from performance
>>> improvement and
>>> maybe put a sysctl behind it to make it more acceptable..
>>>
>>> [1]
>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>>> [2]
>>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>>> [3]
>>> http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>>
>>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:
>>>
>>> > Tim, you are wrong. Read what is "multicast" definition, and read how
>>> > UDP and TCP sockets work in Linux 3.9+ kernels.
>>> >
>>> > Oleg .
>>> >
>>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>>> >
>>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi wrote:
>>> >>
>>> >> > Hello,
>>> >> >
>>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two
>>> >> > daemons to share the same port and possibly listening ip …
>>> >>
>>> >> These flags are used with TCP-based servers.
>>> >>
>>> >> I've used them to make software upgrades go more smoothly.
>>> >> Without them, the following often happens:
>>> >>
>>> >> * Old server stops.  In the process, all of its TCP connections are
>>> >>   closed.
>>> >>
>>> >> * Connections to old server remain in the TCP connection table until
>>> >>   the remote end can acknowledge.
>>> >>
>>> >> * New server starts.
>>> >>
>>> >> * New server tries to open port but fails because that port is "still
>>> >>   in use" by connections in the TCP connection table.
>>> >>
>>> >> With these flags, the new server can open the port even though
>>> >> it is "still in use" by existing connections.
>>> >>
>>> >> > This is not the case today.
>>> >> > Only multicast sockets seem to have the behaviour of broadcasting
>>> >> > the data to all sockets sharing the same properties through these
>>> >> > options!
>>> >>
>>> >> That is what multicast is for.
>>> >>
>>> >> If you want the same data sent to all listeners, then
>>> >> that is multicast behavior and you should be using
>>> >> a multicast socket.
>>> >>
>>> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>> >>
>>> >> You're trying to turn all UDP sockets with those options
>>> >> into multicast sockets.
>>> >>
>>> >> If you want a multicast socket, you should ask for one.
>>> >>
>>> >> Tim
>>> >>
>>>
>>> --
>>> Ermal
>>>

-- 
Tomorrow Will Never Die
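
A rough illustration of the lookup order described in the reply above
(connected UDP sockets found by faddr/laddr/fport/lport, SO_REUSEPORT
distribution applied only among wildcard sockets) might look like the
sketch below. This is a toy model, not DragonFly or FreeBSD kernel code:
the class ToyUdpDemux, its method names, the socket labels and the use of
CRC32 to pick a wildcard socket are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Optional, Tuple

# (faddr, fport, laddr, lport), the 4-tuple named in the thread.
FourTuple = Tuple[str, int, str, int]


class ToyUdpDemux:
    """Hypothetical two-stage UDP demultiplexer (illustration only)."""

    def __init__(self) -> None:
        # Stage 1: connected sockets, keyed by the full 4-tuple.
        self.connected: Dict[FourTuple, str] = {}
        # Stage 2: wildcard sockets sharing a local port (SO_REUSEPORT-like).
        self.wildcard: Dict[int, List[str]] = {}

    def add_connected(self, name: str, faddr: str, fport: int,
                      laddr: str, lport: int) -> None:
        self.connected[(faddr, fport, laddr, lport)] = name

    def add_wildcard(self, name: str, lport: int) -> None:
        self.wildcard.setdefault(lport, []).append(name)

    def lookup(self, faddr: str, fport: int,
               laddr: str, lport: int) -> Optional[str]:
        # An exact 4-tuple match wins, so a connected socket never has to
        # be found by walking every socket bound to the same local port.
        sock = self.connected.get((faddr, fport, laddr, lport))
        if sock is not None:
            return sock
        # Otherwise spread flows over the wildcard sockets on this port.
        # CRC32 is an arbitrary stand-in for whatever hash a kernel uses.
        group = self.wildcard.get(lport)
        if not group:
            return None
        flow = f"{faddr}:{fport}->{laddr}:{lport}".encode()
        return group[zlib.crc32(flow) % len(group)]


if __name__ == "__main__":
    demux = ToyUdpDemux()
    demux.add_wildcard("worker0", 5000)
    demux.add_wildcard("worker1", 5000)
    demux.add_connected("peer-sock", "198.51.100.7", 40000, "192.0.2.1", 5000)
    # Datagram from the connected peer goes to the connected socket.
    print(demux.lookup("198.51.100.7", 40000, "192.0.2.1", 5000))
    # Datagram from any other source goes to one of the wildcard sockets.
    print(demux.lookup("198.51.100.8", 40000, "192.0.2.1", 5000))
```

In this model the cost of finding a connected socket does not depend on
how many sockets share the local port, which is the property asked for
earlier in the thread.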