Date: Mon, 2 Dec 2013 19:36:45 +0800
From: Sepherosa Ziehau <sepherosa@gmail.com>
To: Oleg Moskalenko <mom040267@gmail.com>
Cc: Ermal Luçi <eri@freebsd.org>, freebsd-net <freebsd-net@freebsd.org>, Tim Kientzle <kientzle@freebsd.org>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
Message-ID: <CAMOc5cyGZNTwc=gT868J6S=ebEARbTV4%2B0o7aAhwhygSP-Z6aQ@mail.gmail.com>
In-Reply-To: <CALDtMrLgm-D30u8HWWF=sVda0h4QtYdyiGHpYPw1kfTWbMbJ6Q@mail.gmail.com>
References: <CAPBZQG29BEJJ8BK=gn%2Bg_n5o7JSnPbsKQ-=3=6AkFOxzt%2B=wGQ@mail.gmail.com> <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org> <CALDtMrKvwXW-ou8X7zsKx2ST=dKD7FqHvvnQtGo30znTWU%2BVQQ@mail.gmail.com> <CAPBZQG0=bcHyv7aZse=WKfjk5=6D2-%2B6EQHiAaDZqGtaodhMMA@mail.gmail.com> <CAMOc5cwFGwk0dS5VT-YxfP3Yt38R8aO-KJTX6W832uOFEdavgA@mail.gmail.com> <CALDtMrLgm-D30u8HWWF=sVda0h4QtYdyiGHpYPw1kfTWbMbJ6Q@mail.gmail.com>

On Mon, Dec 2, 2013 at 12:29 PM, Oleg Moskalenko <mom040267@gmail.com> wrote:

> Sepherosa, while reading your description I noticed another long-standing
> problem for UDP application developers: the UDP sockets are always hashed
> with 2-tuple. But UDP sockets can be "connected", too, to a remote address,
> with connect(...)
>

The connected UDP sockets will be in connect hash, which is hashed using
faddr/laddr/fport/lport. SO_REUSEPORT only affects wildcard sockets.

> function. Unfortunately, with 2-tuple hashing, that pattern is useless for
> large-scale applications: if a large number of UDP sockets on the same
> local port are "connected" to remote address, then the kernel have to go
> thru the long list of UDP sockets with the same hash value.
>
> If the connected UDP sockets would use 4-tuples, then it would be very
> helpful for the new generation of the UDP-based media applications. For
> example, servers which use DTLS protocol would become simpler and more
> efficient.
>

If you are talking about RSS, then igb, ixgbe and mxge (and may be other
drivers) support RSS extension (mxge is not using RSS, but still 4-tuple
hash), which will include UDP fport/lport into Toeplitz hash calculation.
Well, for fragments of a UDP datagram, if the ports are taken into
consideration the RSS hash will be different for leading fragment and rest
of the fragments; I think that's why MS didn't include ports for UDP.

Best Regards,
sephe

> Thanks
> Oleg
>
> On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau <sepherosa@gmail.com> wrote:
>
>> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi <eri@freebsd.org> wrote:
>>
>>> Well seems Dragonfly has some version of it already from commit [1].
>>>
>> The distribution algorithm was changed a little bit after initial commit
>> to gain more idle time (bnx(4) output has already been maxed out):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>>
>> Well, I also addressed a reasonable concern from nginx folks (I am not
>> quite sure about Linux's position on it; Linux original implementation of
>> SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
>> message):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>>
>> As about nginx, SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
>> dports; should be easier to be back ported to FreeBSD's ports. I failed to
>> convince nginx folks to merge it into mainline and I am currently onto
>> other stuffs, will come back to them later. If FreeBSD is going to
>> implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
>> mainline will be easier.
>>
>> I also put up a brief description of SO_REUSEPORT in dfly; may be useful
>> to you:
>> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>>
>> Best Regards,
>> sephe
>>
>>> In FreeBSD there is the framework for this by defining PCBGROUP.
>>> Also the explanation of it at [2] and [3].
>>> It can achieve approximately the same features of SO_REUSEPORT of linux.
>>> The only thing missing is the marketing behind it and i think and better
>>> RSS support.
>>> By looking at dates the support is there before linux so all you guys
>>> looking for it can experiment with it.
>>>
>>> What i was trying to accomplish was something else from performance
>>> improvement and
>>> maybe put a sysctl behind it to make it more acceptable..
>>>
>>> [1]
>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>>> [2]
>>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>>> [3]
>>> http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>>
>>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko <mom040267@gmail.com> wrote:
>>>
>>> > Tim, you are wrong. Read what is "multicast" definition, and read how UDP
>>> > and TCP sockets work in Linux 3.9+ kernels.
>>> >
>>> > Oleg .
>>> >
>>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle <kientzle@freebsd.org> wrote:
>>> >
>>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi <eri@freebsd.org> wrote:
>>> >>
>>> >> > Hello,
>>> >> >
>>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
>>> >> > share the same port and possibly listening ip …
>>> >>
>>> >> These flags are used with TCP-based servers.
>>> >>
>>> >> I’ve used them to make software upgrades go more smoothly.
>>> >> Without them, the following often happens:
>>> >>
>>> >> * Old server stops. In the process, all of its TCP connections are closed.
>>> >>
>>> >> * Connections to old server remain in the TCP connection table until the
>>> >> remote end can acknowledge.
>>> >>
>>> >> * New server starts.
>>> >>
>>> >> * New server tries to open port but fails because that port is “still in
>>> >> use” by connections in the TCP connection table.
>>> >>
>>> >> With these flags, the new server can open the port even though
>>> >> it is “still in use” by existing connections.
>>> >>
>>> >> > This is not the case today.
>>> >> > Only multicast sockets seem to have the behaviour of broadcasting the data
>>> >> > to all sockets sharing the same properties through these options!
>>> >>
>>> >> That is what multicast is for.
>>> >>
>>> >> If you want the same data sent to all listeners, then
>>> >> that is multicast behavior and you should be using
>>> >> a multicast socket.
>>> >>
>>> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>> >>
>>> >> You’re trying to turn all UDP sockets with those options
>>> >> into multicast sockets.
>>> >>
>>> >> If you want a multicast socket, you should ask for one.
>>> >>
>>> >> Tim
>>> >>
>>> >> _______________________________________________
>>> >> freebsd-net@freebsd.org mailing list
>>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>> >>
>>>
>>> --
>>> Ermal
>>> _______________________________________________
>>> freebsd-current@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
>> --
>> Tomorrow Will Never Die

--
Tomorrow Will Never Die
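
For readers following the RSS remark in this message, here is a rough sketch of why including UDP ports in the Toeplitz hash splits a fragmented datagram across queues. It is not taken from any driver: the function names (toeplitz_hash, rss_input, fragment_hashes) are made up for illustration, the key is Microsoft's published RSS verification key used only as a sample (a real NIC uses whatever key the driver programs), and it assumes the usual input order of source address, destination address, source port, destination port. Only the leading fragment carries the UDP header, so only it can be hashed over four fields; later fragments fall back to the two addresses.

```python
import socket
import struct

# Sample 40-byte key (Microsoft's published RSS verification key).
# Assumption: used here only for illustration; drivers program their own key.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b4"
    "77cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """Return the 32-bit Toeplitz hash of data under key.

    For every set bit of the input (most significant bit first), XOR in
    the 32-bit window of the key that starts at the same bit offset.
    """
    key_bits = len(key) * 8
    if len(data) * 8 + 32 > key_bits + 1:
        raise ValueError("input too long for this key")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input(src: str, dst: str, sport=None, dport=None) -> bytes:
    """Build the hash input: addresses, plus ports when both are known."""
    data = socket.inet_aton(src) + socket.inet_aton(dst)
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data


def fragment_hashes(src: str, dst: str, sport: int, dport: int):
    """Hashes a port-aware NIC would compute for one fragmented datagram.

    The first fragment contains the UDP header, so ports are visible;
    the remaining fragments expose only the IP addresses.
    """
    first = toeplitz_hash(rss_input(src, dst, sport, dport))
    rest = toeplitz_hash(rss_input(src, dst))
    return first, rest


if __name__ == "__main__":
    first, rest = fragment_hashes("66.9.149.187", "161.142.100.80", 2794, 1766)
    print(f"leading fragment (4-tuple): {first:#010x}")
    print(f"later fragments  (2-tuple): {rest:#010x}")
```

Because the hash is linear over XOR, the two values differ exactly by the contribution of the port bits, so the leading fragment will generally land on a different queue than the rest unless the hardware hashes UDP on addresses only.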