Date:      Mon, 2 Dec 2013 19:36:45 +0800
From:      Sepherosa Ziehau <sepherosa@gmail.com>
To:        Oleg Moskalenko <mom040267@gmail.com>
Cc:        Ermal Luçi <eri@freebsd.org>, freebsd-net <freebsd-net@freebsd.org>, Tim Kientzle <kientzle@freebsd.org>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Subject:   Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour
Message-ID:  <CAMOc5cyGZNTwc=gT868J6S=ebEARbTV4%2B0o7aAhwhygSP-Z6aQ@mail.gmail.com>
In-Reply-To: <CALDtMrLgm-D30u8HWWF=sVda0h4QtYdyiGHpYPw1kfTWbMbJ6Q@mail.gmail.com>
References:  <CAPBZQG29BEJJ8BK=gn%2Bg_n5o7JSnPbsKQ-=3=6AkFOxzt%2B=wGQ@mail.gmail.com> <4053E074-EDC5-49AB-91A7-E50ABE36602E@freebsd.org> <CALDtMrKvwXW-ou8X7zsKx2ST=dKD7FqHvvnQtGo30znTWU%2BVQQ@mail.gmail.com> <CAPBZQG0=bcHyv7aZse=WKfjk5=6D2-%2B6EQHiAaDZqGtaodhMMA@mail.gmail.com> <CAMOc5cwFGwk0dS5VT-YxfP3Yt38R8aO-KJTX6W832uOFEdavgA@mail.gmail.com> <CALDtMrLgm-D30u8HWWF=sVda0h4QtYdyiGHpYPw1kfTWbMbJ6Q@mail.gmail.com>

On Mon, Dec 2, 2013 at 12:29 PM, Oleg Moskalenko <mom040267@gmail.com> wrote:

> Sepherosa, while reading your description I noticed another long-standing
> problem for UDP application developers: the UDP sockets are always hashed
> with 2-tuple. But UDP sockets can be "connected", too, to a remote address,
> with connect(...)
>

Connected UDP sockets are placed in the connect hash, which is keyed on
faddr/laddr/fport/lport.  SO_REUSEPORT only affects wildcard sockets.
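
To illustrate the two-stage lookup described above, here is a minimal
Python model (not kernel code).  The class and method names are
hypothetical, the addresses and ports are made up, and the CRC-based
choice among wildcard sockets is only a placeholder, not the distribution
algorithm used by DragonFly or Linux.

```python
import zlib
from typing import Dict, List, Optional, Tuple

# (faddr, fport, laddr, lport); all names here are illustrative only.
FourTuple = Tuple[str, int, str, int]


class PcbTables:
    """Toy model: an exact-match connect hash plus a wildcard table."""

    def __init__(self) -> None:
        self.connected: Dict[FourTuple, str] = {}
        self.wildcard: Dict[int, List[str]] = {}

    def connect(self, name: str, faddr: str, fport: int,
                laddr: str, lport: int) -> None:
        # A connected UDP socket is keyed on the full 4-tuple.
        self.connected[(faddr, fport, laddr, lport)] = name

    def bind_wildcard(self, name: str, lport: int) -> None:
        # Wildcard sockets sharing a local port (SO_REUSEPORT) form a group.
        self.wildcard.setdefault(lport, []).append(name)

    def lookup(self, faddr: str, fport: int,
               laddr: str, lport: int) -> Optional[str]:
        # Stage 1: exact 4-tuple match in the connect hash.
        exact = self.connected.get((faddr, fport, laddr, lport))
        if exact is not None:
            return exact
        # Stage 2: fall back to the wildcard group for this local port.
        group = self.wildcard.get(lport)
        if not group:
            return None
        # Placeholder distribution: pick a member from the remote endpoint.
        idx = zlib.crc32(f"{faddr}:{fport}".encode()) % len(group)
        return group[idx]


tables = PcbTables()
tables.bind_wildcard("wild-0", 3478)
tables.bind_wildcard("wild-1", 3478)
tables.connect("conn-a", "198.51.100.7", 40000, "192.0.2.1", 3478)

hit = tables.lookup("198.51.100.7", 40000, "192.0.2.1", 3478)
miss = tables.lookup("203.0.113.9", 50000, "192.0.2.1", 3478)
print(hit, miss)
```

In this model a datagram from the connected peer always reaches the
connected socket in one dictionary lookup, regardless of how many other
sockets share the local port; only unmatched traffic is spread across the
wildcard group.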


> function. Unfortunately, with 2-tuple hashing, that pattern is useless for
> large-scale applications: if a large number of UDP sockets on the same
> local port are "connected" to remote addresses, then the kernel has to go
> through the long list of UDP sockets with the same hash value.
>
> If the connected UDP sockets would use 4-tuples, then it would be very
> helpful for the new generation of the UDP-based media applications. For
> example, servers which use DTLS protocol would become simpler and more
> efficient.
>
>
If you are talking about RSS, then igb, ixgbe and mxge (and maybe other
drivers) support an RSS extension (mxge does not use RSS, but still computes
a 4-tuple hash) that includes the UDP fport/lport in the Toeplitz hash
calculation.  However, for a fragmented UDP datagram, if the ports are taken
into account, the RSS hash of the leading fragment will differ from that of
the remaining fragments, which do not carry the UDP header; I think that's
why MS didn't include ports for UDP.
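
The fragment problem can be sketched with a small Toeplitz hash in Python.
This is an illustration under stated assumptions, not driver code: the
40-byte key is arbitrary (real NICs are programmed with their own key), the
addresses and ports are made up, and the helper names are hypothetical.
The only fact relied on is that non-leading IP fragments carry no UDP
header, so only the addresses are available for hashing them.

```python
import socket
import struct
from typing import Optional

# Arbitrary 40-byte example key (an assumption, not a vendor default).
KEY = bytes(range(1, 41))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR the 32-bit key window at each set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    assert key_bits >= len(data) * 8 + 32, "key too short for input"
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window of 32 key bits starting at input bit position i*8+b.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_input(src: str, dst: str,
              sport: Optional[int] = None,
              dport: Optional[int] = None) -> bytes:
    """Build the hash input: addresses, plus ports when they are visible."""
    data = socket.inet_aton(src) + socket.inet_aton(dst)
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data


SRC, DST, SPORT, DPORT = "198.51.100.7", "192.0.2.1", 4096, 8192

# Leading fragment: the UDP header is present, so a 4-tuple hash is possible.
first_frag_4tuple = toeplitz_hash(KEY, rss_input(SRC, DST, SPORT, DPORT))
# Later fragments: no UDP header, so only the addresses can be hashed.
later_frag = toeplitz_hash(KEY, rss_input(SRC, DST))
# If ports are never included for UDP, all fragments hash identically.
first_frag_2tuple = toeplitz_hash(KEY, rss_input(SRC, DST))

print(hex(first_frag_4tuple), hex(later_frag), hex(first_frag_2tuple))
```

With the 4-tuple input the leading fragment and the later fragments of the
same datagram get different hash values and may be steered to different
queues; with the 2-tuple input they stay together.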

Best Regards,
sephe


> Thanks
> Oleg
>
>
>
> On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau <sepherosa@gmail.com> wrote:
>
>>
>>
>>
>> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi <eri@freebsd.org> wrote:
>>
>>> Well seems Dragonfly has some version of it already from commit [1].
>>>
>>>
>> The distribution algorithm was changed a little bit after initial commit
>> to gain more idle time (bnx(4) output has already been maxed out):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>>
>> Well, I also addressed a reasonable concern from nginx folks (I am not
>> quite sure about Linux's position on it; Linux's original implementation of
>> SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
>> message):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>>
>> As for nginx, the SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
>> dports; it should be easier to backport to FreeBSD's ports.  I failed to
>> convince the nginx folks to merge it into mainline, and I am currently busy
>> with other things; I will come back to them later.  If FreeBSD is going to
>> implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
>> mainline will be easier.
>>
>> I also put up a brief description of SO_REUSEPORT in dfly; it may be useful
>> to you:
>> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>>
>> Best Regards,
>> sephe
>>
>>
>>>  In FreeBSD there is a framework for this, enabled by defining PCBGROUP.
>>> An explanation of it is also at [2] and [3].
>>> It can achieve approximately the same features as Linux's SO_REUSEPORT.
>>> The only things missing are the marketing behind it and, I think, better
>>> RSS support.
>>> By looking at dates the support is there before linux so all you guys
>>> looking for it can experiment with it.
>>>
>>> What I was trying to accomplish was something other than a performance
>>> improvement, and maybe to put a sysctl behind it to make it more acceptable.
>>>
>>> [1]
>>>
>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>>> [2]
>>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>>> [3]
>>> http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>>
>>>
>>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko <mom040267@gmail.com> wrote:
>>>
>>> > Tim, you are wrong. Read the definition of "multicast", and read how UDP
>>> > and TCP sockets work in Linux 3.9+ kernels.
>>> >
>>> > Oleg .
>>> >
>>> >
>>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle <kientzle@freebsd.org> wrote:
>>> >
>>> >>
>>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi <eri@freebsd.org> wrote:
>>> >>
>>> >> > Hello,
>>> >> >
>>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
>>> >> > share the same port and possibly listening ip …
>>> >>
>>> >> These flags are used with TCP-based servers.
>>> >>
>>> >> I’ve used them to make software upgrades go more smoothly.
>>> >> Without them, the following often happens:
>>> >>
>>> >> * Old server stops.  In the process, all of its TCP connections are
>>> >> closed.
>>> >>
>>> >> * Connections to old server remain in the TCP connection table until the
>>> >> remote end can acknowledge.
>>> >>
>>> >> * New server starts.
>>> >>
>>> >> * New server tries to open port but fails because that port is “still in
>>> >> use” by connections in the TCP connection table.
>>> >>
>>> >> With these flags, the new server can open the port even though
>>> >> it is “still in use” by existing connections.
>>> >>
>>> >>
>>> >> > This is not the case today.
>>> >> > Only multicast sockets seem to have the behaviour of broadcasting the data
>>> >> > to all sockets sharing the same properties through these options!
>>> >>
>>> >> That is what multicast is for.
>>> >>
>>> >> If you want the same data sent to all listeners, then
>>> >> that is multicast behavior and you should be using
>>> >> a multicast socket.
>>> >>
>>> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>> >>
>>> >> You’re trying to turn all UDP sockets with those options
>>> >> into multicast sockets.
>>> >>
>>> >> If you want a multicast socket, you should ask for one.
>>> >>
>>> >> Tim
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>> --
>>> Ermal
>>>
>>
>>
>>
>> --
>> Tomorrow Will Never Die
>>
>
>


--
Tomorrow Will Never Die


