From: Oleg Moskalenko
To: Sepherosa Ziehau
Cc: Ermal Luçi, freebsd-net, Tim Kientzle, "freebsd-current@freebsd.org"
Date: Sun, 1 Dec 2013 20:29:24 -0800
Subject: Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

Sepherosa,

while reading your description I noticed another long-standing problem for UDP application developers: UDP sockets are always hashed on a 2-tuple. But UDP sockets can also be "connected" to a remote address with the connect() function. Unfortunately, with 2-tuple hashing, that pattern is useless for large-scale applications: if a large number of UDP sockets on the same local port are "connected" to remote addresses, the kernel has to walk a long list of UDP sockets with the same hash value. If connected UDP sockets used 4-tuple hashing, it would be very helpful for the new generation of UDP-based media applications. For example, servers that use the DTLS protocol would become simpler and more efficient.

Thanks
Oleg

On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau wrote:

>
>
>
> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi wrote:
>
>> Well seems Dragonfly has some version of it already from commit [1].
>>
>>
> The distribution algorithm was changed a little bit after initial commit
> to gain more idle time (bnx(4) output has already been maxed out):
>
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>
> Well, I also addressed a reasonable concern from nginx folks (I am not
> quite sure about Linux's position on it; Linux original implementation of
> SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
> message):
>
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>
> As about nginx, SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
> dports; should be easier to be back ported to FreeBSD's ports.
> I failed to convince nginx folks to merge it into mainline and I am currently onto
> other stuffs, will come back to them later. If FreeBSD is going to
> implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
> mainline will be easier.
>
> I also put up a brief description of SO_REUSEPORT in dfly; may be useful
> to you:
> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>
> Best Regards,
> sephe
>
>
>> In FreeBSD there is the framework for this by defining PCBGROUP.
>> Also the explanation of it at [2] and [3].
>> It can achieve approximately the same features of SO_REUSEPORT of linux.
>> The only thing missing is the marketing behind it and i think and better
>> RSS support.
>> By looking at dates the support is there before linux so all you guys
>> looking for it can experiment with it.
>>
>> What i was trying to accomplish was something else from performance
>> improvement and
>> maybe put a sysctl behind it to make it more acceptable..
>>
>> [1]
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>> [2]
>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>> [3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>
>>
>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:
>>
>> > Tim, you are wrong. Read what is "multicast" definition, and read how UDP
>> > and TCP sockets work in Linux 3.9+ kernels.
>> >
>> > Oleg
>> >
>> >
>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>> >
>> >>
>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
>> >> > share the same port and possibly listening ip …
>> >>
>> >> These flags are used with TCP-based servers.
>> >>
>> >> I've used them to make software upgrades go more smoothly.
>> >>
>> >> Without them, the following often happens:
>> >>
>> >> * Old server stops. In the process, all of its TCP connections are
>> >> closed.
>> >>
>> >> * Connections to old server remain in the TCP connection table until the
>> >> remote end can acknowledge.
>> >>
>> >> * New server starts.
>> >>
>> >> * New server tries to open port but fails because that port is "still in
>> >> use" by connections in the TCP connection table.
>> >>
>> >> With these flags, the new server can open the port even though
>> >> it is "still in use" by existing connections.
>> >>
>> >>
>> >> > This is not the case today.
>> >> > Only multicast sockets seem to have the behaviour of broadcasting the data
>> >> > to all sockets sharing the same properties through these options!
>> >>
>> >> That is what multicast is for.
>> >>
>> >> If you want the same data sent to all listeners, then
>> >> that is multicast behavior and you should be using
>> >> a multicast socket.
>> >>
>> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>> >>
>> >> You're trying to turn all UDP sockets with those options
>> >> into multicast sockets.
>> >>
>> >> If you want a multicast socket, you should ask for one.
>> >>
>> >> Tim
>> >>
>> >> _______________________________________________
>> >> freebsd-net@freebsd.org mailing list
>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>> >>
>> >
>> >
>>
>>
>> --
>> Ermal
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
>
>
>
> --
> Tomorrow Will Never Die
>