Message-ID: <50B9BF95.2040103@freebsd.org>
Date: Sat, 01 Dec 2012 09:28:05 +0100
From: Andre Oppermann
To: Keith Arner
Cc: freebsd-net@freebsd.org
Subject: Re: Problems with ephemeral port selection

On 30.11.2012 15:09, Keith Arner wrote:
> I've noticed some issues with ephemeral port number selection from
> tcp_connect(), which limit the number of concurrent, outgoing connections
> that can be established (connect(), rather than accept()). Sifting through
> the source code, I believe the issues stem from two problems in the
> tcp_connect() code path. Specifically:
>
> 1) The wrong function gets called to determine if a given ephemeral
>    port number is currently usable.
> 2) The ephemeral port number gets selected without considering the
>    foreign addr/port.
>
> Curiously, the effect of #1 mostly cancels the effect of #2, such that
> the common calling convention gives you a correct result so long as you
> only have a small number of outgoing connections. However, once you get to
> a large number of outgoing connections, things start to break down. (I'll
> define large and small later.)
>
> As a side note, I have been working with FreeBSD 7.2. The implementations
> of several of the relevant functions have been refactored somewhere between
> 7.2-RELEASE and 9-STABLE, but the core problems in the logic seem to be
> the same between versions.
>
> For problem #1, the code path that selects the ephemeral port number is:
>   tcp_connect() ->
>     in_pcbbind() ->
>       in_pcbbind_setup() ->
>         in_pcb_lport() [not in FreeBSD 7.2] ->
>           in_pcblookup_local()
>
> There is a loop in in_pcb_lport() [or directly in in_pcbbind_setup() in
> earlier releases] that considers candidate ephemeral port numbers and
> calls in_pcblookup_local() to determine if a given candidate is suitable.
> The default behaviour (if the caller has not set either SO_REUSEADDR or
> SO_REUSEPORT) is to pick a local port number that is not in use by
> *any* local TCP socket.
>
> So long as the number of concurrent, outgoing connections is less than the
> range configured by `sysctl net.inet.ip.portrange.*`, selecting a totally
> unique ephemeral port number works OK. However, you cannot exceed that
> limit, even if each outgoing connection has a unique faddr/fport. This
> does not limit the number of connections that can be accept()'ed, only the
> number of connections that can be connect()'ed.
>
> In this particular path, I think the code should call in_pcblookup_hash(),
> rather than in_pcblookup_local(). The criteria in in_pcblookup_hash() only
> match if the full 5-tuple matches, rather than just the local port number.
> The complication, of course, comes from the fact that in_pcbbind() is
> called from both bind() and for the implicit bind that happens for a
> connect(). The matching criteria in in_pcblookup_local() make sense for
> the former but not quite for the latter.
>
> I mentioned that the above is the default behaviour you get when you don't
> specify SO_REUSEADDR or SO_REUSEPORT. Setting SO_REUSEADDR
> before calling connect() has some surprising consequences (surprising in the
> sense that I don't believe SO_REUSEADDR is supposed to have any effect
> on connect()). In this case, when in_pcblookup_local() is called, wild_okay
> is set to false. This changes the matching criteria to (in effect) allow
> tcp_connect() to use the full 5-tuple space. However, this brings us to the
> second problem.
>
> Problem #2 is that the ephemeral port number is chosen before the
> fport/faddr gets set on the pcb; that is, tcp_connect() calls in_pcbbind() to
> select the ephemeral port number, *then* calls in_pcbconnect_setup() to
> populate the fport/faddr. With SO_REUSEADDR, in_pcbbind() can select
> an in-use local port. If the local port is used by a socket with a different
> laddr/fport/faddr, all is good. However, if the local port selection
> results in a full conflict, it will get rejected by the call to
> in_pcblookup_hash() inside in_pcbconnect_setup(). This happens *after* the
> loop inside in_pcbbind(), so the call to tcp_connect() fails with
> EADDRINUSE. Thus, with SO_REUSEADDR, connect() can fail with EADDRINUSE
> long before the ephemeral port space has been exhausted. The application
> could re-try the call to connect() and likely succeed, as a new local port
> would be selected.
>
> Overall, this behaviour hinders the ability to open a large number of
> outbound connections:
> * If you don't specify SO_REUSEADDR, you have a fairly limited maximum
>   number of outbound connections.
> * If you do specify SO_REUSEADDR, you are able to open a much larger
>   number of outbound connections, but must retry on EADDRINUSE.
>
> I believe that the logic under tcp_connect() should be modified to:
>
> - behave uniformly whether or not SO_REUSEADDR has been set
> - allow outgoing connection requests to re-use a local port number, so
>   long as the remaining elements of the tuple (laddr, fport, faddr) are
>   unique

Keith, this is an excellent analysis. Could you please file it as a
problem report too and post the PR-number here so we can better track
it? Thank you.

-- 
Andre
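
The difference between the port-only check and the full-tuple check
described in the quoted analysis can be illustrated with a small toy
model. This is only a sketch under stated assumptions: the function
names, the 10-port range, and the addresses are hypothetical, and it
does not reproduce the kernel's in_pcblookup_local() or
in_pcblookup_hash() logic (no wildcard binds, no TIME_WAIT, no random
port selection).

```python
# Toy model only: names and ranges here are hypothetical, not kernel code.
PORT_LO, PORT_HI = 10000, 10009   # assumed 10-port ephemeral range
LADDR = "198.51.100.1"            # documentation address (RFC 5737)
PEERS = [("192.0.2.1", 80), ("192.0.2.2", 80), ("192.0.2.3", 80)]


def free_by_port(conns, laddr, lport, faddr, fport):
    """Port-only rule: lport is usable only if no connection uses it."""
    return all(c[1] != lport for c in conns)


def free_by_tuple(conns, laddr, lport, faddr, fport):
    """Full-tuple rule: lport is usable unless the exact tuple exists."""
    return (laddr, lport, faddr, fport) not in conns


def connect(conns, faddr, fport, is_free):
    """Scan the range for a usable local port; return it, or None."""
    for lport in range(PORT_LO, PORT_HI + 1):
        if is_free(conns, LADDR, lport, faddr, fport):
            conns.add((LADDR, lport, faddr, fport))
            return lport
    return None  # no usable port; a real connect() would fail here


def count_successes(is_free, attempts_per_peer=10):
    """Try attempts_per_peer connections to each peer; count successes."""
    conns = set()
    ok = 0
    for faddr, fport in PEERS:
        for _ in range(attempts_per_peer):
            if connect(conns, faddr, fport, is_free) is not None:
                ok += 1
    return ok


if __name__ == "__main__":
    print("port-only check:", count_successes(free_by_port))
    print("full-tuple check:", count_successes(free_by_tuple))
```

In this model the port-only rule stops at 10 connections, the size of
the assumed range, no matter how many distinct peers there are, while
the full-tuple rule reaches 30, one full range per peer.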