From: Keith Arner
To: freebsd-net@freebsd.org
Date: Fri, 30 Nov 2012 09:09:08 -0500
Subject: Problems with ephemeral port selection

I've noticed some issues with ephemeral port number selection from tcp_connect(), which limit the
number of concurrent, outgoing connections that can be established (via connect(), as opposed to accept()). Sifting through the source code, I believe the issues stem from two problems in the tcp_connect() code path. Specifically:

1) The wrong function gets called to determine whether a given ephemeral port number is currently usable.

2) The ephemeral port number gets selected without considering the foreign addr/port.

Curiously, the effect of #1 mostly cancels out the effect of #2, such that the common calling convention gives you a correct result so long as you only have a small number of outgoing connections. However, once you get to a large number of outgoing connections, things start to break down. (I'll define large and small later.)

As a side note, I have been working with FreeBSD 7.2. The implementations of several of the relevant functions were refactored somewhere between 7.2-RELEASE and 9-STABLE, but the core problems in the logic seem to be the same in both versions.

For problem #1, the code path that selects the ephemeral port number is:

    tcp_connect()
      -> in_pcbbind()
        -> in_pcbbind_setup()
          -> in_pcb_lport()   [not in FreeBSD 7.2]
            -> in_pcblookup_local()

There is a loop in in_pcb_lport() [or directly in in_pcbbind_setup() in earlier releases] that considers candidate ephemeral port numbers and calls in_pcblookup_local() to determine whether a given candidate is suitable. The default behaviour (if the caller has set neither SO_REUSEADDR nor SO_REUSEPORT) is to pick a local port number that is not in use by *any* local TCP socket.

So long as the number of concurrent, outgoing connections is less than the range configured by `sysctl net.inet.ip.portrange.*`, selecting a totally unique ephemeral port number works OK. However, you cannot exceed that limit, even if each outgoing connection has a unique faddr/fport. This does not limit the number of connections that can be accept()'ed, only the number of connections that can be connect()'ed.
In this particular path, I think the code should call in_pcblookup_hash() rather than in_pcblookup_local(). The criteria in in_pcblookup_hash() only match if the full 5-tuple matches, rather than just the local port number. The complication, of course, comes from the fact that in_pcbbind() is called both from bind() and for the implicit bind that happens during connect(). The matching criteria in in_pcblookup_local() make sense for the former but not quite for the latter.

I mentioned that the above is the default behaviour you get when you don't specify SO_REUSEADDR or SO_REUSEPORT. Setting SO_REUSEADDR before calling connect() has some surprising consequences (surprising in the sense that I don't believe SO_REUSEADDR is supposed to have any effect on connect()). In this case, when in_pcblookup_local() is called, wild_okay is set to false. This changes the matching criteria to (in effect) allow tcp_connect() to use the full 5-tuple space. However, this brings us to the second problem.

Problem #2 is that the ephemeral port number is chosen before the fport/faddr gets set on the pcb; that is, tcp_connect() calls in_pcbbind() to select the ephemeral port number, *then* calls in_pcbconnect_setup() to populate the fport/faddr. With SO_REUSEADDR, in_pcbbind() can select an in-use local port. If the local port is used by a socket with a different laddr/fport/faddr, all is good. However, if the local port selection results in a full conflict (every element of the tuple matches an existing connection), it will get rejected by the call to in_pcblookup_hash() inside in_pcbconnect_setup(). This happens *after* the loop inside in_pcbbind(), so no other candidate port is tried and the call to tcp_connect() fails with EADDRINUSE.

Thus, with SO_REUSEADDR, connect() can fail with EADDRINUSE long before the ephemeral port space has been exhausted. The application could retry the call to connect() and would likely succeed, as a new local port would be selected.
Overall, this behaviour hinders the ability to open a large number of outbound connections:

 * If you don't specify SO_REUSEADDR, you have a fairly limited maximum number of outbound connections.
 * If you do specify SO_REUSEADDR, you are able to open a much larger number of outbound connections, but must retry on EADDRINUSE.

I believe that the logic under tcp_connect() should be modified to:

 - behave uniformly whether or not SO_REUSEADDR has been set
 - allow outgoing connection requests to reuse a local port number, so long as the remaining elements of the tuple (laddr, fport, faddr) are unique

Keith

--
"A problem well put is half solved."