Date: Mon, 3 Dec 2012 15:38:30 GMT
From: Keith Arner <vornum@gmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/174087: Problems with ephemeral port selection
Message-ID: <201212031538.qB3FcUop000779@red.freebsd.org>
Resent-Message-ID: <201212031540.qB3Fe0XQ043142@freefall.freebsd.org>
>Number:         174087
>Category:       kern
>Synopsis:       Problems with ephemeral port selection
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 03 15:40:00 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Keith Arner
>Release:        7.2
>Organization:   Panasas
>Environment:
FreeBSD pa-twin-19a 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Mon Apr 19 16:24:09 EDT 2010 root@perf-x3:/usr/obj/usr0/jimz/freebsd-c-rack/sys/PANASAS amd64
>Description:
Date: Fri, 30 Nov 2012 09:09:08 -0500
From: Keith Arner <vornum@gmail.com>
To: freebsd-net@freebsd.org
Subject: Problems with ephemeral port selection
Message-ID: <CAEo_tUH9LPzPFP-O=317rYEQ3nT66b4biQshV_8=L8hReO_BLg@mail.gmail.com>

I've noticed some issues with ephemeral port number selection from tcp_connect(), which limit the number of concurrent, outgoing connections that can be established (connect(), rather than accept()).

Sifting through the source code, I believe the issues stem from two problems in the tcp_connect() code path. Specifically:

1) The wrong function gets called to determine whether a given ephemeral port number is currently usable.
2) The ephemeral port number gets selected without considering the foreign addr/port.

Curiously, the effect of #1 mostly cancels the effect of #2, such that the common calling convention gives you a correct result so long as you only have a small number of outgoing connections. However, once you get to a large number of outgoing connections, things start to break down. (I'll define large and small later.)

As a side note, I have been working with FreeBSD 7.2. The implementations of several of the relevant functions were refactored somewhere between 7.2-RELEASE and 9-STABLE, but the core problems in the logic seem to be the same in both versions.
For problem #1, the code path that selects the ephemeral port number is:

  tcp_connect()
    -> in_pcbbind()
      -> in_pcbbind_setup()
        -> in_pcb_lport()  [not in FreeBSD 7.2]
          -> in_pcblookup_local()

There is a loop in in_pcb_lport() (or directly in in_pcbbind_setup() in earlier releases) that considers candidate ephemeral port numbers and calls in_pcblookup_local() to determine whether a given candidate is suitable. The default behaviour (if the caller has set neither SO_REUSEADDR nor SO_REUSEPORT) is to pick a local port number that is not in use by *any* local TCP socket. So long as the number of concurrent, outgoing connections is less than the size of the range configured by `sysctl net.inet.ip.portrange.*`, selecting a totally unique ephemeral port number works OK. However, you cannot exceed that limit, even if each outgoing connection has a unique faddr/fport. This does not limit the number of connections that can be accept()'ed, only the number that can be connect()'ed.

In this particular path, I think the code should call in_pcblookup_hash() rather than in_pcblookup_local(). The criteria in in_pcblookup_hash() only match if the full 5-tuple matches, rather than just the local port number. The complication, of course, comes from the fact that in_pcbbind() is called both from bind() and for the implicit bind that happens during a connect(). The matching criteria in in_pcblookup_local() make sense for the former but not quite for the latter.

I mentioned that the above is the default behaviour you get when you don't specify SO_REUSEADDR or SO_REUSEPORT. Setting SO_REUSEADDR before calling connect() has some surprising consequences (surprising in the sense that I don't believe SO_REUSEADDR is supposed to have any effect on connect()). In this case, when in_pcblookup_local() is called, wild_okay is set to false. This changes the matching criteria to (in effect) allow tcp_connect() to use the full 5-tuple space. However, this brings us to the second problem.
Problem #2 is that the ephemeral port number is chosen before the fport/faddr gets set on the pcb; that is, tcp_connect() calls in_pcbbind() to select the ephemeral port number, *then* calls in_pcbconnect_setup() to populate the fport/faddr. With SO_REUSEADDR, in_pcbbind() can select an in-use local port. If the local port is used by a socket with a different laddr/fport/faddr, all is good. However, if the local port selection results in a full conflict (laddr, lport, faddr and fport all match an existing connection), it will get rejected by the call to in_pcblookup_hash() inside in_pcbconnect_setup(). This happens *after* the loop inside in_pcbbind(), so the call to tcp_connect() fails with EADDRINUSE. Thus, with SO_REUSEADDR, connect() can fail with EADDRINUSE long before the ephemeral port space has been exhausted. The application could retry the call to connect() and would likely succeed, as a new local port would be selected.

Overall, this behaviour hinders the ability to open a large number of outbound connections:

* If you don't specify SO_REUSEADDR, you have a fairly limited maximum number of outbound connections.
* If you do specify SO_REUSEADDR, you are able to open a much larger number of outbound connections, but must retry on EADDRINUSE.
I believe that the logic under tcp_connect() should be modified to:

- behave uniformly whether or not SO_REUSEADDR has been set
- allow outgoing connection requests to re-use a local port number, so long as the remaining elements of the tuple (laddr, fport, faddr) are unique

==========

Follow-up from the freebsd-net mailing list:

Date: Sat, 01 Dec 2012 11:31:31 -0300
From: Fernando Gont <fernando@gont.com.ar>
To: Keith Arner <vornum@gmail.com>
Cc: freebsd-net@freebsd.org
Subject: Re: Problems with ephemeral port selection
Message-ID: <50BA14C3.4070601@gont.com.ar>
In-Reply-To: <CAEo_tUH9LPzPFP-O=317rYEQ3nT66b4biQshV_8=L8hReO_BLg@mail.gmail.com>
References: <CAEo_tUH9LPzPFP-O=317rYEQ3nT66b4biQshV_8=L8hReO_BLg@mail.gmail.com>

Hi, Keith,

On 11/30/2012 11:09 AM, Keith Arner wrote:
>
> - behave uniformly whether or not SO_REUSEADDR has been set
> - allow outgoing connection requests to re-use a local port number, so
> long as the remaining elements of the tuple (laddr, fport, faddr) are
> unique

Please take a look at the discussion on how to "steal" incoming connections in Section 3.1 of RFC 6056.

Cheers,
--
Fernando Gont
e-mail: fernando@gont.com.ar || fgont@si6networks.com
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1

>How-To-Repeat:
connect() a large number of sockets, specifying SO_REUSEADDR before calling connect(). Note that the call to connect() fails with EADDRINUSE long before we run into any resource exhaustion.

Then connect() a large number of sockets without specifying SO_REUSEADDR (while all the previous sockets are still open). Note that connect() then fails with EADDRNOTAVAIL; this occurs as soon as the total number of outgoing connections equals the size of the ephemeral port range.
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <arpa/inet.h>

int last_child = -1;

#define complain(exit_val) \
    { \
        return(exit_val); \
    }

/* Set a boolean socket option to 1. */
int SockOpt(int s, int level, int opt)
{
    int opt_val = 1;
    int ret = setsockopt(s, level, opt, &opt_val, sizeof(opt_val));
    if (ret) {
        perror("Could not setsockopt() on socket");
        complain(-1);
    }
    return 0;
}

/* Open a listening TCP socket on the given port. */
int open_server(int port)
{
    int ret;
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));  /* zero sin_len/sin_zero */
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    int server = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (server < 0) {
        perror("Could not open server socket");
        complain(-1);
    }
    SockOpt(server, SOL_SOCKET, SO_REUSEADDR);
    ret = bind(server, (struct sockaddr *)&sin, sizeof(sin));
    if (ret) {
        perror("Could not bind() server socket");
        complain(-1);
    }
    ret = listen(server, 5);
    if (ret) {
        perror("Could not listen() server socket");
        complain(-1);
    }
    return server;
}

/* connect() one client to the loopback server and accept() it. */
int cycle_client(int server, int iteration, int port, int reuse)
{
    int ret;
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(port);

    int client = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (client < 0) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not open client socket");
        complain(-1);
    }
    if (reuse) {
        SockOpt(client, SOL_SOCKET, SO_REUSEADDR);
    }
    ret = connect(client, (struct sockaddr *)&sin, sizeof(sin));
    if (ret) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not connect() client socket");
        complain(-1);
    }

    socklen_t len = sizeof(sin);  /* accept() needs an initialized length */
    int child = accept(server, (struct sockaddr *)&sin, &len);
    if (child < 0) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not accept() child socket");
        complain(-1);
    }

    /* Why are we not closing the sockets?
     *
     * The point of this program is to illustrate the behaviour of the
     * network stack when we open (or, rather, connect()) a large number of
     * outgoing sockets. Thus, we want the sockets to linger around, to
     * consume ephemeral port numbers. Note that we could get largely
     * similar behaviour by closing the sockets (if we close the client
     * socket first), as the pcbs would linger in the TIME_WAIT state,
     * consuming ephemeral port numbers.
     *
     * Note that because TIME_WAIT connections count against us, the
     * behaviour being illustrated does not rely on a large number of
     * concurrent connections, just a large number of outgoing connections
     * established over a short time period. But it is easier to understand
     * the operation of this program if we leave the sockets open.
     */
    /*
    ret = close(client);
    if (ret) {
        fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
        perror("Could not close() client");
        complain(-1);
    }
    */
    /*
    if (last_child) {
        ret = close(child);
        if (ret) {
            fprintf(stderr, "Iteration %d, errno %d: ", iteration, errno);
            perror("Could not close() child");
            complain(-1);
        }
    }
    */
    last_child = child;
    return 0;
}

/* Main loop to illustrate ephemeral port number behaviour. */
int main(int argc, char **argv)
{
    /* num_iterations: How many sockets do we want to try to open per remote
     * port number? Should be set higher than the number of unique
     * ephemeral port numbers that the stack can choose from. With the
     * default FreeBSD settings, that works out to:
     *
     *   net.inet.ip.portrange.last: 65535
     *   net.inet.ip.portrange.first: 49152
     *
     *   65535 - 49152 + 1 = 16384
     */
    int num_iterations = 20 * 1000;

    /* num_ports: How many distinct remote ports do we want to connect to? */
    int num_ports = 2;

    /* port: base remote port number to connect to */
    int port = 12345;

    /* reuse: Should we set SO_REUSEADDR before calling connect()?
     * Note that we alternate this value for each remote port, to
     * illustrate the differences in behaviour between setting it or not.
     */
    int reuse = 1;

    int port_loop;
    for (port_loop = 0; port_loop < num_ports; port_loop++) {
        /* Set up a listening socket on the next remote port number. */
        int server = open_server(port);
        if (server < 0) {
            return 1;
        }

        int i = 0;
        for (; i < num_iterations; i++) {
            /* Open a bunch of sockets, and bail out on the first failure. */
            if (cycle_client(server, i, port, reuse)) {
                break;
            }
        }

        /* How many connections did we manage to establish on this port
         * number (and with this "reuse" setting)? If all is working,
         * we ought to be able to establish as many connections as there
         * are ephemeral ports, and we ought to be able to do so for each
         * remote port number (barring memory exhaustion problems).
         */
        fprintf(stderr, "port %d; reuse %d; opened %d\n", port, reuse, i);

        /* Advance to the next remote port, and toggle whether we set
         * SO_REUSEADDR.
         */
        port++;
        reuse = !reuse;
    }
    return 0;
}

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: