From: David King
Date: Wed, 6 Feb 2019 15:18:44 +0000
Subject: Re: Request for more intelligent local port allocation algorithm
To: Paul
Cc: freebsd-net@freebsd.org, freebsd-stable@freebsd.org

Just to add to this: if anyone is doing some work on outbound TCP
connections, could they also have a look at the bug here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210726

Thanks!

On Wed, 6 Feb 2019 at 15:15, Paul wrote:

> Hi dev team,
>
> It's no secret that when an application tries to establish a new TCP
> connection without first binding the socket to a specific local interface
> address, the OS handles that automatically.
> Unfortunately there is a catch, which lies in the different logic of local
> port allocation: (1) when the socket is bound before connect() vs (2) when
> it is not. When in_pcb_lport() allocates the port by checking whether
> different ports are free, using in_pcblookup_local(), the behaviour is as
> follows:
>
> (1) Bound, i.e. laddr is assigned a specific address:
>     The port is considered occupied only if there is a PCB that matches
>     both laddr and lport.
>
> (2) Not bound, i.e. laddr == INADDR_ANY:
>     The port is considered occupied if there is any PCB that matches
>     lport alone. This means that in order to allocate a port, none of the
>     available local addresses may have it allocated, even though this
>     requirement is ridiculous, since we are allocating only one PCB.
>
> Looking through the code, it seems that (2) is due to the fact that
> tcp_connect() first allocates the port, indirectly through the call to
> in_pcbbind(), and only then allocates the actual local address, also
> indirectly, through the call to in_pcbconnect_setup(), which in turn calls
> in_pcbladdr(). So, probably, in order to guarantee that
> in_pcbconnect_setup() will not fail, we make sure that the port is free
> on the whole range of local addresses, no matter which one of them is
> actually selected by in_pcbladdr()?
>
> In the real world, this creates serious problems for servers that have a
> lot of outgoing connections, for example an nginx proxy with a lot of open
> HTTP2 connections. In order to avoid this limitation we have created
> workarounds within the nginx config as well as within our own software,
> basically by having 50 local addresses and only following scenario (1).
> Alas, all of the built-in Unix utilities as well as other software always
> follow scenario (2). As a result, given a large number of connections,
> there may be points in time when every port in the range is occupied on at
> least one local address.
> Even worse is the outcome of such a condition: while in_pcb_lport() walks
> the range of possible port numbers, making a myriad of calls to
> in_pcblookup_local(), some kind of important lock is held within the
> kernel. So important that it leads to a complete lockup of the system.
> Even direct terminal access is unavailable: it is not responsive. The more
> calls to connect() via scenario (2) there are, the longer it takes the
> system to unfreeze. Under some circumstances, the only option is a hard
> reset.
>
> Is it possible to somehow update the code that connects via scenario (2)
> to enable more intelligent port allocation, for example by allocating the
> local address and port simultaneously?
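
For anyone skimming the thread, the difference between scenarios (1) and (2) as Paul describes them can be sketched with a toy model. This is not the kernel code: PCBs are reduced to a set of (laddr, lport) pairs, the two addresses and the three-port range are made up for illustration, and the helper names (alloc_bound, alloc_unbound, count_bound, count_unbound) are hypothetical.

```python
# Toy model of the two port-occupancy rules described in the quoted message.
# Not kernel code: PCBs are modelled as a set of (laddr, lport) pairs.

ADDRS = ["192.0.2.1", "192.0.2.2"]  # hypothetical local addresses
PORTS = range(10000, 10003)         # hypothetical 3-port ephemeral range


def alloc_bound(pcbs, laddr, ports):
    """Scenario (1): a port is busy only if (laddr, lport) is already in use."""
    for lport in ports:
        if (laddr, lport) not in pcbs:
            pcbs.add((laddr, lport))
            return lport
    return None  # every port is taken on this particular address


def alloc_unbound(pcbs, addrs, ports):
    """Scenario (2): a port is busy if any PCB, on any address, uses lport."""
    used_ports = {lport for _, lport in pcbs}
    for lport in ports:
        if lport not in used_ports:
            laddr = addrs[0]  # stand-in for whatever in_pcbladdr() would pick
            pcbs.add((laddr, lport))
            return laddr, lport
    return None  # the whole range is considered occupied


def count_bound():
    """Count allocations possible when each socket is bound first."""
    pcbs, n = set(), 0
    for laddr in ADDRS:
        while alloc_bound(pcbs, laddr, PORTS) is not None:
            n += 1
    return n


def count_unbound():
    """Count allocations possible when no socket is bound first."""
    pcbs, n = set(), 0
    while alloc_unbound(pcbs, ADDRS, PORTS) is not None:
        n += 1
    return n


print(count_bound())    # 6: ports are tracked per address
print(count_unbound())  # 3: one PCB on any address blocks the port everywhere
```

In this model, adding local addresses raises the limit only under scenario (1), which matches the 50-address workaround described above. An application can opt into scenario (1) by binding to a specific local address with port 0 before connect(); in Python, `socket.create_connection()` accepts a `source_address=(addr, 0)` argument for this.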