Date: Wed, 06 Feb 2019 17:13:08 +0200
From: Paul
Subject: Request for more intelligent local port allocation algorithm
To: freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Message-Id: <1549461051.318520353.gg4fwwj8@frv39.fwdcdn.com>
Hi dev team,

It's no secret that when an application establishes a new TCP connection without first binding the socket to a specific local address, the OS binds it automatically. Unfortunately, there is a catch: the kernel uses different logic for allocating the local port depending on whether (1) the socket is bound before connect(), or (2) it is not.
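The two call sequences can be sketched in userspace C as below. This is a minimal illustration, not code from the original message; the helper name connect_from() is hypothetical. Passing NULL for local gives scenario (2); passing an address gives scenario (1), where bind() with sin_port = 0 still leaves the choice of port to the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP connection to *remote.
 *  local == NULL -> scenario (2): connect() on an unbound socket, so the
 *                   kernel chooses both the source address and the port.
 *  local != NULL -> scenario (1): bind() to local->sin_addr with port 0
 *                   first; the kernel still chooses the port, but only for
 *                   that one local address.
 * Returns the connected socket descriptor, or -1 on error.
 */
static int connect_from(const struct sockaddr_in *local,
                        const struct sockaddr_in *remote)
{
    assert(remote != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (local != NULL) {
        struct sockaddr_in la = *local;
        la.sin_port = 0; /* 0 = let the kernel pick an ephemeral port */
        if (bind(fd, (const struct sockaddr *)&la, sizeof la) < 0)
            goto fail;
    }
    if (connect(fd, (const struct sockaddr *)remote, sizeof *remote) < 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}
```

Because the kernel still picks the port in scenario (1), the caller only has to choose the source address, for example by rotating through the addresses configured on the host.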
When in_pcb_lport() allocates a port, it checks candidate ports with in_pcblookup_local(), and the behaviour is as follows:

(1) Bound, i.e. laddr is set to a specific address: a port is considered occupied only if there is a PCB that matches both laddr and lport.

(2) Not bound, i.e. laddr == INADDR_ANY: a port is considered occupied if there is any PCB that matches lport alone.

This means that, to allocate a port in case (2), the port must be free on every available local address. That requirement makes little sense, since we are allocating only one PCB.

Looking through the code, it seems that (2) stems from the fact that tcp_connect() first allocates the port, indirectly through the call to in_pcbbind(), and only then selects the actual local address, also indirectly, through the call to in_pcbconnect_setup(), which in turn calls in_pcbladdr(). So, presumably, to guarantee that in_pcbconnect_setup() will not fail, the kernel makes sure the port is free across the whole range of local addresses, no matter which one in_pcbladdr() actually selects?

In the real world, this creates serious problems for servers with many outgoing connections, for example an nginx proxy with many open HTTP/2 connections. To avoid this limitation we have created workarounds in the nginx config as well as in our own software, essentially by using 50 local addresses and only following scenario (1). Alas, all of the built-in Unix utilities, as well as other software, always follow scenario (2). As a result, given a large number of connections, there may be moments when every port in the range is occupied on at least one local address.

Even worse is the outcome of this condition: while in_pcb_lport() walks the range of possible port numbers, making a myriad of calls to in_pcblookup_local(), some important lock is held within the kernel. It is important enough that the whole system locks up.
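The difference between the two occupancy rules can be modelled in a few lines of C. This is a simplified userspace model of the behaviour described above, not the kernel's in_pcblookup_local(): the names struct binding, port_busy(), pick_port() and ANY_ADDR are hypothetical, addresses are plain integers, and ports are searched linearly, whereas the kernel's search order may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ADDR 0u /* stands in for INADDR_ANY in this model */

/* One existing endpoint: local address + local port (hypothetical model). */
struct binding {
    uint32_t laddr;
    uint16_t lport;
};

/*
 * Occupancy rule as described in the message:
 *  laddr bound     -> busy only if some binding matches laddr AND lport
 *  laddr == ANY    -> busy if any binding uses lport on any address
 */
static int port_busy(const struct binding *tab, size_t n,
                     uint32_t laddr, uint16_t lport)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].lport != lport)
            continue;
        if (laddr == ANY_ADDR || tab[i].laddr == laddr)
            return 1;
    }
    return 0;
}

/* Linear scan over [first, last]; returns a free port, or -1 if exhausted. */
static int pick_port(const struct binding *tab, size_t n,
                     uint32_t laddr, uint16_t first, uint16_t last)
{
    assert(first <= last);
    /* uint32_t counter so the loop terminates even when last == 65535 */
    for (uint32_t p = first; p <= last; p++) {
        if (!port_busy(tab, n, laddr, (uint16_t)p))
            return (int)p;
    }
    return -1;
}
```

For example, with bindings A:10000, B:10001 and A:10002 and a port range of 10000-10002, pick_port() with ANY_ADDR finds nothing, even though port 10000 is still free on B and port 10001 is still free on A.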
Even direct terminal access is unavailable: the console does not respond. The more connect() calls go through scenario (2), the longer the system takes to unfreeze. In some circumstances the only option is a hard reset.

Is it possible to update the code that connects via scenario (2) to use a more intelligent port allocation, for example by allocating the local address and the port simultaneously?
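One possible shape for the requested change can be sketched as a userspace model (addresses are plain integers, ports are searched linearly): decide the source address first, as in_pcbladdr() would, and only then search for a port that is free on that address, falling back to the next candidate address if its range is exhausted. The names pick_addr_and_port() and pair_in_use() are hypothetical; this is not FreeBSD's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One existing endpoint: local address + local port (hypothetical model). */
struct binding {
    uint32_t laddr;
    uint16_t lport;
};

/* Once the source address is known, only an exact (laddr, lport) match conflicts. */
static int pair_in_use(const struct binding *tab, size_t n,
                       uint32_t laddr, uint16_t lport)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].laddr == laddr && tab[i].lport == lport)
            return 1;
    return 0;
}

/*
 * Hypothetical joint selection: walk the candidate source addresses in order
 * of preference and return the first free port on the first address that has
 * one, storing that address in *out_addr. Returns -1 if every
 * (address, port) pair in the range is taken.
 */
static int pick_addr_and_port(const struct binding *tab, size_t n,
                              const uint32_t *cand, size_t ncand,
                              uint16_t first, uint16_t last,
                              uint32_t *out_addr)
{
    assert(out_addr != NULL && first <= last);
    for (size_t a = 0; a < ncand; a++) {
        for (uint32_t p = first; p <= last; p++) {
            if (!pair_in_use(tab, n, cand[a], (uint16_t)p)) {
                *out_addr = cand[a];
                return (int)p;
            }
        }
    }
    return -1;
}
```

With bindings 1:10000, 2:10000 and 1:10001, a port range of 10000-10001 and candidate addresses {1, 2, 3}, the function skips address 1 (both ports taken) and returns port 10001 on address 2, whereas a wildcard-style check would have rejected both ports.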