From: Dave Cottlehuber
To: freebsd-questions@freebsd.org
Cc: paulbeard@gmail.com
Subject: Re: where is somaxconn in FreeBSD 10.x?
Date: Tue, 14 Feb 2017 22:54:39 +0100

On Tue, 14 Feb 2017, at 04:11, Paul Beard wrote:
> FreeBSD www 10.3-STABLE FreeBSD 10.3-STABLE #0 r312644: Sun Jan 22
> 11:36:16 PST 2017 root@www:/usr/obj/usr/src/sys/SHUTTLE i386
>
> Seeing a lot of these
>
> Feb 13 18:57:09 www kernel: sonewconn: pcb 0xca51e2f4: Listen queue
> overflow: 76 already in queue awaiting acceptance (4 occurrences)
>
> and my exploration of it through the Google suggest I need to raise my
> connection/listen queue. But I’m not sure what sysctl tunable needs
> adjusting.

Hi Paul,

TL;DR: run `netstat -ALanp tcp` repeatedly to catch the socket whose
listen queue is overflowing, note which port it is listening on,
correlate that with fstat(1) if necessary to find the culprit, and then
do something about that process.

Having been through this recently, here's my understanding of the
problem and some options. See
http://mail.tarsnap.com/spiped/msg00159.html for some discussion and
Colin's very helpful answers, and
https://gist.github.com/dch/e4a2c128072556bf131e117232c3622a for the
data I found useful along the way.

Most importantly, this is fundamentally a bottleneck problem: you can
shift the bottleneck around, and maybe put it somewhere that is no
longer critical for your app, but there will always be another
bottleneck waiting.

This is typically an application-level issue where the application is
unable to accept connections as fast as the kernel is able to provide
them -- it's buffers and queues all the way down. The listen queue
belongs to a specific socket of that application, so tuning the kernel
itself will probably not improve the situation much, if at all. The
queue that fills up may equally belong to a proxy server (nginx,
haproxy etc.) in front of some other application, or to a network
tunnel or VPN.

However, it may be possible to change the socket settings within your
program, either directly with a config setting or via recompilation, so
that it handles more connections by default. If that's not the case,
then you enter the realm of load balancers (net/haproxy for example) to
spread the backend load across multiple instances of your app in a pool.

It would be nice if this error provided the name or pid of the
offending process, but as it doesn't, you'll need to use netstat(1) to
track down which process is the initial culprit.

     -A    Show the address of a protocol control block (PCB)
           associated with a socket; used for debugging.

     -a    Show the state of all sockets; normally sockets used by
           server processes are not shown.

     -L    Show the size of the various listen queues. The first count
           shows the number of unaccepted connections, the second count
           shows the amount of unaccepted incomplete connections, and
           the third count is the maximum number of queued connections.

     -n    Do not resolve numeric addresses and port numbers to names.
           See GENERAL OPTIONS.

Those are the relevant options, so you'll want something like this,
using -p tcp to filter out other protocols:

    netstat -ALanp tcp
    Current listen queue sizes (qlen/incqlen/maxqlen)
    Tcpcb            Proto Listen         Local Address
    fffff8012c041820 tcp4  0/0/128        *.443
    fffff80269b54000 tcp6  0/0/128        *.443
    fffff80e18be0820 tcp4  0/0/128        *.80
    fffff80cca843000 tcp4  16/0/10        *.15984   <-- the 16/ here is the culprit

So we can see here that the process listening on port 15984 is unable
to accept connections as fast as the kernel can receive them and pass
them through: 16 connections are queued against a maximum of 10. If
this is a transient port you would need to use
`fstat | grep fffff80cca843000` or similar to find which process is the
problem.

I hope that helps, and that my explanation is more or less correct.

BTW regarding tuning, I found the following pages useful, but
ultimately tuning simply delayed the problem.

https://fasterdata.es.net/host-tuning/freebsd/
http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html#FreeBSD
https://calomel.org/freebsd_network_tuning.html

A+
Dave
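
P.S. Since the overflow can be brief and easy to miss by eye, the check
above could be scripted. What follows is only a minimal sketch, assuming
the `netstat -ALanp tcp` output format shown in this message; the
function name full_listen_queues is made up for illustration. It reads
that output on stdin and prints only the rows whose qlen has reached or
passed maxqlen.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): filter `netstat -L` style
# output on stdin down to the listen sockets whose accept queue is full.
full_listen_queues() {
    awk '
    {
        # Locate the qlen/incqlen/maxqlen triple; its column differs
        # depending on whether -A was given, so scan every field.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+\/[0-9]+\/[0-9]+$/) {
                split($i, q, "/")
                # q[1] = qlen, q[3] = maxqlen. Ignore maxqlen of 0.
                if (q[3] + 0 > 0 && q[1] + 0 >= q[3] + 0)
                    print
                break
            }
        }
    }'
}
```

On FreeBSD it could be fed with `netstat -ALanp tcp | full_listen_queues`,
for example from a loop with a short sleep. Header lines and healthy
sockets such as 0/0/128 produce no output, so anything printed is a
candidate for the fstat(1) lookup.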