Date:      Tue, 14 Feb 2017 22:54:39 +0100
From:      Dave Cottlehuber <dch@skunkwerks.at>
To:        freebsd-questions@freebsd.org
Cc:        paulbeard@gmail.com
Subject:   Re: where is somaxconn in FreeBSD 10.x? 
Message-ID:  <1487109279.452074.881096096.1E71745B@webmail.messagingengine.com>
In-Reply-To: <333CC779-4527-4199-8BD9-8E7793D9FF86@gmail.com>
References:  <333CC779-4527-4199-8BD9-8E7793D9FF86@gmail.com>

On Tue, 14 Feb 2017, at 04:11, Paul Beard wrote:
> FreeBSD www 10.3-STABLE FreeBSD 10.3-STABLE #0 r312644: Sun Jan 22
> 11:36:16 PST 2017     root@www:/usr/obj/usr/src/sys/SHUTTLE  i386
>
> Seeing a lot of these
>
> Feb 13 18:57:09 www kernel: sonewconn: pcb 0xca51e2f4: Listen queue
> overflow: 76 already in queue awaiting acceptance (4 occurrences)
>
> and my exploration of it through the Google suggest I need to raise my
> connection/listen queue. But I'm not sure what sysctl tunable needs
> adjusting.

Hi Paul,

TLDR: run `netstat -ALanp tcp` repeatedly to try to catch the process
with the overflowing listen queue, find what port it's listening on,
correlate that with fstat(1) if necessary to find the culprit, and then
do something about that process.
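
If you want to catch it in the act, a rough sketch of a polling loop
(plain sh; adjust the interval to taste) would be something like:

    #!/bin/sh
    # Poll the listen queues and print only the sockets whose current
    # queue length (the first number in the Listen column) is non-zero.
    while :; do
        netstat -ALanp tcp | awk 'NR > 2 && $3 !~ /^0\//'
        sleep 1
    done

The `NR > 2` just skips the two header lines; the column layout is the
same as in the netstat output further down this mail.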

Having been through this recently, here's my understanding of the
problem and some options. See
http://mail.tarsnap.com/spiped/msg00159.html for some discussion, and
Colin's very helpful answers, and
https://gist.github.com/dch/e4a2c128072556bf131e117232c3622a for the
data I found useful along the way.

Most importantly, this is fundamentally a bottleneck problem - you can
shift the bottleneck around, and maybe put it somewhere that is no
longer critical for your app, but there will always be another
bottleneck waiting.

This is typically an application-level issue where the application is
unable to accept connections as fast as the kernel is able to provide
them -- it's buffers and queues all the way down. The listen queue is
related to a specific socket for that application, so tuning the kernel
itself will probably not improve the situation much, if at all. The
listen queue may fill up at a proxy server (nginx, haproxy, etc.) in
front of some other application, or at a network tunnel or VPN.
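
(That said, since your subject line asks where somaxconn went: on 10.x
the kernel-wide cap is, if I remember correctly, the
kern.ipc.soacceptqueue sysctl, with kern.ipc.somaxconn kept as an alias
for compatibility. Something like:

    # check the current kernel-wide listen queue cap
    sysctl kern.ipc.soacceptqueue
    # raise it at runtime, e.g. to 1024; add to /etc/sysctl.conf to persist
    sysctl kern.ipc.soacceptqueue=1024

but as above, raising it usually just moves the bottleneck rather than
fixing it.)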

However, it may be possible to change the socket settings within your
program, directly with a config setting, or via recompilation, to handle
more connections by default. If that's not the case, then you enter the
realm of load balancers (net/haproxy, for example) to spread the backend
load across multiple instances of your app in a pool.
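
For example, if the listener turns out to be nginx or haproxy, both let
you raise the backlog they pass to listen(2) from their configs (nginx
via the backlog= parameter on the listen directive, haproxy via the
backlog keyword, if I remember right). After a reload you can confirm
the change in the maxqlen column, e.g. with a hypothetical app on port
8080:

    # the third number in the Listen column (maxqlen) should now show
    # the new backlog for the hypothetical listener on port 8080
    netstat -ALanp tcp | grep '\.8080'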

It would be nice if this error included the name or PID of the
offending process, but as it doesn't, you'll need to use netstat(1) to
track down which process is the initial culprit.

     -A      Show the address of a protocol control block (PCB)
             associated with a socket; used for debugging.

     -a      Show the state of all sockets; normally sockets used by
             server processes are not shown.

     -L      Show the size of the various listen queues.  The first
             count shows the number of unaccepted connections, the
             second count shows the amount of unaccepted incomplete
             connections, and the third count is the maximum number of
             queued connections.

     -n      Do not resolve numeric addresses and port numbers to
             names.  See GENERAL OPTIONS.
Those are the relevant options, so you'll want something like this,
using -p tcp to filter out other protocols:

netstat -ALanp tcp
Current listen queue sizes (qlen/incqlen/maxqlen)
Tcpcb            Proto Listen   Local Address
fffff8012c041820 tcp4  0/0/128  *.443
fffff80269b54000 tcp6  0/0/128  *.443
fffff80e18be0820 tcp4  0/0/128  *.80
fffff80cca843000 tcp4  16/0/10  *.15984   <-- the 16/ here is the culprit

So we can see here that the process listening on 15984 is unable to
accept connections as fast as the kernel can receive and pass them
through. If this is a transient port, you would need to use
`fstat | grep fffff80cca843000` or similar to find which process is
the problem.
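
Something like this, with the output entirely made up - the owning
command and pid show up in the first columns of fstat's output, and
sockstat(1) is another way in if you already know the port:

    # match the Tcpcb address from netstat against fstat's socket lines
    fstat | grep fffff80cca843000
    #   someuser someapp    712   19* internet stream tcp fffff80cca843000

    # or, if you already know the port, sockstat lists the owner directly
    sockstat -4 -l | grep 15984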

I hope that helps, and that my explanation is more or less correct.

BTW regarding tuning, I found the following pages useful, but
ultimately tuning only delayed the problem.

https://fasterdata.es.net/host-tuning/freebsd/
http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html#FreeBSD
https://calomel.org/freebsd_network_tuning.html

A+
Dave


