Date: Tue, 14 Feb 2017 22:54:39 +0100
From: Dave Cottlehuber <dch@skunkwerks.at>
To: freebsd-questions@freebsd.org
Cc: paulbeard@gmail.com
Subject: Re: where is somaxconn in FreeBSD 10.x?
Message-ID: <1487109279.452074.881096096.1E71745B@webmail.messagingengine.com>
In-Reply-To: <333CC779-4527-4199-8BD9-8E7793D9FF86@gmail.com>
References: <333CC779-4527-4199-8BD9-8E7793D9FF86@gmail.com>

On Tue, 14 Feb 2017, at 04:11, Paul Beard wrote:
> FreeBSD www 10.3-STABLE FreeBSD 10.3-STABLE #0 r312644: Sun Jan 22
> 11:36:16 PST 2017 root@www:/usr/obj/usr/src/sys/SHUTTLE i386
>
> Seeing a lot of these
>
> Feb 13 18:57:09 www kernel: sonewconn: pcb 0xca51e2f4: Listen queue
> overflow: 76 already in queue awaiting acceptance (4 occurrences)
>
> and my exploration of it through the Google suggest I need to raise my
> connection/listen queue. But I’m not sure what sysctl tunable needs
> adjusting.

Hi Paul,

TLDR: run `netstat -ALanp tcp` repeatedly to try to catch the process
with the overflowing listen queue, find which port it's listening on,
and if necessary correlate that with fstat(1) to find the culprit. Then
do something about that process.

Having been through this recently, here's my understanding of the
problem and some options. See
http://mail.tarsnap.com/spiped/msg00159.html for some discussion and
Colin's very helpful answers, and
https://gist.github.com/dch/e4a2c128072556bf131e117232c3622a for the
data I found useful along the way.

Most importantly, this is fundamentally a bottleneck problem: you can
shift the bottleneck around, and maybe put it somewhere that is no
longer critical for your app, but there will always be another
bottleneck waiting.

This is typically an application-level issue, where the application is
unable to accept connections as fast as the kernel is able to provide
them; it's buffers and queues all the way down. The listen queue belongs
to a specific socket of that application, so tuning the kernel itself
will probably not improve the situation much, if at all. The listen
queue may also fill up at a proxy server (nginx, haproxy etc.) in front
of some other application, or at a network tunnel or VPN.

However, it may be possible to change the socket settings within your
program, either directly with a config setting or via recompilation, so
that it handles more connections by default.
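
To make "socket settings within your program" concrete: the size of the
listen queue is the backlog argument the application passes to
listen(2). Here is a minimal sketch in Python, not taken from any real
application; the function name open_listener and its defaults are made
up for illustration. Note that the kernel caps the requested backlog at
a system-wide limit (on FreeBSD 10.x I believe that is
kern.ipc.soacceptqueue, with kern.ipc.somaxconn kept as the older
name), so a larger value here only helps up to that ceiling.

```python
import socket


def open_listener(host="127.0.0.1", port=0, backlog=128):
    """Return a TCP socket listening on (host, port).

    `backlog` is the value handed to listen(2); it is what shows up as
    maxqlen in `netstat -L`. Port 0 asks the kernel for any free port.
    The name and defaults are hypothetical, for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for old connections to expire.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # The kernel may silently reduce this to its system-wide limit.
    sock.listen(backlog)
    return sock
```

Something like open_listener(port=15984, backlog=256) would ask for a
deeper queue, but a bigger queue only buys time: if the program cannot
call accept() fast enough, the queue fills again under sustained load.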

If that's not the case, then you enter the realm of load balancers
(net/haproxy for example) to spread the backend load across multiple
instances of your app in a pool.

It would be nice if this error provided the name or pid of the offending
process, but as it doesn't, you'll need to use netstat(1) to track down
which process is the initial culprit. These are the relevant options:

     -A    Show the address of a protocol control block (PCB)
           associated with a socket; used for debugging.

     -a    Show the state of all sockets; normally sockets used by
           server processes are not shown.

     -L    Show the size of the various listen queues. The first count
           shows the number of unaccepted connections, the second count
           shows the amount of unaccepted incomplete connections, and
           the third count is the maximum number of queued connections.

     -n    Do not resolve numeric addresses and port numbers to names.
           See GENERAL OPTIONS.

So you'll want something like this, using -p tcp to filter out other
protocols:

netstat -ALanp tcp
Current listen queue sizes (qlen/incqlen/maxqlen)
Tcpcb            Proto Listen   Local Address
fffff8012c041820 tcp4  0/0/128  *.443
fffff80269b54000 tcp6  0/0/128  *.443
fffff80e18be0820 tcp4  0/0/128  *.80
fffff80cca843000 tcp4  16/0/10  *.15984   <-- see the 16/ here: this is the culprit

So we can see here that the process listening on 15984 is unable to
process connections as fast as the kernel can receive and pass them
through. If this is a transient port, you would need to use `fstat |grep
fffff80cca843000` or similar to find which process is the problem.

I hope that helps, and that my explanation is more or less correct.

BTW regarding tuning, I found the following pages useful, but ultimately
tuning simply delayed the problem.

https://fasterdata.es.net/host-tuning/freebsd/
http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html#FreeBSD
https://calomel.org/freebsd_network_tuning.html

A+
Dave