Date: Fri, 26 Jul 2019 09:11:38 -0400
From: Janos Dohanics
To: FreeBSD Questions
Subject: Re: Help:: Listen queue overflow killing servers

On Fri, 26 Jul 2019 12:58:45 +0100
Paul Macdonald via freebsd-questions wrote:

>
> Hi,
>
> Over the past few months i've seen several boxes (4 or 5) become
> unresponsive as a result of a Listen queue overflow state.
>
> Processes stack up, none are killable, all these are within jails and
> neither the jail can be stopped nor the server rebooted (without a
> power cycle).
>
> All are on ZFS and are std apache/php/mysql servers with nothing too
> exotic.
>
> All on 12.0-RELEASE, i've only started seeing these issues recently,
> but it feels like more and more.
>
> /var/log/messages shows typically;
>
>     kernel: sonewconn: pcb 0xfffff813395e3d58: Listen queue overflow: 193 already in queue awaiting acceptance (83 occurrences)
>
> netstat -Lan shows
>
> tcp4  193/0/128                         x.x.x.x.443
> tcp4  193/0/128                         x.x.x.x.80
>
> connections cannot be killed with tcpdrop ( except ssh which can!)
>
> All processes seem to be in Disk State ( many many apache processes
> but others getting stuck too)
>
> www      60089    0.0 0.1  196588   78328  -  DJ   21:07   1:19.54 /usr/local/sbin/httpd -DNOHTTPACCEPT
> ..
>
> www      93713    0.0 0.0  183576   33164  -  DJ   23:57   0:00.01 /usr/local/sbin/httpd -DNOHTTPACCEPT
>
> but no zombies..
>
> last pid: 24773;  load averages:  0.00,  0.00, 0.00    up 52+11:41:09  11:48:02
> 918 processes: 1 running, 917 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 107M Active, 3729M Inact, 93G Wired, 27G Free
> ARC: 79G Total, 54G MFU, 23G MRU, 243M Anon, 710M Header, 1615M Other
>      73G Compressed, 191G Uncompressed, 2.60:1 Ratio
> Swap: 4096M Total, 4096M Free
>
>
> I'd appreciate any advice as at present it looks like my only option
> is to hard power cycle these

I have also been trying to find a resolution to a similar problem
(FreeBSD 12.0-STABLE r345381, virtual instance, not jail).

Apparently at random, TCP sockets on ports 110 and 143 are stuck in
CLOSE_WAIT state (cyrus 3.0.10). My understanding is that in CLOSE_WAIT
state the socket is waiting for the server application to close the
socket.
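
To illustrate what I mean by CLOSE_WAIT, here is a minimal sketch in
Python. It is not taken from cyrus and does not reproduce the hang; the
function name demo_close_wait is made up for this example. It only
shows the general TCP behaviour as I understand it: once the client has
closed, the server side learns about it via an empty read, and the
socket stays around until the server application itself calls close().

```python
import socket


def demo_close_wait():
    """Hypothetical demo: the server-side socket lingers until the app closes it."""
    # Listening socket on an ephemeral loopback port (port 0 lets the OS choose).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.settimeout(5.0)  # avoid blocking forever if something goes wrong

    # The client closes first; its FIN puts the server side into CLOSE_WAIT.
    client.close()

    # An empty read is how the application learns that the peer has closed.
    peer_closed = conn.recv(1024) == b""

    # The descriptor is still valid here: nothing leaves CLOSE_WAIT
    # until the application calls close() on its side.
    still_open = conn.fileno() != -1

    conn.close()
    listener.close()
    closed_now = conn.fileno() == -1
    return peer_closed, still_open, closed_now


if __name__ == "__main__":
    print(demo_close_wait())
```

If the server process is paused just before conn.close(), I would
expect netstat -an to list that connection in CLOSE_WAIT; a daemon that
never reaches its close() would leave such sockets behind.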

When the listen queue overflows, I too am unable to restart cyrus, even
with kill -9; reboot(8) doesn't work, and new ssh connections are not
accepted. A hard reboot is the only "remedy".

I have increased the cyrus listen queue from the default 32 to 128, but
I think that's just putting a larger bucket under a leaking roof.

-- 
Janos Dohanics