From: Paul Macdonald <paul@ifdnrg.com>
To: freebsd-questions@freebsd.org
Subject: Re: Help:: Listen queue overflow killing servers
Date: Fri, 26 Jul 2019 23:02:43 +0100
In-Reply-To: <6485e15869f8b205cf36811adaeed0e5.squirrel@webmail.harte-lyne.ca>

On 26/07/2019 22:18, James B. Byrne via freebsd-questions wrote:
>
>>>>>> On 7/26/19 4:58 AM, Paul Macdonald via freebsd-questions wrote:
>>>>>>> Over the past few months i've seen several boxes (4 or 5) become
>>>>>>> unresponsive as a result of a Listen queue overflow state.
> Since upgrading our hosts to 12.0 we have experienced many 'lockouts'
> of both bhyve vm guests and jails, all running on zfs. There are
> known issues with Bhyve guests getting into a deadlock state waiting
> on zio or encountering memory exhaustion in releases after
> FreeBSD-11.1.
> This has multiple causes:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231117
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
>
> We have worked around this by restricting max ARC to the net of system
> memory minus all vm allocations minus 4GB. We also noted that as the
> capacity of zfs pools approached 80% the deadlock is encountered more
> frequently.

The ARC on the system that went down today was unbounded and stupidly
large (60GB), but we had 30GB free so I didn't see any harm in that. I
have now restricted it (now: max of 10GB, currently 4GB wired, 118GB
free).

We're well aware of the need to keep ZFS below 80% utilisation; we've
seen major performance cliffs in the past when that's reached. In this
case we're at 2% utilisation on a new (<1 month) 1TB NVMe on PCI-e.

No bhyve, just 1 jail.

This was a high spec box under no significant load, but it is now the
fifth box in a few months that's crumpled under listen queue overflows.

Paul

-- 
-------------------------
Paul Macdonald
IFDNRG Ltd
Web and video hosting
-------------------------
t: 0131 5548070
m: 07970339546
e: paul@ifdnrg.com
w: http://www.ifdnrg.com
-------------------------
IFDNRG
40 Maritime Street
Edinburgh
EH6 6SA
----------------------------------------------------
Virtual Servers from £50.00pm
High specification Dedicated Servers from £150.00pm
----------------------------------------------------