From: Paul Macdonald
To: freebsd-questions@freebsd.org
Subject: Re: Help:: Listen queue overflow killing servers
Date: Fri, 26 Jul 2019 20:34:44 +0100

On 26/07/2019 19:56, David Christensen wrote:
> On 7/26/19 9:57 AM, Paul Macdonald via freebsd-questions wrote:
>>
>> On 26/07/2019 17:11, David Christensen wrote:
>>> On 7/26/19 4:58 AM, Paul Macdonald via freebsd-questions wrote:
>>>> Over the past few months i've seen several boxes (4 or 5) become
>>>> unresponsive as a result of a Listen queue overflow state.
>>>
>> so doesn't look like it's load..... (and that would
>> have shown up in the logs anyway)
>
>
> Is this server in production?  If so, it would be prudent to migrate
> services and data to another computer while you troubleshoot.
>

This has happened on 5 production boxes over the past few months, all with different hardware and load profiles.

> I would turn on debugging and crank up logging everywhere -- kernel,
> ZFS, Apache, MySQL, PHP, WP, app code, etc..  Make sure you have a big
> and fast device/ virtual device for the logs and debug dumps.
>

That's a big job: we run 110+ servers, so I'd like to find something more specific.

> Are the stress tests hitting the server with "good" traffic?  Can you
> send "bad" traffic?
>

No idea how to send bad traffic!

> Do you have test suites for any of the components?  If so, run them.
> As you troubleshoot, write new test scripts.
>

Components are not comparable across boxes, and one box that went down has only our custom code (which has worked for a decade).

>
> Can you capture real traffic and replay it -- preferably traffic that
> elicits the bug(s)?
>

The issue doesn't seem to be that reproducible. I'll check, but I think only 1 of the boxes has gone down more than once with the same issue (I can't capture traffic on all boxes).

I wish it was more reproducible; I'd downgrade that server to 11.4 in a heartbeat (I suspect it's 12.0 related).

(I have seen historic reports of similar issues on IMAP boxes, which obviously have large queues anyway.) Weirdly, our IMAP boxes have been fine, and they have 10k connections all the time.

I siege-tested the box that went down earlier today (16C/32T, 128GB RAM, 1TB NVMe) and it didn't break a sweat after 300,000 connections.

I'm at a bit of a loss.

>
> David

-- 
-------------------------
Paul Macdonald
IFDNRG Ltd
Web and video hosting
-------------------------
t: 0131 5548070
m: 07970339546
e: paul@ifdnrg.com
w: http://www.ifdnrg.com
-------------------------
IFDNRG
40 Maritime Street
Edinburgh
EH6 6SA
----------------------------------------------------
Virtual Servers from £50.00pm
High specification Dedicated Servers from £150.00pm
----------------------------------------------------
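
Short of cranking up logging on 110+ servers, one cheaper thing to watch is how full each listen queue is before a box goes unresponsive. The following is only a minimal sketch, assuming the text layout printed by FreeBSD's `netstat -Lan` (one `qlen/incqlen/maxqlen` triple per listening socket). The sample text, the function names and the 0.8 threshold are illustrative assumptions, not data or tooling from this thread.

```python
import re

# Matches data lines such as "tcp4  120/0/128  *.443":
# protocol, qlen/incqlen/maxqlen, local address.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)$")


def parse_listen_queues(text):
    """Parse `netstat -Lan`-style text into a list of dicts.

    Header lines (which have no n/n/n triple) are skipped.
    """
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        proto, qlen, incqlen, maxqlen, local = match.groups()
        entries.append({
            "proto": proto,
            "qlen": int(qlen),        # connections waiting to be accepted
            "incqlen": int(incqlen),  # incomplete connections
            "maxqlen": int(maxqlen),  # configured queue limit
            "local": local,
        })
    return entries


def busy_queues(entries, fraction=0.8):
    """Return sockets whose accept queue is at or above `fraction` of its limit."""
    return [
        e for e in entries
        if e["maxqlen"] > 0 and e["qlen"] >= fraction * e["maxqlen"]
    ]


# Hypothetical sample in the assumed format (not real output from these boxes).
SAMPLE = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address
tcp4  0/0/128                          *.80
tcp4  120/0/128                        *.443
tcp4  3/0/1024                         *.3306
"""

if __name__ == "__main__":
    for entry in busy_queues(parse_listen_queues(SAMPLE)):
        print(f'{entry["local"]}: {entry["qlen"]}/{entry["maxqlen"]}')
```

On a live box the text would come from running `netstat -Lan` periodically (for example from cron) instead of `SAMPLE`; logging only the sockets returned by `busy_queues` could show which service's queue fills up before an overflow, without enabling debugging everywhere.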