Date: Mon, 19 Sep 2016 23:43:28 +0300
From: Slawa Olhovchenkov
To: Julien Charbon
Cc: hiren panchasara, Konstantin Belousov, freebsd-stable@FreeBSD.org
Subject: Re: 11.0 stuck on high network load
Message-ID: <20160919204328.GN2840@zxy.spb.ru>
In-Reply-To: <78cbcdc9-f565-1046-c157-2ddd8fcccc62@freebsd.org>

On Mon, Sep 19, 2016 at
10:32:13PM +0200, Julien Charbon wrote:

> > @ CPU_CLK_UNHALTED_CORE [4653445 samples]
> >
> > 51.86%  [2413083]  lock_delay @ /boot/kernel.VSTREAM/kernel
> >  100.0%  [2413083]   __rw_wlock_hard
> >   100.0%  [2413083]    tcp_tw_2msl_scan
> >    99.99%  [2412958]     pfslowtimo
> >     100.0%  [2412958]      softclock_call_cc
> >      100.0%  [2412958]       softclock
> >       100.0%  [2412958]        intr_event_execute_handlers
> >        100.0%  [2412958]         ithread_loop
> >         100.0%  [2412958]          fork_exit
> >    00.01%  [125]         tcp_twstart
> >     100.0%  [125]          tcp_do_segment
> >      100.0%  [125]           tcp_input
> >       100.0%  [125]            ip_input
> >        100.0%  [125]             swi_net
> >         100.0%  [125]              intr_event_execute_handlers
> >          100.0%  [125]               ithread_loop
> >           100.0%  [125]                fork_exit
>
>  The only write lock tcp_tw_2msl_scan() tries to get is a
> INP_WLOCK(inp).  Thus here, tcp_tw_2msl_scan() seems to be stuck
> spinning on INP_WLOCK (or pfslowtimo() is going crazy and calls
> tcp_tw_2msl_scan() at high rate but this will be quite unexpected).
>
>  Thus my hypothesis is that something is holding the INP_WLOCK and not
> releasing it, and tcp_tw_2msl_scan() is spinning on it.
>
>  If you can, could you compile the kernel with below options:
>
> options        DDB                # Support DDB.
> options        DEADLKRES          # Enable the deadlock resolver
> options        INVARIANTS         # Enable calls of extra sanity checking
> options        INVARIANT_SUPPORT  # Extra sanity checks of internal structures, required by INVARIANTS
> options        WITNESS            # Enable checks to detect deadlocks and cycles
> options        WITNESS_SKIPSPIN   # Don't run witness on spinlocks for speed

Currently this host runs at 100% CPU load (on all cores), so enabling
WITNESS would cause a significant drop in performance.
Can I use only a subset of these options?

Also, I may have trouble entering DDB while the host is in this state.
Maybe kgdb would work (not tried yet)?
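As a quick sanity check of the profile quoted above, here is a small sketch that re-derives the percentages from the raw sample counts. The excerpt and the total are copied from the pmcstat output; the regular expression and the helper name `parse_profile` are my own assumptions for illustration, not part of any FreeBSD tool.

```python
import re

# Excerpt of the quoted callgraph (indentation dropped, counts unchanged).
PROFILE = """\
51.86% [2413083] lock_delay
100.0% [2413083] __rw_wlock_hard
100.0% [2413083] tcp_tw_2msl_scan
99.99% [2412958] pfslowtimo
00.01% [125] tcp_twstart
"""
TOTAL_SAMPLES = 4653445  # from "@ CPU_CLK_UNHALTED_CORE [4653445 samples]"

# Assumed line shape: "<percent>% [<samples>] <function>"
LINE_RE = re.compile(r"^\s*([\d.]+)%\s+\[(\d+)\]\s+(\S+)")


def parse_profile(text):
    """Return {function: sample_count} for every line that matches LINE_RE."""
    samples = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            samples[match.group(3)] = int(match.group(2))
    return samples


samples = parse_profile(PROFILE)
scan = samples["tcp_tw_2msl_scan"]
from_timer = samples["pfslowtimo"]    # reached via the slow timeout path
from_input = samples["tcp_twstart"]   # reached via the TCP input path

share_of_cpu = 100.0 * samples["lock_delay"] / TOTAL_SAMPLES
print(f"lock_delay: {share_of_cpu:.2f}% of all samples")
print(f"via pfslowtimo: {100.0 * from_timer / scan:.2f}%")
print(f"via tcp_twstart: {100.0 * from_input / scan:.2f}%")
```

The two caller counts add up exactly to the tcp_tw_2msl_scan count, so nearly all of the lock_delay time is attributed to the pfslowtimo path.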
>  And once the issue is reproduced, run the commands below in ddb:
>
> show pcpu
> show allpcpu
> show locks
> show alllocks
> show lockchain
> show allchains
> show all trace
>
>  This is to see if the contention is indeed on the tcp_tw_2msl_scan's
> INP_WLOCK.
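To make the hypothesis under test concrete, here is a toy userland model of "something holds the lock and the scanner keeps retrying". It is not FreeBSD code: `inp_lock`, `scan_once` and the spin limit are hypothetical names and values, and an ordinary `threading.Lock` stands in for the kernel rwlock, whose real spin and sleep behaviour differs.

```python
import threading

inp_lock = threading.Lock()  # hypothetical stand-in for the contended lock


def scan_once(lock, max_spins):
    """Try to take the lock without blocking.

    Returns (acquired, failed_tries). Every failed try models one spin,
    i.e. CPU time spent in the acquire path without making progress.
    """
    failed = 0
    for _ in range(max_spins):
        if lock.acquire(blocking=False):
            return True, failed
        failed += 1
    return False, failed


inp_lock.acquire()                    # the holder takes the lock and keeps it
stuck = scan_once(inp_lock, 10000)    # the scanner can only spin

inp_lock.release()                    # the holder finally lets go
free = scan_once(inp_lock, 10000)     # the scanner now succeeds at once
inp_lock.release()

print("while held:", stuck)
print("after release:", free)
```

While the lock is held, every attempt fails and all the time goes into the retry loop, which is the pattern a profiler would report as time in the lock-acquire path. The ddb output requested above is what would show whether this is really what happens on the host.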