From: Slawa Olhovchenkov
To: Julien Charbon
Cc: Konstantin Belousov, freebsd-stable@FreeBSD.org, hiren panchasara
Subject: Re: 11.0 stuck on high network load
Date: Fri, 14 Oct 2016 12:35:47 +0300

On Thu, Oct 13, 2016 at 06:14:29PM +0200, Julien Charbon wrote:

> On 10/13/16 5:17 PM, Slawa Olhovchenkov wrote:
> > On Thu, Oct 13, 2016 at 05:06:00PM +0200, Julien Charbon wrote:
> >
> >>>> will give you that trace in the core, and without INVARIANT then it is
> >>>> better to use dtrace:
> >>>>
> >>>> $ cat tcp-twstart-dropped.d
> >>>> fbt::tcp_twstart:entry
> >>>> /args[0]->t_inpcb->inp_flags & 0x04000000/
> >>>> {
> >>>>     stack();
> >>>>     printf("INP_DROPPED in tcp_twstart: %x", args[0]->t_inpcb->inp_flags);
> >>>> }
> >>>
> >>> The same code could be inserted there too, IMHO.
> >>
> >> Hmm, I don't think so:
> >>
> >> - If you have INVARIANT, the kernel will panic in tcp_twstart() or
> >>   tcp_detach() and you will have everything you need to debug.
> >> - If you don't, dtrace is the right tool to use in all cases anyway.
> >
> > dtrace does not run in my case; it fails with the diagnostic "dtrace:
> > processing aborted: Abort due to systemic unresponsiveness". That was
> > for tcp_close. Maybe tcp_twstart will be more successful, maybe not.
>
> It does and will.
>
> > Also, using dtrace is too complex in production (it needs a complex
> > startup under screen and output capture) and for many people.
> > kdb_backtrace() has much less administrative overhead.
>
> I still think it is overkill. The main goal of this change is to fix a
> quite tricky and old TCP stack locking issue. Let's try to do that
> first, it is complex enough by itself.
>
> Once the fix is validated and pushed, feel free to propose your own
> patch/review to add kdb_backtrace(), log(), etc. to get other devs'
> point of view.
>
> I don't remember who said: "Never ever optimize error cases"...

This is not optimizing error cases; this is error recovery and
diagnosis of error cases in other subsystems. FreeBSD internals are
currently too complex to always trust that other subsystems are
correct, or to panic on any inconsistency. INVARIANTS is too expensive
now (throughput drops from 20 Gbit/s to 8 Gbit/s).

PS: I am applying the patch. Wait until Monday.
Thanks very much for this hard work!