Date: Fri, 14 Oct 2016 13:02:22 +0300
From: Slawa Olhovchenkov
To: Julien Charbon
Cc: Konstantin Belousov, freebsd-stable@FreeBSD.org, hiren panchasara
Subject: Re: 11.0 stuck on high network load
Message-ID: <20161014100222.GO57714@zxy.spb.ru>
In-Reply-To: <9f2eb420-897f-e231-834e-c8b256c81130@freebsd.org>

On Fri, Oct 14, 2016 at 11:48:38AM +0200, Julien Charbon wrote:

> >>> Also, using dtrace is too complex in
> >>> production (it needs a complex startup
> >>> under screen and output capture) and for many people.
> >>> kdb_backtrace() has much less administrative overhead.
> >>
> >>  I still think it is overkill.  The main goal of this change is to fix a
> >> quite tricky and old TCP stack locking issue.  Let's try to do that
> >> first, it is complex enough by itself.
> >>
> >>  Once the fix is validated and pushed, feel free to propose your own
> >> patch/review to add kdb_backtrace(), log(), etc. to get other devs'
> >> point of view.
> >>
> >>  I don't remember who said: "Never ever optimize error cases"...
> >
> > This is not optimizing error cases; this is error recovery and
> > diagnostics of error cases in other subsystems.
>
>  Sure, I guess this quote is more geared toward: "Always spend 50x more
> time on improving the main path than the error path".
>
> > Currently FreeBSD internals are too complex to just always trust the
> > correctness of other subsystems or to panic on any inconsistency.
> >
> > INVARIANTS is too expensive now (20 Gbit/s drops to 8 Gbit/s).
>
>  I do agree.  I am not expert enough to see all the side effects of
> calling kdb_backtrace() from the TCP stack, might be way too slow,
> tricky in interrupt context, etc.  You can see that kdb_backtrace()

I thought about this. This example is taken from netgraph, and it is a
similar case (regarding interrupt context, etc.). The occurrence is too
rare (once per day, maybe once per two hours) to cause any overhead.

OK, I see your point: your experience doesn't allow you to add this
code, and it needs a separate review and commit. Right, np.

> is rarely called in the kernel source.  That's why it is better if you
> propose a review on adding this line to get comments from other devs on
> just this question.
>
> > PS: I am applying the patch. Wait till Monday.
> >
> > Thanks very much for this hard work!
>
>  No problem, thanks for your time.  But it is not over yet: We have to
> wait for final test.
Currently the system doesn't use Chelsio TOE; after Monday I will update
the system with Chelsio TOE. With Chelsio I see this occurrence very
rarely, once in a few months.