From: Slawa Olhovchenkov
To: Julien Charbon
Cc: Konstantin Belousov, freebsd-stable@FreeBSD.org, hiren panchasara
Date: Wed, 12 Oct 2016 16:01:03 +0300
Subject: Re: 11.0 stuck on high network load

On Wed, Oct 12, 2016 at 02:35:11PM +0200, Julien Charbon wrote:

>
>  Hi Slawa,
>
> On 10/12/16 2:13 PM, Slawa
Olhovchenkov wrote:
> > On Wed, Oct 12, 2016 at 02:06:59PM +0200, Julien Charbon wrote:
> >>>>>>> sofree() call tcp_usr_detach() and in tcp_usr_detach() we have
> >>>>>>> unexpected INP_TIMEWAIT.
> >>>>>>
> >>>>>>  I see, thus just for the context:  The TCP stack in sys/dev/cxgb* is a
> >>>>>> TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a
> >>>>>> separate/side TCP stack that is used only with TCP_OFFLOAD option.
> >>>>>>
> >>>>>>  This TOE TCP stack actually has its own set of detach()/input()
> >>>>>> functions and seems to check INP_DROPPED flag properly.  I guess @np
> >>>>>> check fixes in socket TCP stack and decides which one can also impact
> >>>>>> the Chelsio TOE TCP stack.  Some bugs are only in socket TCP stack, some
> >>>>>> are only in TOE TCP stack.
> >>>>>
> >>>>> I am fear about other direction -- setting INP_TIMEWAIT in Chelsio TOE
> >>>>> TCP stack and impact this to
> >>>>> tcp_timer_2msl()/tcp_close()/sofree()/tcp_usr_detach() path.
> >>>>
> >>>>  I see, I expect no problem on this side as tcp_timer_2msl() checks the
> >>>> INP_TIMEWAIT flag and do not call tcp_close() if set.
> >>>
> >>> I am about case when at time of first INP_WUNLOCK() tcp_timer_2msl()
> >>> don't see INP_TIMEWAIT, call tcp_close(), tcp_close() do INP_WUNLOCK()
> >>> and now Chelsio TOE take INP_WLOCK, do tcp_twstart() and set
> >>> INP_TIMEWAIT. After this tcp_timer_2msl resume and have unexpected
> >>> INP_TIMEWAIT in tcp_usr_detach().
> >>
> >>  Sure, basically the same bug that in classic TCP stack.  If you think
> >> it can happen, send an email describing that to np@ and he will check
> >> and fix that.  He is a TOE TCP stack expert and I am not.  In all cases,
> >> if this issue is possible in TOE TCP stack context, the patch will be
> >> straightforward:  If the INP_DROPPED flag is set do not call tcp_twstart().
> >>
> >>  The current patch focuses only on the classic TCP stack.
> >
> > May be current workaround (with logging) in tcp_usr_detach() is good
> > solution for preventing system lockout by similar bugs?
>
>  Good question, the quick workaround in tcp_usr_detach() does not handle
> all the cases.  If it reduces the number of crashes you can still find
> scenarios where it can have unexpected side effect.

That is still better than a guaranteed lockout.

>  Long term solution is to enforce:  If the inp has the INP_DROPPED flag
> just stop processing it and return.  If you grep the INP_DROPPED flag in
> kernel sources, you can see that this test is already done in almost all
> tcp_*() processing functions but tcp_input().
>
>  I would say that even without this issue tcp_input() should check
> INP_DROPPED flags after INP_WLOCK anyway.  Same for the TOE TCP stack,
> you are simply not supposed to process a inp with INP_DROPPED flag.

Absolutely agreed!
My point is: more checks, and good handling of the check results, are
best for stability. I.e. check INP_DROPPED in tcp_input(), AND work
around INP_TIMEWAIT in tcp_usr_detach() (with logging), AND check some
of the possible cases in XXX TOE.

The current TCP stack is too complex and has many corner cases. It needs
additional guards wherever possible (ones that do not cause a kernel
panic).
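
To make the "guard in both places" idea concrete, here is a minimal
userspace sketch. It is not kernel code and not the patch under
discussion: the struct, the MODEL_* flag values and the model_*()
functions are all hypothetical stand-ins, and locking is left out
entirely. It only shows the two guards side by side: refuse to enter
time-wait on a dropped pcb, and, if some path skipped that check, log and
count the unexpected state at detach time rather than panic.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag values for this model only; not the kernel's. */
#define MODEL_INP_TIMEWAIT 0x1u
#define MODEL_INP_DROPPED  0x2u

struct model_inpcb {
    unsigned flags;     /* in the kernel this would be protected by the inp lock */
    int unexpected_tw;  /* times detach saw TIMEWAIT on a dropped pcb */
};

/* Close path: mark the pcb as dropped. */
static void model_close(struct model_inpcb *inp)
{
    assert(inp != NULL);
    inp->flags |= MODEL_INP_DROPPED;
}

/*
 * Time-wait entry. With check_dropped != 0 it applies the guard
 * "if the dropped flag is set, stop processing and return".
 * With check_dropped == 0 it models a path that lacks the guard.
 * Returns 0 if time-wait was entered, -1 if the pcb was refused.
 */
static int model_twstart(struct model_inpcb *inp, int check_dropped)
{
    assert(inp != NULL);
    if (check_dropped && (inp->flags & MODEL_INP_DROPPED))
        return -1;
    inp->flags |= MODEL_INP_TIMEWAIT;
    return 0;
}

/*
 * Detach-side workaround: a dropped pcb that is also in time-wait is
 * unexpected. Log it and count it instead of asserting.
 * Returns 1 if the workaround path was taken, 0 otherwise.
 */
static int model_detach(struct model_inpcb *inp)
{
    assert(inp != NULL);
    if ((inp->flags & MODEL_INP_DROPPED) &&
        (inp->flags & MODEL_INP_TIMEWAIT)) {
        fprintf(stderr, "detach: unexpected TIMEWAIT on dropped pcb\n");
        inp->unexpected_tw++;
        return 1;
    }
    return 0;
}
```

With the guard enabled, model_twstart() refuses a pcb that model_close()
already marked as dropped, so model_detach() never sees the bad
combination. With the guard disabled, the detach-side check still
catches and logs it, which is the redundancy argued for above.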