From owner-freebsd-stable@freebsd.org Thu Oct 13 12:01:59 2016 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4F28DC0FB8E for ; Thu, 13 Oct 2016 12:01:59 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-qk0-f177.google.com (mail-qk0-f177.google.com [209.85.220.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 072BC666 for ; Thu, 13 Oct 2016 12:01:58 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by mail-qk0-f177.google.com with SMTP id n189so85029374qke.0 for ; Thu, 13 Oct 2016 05:01:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=mZ4RJDpkoQln62QShN+/m/XifmxBTrvMFO33zBvLKa8=; b=ddfgUGheDi45kpXLXTQwzoMxhmLOsIDXMsyMu1GHLyFlPLPYfztNa8Da+nceaTgGv1 IV4Oxyi3erXDDn3iALrXdlnbr53LYyABoMiwnkyaReSp3pclEzcGBENDM728D2QX84br G863qPTz/gnYiDnTYk0VOJhFPY/Nh0HMYk6gFZagDvGX9tIuxcZpeI6l+PmssXdO+2xf 46L7+ZrNWn49NYyzjln8PMcrzq9DWFYYqdmLzvO503eDTsYVSdMRIPoAXiSRdfFG+KfU ZKvxUHnJTnIauffjU/Ul/ip7y5UnJXo5tcotTpIm8OH2AXOy02xVuSE3ZUOWk23zmewb 2Ejg== X-Gm-Message-State: AA6/9RnoiKF2Q76TRVaDqgYm1iUfiEbneJhSOck8NRxTHs7tgkX8flfgx8OLPvPpEXjvlA== X-Received: by 10.55.140.131 with SMTP id o125mr5632449qkd.17.1476359785368; Thu, 13 Oct 2016 04:56:25 -0700 (PDT) Received: from [10.100.64.26] ([217.30.88.7]) by smtp.gmail.com with ESMTPSA id a94sm4574417qkh.11.2016.10.13.04.56.23 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 13 Oct 2016 04:56:24 -0700 (PDT) Subject: Re: 11.0 stuck on high network load To: Slawa Olhovchenkov References: <20161012084045.GA57714@zxy.spb.ru> <20161012092945.GB57714@zxy.spb.ru> <4b0d4b58-6d13-3cd5-6991-27163f27acca@freebsd.org> <20161012095233.GC57714@zxy.spb.ru> <20161012121322.GB57876@zxy.spb.ru> <62d8861c-673e-6d86-e96e-751399e505e5@freebsd.org> <20161012130103.GD57714@zxy.spb.ru> <20161012154229.GC57876@zxy.spb.ru> Cc: Konstantin Belousov , freebsd-stable@FreeBSD.org, hiren panchasara From: Julien Charbon Message-ID: Date: Thu, 13 Oct 2016 13:56:21 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161012154229.GC57876@zxy.spb.ru> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="IaBdSCtk01RvXNmT49NC0DLBOsIESQDPV" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Oct 2016 12:01:59 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --IaBdSCtk01RvXNmT49NC0DLBOsIESQDPV Content-Type: multipart/mixed; boundary="3JqlvwLibke83UhlriI3GVwN2NOMKIGGQ"; protected-headers="v1" From: Julien Charbon To: Slawa Olhovchenkov Cc: Konstantin Belousov , freebsd-stable@FreeBSD.org, hiren panchasara Message-ID: Subject: Re: 11.0 stuck on high network load References: <20161012084045.GA57714@zxy.spb.ru> <20161012092945.GB57714@zxy.spb.ru> <4b0d4b58-6d13-3cd5-6991-27163f27acca@freebsd.org> <20161012095233.GC57714@zxy.spb.ru> <20161012121322.GB57876@zxy.spb.ru> <62d8861c-673e-6d86-e96e-751399e505e5@freebsd.org> <20161012130103.GD57714@zxy.spb.ru> <20161012154229.GC57876@zxy.spb.ru> In-Reply-To: <20161012154229.GC57876@zxy.spb.ru> --3JqlvwLibke83UhlriI3GVwN2NOMKIGGQ Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Slawa, On 10/12/16 5:42 PM, Slawa Olhovchenkov wrote: > On Wed, Oct 12, 2016 at 05:17:35PM +0200, Julien Charbon wrote: >=20 >>>>>>>>>> I see, thus just for the context: The TCP stack in sys/dev/c= xgb* is a >>>>>>>>>> TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a >>>>>>>>>> separate/side TCP stack that is used only with TCP_OFFLOAD opt= ion. >>>>>>>>>> >>>>>>>>>> This TOE TCP stack actually has its own set of detach()/input= () >>>>>>>>>> functions and seems to check INP_DROPPED flag properly. I gue= ss @np >>>>>>>>>> check fixes in socket TCP stack and decides which one can also= impact >>>>>>>>>> the Chelsio TOE TCP stack. Some bugs are only in socket TCP s= tack, some >>>>>>>>>> are only in TOE TCP stack. >>>>>>>>> >>>>>>>>> I am fear about other direction -- setting INP_TIMEWAIT in Chel= sio TOE >>>>>>>>> TCP stack and impact this to >>>>>>>>> tcp_timer_2msl()/tcp_close()/sofree()/tcp_usr_detach() path. >>>>>>>> >>>>>>>> I see, I expect no problem on this side as tcp_timer_2msl() che= cks the >>>>>>>> INP_TIMEWAIT flag and do not call tcp_close() if set. >>>>>>> >>>>>>> I am about case when at time of first INP_WUNLOCK() tcp_timer_2ms= l() >>>>>>> don't see INP_TIMEWAIT, call tcp_close(), tcp_close() do INP_WUNL= OCK() >>>>>>> and now Chelsio TOE take INP_WLOCK, do tcp_twstart() and set >>>>>>> INP_TIMEWAIT. After this tcp_timer_2msl resume and have unexpecte= d >>>>>>> INP_TIMEWAIT in tcp_usr_detach(). >>>>>> >>>>>> Sure, basically the same bug that in classic TCP stack. If you t= hink >>>>>> it can happen, send an email describing that to np@ and he will ch= eck >>>>>> and fix that. He is a TOE TCP stack expert and I am not. In all = cases, >>>>>> if this issue is possible in TOE TCP stack context, the patch will= be >>>>>> straightforward: If the INP_DROPPED flag is set do not call tcp_t= wstart(). >=20 > I am email to np@ >=20 >>>>>> The current patch focuses only on the classic TCP stack. >>>>> >>>>> May be current workaround (with logging) in tcp_usr_detach() is goo= d >>>>> solutuion for preventing system lockout by similar bugs? >>>> >>>> Good question, the quick workaround in tcp_usr_detach() does not ha= ndle >>>> all the cases. If it reduces the number of crashes you can still fi= nd >>>> scenarios where it can have unexpected side effect. >>> >>> This is best then guaranted lockout. >>> >>>> Long term solution is to enforce: If the inp has the INP_DROPPED f= lag >>>> just stop processing it and return. If you grep the INP_DROPPED fla= g in >>>> kernel sources, you can see that this test is already done in almost= all >>>> tcp_*() processing functions but tcp_input(). >>>> >>>> I would say that even without this issue tcp_input() should check >>>> INP_DROPPED flags after INP_WLOCK anyway. Same for the TOE TCP stac= k, >>>> you are simply not supposed to process a inp with INP_DROPPED flag. >>> >>> Absolutly acceptant! >>> May point is: more check and good handling of check result is best fo= r >>> stability. >>> >>> I.e. AND check INP_DROPPED in tcp_input AND workaroud INP_TIMEWAIT in= >>> tcp_usr_detach (with logging) and check of some posible cases in XXX = TOE. >>> >>> Current TCP stack too complex and have many corner cases. This is nee= d >>> additional guards where posible (not caused kernel panic). >> >> I see your point: Even if this issue is caught by this assert: >> >> KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT && " >> "INP_DROPPED && tp !=3D NULL")); >> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp= _usrreq.c#L213 >> >> you might not have INVARIANT option, then you will get a lockout quit= e >> difficult to debug. Thus what we can do is: >> >> - If INVARIANT is set: kernel panic to get all the details in the co= re. >> - If INVARIANT is not set: Log this error with an explicit kernel >> log(LOG_ERR) describing the issue, and then use the workaround to avoi= d >> the double-free to let the system to good enough state. >> >> Something like: >=20 > Yes, thanks! Proposed changes added in the review: https://reviews.freebsd.org/D8211 tell me when you have three days without issue with this change. >> tcp_detach() { >> >> ... >> if (inp->inp_flags & INP_TIMEWAIT) { >> >> ... >> if (inp->inp_flags & INP_DROPPED) { >> >> in_pcbdetach(inp); >> if (__predict_true(tp =3D=3D NULL)) { >> in_pcbfree(inp); >> } else { >> #ifdef INVARIANTS >> panic("tcp_detach: tp !=3D NULL, That's not good because 'blah= '\n"); >> #else >> log(LOG_ERR, "tcp_detach: tp !=3D NULL, That's not good becaus= e >> 'blah'\n"); >=20 > May be some more info in log can help to detect root cause of issuse? > I am don't know what info, may be flags or number of references? For this kind of issue, the useful part is the stacktrace. INVARIANT will give you that trace in the core, and without INVARIANT then it is better to use dtrace: $ cat tcp-twstart-dropped.d fbt::tcp_twstart:entry /args[0]->t_inpcb->inp_flags & 0x04000000/ { stack(); printf("INP_DROPPED in tcp_twstart: %x", args[0]->t_inpcb->inp_flags); } -- Julien --3JqlvwLibke83UhlriI3GVwN2NOMKIGGQ-- --IaBdSCtk01RvXNmT49NC0DLBOsIESQDPV Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJX/3ZpAAoJEKVlQ5Je6dhxffgIAOs9QX6B0NiH5oaiTPhODLEX MRP9AcYRipoFEw4xIv07mI/Bqy8uAV062ZsYxjC3OppAH+bkNqIHX8exdpz5Hrct MvFzBtAWU5/t7zk6B4ZokLerUo5r9F6rkAjqgCFLK/WoU2EesH77GX1JFlo18qSn RqArn8F4AvvKFonKut21KChOoUB3VJ9fYKLOrcwheC8LRApTyPxQy6KyITxq8+p3 xxi1oc44TO0WK78OzWj1w9/tzw61oxaEmEmWDm2LNAkOEuF2fkq5aLUjPGrkaBVj xg1pXtbEipUK4gtHxqbTmz4dMsdEzZcZrH0w5tpVhodi988d7njzOpku/tJgsUE= =XhjT -----END PGP SIGNATURE----- --IaBdSCtk01RvXNmT49NC0DLBOsIESQDPV--