From owner-freebsd-net@FreeBSD.ORG Fri Mar 7 12:58:40 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 733DC637 for ; Fri, 7 Mar 2014 12:58:40 +0000 (UTC) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3031AC14 for ; Fri, 7 Mar 2014 12:58:40 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1WLu7e-0003aI-7s for freebsd-net@freebsd.org; Fri, 07 Mar 2014 13:43:30 +0100 Received: from 217.30.88.7 ([217.30.88.7]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Mar 2014 13:43:30 +0100 Received: from jcharbon by 217.30.88.7 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 07 Mar 2014 13:43:30 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Julien Charbon Subject: Re: TCP stack lock contention with short-lived connections Date: Fri, 07 Mar 2014 13:43:19 +0100 Lines: 85 Message-ID: <5319BEE7.607@verisign.com> References: <20140306215743.GB47921@funkthat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 217.30.88.7 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 In-Reply-To: <20140306215743.GB47921@funkthat.com> Cc: jmg@funkthat.com X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Mar 2014 12:58:40 -0000 Hi John, On 06/03/14 22:57, John-Mark Gurney wrote: > Julien Charbon wrote this message on Thu, Feb 27, 2014 at 11:32 +0100: >> [...] >> Any thoughts on this particular behavior? > > One thing that I noticed is that you now lock/unlock the tw and inp lock a > lot... Have you thought about grabing the TW lock once, grabbing some/all > of the ones necessary to process and then process them in a second step? > If the bulk processing is still an issue, then you could process them in > blocks of 50 or so, that way the number of lock/unlock cycles is reduced... First thanks, feedback are highly valuable to us. In first place, I indeed tried a kind of bulk processing enforcement with something like: tcp_tw_2msl_scan() { struct tcptw *tw; int i, end = 0, count = 100; for (;;) { INP_INFO_WLOCK(&V_tcbinfo); for (i = 0; i < count; ++i) { tw = TAILQ_FIRST(&V_twq_2msl); if (tw == NULL || (tw->tw_time - ticks) > 0)) { end = 1; break; } INP_WLOCK(tw->tw_inpcb); tcp_twclose(tw, 0); } if (end) break; INP_INFO_WUNLOCK(&V_tcbinfo); } return (NULL); } And I got best result with 'count' set to 1, this led us to current proposed patch. Thus main goal here is somehow to prioritize the NIC interruption handler calls against tcp_tw_2msl_scan() call in INP_INFO battle. As you proposed, we can add a sysctl(or a #define) to configure the maximum of tw objects to be cleaned up under a same INP_INFO_WLOCK() call, to have a finer control on how the tw objects are enforced. That said, our high level solution is to consider the NIC interruption code path (i.e. tcp_input()) as critical and remove almost most contention points on it, which is our long term goal. This change is just a step on this (long and not straightforward) path. >> +/* >> + * Drop a refcount on an tw elevated using tw_pcbref(). If it is >> + * valid, we return with the tw lock held. >> + */ > > I assume you mean that you return with the tw lock unlocked? at least > that's what the code reads to me... > >> +static int >> +tw_pcbrele(struct tcptw *tw) >> +{ >> + TW_WLOCK_ASSERT(V_tw_lock); >> + KASSERT(tw->tw_refcount > 0, ("%s: refcount 0", __func__)); >> + >> + if (!refcount_release(&tw->tw_refcount)) { >> + TW_WUNLOCK(V_tw_lock); >> + return (0); >> + } >> + >> + uma_zfree(V_tcptw_zone, tw); >> + TW_WUNLOCK(V_tw_lock); >> + return (1); >> +} You are completely right, my mistake. I will update the comment. > Otherwise looks like a good patch... Thanks again for your time. -- Julien