From: Navdeep Parhar
Date: Fri, 23 May 2014 14:37:40 -0700
To: Julien Charbon, freebsd-net@freebsd.org
Subject: Re: TCP stack lock contention with short-lived connections
Message-ID: <537FBFA4.1010902@FreeBSD.org>
In-Reply-To: <537FB51D.2060401@verisign.com>

On 05/23/14 13:52, Julien Charbon wrote:
>
> Hi,
>
> On 23/05/14 14:06, Julien Charbon wrote:
>> On 27/02/14 11:32, Julien Charbon wrote:
>>> On 07/11/13 14:55, Julien Charbon wrote:
>>>> On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon wrote:
>>>>> I have put technical and how-to-repeat details in the PR below:
>>>>>
>>>>> kern/183659: TCP stack lock contention with short-lived connections
>>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
>>>>>
>>>>> We are currently working on this performance improvement effort; it
>>>>> will impact only the TCP locking strategy, not the TCP stack logic
>>>>> itself. We will share the patches on freebsd-net for review and
>>>>> improvement proposals; in any case, this change will need enough
>>>>> eyeballs to avoid introducing tricky race conditions into the TCP
>>>>> stack.
>
> Attached are the two cumulative patches (tcp-scale-inp-list-v1.patch and
> tcp-scale-pcbinfo-rlock-v1.patch) we discussed the most at BSDCan 2014.
>
> The first one (tcp-scale-inp-list-v1.patch):
>
> [tcp-scaling] Introduce the INP_LIST global mutex for protecting pcbinfo
> global structures
> https://github.com/verisign/freebsd/commit/12c62273f052911aabe6ed283cea76cdd72c9493
>
> This change neither improves nor degrades performance; it simply
> implements what we are trying to achieve: decompose the pcbinfo lock
> (aka ipi_lock or INP_INFO) further.
>
> Ideally, the pcbinfo globally shared structures are protected by leaf
> mutexes (mutexes that are taken last), not by a root mutex (a mutex
> taken first). The current lock ordering is:
>
> ipi_lock
> inpcb lock
> ipi_hash_lock, pcbgroup locks
>
> ipi_lock being a root mutex is explained by its original task:
> protecting the pcbinfo as a whole.
>
> With this change, we add a new ipi_list_lock leaf mutex dedicated to
> protecting the structures previously under the ipi_lock umbrella, i.e.:
>
> - the global inpcb list: ipi_listhead
> - the global inpcb list counter: ipi_count
> - the global inpcb list generation count: ipi_gencnt
>
> and it enables the second (meatier) change
> (tcp-scale-pcbinfo-rlock-v1.patch):
>
> [alpha][tcp-scaling] Use INP_INFO_RLOCK in critical path, and use
> INP_INFO_WLOCK in full INP loops.
> https://github.com/verisign/freebsd/commit/4633ac8c0b8d379fbda5fb9ffc921c2e4786db46
>
> Now that ipi_lock has lost its duty of protecting the pcbinfo globally
> shared structures, its last (clear) duty is to hold off inp
> creation/destruction while a full traversal of the global inp list is
> performed, as these traversals expect the inp list to be stable, e.g.:
>
> tcp_ccalgounload()
> https://github.com/verisign/freebsd/blob/388f0a87958fde5e644e01798f44b58588eb1dc2/sys/netinet/tcp_subr.c#L848
>
> Thus the (performance-wise) critical paths can now take the ipi_lock
> _read_ lock, e.g.:
>
> tcp_input()
> tcp_usr_shutdown()
> tcp_usr_close()
> tcp_twstart()
>
> and, on the other side, functions performing a full inp list traversal
> take the INP_INFO _write_ lock:
>
> tcp_ccalgounload()
> tcp_pcblist()
> in_pcbpurgeif0()
> etc.
>
> This patch doubles the performance improvement with our short-lived TCP
> workload.
>
> _However_ it would be a miracle if this change did not introduce new
> race condition(s) (hence the 'alpha' tag in the commit message), since
> most of the TCP stack's locking strategy now rests on the inpcb lock's
> shoulders. That said, from our tests' point of view, this change is
> completely stable: no kernel/lock assertions, no unexpected TCP
> behavior, stable performance results. Moreover, before tagging this
> change as 'beta' we need to test these features more thoroughly:
>
> - VNET,
> - PCBGROUP/RSS/TCP timer per CPU,
> - TCP offloading (we need a NIC with good TCP offloading support)

I can assess the impact (and fix any fallout) on the parts of the kernel
that deal with TCP_OFFLOAD, and on the TOE driver in dev/cxgbe/tom. But
I was hoping to do that only after there was general agreement on net@
that these locking changes are sound and should be taken into HEAD.
Lack of reviews seems to be what is holding this back, correct?

Regards,
Navdeep

> Early testers, test ideas, reviewers, and memories of previous (and
> undocumented or unclear) ipi_lock duties are more than welcome.
>
> Thanks.
>
> --
> Julien

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"