To: freebsd-net@freebsd.org
From: "Julien Charbon"
Subject: Re: TCP stack lock contention with short-lived connections
Date: Thu, 07 Nov 2013 14:55:22 +0100

Hi list,

On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon wrote:
> just a follow-up of vBSDCon discussions about FreeBSD TCP performances
> with short-lived
> connections. In summary:
>
> I have put technical and how-to-repeat details in below PR:
>
> kern/183659: TCP stack lock contention with short-lived connections
> http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
>
> We are currently working on this performance improvement effort; it
> will impact only the TCP locking strategy not the TCP stack logic
> itself. We will share on freebsd-net the patches we made for reviewing
> and improvement propositions; anyway this change might also require
> enough eyeballs to avoid tricky race conditions introduction in TCP
> stack.

Just a follow-up: we are currently removing the TCP INP_INFO lock from
places where it is not actually required, in order to mitigate the lock
contention. It seems to be a good first step in this effort: small
changes, easy to review, low risk (and small gain... right).

Below is a first patch that removes the INP_INFO lock from
tcp_usr_accept(). This change simply follows the advice given in the
corresponding code comment:

"A better fix would prevent the socket from being placed in the listen
queue until all fields are fully initialized."

For more technical details, check the comment in the related change
below:

http://svnweb.freebsd.org/base?view=revision&revision=175612

With this patch applied we see no regressions and a performance
improvement of ~5%, i.e. with a 9.2 vanilla kernel: 52k TCP Queries Per
Second; with 9.2 + the attached patch: 55k TCP QPS. Not huge indeed, but
still an improvement.

P.S.: Funny enough, it seems that the same change has already been
proposed in the past:

http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034261.html

--
Julien

From: Julien Charbon
Subject: [PATCH] Add new socket in listen queue only when fully initialized

---
 sys/netinet/tcp_syncache.c | 4 +++-
 sys/netinet/tcp_usrreq.c   | 9 ---------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/sys/netinet/tcp_syncache.c b/sys/netinet/tcp_syncache.c
index af1651a..eb73356 100644
--- a/sys/netinet/tcp_syncache.c
+++ b/sys/netinet/tcp_syncache.c
@@ -660,7 +660,7 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
 	 * connection when the SYN arrived. If we can't create
 	 * the connection, abort it.
 	 */
-	so = sonewconn(lso, SS_ISCONNECTED);
+	so = sonewconn(lso, 0);
 	if (so == NULL) {
 		/*
 		 * Drop the connection; we will either send a RST or
@@ -890,6 +890,8 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
 	INP_WUNLOCK(inp);
+	soisconnected(so);
+
 	TCPSTAT_INC(tcps_accepts);
 	return (so);
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index b83f34a..566cc34 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -609,13 +609,6 @@ out:
 /*
  * Accept a connection. Essentially all the work is done at higher levels;
  * just return the address of the peer, storing through addr.
- *
- * The rationale for acquiring the tcbinfo lock here is somewhat complicated,
- * and is described in detail in the commit log entry for r175612. Acquiring
- * it delays an accept(2) racing with sonewconn(), which inserts the socket
- * before the inpcb address/port fields are initialized. A better fix would
- * prevent the socket from being placed in the listen queue until all fields
- * are fully initialized.
  */
 static int
 tcp_usr_accept(struct socket *so, struct sockaddr **nam)
@@ -632,7 +625,6 @@ tcp_usr_accept(struct socket *so, struct sockaddr **nam)
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_accept: inp == NULL"));
-	INP_INFO_RLOCK(&V_tcbinfo);
 	INP_WLOCK(inp);
 	if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
 		error = ECONNABORTED;
@@ -652,7 +644,6 @@ tcp_usr_accept(struct socket *so, struct sockaddr **nam)
 out:
 	TCPDEBUG2(PRU_ACCEPT);
 	INP_WUNLOCK(inp);
-	INP_INFO_RUNLOCK(&V_tcbinfo);
 	if (error == 0)
 		*nam = in_sockaddr(port, &addr);
 	return error;
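
For readers who want the ordering argument in isolation, here is a minimal
userland sketch of why moving the publish step after initialization removes
the need for accept to wait on a global lock. Everything in it is a
hypothetical stand-in, not kernel code: struct conn, struct listen_queue,
queue_publish(), queue_accept() and accept_sees_uninitialized() are invented
for illustration, and the "race" is simulated deterministically by running
the accept attempt between the two setup steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a socket plus its inpcb fields. */
struct conn {
	uint32_t faddr;		/* peer address; 0 means "not yet initialized" */
	uint16_t fport;		/* peer port; 0 means "not yet initialized" */
	struct conn *next;
};

/* Hypothetical stand-in for the listen socket's completed-connection queue. */
struct listen_queue {
	struct conn *head;
};

/* Make the connection visible to accept (plays the role of soisconnected()). */
static void
queue_publish(struct listen_queue *q, struct conn *c)
{
	assert(q != NULL && c != NULL);
	c->next = q->head;
	q->head = c;
}

/* Take one connection off the queue, or return NULL if it is empty. */
static struct conn *
queue_accept(struct listen_queue *q)
{
	struct conn *c = q->head;

	if (c != NULL)
		q->head = c->next;
	return (c);
}

/*
 * Run one connection setup and let an accept attempt happen in the window
 * between the two setup steps. Returns 1 if that accept obtained a
 * connection whose peer fields were still zero, 0 otherwise.
 */
int
accept_sees_uninitialized(int init_before_publish)
{
	struct listen_queue q = { NULL };
	struct conn c = { 0, 0, NULL };
	struct conn *got;

	/* First setup step. */
	if (init_before_publish) {
		c.faddr = 0xc0000201;	/* 192.0.2.1, arbitrary example value */
		c.fport = 12345;	/* arbitrary example value */
	} else {
		queue_publish(&q, &c);	/* old ordering: visible too early */
	}

	/* The simulated racing accept. */
	got = queue_accept(&q);
	if (got != NULL && (got->faddr == 0 || got->fport == 0))
		return (1);

	/* Second setup step; the accept above can no longer observe it. */
	if (init_before_publish) {
		queue_publish(&q, &c);
	} else {
		c.faddr = 0xc0000201;
		c.fport = 12345;
	}
	return (0);
}
```

With the old ordering (publish first) the simulated accept gets a connection
with zeroed peer fields, which is the window the removed comment describes;
with the new ordering the queue is still empty at that point, so there is
nothing half-initialized to observe. This sketch deliberately ignores real
concurrency and memory ordering; in the kernel the per-inpcb lock is still
taken in tcp_usr_accept().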