Date: Wed, 14 Mar 2007 13:50:45 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Keith Arner <vornum@gmail.com>
In-Reply-To: <8e552a500703140531l39f6fae5o518ee271b879eb1a@mail.gmail.com>
Message-ID: <20070314134013.P60010@fledge.watson.org>
References: <45C0CA5D.5090903@incunabulum.net> <45E6BEE0.2050307@FreeBSD.org>
	<45E6C22D.7060200@freebsd.org> <45E6D70C.10104@FreeBSD.org> 
	<45EEB086.3050409@FreeBSD.org> <45F03269.7050705@FreeBSD.org> 
	<45F08F1D.5080708@us.fujitsu.com>
	<20070310035135.B30274@fledge.watson.org>
	<8e552a500703102157p1845926au65bb3adaf81c01c0@mail.gmail.com> 
	<20070311073249.R20646@fledge.watson.org>
	<8e552a500703140531l39f6fae5o518ee271b879eb1a@mail.gmail.com>
Cc: net@freebsd.org
Subject: Re: netisr_direct


On Wed, 14 Mar 2007, Keith Arner wrote:

> On 3/11/07, Robert Watson <rwatson@freebsd.org> wrote:
>> 
>> Yes -- right now the in-bound TCP path is essentially serialized because of 
>> the tcbinfo lock.  The reason for this is that the tcbinfo lock doesn't 
>> just protect the inpcb chains during lookup, but also effectively acts as a 
>> reference to prevent the inpcb from being freed during input processing. 
>> There are several ways we could start to reduce contention on that lock:
>
> So, why is the tcbinfo lock being used to protect the pcb from deletion? Why 
> isn't the INP_LOCK on the pcb used, instead?

The reasoning here is a little complex, and has to do with combining two uses 
of the tcbinfo lock.  The tcbinfo lock is before the inpcb lock in the lock 
order, as you need to access the tcbinfo lists in order to acquire a reference 
to the inpcb.  tcp_input() will always acquire a tcbinfo lock (the single 
global lock as today, or one of several in the future) in order to look up the 
inpcb.
tcp_input() will then also acquire an inpcb lock to protect individual 
connection state.
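A minimal user-space sketch of the lookup ordering described above, using 
pthread mutexes in place of kernel locks.  struct pcbinfo, struct pcb and 
pcb_lookup_locked() are hypothetical simplifications, not the kernel's actual 
structures or lookup routines; the point is only that the global lock must be 
held to walk the list, so it necessarily comes first in the lock order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's inpcb and
 * inpcbinfo structures; the real FreeBSD definitions differ. */
struct pcb {
	pthread_mutex_t lock;    /* per-connection lock (cf. INP_LOCK) */
	unsigned short lport;    /* lookup key, simplified to one port */
	struct pcb *next;        /* linkage on the global list */
};

struct pcbinfo {
	pthread_mutex_t lock;    /* global list lock (cf. the tcbinfo lock) */
	struct pcb *head;
};

/*
 * Lock order: pcbinfo->lock, then pcb->lock.  On success the pcb is
 * returned with BOTH locks held; the caller decides later whether the
 * global lock can be released early.  On failure, NULL is returned
 * with no locks held.
 */
static struct pcb *
pcb_lookup_locked(struct pcbinfo *info, unsigned short lport)
{
	struct pcb *p;

	assert(info != NULL);
	pthread_mutex_lock(&info->lock);
	for (p = info->head; p != NULL; p = p->next) {
		if (p->lport == lport) {
			pthread_mutex_lock(&p->lock);  /* second in the order */
			return p;
		}
	}
	pthread_mutex_unlock(&info->lock);
	return NULL;
}
```

In the simple fast-path case, the caller would unlock info->lock right after 
the lookup, once it knows the global lists will not be touched again, and 
continue holding only p->lock.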

There are then two cases: simple cases, where we know we don't need to access 
the lists again, and complex cases, where we may need to access them.  A 
typical example of the former is a straight ACK in the fast path, which 
modifies per-connection state only; a typical example of the latter is a RST, 
where we tear down the connection, which may remove the inpcb from the global 
lists.  In the former case, we do release the tcbinfo lock (in most cases) 
once we have decided that we won't need it; in the latter case we hold it 
because re-acquiring the lock would require dropping the inpcb lock for lock 
order reasons should the connection close.  This is where moving to a 
reference count would help us: it would allow releasing both locks while 
maintaining a valid pointer to the inpcb, in turn letting us drop the tcbinfo 
lock and then re-acquire it later if we do hit a connection close case.  This 
could use some refinement, and there are probably more cases in which we could 
drop the tcbinfo lock early.
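The reference-count idea can be sketched the same way, again with hypothetical 
user-space types rather than kernel code.  A close path entered with only the 
per-connection lock held takes a reference, drops that lock, and re-acquires 
both locks in the correct order; the reference is what keeps the pointer valid 
during the window in which no lock is held.  pcb_alloc(), pcb_ref(), 
pcb_rele() and pcb_close() are illustrative names, not FreeBSD functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the FreeBSD structures. */
struct pcb {
	pthread_mutex_t lock;    /* per-connection lock */
	int refcount;            /* protected by lock */
	bool dropped;            /* removed from the global list */
	struct pcb *next;
};

struct pcbinfo {
	pthread_mutex_t lock;    /* global list lock, ordered before pcb->lock */
	struct pcb *head;
};

/* Allocate a pcb and link it; the list itself owns the first reference. */
static struct pcb *
pcb_alloc(struct pcbinfo *info)
{
	struct pcb *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return NULL;
	pthread_mutex_init(&p->lock, NULL);
	p->refcount = 1;
	pthread_mutex_lock(&info->lock);
	p->next = info->head;
	info->head = p;
	pthread_mutex_unlock(&info->lock);
	return p;
}

/* Caller holds p->lock. */
static void
pcb_ref(struct pcb *p)
{
	p->refcount++;
}

/* Caller holds p->lock.  Returns true if the last reference was dropped,
 * in which case p has been unlocked and freed. */
static bool
pcb_rele(struct pcb *p)
{
	assert(p->refcount > 0);
	if (--p->refcount > 0)
		return false;
	pthread_mutex_unlock(&p->lock);
	pthread_mutex_destroy(&p->lock);
	free(p);
	return true;
}

/*
 * Close path entered holding only p->lock (the global lock was released
 * early).  Taking info->lock now would violate the lock order, so take a
 * reference, drop p->lock, and re-acquire both locks in order.  On return
 * p->lock is released and p may have been freed.
 */
static void
pcb_close(struct pcbinfo *info, struct pcb *p)
{
	struct pcb **pp;

	pcb_ref(p);
	pthread_mutex_unlock(&p->lock);

	pthread_mutex_lock(&info->lock);
	pthread_mutex_lock(&p->lock);
	if (!p->dropped) {       /* another thread may have closed it meanwhile */
		for (pp = &info->head; *pp != NULL; pp = &(*pp)->next) {
			if (*pp == p) {
				*pp = p->next;
				break;
			}
		}
		p->dropped = true;
		(void)pcb_rele(p);   /* the list's reference; ours remains */
	}
	pthread_mutex_unlock(&info->lock);

	if (!pcb_rele(p))            /* our temporary reference */
		pthread_mutex_unlock(&p->lock);
}
```

The design choice is that the global lock is no longer held "just in case" for 
the whole of input processing: it is dropped early and re-acquired only when a 
close actually happens, at the cost of a brief unlock/relock of the pcb.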

BTW, in 7.x there is significantly less contention on the pcbinfo lock because 
it's no longer acquired in any of the common send and receive paths in TCP, 
whereas previously it was.  This significantly lowers contention between the 
upper/lower halves of the kernel: that is, between a user thread performing 
send or receive on a TCP socket and netisr processing.  In 6.x, the pcbinfo 
lock is used more extensively in order to prevent the inpcb from being freed. 
The change I've made in 7.x is to guarantee that so_pcb will always be valid 
for a properly referenced socket, keeping the inpcb around until the socket is 
freed in the case of a reset, rather than leaving the socket without the inpcb 
(and hence requiring a lock to keep so_pcb valid).
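A similarly hypothetical sketch of the 7.x invariant: a reset only marks the 
pcb as dropped, so a send on a referenced socket can dereference so_pcb under 
the per-connection lock alone, and the pcb is freed together with the socket.  
struct sock stands in for struct socket, and the helper names and the 
ECONNRESET return are illustrative assumptions, not the actual TCP code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; the names do not match the kernel's. */
struct pcb {
	pthread_mutex_t lock;    /* per-connection lock */
	bool dropped;            /* connection was reset */
};

struct sock {                    /* stand-in for struct socket */
	struct pcb *so_pcb;      /* valid for the socket's whole lifetime */
};

static struct sock *
socket_create(void)
{
	struct sock *so = calloc(1, sizeof(*so));

	if (so == NULL)
		return NULL;
	so->so_pcb = calloc(1, sizeof(*so->so_pcb));
	if (so->so_pcb == NULL) {
		free(so);
		return NULL;
	}
	pthread_mutex_init(&so->so_pcb->lock, NULL);
	return so;
}

/* Input path saw a RST: mark the pcb dropped but do not free it, so
 * so_pcb remains a valid pointer for the socket layer. */
static void
pcb_reset(struct pcb *p)
{
	pthread_mutex_lock(&p->lock);
	p->dropped = true;
	pthread_mutex_unlock(&p->lock);
}

/* Send path: only the per-connection lock is taken; the caller's
 * reference on the socket guarantees so_pcb is valid. */
static int
socket_send(struct sock *so)
{
	struct pcb *p = so->so_pcb;
	int error = 0;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	if (p->dropped)
		error = ECONNRESET;
	/* otherwise: append data, update sequence state, ... */
	pthread_mutex_unlock(&p->lock);
	return error;
}

/* Last reference to the socket is gone: free the pcb along with it. */
static void
socket_free(struct sock *so)
{
	pthread_mutex_destroy(&so->so_pcb->lock);
	free(so->so_pcb);
	free(so);
}
```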

Robert N M Watson
Computer Laboratory
University of Cambridge