From: Robert Watson
Date: Wed, 14 Mar 2007 13:50:45 +0100 (BST)
To: Keith Arner
Cc: net@freebsd.org
Subject: Re: netisr_direct
Message-ID: <20070314134013.P60010@fledge.watson.org>
In-Reply-To: <8e552a500703140531l39f6fae5o518ee271b879eb1a@mail.gmail.com>

On Wed, 14 Mar 2007, Keith Arner wrote:

> On 3/11/07, Robert Watson wrote:
>>
>> Yes -- right now the in-bound TCP path is essentially serialized because of
>> the tcbinfo lock.
>> The reason for this is that the tcbinfo lock doesn't
>> just protect the inpcb chains during lookup, but also effectively acts as a
>> reference to prevent the inpcb from being freed during input processing.
>> There are several ways we could start to reduce contention on that lock:
>
> So, why is the tcbinfo lock being used to protect the pcb from deletion?  Why
> isn't the INP_LOCK on the pcb used, instead?

The reasoning here is a little complex, and has to do with combining two uses
of the tcbinfo lock.  The tcbinfo lock is before the inpcb lock in the lock
order, as you need to access the tcbinfo lists in order to acquire a reference
to the inpcb.  tcp_input() will always acquire a tcbinfo lock (whether one as
today, or one of several in the future) in order to look up the inpcb.
tcp_input() will then also acquire an inpcb lock to protect individual
connection state.

There are then two cases: simple cases, where we know we don't need to access
the lists again, and complex cases, where we may need to access the lists.  A
typical example of the former is a straight ACK in the fast path, which will
modify per-connection state only.  A typical example of the latter is a RST
where we will tear down the connection, which may remove the inpcb from the
global lists.  In the former case, we do release the tcbinfo lock (in most
cases) once we have decided that we won't need it; in the latter case we hold
it, because re-acquiring the lock would require dropping the inpcb lock for
lock order reasons should the connection close.

This is where moving to a reference count would help us: it would allow us to
release both locks while still holding a valid pointer to the inpcb, so we
could drop the tcbinfo lock early and re-acquire it later, in the correct
order, if we do hit a connection close case.  This could use some refinement,
and there are probably more cases where we could be dropping the tcbinfo lock.
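
To make the reference-count idea concrete, here is a rough userspace sketch
using pthread mutexes.  It is not the kernel code and does not reflect how
tcp_input() is actually written: list_lock stands in for the tcbinfo lock,
struct pcb for the inpcb, and pcb_create(), pcb_release(), input_segment()
and pcb_slot are hypothetical names made up for this illustration.  The
"global list" is a single slot, and the sketch assumes only one pcb ever
exists.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-ins, not FreeBSD code:
 *   list_lock plays the role of the tcbinfo lock,
 *   pcb.lock  plays the role of the per-inpcb lock.
 * Lock order: list_lock before pcb.lock.
 */
struct pcb {
	pthread_mutex_t lock;
	int refcount;		/* protected by lock */
	bool on_list;		/* changed only with both locks held */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pcb *pcb_slot;	/* one-entry "list", protected by list_lock */

/* Create the (single) pcb and insert it; the list holds one reference. */
static struct pcb *
pcb_create(void)
{
	struct pcb *p = malloc(sizeof(*p));

	if (p == NULL)
		return (NULL);
	if (pthread_mutex_init(&p->lock, NULL) != 0) {
		free(p);
		return (NULL);
	}
	p->refcount = 1;
	p->on_list = true;
	pthread_mutex_lock(&list_lock);
	pcb_slot = p;
	pthread_mutex_unlock(&list_lock);
	return (p);
}

/* Drop one reference; called with p->lock held, returns with it released. */
static void
pcb_release(struct pcb *p)
{
	bool last = (--p->refcount == 0);

	pthread_mutex_unlock(&p->lock);
	if (last) {
		pthread_mutex_destroy(&p->lock);
		free(p);
	}
}

/*
 * Process one inbound segment.  Returns -1 if no pcb was found, 0 for the
 * simple (ACK-like) case, 1 if the connection was torn down (RST-like).
 */
static int
input_segment(bool is_rst)
{
	struct pcb *p;

	pthread_mutex_lock(&list_lock);
	p = pcb_slot;				/* lookup */
	if (p == NULL) {
		pthread_mutex_unlock(&list_lock);
		return (-1);
	}
	pthread_mutex_lock(&p->lock);
	p->refcount++;				/* our own reference */
	pthread_mutex_unlock(&list_lock);	/* global lock dropped early */

	/* ... per-connection processing under p->lock only ... */

	if (!is_rst) {
		pcb_release(p);
		return (0);
	}

	/*
	 * Close case: the list lock is needed again.  Lock order forbids
	 * taking it while holding p->lock, so drop p->lock first; the
	 * reference taken above keeps p from being freed in the window.
	 * Real code would also have to revalidate connection state here.
	 */
	pthread_mutex_unlock(&p->lock);
	pthread_mutex_lock(&list_lock);
	pthread_mutex_lock(&p->lock);
	if (p->on_list) {			/* nobody closed it meanwhile */
		assert(pcb_slot == p);
		pcb_slot = NULL;
		p->on_list = false;
		p->refcount--;			/* the list's reference */
	}
	pthread_mutex_unlock(&list_lock);
	pcb_release(p);				/* ours; frees if it was last */
	return (1);
}
```

After pcb_create(), a call to input_segment(false) takes the simple path and
leaves the pcb on the list; input_segment(true) then takes the close path,
removes the pcb and frees it, so a further call finds nothing and returns -1.
The point of the sketch is only the window in the close path where neither
lock is held and the pointer is still safe to use.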

BTW, in 7.x there is significantly less contention on the pcbinfo lock because
it's no longer acquired in any of the common send and receive paths in TCP,
whereas previously it was.  This significantly lowers contention between the
upper/lower halves of the kernel: that is, between a user thread performing
send or receive on a TCP socket and netisr processing.  In 6.x, the pcbinfo
lock is used more extensively in order to prevent the inpcb from being freed.
The change I've made in 7.x is to guarantee that so_pcb will always be valid
for a properly referenced socket, keeping the inpcb around until the socket is
freed in the case of a reset, rather than leaving the socket without the inpcb
(and hence requiring a lock to keep so_pcb valid).

Robert N M Watson
Computer Laboratory
University of Cambridge