From: Robert Watson
Date: Sun, 3 Aug 2008 11:54:42 +0100 (BST)
To: stable@FreeBSD.org
Subject: HEADS UP: inpcb/inpcbinfo rwlocking: coming to a 7-STABLE branch near you

Dear all:

This is an advance warning that, late next week, I will be merging a fairly large set of changes to the IPv4 and IPv6 protocols layered over the inpcb/inpcbinfo kernel infrastructure. To be specific, this affects TCP, UDP, and raw sockets on both IPv4 and IPv6. I will post a further e-mail announcement along with patch set and schedule in a day or two once it's prepared.

The thrust of this change is to replace the mutexes protecting the inpcb and inpcbinfo data structures with read-write locks (rwlocks). These structures represent, respectively, particular sockets and the global socket lists for all socket types in IPv4 and IPv6 except for SCTP.
When you run netstat, inpcbinfo is the data structure referencing all connections, and each line in the netstat output reflects the contents of a specific inpcb.

In the current stage of this work, the intent is to improve performance for datagram-related protocols on SMP systems by allowing concurrent acquisition of both global and connection locks during receive and transmit. This is possible because, in the common case, no connection or global state is modified during UDP/raw receive and transmit at the IP layer, so a read lock is sufficient to prevent data in those structures from unexpectedly changing. For receive, socket layer state is modified, but this is separately protected by socket layer locks. On transmit, no state is modified at any layer, so in principle we will allow fully parallel transmit from multiple threads down to about the routing and network interface layers, whereas previously they would bottleneck in UDP.

The applications targeted by this change are threaded UDP server applications, such as BIND9, nsd, and UDP-based memcached. Kris Kennaway and Paul Saab have done fairly extensive testing with the changes and demonstrated significant performance improvements due to reduced contention and overhead. Perhaps they can mention some of those numbers in a follow-up to this post.

The reason for the heads up is that, while carefully tested, changes of this sort do come with risks. We've carefully structured them so as to avoid breaking the ABIs for netstat, etc., but it's not impossible that some problems will arise as the changes settle. The goal, however, is to see these performance improvements in 7.1, and since they've had a bit to shake out in 8.x and seen some heavy use, I think now is the right time to merge them.
In any case, I will send out e-mail in a couple of days with a proposed merge patch and schedule for merging. If you are in a position where you might benefit from these improvements, or have interesting UDP or raw-socket based applications running on 7.x, you could test the candidate patch before it's merged and report any problems. Unless I receive negative feedback, I will plan on merging the changes late in the week, and keep a close eye on stable@ for any reports of problems.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge