Date:      Fri, 8 Aug 2008 22:21:12 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        stable@FreeBSD.org
Subject:   Re: HEADS UP: inpcb/inpcbinfo rwlocking: coming to a 7-STABLE branch near you
Message-ID:  <alpine.BSF.1.10.0808082219360.16028@fledge.watson.org>
In-Reply-To: <alpine.BSF.1.10.0808031142550.65130@fledge.watson.org>
References:  <alpine.BSF.1.10.0808031142550.65130@fledge.watson.org>


On Sun, 3 Aug 2008, Robert Watson wrote:

> This is an advance warning that, late next week, I will be merging a fairly 
> large set of changes to the IPv4 and IPv6 protocols layered over the 
> inpcb/inpcbinfo kernel infrastructure.  To be specific, this affects TCP, 
> UDP, and raw sockets on both IPv4 and IPv6.  I will post a further e-mail 
> announcement along with patch set and schedule in a day or two once it's 
> prepared.

The patch is linked below.  It requires the MFC of rwlock try-locking, which I 
did earlier today:

   http://www.watson.org/~robert/freebsd/netperf/20080808-7stable-rwlock-inpcb.diff

These include the inpcb/inpcbinfo read/write locking changes (although not yet 
for raw/divert sockets).  Any testing, especially with heavy UDP loads, would 
be much appreciated -- these are fairly complex changes, and also quite a 
complex MFC.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> The thrust of this change is to replace the mutexes protecting the inpcb and 
> inpcbinfo data structures with read-write locks (rwlocks).  These structures 
> represent, respectively, particular sockets and the global socket lists for 
> all socket types in IPv4 and IPv6 except for SCTP.  When you run netstat, 
> inpcbinfo is the data structure referencing all connections, and each line in 
> the netstat output reflects the contents of a specific inpcb.
>
> In the current stage of this work, the intent is to improve performance for 
> datagram-related protocols on SMP systems by allowing concurrent acquisition 
> of both global and connection locks during receive and transmit.  This is 
> possible because, in the common case, no connection or global state is 
> modified during UDP/raw receive and transmit at the IP layer, so a read lock 
> is sufficient to prevent data in those structures from unexpectedly changing. 
> For receive, socket layer state is modified, but this is separately protected 
> by socket layer locks.  On transmit, no state is modified at any layer, so in 
> principle we will allow fully parallel transmit from multiple threads down to 
> about the routing and network interface layers, whereas previously they would 
> bottleneck in UDP.
>
> The applications targeted by this change are threaded UDP server 
> applications, such as BIND9, nsd, and UDP-based memcached.  Kris Kennaway and 
> Paul Saab have done fairly extensive testing with the changes and 
> demonstrated significant performance improvements due to reduced contention 
> and overhead.  Perhaps they can mention some of those numbers in a follow-up 
> to this post.
>
> The reason for the heads up is that, while carefully tested, changes of this 
> sort do come with risks.  We've carefully structured them so as to avoid 
> breaking the ABIs for netstat, etc, but it's not impossible that some 
> problems will arise as the changes settle.  The goal, however, is to see 
> these performance improvements in 7.1, and since they've had a bit to shake 
> out in 8.x and seen some heavy use, I think now is the right time to merge 
> them.
>
> In any case, I will send out e-mail in a couple of days with a proposed merge 
> patch and schedule for merging, and perhaps if you are in a position where 
> you might benefit from these improvements, or have interesting UDP or 
> raw-socket based applications running on 7.x, you could test the candidate 
> patch before it's merged, reporting any problems.  Unless I receive negative 
> feedback, I will plan on merging the changes late in the week, and keep a 
> close eye on stable@ for any reports of problems.
>
> Thanks,
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>


