Date: Fri, 8 Aug 2008 22:21:12 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: stable@FreeBSD.org
Subject: Re: HEADS UP: inpcb/inpcbinfo rwlocking: coming to a 7-STABLE branch near you
Message-ID: <alpine.BSF.1.10.0808082219360.16028@fledge.watson.org>
In-Reply-To: <alpine.BSF.1.10.0808031142550.65130@fledge.watson.org>
References: <alpine.BSF.1.10.0808031142550.65130@fledge.watson.org>

On Sun, 3 Aug 2008, Robert Watson wrote:

> This is an advance warning that, late next week, I will be merging a fairly
> large set of changes to the IPv4 and IPv6 protocols layered over the
> inpcb/inpcbinfo kernel infrastructure. To be specific, this affects TCP,
> UDP, and raw sockets on both IPv4 and IPv6. I will post a further e-mail
> announcement along with patch set and schedule in a day or two once it's
> prepared.

The patches, which depend on the MFC of rwlock try-locking that I did earlier
today, are here:

http://www.watson.org/~robert/freebsd/netperf/20080808-7stable-rwlock-inpcb.diff

These include the inpcb/inpcbinfo read/write locking changes (although not yet
for raw/divert sockets). Any testing, especially with heavy UDP loads, would
be much appreciated -- these are fairly complex changes, and also quite a
complex MFC.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> The thrust of this change is to replace the mutexes protecting the inpcb and
> inpcbinfo data structures with read-write locks (rwlocks). These structures
> represent, respectively, particular sockets and the global socket lists for
> all socket types in IPv4 and IPv6 except for SCTP. When you run netstat,
> inpcbinfo is the data structure referencing all connections, and each line in
> the netstat output reflects the contents of a specific inpcb.
>
> In the current stage of this work, the intent is to improve performance for
> datagram-related protocols on SMP systems by allowing concurrent acquisition
> of both global and connection locks during receive and transmit. This is
> possible because, in the common case, no connection or global state is
> modified during UDP/raw receive and transmit at the IP layer, so a read lock
> is sufficient to prevent data in those structures from unexpectedly changing.
> For receive, socket layer state is modified, but this is separately protected
> by socket layer locks. On transmit, no state is modified at any layer, so in
> principle we will allow fully parallel transmit from multiple threads down to
> about the routing and network interface layers, whereas previously they would
> bottleneck in UDP.
>
> The applications targeted by this change are threaded UDP server
> applications, such as BIND9, nsd, and UDP-based memcached. Kris Kennaway and
> Paul Saab have done fairly extensive testing with the changes and
> demonstrated significant performance improvements due to reduced contention
> and overhead. Perhaps they can mention some of those numbers in a follow-up
> to this post.
>
> The reason for the heads up is that, while carefully tested, changes of this
> sort do come with risks. We've carefully structured them so as to avoid
> breaking the ABIs for netstat, etc, but it's not impossible that some
> problems will arise as the changes settle. The goal, however, is to see
> these performance improvements in 7.1, and since they've had a bit of time to
> shake out in 8.x and seen some heavy use, I think now is the right time to
> merge them.
>
> In any case, I will send out e-mail in a couple of days with a proposed merge
> patch and schedule for merging. If you are in a position where you might
> benefit from these improvements, or have interesting UDP or raw-socket based
> applications running on 7.x, you could test the candidate patch before it's
> merged and report any problems. Unless I receive negative feedback, I will
> plan on merging the changes late in the week, and keep a close eye on stable@
> for any reports of problems.
>
> Thanks,
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>