Date: Thu, 20 Jun 2002 01:36:53 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Aram Compeau <aram@tibco.com>
Cc: "David E. Cross" <crossd@cs.rpi.edu>, hackers@FreeBSD.ORG
Subject: Re: projects?
Message-ID: <3D119425.7EF9BE3E@mindspring.com>
References: <200206200209.g5K297R14456@monica.cs.rpi.edu> <3D113D16.6D1A0238@mindspring.com> <3D114FE7.3020304@tibco.com>

Aram Compeau wrote:
> >Too bad he's sick of networking. There's a lot of interesting code
> >that could be implemented in the main line FreeBSD that would be
> >really beneficial, overall.
>
> Could you elaborate briefly on what you'd like to see worked on with
> respect to this? I don't want you to spend a lot of time describing
> anything, but I am curious. I don't generally have large blocks of spare
> time, but could work on something steadily with a low flame.

---
LRP
---

I would like FreeBSD to support LRP (Lazy Receiver Processing), an
idea which came from the Scala Server Project at Rice University.

LRP gets rid of the need to run network processing in kernel threads
in order to get parallel operation on SMP systems; so long as the
interrupt processing load is balanced, it's possible to handle
interrupts in an overlapped fashion.

Right now, there are four sets of source code: SunOS 4.1.3_U1,
FreeBSD 2.2-BETA, FreeBSD 4.0, and FreeBSD 4.3. The first three are
from Rice University. The fourth is from Duke University, and is a
port forward of the 4.0 Rice code.

The Rice code, other than the FreeBSD 2.2-BETA, is unusable. It mixes
in an idea called "Resource Containers" (RESCON) that is really not
very useful (I can go into great detail on this, if necessary). It
also has a restrictive license. The FreeBSD 2.2-BETA implementation
has a CMU MACH style license (same as some FreeBSD code already has).

The LRP implementation in all these cases is flawed, in that it
assumes that the LRP processing will be universal across an entire
address family, and the experimental implementation loads a full copy
of the AF_INET stack under another family name. A real integration is
tricky (it involves credentials on accept calls, an attribute on the
family struct to indicate that it's LRP'ed so that common subsystems
can behave very differently, support for accept filters and other
kevents, etc.).

LRP gives a minimum of a factor of 3 improvement in connections per
second, without the SYN cache code involved at all, through an overall
reduction in processing latency. It also has the effect of preventing
"receiver livelock".

http://www.cs.rice.edu/CS/Systems/LRP/
http://www.cs.duke.edu/~anderson/freebsd/muse-sosp/readme.txt

----------------
TCP Rate Halving
----------------

I would like to see FreeBSD support TCP Rate Halving, an idea from the
Pittsburgh Supercomputing Center (PSC) at Carnegie Mellon University
(CMU). These are the people who invented "traceroute".

TCP Rate Halving is an alternative to the RFC-2581 Fast Recovery
algorithm for congestion control. It effectively causes the
congestion recovery to be self-clocked by ACKs, which has the overall
effect of avoiding the normal burstiness of TCP recovery following
congestion.

This builds on work by Van Jacobson, J. Hoe, and Sally Floyd.

Their current implementation is for NetBSD 1.3.2.

http://www.psc.edu/networking/rate_halving.html

---------------
SACK, FACK, ECN
---------------

Also from PSC at CMU. SACK and FACK are well known. It's annoying
that Luigi Rizzo's code from 1997 or so was never integrated into
FreeBSD. ECN is an implementation of Explicit Congestion Notification.

http://www.psc.edu/networking/tcp.html

----
VRRP
----

There is an implementation of a real VRRP for FreeBSD available; it
is in ports. This is a real VRRP (Virtual Router Redundancy
Protocol), not like the Linux version which uses the multicast mask
and thus loses multicast capability.

There are interesting issues in actual deployment of this code;
specifically, the VMAC that needs to be used in order to logically
separate virtual routers is not really implemented well, so there are
common ARP issues.

There are a couple of projects that one could take on here; by far,
the most interesting (IMO) would be to support multiple virtual
network cards on a single physical network card.
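
For reference, the VMAC mentioned above is fixed by RFC 2338 as
00-00-5E-00-01-{VRID}, so each virtual router on a NIC needs its own
unicast MAC filter entry. A minimal sketch of deriving it follows; the
function name vrrp_vmac and the constant VMAC_LEN are made up for
illustration and are not taken from the ports implementation.

```c
#include <stdint.h>

#define VMAC_LEN 6	/* hypothetical name; length of an Ethernet address */

/*
 * Fill mac[] with the RFC 2338 virtual router MAC address for the
 * given virtual router ID (valid VRIDs are 1 through 255).
 */
static void
vrrp_vmac(uint8_t vrid, uint8_t mac[VMAC_LEN])
{
	mac[0] = 0x00;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = 0x00;
	mac[4] = 0x01;
	mac[5] = vrid;	/* only the last octet differs between virtual routers */
}
```

A card that can filter on N unicast addresses could, in principle,
carry up to N such virtual routers, one address per VRID.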

Most of the Gigabit Ethernet cards, and some of the 10/100Mbit cards,
can support multiple MAC addresses (the Intel Gigabit card can support
16, the Tigon III supports 4, and the Tigon II supports 2). The work
required would be to support the ability to have a single driver,
single NIC, multiple virtual NICs.

There are also interesting issues, like being able to selectively
control ARP response from a VRRP interface which is not the master
interface. This has interesting implications for the routing code,
and for the initialization code, which normally handles the gratuitous
ARP.

More information can be found in the VRRP RFC, RFC-2338.

----------
TCP Timers
----------

I've discussed this before in depth. Basically, the timer code is
very poor for a large number of connections, and increasing the size
of the callout wheel is not a real/reasonable answer.

I would like to see the code go back to the BSD 4.2 model, which is a
well known model. There is plenty of prior art in this area, but the
main thing that needs to be taken from the BSD 4.2 is per interval
timer lists, so that the list scanning, for the most part, scans only
those timers that have expired (+ 1). Basically, a TAILQ per interval
for fixed interval timers.

A very obvious way to measure the performance improvement here is to
establish a very large number of connections. If you have 4G of
memory in an IA32 machine, you should have no problem getting to
300,000 connections. If you really work at it, you can go much
further: I have been able to push this number to 1.6 million
simultaneous connections.

---------------
SMP Safe Queues
---------------

For simple queue types, it should be possible to make queueing and
dequeueing an intrinsically atomic operation.
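
A minimal sketch of what "atomic through ordering" can look like, under
assumptions that are not in the original message: exactly one producer
and one consumer (say, an interrupt handler feeding one protocol
thread), a fixed-size ring, and C11 atomics. The names spsc_queue,
spsc_enqueue, spsc_dequeue and SPSC_SLOTS are hypothetical; this is not
the FreeBSD queue macro code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPSC_SLOTS 8u	/* hypothetical capacity; must be a power of two */

struct spsc_queue {
	void *slot[SPSC_SLOTS];
	atomic_uint head;	/* next slot to read; written only by the consumer */
	atomic_uint tail;	/* next slot to write; written only by the producer */
};

static void
spsc_init(struct spsc_queue *q)
{
	atomic_init(&q->head, 0);
	atomic_init(&q->tail, 0);
}

/* Producer side: returns false if the queue is full. */
static bool
spsc_enqueue(struct spsc_queue *q, void *item)
{
	unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
	unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

	if (tail - head == SPSC_SLOTS)
		return false;
	q->slot[tail % SPSC_SLOTS] = item;
	/* Publish the new tail only after the slot has been filled. */
	atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer side: returns NULL if the queue is empty. */
static void *
spsc_dequeue(struct spsc_queue *q)
{
	unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
	unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
	void *item;

	if (head == tail)
		return NULL;
	item = q->slot[head % SPSC_SLOTS];
	/* Hand the slot back only after it has been read. */
	atomic_store_explicit(&q->head, head + 1, memory_order_release);
	return item;
}
```

No lock is taken on either side; correctness rests on each index having
a single writer and on the release/acquire ordering of the stores and
loads. Multiple producers or consumers would need a different design,
and because NULL signals "empty", NULL items cannot be queued.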

The queue locking that is being added to make the networking code SMP
safe is therefore largely unnecessary: it is needed only because the
queue macros themselves are not made safe through careful ordering of
operations, and so have to be locked around instead.

In theory, this is also possible for a "counted queue". A "counted
queue" is a necessary construct for RED queueing, which needs to
maintain a moving average for comparison to the actual queue depth, so
that it can do RED (Random Early Drop) of packets.

---
WFS
---

Weighted fair share queueing is a method of handling scheduling of
processes, such that the kernel processing.

This isn't technically a networking issue. However, if the programs
in user space which are intended to operate (or, if you are a 5.x
purist, the kernel threads in kernel space, if you pull an Ingo Molnar
and cram everything that shouldn't be in the kernel, into the kernel)
do not remove data from the input processing queue fast enough, you
will still suffer from receiver livelock. Basically, you need to be
able to run the programs with a priority, relative to interrupt
processing.

Some of the work that Jon Lemon and Luigi Rizzo have done in this area
is interesting, but it's not sufficient to resolve the problem (sorry
guys). Unfortunately, they don't tend to run their systems under
breaking point stress, so they don't see the drop-off in performance
that happens at high enough load.

To be able to test this, you would have to have a lab with the ability
to throw a large number of clients against a large number of servers,
with the packets transiting an application in a FreeBSD box, at close
to wire speeds. We are talking at least 32 clients and servers,
unless you have access to purpose built code (it's easier to just
throw the machines at it, and be done with it).

---
---
---

Basically, that's my short list.

There are actually a lot more things that could be done in the
networking area; there are things to do in the routing area, and
things to do with RED queueing, and things to do with resource tuning,
etc., and, of course, there are the bugs that you normally see in the
BSD stack only when you try to do things like open more than 65535
outbound connections from a single box, etc.

Personally, I'm tired of solving the same problems over and over
again, so I'd like to see the code in FreeBSD proper, so that it
becomes part of the intellectual commons.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message