Date: Thu, 20 Jun 2002 14:11:13 -0700
From: "Aram Compeau" <aram@tibco.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: hackers@FreeBSD.ORG
Subject: Re: projects?
Message-ID: <3D1244F1.8020900@tibco.com>
References: <200206200209.g5K297R14456@monica.cs.rpi.edu> <3D113D16.6D1A0238@mindspring.com> <3D114FE7.3020304@tibco.com> <3D119425.7EF9BE3E@mindspring.com>
Great list, thanks for that. While I think LRP and TCP Rate Halving
are quite interesting, I think tackling the SMP Safe Queues makes the
best use of my resources. I fear that testing some of the other items
requires setups that are not feasible for me.

Cheers,

Aram


Terry Lambert wrote:

>Aram Compeau wrote:
>
>>>Too bad he's sick of networking. There's a lot of interesting code
>>>that could be implemented in mainline FreeBSD that would be
>>>really beneficial, overall.
>>>
>>Could you elaborate briefly on what you'd like to see worked on with
>>respect to this? I don't want you to spend a lot of time describing
>>anything, but I am curious. I don't generally have large blocks of
>>spare time, but could work on something steadily with a low flame.
>>
>
>---
>LRP
>---
>
>I would like FreeBSD to support LRP (Lazy Receiver Processing), an
>idea which came from the ScalaServer project at Rice University.
>
>LRP gets rid of the need to run network processing in kernel threads
>in order to get parallel operation on SMP systems; so long as the
>interrupt processing load is balanced, it's possible to handle
>interrupts in an overlapped fashion.
>
>Right now, there are four sets of source code: SunOS 4.1.3_U1,
>FreeBSD 2.2-BETA, FreeBSD 4.0, and FreeBSD 4.3. The first three are
>from Rice University. The fourth is from Duke University, and is a
>port forward of the 4.0 Rice code.
>
>The Rice code, other than the FreeBSD 2.2-BETA version, is unusable.
>It mixes in an idea called "Resource Containers" (RESCON) that is
>really not very useful (I can go into great detail on this, if
>necessary). It also has a restrictive license. The FreeBSD 2.2-BETA
>implementation has a CMU Mach style license (the same as some FreeBSD
>code already has).
>
>The LRP implementation in all these cases is flawed, in that it
>assumes that the LRP processing will be universal across an entire
>address family, and the experimental implementation loads a full copy
>of the AF_INET stack under another family name. A real integration
>is tricky: it involves credentials on accept calls, an attribute on
>the family struct to indicate that it's LRP'ed (so that common
>subsystems can behave very differently), support for accept filters
>and other kevents, etc.
>
>LRP gives a minimum of a factor of 3 improvement in connections per
>second, without the SYN cache code involved at all, through an
>overall reduction in processing latency. It also has the effect of
>preventing "receiver livelock".
>
>	http://www.cs.rice.edu/CS/Systems/LRP/
>	http://www.cs.duke.edu/~anderson/freebsd/muse-sosp/readme.txt
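(To make sure I follow the LRP idea, here is how I picture the core
mechanism: packets get demultiplexed onto a per-socket queue at
interrupt time, and the IP/TCP processing is deferred until the
receiving process itself asks for the data. This is only a rough
sketch; every type and helper name below is made up, not taken from
the Rice code:)

    /* Rough sketch of LRP-style early demultiplexing; all names
     * here are hypothetical stand-ins, not the Rice code. */
    struct mbuf;                    /* stand-in for the kernel mbuf */
    struct mbuf_queue { struct mbuf *head, *tail; };

    struct lrp_socket {
        struct mbuf_queue rcv_raw;  /* per-socket queue of raw packets */
    };

    /* Hypothetical helpers assumed by the sketch. */
    struct lrp_socket *lrp_demux(struct mbuf *);
    int  mbq_enqueue(struct mbuf_queue *, struct mbuf *);
    struct mbuf *mbq_dequeue(struct mbuf_queue *);
    void m_freem(struct mbuf *);
    void ip_tcp_process(struct lrp_socket *, struct mbuf *);

    /*
     * Interrupt time: a cheap header lookup picks the destination
     * socket and the packet is parked on that socket's own queue.
     * No IP/TCP processing happens here, so interrupt work stays
     * short and bounded.
     */
    void
    lrp_if_input(struct mbuf *m)
    {
        struct lrp_socket *so = lrp_demux(m);
        if (so == NULL || mbq_enqueue(&so->rcv_raw, m) != 0)
            m_freem(m);             /* overload: drop early, cheaply */
    }

    /*
     * Process context, e.g. inside recv(2): the receiver pays for
     * its own protocol processing, so a flooded socket cannot
     * livelock the rest of the system.
     */
    void
    lrp_so_receive(struct lrp_socket *so)
    {
        struct mbuf *m;
        while ((m = mbq_dequeue(&so->rcv_raw)) != NULL)
            ip_tcp_process(so, m);  /* deferred stack work */
    }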
>----------------
>TCP Rate Halving
>----------------
>
>I would like to see FreeBSD support TCP Rate Halving, an idea from
>the Pittsburgh Supercomputing Center (PSC) at Carnegie Mellon
>University (CMU).
>
>These are the people who invented "traceroute".
>
>TCP Rate Halving is an alternative to the RFC-2581 Fast Recovery
>algorithm for congestion control. It effectively causes the
>congestion recovery to be self-clocked by ACKs, which has the
>overall effect of avoiding the normal burstiness of TCP recovery
>following congestion.
>
>This builds on work by Van Jacobson, J. Hoe, and Sally Floyd.
>
>Their current implementation is for NetBSD 1.3.2.
>
>	http://www.psc.edu/networking/rate_halving.html
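(As I read it, the self-clocking amounts to a very simple send rule
during recovery: transmit one new segment for every two ACKs that
arrive, so the window drains to half its old size smoothly instead
of in one burst. A toy sketch with invented names, not the PSC code:)

    /* Toy sketch of the rate-halving send rule; hypothetical names. */
    struct tcpcb;                       /* opaque connection state */
    void tcp_output_segment(struct tcpcb *);    /* hypothetical */

    struct rh_state {
        int acks_seen;          /* ACKs counted since recovery began */
    };

    /*
     * Called for each arriving ACK while in recovery: every second
     * ACK clocks out one new segment, so the flight size is halved
     * gradually, paced by the ACK stream rather than by a timer.
     */
    void
    rh_ack_received(struct rh_state *rh, struct tcpcb *tp)
    {
        if (++rh->acks_seen % 2 == 0)
            tcp_output_segment(tp);
    }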
>---------------
>SACK, FACK, ECN
>---------------
>
>Also from PSC at CMU.
>
>SACK and FACK are well known. It's annoying that Luigi Rizzo's code
>from 1997 or so was never integrated into FreeBSD.
>
>ECN is an implementation of Explicit Congestion Notification.
>
>	http://www.psc.edu/networking/tcp.html
>
>
>----
>VRRP
>----
>
>There is an implementation of a real VRRP for FreeBSD available; it
>is in ports.
>
>This is a real VRRP (Virtual Router Redundancy Protocol), not like
>the Linux version, which uses the multicast mask and thus loses
>multicast capability.
>
>There are interesting issues in the actual deployment of this code;
>specifically, the VMAC that needs to be used in order to logically
>separate virtual routers is not really implemented well, so there
>are common ARP issues.
>
>There are a couple of projects that one could take on here; by far
>the most interesting (IMO) would be to support multiple virtual
>network cards on a single physical network card. Most of the Gigabit
>Ethernet cards, and some of the 10/100Mbit cards, can support
>multiple MAC addresses (the Intel Gigabit card can support 16, the
>Tigon III supports 4, and the Tigon II supports 2).
>
>The work required would be to support the ability to have a single
>driver, single NIC, multiple virtual NICs.
>
>There are also interesting issues, like being able to selectively
>control ARP response from a VRRP interface which is not the master
>interface. This has interesting implications for the routing code,
>and for the initialization code, which normally handles the
>gratuitous ARP. More information can be found in the VRRP RFC,
>RFC-2338.
>
>----------
>TCP Timers
>----------
>
>I've discussed this before in depth. Basically, the timer code is
>very poor for a large number of connections, and increasing the size
>of the callout wheel is not a real/reasonable answer.
>
>I would like to see the code go back to the BSD 4.2 model, which is
>a well known model. There is plenty of prior art in this area, but
>the main thing that needs to be taken from BSD 4.2 is per-interval
>timer lists, so that the list scanning, for the most part, scans
>only those timers that have expired (+ 1). Basically, a TAILQ per
>interval for fixed interval timers.
>
>A very obvious way to measure the performance improvement here is to
>establish a very large number of connections. If you have 4G of
>memory in an IA32 machine, you should have no problem getting to
>300,000 connections. If you really work at it, you can go higher; I
>have been able to push this number to 1.6 million simultaneous
>connections.
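(If I understand the per-interval list idea correctly, it works
because all timers on one list share the same fixed interval, so
appending at the tail keeps the list in expiry order for free, and a
scan only ever touches the expired entries plus one. A sketch using
sys/queue.h; the callback name is invented:)

    #include <sys/queue.h>

    struct tcp_timer {
        TAILQ_ENTRY(tcp_timer) link;
        int expire_tick;            /* absolute expiry time in ticks */
    };

    /* One list per fixed interval (REXMT, KEEP, 2MSL, ...). */
    TAILQ_HEAD(timer_list, tcp_timer);

    void tcp_timer_fire(struct tcp_timer *);    /* hypothetical */

    void
    timer_start(struct timer_list *tl, struct tcp_timer *t,
                int now, int interval)
    {
        /* Same interval for everyone: the tail always holds the
         * latest expiry, so insertion needs no search to stay
         * sorted. */
        t->expire_tick = now + interval;
        TAILQ_INSERT_TAIL(tl, t, link);
    }

    void
    timer_scan(struct timer_list *tl, int now)
    {
        struct tcp_timer *t;

        /* Stop at the first unexpired timer: cost is O(expired + 1),
         * no matter how many connections are open. */
        while ((t = TAILQ_FIRST(tl)) != NULL && t->expire_tick <= now) {
            TAILQ_REMOVE(tl, t, link);
            tcp_timer_fire(t);
        }
    }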
>---------------
>SMP Safe Queues
>---------------
>
>For simple queue types, it should be possible to make enqueueing and
>dequeuing an intrinsically atomic operation.
>
>This basically means that the queue locking being added to make the
>networking code SMP safe is largely unnecessary; it is needed only
>because the queue macros themselves are implemented without careful
>ordering of operations, and so have to be locked around instead.
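(Here is the flavor of what I think I will try first, written with
C11-style atomics for readability rather than the kernel's own
primitives: a single-producer/single-consumer ring in which storing
the element before publishing the index makes enqueue and dequeue
safe with no lock at all. The multi-producer case is where it gets
hard:)

    #include <stdatomic.h>
    #include <stddef.h>

    #define QSIZE 256               /* must be a power of two */

    struct spsc_queue {
        void *slot[QSIZE];
        _Atomic size_t head;        /* consumer index */
        _Atomic size_t tail;        /* producer index */
    };

    /* Producer: write the slot first, then publish it by advancing
     * the tail with release ordering. */
    int
    spsc_enqueue(struct spsc_queue *q, void *p)
    {
        size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        size_t h = atomic_load_explicit(&q->head, memory_order_acquire);

        if (t - h == QSIZE)
            return -1;              /* full */
        q->slot[t % QSIZE] = p;
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return 0;
    }

    /* Consumer: acquire-load the tail so the slot write is visible,
     * then retire the slot by advancing the head. */
    void *
    spsc_dequeue(struct spsc_queue *q)
    {
        size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
        size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
        void *p;

        if (h == t)
            return NULL;            /* empty */
        p = q->slot[h % QSIZE];
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return p;
    }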
>In theory, this is also possible for a "counted queue". A "counted
>queue" is a necessary construct for RED queueing, which needs to
>maintain a moving average for comparison to the actual queue depth,
>so that it can do RED (Random Early Drop) of packets.
A<br>"counted queue" is a necessary construct for RED queueing,<br>which needs to maintain a moving average for comparison to<br>the actual queue depth, so that it can do RED (Random Early<br>Drop) of packets.<br><br>---<br>WFS<br>---<br><br>Weighted fair share queueing is a method of handling scheduling<br>of processes, such that the kernel processing.<br><br>This isn't technically a networking issue. However, if the<br>programs in user space which are intended to operate (or, if<br>you are a 5.x purist, the kernel threads in kernel space, if<br>you pull an Ingo Mollnar an d cram everything that shouldn't<br>be in the kernel, into the kernel) do not remove data from<br>the input processing queue fast enough, you will still suffer<br>from receiver livelock. Basically, you need to be able to<br>run the programs with a priority, relative to interrupt processing.<br><br>Some of the work that Jon Lemon and Luigi Rizzo have done in<br>this area is interesting, but it's not sufficient to resolve<br>the problem (sorry guys). Unfortunately, they don't tend to<br>run their systems under breaking point stress, so they don't<br>see the drop-off in performance that happens at high enough<br>load.To be able to test this, you would have to have a lab<br>with the ability to throw a large number of clients against<br>a large number of servers, with the packets transiting an<br>applicaiton in a FreeBSD box, at close to wire speeds. We<br>are talking at least 32 clients and servers, unless you have<br>access to purpose built code (it's easier to just throw the <br>machines at it, and be done with it).<br><br>---<br>---<br>---<br><br>Basically, that's my short list. There are actually a lot more<br>things that could be done in the networking area; there are things<br>to do in the routing area, and things to do with RED queueing, and<br>things to do with resource tuning, etc., and, of course, there's<br>the bugs that you normally see in the BSD stack only when you try<br>to dothings like open more than 65535 outbound connections from a<br>single box, etc..<br><br>Personally, I'm tired of solving the same problems over and over<br>again, so I'd like to see the code in FreeBSD proper, so that it<br>becomes part of the intellectual commons.<br><br>-- Terry<br><br>To Unsubscribe: send mail to <a class="moz-txt-link-abbreviated" href="mailto:majordomo@FreeBSD.org">majordomo@FreeBSD.org</a><br>with "unsubscribe freebsd-hackers" in the body of the message<br><br><br></pre> </blockquote> <br> </body> </html> --------------000306030500070200080206-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message