From owner-freebsd-hackers Thu Jun 20 14:12:47 2002
Message-ID: <3D1244F1.8020900@tibco.com>
Date: Thu, 20 Jun 2002 14:11:13 -0700
From: "Aram Compeau"
To: Terry Lambert
Cc: hackers@FreeBSD.ORG
Subject: Re: projects?

Great list, thanks for that. While I think LRP and TCP Rate Halving are
quite interesting, I think tackling the SMP Safe Queues makes the best
use of my resources. I fear that testing some of the other items
requires setups that are not feasible for me.

Cheers,

Aram


Terry Lambert wrote:
Aram Compeau wrote:
Too bad he's sick of networking. There is a lot of interesting code
that could be implemented in mainline FreeBSD that would be
really beneficial, overall.
Could you elaborate briefly on what you'd like to see worked on with
respect to this? I don't want you to spend a lot of time describing
anything, but I am curious. I don't generally have large blocks of spare
time, but could work on something steadily with a low flame.

---
LRP
---

I would like FreeBSD to support LRP (Lazy Receiver Processing),
an idea which came from the Scala Server Project at Rice
University.

LRP removes the need to run network processing in kernel
threads in order to get parallel operation on SMP systems;
so long as the interrupt processing load is balanced,
interrupts can be handled in an overlapped fashion.

Right now, there are four sets of source code: SunOS 4.1.3_U1,
FreeBSD 2.2-BETA, FreeBSD 4.0, FreeBSD 4.3. The first three
are from Rice University. The fourth is from Duke University,
and is a port forward of the 4.0 Rice code.

The Rice code, other than the FreeBSD 2.2-BETA, is unusable.
It mixes in an idea called "Resource Containers" (RESCON),
that is really not very useful (I can go into great detail on
this, if necessary). It also has a restrictive license. The
FreeBSD 2.2-BETA implementation has a CMU Mach-style license
(same as some FreeBSD code already has).

The LRP implementation in all these cases is flawed, in that
it assumes that the LRP processing will be universal across
an entire address family, and the experimental implementation
loads a full copy of the AF_INET stack under another family
name. A real integration is tricky: it involves credentials on
accept calls; an attribute on the family struct to indicate
that it's LRP'ed, so that common subsystems can behave very
differently; support for accept filters and other kevents; etc.

LRP gives a minimum of a factor of 3 improvement in connections
per second, without the SYN cache code involved at all, through
an overall reduction in processing latency. It also has the
effect of preventing "receiver livelock".

http://www.cs.rice.edu/CS/Systems/LRP/
http://www.cs.duke.edu/~anderson/freebsd/muse-sosp/readme.txt

----------------
TCP Rate Halving
----------------

I would like to see FreeBSD support TCP Rate Halving, an idea
from the Pittsburgh Supercomputing Center (PSC) at Carnegie
Mellon University (CMU).

These are the people who invented "traceroute".

TCP Rate halving is an alternative to the RFC-2581 Fast Recovery
algorithm for congestion control. It effectively causes the
congestion recovery to be self-clocked by ACKs, which has the
overall effect of avoiding the normal burstiness of TCP recovery
following congestion.

This builds on work by Van Jacobson, J. Hoe, and Sally Floyd.

Their current implementation is for NetBSD 1.3.2.

http://www.psc.edu/networking/rate_halving.html

---------------
SACK, FACK, ECN
---------------

Also from PSC at CMU.

SACK and FACK are well known. It's annoying that Luigi Rizzo's
code from 1997 or so was never integrated into FreeBSD.

ECN is an implementation of Explicit Congestion Notification.

http://www.psc.edu/networking/tcp.html


----
VRRP
----

There is an implementation of a real VRRP for FreeBSD available;
it is in ports.

This is a real VRRP (Virtual Router Redundancy Protocol), not
like the Linux version which uses the multicast mask and thus
loses multicast capability.

There are interesting issues in actual deployment of this code;
specifically, the VMAC that needs to be used in order to
logically separate virtual routers is not really implemented
well, so there are common ARP issues.

There are a couple of projects that one could take on here; by
far, the most interesting (IMO) would be to support multiple
virtual network cards on a single physical network card. Most
of the Gigabit Ethernet cards, and some of the 10/100Mbit cards,
can support multiple MAC addresses (the Intel Gigabit card can
support 16, the Tigon III supports 4, and the Tigon II supports
2).

The work required would be to support the ability to have a
single driver, single NIC, multiple virtual NICs.

There are also interesting issues, like being able to selectively
control ARP response from a VRRP interface which is not the
master interface. This has interesting implications for the
routing code, and for the initialization code, which normally
handles the gratuitous ARP. More information can be found in
the VRRP RFC, RFC-2338.

----------
TCP Timers
----------

I've discussed this before in depth. Basically, the timer code
is very poor for a large number of connections, and increasing
the size of the callout wheel is not a real/reasonable answer.

I would like to see the code go back to the BSD 4.2 model, which
is a well known model. There is plenty of prior art in this area,
but the main thing that needs to be taken from BSD 4.2 is
per-interval timer lists, so that the scan, for the most part,
visits only those timers that have expired (plus one). Basically,
a TAILQ per interval for fixed-interval timers.

A very obvious way to measure the performance improvement here is
to establish a very large number of connections. If you have 4G
of memory in an IA32 machine, you should have no problem getting
to 300,000 connections. If you really work at it, I have been
able to push this number to 1.6 Million simultaneous connections.


---------------
SMP Safe Queues
---------------

For simple queue types, it should be possible to make queueing
and dequeuing an intrinsically atomic operation.

This basically means that the queue locking being added to make
the networking code SMP safe is largely unnecessary; it is needed
only because the queue macros themselves are not implemented with
careful ordering of operations, and so must be locked around
instead.

In theory, this is also possible for a "counted queue". A
"counted queue" is a necessary construct for RED queueing,
which needs to maintain a moving average for comparison to
the actual queue depth, so that it can do RED (Random Early
Drop) of packets.

---
WFS
---

Weighted fair share queueing is a method of scheduling processes
so that each receives a share of processing time proportional to
its assigned weight.

This isn't technically a networking issue. However, if the
programs in user space which are intended to operate (or, if
you are a 5.x purist, the kernel threads in kernel space, if
you pull an Ingo Molnar and cram everything that shouldn't
be in the kernel, into the kernel) do not remove data from
the input processing queue fast enough, you will still suffer
from receiver livelock. Basically, you need to be able to
run the programs with a priority, relative to interrupt processing.

Some of the work that Jon Lemon and Luigi Rizzo have done in
this area is interesting, but it's not sufficient to resolve
the problem (sorry guys). Unfortunately, they don't tend to
run their systems under breaking point stress, so they don't
see the drop-off in performance that happens at high enough
load. To be able to test this, you would have to have a lab
with the ability to throw a large number of clients against
a large number of servers, with the packets transiting an
application in a FreeBSD box, at close to wire speeds. We
are talking at least 32 clients and servers, unless you have
access to purpose built code (it's easier to just throw the
machines at it, and be done with it).

---
---
---

Basically, that's my short list. There are actually a lot more
things that could be done in the networking area; there are things
to do in the routing area, and things to do with RED queueing, and
things to do with resource tuning, etc., and, of course, there's
the bugs that you normally see in the BSD stack only when you try
to do things like open more than 65535 outbound connections from a
single box, etc..

Personally, I'm tired of solving the same problems over and over
again, so I'd like to see the code in FreeBSD proper, so that it
becomes part of the intellectual commons.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


