Date:      Wed, 10 Nov 1999 16:11:49 -0600 (CST)
From:      Mohit Aron <aron@cs.rice.edu>
To:        freebsd-net@freebsd.org, wollman@freebsd.org, jlemon@freebsd.org, julian@freebsd.org, ee@freebsd.org, bright@wintelcom.net
Subject:   FreeBSD networking problems
Message-ID:  <199911102211.QAA12891@cs.rice.edu>

Hi,
	I've noticed several problems in networking performance in FreeBSD wrt
WAN conditions in the course of my experiments. I mailed them to Alfred
Perlstein who suggested that I post them to this list. I'm listing them below.

Problems with WAN emulation in lab environments:

1) FreeBSD tries to determine the max size of socket buffers from cached
  routing information. This is done even after an application has asked for
  a large socket buffer. The result is that you usually end up with a socket
  buffer size inherited from an earlier TCP connection (say telnet), which is
  usually very small. The code is under '#ifdef RTV_SPIPE' and
  '#ifdef RTV_RPIPE' in sys/netinet/tcp_input.c. For my experiments, I
  usually undefine RTV_SPIPE and RTV_RPIPE in tcp_input.c.
  A more complete discussion is given in a PR that I filed a while back and
  can be viewed from:
      http://www.freebsd.org/cgi/query-pr.cgi?pr=11966

2) TCP Bug - the FreeBSD implementation does not scale the advertised window
   immediately when it discovers that window scaling is being used. The result
   is that irrespective of advertised window, in the first round-trip after
   connection establishment, a FreeBSD TCP sender cannot send more data than
   the unscaled value of advertised window. The fix is the following patch to
   tcp_input.c (taken from FreeBSD-3.3-RELEASE):

--- /sys/netinet/tcp_input.c    Sun Aug 29 11:29:54 1999
+++ tcp_input.c Wed Nov 10 15:39:49 1999
@@ -857,6 +857,9 @@
                                (TF_RCVD_SCALE|TF_REQ_SCALE)) {
                                tp->snd_scale = tp->requested_s_scale;
                                tp->rcv_scale = tp->request_r_scale;
+
+                               tp->snd_wnd <<= tp->snd_scale;
+                               tiwin = tp->snd_wnd;
                        }
                        /* Segment is acceptable, update cache if undefined. */
                        if (taop->tao_ccsent == 0)


   One can argue that this is not important given that TCP does slow-start in
   the first round-trip. Well, people are looking at rate-based pacing where
   you don't have to do slow-start. Also the above is important in LANs where
   FreeBSD doesn't use slow-start. In my case, I'm emulating a WAN to see
   the benefits of rate-based pacing, so this fix is extremely important.
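
To put a number on the effect described in item 2, here is a minimal C sketch
of the window arithmetic. It is an illustration only, not kernel code: the
helper name scaled_window() is made up for this example, and it assumes the
usual 16-bit window field with a shift count of at most 14 as in the TCP
window scale option.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the FreeBSD sources): compute the
 * usable send window from the raw 16-bit window field and the
 * negotiated shift count.
 */
static uint32_t
scaled_window(uint16_t raw_wnd, unsigned shift)
{
	assert(shift <= 14);	/* the window scale option caps the shift at 14 */
	return ((uint32_t)raw_wnd << shift);
}
```

With a raw window of 65535 and a shift of 5, the scaled window is 2097120
bytes, while a sender that has not yet applied the shift is limited to 65535
bytes for that round-trip.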


3) sbappend is unscalable. I posted about this earlier on freebsd-net; the
   message is available from the archive at:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=58270+0+archive/1999/freebsd-net/19991010.freebsd-net
   Here's a suggested fix. Maintain one additional pointer to the last pkt
   in the send socket buffer (Alfred already has a patch for this available
   from http://www.freebsd.org/~alfred/sockbuf-3.3-release.diff). However,
   this is not sufficient because for TCP, the data is maintained in a single
   chain of mbufs headed by a single packet header mbuf. Thus an additional
   pointer in each packet header mbuf is needed. Perhaps the m_pkthdr.rcvif
   field can be used for this purpose - this field is not used for outbound
   packets. One can modify the mbuf data structure to replace this field
   with a union whose other element has a name like m_pkttail. Otherwise, if
   increasing the length of the data structure is not a concern, then perhaps
   a completely new field can be added that would allow mbufs to be
   maintained in a Tailq.
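
The tail-pointer idea in item 3 can be sketched roughly as follows. This is a
simplified stand-in, not the kernel's mbuf or sockbuf code: struct buf,
struct chain and chain_append() are hypothetical names used only to show how
a cached tail pointer makes an append constant-time instead of a walk over
the whole chain.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an mbuf; NOT the kernel structure. */
struct buf {
	struct buf *next;
	int len;		/* bytes of data held by this buffer */
};

/* Simplified stand-in for a chain with a cached tail pointer. */
struct chain {
	struct buf *head;
	struct buf *tail;	/* the additional pointer suggested above */
	int cc;			/* total bytes queued */
};

/* Append b in O(1) by using the tail pointer instead of walking the list. */
static void
chain_append(struct chain *c, struct buf *b)
{
	assert(c != NULL && b != NULL);
	b->next = NULL;
	if (c->tail == NULL)
		c->head = b;		/* first buffer in an empty chain */
	else
		c->tail->next = b;	/* link after the current last buffer */
	c->tail = b;
	c->cc += b->len;
}
```

Without the tail field, each append has to start at head and follow next
pointers to the end, so the cost grows with the amount of data already
queued.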

4) FreeBSD-3.x onwards introduced a limit on the maximum number of sockets (can
   be viewed with 'sysctl kern.ipc.maxsockets' - it's typically less than 5000
   and depends upon MAXUSERS). The reason for this limit was the new zone
   allocator scheme introduced in FreeBSD-3.x. I've shown in a prior paper
   (http://www.cs.rice.edu/~aron/papers/rice-TR99-335.ps.gz) that
   a busy webserver can have up to 50000 open connections, so a limit of just
   5000 sockets is going to give dismal server performance. The big
   number is due to connections in TCP TIME_WAIT state. The paper above also
   proposes an alternate fix where the TIME_WAIT state operates with a minimal
   amount of state.

5) The interface queues need to be increased from the default of 50 packets
   (defined as IFQ_MAXLEN in sys/net/if.h). I normally increase this value to
   1000. A busy webserver can easily overflow the default of 50. 

   It is also important for my lab tests with WAN conditions (although this is
   not a case for increasing it in the general FreeBSD distribution). Consider
   a 100Mbps link with a round-trip delay of 100ms. It can hold up to 833
   packets. In a lab environment, these can be queued up in the driver, hence
   the need for a longer interface queue. Additionally, FreeBSD-3.x introduced a
   change to the fxp driver (in sys/pci/if_fxp.c) where it ignores the
   IFQ_MAXLEN setting for the output driver queue and instead sets it to the
   number of its own transmit buffers (127 by default).  I think this feature
   should be removed - the older FreeBSD-2.2.x used to only put more pkts (>
   127) in the driver once there was room - all others were queued up in the
   interface queue whose length was determined by IFQ_MAXLEN.
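
The 833-packet figure in item 5 follows from the bandwidth-delay product. A
small C sketch of the arithmetic is below. The function names are made up for
this example, and the 1500-byte packet size is an assumption (the figure
above is consistent with it, but the packet size is not stated).

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes for a link speed and round-trip time. */
static uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
	return (bits_per_sec / 8 * rtt_ms / 1000);
}

/*
 * Number of whole packets the link can hold in flight.
 * pkt_bytes is an assumed packet size, e.g. 1500.
 */
static uint64_t
bdp_packets(uint64_t bits_per_sec, uint64_t rtt_ms, uint64_t pkt_bytes)
{
	assert(pkt_bytes > 0);
	return (bdp_bytes(bits_per_sec, rtt_ms) / pkt_bytes);
}
```

For 100Mbps and 100ms this gives 1250000 bytes, or 833 packets of 1500 bytes,
well above both the default IFQ_MAXLEN of 50 and the 127 transmit buffers in
the fxp driver.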

6) The value of SB_MAX (defined in sys/sys/socketvar.h) needs to be
   increased from the default of 256K. In my WAN experiments, the 
   bandwidth-delay product was 1250K - I think SB_MAX should be increased to 
   at least this value because high bandwidths in WANs are just around the 
   corner. Moreover, having this value of SB_MAX doesn't mean that this 
   memory is going to be reserved for each socket - only that applications 
   that need such memory can use it.
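
For reference, this is roughly how an application would ask for a larger send
buffer using the standard setsockopt()/getsockopt() calls with SO_SNDBUF. The
wrapper name request_sndbuf() is hypothetical. The kernel may clamp or reject
a request above its own limit, so the sketch reads back the size actually in
effect rather than assuming the request was honoured.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper: request a send buffer of 'bytes' on socket fd.
 * Returns the size the kernel reports afterwards, or -1 on error
 * (for example if the request exceeds the kernel's limit).
 */
static int
request_sndbuf(int fd, int bytes)
{
	int got = 0;
	socklen_t len = sizeof(got);

	assert(bytes > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return (-1);
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len) < 0)
		return (-1);
	return (got);
}
```

Only sockets that make such a request get the larger buffer, which is the
point made above: raising the ceiling does not reserve the memory for every
socket.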



I earlier posted some additional tuning parameters wrt running webservers on
FreeBSD. These are available from:

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=131178+0+archive/1999/freebsd-net/19990725.freebsd-net





- Mohit

