Date: Wed, 10 Nov 1999 16:11:49 -0600 (CST)
From: Mohit Aron <aron@cs.rice.edu>
To: freebsd-net@freebsd.org, wollman@freebsd.org, jlemon@freebsd.org, julian@freebsd.org, ee@freebsd.org, bright@wintelcom.net
Subject: FreeBSD networking problems
Message-ID: <199911102211.QAA12891@cs.rice.edu>
Hi,
In the course of my experiments I've noticed several networking performance
problems in FreeBSD under WAN conditions. I mailed them to Alfred Perlstein,
who suggested that I post them to this list. I'm listing them below.
Problems with WAN emulation in lab environments:
1) FreeBSD tries to determine the maximum size of socket buffers from cached
routing information. This is done even when an application explicitly sets
its socket buffers to a large value. The result is that you usually end up
with the socket buffer size left behind by an earlier TCP connection (say,
telnet), which tends to be very small. The code is related to the 'ifdef
RTV_SPIPE' and 'ifdef RTV_RPIPE' in sys/netinet/tcp_input.c. For my
experiments, I usually undefine RTV_SPIPE and RTV_RPIPE in tcp_input.c.
A more complete discussion is given in a PR that I filed a while back and
can be viewed from:
http://www.freebsd.org/cgi/query-pr.cgi?pr=11966
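To make the effect concrete, here is a small standalone model of the decision
described above. It is only an illustration - the names rmx_sendpipe, sb_hiwat
and SB_MAX mirror the kernel's, but this is not the actual tcp_input.c code:
/*
 * Illustration only, not the actual tcp_input.c code: when a cached
 * route metric (rmx_sendpipe) exists, it wins over the buffer size the
 * application asked for (sb_hiwat), and everything is capped by SB_MAX.
 */
#include <stdio.h>

#define SB_MAX	(256 * 1024)		/* default cap, see item 6 below */

unsigned long
chosen_sendspace(unsigned long rmx_sendpipe, unsigned long sb_hiwat)
{
	unsigned long bufsize;

	bufsize = (rmx_sendpipe != 0) ? rmx_sendpipe : sb_hiwat;
	if (bufsize > SB_MAX)
		bufsize = SB_MAX;
	return (bufsize);
}

int
main(void)
{
	/* The app asked for 1 MB, but a stale telnet connection left a
	 * 16 KB pipe size cached in the route - the small value sticks. */
	printf("effective send buffer: %lu bytes\n",
	    chosen_sendspace(16UL * 1024, 1024UL * 1024));
	return (0);
}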
2) TCP bug - the FreeBSD implementation does not scale the advertised window
immediately when it discovers that window scaling is being used. The result
is that, irrespective of the advertised window, in the first round trip after
connection establishment a FreeBSD TCP sender cannot send more data than
the unscaled value of the advertised window. The fix is the following patch to
tcp_input.c (taken from FreeBSD-3.3-RELEASE):
--- /sys/netinet/tcp_input.c	Sun Aug 29 11:29:54 1999
+++ tcp_input.c	Wed Nov 10 15:39:49 1999
@@ -857,6 +857,9 @@
 		    (TF_RCVD_SCALE|TF_REQ_SCALE)) {
 			tp->snd_scale = tp->requested_s_scale;
 			tp->rcv_scale = tp->request_r_scale;
+
+			tp->snd_wnd <<= tp->snd_scale;
+			tiwin = tp->snd_wnd;
 		}
 		/* Segment is acceptable, update cache if undefined. */
 		if (taop->tao_ccsent == 0)
One can argue that this is not important, given that TCP does slow start in
the first round trip. However, people are looking at rate-based pacing, where
you don't have to do slow start. The above is also important in LANs, where
FreeBSD doesn't use slow start. In my case, I'm emulating a WAN to see
the benefits of rate-based pacing, so this fix is extremely important.
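To put numbers on the first-round-trip limitation: suppose the peer's SYN/ACK
advertises the 16-bit window value 0xffff together with a window-scale shift
of 5 (numbers made up for illustration). Until the shift is applied - which is
what the patch does - the sender believes it may only have 64 KB outstanding:
#include <stdio.h>

int
main(void)
{
	unsigned long tiwin = 0xffff;	/* raw 16-bit window from the header */
	unsigned int snd_scale = 5;	/* negotiated window-scale shift */

	printf("unscaled window: %lu bytes\n", tiwin);		      /* 65535 */
	printf("scaled window:   %lu bytes\n", tiwin << snd_scale);  /* ~2 MB */
	return (0);
}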
3) sbappend is unscalable. I posted about this earlier on freebsd-net; the
message can be obtained from the archive at:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=58270+0+archive/1999/freebsd-net/19991010.freebsd-net
Here's a suggested fix. Maintain one additional pointer to the last packet
in the send socket buffer (Alfred already has a patch for this, available
from http://www.freebsd.org/~alfred/sockbuf-3.3-release.diff). However,
this is not sufficient, because for TCP the data is maintained in a single
chain of mbufs headed by a single packet header mbuf. Thus an additional
pointer in each packet header mbuf is needed. Perhaps the m_pkthdr.rcvif
field can be used for this purpose - it is not used for outbound
packets. One could modify the mbuf data structure to replace this field
with a union whose other element has a name like m_pkttail. Alternatively,
if increasing the length of the data structure is not a concern, a
completely new field could be added, which would also allow mbufs to be
maintained in a tailq.
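To make the complexity argument concrete, here is a standalone sketch (the
names chunk and buf are made-up stand-ins for mbufs and the socket buffer;
this is not Alfred's patch): a cached tail pointer turns each append from a
walk over the entire chain into a constant-time operation.
#include <stddef.h>

struct chunk {				/* stand-in for an mbuf */
	struct chunk	*next;
	size_t		 len;
};

struct buf {				/* stand-in for a socket buffer */
	struct chunk	*head;
	struct chunk	*tail;		/* the extra pointer argued for above */
	size_t		 cc;		/* total bytes buffered */
};

/* What an sbappend-style routine does without a tail pointer: walk the
 * whole chain to find the end, so each append costs O(chain length). */
void
append_walk(struct buf *b, struct chunk *c)
{
	struct chunk **pp;

	c->next = NULL;
	for (pp = &b->head; *pp != NULL; pp = &(*pp)->next)
		;
	*pp = c;
	b->cc += c->len;
}

/* With a cached tail pointer the append is O(1); the price is keeping the
 * pointer consistent whenever data is removed from the front. */
void
append_tail(struct buf *b, struct chunk *c)
{
	c->next = NULL;
	if (b->tail != NULL)
		b->tail->next = c;
	else
		b->head = c;
	b->tail = c;
	b->cc += c->len;
}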
4) FreeBSD-3.x introduced a limit on the maximum number of sockets (it can
be viewed with 'sysctl kern.ipc.maxsockets' - it's typically less than 5000
and depends upon MAXUSERS). The reason for this limit was the new zone
allocator scheme introduced in FreeBSD-3.x. I've shown in a prior paper
(http://www.cs.rice.edu/~aron/papers/rice-TR99-335.ps.gz) that
a busy webserver can have up to 50000 open connections, so a limit of just
5000 sockets will give dismal server performance. The large number
is due to connections in the TCP TIME_WAIT state. The paper above also
proposes an alternative fix in which the TIME_WAIT state operates with a
minimal amount of state.
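As an aside, the limit can also be read programmatically; the snippet below
is just the sysctlbyname(3) equivalent of running 'sysctl kern.ipc.maxsockets':
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	int maxsockets;
	size_t len = sizeof(maxsockets);

	if (sysctlbyname("kern.ipc.maxsockets", &maxsockets, &len,
	    NULL, 0) == -1) {
		perror("sysctlbyname(kern.ipc.maxsockets)");
		return (1);
	}
	printf("kern.ipc.maxsockets = %d\n", maxsockets);
	return (0);
}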
5) The interface queues need to be increased from the default of 50 packets
(defined as IFQ_MAXLEN in sys/net/if.h). I normally increase this value to
1000. A busy webserver can easily overflow the default of 50.
It is also important for my lab tests with WAN conditions (although this by
itself is not an argument for increasing it in the general FreeBSD
distribution). Consider a 100Mbps link with a round-trip delay of 100ms. It
can hold up to 833 full-size packets (see the arithmetic below). In a lab
environment, these can all be queued up in the driver, hence the need for a
longer interface queue. Additionally, FreeBSD-3.x introduced a change to the
fxp driver (in sys/pci/if_fxp.c) where it ignores the IFQ_MAXLEN setting for
the output driver queue and instead sets it to the number of its own transmit
buffers (127 by default). I think this feature should be removed - the older
FreeBSD-2.2.x only handed packets beyond the 127 transmit buffers to the
driver once there was room; all others were queued in the interface queue,
whose length was determined by IFQ_MAXLEN.
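The 833-packet figure comes from a simple bandwidth-delay calculation,
assuming full-size 1500-byte packets:
#include <stdio.h>

int
main(void)
{
	double bandwidth_bps = 100e6;	/* 100 Mbps link */
	double rtt_s = 0.100;		/* 100 ms round-trip delay */
	double pkt_bytes = 1500.0;	/* full-size Ethernet frame */
	double bdp_bytes;

	bdp_bytes = bandwidth_bps / 8.0 * rtt_s;	/* 1.25 MB in flight */
	printf("pipe holds about %.0f packets\n", bdp_bytes / pkt_bytes);
	return (0);
}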
6) The value of SB_MAX (defined in sys/sys/socketvar.h) needs to be
increased from the default of 256K. In my WAN experiments, the
bandwidth-delay product was 1250K - I think SB_MAX should be increased to
at least this value, because high bandwidths in WANs are just around the
corner. Moreover, raising SB_MAX doesn't mean that this memory is going to
be reserved for each socket - only that applications that need that much
buffering can get it.
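One way to observe the ceiling from userland (a sketch; the exact error
returned when the request exceeds the cap may vary) is to request a buffer
the size of the bandwidth-delay product and read back what the kernel
actually granted:
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

int
main(void)
{
	int s, want = 1250 * 1024, got;
	socklen_t len = sizeof(got);

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
		perror("socket");
		return (1);
	}
	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) == -1)
		perror("setsockopt(SO_SNDBUF)");   /* fails if over the cap */
	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &got, &len) == -1) {
		perror("getsockopt(SO_SNDBUF)");
		return (1);
	}
	printf("requested %d bytes, kernel granted %d bytes\n", want, got);
	return (0);
}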
I earlier posted some additional tuning parameters for running webservers on
FreeBSD. They are available from:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=131178+0+archive/1999/freebsd-net/19990725.freebsd-net
- Mohit
