From owner-freebsd-hackers Sun Jul 15 2:44:11 2001
Date: Sun, 15 Jul 2001 02:43:43 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200107150943.f6F9hhx06763@earth.backplane.com>
To: Matt Dillon
Cc: Leo Bicknell, Drew Eckhardt, hackers@FreeBSD.ORG
Subject: eXperimental bandwidth delay product code (was Re: Network performance tuning.)
References: <200107130128.f6D1SFE59148@earth.backplane.com>
    <200107130217.f6D2HET67695@revolt.poohsticks.org>
    <20010712223042.A77503@ussenterprise.ufp.org>
    <200107131708.f6DH8ve65071@earth.backplane.com>
    <20010713132903.A21847@ussenterprise.ufp.org>
    <200107131847.f6DIlJv67457@earth.backplane.com>

    Ok, here is a patch set that tries to adjust the transmit congestion
    window and socket buffer space according to the bandwidth-delay product
    of the link.  THIS PATCH IS AGAINST STABLE!

    I make calculations based on bandwidth and round-trip time.  I spent a
    lot of time trying to write an algorithm that used just one or the
    other, but it turns out that bandwidth is only a stable metric when you
    are reducing the window, and rtt is only a stable metric when you are
    increasing the window.

    The algorithm is basically: decrease the window until we notice that the
    throughput is going down, then increase the window until we notice the
    RTT is going up (indicating buffering in the network).  However, it took
    quite a few hours for me to find something that worked across a wide
    range of bandwidths and pipe delays.  I had to deal with oscillations at
    high bandwidths, instability in the metrics being used in certain
    situations, and calculation overshoot and undershoot due to averaging.
    The biggest breakthrough occurred when I stopped trying to time the code
    based on each ack coming back and instead timed it based on the
    round-trip-time interval (using the rtt calculation to trigger the
    windowing code).  I used dummynet (aka 'ipfw pipe') as well as my LAN
    and two T1's to test it.

    sysctl's:

        net.inet.tcp.tcp_send_dynamic_enable

            0 - disabled (old behavior) (default)
            1 - enabled, no debugging output
            2 - enabled, debug output to console (only really useful when
                testing one or two connections)

        net.inet.tcp.tcp_send_dynamic_min       minimum buffering (default 4096)

            This parameter specifies the absolute smallest buffer size the
            dynamic windowing code will go down to.  The default is 4096
            bytes.  You may want to set this to 4096 or 8192 to avoid
            degenerate conditions on very high speed networks, or if you
            want to enforce a minimum amount of socket buffering.

    I got some pretty awesome results when I tested it... I was able to
    create a really slow, low-bandwidth dummynet link, start a transfer
    that utilized 100% of the bandwidth, and I could still type in another
    xterm window whose connection went through the same dummynet.  There
    are immediate uses for something like this for people who have modem
    links, not to mention many other reasons.

						-Matt
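[Editorial aside -- not part of Matt's patch.  The asymmetric control loop
described above (shrink until throughput drops, grow until the RTT rises,
one decision per round trip) can be mimicked with a toy user-space model.
Everything below is an assumption for illustration: the link parameters
LINK_BW/BASE_RTT, the sim_* helpers and the 20%/5% thresholds are simply
lifted from the description and the patch comments.  The real state machine
is tcp_ack_dynamic_cwnd() in the tcp_input.c hunks further down.]

    #include <stdio.h>

    #define MSS       1460        /* segment size, bytes */
    #define LINK_BW   200000.0    /* assumed link capacity, bytes/sec */
    #define BASE_RTT  0.100       /* assumed propagation RTT, seconds */

    /* Observed RTT: queueing delay appears once the window exceeds the BDP. */
    static double
    sim_rtt(double cwnd)
    {
        double bdp = LINK_BW * BASE_RTT;

        return (cwnd <= bdp) ? BASE_RTT : BASE_RTT + (cwnd - bdp) / LINK_BW;
    }

    /* Achieved throughput: window-limited below the BDP, link-limited above. */
    static double
    sim_bw(double cwnd)
    {
        double bw = cwnd / BASE_RTT;

        return (bw < LINK_BW) ? bw : LINK_BW;
    }

    int
    main(void)
    {
        double cwnd = 4 * MSS;
        double rtt_floor = sim_rtt(cwnd);   /* best rtt seen so far */
        double bw_peak = 0.0;               /* best throughput seen so far */
        int growing = 1;                    /* 1 = opening the window */
        int round;

        /* One decision per round trip, as in the patch. */
        for (round = 0; round < 200; ++round) {
            double rtt = sim_rtt(cwnd);
            double bw = sim_bw(cwnd);

            if (rtt < rtt_floor)
                rtt_floor = rtt;
            if (bw > bw_peak)
                bw_peak = bw;

            if (growing) {
                /* Grow until the rtt rises ~20% above its floor... */
                if (rtt > rtt_floor + rtt_floor / 5)
                    growing = 0;
                else
                    cwnd += MSS;
            } else {
                /* ...then shrink until ~5% of the bandwidth is lost. */
                if (bw < bw_peak - bw_peak / 20)
                    growing = 1;
                else
                    cwnd -= MSS / 3;
            }
        }
        printf("window hunts around %.0f bytes; BDP is %.0f bytes\n",
            cwnd, LINK_BW * BASE_RTT);
        return (0);
    }

[On a patched kernel the real behavior is switched on with the
net.inet.tcp.tcp_send_dynamic_enable sysctl described above; the sketch only
mimics the decision rule, not the t_txbwcount state encoding the patch uses.]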
Index: kern/uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.68.2.16
diff -u -r1.68.2.16 uipc_socket.c
--- kern/uipc_socket.c	2001/06/14 20:46:06	1.68.2.16
+++ kern/uipc_socket.c	2001/07/13 04:05:38
@@ -519,12 +519,44 @@
 			snderr(so->so_proto->pr_flags & PR_CONNREQUIRED ?
 			    ENOTCONN : EDESTADDRREQ);
 		}
-		space = sbspace(&so->so_snd);
+
+		/*
+		 * Calculate the optimal write-buffer size and then reduce
+		 * by the amount already in use.  Special handling is required
+		 * to ensure that atomic writes still work as expected.
+		 *
+		 * Note: pru_sendpipe() only returns the optimal transmission
+		 * pipe size, which is roughly equivalent to what can be
+		 * transmitted and unacked.  To avoid excessive process
+		 * wakeups we double the returned value for our recommended
+		 * buffer size.
+		 */
+		if (so->so_proto->pr_usrreqs->pru_sendpipe == NULL) {
+			space = sbspace(&so->so_snd);
+		} else {
+			space = (*so->so_proto->pr_usrreqs->pru_sendpipe)(so) * 2;
+			if (atomic && space < resid + clen)
+				space = resid + clen;
+			if (space < so->so_snd.sb_lowat)
+				space = so->so_snd.sb_lowat;
+			if (space > so->so_snd.sb_hiwat)
+				space = so->so_snd.sb_hiwat;
+			space = sbspace_using(&so->so_snd, space);
+		}
+
 		if (flags & MSG_OOB)
 			space += 1024;
+
+		/*
+		 * Error out if the request is impossible to satisfy.
+		 */
 		if ((atomic && resid > so->so_snd.sb_hiwat) ||
 		    clen > so->so_snd.sb_hiwat)
 			snderr(EMSGSIZE);
+
+		/*
+		 * Block if necessary.
+		 */
 		if (space < resid + clen && uio &&
 		    (atomic || space < so->so_snd.sb_lowat || space < clen)) {
 			if (so->so_state & SS_NBIO)
@@ -537,6 +569,7 @@
 			goto restart;
 		}
 		splx(s);
+		mp = &top;
 		space -= clen;
 		do {
Index: kern/uipc_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.54.2.5
diff -u -r1.54.2.5 uipc_usrreq.c
--- kern/uipc_usrreq.c	2001/03/05 13:09:01	1.54.2.5
+++ kern/uipc_usrreq.c	2001/07/13 03:56:02
@@ -427,7 +427,7 @@
 	uipc_connect2, pru_control_notsupp, uipc_detach, uipc_disconnect,
 	uipc_listen, uipc_peeraddr, uipc_rcvd, pru_rcvoob_notsupp,
 	uipc_send, uipc_sense, uipc_shutdown, uipc_sockaddr,
-	sosend, soreceive, sopoll
+	sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 /*
Index: net/raw_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/net/raw_usrreq.c,v
retrieving revision 1.18
diff -u -r1.18 raw_usrreq.c
--- net/raw_usrreq.c	1999/08/28 00:48:28	1.18
+++ net/raw_usrreq.c	2001/07/13 03:56:12
@@ -296,5 +296,5 @@
 	pru_connect2_notsupp, pru_control_notsupp, raw_udetach,
 	raw_udisconnect, pru_listen_notsupp, raw_upeeraddr, pru_rcvd_notsupp,
 	pru_rcvoob_notsupp, raw_usend, pru_sense_null, raw_ushutdown,
-	raw_usockaddr, sosend, soreceive, sopoll
+	raw_usockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
Index: net/rtsock.c
===================================================================
RCS file: /home/ncvs/src/sys/net/rtsock.c,v
retrieving revision 1.44.2.4
diff -u -r1.44.2.4 rtsock.c
--- net/rtsock.c	2001/07/11 09:37:37	1.44.2.4
+++ net/rtsock.c	2001/07/13 03:56:16
@@ -266,7 +266,7 @@
 	pru_connect2_notsupp, pru_control_notsupp, rts_detach, rts_disconnect,
 	pru_listen_notsupp, rts_peeraddr, pru_rcvd_notsupp, pru_rcvoob_notsupp,
 	rts_send, pru_sense_null, rts_shutdown, rts_sockaddr,
-	sosend, soreceive, sopoll
+	sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 /*ARGSUSED*/
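[Editorial aside -- not part of the patch.  The new sosend() logic above can
be paraphrased in a few lines: take the protocol's estimate of the pipe,
double it, clamp it to the socket's low/high water marks, and only then
subtract what is already queued.  The function and parameter names below
(writable_space() and friends) are hypothetical, and the atomic-write special
case is omitted for brevity.]

    #include <stdio.h>

    /* Paraphrase of the clamping done in the sosend() hunk above. */
    static long
    writable_space(long sendpipe_bytes, long lowat, long hiwat,
        long already_buffered, long mbuf_space_left)
    {
        /* Double the pipe so the writer stays a wakeup ahead of the ACKs. */
        long space = sendpipe_bytes * 2;

        /* Clamp to the socket buffer's low and high water marks. */
        if (space < lowat)
            space = lowat;
        if (space > hiwat)
            space = hiwat;

        /* sbspace_using(): the smaller of byte room and mbuf room. */
        space -= already_buffered;
        if (space > mbuf_space_left)
            space = mbuf_space_left;
        return (space);
    }

    int
    main(void)
    {
        /* A 16 KB pipe, 32 KB sb_hiwat, 8 KB already queued. */
        printf("%ld bytes writable\n",
            writable_space(16384, 2048, 32768, 8192, 65536));
        return (0);
    }

[With those numbers the writer is allowed another 24576 bytes, i.e. roughly
two pipes' worth of data, which is the point of doubling the pru_sendpipe()
return value as the comment in the hunk explains.]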
Index: netatalk/ddp_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netatalk/ddp_usrreq.c,v
retrieving revision 1.17
diff -u -r1.17 ddp_usrreq.c
--- netatalk/ddp_usrreq.c	1999/04/27 12:21:14	1.17
+++ netatalk/ddp_usrreq.c	2001/07/13 03:56:25
@@ -581,5 +581,6 @@
 	at_setsockaddr,
 	sosend,
 	soreceive,
-	sopoll
+	sopoll,
+	pru_sendpipe_notsupp
 };
Index: netatm/atm_aal5.c
===================================================================
RCS file: /home/ncvs/src/sys/netatm/atm_aal5.c,v
retrieving revision 1.6
diff -u -r1.6 atm_aal5.c
--- netatm/atm_aal5.c	1999/10/09 23:24:59	1.6
+++ netatm/atm_aal5.c	2001/07/13 03:56:40
@@ -101,7 +101,8 @@
 	atm_aal5_sockaddr,		/* pru_sockaddr */
 	sosend,				/* pru_sosend */
 	soreceive,			/* pru_soreceive */
-	sopoll				/* pru_sopoll */
+	sopoll,				/* pru_sopoll */
+	pru_sendpipe_notsupp		/* pru_sendpipe */
 };
 
 #endif
Index: netatm/atm_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netatm/atm_usrreq.c,v
retrieving revision 1.6
diff -u -r1.6 atm_usrreq.c
--- netatm/atm_usrreq.c	1999/08/28 00:48:39	1.6
+++ netatm/atm_usrreq.c	2001/07/13 03:58:57
@@ -73,6 +73,10 @@
 	pru_sense_null,			/* pru_sense */
 	atm_proto_notsupp1,		/* pru_shutdown */
 	atm_proto_notsupp3,		/* pru_sockaddr */
+	NULL,				/* pru_sosend */
+	NULL,				/* pru_soreceive */
+	NULL,				/* pru_sopoll */
+	pru_sendpipe_notsupp		/* pru_sendpipe */
 };
 
 #endif
Index: netgraph/ng_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/netgraph/ng_socket.c,v
retrieving revision 1.11.2.3
diff -u -r1.11.2.3 ng_socket.c
--- netgraph/ng_socket.c	2001/02/02 11:59:27	1.11.2.3
+++ netgraph/ng_socket.c	2001/07/13 03:59:30
@@ -907,7 +907,8 @@
 	ng_setsockaddr,
 	sosend,
 	soreceive,
-	sopoll
+	sopoll,
+	pru_sendpipe_notsupp
 };
 
 static struct pr_usrreqs ngd_usrreqs = {
@@ -930,7 +931,8 @@
 	ng_setsockaddr,
 	sosend,
 	soreceive,
-	sopoll
+	sopoll,
+	pru_sendpipe_notsupp
 };
 
 /*
Index: netinet/ip_divert.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_divert.c,v
retrieving revision 1.42.2.3
diff -u -r1.42.2.3 ip_divert.c
--- netinet/ip_divert.c	2001/02/27 09:41:15	1.42.2.3
+++ netinet/ip_divert.c	2001/07/13 03:59:47
@@ -540,5 +540,5 @@
 	pru_connect_notsupp, pru_connect2_notsupp, in_control, div_detach,
 	div_disconnect, pru_listen_notsupp, in_setpeeraddr, pru_rcvd_notsupp,
 	pru_rcvoob_notsupp, div_send, pru_sense_null, div_shutdown,
-	in_setsockaddr, sosend, soreceive, sopoll
+	in_setsockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
Index: netinet/raw_ip.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/raw_ip.c,v
retrieving revision 1.64.2.6
diff -u -r1.64.2.6 raw_ip.c
--- netinet/raw_ip.c	2001/07/03 11:01:46	1.64.2.6
+++ netinet/raw_ip.c	2001/07/13 03:59:56
@@ -680,5 +680,5 @@
 	pru_connect2_notsupp, in_control, rip_detach, rip_disconnect,
 	pru_listen_notsupp, in_setpeeraddr, pru_rcvd_notsupp, pru_rcvoob_notsupp,
 	rip_send, pru_sense_null, rip_shutdown,
-	in_setsockaddr, sosend, soreceive, sopoll
+	in_setsockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
Index: netinet/tcp_input.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.107.2.15
diff -u -r1.107.2.15 tcp_input.c
--- netinet/tcp_input.c	2001/07/08 02:21:43	1.107.2.15
+++ netinet/tcp_input.c	2001/07/15 09:23:07
@@ -132,6 +132,14 @@
     &drop_synfin, 0, "Drop TCP packets with SYN+FIN set");
 #endif
 
+int tcp_send_dynamic_enable = 0;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, tcp_send_dynamic_enable, CTLFLAG_RW,
+    &tcp_send_dynamic_enable, 0, "enable dynamic control of sendspace");
+int tcp_send_dynamic_min = 4096;
+SYSCTL_INT(_net_inet_tcp, OID_AUTO, tcp_send_dynamic_min, CTLFLAG_RW,
+    &tcp_send_dynamic_min, 0, "set minimum dynamic buffer space");
+
+
 struct inpcbhead tcb;
 #define	tcb6	tcb  /* for KAME src sync over BSD*'s */
 struct inpcbinfo tcbinfo;
@@ -142,8 +150,9 @@
 		struct tcphdr *, struct mbuf *, int));
 static int	 tcp_reass __P((struct tcpcb *, struct tcphdr *, int *,
 		struct mbuf *));
-static void	 tcp_xmit_timer __P((struct tcpcb *, int));
+static void	 tcp_xmit_timer __P((struct tcpcb *, int, tcp_seq));
 static int	 tcp_newreno __P((struct tcpcb *, struct tcphdr *));
+static void	 tcp_ack_dynamic_cwnd(struct tcpcb *tp, struct socket *so);
 
 /* Neighbor Discovery, Neighbor Unreachability Detection Upper layer hint. */
 #ifdef INET6
@@ -931,12 +940,16 @@
 				tp->snd_nxt = tp->snd_max;
 				tp->t_badrxtwin = 0;
 			}
-			if ((to.to_flag & TOF_TS) != 0)
-				tcp_xmit_timer(tp,
-				    ticks - to.to_tsecr + 1);
-			else if (tp->t_rtttime &&
-			    SEQ_GT(th->th_ack, tp->t_rtseq))
-				tcp_xmit_timer(tp, ticks - tp->t_rtttime);
+			/*
+			 * note: do not include a sequence number
+			 * for anything but t_rtttime timings, see
+			 * tcp_xmit_timer().
+			 */
+			if (tp->t_rtttime &&
+			    SEQ_GT(th->th_ack, tp->t_rtseq))
+				tcp_xmit_timer(tp, tp->t_rtttime, tp->t_rtseq);
+			else if ((to.to_flag & TOF_TS) != 0)
+				tcp_xmit_timer(tp, to.to_tsecr - 1, 0);
 			acked = th->th_ack - tp->snd_una;
 			tcpstat.tcps_rcvackpack++;
 			tcpstat.tcps_rcvackbyte += acked;
@@ -1927,11 +1940,14 @@
 		 * Since we now have an rtt measurement, cancel the
 		 * timer backoff (cf., Phil Karn's retransmit alg.).
 		 * Recompute the initial retransmit timer.
+		 *
+		 * note: do not include a sequence number for anything
+		 * but t_rtttime timings, see tcp_xmit_timer().
 		 */
-		if (to.to_flag & TOF_TS)
-			tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
-		else if (tp->t_rtttime && SEQ_GT(th->th_ack, tp->t_rtseq))
-			tcp_xmit_timer(tp, ticks - tp->t_rtttime);
+		if (tp->t_rtttime && SEQ_GT(th->th_ack, tp->t_rtseq))
+			tcp_xmit_timer(tp, tp->t_rtttime, tp->t_rtseq);
+		else if (to.to_flag & TOF_TS)
+			tcp_xmit_timer(tp, to.to_tsecr - 1, 0);
 
 		/*
 		 * If all outstanding data is acked, stop retransmit
@@ -1955,25 +1971,40 @@
 		/*
 		 * When new data is acked, open the congestion window.
-		 * If the window gives us less than ssthresh packets
-		 * in flight, open exponentially (maxseg per packet).
-		 * Otherwise open linearly: maxseg per window
-		 * (maxseg^2 / cwnd per packet).
-		 */
-		{
-		register u_int cw = tp->snd_cwnd;
-		register u_int incr = tp->t_maxseg;
-
-		if (cw > tp->snd_ssthresh)
-			incr = incr * incr / cw;
-		/*
+		 * We no longer use ssthresh because it just does not work
+		 * right.  Instead we try to avoid packet loss altogether
+		 * by avoiding excessive buffering of packet data in the
+		 * network.
+		 *
 		 * If t_dupacks != 0 here, it indicates that we are still
 		 * in NewReno fast recovery mode, so we leave the congestion
 		 * window alone.
 		 */
-		if (tcp_do_newreno == 0 || tp->t_dupacks == 0)
-			tp->snd_cwnd = min(cw + incr,TCP_MAXWIN<<tp->snd_scale);
+
+		if (tcp_do_newreno == 0 || tp->t_dupacks == 0) {
+			if (tp->t_txbandwidth && tcp_send_dynamic_enable) {
+				tcp_ack_dynamic_cwnd(tp, so);
+			} else {
+				int incr = tp->t_maxseg;
+				if (tp->snd_cwnd > tp->snd_ssthresh)
+					incr = incr * incr / tp->snd_cwnd;
+				tp->snd_cwnd += incr;
+			}
+			/*
+			 * Enforce the minimum and maximum congestion window.
+			 * Remember, this whole section is hit when we get a
+			 * good ack so our window is at least 2 packets.
+			 */
+			if (tp->snd_cwnd > (TCP_MAXWIN << tp->snd_scale))
+				tp->snd_cwnd = TCP_MAXWIN << tp->snd_scale;
+			if (tp->snd_cwnd < tp->t_maxseg * 2)
+				tp->snd_cwnd = tp->t_maxseg * 2;
 		}
+
+		/*
+		 * Clean out buffered transmit data that we no longer need
+		 * to keep around.
+		 */
 		if (acked > so->so_snd.sb_cc) {
 			tp->snd_wnd -= so->so_snd.sb_cc;
 			sbdrop(&so->so_snd, (int)so->so_snd.sb_cc);
@@ -2531,19 +2562,135 @@
 		panic("tcp_pulloutofband");
 }
 
+/*
+ * Dynamically adjust the congestion window.  The sweet spot is slightly
+ * higher than the point where the bandwidth begins to degrade.  Beyond
+ * that and the extra packets wind up being buffered in the network.
+ *
+ * We use an asymmetric algorithm.  We increase the window until we see
+ * a 5% increase in the round-trip-time (SRTT).  We then assume that this
+ * is the saturation point and decrease the window until we see a loss in
+ * bandwidth.
+ *
+ * This routine is master-timed off the round-trip time of the packet,
+ * allowing us to count round trips.  Since bandwidth changes need at
+ * least an rtt cycle to occur, this is much better than counting packets
+ * and should be independent of bandwidth, pipe size, etc...
+ */
+
+#define CWND_COUNT_START		2*1
+#define CWND_COUNT_DECR			2*3
+#define CWND_COUNT_INCR			(CWND_COUNT_DECR + 2*8)
+#define CWND_COUNT_STABILIZED		(CWND_COUNT_INCR + 2*4)
+#define CWND_COUNT_IMPROVING		(CWND_COUNT_STABILIZED + 2*2)
+#define CWND_COUNT_NOT_IMPROVING	(CWND_COUNT_IMPROVING + 2*8)
+
+static void
+tcp_ack_dynamic_cwnd(struct tcpcb *tp, struct socket *so)
+{
+	/*
+	 * Make adjustments only at every complete round trip.
+	 */
+	if ((tp->t_txbwcount & 1) == 0)
+		return;
+	++tp->t_txbwcount;
+	if (tp->t_txbwcount == CWND_COUNT_START) {
+		/*
+		 * Set a rtt performance loss target of 20%
+		 */
+		tp->t_last_txbandwidth = tp->t_srtt + tp->t_srtt / 5;
+	} else if (tp->t_txbwcount >= CWND_COUNT_DECR &&
+	    tp->t_txbwcount < CWND_COUNT_INCR &&
+	    tp->t_srtt < tp->t_last_txbandwidth) {
+		/*
+		 * Increase cwnd in maxseg chunks until we hit our target.
+		 * The target represents the point where packets are starting
+		 * to be buffered significantly in the network.
+		 */
+		tp->snd_cwnd += tp->t_maxseg;
+		tp->t_txbwcount = CWND_COUNT_START;
+
+		/*
+		 * snap target, required to avoid oscillation at high
+		 * bandwidths
+		 */
+		if (tp->t_last_txbandwidth > tp->t_srtt + tp->t_srtt / 5)
+			tp->t_last_txbandwidth = tp->t_srtt + tp->t_srtt / 5;
+		/*
+		 * Switch directions if we hit the top.
+		 */
+		if (tp->snd_cwnd >= so->so_snd.sb_hiwat ||
+		    tp->snd_cwnd >= (TCP_MAXWIN << tp->snd_scale)) {
+			tp->snd_cwnd = min(so->so_snd.sb_hiwat, (TCP_MAXWIN << tp->snd_scale));
+			tp->t_txbwcount = CWND_COUNT_INCR - 2;
+		}
+	} else if (tp->t_txbwcount == CWND_COUNT_INCR) {
+		/*
+		 * We hit 5% performance loss.  Do nothing (wait until
+		 * we stabilize).
+		 */
+	} else if (tp->t_txbwcount == CWND_COUNT_STABILIZED) {
+		/*
+		 * srtt started to go up, we are at the pipe limit and
+		 * must be at the maximum bandwidth.  Reduce the window
+		 * size until we lose 5% of our bandwidth.  Use smaller
+		 * chunks to avoid overshooting.
+		 */
+		tp->t_last_txbandwidth = tp->t_txbandwidth - tp->t_txbandwidth / 20;
+		tp->snd_cwnd -= tp->t_maxseg / 3;
+	} else if (tp->t_txbwcount >= CWND_COUNT_IMPROVING &&
+	    tp->t_txbandwidth > tp->t_last_txbandwidth) {
+		/*
+		 * We saw an improvement, bump the window again, loop this
+		 * state.  If the pipeline isn't full then adding another
+		 * packet should improve bandwidth by t_maxseg.  Use seg / 4
+		 * to deal with any noise.
+		 */
+		tp->snd_cwnd -= tp->t_maxseg / 3;
+
+		/*
+		 * snap target, required to avoid oscillation at high
+		 * bandwidths
+		 */
+		tp->t_txbwcount = CWND_COUNT_STABILIZED;
+		if (tp->t_last_txbandwidth < tp->t_txbandwidth - tp->t_txbandwidth / 20)
+			tp->t_last_txbandwidth = tp->t_txbandwidth - tp->t_txbandwidth / 20;
+		/*
+		 * Switch directions if we hit bottom.
+		 */
+		if (tp->snd_cwnd < tcp_send_dynamic_min ||
+		    tp->snd_cwnd <= tp->t_maxseg * 2) {
+			tp->snd_cwnd = max(tcp_send_dynamic_min, tp->t_maxseg);
+			tp->t_txbwcount = 0;
+		}
+	} else if (tp->t_txbwcount >= CWND_COUNT_NOT_IMPROVING) {
+		/*
+		 * No improvement, start upward again.  Loop to recalculate
+		 * the -5%.  We can recalculate immediately and do not require
+		 * additional stabilization time.
+		 */
+		tp->snd_cwnd += tp->t_maxseg / 2;
+		tp->t_txbwcount = 0;
+	}
+}
+
 /*
- * Collect new round-trip time estimate
- * and update averages and current timeout.
+ * Collect new round-trip time estimate and update averages, current timeout,
+ * and transmit bandwidth.
  */
 static void
-tcp_xmit_timer(tp, rtt)
+tcp_xmit_timer(tp, rtttime, rtseq)
 	register struct tcpcb *tp;
-	int rtt;
+	int rtttime;
+	tcp_seq rtseq;
 {
-	register int delta;
+	int delta;
+	int rtt;
 
 	tcpstat.tcps_rttupdated++;
 	tp->t_rttupdated++;
+
+	rtt = ticks - rtttime;
 	if (tp->t_srtt != 0) {
 		/*
 		 * srtt is stored as fixed point with 5 bits after the
@@ -2582,8 +2729,30 @@
 		tp->t_srtt = rtt << TCP_RTT_SHIFT;
 		tp->t_rttvar = rtt << (TCP_RTTVAR_SHIFT - 1);
 	}
-	tp->t_rtttime = 0;
 	tp->t_rxtshift = 0;
+
+	/*
+	 * Calculate the transmit-side throughput, in bytes/sec.  This is
+	 * used to dynamically size the congestion window to the pipe.  We
+	 * average over 2 packets only.  rtseq is only passed for t_rtttime
+	 * based timings, which in turn only occur on an interval close to
+	 * the round trip time of the packet.  We have to do this in order
+	 * to get accurate bandwidths without having to take a long term
+	 * average, which blows up the dynamic windowing algorithm.
+	 */
+	if (rtseq && rtt) {
+		tp->t_rtttime = 0;
+		if (tp->t_last_rtseq) {
+			int bw;
+
+			bw = (rtseq - tp->t_last_rtseq) * hz / rtt;
+			bw = (tp->t_txbandwidth + bw) / 2;
+			tp->t_txbandwidth = bw;
+			tp->t_txbwcount |= 1;
+		}
+		tp->t_last_rtseq = rtseq;
+		tp->t_last_rtttime = rtttime;
+	}
 
 	/*
 	 * the retransmit should happen at rtt + 4 * rttvar.
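[Editorial aside -- not part of the patch.  The throughput estimate added to
tcp_xmit_timer() above is easy to check by hand; the numbers below are made
up for the example, and hz = 100 is the stock 4.x clock rate.]

    #include <stdio.h>

    int
    main(void)
    {
        int hz = 100;                       /* ticks per second */
        unsigned int last_rtseq = 100000;   /* previously timed sequence */
        unsigned int rtseq = 116000;        /* 16000 bytes acked in one RTT */
        int rtt = 10;                       /* measured rtt, ticks (100 ms) */
        int prev_bw = 140000;               /* previous t_txbandwidth */
        int bw;

        /* bw = (rtseq - tp->t_last_rtseq) * hz / rtt */
        bw = (int)(rtseq - last_rtseq) * hz / rtt;  /* 160000 bytes/sec */

        /* averaged over two samples, exactly as in the patch */
        bw = (prev_bw + bw) / 2;                    /* 150000 bytes/sec */

        printf("estimated transmit bandwidth: %d bytes/sec\n", bw);
        return (0);
    }

[Setting the low bit of t_txbwcount is how tcp_xmit_timer() tells
tcp_ack_dynamic_cwnd() that a fresh per-round-trip sample exists; that
function only acts when the bit is set and then bumps the counter, which is
why the CWND_COUNT_* thresholds are all expressed in units of two.]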
Index: netinet/tcp_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.51.2.7
diff -u -r1.51.2.7 tcp_usrreq.c
--- netinet/tcp_usrreq.c	2001/07/08 02:21:44	1.51.2.7
+++ netinet/tcp_usrreq.c	2001/07/15 05:31:52
@@ -494,6 +494,47 @@
 }
 
 /*
+ * Calculate the optimal transmission pipe size.  This is used to limit the
+ * amount of data we allow to be buffered in order to reduce memory use,
+ * allowing connections to dynamically adjust to the bandwidth product of
+ * their links.
+ *
+ * For tcp we return approximately the congestion window size, which
+ * winds up being the bandwidth delay product in a lossless environment.
+ */
+static int
+tcp_usr_sendpipe(struct socket *so)
+{
+	struct inpcb *inp;
+	int size = so->so_snd.sb_hiwat;
+
+	if (tcp_send_dynamic_enable && (inp = sotoinpcb(so)) != NULL) {
+		struct tcpcb *tp;
+
+		if ((tp = intotcpcb(inp)) != NULL) {
+			size = tp->snd_cwnd;
+			if (size > tp->snd_wnd)
+				size = tp->snd_wnd;
+
+			/*
+			 * debugging & minimum transmit buffer availability
+			 */
+			if (tcp_send_dynamic_enable > 1) {
+				static int last_hz;
+
+				if (last_hz != ticks / hz) {
+					last_hz = ticks / hz;
+					printf("tcp_usr_sendpipe: size=%d bw=%d lbw=%d count=%d srtt=%d\n", size, tp->t_txbandwidth, tp->t_last_txbandwidth, tp->t_txbwcount, tp->t_srtt);
+				}
+			}
+			if (size < tcp_send_dynamic_min)
+				size = tcp_send_dynamic_min;
+		}
+	}
+	return(size);
+}
+
+/*
  * Do a send by putting data in output queue and updating urgent
  * marker if URG set.  Possibly send more data.  Unlike the other
  * pru_*() routines, the mbuf chains are our responsibility.  We
@@ -674,7 +715,7 @@
 	tcp_usr_connect, pru_connect2_notsupp, in_control, tcp_usr_detach,
 	tcp_usr_disconnect, tcp_usr_listen, in_setpeeraddr, tcp_usr_rcvd,
 	tcp_usr_rcvoob, tcp_usr_send, pru_sense_null, tcp_usr_shutdown,
-	in_setsockaddr, sosend, soreceive, sopoll
+	in_setsockaddr, sosend, soreceive, sopoll, tcp_usr_sendpipe
 };
 
 #ifdef INET6
@@ -683,7 +724,7 @@
 	tcp6_usr_connect, pru_connect2_notsupp, in6_control, tcp_usr_detach,
 	tcp_usr_disconnect, tcp6_usr_listen, in6_mapped_peeraddr, tcp_usr_rcvd,
 	tcp_usr_rcvoob, tcp_usr_send, pru_sense_null, tcp_usr_shutdown,
-	in6_mapped_sockaddr, sosend, soreceive, sopoll
+	in6_mapped_sockaddr, sosend, soreceive, sopoll, tcp_usr_sendpipe
 };
 #endif /* INET6 */
 
Index: netinet/tcp_var.h
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_var.h,v
retrieving revision 1.56.2.7
diff -u -r1.56.2.7 tcp_var.h
--- netinet/tcp_var.h	2001/07/08 02:21:44	1.56.2.7
+++ netinet/tcp_var.h	2001/07/15 07:25:48
@@ -95,6 +95,7 @@
 #define	TF_SENDCCNEW	0x08000		/* send CCnew instead of CC in SYN */
 #define	TF_MORETOCOME	0x10000		/* More data to be appended to sock */
 #define	TF_LQ_OVERFLOW	0x20000		/* listen queue overflow */
+#define	TF_BWSCANUP	0x40000
 
 	int	t_force;		/* 1 if forcing out a byte */
 	tcp_seq	snd_una;		/* send unacknowledged */
@@ -128,6 +129,11 @@
 	u_long	t_starttime;		/* time connection was established */
 	int	t_rtttime;		/* round trip time */
 	tcp_seq	t_rtseq;		/* sequence number being timed */
+	int	t_last_rtttime;
+	tcp_seq	t_last_rtseq;		/* last sequence number timed */
+	int	t_txbandwidth;		/* transmit bandwidth/delay */
+	int	t_last_txbandwidth;
+	int	t_txbwcount;
 
 	int	t_rxtcur;		/* current retransmit value (ticks) */
 	u_int	t_maxseg;		/* maximum segment size */
@@ -371,6 +377,8 @@
 extern	int tcp_do_newreno;
 extern	int ss_fltsz;
 extern	int ss_fltsz_local;
+extern	int tcp_send_dynamic_enable;
+extern	int tcp_send_dynamic_min;
 
 void	 tcp_canceltimers __P((struct tcpcb *));
 struct tcpcb *
Index: netinet/udp_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/udp_usrreq.c,v
retrieving revision 1.64.2.11
diff -u -r1.64.2.11 udp_usrreq.c
--- netinet/udp_usrreq.c	2001/07/03 11:01:47	1.64.2.11
+++ netinet/udp_usrreq.c	2001/07/13 04:00:17
@@ -923,6 +923,6 @@
 	pru_connect2_notsupp, in_control, udp_detach, udp_disconnect,
 	pru_listen_notsupp, in_setpeeraddr, pru_rcvd_notsupp, pru_rcvoob_notsupp,
 	udp_send, pru_sense_null, udp_shutdown,
-	in_setsockaddr, sosend, soreceive, sopoll
+	in_setsockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
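[Editorial aside -- not part of the patch.  Since tcp_usr_sendpipe() reports
roughly the congestion window, and the window converges near the
bandwidth-delay product, the buffer sizes the code should settle on are easy
to estimate.  The link speeds and RTTs below are illustrative assumptions,
not measurements from the patch.]

    #include <stdio.h>

    int
    main(void)
    {
        static const struct {
            const char *name;
            double bits_per_sec;
            double rtt_sec;
        } links[] = {
            { "33.6k modem",     33600.0, 0.300 },
            { "T1",            1544000.0, 0.070 },
            { "10 Mb/s LAN",  10000000.0, 0.002 },
        };
        int i;

        for (i = 0; i < 3; i++) {
            double bdp = links[i].bits_per_sec / 8.0 * links[i].rtt_sec;

            printf("%-12s ~%6.0f bytes in flight, ~%6.0f byte send buffer\n",
                links[i].name, bdp, 2.0 * bdp);
        }
        return (0);
    }

[The 4096-byte tcp_send_dynamic_min floor described at the top of the mail is
what keeps the slowest links from being squeezed down to a degenerate buffer
size.]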
Index: netinet6/raw_ip6.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet6/raw_ip6.c,v
retrieving revision 1.7.2.3
diff -u -r1.7.2.3 raw_ip6.c
--- netinet6/raw_ip6.c	2001/07/03 11:01:55	1.7.2.3
+++ netinet6/raw_ip6.c	2001/07/13 04:00:25
@@ -733,5 +733,5 @@
 	pru_connect2_notsupp, in6_control, rip6_detach, rip6_disconnect,
 	pru_listen_notsupp, in6_setpeeraddr, pru_rcvd_notsupp, pru_rcvoob_notsupp,
 	rip6_send, pru_sense_null, rip6_shutdown,
-	in6_setsockaddr, sosend, soreceive, sopoll
+	in6_setsockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
Index: netipx/ipx_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netipx/ipx_usrreq.c,v
retrieving revision 1.26.2.1
diff -u -r1.26.2.1 ipx_usrreq.c
--- netipx/ipx_usrreq.c	2001/02/22 09:44:18	1.26.2.1
+++ netipx/ipx_usrreq.c	2001/07/13 04:00:38
@@ -89,7 +89,7 @@
 	ipx_connect, pru_connect2_notsupp, ipx_control, ipx_detach,
 	ipx_disconnect, pru_listen_notsupp, ipx_peeraddr, pru_rcvd_notsupp,
 	pru_rcvoob_notsupp, ipx_send, pru_sense_null, ipx_shutdown,
-	ipx_sockaddr, sosend, soreceive, sopoll
+	ipx_sockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 struct pr_usrreqs ripx_usrreqs = {
@@ -97,7 +97,7 @@
 	ipx_connect, pru_connect2_notsupp, ipx_control, ipx_detach,
 	ipx_disconnect, pru_listen_notsupp, ipx_peeraddr, pru_rcvd_notsupp,
 	pru_rcvoob_notsupp, ipx_send, pru_sense_null, ipx_shutdown,
-	ipx_sockaddr, sosend, soreceive, sopoll
+	ipx_sockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 /*
Index: netipx/spx_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netipx/spx_usrreq.c,v
retrieving revision 1.27.2.1
diff -u -r1.27.2.1 spx_usrreq.c
--- netipx/spx_usrreq.c	2001/02/22 09:44:18	1.27.2.1
+++ netipx/spx_usrreq.c	2001/07/13 04:00:46
@@ -107,7 +107,7 @@
 	spx_connect, pru_connect2_notsupp, ipx_control, spx_detach,
 	spx_usr_disconnect, spx_listen, ipx_peeraddr, spx_rcvd, spx_rcvoob,
 	spx_send, pru_sense_null, spx_shutdown,
-	ipx_sockaddr, sosend, soreceive, sopoll
+	ipx_sockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 struct pr_usrreqs spx_usrreq_sps = {
@@ -115,7 +115,7 @@
 	spx_connect, pru_connect2_notsupp, ipx_control, spx_detach,
 	spx_usr_disconnect, spx_listen, ipx_peeraddr, spx_rcvd, spx_rcvoob,
 	spx_send, pru_sense_null, spx_shutdown,
-	ipx_sockaddr, sosend, soreceive, sopoll
+	ipx_sockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 void
Index: netkey/keysock.c
===================================================================
RCS file: /home/ncvs/src/sys/netkey/keysock.c,v
retrieving revision 1.1.2.2
diff -u -r1.1.2.2 keysock.c
--- netkey/keysock.c	2001/07/03 11:02:00	1.1.2.2
+++ netkey/keysock.c	2001/07/13 04:00:51
@@ -586,7 +586,7 @@
 	key_disconnect, pru_listen_notsupp, key_peeraddr,
 	pru_rcvd_notsupp, pru_rcvoob_notsupp, key_send,
 	pru_sense_null, key_shutdown,
-	key_sockaddr, sosend, soreceive, sopoll
+	key_sockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 /* sysctl */
Index: netnatm/natm.c
===================================================================
RCS file: /home/ncvs/src/sys/netnatm/natm.c,v
retrieving revision 1.12
diff -u -r1.12 natm.c
--- netnatm/natm.c	2000/02/13 03:32:03	1.12
+++ netnatm/natm.c	2001/07/13 04:01:15
@@ -413,7 +413,7 @@
 	natm_usr_detach, natm_usr_disconnect, pru_listen_notsupp,
 	natm_usr_peeraddr, pru_rcvd_notsupp, pru_rcvoob_notsupp,
 	natm_usr_send, pru_sense_null, natm_usr_shutdown,
-	natm_usr_sockaddr, sosend, soreceive, sopoll
+	natm_usr_sockaddr, sosend, soreceive, sopoll, pru_sendpipe_notsupp
 };
 
 #else /* !FREEBSD_USRREQS */
Index: sys/protosw.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/protosw.h,v
retrieving revision 1.28.2.2
diff -u -r1.28.2.2 protosw.h
--- sys/protosw.h	2001/07/03 11:02:01	1.28.2.2
+++ sys/protosw.h	2001/07/13 04:02:15
@@ -228,6 +228,7 @@
 			     struct mbuf **controlp, int *flagsp));
 	int	(*pru_sopoll) __P((struct socket *so, int events,
 		     struct ucred *cred, struct proc *p));
+	int	(*pru_sendpipe) __P((struct socket *so));
 };
 
 int	pru_accept_notsupp __P((struct socket *so, struct sockaddr **nam));
@@ -240,6 +241,7 @@
 int	pru_rcvd_notsupp __P((struct socket *so, int flags));
 int	pru_rcvoob_notsupp __P((struct socket *so, struct mbuf *m, int flags));
 int	pru_sense_null __P((struct socket *so, struct stat *sb));
+#define pru_sendpipe_notsupp NULL
 
 #endif /* _KERNEL */
Index: sys/socketvar.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/socketvar.h,v
retrieving revision 1.46.2.5
diff -u -r1.46.2.5 socketvar.h
--- sys/socketvar.h	2001/02/26 04:23:21	1.46.2.5
+++ sys/socketvar.h	2001/07/13 03:47:25
@@ -188,9 +188,11 @@
  * still be negative (cc > hiwat or mbcnt > mbmax).  Should detect
  * overflow and return 0.  Should use "lmin" but it doesn't exist now.
  */
-#define	sbspace(sb) \
-    ((long) imin((int)((sb)->sb_hiwat - (sb)->sb_cc), \
+#define	sbspace_using(sb, hiwat) \
+    ((long) imin((int)((hiwat) - (sb)->sb_cc), \
 	 (int)((sb)->sb_mbmax - (sb)->sb_mbcnt)))
+
+#define	sbspace(sb)	sbspace_using(sb, (sb)->sb_hiwat)
 
 /* do we have to send all at once on a socket? */
 #define	sosendallatonce(so) \

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message