Date:      Thu, 27 Jul 2000 14:38:18 -0700 (PDT)
From:      Archie Cobbs <archie@whistle.com>
To:        freebsd-net@freebsd.org
Subject:   sub-optimal tcp_output() performance in the face of ENOBUFS
Message-ID:  <200007272138.OAA11016@bubba.whistle.com>

Dear TCP gurus,

It seems like there is sub-optimal behaviour in tcp_output(),
and I'm wondering what other people think.

Consider an output interface whose transmit queue is full.
tcp_output() calls ip_output(), which returns ENOBUFS.
Here's where this is handled (tcp_output, line 863):

	error = ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route,
	    (so->so_options & SO_DONTROUTE), 0);
    }
	if (error) {
out:
		if (error == ENOBUFS) {
			if (!callout_active(tp->tt_rexmt) &&
			    !callout_active(tp->tt_persist))
				callout_reset(tp->tt_rexmt, tp->t_rxtcur,
				    tcp_timer_rexmt, tp);
			tcp_quench(tp->t_inpcb, 0);
			return (0);
		}

The problem is that the 'next sequence number to transmit' variable
(tp->snd_nxt) has already been advanced for the data in the dropped
packet, back on line 758:

		tp->snd_nxt += len;

The result is that even though TCP "knows" the packet was lost,
the next packet it outputs does not carry the lost data; it skips
ahead to the following chunk instead. The lost bytes then have to
wait for the retransmit timer, or for the receiver's duplicate ACKs
to trigger a fast retransmit.

Instead, it seems like tp->snd_nxt should be moved back by len
bytes when ip_output() returns ENOBUFS (or any other error, for
that matter, if that error means the packet was dropped).

I've witnessed this causing netperf to show a 40% performance slowdown.

This matters because some links are very slow, and a queue long
enough that TCP never sees ENOBUFS implies very high potential
latency: on the order of 10 seconds when the output queue is full.
So the choice becomes an unfortunate one between high bandwidth and
low latency, but not both at the same time.

I haven't stared at the code long enough to come up with a proposed
patch, because there is some interaction with timers, etc.  I'd be
happy to test one out if anyone else comes up with one in the meantime.

Thanks,
-Archie

___________________________________________________________________________
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com

