From: Archie Cobbs
Message-Id: <200007272138.OAA11016@bubba.whistle.com>
Subject: sub-optimal tcp_output() performance in the face of ENOBUFS
To: freebsd-net@freebsd.org
Date: Thu, 27 Jul 2000 14:38:18 -0700 (PDT)

Dear TCP gurus,

It seems like there is sub-optimal behaviour in tcp_output(), and
I'm wondering what other people think.

Consider an output interface whose transmit queue is full. tcp_output()
calls ip_output(), and it will return ENOBUFS. Here's where this is
handled (tcp_output, line 863):

        error = ip_output(m, tp->t_inpcb->inp_options, &tp->t_inpcb->inp_route,
            (so->so_options & SO_DONTROUTE), 0);
    }
    if (error) {
    out:
        if (error == ENOBUFS) {
            if (!callout_active(tp->tt_rexmt) &&
                !callout_active(tp->tt_persist))
                callout_reset(tp->tt_rexmt, tp->t_rxtcur,
                    tcp_timer_rexmt, tp);
            tcp_quench(tp->t_inpcb, 0);
            return (0);
        }

The problem is that the 'next sequence number to transmit' variable
(tp->snd_nxt) has already been advanced for the data in the dropped
packet, back on line 758:

    tp->snd_nxt += len;

The result is that even though TCP "knows" that the packet was lost,
it fails to retransmit the data in the lost packet with the next packet
output, instead skipping to the next chunk of data when it sends the
next packet.
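The ordering described above can be reproduced in a toy model. This is only a sketch, not kernel code and not a proposed patch: `struct toy_tcpcb`, `toy_ip_output()`, `toy_tcp_output()` and the `rollback` flag are all hypothetical names invented for illustration, and the model ignores the retransmit timer and tcp_quench() entirely.

```c
#include <errno.h>

/* Toy stand-in for the TCP control block; only the field discussed above. */
struct toy_tcpcb {
	unsigned long snd_nxt;	/* next sequence number to transmit */
};

/* Hypothetical stand-in for ip_output(): fails when the queue is "full". */
static int
toy_ip_output(int queue_full)
{
	return (queue_full ? ENOBUFS : 0);
}

/*
 * Mimics the ordering in the excerpt: snd_nxt is advanced before the
 * packet is handed down. If rollback is nonzero, the advance is undone
 * when ENOBUFS comes back (an assumption made here for comparison only).
 * Returns the starting sequence number of the segment it tried to send.
 */
static unsigned long
toy_tcp_output(struct toy_tcpcb *tp, unsigned long len, int queue_full,
    int rollback)
{
	unsigned long seq = tp->snd_nxt;
	int error;

	tp->snd_nxt += len;		/* advanced before the send */
	error = toy_ip_output(queue_full);
	if (error == ENOBUFS && rollback)
		tp->snd_nxt -= len;	/* undo the advance for the dropped data */
	return (seq);
}
```

Starting from snd_nxt = 1000 with len = 512, a dropped send followed by a successful one starts the second segment at 1512 when rollback is 0, so bytes 1000..1511 are skipped until something else triggers a retransmit. With rollback set to 1, the second segment starts at 1000 again.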
Instead, it seems like tp->snd_nxt should be moved back by len bytes
when ENOBUFS is returned -- or any other error, for that matter, if it
represents a dropped packet.

I've witnessed this causing netperf to show a 40% performance slowdown.

This is a problem because some links are very slow, and having a queue
length large enough so that TCP never gets ENOBUFS means having very
high potential latency, e.g., on the order of 10 seconds with a full
output queue. So the choice becomes an unfortunate one between high
bandwidth and low latency -- but not both at the same time.

I haven't stared at the code long enough to come up with a proposed
patch, because there is some interaction with timers, etc. I'd be happy
to test one out if anyone else comes up with one in the meantime.

Thanks,
-Archie

___________________________________________________________________________
Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com