Date: Fri, 02 Feb 2007 08:12:39 -0600
From: Dave Baukus <david.baukus@us.fujitsu.com>
To: freebsd-net@freebsd.org
Cc: Dave Baukus <david.baukus@us.fujitsu.com>
Subject: Re: ETIMEDOUT bug
Message-ID: <45C346D7.4090305@us.fujitsu.com>
In-Reply-To: <45C2765C.7010708@us.fujitsu.com>
References: <45C2765C.7010708@us.fujitsu.com>
I realized, late last night, that I was wrong on a few details
concerning this bug:

1.) The retransmit timer does not keep popping without being restarted.
2.) ip_output() must return ENOBUFS (TCP_MAXRXTSHIFT + 1) times to the
    same, non-transmitting TCP.
3.) Given a TCP as described below, when tcp_output() uses ENOBUFS to
    blindly start the retransmit timer, tp->t_rxtshift is falsely
    incremented and never cleared.

Thus the bug manifests itself because, for a TCP that never transmits,
nobody ever clears tp->t_rxtshift; this allows tp->t_rxtshift to
slowly count up to TCP_MAXRXTSHIFT; once TCP_MAXRXTSHIFT is exceeded,
tcp_timer_rexmt() will kill the poor innocent TCP.

On 02/01/07 17:23, Dave Baukus wrote:
> There is a bug in tcp_output() for at least FreeBSD 6.1
> that causes a perfectly good TCP to be dropped by its
> retransmit timer; the application receives ETIMEDOUT.
>
> Consider a TCP that never transmits (the receive end of the ttcp
> utility is an example). While the TCP is established,
> snd_max == snd_una == snd_nxt == (isr + 1) and the retransmit
> timer should never be started. If the retransmit timer is started,
> then it is never stopped by tcp_input/tcp_output because
> snd_max == snd_una == snd_nxt (always). Once started, the
> timer continues its count up till tp->t_rxtshift == 12 and
> the connection that never transmitted gets falsely killed.
>
> The bug is to blindly rely on the return value of ip_output().
> If ip_output() returns ENOBUFS then the retransmit timer is
> activated:
>
> From the end of tcp_output():
> out:
>         SOCKBUF_UNLOCK_ASSERT(&so->so_snd);     /* Check gotos. */
>         if (error == ENOBUFS) {
>                 if (!callout_active(tp->tt_rexmt) &&
>                     !callout_active(tp->tt_persist))
>                         callout_reset(tp->tt_rexmt, tp->t_rxtcur,
>                             tcp_timer_rexmt, tp);
>                 tp->snd_cwnd = tp->t_maxseg;
>                 return (0);
>         }
>
> My simple-minded fix would be not to start the retransmit timer;
> if tcp_output() wanted to time this transmit it would have started
> the timer up above.
>
> This ETIMEDOUT problem is easily recreated on any old machine
> using a single slow ethernet device and the ttcp test utility.
> First, fire up a couple of ttcp receivers. Second, flood the same
> interface with enough ttcp transmitters to cause the driver's transmit
> ring and interface queue to back up. Eventually, one of the ttcp
> receivers will get ENOBUFS from ip_output() and the retransmit
> timer will be wrongly activated for a pure ACK segment.
>
> I was able to do it w/ the following on FreeBSD 6.1:
>
> box1:
> ttcp -s -l 16384 -p 9444 -v -b 128000 -r
> ttcp -s -l 16384 -p 9445 -v -b 128000 -r
> ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13
> ttcp -s -n 9999999 -l 333 -p 9447 -v -b 128000 -t 192.168.222.13
> ttcp -s -n 9999999 -l 8192 -p 9448 -v -b 128000 -t 192.168.222.13
> ttcp -s -n 9999999 -l 333 -p 9449 -v -b 128000 -t 192.168.222.13
> ttcp -s -n 9999999 -l 8192 -p 9450 -v -b 128000 -t 192.168.222.13
>
> box2:
> ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t 192.168.222.222
> ttcp -s -n 9999999 -l 128 -p 9445 -v -b 128000 -t 192.168.222.222
> ttcp -s -l 16384 -p 9446 -v -b 128000 -r
> ttcp -s -l 16384 -p 9447 -v -b 128000 -r
> ttcp -s -l 16384 -p 9448 -v -b 128000 -r
> ttcp -s -l 16384 -p 9449 -v -b 128000 -r
> ttcp -s -l 16384 -p 9450 -v -b 128000 -r
>

-- 
Dave Baukus
david.baukus@us.fujitsu.com
972-479-2491
Fujitsu Network Communications
Richardson, Texas USA
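
For illustration only, the counting behaviour described in points 1.) to 3.) can be sketched as a small user-space model. This is not kernel code: struct toy_tcp and the toy_* functions are hypothetical names invented for this sketch, and the model assumes that every ENOBUFS from ip_output() arms the retransmit timer once, that each expiry increments t_rxtshift, and that nothing ever resets the counter because the TCP never sends data. TCP_MAXRXTSHIFT is taken as 12, matching the value quoted in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TCP_MAXRXTSHIFT 12      /* limit quoted in the message above */

/* Hypothetical, heavily reduced stand-in for the kernel's struct tcpcb. */
struct toy_tcp {
        int  t_rxtshift;        /* retransmit backoff counter */
        bool rexmt_armed;       /* models callout_active(tp->tt_rexmt) */
};

/* Models the ENOBUFS branch at the end of tcp_output(). */
static void
toy_output_tail(struct toy_tcp *tp, int error, bool start_on_enobufs)
{
        if (error == ENOBUFS && start_on_enobufs && !tp->rexmt_armed)
                tp->rexmt_armed = true;
}

/*
 * Models one retransmit timer expiry. The timer is not re-armed here,
 * because a TCP with nothing to send never restarts it on its own.
 * Returns ETIMEDOUT when the connection would be dropped, else 0.
 */
static int
toy_timer_rexmt(struct toy_tcp *tp)
{
        if (!tp->rexmt_armed)
                return (0);
        tp->rexmt_armed = false;
        if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
                tp->t_rxtshift = TCP_MAXRXTSHIFT;
                return (ETIMEDOUT);
        }
        return (0);
}

/*
 * Feeds up to max_events ENOBUFS results to a receive-only TCP and
 * returns how many were needed to drop it, or -1 if it survived.
 */
static int
toy_enobufs_until_drop(bool start_on_enobufs, int max_events)
{
        struct toy_tcp tp = { 0, false };
        int i;

        assert(max_events >= 0);
        for (i = 1; i <= max_events; i++) {
                toy_output_tail(&tp, ENOBUFS, start_on_enobufs);
                if (toy_timer_rexmt(&tp) == ETIMEDOUT)
                        return (i);
        }
        return (-1);
}
```

In this model, toy_enobufs_until_drop(true, 100) returns TCP_MAXRXTSHIFT + 1, which agrees with point 2.), while toy_enobufs_until_drop(false, 100) returns -1, which corresponds to the proposed fix of not starting the timer on ENOBUFS.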