Date: Wed, 23 Apr 2008 01:47:13 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Mark Hills <mark@pogo.org.uk>
Cc: Peter Jeremy <peterjeremy@optushome.com.au>, freebsd-net@freebsd.org
Subject: Re: read() returns ETIMEDOUT on steady TCP connection
Message-ID: <480E7901.5000804@freebsd.org>
In-Reply-To: <480C9AC6.8090802@freebsd.org>
References: <alpine.BSO.1.10.0804191437400.21362@zrgural.vwaro.pbz>
    <20080420025010.GJ73016@server.vk2pj.dyndns.org>
    <alpine.BSO.1.10.0804201238480.31900@zrgural.vwaro.pbz>
    <480BBD7E.8010700@freebsd.org>
    <alpine.BSO.1.10.0804210740100.1745@zrgural.vwaro.pbz>
    <480C9AC6.8090802@freebsd.org>

Andre Oppermann wrote:
> Mark Hills wrote:
>> On Mon, 21 Apr 2008, Andre Oppermann wrote:
>>
>>> Mark Hills wrote:
>>>> On Sun, 20 Apr 2008, Peter Jeremy wrote:
>>
>>>>> I can't explain the problem but it definitely looks like a resource
>>>>> starvation issue within the kernel.
>>>>
>>>> I've traced the source of the ETIMEDOUT within the kernel to
>>>> tcp_timer_rexmt() in tcp_timer.c:
>>>>
>>>>     if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
>>>>         tp->t_rxtshift = TCP_MAXRXTSHIFT;
>>>>         tcpstat.tcps_timeoutdrop++;
>>>>         tp = tcp_drop(tp, tp->t_softerror ?
>>>>             tp->t_softerror : ETIMEDOUT);
>>>>         goto out;
>>>>     }
>>>
>>> Yes, this is related to either a lack of mbufs to create a segment
>>> or a problem in sending it. That may be a full interface queue, a
>>> bandwidth manager (dummynet) or some firewall internally rejecting
>>> the segment (ipfw, pf). Do you run any firewall in stateful mode?
>>
>> There's no firewall running.
>>
>>>> I'm new to FreeBSD, but it seems to imply that it's reaching a
>>>> limit on the number of retransmits of ACKs on the TCP
>>>> connection receiving the inbound data? But I checked this using
>>>> tcpdump on the server and could see no retransmissions.
>>>
>>> When you have internal problems the segment never makes it to the
>>> wire and thus you won't see it in tcpdump.
>>>
>>> Please report the output of 'netstat -s -p tcp' and 'netstat -m'.
>>
>> Posted below. You can see it in there: "131 connections dropped by
>> rexmit timeout"
>>
>>>> As a test, I ran a simulation with the necessary changes to increase
>>>> TCP_MAXRXTSHIFT (including adding appropriate entries to
>>>> tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to
>>>> reduce the frequency of the problem occurring, but not to a usable
>>>> level.
>>>
>>> Possible causes are timers that fire too early. Resource starvation
>>> (you are doing a lot of traffic). Or of course some bug in the code.
>>
>> As I said in my original email, the data transfer doesn't stop or
>> splutter, it's simply cut mid-flow. Sounds like something happening
>> prematurely.
>>
>> Thanks for the help,
>
> The output doesn't show any obvious problems. I have to write some
> debug code to run on your system. I'll do that later today if time
> permits. Otherwise tomorrow.

http://people.freebsd.org/~andre/tcp_output-error-log.diff

Please apply this patch and enable the sysctl net.inet.tcp.log_debug=1
and report any output. You likely get some (normal) noise from syncache.
What we are looking for is reports from tcp_output.

-- 
Andre
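
The drop condition quoted from tcp_timer_rexmt() can be sketched in userland C to show how many consecutive retransmit-timer expirations it takes before the connection is dropped, and which error read() would then see. This is only an illustration, not the kernel code: struct conn, rexmt_timeout() and expirations_until_drop() are hypothetical names, and MAXRXTSHIFT is assumed to mirror the stock TCP_MAXRXTSHIFT value of 12.

```c
#include <errno.h>

#define MAXRXTSHIFT 12  /* assumption: mirrors TCP_MAXRXTSHIFT */

/* Hypothetical stand-in for the two tcpcb fields used by the check. */
struct conn {
    int rxtshift;   /* consecutive retransmit timeouts so far */
    int softerror;  /* last non-fatal error recorded, 0 if none */
};

/*
 * One retransmit-timer expiration. Returns 0 if the connection would
 * retransmit again, otherwise the errno the drop would report.
 */
int rexmt_timeout(struct conn *c)
{
    if (++c->rxtshift > MAXRXTSHIFT) {
        c->rxtshift = MAXRXTSHIFT;
        /* A recorded soft error takes precedence over ETIMEDOUT. */
        return c->softerror ? c->softerror : ETIMEDOUT;
    }
    return 0;
}

/*
 * Fire the timer repeatedly, as if nothing were ever acknowledged,
 * and count the expirations until the connection is dropped.
 */
int expirations_until_drop(struct conn *c, int *err)
{
    int n = 0;

    do {
        n++;
        *err = rexmt_timeout(c);
    } while (*err == 0);
    return n;
}
```

Starting from a fresh struct conn, the drop happens on the expiration after the counter has reached MAXRXTSHIFT. The counter advances on every timer expiration whether or not the retransmitted segment actually left the machine, which is consistent with tcpdump showing no retransmissions.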