Date: Wed, 23 Apr 2008 01:47:13 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Mark Hills <mark@pogo.org.uk>
Cc: Peter Jeremy <peterjeremy@optushome.com.au>, freebsd-net@freebsd.org
Subject: Re: read() returns ETIMEDOUT on steady TCP connection
Message-ID: <480E7901.5000804@freebsd.org>
In-Reply-To: <480C9AC6.8090802@freebsd.org>
References: <alpine.BSO.1.10.0804191437400.21362@zrgural.vwaro.pbz> <20080420025010.GJ73016@server.vk2pj.dyndns.org> <alpine.BSO.1.10.0804201238480.31900@zrgural.vwaro.pbz> <480BBD7E.8010700@freebsd.org> <alpine.BSO.1.10.0804210740100.1745@zrgural.vwaro.pbz> <480C9AC6.8090802@freebsd.org>

Andre Oppermann wrote:
> Mark Hills wrote:
>> On Mon, 21 Apr 2008, Andre Oppermann wrote:
>>
>>> Mark Hills wrote:
>>>> On Sun, 20 Apr 2008, Peter Jeremy wrote:
>>
>>>>> I can't explain the problem but it definitely looks like a resource
>>>>> starvation issue within the kernel.
>>>>
>>>> I've traced the source of the ETIMEDOUT within the kernel to
>>>> tcp_timer_rexmt() in tcp_timer.c:
>>>>
>>>>     if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
>>>>         tp->t_rxtshift = TCP_MAXRXTSHIFT;
>>>>         tcpstat.tcps_timeoutdrop++;
>>>>         tp = tcp_drop(tp, tp->t_softerror ?
>>>>             tp->t_softerror : ETIMEDOUT);
>>>>         goto out;
>>>>     }
>>>
>>> Yes, this is related to either a lack of mbufs to create a segment
>>> or a problem in sending it. That may be a full interface queue, a
>>> bandwidth manager (dummynet) or some firewall internally rejecting
>>> the segment (ipfw, pf). Do you run any firewall in stateful mode?
>>
>> There's no firewall running.
>>
>>>> I'm new to FreeBSD, but this seems to imply that it's reaching a
>>>> limit on the number of retransmits when sending ACKs on the TCP
>>>> connection receiving the inbound data. But I checked this using
>>>> tcpdump on the server and could see no retransmissions.
>>>
>>> When you have internal problems the segment never makes it to the
>>> wire and thus you won't see it in tcpdump.
>>>
>>> Please report the output of 'netstat -s -p tcp' and 'netstat -m'.
>>
>> Posted below. You can see it in there: "131 connections dropped by
>> rexmit timeout"
>>
>>>> As a test, I ran a simulation with the necessary changes to increase
>>>> TCP_MAXRXTSHIFT (including adding appropriate entries to
>>>> tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to
>>>> reduce the frequency of the problem occurring, but not to a usable
>>>> level.
>>>
>>> Possible causes are timers that fire too early, resource starvation
>>> (you are doing a lot of traffic), or of course some bug in the code.
>>
>> As I said in my original email, the data transfer doesn't stop or
>> splutter; it's simply cut mid-flow. That sounds like something
>> happening prematurely.
>>
>> Thanks for the help,
>
> The output doesn't show any obvious problems. I have to write some
> debug code to run on your system. I'll do that later today if time
> permits. Otherwise tomorrow.
http://people.freebsd.org/~andre/tcp_output-error-log.diff
Please apply this patch, enable the sysctl net.inet.tcp.log_debug=1,
and report any output. You will likely get some (normal) noise from
syncache. What we are looking for is reports from tcp_output.
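
For reference, the drop condition in the tcp_timer_rexmt() excerpt
quoted above can be modelled in userland. This is only a rough sketch,
not the kernel code: the sketch_* names are made up for illustration,
and the value 12 for the limit is an assumption about what
TCP_MAXRXTSHIFT is set to in tcp_timer.h. It shows that the connection
is dropped on the first timer expiry that pushes the counter past the
limit, and that a recorded soft error takes precedence over ETIMEDOUT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumption: stands in for TCP_MAXRXTSHIFT; 12 is assumed, not taken from this thread. */
#define SKETCH_MAXRXTSHIFT 12

/* Hypothetical stand-in for the two tcpcb fields the excerpt touches. */
struct sketch_tcpcb {
    int t_rxtshift;   /* consecutive retransmit timer expiries so far */
    int t_softerror;  /* last soft error recorded, 0 if none */
};

/*
 * Model one retransmit timer expiry.
 * Returns 0 if the connection survives, otherwise the errno value
 * that the excerpt would pass to tcp_drop().
 */
int sketch_rexmt_expiry(struct sketch_tcpcb *tp)
{
    assert(tp != NULL);
    if (++tp->t_rxtshift > SKETCH_MAXRXTSHIFT) {
        tp->t_rxtshift = SKETCH_MAXRXTSHIFT;
        return tp->t_softerror ? tp->t_softerror : ETIMEDOUT;
    }
    return 0;
}

/*
 * Fire the timer repeatedly with no ACK arriving in between and
 * count the expiries needed before the connection is dropped.
 * The drop error is stored in *err.
 */
int sketch_expiries_until_drop(struct sketch_tcpcb *tp, int *err)
{
    int n = 0;

    assert(err != NULL);
    do {
        n++;
        *err = sketch_rexmt_expiry(tp);
    } while (*err == 0);
    return n;
}
```

In this model the counter only grows while the timer keeps expiring
back to back. The sketch does not model the reset that happens in the
kernel when an ACK for new data arrives, so it says nothing about why
the limit would be reached on a connection that is otherwise flowing.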
--
Andre
