From owner-freebsd-net@FreeBSD.ORG Mon Apr 21 13:46:47 2008
Message-ID: <480C9AC6.8090802@freebsd.org>
Date: Mon, 21 Apr 2008 15:46:46 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Mark Hills
Cc: Peter Jeremy, freebsd-net@freebsd.org
Subject: Re: read() returns ETIMEDOUT on steady TCP connection
List-Id: Networking and TCP/IP with FreeBSD

Mark Hills wrote:
> On Mon, 21 Apr 2008, Andre Oppermann wrote:
>
>> Mark Hills wrote:
>>> On Sun, 20 Apr 2008, Peter Jeremy wrote:
>
>>>> I can't explain the problem but it definitely looks like a resource
>>>> starvation issue within the kernel.
>>>
>>> I've traced the source of the ETIMEDOUT within the kernel to
>>> tcp_timer_rexmt() in tcp_timer.c:
>>>
>>>     if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
>>>             tp->t_rxtshift = TCP_MAXRXTSHIFT;
>>>             tcpstat.tcps_timeoutdrop++;
>>>             tp = tcp_drop(tp, tp->t_softerror ?
>>>                 tp->t_softerror : ETIMEDOUT);
>>>             goto out;
>>>     }
>>
>> Yes, this is related to either a lack of mbufs to create a segment
>> or a problem in sending it. That may be a full interface queue, a
>> bandwidth manager (dummynet) or some firewall internally rejecting
>> the segment (ipfw, pf). Do you run any firewall in stateful mode?
>
> There's no firewall running.
>
>>> I'm new to FreeBSD, but it seems to imply that it's reaching a
>>> limit on the number of retransmits of ACKs sent on the TCP
>>> connection receiving the inbound data? But I checked this using
>>> tcpdump on the server and could see no retransmissions.
>>
>> When there are internal problems the segment never makes it to the
>> wire and thus you won't see it in tcpdump.
>>
>> Please report the output of 'netstat -s -p tcp' and 'netstat -m'.
>
> Posted below. You can see it in there: "131 connections dropped by
> rexmit timeout"
>
>>> As a test, I ran a simulation with the necessary changes to increase
>>> TCP_MAXRXTSHIFT (including adding appropriate entries to
>>> tcp_syn_backoff[] and tcp_backoff[]) and it appeared I was able to
>>> reduce the frequency of the problem occurring, but not to a usable
>>> level.
>>
>> Possible causes are timers that fire too early, resource starvation
>> (you are doing a lot of traffic), or of course some bug in the code.
>
> As I said in my original email, the data transfer doesn't stop or
> splutter, it's simply cut mid-flow. Sounds like something happening
> prematurely.
>
> Thanks for the help,

The output doesn't show any obvious problems. I have to write some
debug code to run on your system. I'll do that later today if time
permits. Otherwise tomorrow.
--
Andre

> Mark
>
>
> $ netstat -m
> 14632/8543/23175 mbufs in use (current/cache/total)
> 504/4036/4540/25600 mbuf clusters in use (current/cache/total/max)
> 504/3976 mbuf+clusters out of packet secondary zone in use (current/cache)
> 12550/250/12800/12800 4k (page size) jumbo clusters in use
> (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 54866K/11207K/66073K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/6/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> $ netstat -s -p tcp
> tcp:
>         3408601864 packets sent
>                 3382078274 data packets (1431587515 bytes)
>                 454189057 data packets (1209708476 bytes) retransmitted
>                 14969051 data packets unnecessarily retransmitted
>                 0 resends initiated by MTU discovery
>                 2216740 ack-only packets (9863 delayed)
>                 0 URG only packets
>                 0 window probe packets
>                 273815 window update packets
>                 35946 control packets
>         2372591976 packets received
>                 1991632669 acks (for 2122913190 bytes)
>                 16032443 duplicate acks
>                 0 acks for unsent data
>                 1719033 packets (1781984933 bytes) received in-sequence
>                 1404 completely duplicate packets (197042 bytes)
>                 1 old duplicate packet
>                 54 packets with some dup. data (6403 bytes duped)
>                 9858 out-of-order packets (9314285 bytes)
>                 0 packets (0 bytes) of data after window
>                 0 window probes
>                 363132176 window update packets
>                 3 packets received after close
>                 0 discarded for bad checksums
>                 0 discarded for bad header offset fields
>                 0 discarded because packet too short
>                 635 discarded due to memory problems
>         39 connection requests
>         86333 connection accepts
>         0 bad connection attempts
>         2256 listen queue overflows
>         8557 ignored RSTs in the windows
>         86369 connections established (including accepts)
>         83380 connections closed (including 31174 drops)
>                 74004 connections updated cached RTT on close
>                 74612 connections updated cached RTT variance on close
>                 74591 connections updated cached ssthresh on close
>         3 embryonic connections dropped
>         1979184038 segments updated rtt (of 1729113221 attempts)
>         110108313 retransmit timeouts
>                 131 connections dropped by rexmit timeout
>         1 persist timeout
>                 0 connections dropped by persist timeout
>         0 Connections (fin_wait_2) dropped because of timeout
>         23 keepalive timeouts
>                 22 keepalive probes sent
>                 1 connection dropped by keepalive
>         320 correct ACK header predictions
>         1638923 correct data packet header predictions
>         87976 syncache entries added
>                 182 retransmitted
>                 111 dupsyn
>                 13061 dropped
>                 86333 completed
>                 8907 bucket overflow
>                 0 cache overflow
>                 1 reset
>                 0 stale
>                 2256 aborted
>                 0 badack
>                 0 unreach
>                 0 zone failures
>         101037 cookies sent
>         9521 cookies received
>         36888 SACK recovery episodes
>         68139 segment rexmits in SACK recovery episodes
>         98390670 byte rexmits in SACK recovery episodes
>         243196 SACK options (SACK blocks) received
>         2853 SACK options (SACK blocks) sent
>         0 SACK scoreboard overflow