From: Robert Watson
Date: Mon, 2 Feb 2009 18:44:20 +0000 (GMT)
To: Mitar
Cc: freebsd-net@freebsd.org
Subject: Re: read() returns ETIMEDOUT

On Sun, 1 Feb 2009, Mitar wrote:

> Is there any progress on this error reported:
>
> http://freebsd.monkey.org/freebsd-net/200805/msg00026.html
>
> I have the same or very similar issue. On my server large HTTP uploads are
> failing because there are only one direction data transmissions (when
> reading/receiving a request) and kernel drops connections after some time
> with ETIMEDOUT returning from read() even if transmissions are doing just
> fine with steady speed, tested at different speeds.
>
> Is there any workaround currently known?

Given that some time has passed since the previous reports, it's probably best to diagnose this from scratch rather than assume it's necessarily the same problem.
Could you send us the output of the following command:

  sysctl net.inet.tcp | grep keep

There are a number of situations in which a connection may be dropped with ETIMEDOUT, so we should figure out which one(s) apply here:

(1) The TCP keepalive timer fires and finds either that the connection isn't yet established, or that keepalive probes have gone unanswered for too long. (tcp_timer_keep)

(2) The TCP persist timer fires because the peer's window is closed and the full exponential backoff has occurred. (tcp_timer_persist)

(3) The TCP retransmit timer reaches its full exponential backoff without the data being ACK'd. (tcp_timer_rexmt)

There are a few ways to go about this -- probably the easiest is to drop a kernel printf just before each call to tcp_drop(tp, ETIMEDOUT) in tcp_timer.c, so the console shows which timer fired.

It would also be useful, if possible, to look at a tcpdump of the last portion of the connection, ideally from the second-to-last ACK from the remote host through to the connection reset from the local end. It might be worth running tcpdump on both sides to see whether they see the same thing -- for example, does one side think it's sending ACKs that the other never receives?

In the previous thread, the outcome seemed to be a memory exhaustion issue under load, and bumping nmbclusters at least deferred the problem. So it would be useful to see the output of netstat -m just before and just after the connection is timed out (for as small an epsilon as you can make it).

I realize capturing this sort of data can be an issue on heavily loaded boxes, but if you can, it would be quite helpful. Regardless, knowing whether you're seeing allocation errors in the netstat -m output would be helpful.

Robert N M Watson
Computer Laboratory
University of Cambridge