From owner-freebsd-hackers  Thu Jun  7 13:24:24 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from monty.mutatus.co.uk (monty.mutatus.co.uk [195.184.238.178])
	by hub.freebsd.org (Postfix) with ESMTP id 2F89E37B401
	for <freebsd-hackers@FreeBSD.ORG>; Thu,  7 Jun 2001 13:24:17 -0700 (PDT)
	(envelope-from gbarr@mutatus.co.uk)
Received: (from gbarr@localhost)
	by monty.mutatus.co.uk (8.11.3/8.11.3) id f57KMP905622;
	Thu, 7 Jun 2001 21:22:25 +0100 (BST)
	(envelope-from gbarr)
Date: Thu, 7 Jun 2001 21:22:24 +0100
From: Graham Barr <gbarr@pobox.com>
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: Alfred Perlstein <bright@rush.net>, freebsd-hackers@FreeBSD.ORG
Subject: Re: read(2) and ETIMEDOUT
Message-ID: <20010607212224.M50444@pobox.com>
References: <20010607201846.E50444@pobox.com> <200106072116.aa63698@salmon.maths.tcd.ie>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200106072116.aa63698@salmon.maths.tcd.ie>; from iedowse@maths.tcd.ie on Thu, Jun 07, 2001 at 09:16:19PM +0100
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

While this does sound very plausible, ...

The server does not do any writes, data only travels from the clients
to the server.

The clients and the server are connected to the same switch.

The other server, which is similar, is on the same network and
serves the same client machines, yet it does not see any
problems.

But thanks for the insight. I will place a sniffer on the port
and see if there are excessive retransmits.

Graham.

On Thu, Jun 07, 2001 at 09:16:19PM +0100, Ian Dowse wrote:
> In message <20010607201846.E50444@pobox.com>, Graham Barr writes:
> 
> >Also why does this happen only every few hours? There is a lot of
> >data going through these connections; maybe the timer for SO_RCVTIMEO
> >is not being reset.
> >
> >But then we have another server, with a similar number of clients and
> >data throughput, but it does not suffer from this problem.
> 
> I suspect that the server seeing this problem has a client that
> occasionally disappears from the network, or for whatever reason
> fails to respond to any packets for a long time (something like 5
> or 10 minutes). I've seen blocking TCP writes return ETIMEDOUT when
> the network between the client and the server goes down. In the
> non-blocking case I think the following can happen:
> 
> 	1) Client is connected to server.
> 	2) Network goes down, or client is turned off
> 	3) Server performs non-blocking write() on socket
> 	4) Server uses poll/select/kevent waiting for data from socket
> 	5) The write operation times out because no acknowledgements
> 	   have been received. This occurs after TCP_MAXRXTSHIFT
> 	   retransmits; so->so_error is set to ETIMEDOUT and the
> 	   connection is shut down (I haven't read the code very
> 	   carefully, so the details could be wrong).
> 	6) select/poll/kevent notes the EOF condition, and says that
> 	   the descriptor is ready to read.
> 	7) read() returns the real error, which is ETIMEDOUT.
> 
> I guess this should possibly be documented in read(2), but in
> practice there are numerous network errors that can be returned
> from read(). Normal practice in single-process servers is to
> consider any unknown errors from read(), write(), etc. as
> fatal only to that client rather than to the whole server.
> 
> Ian
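
The practice Ian describes, treating an unexpected read() error such as
ETIMEDOUT as fatal only to that one client, could look roughly like the
sketch below. The names client_action, classify_read and service_client
are made up for illustration and are not taken from any real server; the
choice of which errno values count as "retry" is an assumption as well.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome codes for one read() attempt on a client socket. */
enum client_action { CLIENT_KEEP, CLIENT_RETRY, CLIENT_DROP };

/*
 * Decide what to do with a client given read()'s return value and the
 * errno captured right after the call. Transient conditions are retried;
 * EOF and every other error (ETIMEDOUT, ECONNRESET, ...) drop only this
 * client, never the whole server.
 */
enum client_action
classify_read(ssize_t n, int err)
{
	if (n > 0)
		return CLIENT_KEEP;	/* got data */
	if (n == 0)
		return CLIENT_DROP;	/* orderly EOF from the peer */

	switch (err) {
	case EINTR:
	case EAGAIN:
#if EWOULDBLOCK != EAGAIN
	case EWOULDBLOCK:
#endif
		return CLIENT_RETRY;	/* nothing to read yet, try again later */
	default:
		return CLIENT_DROP;	/* fatal for this client only */
	}
}

/*
 * Read once from fd, store the byte count in *nread, and close the
 * descriptor if the client has to be dropped. The caller is expected to
 * remove the descriptor from its poll/select/kevent set on CLIENT_DROP.
 */
enum client_action
service_client(int fd, char *buf, size_t len, ssize_t *nread)
{
	ssize_t n = read(fd, buf, len);
	int err = errno;	/* only meaningful when n < 0 */
	enum client_action act = classify_read(n, err);

	*nread = n;
	if (act == CLIENT_DROP) {
		if (n < 0)
			fprintf(stderr, "fd %d: read: %s; dropping client\n",
			    fd, strerror(err));
		close(fd);
	}
	return act;
}
```

With this shape, an ETIMEDOUT that surfaces from read() after
select/poll/kevent reports the descriptor as readable ends up in the
default branch and costs only that one connection.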
