From owner-freebsd-hackers Thu Jun 7 13:24:24 2001
Date: Thu, 7 Jun 2001 21:22:24 +0100
From: Graham Barr
To: Ian Dowse
Cc: Alfred Perlstein, freebsd-hackers@FreeBSD.ORG
Subject: Re: read(2) and ETIMEDOUT
Message-ID: <20010607212224.M50444@pobox.com>
References: <20010607201846.E50444@pobox.com> <200106072116.aa63698@salmon.maths.tcd.ie>
In-Reply-To: <200106072116.aa63698@salmon.maths.tcd.ie>; from iedowse@maths.tcd.ie on Thu, Jun 07, 2001 at 09:16:19PM +0100

While this does sound very plausible, ...

The server does not do any writes; data only travels from the clients
to the server.

The clients and the server are connected to the same switch.

The other, similar server is on the same network and is connected to
by the same client machines, yet it does not see any problems.

But thanks for the insight. I will place a sniffer on the port and see
if there are excessive retransmits.

Graham.

On Thu, Jun 07, 2001 at 09:16:19PM +0100, Ian Dowse wrote:
> In message <20010607201846.E50444@pobox.com>, Graham Barr writes:
>
> >Also why does this happen only every few hours? There is a lot of
> >data going through these connections; maybe the timer for SO_RCVTIMEO
> >is not being reset.
> >
> >But then we have another server, with a similar number of clients and
> >data throughput, but it does not suffer from this problem.
>
> I suspect that the server seeing this problem has a client that
> occasionally disappears from the network, or for whatever reason
> fails to respond to any packets for a long time (something like 5
> or 10 minutes). I've seen blocking TCP writes return ETIMEDOUT when
> the network between the client and the server goes down. In the
> non-blocking case I think the following can happen:
>
> 1) Client is connected to server.
> 2) Network goes down, or client is turned off.
> 3) Server performs non-blocking write() on socket.
> 4) Server uses poll/select/kevent waiting for data from socket.
> 5) The write operation times out because no acknowledgements
>    have been received. This occurs after TCP_MAXRXTSHIFT
>    retransmits; so->so_error is set to ETIMEDOUT and the
>    connection is shut down (I haven't read the code very
>    carefully, so the details could be wrong).
> 6) select/poll/kevent notes the EOF condition, and says that
>    the descriptor is ready to read.
> 7) read() returns the real error, which is ETIMEDOUT.
>
> I guess this should possibly be documented in read(2), but in
> practice there are numerous network errors that can be returned
> from read(). Normal practice in single-process servers is to
> consider any unknown errors from read(), write() etc. as only
> fatal to that client rather than the whole server.
>
> Ian
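
For what it's worth, the "fatal to that client only" practice Ian
describes could look roughly like the sketch below. The names
client_action, classify_read and handle_client_read are made up for
illustration and do not come from any code discussed in this thread;
treating EINTR and EAGAIN as the only retryable errors is likewise an
assumption, not something taken from the kernel sources.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; not from the thread. */
enum client_action {
    CLIENT_KEEP,    /* data arrived or the condition is transient */
    CLIENT_DROP     /* EOF or a hard error: close this client only */
};

/*
 * Map the result of read() (return value n, saved errno err) to an
 * action. Any error we do not recognize, including ETIMEDOUT, is
 * treated as fatal to this one connection, never to the server.
 */
enum client_action
classify_read(ssize_t n, int err)
{
    if (n > 0)
        return CLIENT_KEEP;
    if (n == 0)
        return CLIENT_DROP;         /* orderly EOF from the peer */

    switch (err) {
    case EINTR:                     /* interrupted by a signal */
    case EAGAIN:                    /* non-blocking, nothing to read yet */
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return CLIENT_KEEP;
    default:                        /* ETIMEDOUT, ECONNRESET, unknown, ... */
        return CLIENT_DROP;
    }
}

/*
 * Read once from a descriptor that select/poll/kevent reported as
 * readable. On CLIENT_DROP the descriptor is closed here; the caller
 * only has to forget about it and carry on with the other clients.
 */
enum client_action
handle_client_read(int fd, char *buf, size_t len, ssize_t *nread)
{
    enum client_action act;
    ssize_t n;
    int err;

    n = read(fd, buf, len);
    err = errno;                    /* save before any other call */
    *nread = n;

    act = classify_read(n, n < 0 ? err : 0);
    if (act == CLIENT_DROP)
        (void)close(fd);
    return act;
}
```

The event loop would call handle_client_read() for each ready
descriptor and remove the descriptor from its poll set whenever
CLIENT_DROP comes back, so the ETIMEDOUT in step 7 ends one connection
and nothing else.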