From: Ian Dowse
To: Graham Barr
Cc: Alfred Perlstein, freebsd-hackers@FreeBSD.ORG
Subject: Re: read(2) and ETIMEDOUT
In-Reply-To: Your message of "Thu, 07 Jun 2001 20:18:46 BST." <20010607201846.E50444@pobox.com>
Date: Thu, 07 Jun 2001 21:16:19 +0100
Message-ID: <200106072116.aa63698@salmon.maths.tcd.ie>

In message <20010607201846.E50444@pobox.com>, Graham Barr writes:
>Also why does this happen only every few hours ? There is a lot of
>data going through these connections maybe the timer for SO_RCVTIMEO
>is not being reset.
>
>But then we have another server, with a similar number of clients and
>data through put, but it does not suffer from this problem.

I suspect that the server seeing this problem has a client that
occasionally disappears from the network, or for whatever reason
fails to respond to any packets for a long time (something like 5
or 10 minutes). I've seen blocking TCP writes return ETIMEDOUT when
the network between the client and the server goes down. In the
non-blocking case I think the following can happen:

1) Client is connected to server.
2) Network goes down, or client is turned off.
3) Server performs non-blocking write() on socket.
4) Server uses poll/select/kevent waiting for data from socket.
5) The write operation times out because no acknowledgements have
   been received.
   This occurs after TCP_MAXRXTSHIFT retransmits: so->so_error is set
   to ETIMEDOUT and the connection is shut down (I haven't read the
   code very carefully, so the details could be wrong).
6) select/poll/kevent notes the EOF condition and reports that the
   descriptor is ready to read.
7) read() returns the real error, which is ETIMEDOUT.

I guess this should possibly be documented in read(2), but in practice
there are numerous network errors that can be returned from read().
Normal practice in single-process servers is to treat any unknown
error from read(), write(), etc. as fatal only to that client rather
than to the whole server.

Ian
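
The per-client error handling described above could be sketched in C
roughly as follows. This is only an illustration, not code from any
real server: the names read_result, classify_read() and read_client()
are hypothetical, and the choice of which errno values count as
"retry later" is an assumption. Anything else, including ETIMEDOUT,
is treated as a reason to drop that one client and carry on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome codes for one read() on a client socket. */
enum read_result {
    READ_DATA,        /* n > 0: bytes were read */
    READ_EOF,         /* n == 0: peer closed the connection cleanly */
    READ_RETRY,       /* transient condition: go back to poll/select */
    READ_DROP_CLIENT  /* close this client only; the server keeps running */
};

/*
 * Classify the return value of read() together with the errno value
 * captured right after the call. Assumption: only EINTR and
 * EAGAIN/EWOULDBLOCK are transient; every other error (ETIMEDOUT,
 * ECONNRESET, or anything unknown) is fatal to this client only.
 */
static enum read_result classify_read(ssize_t n, int err)
{
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;

    switch (err) {
    case EINTR:
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return READ_RETRY;
    default:
        return READ_DROP_CLIENT;
    }
}

/*
 * Call this once poll/select/kevent reports the descriptor readable.
 * The byte count (or -1) is stored in *nread.
 */
static enum read_result read_client(int fd, void *buf, size_t len,
                                    ssize_t *nread)
{
    assert(buf != NULL && nread != NULL);
    *nread = read(fd, buf, len);
    return classify_read(*nread, *nread < 0 ? errno : 0);
}
```

On READ_DROP_CLIENT the caller would close the descriptor and free
that client's state, perhaps logging the errno value, instead of
exiting. Keeping the classification separate from the read() call
makes it possible to check the policy without a real network failure.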