From owner-freebsd-hackers  Fri Jun  8  2: 5:26 2001
Message-ID: <3B209567.1AE09631@mindspring.com>
Date: Fri, 08 Jun 2001 02:05:43 -0700
From: Terry Lambert <tlambert2@mindspring.com>
Reply-To: tlambert2@mindspring.com
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: Graham Barr <gbarr@pobox.com>,
	Alfred Perlstein <bright@rush.net>, freebsd-hackers@FreeBSD.ORG
Subject: Re: read(2) and ETIMEDOUT
References: <200106072116.aa63698@salmon.maths.tcd.ie>

Ian Dowse wrote:
> 
> In message <20010607201846.E50444@pobox.com>, Graham Barr writes:
> 
> >Also, why does this happen only every few hours? There is a lot of
> >data going through these connections; maybe the timer for SO_RCVTIMEO
> >is not being reset.
> >
> >But then we have another server, with a similar number of clients and
> >data throughput, but it does not suffer from this problem.
> 
> I suspect that the server seeing this problem has a client that
> occasionally disappears from the network, or for whatever reason
> fails to respond to any packets for a long time (something like 5
> or 10 minutes). I've seen blocking TCP writes return ETIMEDOUT when
> the network between the client and the server goes down. In the
> non-blocking case I think the following can happen:

I believe the gratuitous ARP normally sent when an interface
comes up can have this effect when a client goes down and
someone else gets its DHCP lease.

You don't often see this on FreeBSD clients after 4.1,
since the gratuitous proxy ARP became broken around then
(if you change your IP address, it won't send the ARP
unless you down the interface first and bring it back up,
and it caches bad clone routes, too, just to make your
life miserable).
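
For the read(2) side of this, here is a minimal Python sketch
of how a server might tell a fatal ETIMEDOUT apart from an
ordinary "no data yet" return.  The function names and the
retry/drop/fatal labels are hypothetical, not anything from
the reporter's code.  It assumes a non-blocking socket, or one
with SO_RCVTIMEO set, where an idle read reports
EAGAIN/EWOULDBLOCK rather than ETIMEDOUT.

```python
import errno
import socket

# Errors that mean "nothing to read right now"; the connection is still usable.
RETRY_ERRNOS = {errno.EAGAIN, errno.EWOULDBLOCK, errno.EINTR}

# Errors that mean the peer is gone or unreachable; close this connection only.
DROP_ERRNOS = {errno.ETIMEDOUT, errno.ECONNRESET, errno.EHOSTUNREACH}


def classify_read_error(exc):
    """Map an OSError raised by recv() to 'retry', 'drop', or 'fatal'."""
    if exc.errno in RETRY_ERRNOS:
        return "retry"
    if exc.errno in DROP_ERRNOS:
        return "drop"
    return "fatal"


def read_some(sock, nbytes=4096):
    """Try one recv(); return (status, data) instead of raising."""
    try:
        data = sock.recv(nbytes)
    except OSError as exc:
        return classify_read_error(exc), b""
    if not data:
        return "eof", b""   # orderly shutdown by the peer
    return "data", data
```

The point of the split is that ETIMEDOUT on one socket says
something about that one client, so the server can drop that
connection and keep serving the rest.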

Probably your lease expiration times are set too low.  This
is usually the case in networks where people have transient
connections for things like mobile users, have exhausted
their IP address space, and are trying to conserve it by
using much shorter leases.

A good, real fix for this is to have incredibly long lease
lifetimes (basically, the DHCP server hands out the lease,
and if the computer comes back days later, it gets the same
lease).  For this to work, you are probably going to have
to make the local DHCP server give out 10.x addresses, and
then NAT the 10.x net for real Internet connectivity.
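
To see why long leases push you toward 10.x, here is a rough
worst-case calculation in Python.  The client counts, the
lease length, and the function names are made-up illustrations,
not measurements from anyone's network; the model simply
assumes every new client holds its address for the full lease.

```python
import ipaddress


def leases_outstanding(new_clients_per_day, lease_days):
    """Worst case: every client seen during the lease lifetime keeps its address."""
    return new_clients_per_day * lease_days


def usable_hosts(network):
    """Host addresses in an IPv4 network, minus the network and broadcast addresses."""
    return ipaddress.ip_network(network).num_addresses - 2


def pool_fits(network, new_clients_per_day, lease_days):
    """True if the pool can hold the worst-case number of outstanding leases."""
    return leases_outstanding(new_clients_per_day, lease_days) <= usable_hosts(network)
```

With, say, 200 new clients a day and 30-day leases, that is
6000 outstanding leases: pool_fits("192.0.2.0/24", 200, 30)
is False, while pool_fits("10.0.0.0/8", 200, 30) is True.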

Alternately, it could be something completely different.  8-).

-- Terry
