Date: Sat, 25 May 1996 15:37:47 -0700
From: David Greenman <davidg@Root.COM>
To: "Karl Denninger, MCSNet" <karl@mcs.com>
Cc: hackers@FreeBSD.ORG
Subject: Re: Grrr.. is this is a FreeBSD problem (TIME_WAIT again)
Message-ID: <199605252237.PAA23150@Root.COM>
In-Reply-To: Your message of "Sat, 25 May 1996 16:20:41 CDT." <m0uNQlN-000IDOC@venus.mcs.com>
>If the caller and callee are on DIFFERENT machines, I get no stale sockets.
>This is reliable even if there are tens of new connections per minute.
>
>If the caller and callee are on the SAME machine, I get sockets in TIME_WAIT
>for 2 minutes each (grrrr) which, if the traffic is heavy enough, eventually
>blocks new connections for a few minutes until they clear up. None of the
>sockets in TIME_WAIT has output or input pending; both counts show zero.
>
>This is a serious problem!
>
>Interestingly enough, I can switch the end of the link which "netstat" thinks
>is the "local" end by changing who calls shutdown() first! This is also
>unexpected; I would have thought that the caller ALWAYS would be the "local"
>side of the connection.
>
>I've checked and rechecked -- the same code, running across two machines,
>does not do this. But when the calling and called code are on the same
>system (2.1-STABLE) it does -- repeatedly and reliably.
>
>Any ideas? While one solution would be to get the code off the same
>(common) machine, there are reasons that I don't want to do this in normal
>production. But, I need to use TCP (rather than local Unix domain sockets)
>because the BACKUP server is on a different system (in the event the first
>one crashes).
>
>Why would this happen when the caller and callee are on the same box, but
>not when the traffic actually goes across the network? Has anyone else seen
>anything like this in their experience? Due to the structure of this module
>(its a drop-in into a stock daemon from another source) I cannot leave the
>socket open across requests, and I'd like to understand the reason for
>this behavior anyway.

Based on what you've said thus far, it's working as it is supposed to.
There is a good discussion of the 2MSL wait ("TIME_WAIT") in "TCP/IP
Illustrated Volume 1", page 242, by W. Richard Stevens.
Depending on how your program handles its ports/connections, you might be
able to use the SO_REUSEADDR socket option to avoid the problem. See page 244.

-DG

David Greenman
Core-team/Principal Architect, The FreeBSD Project
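
For readers who want to see what the SO_REUSEADDR suggestion above looks like in practice, here is a minimal sketch in Python (the original program is not shown in this thread, so the helper names make_listener and rebind_after_active_close, the loopback address, and the use of an ephemeral port are all assumptions made for illustration). It sets the option before bind(), has the accepting side close first so that side is the one left in TIME_WAIT, and then rebinds the same port right away. Whether this helps in a given program depends, as noted above, on how that program handles its ports and connections.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Return a listening TCP socket with SO_REUSEADDR set (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind(); it allows bind() to succeed
    # even if earlier connections on this local port are still in TIME_WAIT.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    return srv


def rebind_after_active_close():
    """Close a connection from the accepting side, then rebind the same port."""
    first = make_listener()           # port 0: let the kernel pick a free port
    port = first.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()

    conn.close()       # accepting side closes first (the active close)
    client.recv(1)     # returns b"" once the peer's FIN has arrived
    client.close()     # passive close completes the teardown
    first.close()

    # The accepted connection's end is typically in TIME_WAIT at this point;
    # with SO_REUSEADDR set, binding the same port again is still expected to work.
    second = make_listener(port=port)
    try:
        return port, second.getsockname()[1]
    finally:
        second.close()


if __name__ == "__main__":
    old_port, new_port = rebind_after_active_close()
    print("rebound port", new_port, "immediately after closing", old_port)
```

Note that the option does not shorten or remove the 2MSL wait; the old connection still sits in TIME_WAIT and still shows up in netstat. It only relaxes the check that bind() makes on the local address.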