From owner-freebsd-hackers Sun Sep 16 14:58:47 2001
Message-ID: <3BA520BC.E26A64F0@mindspring.com>
Date: Sun, 16 Sep 2001 14:59:24 -0700
From: Terry Lambert
Reply-To: tlambert2@mindspring.com
To: Stephen Montgomery-Smith
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Could not bind
References: <3BA3F70D.27C2136@math.missouri.edu>

Stephen Montgomery-Smith wrote:
>
> I have written a server program that listens on port 3000. The program
> works very well except for one feature. I am asking if that is normal,
> or whether I forgot something.
>
> If I run the program it does fine. If I then kill the program (after it
> has accepted connections), and then run the program again, the bind
> function fails to work, and I get a message like "Could not bind" (see
> program below). If I wait a while, like a minute or two, then the
> program will work again. Is this normal behavior, or did I miss
> something?

This is normal.

When a server closes the connection, which is what happens when you kill the server and the kernel cleans up the resources the process held, the connections effectively undergo a "host close". If the clients are still around and responsive, these connections will go away quickly.
If not, then these connections will hang around for a long time.

In addition, any connection whose close was initiated from the server's end, which includes every connection still open when you killed it, leaves the server's socket in the TIME_WAIT state for 2MSL, which is 60 seconds by default. (TIME_WAIT is entered by whichever side sends the first FIN.) So in normal operation, you should expect that you will not be able to restart the server for at least 60 seconds, and perhaps longer, unless you have unset the "keepalive" socket option on the sockets to prevent the FIN_WAIT_2 state.

A common overreaction to improper state tracking by the programmer, or to an improper "clean shutdown" of a server, is to set SO_REUSEADDR on the server's listen socket. This lets you restart the server. But it also lets you start multiple instances of the server, so if you are doing things like cookie state tracking which is specific to one server instance (e.g. for an HTTP server), then you have shot yourself in the foot, unless this state is shared between all server instances and your servers are anonymous work-to-do engines rather than specific-purpose ones. This is because you cannot control the connections to make them go to one server vs. another if both are listening on the same port.

Ideally, you would correct the shutdown so that it is clean, and correct the socket options if what you intend is to abort the server without sending complete data to the client. For example, setting SO_LINGER with a linger time of 0 will cause an RST to be sent on close, avoiding TIME_WAIT, but potentially leaving the client hanging until its much longer keepalive timer (2 hours, by default) fires, sends a keepalive, and draws a fresh RST. This is because a lost RST is never retransmitted, as RSTs are not acknowledged.

As a workaround, you can set SO_REUSEADDR on the socket. Above, I labelled this as an overreaction... it is.
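A minimal C sketch of the socket calls discussed above, for readers who want to try them. It assumes an IPv4 wildcard bind, and the helper names `open_listener`, `abort_close`, and `process_alive` are hypothetical; the original program was elided, so this is not its code. `open_listener` sets SO_REUSEADDR before bind(2) and reports strerror(errno) rather than a bare "Could not bind", `abort_close` performs the abortive close (SO_LINGER enabled with a zero linger time), and `process_alive` is the kill(2) signal-0 check for an existing server instance.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper names; not from the original post. */

/* Returns 1 if a process with this pid appears to exist, 0 otherwise.
 * kill() with signal 0 only performs the existence/permission check. */
static int process_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;               /* exists, and we may signal it */
    return errno == EPERM;      /* exists, but belongs to another user */
}

/* Open an IPv4 TCP listen socket on `port` with SO_REUSEADDR set.
 * Returns the descriptor, or -1 after printing the reason. */
static int open_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }

    /* Must be set before bind() to allow rebinding while old
     * connections on this port are still in TIME_WAIT. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) == -1) {
        perror("setsockopt(SO_REUSEADDR)");
        close(fd);
        return -1;
    }

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) == -1) {
        /* Without SO_REUSEADDR this is typically EADDRINUSE. */
        fprintf(stderr, "bind: %s\n", strerror(errno));
        close(fd);
        return -1;
    }
    if (listen(fd, 5) == -1) {
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}

/* Abortive close: linger enabled with a zero timeout, so close()
 * discards unsent data and sends an RST instead of a FIN,
 * so this end does not enter TIME_WAIT. */
static int abort_close(int fd)
{
    struct linger lg;
    lg.l_onoff = 1;
    lg.l_linger = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A server would call `open_listener(3000)` only after `process_alive()` reports that the pid recorded in its pid file is gone; reading that pid file is left out here. The signal-0 check is a heuristic: a stale pid may since have been reused by an unrelated process.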
For this to work, you will probably need to make sure your server creates a pid file in /var/run/.pid, and then, before you reopen the socket, verify via kill(2), using a signal number of 0, that the process is in fact dead before grabbing its port out from under it for half the inbound connections. See "man 2 kill" for details on signal 0: a 0 return, or a -1 return with errno == EPERM, means the process you are trying to replace is still running.

> I got the programming style from Richard Stevens's book on network
> programming. The structure of the program is something like this:

[ ... example elided ... ]

That's all fine; the problem is just an incomplete understanding of the TCP protocol. Hopefully the above will fill you in. In the meantime, you may want to get the internals volumes of the Stevens book series and read them as well, since it's often useful to understand why you are seeing what you are seeing; the user-space volumes are only half the story.

-- 
Terry