Date: Sun, 13 Jul 1997 13:39:45 -0700 (PDT)
From: Julian Elischer <julian@whistle.com>
To: Terry Lambert <terry@lambert.org>
Cc: hackers@FreeBSD.ORG
Subject: Re: TCP bug in 2.2
Message-ID: <Pine.BSF.3.95.970713133335.1271F-100000@current1.whistle.com>
In-Reply-To: <199707122153.OAA28921@phaeton.artisoft.com>

On Sat, 12 Jul 1997, Terry Lambert wrote:

> > If I could borrow the ear of someone with more knowledge of TCP
> > states than me..
> >
> > We see the following in a kernel dated from around March 4
> > and from the logs it looks as if it's present in 2.2.2+
> >
> > finger, (after a lot of iterations of the test)
> > goes into a permanent wait reading from a socket.
> >
> > the socket is seen to be in FIN_WAIT_2 state
> > after the finger process is killed the socket STAYS in FIN_WAIT_2
> > state forever.
> >
> > from what I've read in tcp_input.c etc. This shouldn't happen.
> >
> > 2 problems:
> > 1/ why doesn't finger wake up and return EOF?
>
> Probably you have an Annex or similar terminal server which has a
> buggy TCP/IP implementation which does not correctly do option
> negotiation.
>
> tcp_extensions=NO

Terry, I am aware of that, OK?
And yes, TCP extensions WAS (to our surprise) enabled on that machine.
We have since turned it off; however, despite this, it should be
IMPOSSIBLE to hang a socket in that way.

>
>
> Also, get an updated stack for whatever hardware you have which is
> failing to implement TCP/IP according to the RFC's. There could be
> less obvious problems with the stack as well, so it's a good idea
> to not trust it until it's updated.

It was SOLARIS 2.5.1,
and it didn't happen consistently, so it's a race condition of some sort.

>
>
> > 2/ why doesn't the close() of the socket start
> > the 2MSL timer?
>
> This is generally the case with Winsock implementations in general
> and Microsoft's in particular. Microsoft OS's don't do resource
> tracking correctly, and so even though you can now tell that a
> program has exited in Windows95, the Windows 3.1 Winsock code still
> requires that the client application call "shutdown()" on the socket
> prior to closing it. Basically, Microsoft's TCP/IP stack is too
> stupid to send the FIN like it's supposed to on the close.

As I said..
SOLARIS 2.5.1.
But no matter WHAT OS, we should not have a code path that can get
to the state where a TCP session is totally hung without a timer
running for it.

>
> 1) You reboot a machine without shutting down all Winsock
> clients.

The Solaris machine was not rebooted.

>
> 2) Your client program crashes and expects the OS to be able
> to back out state on its behalf.
>
> 3) The client software was ported from a sane TCP/IP environment,
> like UNIX, and the programmers have no idea that "shutdown()"
> is supposed to be called (amazingly enough, on UNIX systems,
> calling "shutdown()" shuts the machine down... who would have
> ever thought of naming a function for what the function does?
> Apparently not the originators of Winsock.).
>
>
> Try correcting your client software. Also try running client machines
> with OS's that can't be crashed by client programs (ie: real protected
> mode operating systems). Finally, try running an OS that knows how to
> recover resources that a program was using in the event of a program
> crash which does not crash the OS (ie: real protected mode operating
> systems).
>
> > and either tcp_usrclosed() is not being called
> > during the socket closure for some reason,
> > or the timer is being continually reset by something else.
>
> The timer is not started until the FIN is sent if SO_KEEPALIVE was
> specified by the client. It usually is.

I'll check this, but it still seems to be a bug to me, because the
socket is in FIN_WAIT_2 state and HAS BEEN CLOSED.
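
The invariant argued for above (a socket that has been closed should never sit in FIN_WAIT_2 without a timer running) can be illustrated with a toy state machine. What follows is only a sketch in Python under stated assumptions: the state names come from RFC 793, but the `Conn` class, its methods and fields, and the tick counts are all hypothetical and are not taken from tcp_input.c or any other kernel source.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model of the active-close side of one TCP connection.
# Everything here is illustrative; the timeout values are arbitrary.
FIN_WAIT_2_IDLE_TICKS = 10   # assumed bound on an orphaned FIN_WAIT_2
TWO_MSL_TICKS = 4            # assumed TIME_WAIT duration


@dataclass
class Conn:
    state: str = "ESTABLISHED"
    user_closed: bool = False     # the process has called close()
    timer: Optional[int] = None   # ticks remaining, None = no timer armed

    def shutdown_write(self) -> None:
        """Half-close: our FIN goes out, but the process may still read."""
        if self.state == "ESTABLISHED":
            self.state = "FIN_WAIT_1"

    def close(self) -> None:
        """Full close: nobody is left to read from this socket."""
        self.user_closed = True
        if self.state == "ESTABLISHED":
            self.state = "FIN_WAIT_1"
        elif self.state == "FIN_WAIT_2":
            # Already half-closed and now orphaned: bound the wait.
            self.timer = FIN_WAIT_2_IDLE_TICKS

    def ack_of_fin(self) -> None:
        """The peer acknowledges our FIN."""
        if self.state == "FIN_WAIT_1":
            self.state = "FIN_WAIT_2"
            if self.user_closed:
                self.timer = FIN_WAIT_2_IDLE_TICKS

    def peer_fin(self) -> None:
        """The peer sends its own FIN."""
        if self.state == "FIN_WAIT_2":
            self.state = "TIME_WAIT"
            self.timer = TWO_MSL_TICKS

    def tick(self) -> None:
        """Advance the clock by one tick and expire the timer if due."""
        if self.timer is None:
            return
        self.timer -= 1
        if self.timer <= 0:
            self.state = "CLOSED"
            self.timer = None

    def hung(self) -> bool:
        """The reported condition: closed, in FIN_WAIT_2, no timer."""
        return (self.state == "FIN_WAIT_2"
                and self.user_closed
                and self.timer is None)
```

In this model both orderings (close() before the peer's ACK arrives, or a half-close followed later by close()) arm a timer, so `hung()` never becomes true. A race in a real stack would correspond to some path that reaches FIN_WAIT_2 with `user_closed` set but skips both of the places where the timer is armed.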