Date:      Fri, 22 May 1998 19:43:11 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        jas@flyingfox.com (Jim Shankland)
Cc:        mark@vmunix.com, tlambert@primenet.com, hackers@FreeBSD.ORG, isp@FreeBSD.ORG
Subject:   Re: TIME_WAIT/FIN_WAIT_2...
Message-ID:  <199805221943.MAA00619@usr02.primenet.com>
In-Reply-To: <199805221643.JAA01914@biggusdiskus.flyingfox.com> from "Jim Shankland" at May 22, 98 09:43:04 am

> [mark@vmunix.com (Mark Mayo) asks about large numbers of TCP connections
> in TIME_WAIT and FIN_WAIT_2 states on a busy Web server.
> Terry Lambert <tlambert@primenet.com> responds:
> 
> > This is a client bug, specifically with Windows WinSock clients, which
> > do not call "shutdown(2)" in the following way:
> > 
> > 	shutdown( s, 1);
> > 
> > The '1' should be a '2', but many WinSock implementations fail to work
> > correctly if it isn't a '1'.
> > 
> > [...]
> > 
> > The fix is to be bug-compatible with Windows NT as a server, and to,
> > when you are in FIN_WAIT_2 state, back up to resend the FIN.
> > 
> > The problem is the lack of an ACK needed for a state transition in
> > the Windows TCP/IP implementation.
> 
> Terry, I think you have your TCP states mixed up.  First, none of this
> applies to the TIME_WAIT connections.


I think it's you who is mixed up; see the Apache FAQ on FIN_WAIT_2,
and contact Paul Vixie, as I've suggested.

> As for the FIN_WAIT_2 connections: FIN_WAIT_2 means, "I'm done sending,
> I've sent a FIN, the other side has ACKed my FIN, but the other side
> has not sent *me* a FIN to indicate that it is done sending."  One
> would then expect the peer to be in CLOSE_WAIT state.  This is a
> perfectly legal state that can persist indefinitely, with the
> CLOSE_WAIT end sending unbounded amounts of data to the FIN_WAIT_2
> end for consumption.
> 
> Why on earth would you "back up to resend [a] FIN" that has already
> been ACKed?

Because the peer is *NOT* in CLOSE_WAIT state: the peer is a Windows
box, and the socket descriptor was abandoned while the connection was
in CLOSE_WAIT, with no resource-tracking cleanup of the connection.

Here is the relevant state diagram from RFC793:

      TCP A                                                TCP B

  1.  ESTABLISHED                                          ESTABLISHED

  2.  (Close)
      FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK>  --> CLOSE-WAIT

  3.  FIN-WAIT-2  <-- <SEQ=300><ACK=101><CTL=ACK>      <-- CLOSE-WAIT

  4.                                                       (Close)
      TIME-WAIT   <-- <SEQ=300><ACK=101><CTL=FIN,ACK>  <-- LAST-ACK

  5.  TIME-WAIT   --> <SEQ=101><ACK=301><CTL=ACK>      --> CLOSED

  6.  (2 MSL)
      CLOSED

Here is what the Windows machine does, when you are not running
WinSock 2.0, and you have not called "shutdown()" on the socket,
as documented in:

	Windows Sockets Network Programming
	Bob Quinn, Dave Shute
	Addison-Wesley _Advanced Windows Series_
	ISBN: 0-201-63372-8

      TCP A                                                TCP B

  1.  ESTABLISHED                                          ESTABLISHED

  2.  (Close)
      FIN-WAIT-1  --> <SEQ=100><ACK=300><CTL=FIN,ACK>  --> CLOSE-WAIT

  3.  FIN-WAIT-2  <-- <SEQ=300><ACK=101><CTL=ACK>      <-- CLOSE-WAIT

  4.                                                       (Exit)
      FIN-WAIT-2
  5.
      FIN-WAIT-2
  6.
      FIN-WAIT-2

      ...

In other words, the socket is torn down without TCP B ever sending
the FIN that would move it to LAST-ACK.

The "workaround" that you should use to deal with this is to:

A)	wait 2 MSL for the LAST-ACK.

B)	reset to FIN-WAIT-1; send the packet.

C)	if you get CLOSE-WAIT, then goto (A).

D)	if you get no response in 2 MSL, or RST, then act as if
	you had received the CLOSE-WAIT, transitioned to FIN-WAIT-2,
	and subsequently received the LAST-ACK.

E)	(potential "enhancement")  If you get no response, rather
	than treating it as an RST, goto (A), but maintain the
	FIN_WAIT_2_TIMEOUT kludge currently in place.


This keeps the buggered connections (which are buggered by the
inability to restart a damaged client conversation at state 2)
around for a max of 6 MSL, and unbuggered connections act
normally (due to duplicate packet reception processing).
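
Steps A through D above can be sketched as a small state machine.  This
is only a model of the decision logic, not kernel code: every name in it
(fw2_step, the FW2_* constants, struct fw2_conn) is hypothetical, and the
cap of one resent FIN is an assumption made here to reproduce the 6 MSL
bound, since the steps do not say what happens when (A) expires a second
time.

```c
#include <assert.h>

/* Hypothetical model of steps A-D; none of these names exist in any stack. */
enum fw2_state {
    FW2_WAIT_PEER_FIN,  /* step A: in FIN-WAIT-2, waiting up to 2 MSL */
    FW2_PROBE_SENT,     /* step B: FIN resent, waiting up to 2 MSL for a reply */
    FW2_TIME_WAIT,      /* peer's FIN arrived: the normal close continues */
    FW2_CLOSED          /* step D: connection reclaimed */
};

enum fw2_event {
    FW2_EV_PEER_FIN,    /* peer finally sent its FIN */
    FW2_EV_PEER_ACK,    /* peer re-ACKed our FIN (it is still in CLOSE-WAIT) */
    FW2_EV_PEER_RST,    /* peer has no record of the connection */
    FW2_EV_TIMEOUT      /* 2 MSL passed with no reply */
};

struct fw2_conn {
    enum fw2_state state;
    int probes_sent;    /* how many times the FIN has been resent */
};

#define FW2_MAX_PROBES 1    /* ASSUMPTION: one probe, giving 2+2+2 = 6 MSL */

void fw2_step(struct fw2_conn *c, enum fw2_event ev)
{
    assert(c != 0);
    switch (c->state) {
    case FW2_WAIT_PEER_FIN:
        if (ev == FW2_EV_PEER_FIN) {
            c->state = FW2_TIME_WAIT;
        } else if (ev == FW2_EV_PEER_RST) {
            c->state = FW2_CLOSED;
        } else if (ev == FW2_EV_TIMEOUT) {
            if (c->probes_sent < FW2_MAX_PROBES) {
                c->probes_sent++;
                c->state = FW2_PROBE_SENT;      /* step B: resend the FIN */
            } else {
                c->state = FW2_CLOSED;          /* assumed cap reached */
            }
        }
        break;
    case FW2_PROBE_SENT:
        if (ev == FW2_EV_PEER_ACK)
            c->state = FW2_WAIT_PEER_FIN;       /* step C: goto (A) */
        else if (ev == FW2_EV_PEER_FIN)
            c->state = FW2_TIME_WAIT;
        else
            c->state = FW2_CLOSED;              /* step D: RST or silence */
        break;
    case FW2_TIME_WAIT:
    case FW2_CLOSED:
        break;                                  /* terminal in this model */
    }
}
```

Under that assumed cap, the longest path is wait (A), probe and wait,
then wait (A) once more: three waits of up to 2 MSL each.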

The need for a LAST-ACK without a TCP A packet solicitation of some
kind is arguably a bug in the design of TCP (otherwise people would
not be bitching about the problem).

One of the main pains-in-the-ass in not calling "shutdown()" is
Netscape, BTW.
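
For reference, here is a minimal sketch of a client that does announce it
is done sending before it lets go of the descriptor.  It uses POSIX
socket calls rather than WinSock, the function name graceful_close is
made up for illustration, and error handling (EINTR, timeouts on a
silent peer) is deliberately left out.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: half-close the sending side so the peer sees our
 * FIN, drain whatever the peer still sends until it closes too, then
 * release the descriptor.  Returns 0 on an orderly close, -1 otherwise.
 */
int graceful_close(int s)
{
    char buf[512];
    ssize_t n;

    assert(s >= 0);

    /* SHUT_WR is the symbolic name for the "1" on BSD-derived systems. */
    if (shutdown(s, SHUT_WR) == -1) {
        close(s);
        return -1;
    }

    /* Discard remaining data; read() returns 0 once the peer's FIN arrives. */
    while ((n = read(s, buf, sizeof buf)) > 0)
        ;

    if (close(s) == -1)
        return -1;
    return n == 0 ? 0 : -1;
}
```

A client that exits without doing something like this leaves the
server's side of the connection waiting for a FIN that never comes.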


> I suspect what you really mean is that the client side never sends its
> FIN, even though it's really done sending, leaving the server-side
> hanging in FIN_WAIT_2 state, waiting for either more data or a close
> from the client, while the client will never bother to send either.
> This can happen due to application-layer or TCP-layer bugs on the
> client side, or because the client crashed or was powered off at an
> inopportune time.
> 
> What BSD-based TCP stacks do is this: if the FIN_WAIT_2 socket has
> been closed by the application (i.e., there's no-one there to receive
> any data that might arrive), and no data arrives for about 11 minutes,
> then the connection is silently dropped.  (Remember, if the FIN_WAIT_2
> socket has not been closed, then data could arrive after an arbitrarily
> long silence, and there's a process there to read that data; so the
> connection *must not* be dropped.)

11 minutes is too long a time compared to 6 MSL.  By sending a
"duplicate packet" to test the liveness of the client, you can
solicit a "keepalive" (or an RST).  If you get the RST, then you
recover from the client error.
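
To put numbers on that comparison, here is a small arithmetic sketch.
It reads steps A through D as at most three waits of up to 2 MSL each,
and it assumes the 30-second MSL traditionally used by BSD-derived
stacks; that value is not stated in this message, and RFC793 itself
suggests 2 minutes, which would give a different result.  The function
name is hypothetical.

```c
#include <assert.h>

/*
 * Upper bound, in seconds, on how long a stuck FIN_WAIT_2 connection
 * lingers when it is bounded by 6 MSL.
 */
unsigned probe_bound_seconds(unsigned msl_seconds)
{
    assert(msl_seconds > 0);
    return 6u * msl_seconds;
}
```

With the assumed 30-second MSL, probe_bound_seconds(30) is 180 seconds,
against 660 seconds for the 11 minute timeout.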

Paul Vixie had to do this modification to NetBSD Alpha in order to
handle a very high hit rate of a WWW server by technically broken
Windows clients.


> So what you're really seeing, Mark, is connections in FIN_WAIT_2 state,
> where the web server has closed the socket, but the client has never
> indicated that it is done sending.  These hang around for about 11
> minutes, then disappear.  (Of course, more take their place.)

And according to RFC793, they are not supposed to be timed out at
all, ever, and FreeBSD's implementation is non-conforming for doing
the 11 minute drop.


Being non-conformant is being non-conformant.  It doesn't matter
how you implement the non-conformance.

With Vixie's implementation, at least the non-conformance does not
result in incorrect behaviour for slow clients which are *truly*
intending to send the LAST-ACK, but have not closed, over a period
of more than 11 minutes (FIN_WAIT_2_TIMEOUT).


> The reason for the 11 minute wait is so that if the client is just
> slow going through its shutdown stuff, the server can still walk the
> client through an orderly close.  On the other hand, maybe 11 minutes
> is too long?  I'll bet nothing terrible happens if that timeout drops
> to 1 minute ... or 30 seconds.  Even 0 would at worst lead to some
> unnecessary RST's on closing connections.  Anyone have any thoughts on
> this?

See the algorithm above.  It is the algorithm used by NT Server.


Much as I hate the idea of being bug-compatible with NT, at the very
least there should be a sysctl that acknowledges the fact that most
client machines are Windows boxes with broken TCP implementations.

This is roughly equivalent to the fact that FreeBSD allows RFC1323 &
RFC1644 to be turned off to support non-conforming TCP implementations.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.