From owner-freebsd-net  Tue Apr  6 19:17: 6 1999
Delivered-To: freebsd-net@freebsd.org
Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38])
	by hub.freebsd.org (Postfix) with ESMTP id A587B15252
	for <net@freebsd.org>; Tue,  6 Apr 1999 19:17:02 -0700 (PDT)
	(envelope-from julian@whistle.com)
Received: from current1.whistle.com (current1.whistle.com [207.76.205.22])
	by alpo.whistle.com (8.9.1a/8.9.1) with SMTP id TAA23719
	for <net@freebsd.org>; Tue, 6 Apr 1999 19:12:26 -0700 (PDT)
Date: Tue, 6 Apr 1999 19:12:24 -0700 (PDT)
From: Julian Elischer <julian@whistle.com>
To: net@freebsd.org
Subject: Coping with 1000s of W95 clients.
Message-ID: <Pine.BSF.3.95.990406183258.1119A-100000@current1.whistle.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


The Windows world has given us the wonderful legacy of coping with clients
that have their TCP stack as a program library, so that
if the program crashes or doesn't make the right calls,
the TCP session can be left in a bad state.

The 'canonical example' is Win95 machines that
don't "shutdown()" the TCP session before exiting.
In the following situation you are left with an entry on your server
sitting in FIN_WAIT_2 state.

client: Some action that causes the server to do a 'shutdown()' on its
        socket (e.g. bad input, or "Quit").
server: shutdown() (results in a FIN being sent to the client).
client: Receives and ACKs the FIN (the stack does this, and returns EOF
        to the app).
client: exit() (after all, it's got EOF) (or worse, the user powers it down).
server: Receives the ACK and waits forever in FIN_WAIT_2, hoping to
        get a FIN from the now-defunct process.
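
For anyone who wants to see the half-close from userland, here is a minimal
sketch of the server side of the sequence above. It is an illustration only:
it uses an AF_UNIX socketpair() instead of a real TCP connection (so there is
no FIN_WAIT_2 state to observe), and the function name half_close_demo() is
made up for this example. It shows only the API semantics: after the server's
shutdown() of its send side, the client reads EOF, while the server can still
receive anything the client sends.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo, not part of the patch.
 * Returns 1 if the half-close behaved as described, 0 if not,
 * and -1 if the socket pair could not be created.
 */
int
half_close_demo(void)
{
	int sv[2];
	char buf[8];
	int ok = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);

	/* sv[0] plays the server, sv[1] plays the client. */
	if (shutdown(sv[0], SHUT_WR) == 0 &&		/* server half-closes */
	    read(sv[1], buf, sizeof(buf)) == 0 &&	/* client sees EOF */
	    write(sv[1], "bye", 3) == 3 &&		/* client may still send */
	    read(sv[0], buf, sizeof(buf)) == 3)		/* server still receives */
		ok = 1;

	close(sv[1]);
	close(sv[0]);
	return (ok);
}
```

On a real TCP socket, the same shutdown(fd, SHUT_WR) is what sends the FIN in
the "server:" step, and a client that exits without ever sending its own FIN
is what leaves the entry in FIN_WAIT_2.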


The BSD4.3 hack is to have a (11 minute, 15 second) timeout on FIN_WAIT_2
state **IF THE LOCAL END HAS DONE A FULL CLOSE**. A notable case where
this matters seems to be the APACHE server, which actually does a
'shutdown()' first, rather than a close(), thus making its session immune
to the timeout. Even if it does a close(), the FIN_WAIT_2 state is
held for 11.25 minutes. It is possible for a badly behaved set of clients
to bring a server to its knees in that time by creating tons
of such sessions.
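
As a rough check on where the 11 minutes 15 seconds comes from, here is a
small userland sketch of the stock TCPT_2MSL handling for FIN_WAIT_2. It
assumes the usual BSD defaults (PR_SLOWHZ of 2 ticks per second, a 75 second
keepalive interval and a keepalive count of 8); the function name
stock_fin_wait_2_seconds() is invented for this example, and the loop is a
simplification of tcp_slowtimo(), not the kernel code itself.

```c
#define PR_SLOWHZ	2			/* assumed: slow timer ticks per second */
#define TCPTV_KEEPINTVL	(75 * PR_SLOWHZ)	/* assumed default: 75 seconds */
#define TCPTV_KEEPCNT	8			/* assumed default */

/*
 * Hypothetical simulation: how long does a fully closed session sit in
 * FIN_WAIT_2 if the peer never sends anything again?
 */
int
stock_fin_wait_2_seconds(void)
{
	const int tcp_keepintvl = TCPTV_KEEPINTVL;
	const int tcp_maxidle = TCPTV_KEEPCNT * tcp_keepintvl;
	int t_idle = 0;			/* ticks since the last segment arrived */
	int timer = tcp_maxidle;	/* armed on entering FIN_WAIT_2 */
	int ticks = 0;

	for (;;) {
		ticks++;
		t_idle++;
		if (--timer > 0)
			continue;
		/* Timer fired: same test as the stock TCPT_2MSL case. */
		if (t_idle <= tcp_maxidle)
			timer = tcp_keepintvl;	/* check again in a bit */
		else
			break;			/* tcp_close() */
	}
	return (ticks / PR_SLOWHZ);
}
```

With those assumed defaults the first expiry comes after 600 seconds, the
session gets one more 75 second grace period, and it is closed at 675
seconds, i.e. 11 minutes 15 seconds.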

The actual test for deciding whether to time-out FIN_WAIT_2 is:
if (so->so_state & SS_CANTRCVMORE) {}
which is NOT set by a shutdown() (half-close), only a close().

Anyway the point is that eventually you end up with a LOT of 
sessions stuck in FIN_WAIT_2 state.

The only way to stop this is to break the standard, since waiting there is
the correct behaviour for the server. After a discussion with Paul Vixie
a long time ago, we have written some code to try and reduce this problem.

Sessions to machines that truly 'die' still hang around for a while, but
badly behaved apps will react (in general) to this by sending a RST
(or rather their OS will), giving us permission to purge the session.

Basically, any session that is still in FIN_WAIT_2 after 30 seconds
reverts to FIN_WAIT_1, and resends the FIN. I believe that
this is similar to a fix Paul Vixie mentioned implementing in NetBSD
once.

If the machine has gone south, FIN_WAIT_1 will time out.
If it's still there, but the App is gone, we'll get a RST.
If the App is still alive, we'll get an ACK again.
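
The three cases above can be written down as a tiny decision table. This is
only a sketch to make the intended behaviour explicit; the enum and function
names are made up for the illustration and do not appear in the patch.

```c
/* Hypothetical names, for illustration only. */
enum peer_reply {
	PEER_SILENT,	/* machine has gone south: nothing comes back */
	PEER_RST,	/* OS is up but the app is gone */
	PEER_ACK	/* app is alive and has only half-closed */
};

enum fin_retry_outcome {
	DROPPED_AFTER_FIN_WAIT_1_TIMEOUT,	/* FIN retransmits give up */
	PURGED_ON_RST,				/* RST lets us purge the session */
	BACK_IN_FIN_WAIT_2			/* ACKed again, keep waiting */
};

/*
 * What happens to a session after it has been pushed back from
 * FIN_WAIT_2 to FIN_WAIT_1 and the FIN has been resent.
 */
enum fin_retry_outcome
after_fin_retry(enum peer_reply reply)
{
	switch (reply) {
	case PEER_RST:
		return (PURGED_ON_RST);
	case PEER_ACK:
		return (BACK_IN_FIN_WAIT_2);
	case PEER_SILENT:
	default:
		return (DROPPED_AFTER_FIN_WAIT_1_TIMEOUT);
	}
}
```

Only the PEER_ACK case keeps the session around, and that is the case where
the peer is demonstrably still alive.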

Here's the patch for discussion:
Ignore the 2 cosmetic changes :-)

I wouldn't mind including this as an option, to allow FreeBSD to better
handle 1000s of Usoft clients and stupid users.

julian

Index: tcp_fsm.h
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_fsm.h,v
retrieving revision 1.10
diff -c -r1.10 tcp_fsm.h
*** tcp_fsm.h	1997/08/16 19:15:38	1.10
--- tcp_fsm.h	1999/04/07 01:29:22
***************
*** 71,80 ****
   * if all data queued for output is included in the segment.
   */
  static u_char	tcp_outflags[TCP_NSTATES] = {
!     TH_RST|TH_ACK, 0, TH_SYN, TH_SYN|TH_ACK,
!     TH_ACK, TH_ACK,
!     TH_FIN|TH_ACK, TH_ACK, TH_FIN|TH_ACK, TH_ACK, TH_ACK,
! };
  #endif
  
  #ifdef KPROF
--- 71,88 ----
   * if all data queued for output is included in the segment.
   */
  static u_char	tcp_outflags[TCP_NSTATES] = {
! 	TH_RST|TH_ACK,		/* 0, CLOSED */
! 	0,			/* 1, LISTEN */
! 	TH_SYN,			/* 2, SYN_SENT */
! 	TH_SYN|TH_ACK,		/* 3, SYN_RECEIVED */
! 	TH_ACK,			/* 4, ESTABLISHED */
! 	TH_ACK,			/* 5, CLOSE_WAIT */
! 	TH_FIN|TH_ACK,		/* 6, FIN_WAIT_1 */
! 	TH_ACK,			/* 7, CLOSING */
! 	TH_FIN|TH_ACK,		/* 8, LAST_ACK */
! 	TH_ACK,			/* 9, FIN_WAIT_2 */
! 	TH_ACK,			/* 10, TIME_WAIT */
! };	
  #endif
  
  #ifdef KPROF
Index: tcp_input.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.84
diff -c -r1.84 tcp_input.c
*** tcp_input.c	1999/02/06 00:47:45	1.84
--- tcp_input.c	1999/04/07 01:29:22
***************
*** 1495,1504 ****
--- 1495,1526 ----
  				 * specification, but if we don't get a FIN
  				 * we'll hang forever.
  				 */
+ #ifdef TCP_USOFT_BUG
+ 				/*
+ 				 * Wait longer and longer for the other
+ 				 * end to respond with something.
+ 				 * Eventually they should either
+ 				 * RST or FIN. If they are still alive
+ 				 * and actually want us to remain in this
+ 				 * state, they will keep ACKing and
+ 				 * we'll stay here indefinitly.
+ 				 * If they don't respond at all, we will
+ 				 * revert to FIN_WAIT_1 and eventually
+ 				 * time out as it would. How to cope with
+ 				 * the case of broken clients who are still
+ 				 * alive but never FIN is arguable. Certainly
+ 				 * if we've closed our end entirely, we
+ 				 * might as well just close the connection.
+ 				 */
+ 				tp->t_timer[TCPT_2MSL] = 
+ 					((tp->t_idle > TCPTV_MSL) ?
+ 					tp->t_idle : TCPTV_MSL);
+ #else	/* TCP_USOFT_BUG */
  				if (so->so_state & SS_CANTRCVMORE) {
  					soisdisconnected(so);
  					tp->t_timer[TCPT_2MSL] = tcp_maxidle;
  				}
+ #endif	/* TCP_USOFT_BUG */
  				tp->t_state = TCPS_FIN_WAIT_2;
  			}
  			break;
Index: tcp_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.32
diff -c -r1.32 tcp_output.c
*** tcp_output.c	1999/01/20 17:31:59	1.32
--- tcp_output.c	1999/04/07 01:29:23
***************
*** 128,134 ****
  			 * clear the FIN bit.  Usually this would
  			 * happen below when it realizes that we
  			 * aren't sending all the data.  However,
! 			 * if we have exactly 1 byte of unset data,
  			 * then it won't clear the FIN bit below,
  			 * and if we are in persist state, we wind
  			 * up sending the packet without recording
--- 128,134 ----
  			 * clear the FIN bit.  Usually this would
  			 * happen below when it realizes that we
  			 * aren't sending all the data.  However,
! 			 * if we have exactly 1 byte of unsent data,
  			 * then it won't clear the FIN bit below,
  			 * and if we are in persist state, we wind
  			 * up sending the packet without recording
Index: tcp_timer.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_timer.c,v
retrieving revision 1.28
diff -c -r1.28 tcp_timer.c
*** tcp_timer.c	1998/04/24 09:25:35	1.28
--- tcp_timer.c	1999/04/07 01:29:23
***************
*** 213,222 ****
  	 * control block.  Otherwise, check again in a bit.
  	 */
  	case TCPT_2MSL:
! 		if (tp->t_state != TCPS_TIME_WAIT &&
! 		    tp->t_idle <= tcp_maxidle)
! 			tp->t_timer[TCPT_2MSL] = tcp_keepintvl;
! 		else
  			tp = tcp_close(tp);
  		break;
  
--- 213,245 ----
  	 * control block.  Otherwise, check again in a bit.
  	 */
  	case TCPT_2MSL:
! 		if (tp->t_state != TCPS_TIME_WAIT
! 		&& tp->t_idle <= tcp_maxidle) {
! #ifdef TCP_USOFT_BUG
! 			if (tp->t_state == TCPS_FIN_WAIT_2) {
! 				/*
! 				 * We've timed out waiting for the other end 
! 				 * to finish up. Quite possibly it's a Win9x
! 				 * machine.
! 				 * If so we could be waiting here forever.
! 				 * Pretend we were never ack'd and reset
! 				 * ourselves to a retry of FIN_WAIT_1. If
! 				 * it's still alive, this should at least
! 				 * elicit a RST from it which
! 				 * will let us know we can shut down.
! 				 * If it has only done a half close,
! 				 * it'll ACK our retries so we'll
! 				 * keep waiting in FIN_WAIT_2.
! 				 * If it's dead, we'll time out.
! 				 */
! 				tp->t_state = TCPS_FIN_WAIT_1;
! 				tp->t_flags &= ~TF_SENTFIN;
! 				tp->snd_una = (tp->snd_nxt -= 1);
! 				tcp_output(tp);
! 			} else
! #endif
! 				tp->t_timer[TCPT_2MSL] = tcp_keepintvl;
! 		} else
  			tp = tcp_close(tp);
  		break;

xxxxxxxxxxend of patchxxxxxx

  