Date: Tue, 6 Apr 1999 19:12:24 -0700 (PDT)
From: Julian Elischer
To: net@freebsd.org
Subject: Coping with 1000s of W95 clients.

The Windows world has given us the wonderful legacy of coping with clients that have their TCP stack as a program library. If the program crashes or doesn't make the right calls, the TCP session can be left in a bad state. The 'canonical example' is Win95 machines that don't "shutdown()" the TCP session before exiting. In the following situation you are left with an entry on your server sitting in FIN_WAIT_2 state:

client: Some action that causes the server to do a 'shutdown()' on its socket (e.g. bad input, or "Quit").
server: shutdown() (results in a FIN sent to the client).
client: Receives and ACKs the FIN (the stack does this, and returns EOF to the app).
client: exit() (after all, it's got EOF), or worse, the user powers it down.
server: Receives the ACK and waits forever in FIN_WAIT_2, hoping to get a FIN from the now defunct process.

The 4.3BSD hack is to have an 11 minute, 15 second timeout on FIN_WAIT_2 state **IF THE LOCAL END HAS DONE A FULL CLOSE**. A notable example of a server that falls foul of this seems to be Apache, which actually does a 'shutdown()' first rather than a close(), thus making its session immune to the timeout. Even if it does a close(), the FIN_WAIT_2 state is held for 11.25 minutes.
It is possible for a badly behaved set of clients to bring a server to its knees in those 11.25 minutes by creating tons of such sessions. The actual test for deciding whether to time out FIN_WAIT_2 is:

	if (so->so_state & SS_CANTRCVMORE) {}

which is NOT set by a shutdown() (half-close), only by a close(). Anyway, the point is that eventually you end up with a LOT of sessions stuck in FIN_WAIT_2 state. The only way to stop this is to break the standard, since waiting in FIN_WAIT_2 is actually the correct behaviour for the server.

After a discussion with Paul Vixie a long time ago, we have written some code to try to reduce this problem. Machines that truly 'die' still wait around for a while, but badly behaved apps will (in general) react to this by sending a RST (or rather their OS will), giving us permission to purge the session. Basically, any session that is still in FIN_WAIT_2 after 30 seconds reverts to FIN_WAIT_1 and resends the FIN. I believe that this is similar to a fix Paul Vixie mentioned implementing in NetBSD once.

If the machine has gone south, FIN_WAIT_1 will time out.
If it's still there, but the app is gone, we'll get a RST.
If the app is still alive, we'll get an ACK again.

Here's the patch for discussion. Ignore the 2 cosmetic changes :-)

I wouldn't mind including this as an option, to allow FreeBSD to better handle 1000s of Usoft clients and stupid users.

julian

Index: tcp_fsm.h
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_fsm.h,v
retrieving revision 1.10
diff -c -r1.10 tcp_fsm.h
*** tcp_fsm.h	1997/08/16 19:15:38	1.10
--- tcp_fsm.h	1999/04/07 01:29:22
***************
*** 71,80 ****
   * if all data queued for output is included in the segment.
   */
  static u_char	tcp_outflags[TCP_NSTATES] = {
! 	TH_RST|TH_ACK, 0, TH_SYN, TH_SYN|TH_ACK,
! 	TH_ACK, TH_ACK,
! 	TH_FIN|TH_ACK, TH_ACK, TH_FIN|TH_ACK, TH_ACK, TH_ACK,
! };
  #endif
  
  #ifdef KPROF
--- 71,88 ----
   * if all data queued for output is included in the segment.
   */
  static u_char	tcp_outflags[TCP_NSTATES] = {
! 	TH_RST|TH_ACK,		/* 0, CLOSED */
! 	0,			/* 1, LISTEN */
! 	TH_SYN,			/* 2, SYN_SENT */
! 	TH_SYN|TH_ACK,		/* 3, SYN_RECEIVED */
! 	TH_ACK,			/* 4, ESTABLISHED */
! 	TH_ACK,			/* 5, CLOSE_WAIT */
! 	TH_FIN|TH_ACK,		/* 6, FIN_WAIT_1 */
! 	TH_ACK,			/* 7, CLOSING */
! 	TH_FIN|TH_ACK,		/* 8, LAST_ACK */
! 	TH_ACK,			/* 9, FIN_WAIT_2 */
! 	TH_ACK,			/* 10, TIME_WAIT */
! };
  #endif
  
  #ifdef KPROF
Index: tcp_input.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.84
diff -c -r1.84 tcp_input.c
*** tcp_input.c	1999/02/06 00:47:45	1.84
--- tcp_input.c	1999/04/07 01:29:22
***************
*** 1495,1504 ****
--- 1495,1526 ----
  				 * specification, but if we don't get a FIN
  				 * we'll hang forever.
  				 */
+ #ifdef TCP_USOFT_BUG
+ 				/*
+ 				 * Wait longer and longer for the other
+ 				 * end to respond with something.
+ 				 * Eventually they should either
+ 				 * RST or FIN. If they are still alive
+ 				 * and actually want us to remain in this
+ 				 * state, they will keep ACKing and
+ 				 * we'll stay here indefinitely.
+ 				 * If they don't respond at all, we will
+ 				 * revert to FIN_WAIT_1 and eventually
+ 				 * time out as it would. How to cope with
+ 				 * the case of broken clients who are still
+ 				 * alive but never FIN is arguable. Certainly
+ 				 * if we've closed our end entirely, we
+ 				 * might as well just close the connection.
+ 				 */
+ 				tp->t_timer[TCPT_2MSL] =
+ 				    ((tp->t_idle > TCPTV_MSL) ?
+ tp->t_idle : TCPTV_MSL); + #else /* TCP_USOFT_BUG */ if (so->so_state & SS_CANTRCVMORE) { soisdisconnected(so); tp->t_timer[TCPT_2MSL] = tcp_maxidle; } + #endif /* TCP_USOFT_BUG */ tp->t_state = TCPS_FIN_WAIT_2; } break; Index: tcp_output.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v retrieving revision 1.32 diff -c -r1.32 tcp_output.c *** tcp_output.c 1999/01/20 17:31:59 1.32 --- tcp_output.c 1999/04/07 01:29:23 *************** *** 128,134 **** * clear the FIN bit. Usually this would * happen below when it realizes that we * aren't sending all the data. However, ! * if we have exactly 1 byte of unset data, * then it won't clear the FIN bit below, * and if we are in persist state, we wind * up sending the packet without recording --- 128,134 ---- * clear the FIN bit. Usually this would * happen below when it realizes that we * aren't sending all the data. However, ! * if we have exactly 1 byte of unsent data, * then it won't clear the FIN bit below, * and if we are in persist state, we wind * up sending the packet without recording Index: tcp_timer.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_timer.c,v retrieving revision 1.28 diff -c -r1.28 tcp_timer.c *** tcp_timer.c 1998/04/24 09:25:35 1.28 --- tcp_timer.c 1999/04/07 01:29:23 *************** *** 213,222 **** * control block. Otherwise, check again in a bit. */ case TCPT_2MSL: ! if (tp->t_state != TCPS_TIME_WAIT && ! tp->t_idle <= tcp_maxidle) ! tp->t_timer[TCPT_2MSL] = tcp_keepintvl; ! else tp = tcp_close(tp); break; --- 213,245 ---- * control block. Otherwise, check again in a bit. */ case TCPT_2MSL: ! if (tp->t_state != TCPS_TIME_WAIT ! && tp->t_idle <= tcp_maxidle) { ! #ifdef TCP_USOFT_BUG ! if (tp->t_state == TCPS_FIN_WAIT_2) { ! /* ! * We've timed out waiting for the other end ! * to finish up. Quite possibly it's a Win9x ! * machine. ! 
! 				 * If so we could be waiting here forever.
! 				 * Pretend we were never ack'd and reset
! 				 * ourselves to a retry of FIN_WAIT_1. If
! 				 * it's still alive, this should at least
! 				 * elicit a RST from it which
! 				 * will let us know we can shut down.
! 				 * If it has only done a half close,
! 				 * it'll ACK our retries so we'll
! 				 * keep waiting in FIN_WAIT_2.
! 				 * If it's dead, we'll time out.
! 				 */
! 				tp->t_state = TCPS_FIN_WAIT_1;
! 				tp->t_flags &= ~TF_SENTFIN;
! 				tp->snd_una = (tp->snd_nxt -= 1);
! 				tcp_output(tp);
! 			} else
! #endif
! 			tp->t_timer[TCPT_2MSL] = tcp_keepintvl;
! 		} else
  			tp = tcp_close(tp);
  		break;
  
xxxxxxxxxxend of patchxxxxxx