From owner-freebsd-net Thu Apr 8 18:28:14 1999
Delivered-To: freebsd-net@freebsd.org
Received: from kaa.kfunigraz.ac.at (KAA-ATM.kfunigraz.ac.at [143.50.202.22])
	by hub.freebsd.org (Postfix) with ESMTP id C802915A2E
	for ; Thu, 8 Apr 1999 18:25:17 -0700 (PDT)
	(envelope-from dada@balu.kfunigraz.ac.at)
Received: from balu.kfunigraz.ac.at (balu [143.50.16.16])
	by kaa.kfunigraz.ac.at (8.9.2/8.9.2) with ESMTP id DAA28327;
	Fri, 9 Apr 1999 03:23:04 +0200 (MDT)
Received: from localhost.kfunigraz.ac.at (IDENT:K6yQVU3oi7a80kZKgw8vcB5njAxIJYDv@BONLINEA21.kfunigraz.ac.at [143.50.36.21])
	by balu.kfunigraz.ac.at (8.9.2/8.9.2) with ESMTP id DAA29029;
	Fri, 9 Apr 1999 03:23:00 +0200 (MDT)
Received: from localhost (PYU6vulXczw+uIp/3lVytxrEL7TTHQYV@localhost.kfunigraz.ac.at [127.0.0.1])
	by localhost.kfunigraz.ac.at (8.8.8/8.8.8) with SMTP id DAA00758;
	Fri, 9 Apr 1999 03:18:25 +0200 (CEST)
	(envelope-from dada@localhost.kfunigraz.ac.at)
Date: Fri, 9 Apr 1999 03:18:23 +0200 (CEST)
From: Martin Kammerhofer
Reply-To: Martin Kammerhofer
To: Julian Elischer
Cc: freebsd-net@FreeBSD.ORG
Subject: Re: Coping with 1000s of W95 clients.
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 6 Apr 1999, Julian Elischer wrote:

>
> The 'canonical example' is Win95 machines that
> don't "shutdown()" the TCP session before exiting.
> In the following situation you are left with an entry on your server
> sitting in FIN_WAIT_2 state.
>

Even FreeBSD boxes are causing this :(. I see this every day on my
home box where a local apache and a netscape browser are running.
HTTP 1.1 introduced keepalive connections where the server keeps the
connection open for some 15 sec after servicing the request(s). The
idea was to cut down on connection setup overhead.
If a client has to open a new TCP connection for each of the dozens of
inlined GIFs contained in a page (as was the case with HTTP 1.0),
performance will suffer.

Now when the server closes the connection after no further request
has come in for 15 sec, a FIN is sent and acknowledged by the
browser's OS. After that the server's TCP is in FIN_WAIT_2 state and
the browser's is in CLOSE_WAIT. If the browser periodically checked
its sockets, read zero length from them, noticed that the server had
closed and closed the socket too, all would be fine. Unfortunately
browsers like Netscape (at least up to 4.08) just sit idle until the
user accesses her next webpage - maybe idling for days!

The problem is well known; those who have installed apache from the
FreeBSD ports collection can read about it in
file:/usr/local/share/doc/apache/manual/misc/fin_wait_2.html .

>
> The BSD4.3 hack is to have a (11 minute, 15 second) timeout on FIN_WAIT_2
> state **IF THE LOCAL END HAS DONE A FULL CLOSE**. A notable example of

If an application really does shut down _only_ the socket's output
side with shutdown(socket,how=1) then it wants to keep the socket
open for further reads! Timing out a half duplex connection is plain
wrong.

The timeout is actually

	tcp_maxidle = tcp_keepcnt * tcp_keepintvl;

Keepcnt is 8 (hard coded) and keepintvl is settable by sysctl
(net.inet.tcp.keepintvl). Eight times the default keepintvl of 150 is
1200 ticks, or 10 minutes. (Those TCP timers run at 2 Hz.) Because
 - the idle time counter is incremented _after_ the timer is run,
 - and the condition for waiting another 75 sec (=keepintvl) is
   ``tp->t_idle <= tcp_maxidle'' instead of
   ``tp->t_idle < tcp_maxidle''
another keepintvl is added, so it's 9 * 75 = 675 sec total.

>
> The only way to stop this is to break the standard, as this would be

FreeBSD's 675 sec timeout on FIN_WAIT_2 is already breaking RFC 793.

> Basically, any session that is still in FIN_WAIT_2 after 30 seconds
> reverts to FIN_WAIT_1, and resends the FIN.
> I believe that
> this is similar to a fix Paul Vixie mentioned implementing in NetBSD
> once.
>

I don't think this is a good solution. Retransmitting the FIN
certainly doesn't break the spec, but it won't help much. If there
were dead browsers ``on the other side'' of all those annoying
FIN_WAIT_2 sockets, then retransmitting and getting ACKs or RSTs
would certainly help. But in most cases there is a browser just
waiting for user actions! Resending the FIN would accomplish nothing
in this case. The browser's TCP stack would reacknowledge the FIN and
continue hanging around in CLOSE_WAIT. I guess the percentage of
cases where you get a RST or ICMP is quite low and not worth the
increase in net traffic.

The easy solution to shorten FIN_WAIT_2 is simple:

	sysctl -w net.inet.tcp.keepintvl=27

This would give you a FIN_WAIT_2 timeout of 9 * 27 / 2 = 121.5 sec.
This should cut down the number of FW2 sockets by a factor of
675/121 = 5.6 .

The only drawback to this solution is that keepalives won't work
reliably any more. (I'm referring to the transport layer keepalives
here, _not_ the HTTP 1.1 application layer keepalives.) After 2 hours
of idle time there would be only a 2 min time window to respond
before a keepalive connection is dropped.

This leads me to solution 2: just introduce a configurable timeout
for idle finwait2 sockets. This is a quite small change and less
intrusive than your suggestion. Those who have so many hits that
finwait2 sockets pile up could just lower the finwait2 timeout.
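
To make the timeout arithmetic above easy to check, here is a small
user-space sketch (not kernel code). The function names
fw2_timeout_ticks and fw2_timeout_seconds are made up for this
illustration; the constants are the ones quoted above (slow timer at
2 Hz, keepcnt hard coded to 8), and the model assumes the unpatched
behaviour where one extra keepintvl gets added on top of tcp_maxidle.

```c
/*
 * Hypothetical user-space model of the FIN_WAIT_2 timeout arithmetic.
 * It only reproduces the numbers discussed in the text; it is not
 * taken from the kernel sources.
 */
#define PR_SLOWHZ	2	/* slow timer ticks per second */
#define TCPTV_KEEPCNT	8	/* hard coded number of keepalive probes */

/*
 * Ticks until an orphaned FIN_WAIT_2 socket is dropped, assuming the
 * unpatched code: tcp_maxidle plus one additional keepintvl.
 * keepintvl is given in slow timer ticks, as in the sysctl.
 */
int
fw2_timeout_ticks(int keepintvl)
{
	int tcp_maxidle = TCPTV_KEEPCNT * keepintvl;

	return (tcp_maxidle + keepintvl);
}

/* The same timeout converted from ticks to seconds. */
double
fw2_timeout_seconds(int keepintvl)
{
	return ((double)fw2_timeout_ticks(keepintvl) / PR_SLOWHZ);
}
```

With the default keepintvl of 150 this yields 675 sec, and with
keepintvl=27 it yields 121.5 sec, matching the figures above.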

Martin

Index: netinet/tcp_input.c
===================================================================
RCS file: /home/dada/cvsroot/src/netinet/tcp_input.c,v
retrieving revision 1.3
diff -u -u -r1.3 tcp_input.c
--- tcp_input.c	1999/04/06 19:28:25	1.3
+++ tcp_input.c	1999/04/08 22:17:58
@@ -1496,7 +1496,7 @@
 				 */
 				if (so->so_state & SS_CANTRCVMORE) {
 					soisdisconnected(so);
-					tp->t_timer[TCPT_2MSL] = tcp_maxidle;
+					tp->t_timer[TCPT_2MSL] = tcp_finwait2idle;
 				}
 				tp->t_state = TCPS_FIN_WAIT_2;
 			}
Index: netinet/tcp_timer.c
===================================================================
RCS file: /home/dada/cvsroot/src/netinet/tcp_timer.c,v
retrieving revision 1.4
diff -u -u -r1.4 tcp_timer.c
--- tcp_timer.c	1999/04/08 12:15:01	1.4
+++ tcp_timer.c	1999/04/08 23:34:49
@@ -85,6 +85,10 @@
 SYSCTL_INT(_net_inet_tcp, TCPCTL_KEEPINTVL, keepintvl,
 	CTLFLAG_RW, &tcp_keepintvl , 0, "");
 
+int	tcp_finwait2idle = TCPTV_FINWAIT2IDLE;
+SYSCTL_INT(_net_inet_tcp, TCPCTL_FINWAIT2IDLE, finwait2idle,
+	CTLFLAG_RW, &tcp_finwait2idle , 0, "");
+
 static int always_keepalive = 0;
 SYSCTL_INT(_net_inet_tcp, OID_AUTO, always_keepalive,
 	CTLFLAG_RW, &always_keepalive , 0, "");
@@ -162,6 +166,10 @@
 		tp = intotcpcb(ip);
 		if (tp == 0 || tp->t_state == TCPS_LISTEN)
 			continue;
+		tp->t_idle++;
+		tp->t_duration++;
+		if (tp->t_rtt)
+			tp->t_rtt++;
 		for (i = 0; i < TCPT_NTIMERS; i++) {
 			if (tp->t_timer[i] && --tp->t_timer[i] == 0) {
 #ifdef TCPDEBUG
@@ -180,10 +188,6 @@
 #endif
 			}
 		}
-		tp->t_idle++;
-		tp->t_duration++;
-		if (tp->t_rtt)
-			tp->t_rtt++;
 tpgone:
 		;
 	}
@@ -235,10 +239,13 @@
 	 */
 	case TCPT_2MSL:
 		if (tp->t_state != TCPS_TIME_WAIT &&
-		    tp->t_idle <= tcp_maxidle)
+		    tp->t_idle < tcp_finwait2idle)
 			tp->t_timer[TCPT_2MSL] = tcp_keepintvl;
-		else
+		else {
+			if (tp->t_state == TCPS_FIN_WAIT_2)
+				tcpstat.tcps_finwait2drops++;
 			tp = tcp_close(tp);
+		}
 		break;
 
 	/*
Index: netinet/tcp_timer.h
===================================================================
RCS file: /home/dada/cvsroot/src/netinet/tcp_timer.h,v
retrieving revision 1.1
diff -u -u -r1.1 tcp_timer.h
--- tcp_timer.h	1999/04/02 01:15:25	1.1
+++ tcp_timer.h	1999/04/08 22:13:15
@@ -101,6 +101,8 @@
 #define	TCPTV_KEEPINTVL	( 75*PR_SLOWHZ)		/* default probe interval */
 #define	TCPTV_KEEPCNT	8			/* max probes before drop */
 
+#define	TCPTV_FINWAIT2IDLE	( 120*PR_SLOWHZ)	/* max idle time in FINWAIT2 */
+
 #define	TCPTV_MIN	( 1*PR_SLOWHZ)		/* minimum allowable value */
 #define	TCPTV_REXMTMAX	( 64*PR_SLOWHZ)		/* max allowable REXMT value */
 
@@ -129,6 +131,8 @@
 #ifdef KERNEL
 extern int tcp_keepinit;		/* time to establish connection */
 extern int tcp_keepidle;		/* time before keepalive probes begin */
+extern int tcp_finwait2idle;		/* idle time until drop in FIN_WAIT_2 */
+
 extern int tcp_keepintvl;		/* time between keepalive probes */
 extern int tcp_maxidle;			/* time to drop after starting probes */
 extern int tcp_ttl;			/* time to live for TCP segs */
Index: netinet/tcp_usrreq.c
===================================================================
RCS file: /home/dada/cvsroot/src/netinet/tcp_usrreq.c,v
retrieving revision 1.2
diff -u -u -r1.2 tcp_usrreq.c
--- tcp_usrreq.c	1999/04/04 22:17:54	1.2
+++ tcp_usrreq.c	1999/04/08 22:16:52
@@ -833,7 +833,7 @@
 		soisdisconnected(tp->t_inpcb->inp_socket);
 		/* To prevent the connection hanging in FIN_WAIT_2 forever. */
 		if (tp->t_state == TCPS_FIN_WAIT_2)
-			tp->t_timer[TCPT_2MSL] = tcp_maxidle;
+			tp->t_timer[TCPT_2MSL] = tcp_finwait2idle;
 	}
 	return (tp);
 }
Index: netinet/tcp_var.h
===================================================================
RCS file: /home/dada/cvsroot/src/netinet/tcp_var.h,v
retrieving revision 1.2
diff -u -u -r1.2 tcp_var.h
--- tcp_var.h	1999/04/04 22:17:54	1.2
+++ tcp_var.h	1999/04/08 23:32:18
@@ -248,6 +248,7 @@
 	u_long	tcps_keeptimeo;		/* keepalive timeouts */
 	u_long	tcps_keepprobe;		/* keepalive probes sent */
 	u_long	tcps_keepdrops;		/* connections dropped in keepalive */
+	u_long	tcps_finwait2drops;	/* connections dropped in finwait2 */
 
 	u_long	tcps_sndtotal;		/* total packets sent */
 	u_long	tcps_sndpack;		/* data packets sent */
@@ -310,7 +311,8 @@
 #define	TCPCTL_SENDSPACE	8	/* send buffer space */
 #define	TCPCTL_RECVSPACE	9	/* receive buffer space */
 #define	TCPCTL_KEEPINIT		10	/* receive buffer space */
-#define	TCPCTL_MAXID		11
+#define	TCPCTL_FINWAIT2IDLE	11	/* max idle time in FIN_WAIT_2 state */
+#define	TCPCTL_MAXID		12
 
 #define TCPCTL_NAMES { \
 	{ 0, 0 }, \
@@ -324,6 +326,7 @@
 	{ "sendspace", CTLTYPE_INT }, \
 	{ "recvspace", CTLTYPE_INT }, \
 	{ "keepinit", CTLTYPE_INT }, \
+	{ "finwait2idle", CTLTYPE_INT }, \
 }
 
 #ifdef KERNEL
Index: netstat/inet.c
===================================================================
RCS file: /home/dada/cvsroot/src/netstat/inet.c,v
retrieving revision 1.1
diff -u -u -r1.1 inet.c
--- inet.c	1999/04/08 23:38:57	1.1
+++ inet.c	1999/04/08 23:42:27
@@ -253,6 +253,7 @@
 	p(tcps_keeptimeo, "\t%lu keepalive timeout%s\n");
 	p(tcps_keepprobe, "\t\t%lu keepalive probe%s sent\n");
 	p(tcps_keepdrops, "\t\t%lu connection%s dropped by keepalive\n");
+	p(tcps_finwait2drops, "\t%lu connection%s dropped in finwait2\n");
 	p(tcps_predack, "\t%lu correct ACK header prediction%s\n");
 	p(tcps_preddat, "\t%lu correct data packet header prediction%s\n");
 #undef p


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message