Date:      Thu, 8 Apr 1999 19:34:59 -0700 (PDT)
From:      Julian Elischer <julian@whistle.com>
To:        Martin Kammerhofer <dada@sbox.tu-graz.ac.at>
Cc:        freebsd-net@FreeBSD.ORG
Subject:   Re: Coping with 1000s of W95 clients.
Message-ID:  <Pine.BSF.3.95.990408185635.4355E-100000@current1.whistle.com>
In-Reply-To: <Pine.BSF.3.96.990409015507.558A-100000@localhost.kfunigraz.ac.at>

Thanks for taking the time to respond.

On Fri, 9 Apr 1999, Martin Kammerhofer wrote:

> On Tue, 6 Apr 1999, Julian Elischer wrote:
> 
> > 
> > The 'canonical example' is Win95 machines that 
> > don't "shutdown()" the TCP session before exiting.
> > In the following situation you are left with an entry on your server
> > sitting in FIN_WAIT_2 state.
> > 
> Even FreeBSD boxes are causing this :(. I see this every day on my home
> box where a local apache and a netscape browser are running. HTTP 1.1
> introduced keepalive connections where the server keeps the connection
> open for some 15 sec after servicing the request(s). The idea was to cut
> down on connection setup overhead. If a client has to open a new TCP
> connection for each of dozens of inlined GIFs contained in a page (as was
> the case with HTTP 1.0), performance will suffer.

Yes, I understand this, but the server will not initiate a shutdown,
so the socket is not in FIN_WAIT_2 at that time.


> Now when the server closes the
> connection after no further request came in for 15sec a FIN will be sent
> and acknowledged from the browser's OS. After that the server's TCP is in
> FIN_WAIT_2 state and the browser's in CLOSE_WAIT.

This is unfortunately OK, and indistinguishable from the case where
rsh (for example) has closed the input to a remote 'sort' (to allow it to
start sorting) and the remote 'sort' is busy doing stuff, but will
(eventually) respond with all the sorted data (assuming for some reason
you wanted to sort on another machine with 300 times as much RAM or
something).
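
The rsh/sort case above is an ordinary half-close. A minimal sketch of it
follows. It assumes a local AF_UNIX socket pair as a stand-in for the TCP
connection, so it shows only the API behaviour (no FIN_WAIT_2 state is
involved), and half_close_demo() is a made-up name for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Half-close demo on a local socket pair (standing in for a TCP session).
 * Returns the number of reply bytes the "client" reads after it has shut
 * down its sending direction, or -1 on error.
 */
static int half_close_demo(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* "Client" sends its input, then closes only its sending direction. */
    if (write(sv[0], "c\nb\na\n", 6) != 6)
        goto out;
    if (shutdown(sv[0], SHUT_WR) < 0)
        goto out;

    /* "Server" reads until EOF; that is how it learns the input is complete. */
    while ((n = read(sv[1], buf, sizeof(buf))) > 0)
        ;
    if (n < 0)
        goto out;

    /* The server can still answer on the half-open connection. */
    if (write(sv[1], "a\nb\nc\n", 6) != 6)
        goto out;

    /* The client can still read although it can no longer write. */
    n = read(sv[0], buf, sizeof(buf));
    result = (int)n;
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

On TCP, the end that called shutdown() is the one that would sit in
FIN_WAIT_2 while it waits for the reply, which is why the kernel cannot
tell this case apart from an idle browser.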

The fact that the browser is misbehaving is not a problem for the
server OS in this case. Apache actually waits a few seconds and then does
a full close() on the socket, so in that case a timeout is valid,
but not in the case where the client may want to send more data.




> If the browser would
> periodically check its sockets, read zero length from them, notice that
> the server closed, and close the socket too, all would be fine.
> Unfortunately browsers like Netscape (at least up to 4.08) just sit idle
> until the user accesses her next webpage - maybe idling for days!


I agree that this is a bad client, but it's indistinguishable from 
a good client. The server however knows that it can do a close() and 
force a timeout. So if it doesn't, it's the server's fault as well :-)
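
A rough sketch of what "the server can do a close()" might look like in
user code: half-close, wait a bounded time for the peer's EOF, then do the
full close() so that the kernel's FIN_WAIT_2 timeout applies. This is not
Apache's actual code. bounded_close() and its parameters are hypothetical,
and the wait is an idle timeout per select() call, not a total deadline.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: shut down our sending side, wait up to `secs`
 * seconds of silence for the peer to close its side, then fully close.
 * Returns 1 if the peer's EOF was seen, 0 if we gave up waiting,
 * -1 if shutdown() failed. The descriptor is closed in every case.
 */
static int bounded_close(int fd, int secs)
{
    char buf[512];
    int peer_closed = 0;

    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }

    for (;;) {
        struct timeval tv;
        fd_set rfds;
        ssize_t n;

        /* Re-arm each time: some systems modify tv, others do not. */
        tv.tv_sec = secs;
        tv.tv_usec = 0;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
            break;                      /* timed out or failed: give up */

        n = read(fd, buf, sizeof(buf));
        if (n == 0) {                   /* EOF: peer closed its side */
            peer_closed = 1;
            break;
        }
        if (n < 0)
            break;                      /* read error: give up */
        /* n > 0: late data from the peer, discard and keep waiting */
    }

    /* Full close: from here on the socket may be timed out by the kernel. */
    close(fd);
    return peer_closed;
}
```

A peer that keeps sending data would keep this loop alive, so a real
server would probably also cap the total time spent here.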


> The problem is well known, those having installed apache from the FreeBSD
> ports collection can read about it in
> file:/usr/local/share/doc/apache/manual/misc/fin_wait_2.html .
> 

I don't have this, unfortunately.

> > 
> > The BSD4.3 hack is to have a (11 minute, 15 second) timeout on FIN_WAIT_2
> > state **IF THE LOCAL END HAS DONE A FULL CLOSE**. A notable example of 
> 
> If an application really does shut down _only_ the socket's output
> side with shutdown(socket,how=1) then it wants to keep the socket
> open for further reads! Timing out a half duplex connection is plain
> wrong.

unless the far end has actually crashed or closed quietly.
A keepalive that elicits a RST or an ACK would isolate those cases.
(i.e. resend the FIN and see what comes back)

> 
> The timeout is actually tcp_maxidle = tcp_keepcnt * tcp_keepintvl;
> Keepcnt is 8 (hard coded) and keepintvl is settable by sysctl
> (net.inet.tcp.keepintvl). Eight times the default keepintvl of 150
> is 1200 ticks, or 10 minutes. (Those TCP timers run at 2 Hz.)


yes I read the code too :-)


> Because
>  - the idle time counter is incremented _after_ the timer is run,
>  - and the condition for waiting another 75sec (=keepintvl) is
>    ``tp->t_idle<= tcp_maxidle'' instead of ``tp->t_idle < tcp_maxidle''
> another keepintvl is added so it's 9 * 75 = 675sec total.

(11 min 15 secs :-)
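
The off-by-one-interval effect quoted above can be checked with a small
user-space model of the slow timer. This is a simplified sketch, not the
kernel code: it assumes t_idle starts at 0 when the 2MSL timer is armed
with tcp_maxidle, and the function name and its two flags are invented
for the illustration.

```c
#define PR_SLOWHZ 2                     /* slow-timer ticks per second */
#define KEEPINTVL (75 * PR_SLOWHZ)      /* 150 ticks, the quoted default */
#define KEEPCNT   8
#define MAXIDLE   (KEEPCNT * KEEPINTVL) /* 1200 ticks */

/*
 * Model of the FIN_WAIT_2 drop logic. Returns the tick at which the
 * connection is closed.
 *   idle_before_timers: 0 = t_idle incremented after the timers run,
 *                       1 = incremented before the timers run
 *   strict_compare:     0 = re-arm while t_idle <= MAXIDLE,
 *                       1 = re-arm while t_idle <  MAXIDLE
 */
static int ticks_until_drop(int idle_before_timers, int strict_compare)
{
    int t_idle = 0;
    int timer = MAXIDLE;
    int tick;

    for (tick = 1; ; tick++) {
        if (idle_before_timers)
            t_idle++;
        if (--timer == 0) {
            int rearm = strict_compare ? (t_idle < MAXIDLE)
                                       : (t_idle <= MAXIDLE);
            if (!rearm)
                return tick;            /* connection dropped here */
            timer = KEEPINTVL;          /* wait one more interval */
        }
        if (!idle_before_timers)
            t_idle++;
    }
}
```

In this model ticks_until_drop(0, 0) gives 1350 ticks (675 sec), and only
changing both the increment order and the comparison brings it down to
1200 ticks (600 sec). Changing just one of the two still yields 1350.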

> 
> >
> > The only way to stop this is to break the standard, as this would be
> 
> FreeBSD's 675sec timeout on FIN_WAIT_2 is already breaking RFC 793.
> 
> > Basically, any session that is still in FIN_WAIT_2 after 30 seconds
> > reverts to FIN_WAIT_1, and resends the FIN. I believe that
> > this is similar to a fix Paul Vixie mentioned implementing in NetBSD
> > once.
> > 


> I don't think this is a good solution. Retransmitting the FIN certainly doesn't
> break the spec, but it won't help much. If there were dead browsers ``on
> the other side'' of all those annoying FIN_WAIT_2 sockets, then 
> retransmitting and getting ACKs or RSTs would certainly help. But in most
> cases there is a browser just waiting for user actions! Resending the
> FIN would accomplish nothing in this case. The browser's TCP stack would
> reacknowledge the FIN and continue hanging around in CLOSE_WAIT.

At least you know NOT to time out. On the other hand, if you get no answer
at all for 10 minutes straight (it's crashed) or a RST (it's rebooted), you
know you can throw away the whole session.
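
The three outcomes of resending the FIN can be written down as a small
decision function. This is only a sketch of the reasoning above, not
code from any kernel. All names are hypothetical, and the give-up limit
is a parameter (600 would match the "10 minutes" mentioned).

```c
/* What came back after the FIN was resent (hypothetical names). */
enum fin_probe_reply {
    FIN_REPLY_ACK,      /* peer re-acknowledged the FIN: alive, still open */
    FIN_REPLY_RST,      /* peer has no such connection: it rebooted */
    FIN_REPLY_NONE      /* nothing came back for this probe */
};

enum fin_action {
    FIN_KEEP_WAITING,   /* leave the session in FIN_WAIT_2 */
    FIN_DROP            /* throw away the whole session */
};

/*
 * silent_secs:  how long the peer has been completely silent so far
 * give_up_secs: silence after which the peer is presumed crashed
 */
static enum fin_action
fin_probe_decide(enum fin_probe_reply reply, int silent_secs, int give_up_secs)
{
    switch (reply) {
    case FIN_REPLY_ACK:
        return FIN_KEEP_WAITING;    /* peer is there: do NOT time out */
    case FIN_REPLY_RST:
        return FIN_DROP;
    case FIN_REPLY_NONE:
    default:
        return silent_secs >= give_up_secs ? FIN_DROP : FIN_KEEP_WAITING;
    }
}
```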


> I guess the percentage of cases where you get a RST or ICMP is quite low
> and not worth the increase in net traffic.

In some cases it's quite high.


> 
> The easy solution to shorten FIN_WAIT_2 is simple:
> 
>   sysctl -w net.inet.tcp.keepintvl=27

yes,
I've added though
sysctl -w net.inet.tcp.finretry
(so I can play with it :-)


> 
>   This would give you a FIN_WAIT_2 timeout of 9 * 27 / 2 = 121.5 sec
>   This should cut down the number of FW2 sockets by a factor of
>   675/121 = 5.6 . The only drawback to this solution is, that keepalives
>   won't work reliably any more. (I'm referring to the transport layer
>   keepalives here, _not_ the HTTP 1.1 application layer keepalives).
>   After 2 hours idle time there would be only a 2 min time window to
>   respond before a keepalive connection is dropped.

yes I thought of this.
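
For reference, the arithmetic in the quoted paragraph as a small sketch.
It assumes the figures given in this thread (keepintvl in half-second
ticks, keepcnt fixed at 8, one extra interval before the drop), and the
function names are made up.

```c
#define PR_SLOWHZ 2     /* slow-timer ticks per second */
#define KEEPCNT   8     /* hard-coded probe count, per the thread */

/* FIN_WAIT_2 timeout in seconds: (keepcnt + 1) intervals, as derived above. */
static double finwait2_timeout_secs(int keepintvl_ticks)
{
    return (KEEPCNT + 1) * keepintvl_ticks / (double)PR_SLOWHZ;
}

/* Window in seconds for answering keepalive probes: keepcnt intervals. */
static double keepalive_window_secs(int keepintvl_ticks)
{
    return KEEPCNT * keepintvl_ticks / (double)PR_SLOWHZ;
}
```

With keepintvl=27 this gives 121.5 sec for FIN_WAIT_2 and a keepalive
window of 108 sec (the "2 min" above). With the default of 150 it gives
675 sec and 600 sec.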

> 
> This leads me to solution 2:
> 
>   Just introduce a configurable timeout for idle finwait2 sockets.
>   This is a quite small change and less intrusive than your suggestion.
>   Those who have so many hits that finwait2 sockets pile up could
>   just lower the finwait2 timeout.

And what about silly servers that do a shutdown() but never notice that
the other end has vanished? If the kernel could
ping the other end, it'd KNOW about it.

I gotta run, to a FreeBSD meeting...

julian


> 
>        Martin
> 
> Index: netinet/tcp_input.c
> ===================================================================
> RCS file: /home/dada/cvsroot/src/netinet/tcp_input.c,v
> retrieving revision 1.3
> diff -u -u -r1.3 tcp_input.c
> --- tcp_input.c	1999/04/06 19:28:25	1.3
> +++ tcp_input.c	1999/04/08 22:17:58
> @@ -1496,7 +1496,7 @@
>  				 */
>  				if (so->so_state & SS_CANTRCVMORE) {
>  					soisdisconnected(so);
> -					tp->t_timer[TCPT_2MSL] = tcp_maxidle;
> +					tp->t_timer[TCPT_2MSL] = tcp_finwait2idle;
>  				}
>  				tp->t_state = TCPS_FIN_WAIT_2;
>  			}
> Index: netinet/tcp_timer.c
> ===================================================================
> RCS file: /home/dada/cvsroot/src/netinet/tcp_timer.c,v
> retrieving revision 1.4
> diff -u -u -r1.4 tcp_timer.c
> --- tcp_timer.c	1999/04/08 12:15:01	1.4
> +++ tcp_timer.c	1999/04/08 23:34:49
> @@ -85,6 +85,10 @@
>  SYSCTL_INT(_net_inet_tcp, TCPCTL_KEEPINTVL, keepintvl,
>  	CTLFLAG_RW, &tcp_keepintvl , 0, "");
>  
> +int	tcp_finwait2idle = TCPTV_FINWAIT2IDLE;
> +SYSCTL_INT(_net_inet_tcp, TCPCTL_FINWAIT2IDLE, finwait2idle,
> +	CTLFLAG_RW, &tcp_finwait2idle , 0, "");
> +
>  static int	always_keepalive = 0;
>  SYSCTL_INT(_net_inet_tcp, OID_AUTO, always_keepalive,
>  	CTLFLAG_RW, &always_keepalive , 0, "");
> @@ -162,6 +166,10 @@
>  		tp = intotcpcb(ip);
>  		if (tp == 0 || tp->t_state == TCPS_LISTEN)
>  			continue;
> +		tp->t_idle++;
> +		tp->t_duration++;
> +		if (tp->t_rtt)
> +			tp->t_rtt++;
>  		for (i = 0; i < TCPT_NTIMERS; i++) {
>  			if (tp->t_timer[i] && --tp->t_timer[i] == 0) {
>  #ifdef TCPDEBUG
> @@ -180,10 +188,6 @@
>  #endif
>  			}
>  		}
> -		tp->t_idle++;
> -		tp->t_duration++;
> -		if (tp->t_rtt)
> -			tp->t_rtt++;
>  tpgone:
>  		;
>  	}
> @@ -235,10 +239,13 @@
>  	 */
>  	case TCPT_2MSL:
>  		if (tp->t_state != TCPS_TIME_WAIT &&
> -		    tp->t_idle <= tcp_maxidle)
> +		    tp->t_idle < tcp_finwait2idle)
>  			tp->t_timer[TCPT_2MSL] = tcp_keepintvl;
> -		else
> +		else {
> +			if (tp->t_state == TCPS_FIN_WAIT_2)
> +			    tcpstat.tcps_finwait2drops++;
>  			tp = tcp_close(tp);
> +		}
>  		break;
>  
>  	/*
> Index: netinet/tcp_timer.h
> ===================================================================
> RCS file: /home/dada/cvsroot/src/netinet/tcp_timer.h,v
> retrieving revision 1.1
> diff -u -u -r1.1 tcp_timer.h
> --- tcp_timer.h	1999/04/02 01:15:25	1.1
> +++ tcp_timer.h	1999/04/08 22:13:15
> @@ -101,6 +101,8 @@
>  #define	TCPTV_KEEPINTVL	( 75*PR_SLOWHZ)		/* default probe interval */
>  #define	TCPTV_KEEPCNT	8			/* max probes before drop */
>  
> +#define	TCPTV_FINWAIT2IDLE ( 120*PR_SLOWHZ)	/* max idle time in FINWAIT2 */
> +
>  #define	TCPTV_MIN	(  1*PR_SLOWHZ)		/* minimum allowable value */
>  #define	TCPTV_REXMTMAX	( 64*PR_SLOWHZ)		/* max allowable REXMT value */
>  
> @@ -129,6 +131,8 @@
>  #ifdef KERNEL
>  extern int tcp_keepinit;		/* time to establish connection */
>  extern int tcp_keepidle;		/* time before keepalive probes begin */
> +extern int tcp_finwait2idle;		/* idle time until drop in FIN_WAIT_2 */
> +
>  extern int tcp_keepintvl;		/* time between keepalive probes */
>  extern int tcp_maxidle;			/* time to drop after starting probes */
>  extern int tcp_ttl;			/* time to live for TCP segs */
> Index: netinet/tcp_usrreq.c
> ===================================================================
> RCS file: /home/dada/cvsroot/src/netinet/tcp_usrreq.c,v
> retrieving revision 1.2
> diff -u -u -r1.2 tcp_usrreq.c
> --- tcp_usrreq.c	1999/04/04 22:17:54	1.2
> +++ tcp_usrreq.c	1999/04/08 22:16:52
> @@ -833,7 +833,7 @@
>  		soisdisconnected(tp->t_inpcb->inp_socket);
>  		/* To prevent the connection hanging in FIN_WAIT_2 forever. */
>  		if (tp->t_state == TCPS_FIN_WAIT_2)
> -			tp->t_timer[TCPT_2MSL] = tcp_maxidle;
> +			tp->t_timer[TCPT_2MSL] = tcp_finwait2idle;
>  	}
>  	return (tp);
>  }
> Index: netinet/tcp_var.h
> ===================================================================
> RCS file: /home/dada/cvsroot/src/netinet/tcp_var.h,v
> retrieving revision 1.2
> diff -u -u -r1.2 tcp_var.h
> --- tcp_var.h	1999/04/04 22:17:54	1.2
> +++ tcp_var.h	1999/04/08 23:32:18
> @@ -248,6 +248,7 @@
>  	u_long	tcps_keeptimeo;		/* keepalive timeouts */
>  	u_long	tcps_keepprobe;		/* keepalive probes sent */
>  	u_long	tcps_keepdrops;		/* connections dropped in keepalive */
> +	u_long	tcps_finwait2drops;	/* connections dropped in finwait2 */
>  
>  	u_long	tcps_sndtotal;		/* total packets sent */
>  	u_long	tcps_sndpack;		/* data packets sent */
> @@ -310,7 +311,8 @@
>  #define	TCPCTL_SENDSPACE	8	/* send buffer space */
>  #define	TCPCTL_RECVSPACE	9	/* receive buffer space */
>  #define	TCPCTL_KEEPINIT		10	/* receive buffer space */
> -#define TCPCTL_MAXID		11
> +#define	TCPCTL_FINWAIT2IDLE	11	/* max idle time in FIN_WAIT_2 state */
> +#define TCPCTL_MAXID		12
>  
>  #define TCPCTL_NAMES { \
>  	{ 0, 0 }, \
> @@ -324,6 +326,7 @@
>  	{ "sendspace", CTLTYPE_INT }, \
>  	{ "recvspace", CTLTYPE_INT }, \
>  	{ "keepinit", CTLTYPE_INT }, \
> +	{ "finwait2idle", CTLTYPE_INT }, \
>  }
>  
>  #ifdef KERNEL
> Index: netstat/inet.c
> ===================================================================
> RCS file: /home/dada/cvsroot/src/netstat/inet.c,v
> retrieving revision 1.1
> diff -u -u -r1.1 inet.c
> --- inet.c	1999/04/08 23:38:57	1.1
> +++ inet.c	1999/04/08 23:42:27
> @@ -253,6 +253,7 @@
>  	p(tcps_keeptimeo, "\t%lu keepalive timeout%s\n");
>  	p(tcps_keepprobe, "\t\t%lu keepalive probe%s sent\n");
>  	p(tcps_keepdrops, "\t\t%lu connection%s dropped by keepalive\n");
> +	p(tcps_finwait2drops, "\t%lu connection%s dropped in finwait2\n");
>  	p(tcps_predack, "\t%lu correct ACK header prediction%s\n");
>  	p(tcps_preddat, "\t%lu correct data packet header prediction%s\n");
>  #undef p
> 
> 
> 
> 
> 


