From owner-freebsd-net@FreeBSD.ORG Mon Oct 15 15:17:27 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6564781B; Mon, 15 Oct 2012 15:17:27 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 364EE8FC0A; Mon, 15 Oct 2012 15:17:27 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 917D4B911; Mon, 15 Oct 2012 11:17:26 -0400 (EDT) From: John Baldwin To: freebsd-net@freebsd.org Subject: Re: Dropping TCP options from retransmitted SYNs considered harmful Date: Mon, 15 Oct 2012 09:08:36 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; ) References: <201210121213.11152.jhb@freebsd.org> In-Reply-To: <201210121213.11152.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201210150908.36498.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 15 Oct 2012 11:17:26 -0400 (EDT) Cc: net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 15:17:27 -0000 On Friday, October 12, 2012 12:13:11 pm John Baldwin wrote: > Back in 2001 FreeBSD added a hack to strip TCP options from retransmitted SYNs > starting with the 3rd SYN in this block in tcp_timer.c: > > /* > * Disable rfc1323 if we haven't got any response to > * our third SYN to work-around some broken terminal servers > * (most of which have hopefully been retired) that have bad VJ > * header compression code which trashes TCP segments containing > * unknown-to-them TCP options. > */ > if ((tp->t_state == TCPS_SYN_SENT) && (tp->t_rxtshift == 3)) > tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP); > > There is even a PR for the original bug report: kern/1689 > > However, there is an unintended consequence of this change that can be > disastrous. Specifically, suppose you have a FreeBSD client connecting to a > server, and that the SYNs are arriving at the server successfully, but the > first few return SYN/ACKs are dropped. Eventually a SYN/ACK makes it through > and the connection is established. > > The server (based on the first SYN it saw) believes it has negotiated window > scaling with the client. The client, however, has broken what it promised in > that first SYN and believes it is not using any window scaling at all. This > causes two forms of breakage: > > 1) When the server advertises a scaled window (e.g. '8' for a 64k window > scaled at 13), the client thinks it is an unscaled window ('8') and > sends data to the server very slowly. > > 2) When the client advertises an unscaled window (e.g. '65535' for a 64k > window), the server thinks it has a huge window (65535 << 13 == 511MB) > to send into. > > I'm not sure that 2) is a problem per se, but I have definitely seen instances > of 1) (and examined the 'struct tcpcb' in kgdb on both the server and client > end of the connections to verify they disagreed on the scaling). > > The original motivation of this change is to work around broken terminal > servers that were old when this change was added in 2001. Over 10 years later > I think we should at least have an option to turn this work-around off, and > possibly disable it by default. > > Thoughts? How about this: Index: tcp_timer.c =================================================================== --- tcp_timer.c (revision 241579) +++ tcp_timer.c (working copy) @@ -118,6 +118,11 @@ SYSCTL_INT(_net_inet_tcp, OID_AUTO, keepcnt, CTLFL /* max idle probes */ int tcp_maxpersistidle; +static int tcp_rexmit_drop_options = 0; +SYSCTL_INT(_net_inet_tcp, OID_AUTO, rexmit_drop_options, CTLFLAG_RW, + &tcp_rexmit_drop_options, 0, + "Drop TCP options from 3rd and later retransmitted SYN"); + static int per_cpu_timers = 0; SYSCTL_INT(_net_inet_tcp, OID_AUTO, per_cpu_timers, CTLFLAG_RW, &per_cpu_timers , 0, "run tcp timers on all cpus"); @@ -578,7 +583,8 @@ tcp_timer_rexmt(void * xtp) * header compression code which trashes TCP segments containing * unknown-to-them TCP options. */ - if ((tp->t_state == TCPS_SYN_SENT) && (tp->t_rxtshift == 3)) + if (tcp_rexmit_drop_options && (tp->t_state == TCPS_SYN_SENT) && + (tp->t_rxtshift == 3)) tp->t_flags &= ~(TF_REQ_SCALE|TF_REQ_TSTMP); /* * If we backed off this far, our srtt estimate is probably bogus. Any other suggestions on the sysctl name? -- John Baldwin