From owner-freebsd-net@FreeBSD.ORG Tue Oct 16 02:11:38 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5AC6F212; Tue, 16 Oct 2012 02:11:38 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 152158FC0A; Tue, 16 Oct 2012 02:11:36 +0000 (UTC) Received: from pool-96-250-5-62.nycmny.fios.verizon.net ([96.250.5.62]:58711 helo=minion.home) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80) (envelope-from ) id 1TNwd6-0008Nm-Jt; Mon, 15 Oct 2012 22:11:36 -0400 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Dropping TCP options from retransmitted SYNs considered harmful From: George Neville-Neil In-Reply-To: <201210150908.36498.jhb@freebsd.org> Date: Mon, 15 Oct 2012 22:11:41 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <70397C6E-202A-4FAE-AF53-6A5A1D89FAAC@neville-neil.com> References: <201210121213.11152.jhb@freebsd.org> <201210150908.36498.jhb@freebsd.org> To: John Baldwin X-Mailer: Apple Mail (2.1499) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com Cc: freebsd-net@freebsd.org, net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Oct 2012 02:11:38 -0000 On Oct 15, 2012, at 09:08 , John Baldwin wrote: > On Friday, October 12, 2012 12:13:11 pm John Baldwin wrote: >> Back in 2001 FreeBSD added a hack to strip TCP options from = retransmitted SYNs=20 >> starting with the 3rd SYN in this block in tcp_timer.c: >>=20 >> /* >> * Disable rfc1323 if we haven't got any response to >> * our third SYN to work-around some broken terminal servers >> * (most of which have hopefully been retired) that have bad VJ >> * header compression code which trashes TCP segments containing >> * unknown-to-them TCP options. >> */ >> if ((tp->t_state =3D=3D TCPS_SYN_SENT) && (tp->t_rxtshift =3D=3D = 3)) >> tp->t_flags &=3D ~(TF_REQ_SCALE|TF_REQ_TSTMP); >>=20 >> There is even a PR for the original bug report: kern/1689 >>=20 >> However, there is an unintended consequence of this change that can = be=20 >> disastrous. Specifically, suppose you have a FreeBSD client = connecting to a=20 >> server, and that the SYNs are arriving at the server successfully, = but the=20 >> first few return SYN/ACKs are dropped. Eventually a SYN/ACK makes it = through=20 >> and the connection is established. >>=20 >> The server (based on the first SYN it saw) believes it has negotiated = window=20 >> scaling with the client. The client, however, has broken what it = promised in=20 >> that first SYN and believes it is not using any window scaling at = all. This=20 >> causes two forms of breakage: >>=20 >> 1) When the server advertises a scaled window (e.g. '8' for a 64k = window >> scaled at 13), the client thinks it is an unscaled window ('8') = and >> sends data to the server very slowly. >>=20 >> 2) When the client advertises an unscaled window (e.g. '65535' for a = 64k >> window), the server thinks it has a huge window (65535 << 13 =3D=3D = 511MB) >> to send into. >>=20 >> I'm not sure that 2) is a problem per se, but I have definitely seen = instances=20 >> of 1) (and examined the 'struct tcpcb' in kgdb on both the server and = client=20 >> end of the connections to verify they disagreed on the scaling). >>=20 >> The original motivation of this change is to work around broken = terminal=20 >> servers that were old when this change was added in 2001. Over 10 = years later=20 >> I think we should at least have an option to turn this work-around = off, and=20 >> possibly disable it by default. >>=20 >> Thoughts? >=20 > How about this: >=20 > Index: tcp_timer.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- tcp_timer.c (revision 241579) > +++ tcp_timer.c (working copy) > @@ -118,6 +118,11 @@ SYSCTL_INT(_net_inet_tcp, OID_AUTO, keepcnt, = CTLFL > /* max idle probes */ > int tcp_maxpersistidle; >=20 > +static int tcp_rexmit_drop_options =3D 0; > +SYSCTL_INT(_net_inet_tcp, OID_AUTO, rexmit_drop_options, CTLFLAG_RW, > + &tcp_rexmit_drop_options, 0, > + "Drop TCP options from 3rd and later retransmitted SYN"); > + > static int per_cpu_timers =3D 0; > SYSCTL_INT(_net_inet_tcp, OID_AUTO, per_cpu_timers, CTLFLAG_RW, > &per_cpu_timers , 0, "run tcp timers on all cpus"); > @@ -578,7 +583,8 @@ tcp_timer_rexmt(void * xtp) > * header compression code which trashes TCP segments containing > * unknown-to-them TCP options. > */ > - if ((tp->t_state =3D=3D TCPS_SYN_SENT) && (tp->t_rxtshift =3D=3D = 3)) > + if (tcp_rexmit_drop_options && (tp->t_state =3D=3D = TCPS_SYN_SENT) && > + (tp->t_rxtshift =3D=3D 3)) > tp->t_flags &=3D ~(TF_REQ_SCALE|TF_REQ_TSTMP); > /* > * If we backed off this far, our srtt estimate is probably = bogus. >=20 > Any other suggestions on the sysctl name? The name's fine. Commit that sucker and turn it off. Best, George