From: Kevin Day
To: Ed Maste
Cc: freebsd-net@freebsd.org
Subject: Re: Syncookies break with Windows 8
Date: Fri, 1 Feb 2013 16:25:23 -0600
Message-Id: <87E3F8C6-1027-49E5-8ED3-C0A499D71864@your.org>

On Feb 1, 2013, at 4:05 PM, Ed Maste wrote:

> On 1 February 2013 16:21, Kevin Day wrote:
>> We've got a large cluster of HTTP servers, each server handling
>> >10,000req/sec. Occasionally, and during periods of heavy load, we'd get
>> complaints from some users that downloads were working but going
>> EXTREMELY slowly. After a whole lot of debugging, we narrowed it down to
>> being only Windows 8 clients experiencing this problem. It turns out
>> that FreeBSD's implementation of syncookies is likely violating RFC1323.
>
> Kevin,
>
> Thanks for the thorough analysis and report, although I didn't see
> mention of which FreeBSD version you're running. It looks like andre@
> added storage of the window scale option in the timestamp many years
> ago in r162277[1], so I'm curious if you have an old version or
> there's an issue with this implementation.

This is on 9.1. I saw that change, but I think it's not kicking in here
because Windows isn't setting a TSopt in the initial SYN. RFC1323 says:

    A TCP may send the Timestamps option (TSopt) in an initial
    segment (i.e., segment containing a SYN bit and no ACK bit), and
    may send a TSopt in other segments only if it received a TSopt in
    the initial segment for the connection.

The client is not setting TSopt on the SYN, so we can't set it on the
SYN/ACK.

I can't tell whether RFC1323 is saying you MUST support timestamps if you
use window scaling:

    It is vitally important to use the RTTM mechanism with big
    windows; otherwise, the door is opened to some dangerous
    instabilities due to aliasing. Furthermore, the option is
    probably useful for all TCP's, since it simplifies the sender.

It appears, though, that the code here won't stuff the window scaling
values into the timestamp if we didn't get a timestamp on the SYN. On
connection:

	/*
	 * A timestamp received in a SYN makes
	 * it ok to send timestamp requests and replies.
	 */
	if (to->to_flags & TOF_TS) {
		sc->sc_tsreflect = to->to_tsval;
		sc->sc_ts = tcp_ts_getticks();
		sc->sc_flags |= SCF_TIMESTAMP;
	}

Then later on the syncookies ACK:

	/* Additional parameters are stored in the timestamp if present. */
	if (sc->sc_flags & SCF_TIMESTAMP) {
		data =  ((sc->sc_flags & SCF_SIGNATURE) ? 1 : 0); /* TCP-MD5, 1 bit */
		data |= ((sc->sc_flags & SCF_SACK) ? 1 : 0) << 1; /* SACK, 1 bit */
		data |= sc->sc_requested_s_scale << 2; /* SWIN scale, 4 bits */
		data |= sc->sc_requested_r_scale << 6; /* RWIN scale, 4 bits */
		data |= md5_buffer[2] << 10;		/* more digest bits */
		data ^= md5_buffer[3];
		sc->sc_ts = data;
		sc->sc_tsoff = data - tcp_ts_getticks();	/* after XOR */
	}

If SCF_TIMESTAMP doesn't get set, the server doesn't do this, so the
window scale values never get stored in the cookie.
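
To make the bit layout in that snippet concrete, here is a minimal
user-space sketch of the same packing, plus an inverse that pulls the
fields back out of an echoed timestamp. It is not the kernel code:
struct cookie_opts, cookie_ts_encode() and cookie_ts_decode() are
hypothetical names, md5_2 and md5_3 stand in for md5_buffer[2] and
md5_buffer[3], and the decode side is only my assumption of what the
inverse would look like.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the option state a syncookie has to carry. */
struct cookie_opts {
	unsigned signature;	/* TCP-MD5 in use, 1 bit */
	unsigned sack;		/* SACK permitted, 1 bit */
	unsigned s_scale;	/* requested send window scale, 4 bits */
	unsigned r_scale;	/* requested receive window scale, 4 bits */
};

/*
 * Pack the options into a 32-bit timestamp value using the same bit
 * positions as the quoted snippet. md5_2 and md5_3 are assumed to be
 * two words of the cookie digest.
 */
static uint32_t
cookie_ts_encode(const struct cookie_opts *o, uint32_t md5_2, uint32_t md5_3)
{
	uint32_t data;

	/* TCP window scale shift counts are limited to 14. */
	assert(o->s_scale <= 14 && o->r_scale <= 14);

	data = o->signature ? 1 : 0;			/* bit 0 */
	data |= (uint32_t)(o->sack ? 1 : 0) << 1;	/* bit 1 */
	data |= (uint32_t)o->s_scale << 2;		/* bits 2..5 */
	data |= (uint32_t)o->r_scale << 6;		/* bits 6..9 */
	data |= md5_2 << 10;				/* bits 10..31 */
	data ^= md5_3;					/* obscure the fields */
	return (data);
}

/*
 * Assumed inverse: undo the XOR on the echoed timestamp and extract
 * the low ten bits. Without a timestamp option there is no echoed
 * value, so there is nothing to pass to this function.
 */
static struct cookie_opts
cookie_ts_decode(uint32_t tsecr, uint32_t md5_3)
{
	struct cookie_opts o;
	uint32_t data = tsecr ^ md5_3;

	o.signature = data & 0x1;
	o.sack = (data >> 1) & 0x1;
	o.s_scale = (data >> 2) & 0xf;
	o.r_scale = (data >> 6) & 0xf;
	return (o);
}
```

Encoding scale values of 8 and 5 and decoding the result with the same
md5_3 word returns 8 and 5 again. The round trip only exists when the
peer echoes a timestamp, and a client that sends no TSopt never gives
the server that chance.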

Here's an example tcpdump showing what's happening, with
syncookies_only=1 set:

14:22:16.696366 IP (tos 0x0, ttl 118, id 18606, offset 0, flags [DF], proto TCP (6), length 52)
    client.49637 > server.80: Flags [S], cksum 0xe8c0 (correct), seq 4056314475, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
14:22:16.696386 IP (tos 0x0, ttl 64, id 60489, offset 0, flags [DF], proto TCP (6), length 52)
    server.80 > client.49637: Flags [S.], cksum 0x400e (incorrect -> 0x5f10), seq 3099521508, ack 4056314476, win 65535, options [mss 1460,nop,wscale 5,sackOK,eol], length 0
14:22:16.708523 IP (tos 0x0, ttl 118, id 18607, offset 0, flags [DF], proto TCP (6), length 40)
    client.49637 > server.80: Flags [.], cksum 0x9ddf (correct), seq 1, ack 1, win 256, length 0
14:22:16.708537 IP (tos 0x0, ttl 119, id 18608, offset 0, flags [DF], proto TCP (6), length 452)
    client.49637 > server.80: Flags [P.], cksum 0x9aec (correct), seq 1:413, ack 1, win 256, length 412
14:22:16.711885 IP (tos 0x0, ttl 64, id 60794, offset 0, flags [DF], proto TCP (6), length 296)
    server.80 > client.49637: Flags [.], cksum 0x4102 (incorrect -> 0xdd41), seq 1:257, ack 413, win 65535, length 256
14:22:16.782746 IP (tos 0x0, ttl 119, id 18609, offset 0, flags [DF], proto TCP (6), length 40)
    client.49637 > server.80: Flags [.], cksum 0x9b44 (correct), seq 413, ack 257, win 255, length 0
14:22:16.782758 IP (tos 0x0, ttl 64, id 62354, offset 0, flags [DF], proto TCP (6), length 295)
    server.80 > client.49637: Flags [.], cksum 0x4101 (incorrect -> 0x8d90), seq 257:512, ack 413, win 65535, length 255
14:22:16.844941 IP (tos 0x0, ttl 119, id 18610, offset 0, flags [DF], proto TCP (6), length 40)
    client.49637 > server.80: Flags [.], cksum 0x9a46 (correct), seq 413, ack 512, win 254, length 0
14:22:16.844953 IP (tos 0x0, ttl 64, id 63448, offset 0, flags [DF], proto TCP (6), length 294)
    server.80 > client.49637: Flags [.], cksum 0x4100 (incorrect -> 0x9a46), seq 512:766, ack 413, win 65535, length 254
14:22:16.906469 IP (tos 0x0, ttl 119, id 18611, offset 0, flags [DF], proto TCP (6), length 40)
    client.49637 > server.80: Flags [.], cksum 0x9949 (correct), seq 413, ack 766, win 253, length 0
14:22:16.906481 IP (tos 0x0, ttl 64, id 64738, offset 0, flags [DF], proto TCP (6), length 293)
    server.80 > client.49637: Flags [.], cksum 0x40ff (incorrect -> 0x9949), seq 766:1019, ack 413, win 65535, length 253
14:22:16.968393 IP (tos 0x0, ttl 119, id 18612, offset 0, flags [DF], proto TCP (6), length 40)
    client.49637 > server.80: Flags [.], cksum 0x984d (correct), seq 413, ack 1019, win 252, length 0
14:22:16.968414 IP (tos 0x0, ttl 64, id 788, offset 0, flags [DF], proto TCP (6), length 292)
    server.80 > client.49637: Flags [.], cksum 0x40fe (incorrect -> 0x984d), seq 1019:1271, ack 413, win 65535, length 252
14:22:17.036097 IP (tos 0x0, ttl 119, id 18613, offset 0, flags [DF], proto TCP (6), length 40)
    client.49637 > server.80: Flags [.], cksum 0x9752 (correct), seq 413, ack 1271, win 251, length 0
14:22:17.036114 IP (tos 0x0, ttl 64, id 1973, offset 0, flags [DF], proto TCP (6), length 291)
    server.80 > client.49637: Flags [.], cksum 0x40fd (incorrect -> 0x9752), seq 1271:1522, ack 413, win 65535, length 251
14:22:17.095046 IP (tos 0x0, ttl 119, id 18614, offset 0, flags [DF], proto TCP (6), length 40)
    client.49637 > server.80: Flags [.], cksum 0x9652 (correct), seq 413, ack 1522, win 256, length 0
14:22:17.095062 IP (tos 0x0, ttl 64, id 3218, offset 0, flags [DF], proto TCP (6), length 296)
    server.80 > client.49637: Flags [.], cksum 0x4102 (incorrect -> 0x9652), seq 1522:1778, ack 413, win 65535, length 256

You can see that, to an outside observer, scaling appears to have been
properly negotiated, but the server is treating the client's advertised
windows as unshifted amounts.

-- Kevin
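
To put numbers on the trace: the client offered wscale 8 and then
advertised win 256, 255, 254 and so on, and each data segment from the
server is exactly that many bytes long. The sketch below shows the
RFC1323 arithmetic for what those fields should mean with and without
the shift applied. effective_window() is a hypothetical helper written
for this illustration, not a function from the FreeBSD stack.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective window in bytes: the raw 16-bit window field from the TCP
 * header shifted left by the scale the sender of that field offered in
 * its SYN. A shift of 0 means "no scaling".
 */
static uint32_t
effective_window(uint16_t raw_win, unsigned wscale)
{
	/* RFC1323 limits the shift count to 14. */
	assert(wscale <= 14);
	return ((uint32_t)raw_win << wscale);
}
```

With the values from the trace, effective_window(256, 8) is 65536
bytes, which is presumably what the client meant, while
effective_window(256, 0) is 256 bytes, which matches the size of the
first data segment the server actually sent.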