Date: Fri, 1 Feb 2013 15:21:15 -0600
From: Kevin Day <kevin@your.org>
To: freebsd-net@freebsd.org
Subject: Syncookies break with Windows 8
Message-ID: <CA61E725-8370-4ED2-BBA7-F6FAFF93A553@your.org>
We've got a large cluster of HTTP servers, each server handling >10,000 req/sec. Occasionally, and during periods of heavy load, we'd get complaints from some users that downloads were working but going EXTREMELY slowly. After a whole lot of debugging, we narrowed it down to only Windows 8 clients experiencing this problem. It turns out that FreeBSD's implementation of syncookies is likely violating RFC1323.

When syncookies kick in, either because the syncache limit is reached or net.inet.tcp.syncookies_only is set, some shortcuts are taken with regard to TCP connections. Unlike some other syncookies implementations, which (ab)use timestamps to store options, the FreeBSD implementation of syncookies discards TCP options such as window scaling. In itself this isn't a bad thing, but it becomes a bad thing because we then lie and pretend that we support window scaling.

According to RFC1323, if you want to use TCP window scaling, the client says so on the initial SYN. If the server is also willing to use scaling, it says so on the SYN/ACK. If both parties included a scaling option on their respective SYNs, you assume window scaling is working and proceed to use it. If one or both parties don't send a scaling option, you don't scale at all. The problem here is that with syncookies we don't save the wscale parameter from the client's SYN, but we offer window scaling anyway on our SYN/ACK, so the client thinks we successfully negotiated window scaling even though we haven't.
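The negotiation rule, and how the syncookie path breaks it, can be modelled in a few lines. This is a minimal Python sketch, not FreeBSD code: `Syn`, `scaling_in_effect` and `shift_applied_to_client_windows` are hypothetical names, and the syncookie path is simplified to "the server behaves as if the client sent no wscale option, while still putting one on its own SYN/ACK".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Syn:
    """Window-scale option carried on a SYN or SYN/ACK (None = option absent)."""
    wscale: Optional[int]


def scaling_in_effect(client_syn: Syn, server_synack: Syn) -> bool:
    # RFC 1323: scaling is used only if BOTH sides sent a wscale option.
    return client_syn.wscale is not None and server_synack.wscale is not None


def shift_applied_to_client_windows(client_syn: Syn, server_synack: Syn) -> int:
    # Shift a peer should apply to windows the client advertises after the handshake.
    return client_syn.wscale if scaling_in_effect(client_syn, server_synack) else 0


client_syn = Syn(wscale=4)
synack = Syn(wscale=5)

# What the client concludes from the packets actually exchanged.
client_expects = shift_applied_to_client_windows(client_syn, synack)

# Hypothetical model of the syncookie path: the client's option was
# discarded, so the server acts as if the client never sent one,
# even though its SYN/ACK still carries wscale 5.
forgotten = Syn(wscale=None)
server_applies = shift_applied_to_client_windows(forgotten, synack)

print(client_expects, server_applies)  # 4 0
```

The mismatch between the two printed values is the whole bug: each side follows the rule correctly, but from different inputs.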
This is how a normal window-scaled connection happens:

client > server: Flags [S], win 65535, options [mss 1460,nop,wscale 4,nop,nop,sackOK], length 0

(client is connecting, offering a window of 64K, but if scaling is negotiated wants to scale future window sizes by 4 bits)

server > client: Flags [S.], win 65535, options [mss 1460,nop,wscale 5,sackOK,eol], length 0

(server is ACKing the client's SYN, also offering an unscaled window of 64K, but wanting to shift by 5 going forward)

The server and client both offered window scaling, so they're now using it from this point on. All window sizes sent/received are shifted by the appropriate number of bits.

When syncookies kick in on the server, and the client is anything BUT Windows 8, this happens:

client > server: Flags [S], win 65535, options [mss 1460,nop,wscale 4,nop,nop,sackOK], length 0

However, syncookies cause the options to get lost. The client sent the "wscale 4" parameter, but we immediately forgot it.

server > client: Flags [S.], win 65535, options [mss 1460,nop,wscale 5,sackOK,eol], length 0

(server is ACKing the client's SYN, also offering an unscaled window of 64K, but wanting to shift by 5 going forward)

The server sent a wscale back on its SYN/ACK, so the client thinks window scaling is now in effect. But it's not: the server didn't remember the client's wscale option, so it isn't scaling any of the window sizes coming in from the client. This doesn't actually hurt much. The client thinks it's telling us it has a 1MB window open, but we only hear a 64K window, so that's all we ever use. It's "failing safe" here, and nothing actually breaks.

Now throw Windows 8 into the mix. Windows 8's TCP auto-tuning is much more aggressive than previous versions of Windows.
I honestly can't tell if this is a bug or intentional design, but Windows will sometimes, intermittently, advertise a much, much larger wscale option than it actually needs. This is a mild example of what happens:

client > server: Flags [S], win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0

(client is connecting, offering an unscaled window of 8192 bytes, but wants to negotiate window scaling of 8 bits if the server will accept it)

server > client: Flags [S.], win 65535, options [mss 1460,nop,wscale 5,sackOK,eol], length 0

(server is ACKing the client's SYN, also offering an unscaled window of 64K, but wanting to shift by 5 going forward)

We're at the same point here as in the above example: the client now believes we've successfully negotiated window scaling, but on the server side we're treating all window sizes coming from the client as being shifted by 0. So the client sends its first ACK:

client > server: Flags [.], seq 1, ack 1, win 256, length 0

The client believes we're still scaling everything it says by 8 bits, but it only wants to give us a 64K window, so it's saying 256 here (256<<8 = 65536). We don't remember that we agreed to shift everything by 8, so we treat that as just 256. The connection now proceeds, but we think we can only send 256 bytes at a time. It is extremely slow.

I have seen Windows 8 attempt to use wscale parameters of 8 all the way up to 10. While I've only caught a few cases of this happening in the wild, when it's using 10 we end up thinking we only have a 64-byte window (65536>>10 = 64), and things get really silly really fast.

I've been talking with someone on Microsoft's side of things about why Windows is choosing to do this. But my own view is that if syncookies are being used in their current state (we lose the client's wscale option), we can't advertise wscale on the SYN/ACK.
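The window arithmetic in these examples can be checked with a short sketch. It is written in Python with hypothetical helper names (`raw_window_field`, `window_heard`): the first computes the 16-bit window field a sender writes for a given shift, the second the window a receiver derives from that field. Each case is evaluated with the shift the client believes was agreed and with the shift of 0 that the syncookie path actually applies.

```python
def raw_window_field(intended_bytes: int, shift: int) -> int:
    """Value the sender puts in the 16-bit window field for a given shift."""
    return min(intended_bytes >> shift, 0xFFFF)


def window_heard(raw: int, shift_applied: int) -> int:
    """Window the receiver computes from the raw field."""
    return raw << shift_applied


# Non-Windows-8 case: client offered wscale 4 and fills the field.
raw = 0xFFFF
print(window_heard(raw, 4), window_heard(raw, 0))  # 1048560 65535

# Windows 8 case: client wants ~64K but believes a shift of 8..10 was agreed.
for shift in (8, 9, 10):
    raw = raw_window_field(65536, shift)
    # columns: shift, raw field, window client means, window server hears
    print(shift, raw, window_heard(raw, shift), window_heard(raw, 0))
```

The loop prints `8 256 65536 256`, `9 128 65536 128` and `10 64 65536 64`. The larger the shift the client picks, the smaller the window the server believes in, while in the wscale 4 case the error only ever shrinks a 1MB window to 64K.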
My reading of RFC1323 is that if we put a wscale option in our SYN/ACK, that means we agreed to use the client's wscale from their SYN, so advertising it after discarding the client's value isn't correct. If syncookies are being used, we should advertise MIN(sb_max, TCP_MAXWIN) with no scaling and stay within the RFC.

This doesn't affect Linux because it uses timestamp options to stuff the client's wscale, so it gets re-learned on the ACK. OpenBSD and OS X don't have syncookies. NetBSD seems to have the same problem if its new syncookie implementation gets turned on.

Any thoughts? Was there a reason why we're forcing the use of wscale on syncookie connections?

--
Kevin
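The change proposed in this message (no wscale on a syncookie SYN/ACK, and a window capped at MIN(sb_max, TCP_MAXWIN)) can be sketched as a small decision function. This is a hypothetical Python model, not FreeBSD's syncache code: `synack_scaling` is an illustrative name, `TCP_MAXWIN` mirrors the 65535 constant in FreeBSD's `<netinet/tcp.h>`, and the 2 MB value used for `SB_MAX` is an assumed stand-in for the kernel's `sb_max`.

```python
from typing import Optional, Tuple

TCP_MAXWIN = 65535           # largest unscaled window, as in FreeBSD's <netinet/tcp.h>
SB_MAX = 2 * 1024 * 1024     # assumed sb_max value; illustrative only


def synack_scaling(using_syncookie: bool, client_wscale: Optional[int],
                   our_wscale: int) -> Tuple[Optional[int], int]:
    """Hypothetical helper: return (wscale option for our SYN/ACK or None,
    largest receive window we can then advertise on this connection)."""
    if using_syncookie or client_wscale is None:
        # The cookie cannot remember the client's shift (or there is none),
        # so offer no scaling and stay within a 16-bit window.
        return None, min(SB_MAX, TCP_MAXWIN)
    # Normal syncache path: both sides scale, so larger windows are possible.
    return our_wscale, min(SB_MAX, TCP_MAXWIN << our_wscale)


print(synack_scaling(True, 8, 5))      # (None, 65535)
print(synack_scaling(False, 8, 5))     # (5, 2097120)
print(synack_scaling(False, None, 5))  # (None, 65535)
```

Dropping wscale costs throughput on syncookie connections, since windows are capped at 64K, but both ends then agree on the window size. That is the "failing safe" behaviour, rather than the 256-byte window seen with Windows 8.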