From: Andre Oppermann
Date: Thu, 13 Mar 2008 00:01:00 +0100
To: "d.s. al coda"
Cc: freebsd-net@freebsd.org
Subject: Re: TCP options order changed in FreeBSD 7, incompatible with some routers
Message-ID: <47D860AC.6030707@freebsd.org>

d.s. al coda wrote:
> Hi,
> We recently upgraded one of our webservers to FreeBSD 7, and we started
> receiving complaints from some users not able to connect to that server
> anymore. On top of that, users were saying that the problem only occurred on
> Windows (at least, the ones who had more than one OS to try it out).
>
> After managing to get a user who had the problem running windump, running
> tcpdump on the new server, and comparing that to the windump & tcpdump
> output for a "control" user (me) that could connect, we managed to figure
> out the following:
> - For the user with this problem, ping works fine, but all TCP connections
> to the server fail.
> - The user, trying to connect, sends out a SYN packet, receives no response,
> and retries a few times until timing out.
> - The server sees a bunch of SYN packets and responds with SYN-ACK each
> time.
> - The issue only seems to arise if the sender has RFC1323 disabled.
>
> So, the SYN-ACK is getting lost somewhere.
>
> - For the control user (who can connect via TCP just fine), we set the TCP
> window size and RFC1323 options the same as the user with the problem.
> - The control user sees the SYN-ACK packet.
> - We send a connection attempt to one of our other servers, running FreeBSD
> 5.5, and one to the server running FreeBSD 7.
> - There is only one notable difference between the responses: the order of
> the options.
> - FreeBSD 5.5 has
> - FreeBSD 7 has (there is of course an aligning nop
> after the eol, which tcpdump skips)
> - These options don't appear in this exact configuration when using RFC1323
> options.
>
> I get a hunch that the users with the problem have a router that erroneously
> thinks that these options are invalid, or thinks that some part of the byte
> sequence (e.g. 0204 05b4 0101 0402) is an attack.
>
> Just to try it out, I patched tcp_output.c so that the SACK permitted option
> was aligned on a 4-byte boundary, preventing the "sackOK, eol" pattern from
> ever occurring. Looking through previous versions, I found where the tcp
> option code had changed, and there used to be a comment about putting SACK
> permitted last, but I can't tell if it's relevant.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.125;r2=1.126
>
> The one-line patch to tcp_output.c is attached.
>
> Sure enough, it fixed the problem. Afterwards, we collected some information
> about the routers the users who had the problem were using, and while they
> didn't all have the same manufacturer, several mentioned that their router
> had a built-in firewall, which, when they disabled it, allowed them to
> access the server.

I'd be very interested to know the exact models and firmware versions of
the affected routers. If one is available locally I'd like to obtain a similar
model myself for future regression tests.

> Does all of this sound reasonable? And if so, would it be worth submitting
> this patch? I don't know if this particular change in options order was
> intentional, or just a side-effect of the new code, but it certainly works
> around an extremely hard-to-diagnose problem.

We've already fixed two issues. The first changes the order of the TCP
options and is in this change:

 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_var.h.diff?r1=1.160;r2=1.161

It is meant to solve a problem observed by ISC that sounds very much like
what you describe. It fixed the issue in their case.

The second changes the alignment padding from NOP to 0x00. Whether this
was a contributing factor to the reported problem is not clear. There hasn't
(yet) been any specific test case for it. It was fixed because the RFC
specifies 0x00 to be used for padding and nothing else.

 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.145;r2=1.146

It would be very helpful if you could apply these two patches one after the
other to your 7.0 test server and find out, together with the affected
user(s), which of them fixes the issue. If you can, please test each one
with and without the router's firewall enabled. It would be interesting to
know whether the NAT or the firewalling part of the router chokes on it.
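To make it easier to compare captures while testing, here is a minimal
sketch in Python of how the option bytes decode and how the two padding
styles differ on the wire. It is not the kernel code. The function names
are hypothetical, only the MSS and SACK-permitted options are decoded by
name, and the "sackOK, eol" byte string in the usage note is my assumption
of how that pattern encodes, not bytes taken from a capture.

```python
# Illustrative sketch only; helper names are hypothetical, not kernel code.
# Option kinds: 0 = end of option list, 1 = no-op, 2 = MSS, 4 = SACK permitted.
TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_MSS = 2
TCPOPT_SACK_PERMITTED = 4


def decode_options(raw: bytes) -> list:
    """Decode a TCP option block into tcpdump-like names."""
    out = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == TCPOPT_EOL:
            out.append("eol")
            break  # anything after EOL is padding and is not parsed
        if kind == TCPOPT_NOP:
            out.append("nop")
            i += 1
            continue
        # All other options carry a length byte that covers kind + length + data.
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        body = raw[i + 2:i + length]
        if kind == TCPOPT_MSS and length == 4:
            out.append("mss %d" % int.from_bytes(body, "big"))
        elif kind == TCPOPT_SACK_PERMITTED and length == 2:
            out.append("sackOK")
        else:
            out.append("opt-%d" % kind)
        i += length
    return out


def pad_options(raw: bytes, pad_byte: int) -> bytes:
    """Pad an option block to a 4-byte boundary with pad_byte."""
    return raw + bytes([pad_byte]) * (-len(raw) % 4)
```

The byte sequence quoted in the report, 0204 05b4 0101 0402, decodes as
['mss 1460', 'nop', 'nop', 'sackOK']. Assuming the "sackOK, eol" pattern
is encoded as 0204 05b4 0402 00, padding it with NOP gives
020405b404020001 and padding it with zero gives 020405b404020000. Both
decode to ['mss 1460', 'sackOK', 'eol'], so the two differ only in the
trailing byte, which a strict middlebox may still treat differently.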
Your help is very much appreciated. I try to document all strange TCP
occurrences so we can incorporate them into a regression test suite later
on. The more information we have, the better it will become.

--
Andre