From owner-freebsd-net@FreeBSD.ORG Tue Jun 12 15:39:42 2007
Return-Path:
X-Original-To: freebsd-net@freebsd.org
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 3E51216A46B
 for ; Tue, 12 Jun 2007 15:39:42 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
 by mx1.freebsd.org (Postfix) with ESMTP id 9DBF913C4B7
 for ; Tue, 12 Jun 2007 15:39:41 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: (qmail 2663 invoked from network); 12 Jun 2007 14:53:14 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
 (envelope-sender )
 by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
 for ; 12 Jun 2007 14:53:14 -0000
Message-ID: <466EBE3E.3050105@freebsd.org>
Date: Tue, 12 Jun 2007 17:39:42 +0200
From: Andre Oppermann
User-Agent: Thunderbird 1.5.0.12 (Windows/20070509)
MIME-Version: 1.0
To: Bill Moran
References: <20070612101949.646dcaa5.wmoran@collaborativefusion.com>
In-Reply-To: <20070612101949.646dcaa5.wmoran@collaborativefusion.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org
Subject: Re: Weird "ignoring syn" problem
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 12 Jun 2007 15:39:42 -0000

Bill Moran wrote:
> This one has got me pretty befuddled.
>
> We're seeing some really odd behaviour with FreeBSD ignoring SYN packets.
> I've been trying to diagnose this for a couple of weeks now, and my current
> guess is that there's something wrong with the em driver.  Here's a narrowed
> down list of what I've ruled out:
> *) I've done my best to eliminate other network components as the problem.
>    My theory at this point is that it can't possibly be any other network
>    hardware, based on the tcpdump show below.
> *) The problem occurred on both FreeBSD 6.1 and FreeBSD 6.2-p3.
> *) The problem does not appear to be tied to CPU usage -- the CPU is nearly
>    idle when the problem occurs.
> *) I can now reproduce it pretty easily, so I'll know when it's fixed.
> *) The system exhibiting the problem is running 15 jails, but they are
>    idle 95% of the time.  The problem initially occurred inside one of
>    the jails, but I just recreated it outside the jail (on the host) and
>    it's _easier_ to reproduce outside the jail.
> *) The problem occurred with both GENERIC, and the SMP kernel (this is a
>    dual-CPU, hyperthreaded system)
> *) I've tested and the behavior occurs both with a dynamically generated
>    file (from PHP) or from a static file.
>
> The nature of the beast is that we've got a SOAP application running under
> Apache and PHP.  This application is subject to many requests in rapid
> succession, such that load can be simulated by the following loop:
>
> while true; do fetch http://192.168.121.250/test.php; done
>
> The problem is that occasionally, the Apache server machine just ignores
> SYN packets.  Take the following tcpdump output for example:
>
> 13:34:17.312296 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535
> 13:34:20.312398 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535
> 13:34:23.512626 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535
>
> This is the _only_ traffic on port 80 during the test.  It looks like the
> kernel has ignored the initial syn packet and two duplicates.
> I've seen it take as long as 45 seconds to establish a connection, and this
> causes ugly performance problems, as well as frequent timeouts on the client
> end.  The only clue I've found so far is this output from netstat -s.
>
> 153099 syncache entries added
>         6184 retransmitted
>         6491 dupsyn
>         0 dropped
>         150923 completed
>         0 bucket overflow
>         0 cache overflow
>         235 reset
>         1941 stale
>         0 aborted
>         0 badack
>         0 unreach
>         0 zone failures
>
> Unfortunately, I've been unable to determine how to fix the problem.  Any
> advice is welcome.

Before we go into more detail:

 a) the em(4) driver is most likely totally unrelated to this

 b) you may be running out of sockets on the client side and reusing
    them too fast.  Try lowering net.inet.ip.portrange.first to 30000
    or so.

 c) related to (b), you may have a lot of connections in TIME_WAIT on
    the server catching packets that are not really stray.  Try it
    with net.inet.tcp.nolocaltimewait=1

 d) if the above doesn't help then it'd be very helpful to test against
    a server with FreeBSD-current (the future 7.0) on it.  In -current
    we've got detailed logging of LISTEN socket failures that allows
    rapid analysis of the problem.

-- 
Andre
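
For reference, suggestions (b) and (c) could be tried roughly as follows.
This is only a sketch: the function names are made up for illustration, the
sysctl names and the value 30000 are the ones given above, and the functions
do nothing until you call them (as root).  count_time_wait is a hypothetical
helper that counts TIME_WAIT lines in `netstat -an` style output, to get a
feel for whether (c) is plausible on the server.

```shell
#!/bin/sh

# (b) client side: start the ephemeral port range lower so that local
# ports are not reused as quickly.  Needs root.
widen_client_ports() {
    sysctl net.inet.ip.portrange.first=30000
}

# (c) server side: the setting suggested above.  Needs root.
skip_local_time_wait() {
    sysctl net.inet.tcp.nolocaltimewait=1
}

# Count lines mentioning TIME_WAIT in text read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_time_wait() {
    grep -c 'TIME_WAIT' || true
}
```

On the server, `netstat -an | count_time_wait` before and after a run of the
fetch loop should show whether TIME_WAIT entries are piling up.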