From: Chris Buechler
Date: Wed, 20 Aug 2008 20:47:24 -0400
To: freebsd-net@freebsd.org
Subject: Re: repeatable scp stalls from 7.0 to 7.0
Message-ID: <48ACBB1C.7080701@chrisbuechler.com>
In-Reply-To: <48A878C6.9000001@chrisbuechler.com>

Chris Buechler wrote:
> I've been seeing pretty frequent and repeatable scp stalls between two
> FreeBSD 7.0 servers (7.0-RELEASE-p2 to be exact) on a 100 Mb LAN.
> They're two HP servers, an Opteron 275 and a dual Xeon 3.4 (don't
> recall the models but I can get them if it's relevant) using the
> onboard bge(4) cards. The client side (builder7) SCPs a file to the
> server side (hosting7) about 20 times a day. The stall happens about
> 2-4 times a week or so, and has happened ever since we put these two
> boxes online in their current functions.
> Initially they were the original 7.0 release, prior to the TCP fix in
> June. It's behaved the same way both prior to and after that fix.
> There are no apparent network issues aside from this with either of
> the boxes.
>
> Since we had nothing to go on other than scp sessions going to
> "stalled" (no relevant logs), I set up a tcpdump on each end filtering
> on the TCP 22 traffic between these hosts, grabbing 100 bytes of each
> frame to avoid chewing up too much disk space. When it happened again
> I split the end out into its own file with editcap, 4.2-4.3 MB each.
>
> http://chrisbuechler.com/temp/lastcut-hosting7.pcap - server end,
> capture taken on host but destination IP is a jail
> http://chrisbuechler.com/temp/lastcut-builder7.pcap - client end,
> connection is initiated from the host, no jails involved.
>
> The TCP window on the ACKs from server to client starts decrementing
> [1], to the point where it's down to a window of 0. From that point,
> everything the server (172.29.29.181) sends back to the client
> (172.29.29.170) has a window of 0. Restarting the scp makes it work
> again. It doesn't happen every time; it's somewhere around 2-3% of
> the time. I don't see any cause for the decrementing window in those
> captures but maybe I'm missing something.
>
> 1 - lastcut-hosting7.pcap frame #21298; lastcut-builder7.pcap #25088
>
> These are both very stock boxes, GENERIC kernels, no significant
> changes in sysctl or anything else. I'm not sure where to go from
> here, any assistance in resolving this would be appreciated.

I cut the nasty stuff above that Thunderbird threw in there on the
copy/paste, sorry about that.

I haven't gotten any replies, so I thought I would bump this.

thanks,
Chris
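
For anyone looking at the captures, the shrinking window described above can be located with a short script. What follows is a minimal sketch, not the method used in the original post: it assumes the files are classic libpcap (not pcapng) with untagged Ethernet/IPv4 frames, and the function names `tcp_windows` and `first_zero_window` are made up for illustration. The 100-byte snap length mentioned above is enough, since only the fixed Ethernet, IP and TCP headers are read.

```python
import struct


def tcp_windows(pcap_bytes, src_ip):
    """Yield (frame_number, raw_window) for TCP segments sent by src_ip.

    Assumption: classic libpcap file, Ethernet link type, untagged IPv4.
    Frame numbers are 1-based, counting every record in the file.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic pcap file")

    want = bytes(int(part) for part in src_ip.split("."))
    offset = 24  # size of the pcap global header
    frame = 0
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += 16
        pkt = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        frame += 1

        # Need a full Ethernet header plus minimal IPv4 header, EtherType 0x0800
        if len(pkt) < 34 or pkt[12:14] != b"\x08\x00":
            continue
        ihl = (pkt[14] & 0x0F) * 4  # IPv4 header length in bytes
        if pkt[23] != 6:  # IP protocol field, 6 = TCP
            continue
        if pkt[26:30] != want:  # IPv4 source address
            continue
        tcp = 14 + ihl
        if len(pkt) < tcp + 16:
            continue
        # The window field sits 14 bytes into the TCP header, network order
        (window,) = struct.unpack_from("!H", pkt, tcp + 14)
        yield frame, window


def first_zero_window(pcap_bytes, src_ip):
    """Return the first frame number where src_ip advertises window 0, or None."""
    for frame, window in tcp_windows(pcap_bytes, src_ip):
        if window == 0:
            return frame
    return None
```

To use it, read the server-side capture into memory (for example `open("lastcut-hosting7.pcap", "rb").read()`) and pass it along with the server address `172.29.29.181`. The value reported is the raw 16-bit header field, so if window scaling was negotiated the non-zero values are not the true byte counts, but an advertised 0 is a zero window either way.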