Date: Sun, 17 Aug 2008 15:15:18 -0400
From: Chris Buechler <freebsd@chrisbuechler.com>
To: freebsd-net@freebsd.org
Subject: repeatable scp stalls from 7.0 to 7.0
Message-ID: <48A878C6.9000001@chrisbuechler.com>
I've been seeing pretty frequent and repeatable scp stalls between two FreeBSD 7.0 servers (7.0-RELEASE-p2 to be exact) on a 100 Mb LAN. They're two HP servers, an Opteron 275 and a dual Xeon 3.4 (I don't recall the models, but I can get them if it's relevant), using the onboard bge(4) cards.

The client side (builder7) SCPs a file to the server side (hosting7) about 20 times a day. The stall happens about 2-4 times a week, and has happened ever since we put these two boxes online in their current functions. Initially they ran the original 7.0 release, prior to the TCP fix in June. The behavior is the same before and after that fix. Neither box shows any other apparent network issues.

Since we had nothing to go on other than scp sessions going to "stalled" (no relevant logs), I set up a tcpdump on each end, filtering on the TCP port 22 traffic between these hosts and grabbing 100 bytes of each frame to avoid chewing up too much disk space. When it happened again, I split the tail end of each capture out into its own file with editcap; they're 4.2-4.3 MB each.

http://chrisbuechler.com/temp/lastcut-hosting7.pcap - server end, capture taken on the host, but the destination IP is a jail

http://chrisbuechler.com/temp/lastcut-builder7.pcap - client end, connection is initiated from the host, no jails involved

The TCP window on the ACKs from server to client starts decrementing [1], to the point where it's down to a window of 0. From that point on, everything the server (172.29.29.181) sends back to the client (172.29.29.170) has a window of 0. Restarting the scp makes it work again. It doesn't happen every time; it's somewhere around 2-3% of transfers. I don't see any cause for the decrementing window in those captures, but maybe I'm missing something.
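For anyone who wants to check a capture for the same pattern without opening it in a GUI, here is a minimal sketch in Python that walks a capture file and reports the advertised window on each segment sent in one direction. It is not part of the original report. It assumes a classic libpcap file with microsecond timestamps (not pcapng), Ethernet framing, and IPv4; the function names `tcp_windows` and `first_zero_window` are made up for this illustration. The value it reads is the raw 16-bit window field from the TCP header, so any window scaling negotiated in the SYN is not applied, though a raw 0 is a zero window either way.

```python
import struct


def tcp_windows(pcap_bytes, src_ip, dst_ip):
    """Yield (frame_number, raw_window) for TCP segments from src_ip to dst_ip.

    Assumes classic pcap, Ethernet link type, IPv4. Frame numbers are
    1-based and count every record in the file, matched or not.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic pcap file")

    src = bytes(int(part) for part in src_ip.split("."))
    dst = bytes(int(part) for part in dst_ip.split("."))

    offset = 24  # skip the 24-byte global header
    frame = 0
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, original length
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", pcap_bytes, offset)
        offset += 16
        pkt = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        frame += 1

        # Need Ethernet (14) + minimal IPv4 (20) + minimal TCP (20) bytes
        if len(pkt) < 54 or pkt[12:14] != b"\x08\x00":
            continue  # short record or not IPv4
        if pkt[14] >> 4 != 4 or pkt[23] != 6:
            continue  # not IP version 4 or not TCP
        if pkt[26:30] != src or pkt[30:34] != dst:
            continue  # wrong direction or other hosts

        ihl = (pkt[14] & 0x0F) * 4  # IPv4 header length in bytes
        tcp = 14 + ihl
        if len(pkt) < tcp + 16:
            continue  # record truncated before the window field
        (window,) = struct.unpack_from("!H", pkt, tcp + 14)
        yield frame, window


def first_zero_window(pcap_bytes, src_ip, dst_ip):
    """Return the first frame number advertising a zero window, or None."""
    for frame, window in tcp_windows(pcap_bytes, src_ip, dst_ip):
        if window == 0:
            return frame
    return None
```

To use it, read one of the capture files as bytes (`open(path, "rb").read()`) and pass the server address as `src_ip` and the client address as `dst_ip`. Printing the pairs from `tcp_windows` shows how quickly the window closes before it reaches 0.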
[1] lastcut-hosting7.pcap frame #21298; lastcut-builder7.pcap frame #25088

These are both very stock boxes: GENERIC kernels, no significant changes in sysctl or anything else. I'm not sure where to go from here; any assistance in resolving this would be appreciated.

cheers,
Chris