Message-ID: <4E24B2D9.3000706@digiware.nl>
Date: Tue, 19 Jul 2011 00:25:29 +0200
From: Willem Jan Withagen
To: Steven Hartland
Cc: freebsd-net@freebsd.org
References: <379885BA631F4C7787C24E00A174B429@multiplay.co.uk> <4E2186A5.4040707@digiware.nl>
In-Reply-To: <4E2186A5.4040707@digiware.nl>
Subject: Re: igb enable_aim or flow_control causing tcp stalls?

On 16-7-2011 14:40, Willem Jan Withagen wrote:
> On 15-7-2011 18:47, Steven Hartland wrote:
>> I've been trying to identify a strange network stalling issue while using
>> scp or rsync between two machines, initially at remote locations.
>>
>> The behaviour has proved quite difficult to track, as it seems to require a
>> number of factors combined before the stalls occur. These seem to be:
>> 1. This particular target machine
>> 2. Some load on the machine, but not much; when idle we don't see stalls.
>> 3. Remote (9ms+) latency or high-throughput (50MB/s) transfers
>>
>> My current test case is copying a FreeBSD ISO from a local machine to
>> the potentially problematic machine's /dev/null, e.g.
>> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso test1:/dev/null
>>
>> These machines are connected via a Cisco 6509 -> Supermicro blade
>> chassis.
>>
>> When the failure happens we see the following:
>> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
>> FreeBSD-8.2-RELEASE-amd64-disc1.iso 21% 147MB 2.1MB/s - stalled -
>>
>> When all is well we see:
>> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
>> FreeBSD-8.2-RELEASE-amd64-disc1.iso 100% 691MB 53.1MB/s 00:13
>>
>> This setup:
>> 1. Source machine 7.0-RELEASE-p2 using em0
>> em0@pci0:6:0:0: class=0x020000 card=0x109615d9 chip=0x10968086 rev=0x01 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = 'PRO/1000 EB Network Connection'
>>     class = network
>>     subclass = ethernet
>> 2. Target (problem) machine 8.2-RELEASE using igb0
>> igb0@pci0:5:0:0: class=0x020000 card=0x10e715d9 chip=0x10e78086 rev=0x01 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     class = network
>>     subclass = ethernet
>>
>> I've tried switching to igb1 with no change. That also changes
>> switches, and hence ports on the Cisco, so at this point I don't
>> believe the problem lies there.
>>
>> Now I've just noticed that igb has at least two sysctls which
>> seemed interesting: enable_aim and flow_control (the latter is missing
>> from the man page, btw). On disabling both, the stalls seem to go away.
>>
>> Unfortunately, re-enabling them didn't reintroduce the stalls, but
>> could this be another quirk, where they don't re-enable properly?
>>
>> So the questions are:
>> 1. Could either of these settings cause TCP stalls?
>> 2. If the NIC and switch differ in flow control, what is the likely
>> effect?
>> 3. Any other thoughts?
>
> I'm having more or less the same problems with a remote server with an
> em0 device running 7.2 (just upgraded to 7.4) and an 8.2 system.
> The connection is limited along the way by a 100Mbit link, but the other
> paths are all 1Gbit. Traffic starts at 20Mbit/s, but after about a minute
> we are down to 1Mbit/s. Wait longer and it drops to 250Kbit/s, and then
> it starts to stall.
>
> But I suspect that in my case it is due to a mismatch: 100baseTX on
> the remote server, where the connection to the switch should be
> 1000baseTX. Reboots and the like don't cure this, so I'll probably have
> to go over and also kick the switch.
>
> Flow control in particular would be "upset" by such a mismatch.

Just as a follow-up: it was indeed bad networking. One of the hubs making
up the 100Mbit part had thrown a fit and decided to run amok. Taking it
out solved the matter. Now we're running 100Mbit FTPs again...

--WjW
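Since the culprit here turned out to be physical (a misbehaving hub and a
link that negotiated 100baseTX where gigabit was expected), checking the
negotiated media on each hop is a cheap first step before touching driver
sysctls. The following is a minimal sketch, not something from the thread:
check_media is a hypothetical helper, and it assumes FreeBSD-style ifconfig
output in which the active media appears in parentheses, e.g.
"media: Ethernet autoselect (1000baseT <full-duplex>)".

```shell
#!/bin/sh
# check_media (hypothetical helper, not part of FreeBSD): reads the output of
# `ifconfig <iface>` for a single interface on stdin and reports the
# negotiated media type. Assumes FreeBSD-style output such as:
#   media: Ethernet autoselect (1000baseT <full-duplex>)
# where the active media is the first word inside the parentheses.
check_media() {
    # Extract the first word inside the parentheses on the "media:" line.
    media=$(sed -n 's/.*media:.*(\([^ )]*\).*/\1/p')
    case "$media" in
        1000base*) echo "ok: $media" ;;
        "")        echo "unknown: could not determine negotiated media" ;;
        *)         echo "warning: negotiated $media, expected gigabit" ;;
    esac
}
```

Usage: "ifconfig em0 | check_media". A link stuck at 100baseTX on a path
meant to run at 1Gbit, as suspected above, would be reported as a warning.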