From owner-freebsd-net@FreeBSD.ORG  Sat Jul 16 12:40:03 2011
Message-ID: <4E2186A5.4040707@digiware.nl>
Date: Sat, 16 Jul 2011 14:40:05 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Steven Hartland <killing@multiplay.co.uk>
References: <379885BA631F4C7787C24E00A174B429@multiplay.co.uk>
In-Reply-To: <379885BA631F4C7787C24E00A174B429@multiplay.co.uk>
X-Enigmail-Version: 1.2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org
Subject: Re: igb enable_aim or flow_control causing tcp stalls?

On 15-7-2011 18:47, Steven Hartland wrote:
> Been trying to identify a strange network stalling issue while using
> scp or rsync between two machines, initially at remote locations.
> 
> The behaviour has proved quite difficult to track as it seems to require a
> number of factors combined before the stalls occur. These seem to be:
> 1. This particular target machine
> 2. Some load, but not much on the machine, when idle we don't see stalls.
> 3. Remote 9ms+ latency or high-throughput 50MB/s transmission speeds
> 
> My current test case is copying a freebsd iso from a local machine to
> the potentially problematic machine's /dev/null e.g.
> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso test1:/dev/null
> 
> These machines are connected via a cisco 6509 -> supermicro blade
> chassis.
> 
> When the failure happens we see the following:-
> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
> FreeBSD-8.2-RELEASE-amd64-disc1.iso   21%  147MB   2.1MB/s - stalled -
> 
> When all is well we see:-
> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
> FreeBSD-8.2-RELEASE-amd64-disc1.iso   100%  691MB  53.1MB/s   00:13
> 
> This setup:-
> 1. Source machine 7.0-RELEASE-p2 using em0
> em0@pci0:6:0:0: class=0x020000 card=0x109615d9 chip=0x10968086 rev=0x01 hdr=0x00
>    vendor     = 'Intel Corporation'
>    device     = 'PRO/1000 EB Network Connection'
>    class      = network
>    subclass   = ethernet
> 2. Target (problem) machine 8.2-RELEASE using igb0
> igb0@pci0:5:0:0: class=0x020000 card=0x10e715d9 chip=0x10e78086 rev=0x01 hdr=0x00
>    vendor     = 'Intel Corporation'
>    class      = network
>    subclass   = ethernet
> 
> I've tried switching to igb1 with no change, which also changes
> switches and hence ports on the Cisco, so I don't at this point
> believe there is an issue there.
> 
> Now I've just noticed that igb has at least two sysctls which
> seemed interesting, enable_aim & flow_control (which is missing
> from the man page btw). On disabling both, the stalls seem to go away.
> 
> Unfortunately re-enabling them didn't re-introduce the stalls, but
> this could be another quirk where they don't re-enable properly?
> 
> So the questions are:-
> 1. Could either of these settings cause tcp stalls?
> 2. If the nic and switch differ in flow control, what is the likely
> effect?
> 3. Any other thoughts?

I'm having more or less the same problems with a remote server with an
em0 device running 7.2 (just upgraded to 7.4) and an 8.2 system.
The connection is limited along the way by a 100Mbit link, but all other
paths are 1Gbit. Traffic starts at 20Mbit/s, but after about a minute we
are down to 1Mbit/s. Wait longer and it drops to 250Kbit/s, and then it
starts to stall.

But I suspect that in my case it is due to the remote server linking at
100baseTX where the connection to the switch should be 1000baseT.
Reboots and the like don't cure this, so I'll probably have to go over
there and kick the switch as well.

Flow control in particular would be "upset" by such a mismatch.
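
For checking this kind of link mismatch without eyeballing ifconfig on
every box, here is a rough sketch. It assumes the ifconfig media line
looks like "media: Ethernet autoselect (100baseTX <full-duplex>)"; the
function names and the expected-speed argument are made up for
illustration and are not part of any FreeBSD tool.

```python
import re

# Matches the "media:" line of ifconfig output, for example:
#   media: Ethernet autoselect (100baseTX <full-duplex>)
# The exact line format is an assumption; a manually forced media
# setting without the parenthesised part will not match.
MEDIA_RE = re.compile(
    r"media:\s+Ethernet\s+\S+\s+\((\d+)base\S*(?:\s+<([^>]*)>)?\)"
)


def negotiated_media(ifconfig_output):
    """Return (speed_mbit, options) or None if no media line matches."""
    m = MEDIA_RE.search(ifconfig_output)
    if m is None:
        return None
    speed = int(m.group(1))
    options = m.group(2).split(",") if m.group(2) else []
    return speed, options


def link_mismatch(ifconfig_output, expected_mbit):
    """True if the link is not at the expected speed with full duplex."""
    media = negotiated_media(ifconfig_output)
    if media is None:
        # Nothing recognisable was negotiated; treat that as suspect.
        return True
    speed, options = media
    return speed != expected_mbit or "full-duplex" not in options
```

Feeding it the output of "ifconfig em0" with an expected speed of 1000
would flag a port that came up at 100baseTX. It says nothing about
what the switch side believes, so that still needs checking separately.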

--WjW