Date:      Tue, 19 Jul 2011 00:25:29 +0200
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-net@freebsd.org
Subject:   Re: igb enable_aim or flow_control causing tcp stalls?
Message-ID:  <4E24B2D9.3000706@digiware.nl>
In-Reply-To: <4E2186A5.4040707@digiware.nl>
References:  <379885BA631F4C7787C24E00A174B429@multiplay.co.uk> <4E2186A5.4040707@digiware.nl>

On 16-7-2011 14:40, Willem Jan Withagen wrote:
> On 15-7-2011 18:47, Steven Hartland wrote:
>> Been trying to identify a strange network stalling issue while using
>> scp or rsync between two machines, initially at remote locations.
>>
>> The behaviour has proved quite difficult to track as it seems to require a
>> number of factors combined before the stalls occur. These seem to be:
>> 1. This particular target machine
>> 2. Some load, but not much on the machine, when idle we don't see stalls.
>> 3. Remote 9ms+ latency or high-throughput (50MB/s) transmission speeds
>>
>> My current test case is copying a freebsd iso from a local machine to
>> the potentially problematic machine's /dev/null e.g.
>> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso test1:/dev/null
>>
>> These machines are connected via a cisco 6509 -> supermicro blade
>> chassis.
>>
>> When the failure happens we see the following:-
>> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
>> FreeBSD-8.2-RELEASE-amd64-disc1.iso   21%  147MB   2.1MB/s - stalled -
>>
>> When all is well we see:-
>> scp FreeBSD-8.2-RELEASE-amd64-disc1.iso amsbld16:/dev/null
>> FreeBSD-8.2-RELEASE-amd64-disc1.iso   100%  691MB  53.1MB/s   00:13
>>
>> This setup:-
>> 1. Source machine 7.0-RELEASE-p2 using em0
>> em0@pci0:6:0:0: class=0x020000 card=0x109615d9 chip=0x10968086 rev=0x01 hdr=0x00
>>    vendor     = 'Intel Corporation'
>>    device     = 'PRO/1000 EB Network Connection'
>>    class      = network
>>    subclass   = ethernet
>> 2. Target (problem) machine 8.2-RELEASE using igb0
>> igb0@pci0:5:0:0: class=0x020000 card=0x10e715d9 chip=0x10e78086 rev=0x01 hdr=0x00
>>    vendor     = 'Intel Corporation'
>>    class      = network
>>    subclass   = ethernet
>>
>> I've tried switching to igb1 with no change, which also changes
>> switches and hence ports on the Cisco, so I don't at this point
>> believe there is an issue there.
>>
>> Now I've just noticed that igb has at least two sysctl's which
>> seemed interesting, enable_aim & flow_control (which is missing
>> from the man page btw). On disabling both, the stalls seem to go away.
>>
>> Unfortunately re-enabling them didn't re-introduce the stalls, but
>> this could be another quirk when they don't re-enable properly?
>>
>> So the questions are:-
>> 1. Could either of these settings cause tcp stalls?
>> 2. If the nic and switch differ in flow control, what is the likely
>> effect?
>> 3. Any other thoughts?
> 
> I'm having more or less the same problems with a remote server with an
> em0 device running 7.2 (just upgraded to 7.4) and a 8.2 system.....
> Connection is limited on the way by 100Mbit-link, but other paths are
> all 1Gbit. Traffic starts at 20Mbit/sec but like after a minute we are
> down to 1Mbit/sec. Wait longer and it gets down to 250Kbit/s. And then
> starts to stall.
> 
> But I suspect that in my case it is due to a mismatch of 100baseTX on
> the remote server, where the connection to the switch should be
> 1000baseTX. Reboots and all don't cure this, so I'll probably have
> to go over and also kick the switch.
> 
> Especially flowcontrol would be "upset" with such a mismatch.

Just as a follow-up: it was indeed bad networking.
One of the hubs making up the 100Mbit part had thrown a fit and decided to
run amok. Taking it out solved the matter.

Now we're running 100Mbit FTPs again...

--WjW




