Date:      Tue, 12 Jul 2011 00:09:45 -0400 (EDT)
From:      Charles Sprickman <spork@bway.net>
To:        David Christensen <davidch@broadcom.com>
Cc:        YongHyeon PYUN <pyunyh@gmail.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, David Christensen <davidch@freebsd.org>
Subject:   RE: bce packet loss
Message-ID:  <alpine.OSX.2.00.1107112234460.1070@freemac>
In-Reply-To: <5D267A3F22FD854F8F48B3D2B523819385F12FE86B@IRVEXCHCCR01.corp.ad.broadcom.com>
References:  <alpine.OSX.2.00.1107042113000.2407@freemac> <20110706201509.GA5559@michelle.cdnetworks.com> <alpine.OSX.2.00.1107070121060.2407@freemac> <20110707174233.GB8702@michelle.cdnetworks.com> <alpine.OSX.2.00.1107072129310.2407@freemac> <5D267A3F22FD854F8F48B3D2B523819385C32D96B7@IRVEXCHCCR01.corp.ad.broadcom.com> <alpine.OSX.2.00.1107082009350.1070@freemac> <5D267A3F22FD854F8F48B3D2B523819385F12FE86B@IRVEXCHCCR01.corp.ad.broadcom.com>

On Mon, 11 Jul 2011, David Christensen wrote:

>> I'm running 8.1 and at least on the bce hosts, it looks like flow
>> control
>> isn't supported, it was added on 4/30/2010:
>>
>> http://svnweb.freebsd.org/base/head/sys/dev/bce/if_bce.c?r1=206268&r2=207411
>>
>> In my 8.1 sources I still see this comment, which was removed in the
>> above
>> commit:
>>          /* ToDo: Enable flow control support in brgphy and bge. */
>
> This really applies to whether the user can set flow control
> manually.  By default the NIC should auto-negotiate link speed
> and flow-control which is the most common case.  For example,
> you can't set RX flow control and disable TX flow control with
> ifconfig using the current implementation, though it is possible
> in Linux with ethtool.

OK, well that explains a lot.  I've had it hammered into my brain over the 
years that for servers it's always best to set link speed and duplex 
manually at both ends to remove any possible issues with link negotiation. 
This advice dates from when FE was still new; I recall autonegotiation 
causing issues back then, I believe specifically with some vintage 
Cisco switches.

>> So at least on the bce hosts (and bge it seems), I do not have flow
>> control available on the NIC.
>
> Flow control will be set according to auto-negotiation results.
> For most cases that means flow control will be enabled since
> both sides normally support it.

It sounds like I'm causing myself trouble here by not letting everything
autonegotiate.  I'll move things to auto and see what happens.

>> The sysctl stats do show that it's
>> received
>> "XON/XOFF" frames, which I assume are flow control messages, but there's
>> no indication that the NIC does anything with them.
>
> There won't be any indication in the driver since flow control
> is managed in hardware.  You'd need a wire capture to see that
> bce(4) has stopped sending frames in response to receiving an
> XOFF flow control frame or started sending frames in response
> to receiving an XON flow control frame.

Ah.  I was hoping for something in the ifconfig output.  I'll see if 
tcpdump and wireshark can tell me anything about this host.
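
As a rough illustration of what to look for in a capture: 802.3x PAUSE
frames are MAC Control frames (EtherType 0x8808, opcode 0x0001) carrying a
16-bit pause time, where a non-zero value acts as XOFF and zero acts as XON.
The Python sketch below classifies a raw Ethernet frame on that basis.  The
function names are made up for this example and are not part of tcpdump,
wireshark or bce(4).  Note that a NIC may consume these frames in hardware,
so a capture taken on the host itself may not show them.

```python
import struct

# Constants from IEEE 802.3x (MAC Control PAUSE).
PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def classify_pause(frame: bytes):
    """Return ("XOFF", quanta) or ("XON", 0) for a PAUSE frame, else None.

    `frame` is a raw Ethernet frame starting at the destination MAC,
    without the trailing FCS.
    """
    if len(frame) < 18:
        return None
    # Bytes 12-13: EtherType, 14-15: MAC Control opcode, 16-17: pause time.
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return ("XOFF", quanta) if quanta else ("XON", 0)


def build_pause(src_mac: bytes, quanta: int) -> bytes:
    """Build a PAUSE frame for testing (hypothetical helper, no FCS)."""
    header = PAUSE_DST + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta
    )
    return header.ljust(60, b"\x00")  # pad to the 60-byte minimum before FCS
```

Feeding it frames read from a capture taken on a mirror port or tap would
show whether the switch is alternating XOFF and XON toward the host.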

On the one host (w/bce) I just set to full auto, the switch claims to 
have negotiated 1000FD w/flow control (this specifically shows as 
"auto+enabled" on the switch side).

I see that the "sysctl dev.bce.1" tree has some info, and I can see that 
the NIC is receiving flow control frames:

dev.bce.1.stat_XonPauseFramesReceived: 16638
dev.bce.1.stat_XoffPauseFramesReceived: 17239

These lines are a bit puzzling though:

dev.bce.1.stat_FlowControlDone: 0
dev.bce.1.stat_XoffStateEntered: 0
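
Since these counters are cumulative, comparing two snapshots is more
telling than a single reading.  A minimal Python sketch, assuming the
text comes from two runs of "sysctl dev.bce.1" taken some time apart
(the helper names are hypothetical, not part of any FreeBSD tool):

```python
def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping only integer values."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def counter_deltas(before, after):
    """Return only the counters that changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

Taking one snapshot before a transfer and one after, then printing
counter_deltas(), would show which of the pause-related statistics
actually move under load.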

>>>> We are running 8.1, am I correct in that flow control is not
>> implemented
>>>> there?  We do have an 8.2-STABLE image from a month or so ago that we
>>>> are
>>>> testing with zfs v28, might that implement flow control?
>>>
>>> Flow control will depend on the NIC driver implementation.  Older
>>> versions of the bce(4) firmware will rarely generate pause frames
>>> (frames would be dropped by firmware but statistics should show
>>> the frame drop occurring) and should always honor pause frames
>>> from the link partner when flow control is enabled.
>>
>> I think my nics probably lack it.  I am also guessing that if any
>> high-traffic host ignores flow control frames, that's going to screw up
>> other hosts as well since the one causing the buffers to fill is not
>> going
>> to throttle and the overflow will continue, correct?
>
> Flow control is asymmetric and operates independently in both
> directions.  If the traffic source ignores flow control frames
> or did not auto-negotiate flow control then it can certainly
> overwhelm the switch or traffic sink's buffers, causing frame
> drop and retransmits.

I ran a quick scp of a large file to another host with 100Mb connectivity
and those xon/xoff counters incremented, but they were doing that
previously.  I assume that confirms the switch is at least asking for a
pause. I still saw about 5000 dropped ingress packets on the switch, but I 
assume that could be due to some other host filling the buffers.
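
For a sense of scale: the pause time in an 802.3x PAUSE frame is counted
in quanta of 512 bit times, so how long one XOFF can hold off a sender
depends on link speed.  A small Python sketch of that arithmetic (the
function name is made up for this example):

```python
PAUSE_QUANTUM_BITS = 512   # one pause quantum is 512 bit times (IEEE 802.3x)
MAX_PAUSE_QUANTA = 0xFFFF  # the pause time field is 16 bits wide


def pause_seconds(quanta, link_bps):
    """Duration in seconds of a pause of `quanta` quanta at `link_bps` bits/s."""
    return quanta * PAUSE_QUANTUM_BITS / link_bps


# Longest pause a single frame can request on the two link speeds involved.
max_at_gige = pause_seconds(MAX_PAUSE_QUANTA, 1_000_000_000)  # about 33.6 ms
max_at_fe = pause_seconds(MAX_PAUSE_QUANTA, 100_000_000)      # about 335.5 ms
```

A sender that stays paused longer than this is being held by repeated
XOFF frames rather than by a single one.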

>>
>>>>
>>>> Although reading this:
>>>>
>>>> http://en.wikipedia.org/wiki/Ethernet_flow_control
>>>>
>>>> It sounds like flow control is not terribly optimal since it forces
>> the
>>>> host to block all traffic.  Not sure if this means drops are
>> eliminated,
>>>> reduced or shuffled around.
>
> Frame drops should be eliminated, though congestion could
> spread upstream to other devices which don't have flow control
> and result in frame drops and retransmits there.
>
>>> When congestion is detected the switch should buffer up to a certain
>>> limit (say 80% of full) and then start sending pause frames to avoid
>>> dropping frames.  This will affect all hosts connecting through the
>>> switch so congestion at one host can spread to other hosts (see
>>> http://www.ieee802.org/3/cm_study/public/september04/thaler_3_0904.pdf).
>>
>> Wow.  I did not catch that.  I do recall something about the flow
>> control
>> frames being multicast - so every host gets them and pauses.  That's...
>> interesting, isn't it?
>
> Pause frames are multicast frames but they are only transmitted
> between link partners (NIC to switch) and never sent further in
> the network.  Flow control is intended to be a local behavior but
> the link indicates it can have an unintended global effect.

Interesting.  Thanks very much for the information on auto-negotiation - 
it was totally unclear to me that I'd basically been disabling flow 
control by manually configuring the interface.

Thanks,

Charles

> Dave
>
>


