Date: Thu, 7 Jul 2011 02:00:26 -0400 (EDT)
From: Charles Sprickman <spork@bway.net>
To: YongHyeon PYUN
Cc: freebsd-net@freebsd.org, David Christensen
Subject: Re: bce packet loss

More inline, including a bigger picture of what I'm seeing on some other
hosts, but I wanted to thank everyone for all the fascinating ethernet BER
info and the final explanation of what the "IfHCInBadOctets" counter
represents.  Interesting stuff.

On Wed, 6 Jul 2011, YongHyeon PYUN wrote:

> On Mon, Jul 04, 2011 at 09:32:11PM -0400, Charles Sprickman wrote:
>> Hello,
>>
>> We're running a few 8.1-R servers with Broadcom bce interfaces (Dell
>> R510) and I'm seeing occasional packet loss on them (enough that it
>> trips nagios now and then).  Cabling seems fine, as neither the switch
>> nor the sysctl info for the device shows any errors/collisions/etc.,
>> but there is one odd counter: "dev.bce.1.stat_IfHCInBadOctets: 539369".
>> See [1] below for full sysctl output.  The switch shows no errors
>> except for "Dropped packets 683868".
>>
>> pciconf output is also below. [2]
>>
>> By default, the switch had flow control set to "on".  I also let it
>> run with "auto".  In both cases, the drops continued to increment.
>> I'm now running with flow control off to see if that changes anything.
>>
>> I do see some correlation between CPU usage and drops - I have CPU
>> usage graphed in nagios, and cacti is graphing the drops on the Dell
>> switch.  There are no signs of running out of mbufs or similar.
>>
>> So given that limited info, is there anything I can look at to track
>> this down?  Does anything stand out in the stats sysctl exposes?  Two
>> things stand out for me - the number of changes in bce regarding flow
>> control that are not in 8.1, and the correlation between CPU load and
>> the drops.
>>
>> What other information can I provide?
>>
>
> You had 282 RX buffer shortages and these frames were dropped.  This
> may explain why you see occasional packet loss.  'netstat -m' will
> show which size of cluster allocation failed.
Nothing of note:

0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed

> However it seems you have 0 com_no_buffers, which indicates the
> controller was able to receive all packets destined for this host.
> Your host may have lost some packets (i.e. non-zero
> mbuf_alloc_failed_count) but your controller and system were still
> responsive to the network traffic.

OK.  I recall seeing a thread in the -net archives where some folks had
the "com_no_buffers" counter incrementing, but I'm not seeing that at all.

> The data sheet says IfHCInBadOctets indicates the number of octets
> received on the interface, including framing characters, for packets
> that were dropped in the MAC for any reason.  I'm not sure this counter
> includes packets counted by IfInFramesL2FilterDiscards, which indicates
> the number of good frames that have been dropped due to the L2 perfect
> match, broadcast, multicast or MAC control frame filters.  If your
> switch runs STP it will periodically send BPDU packets to the
> destination address of the STP multicast address 01:80:C2:00:00:00.
> Not sure this is the reason though.  Probably David can explain more
> details on the IfHCInBadOctets counter (CCed).

Again, thanks for that.  If I could just ask for a bit more assistance,
it would be greatly appreciated.  I've collected a fair bit of data, and
so far it has done nothing but complicate the issue for me.

-If I'm reading the switch stats correctly, most of my drops are
host->switch, although I'm not certain of that; these Dell 2848s have no
real CLI to speak of.

-I'm seeing similar drops, though not quite as bad, on other hosts.  They
all use the em interface except for one other host with bge.  This
particular host (with the bce interface) just seems to get bad enough to
trigger nagios alerts (a simple ping check from a host on the same
switch/subnet).  All these hosts are forced to 100/FD, as is the switch.
The switch is our external (internet-facing) switch with a 100Mb
connection to our upstream.  At *peak*, our aggregate bandwidth on this
switch is maybe 45Mb/s, most of it outbound.  We are nowhere near
saturating the switching fabric (I hope).

-There are three reasons I set the ports to 100baseTX: the old Cisco that
lost a few ports was a 10/100 switch and the hosts were already
hard-coded for 100/FD; I figured if the Dell craps out I can toss the
Cisco back in without changing the speed/duplex on all the hosts; and
lastly, our uplink is only 100/FD, so why bother.  There was also maybe
some vague notion that I'd avoid using up some kind of buffers in the
switch by matching the speed on all ports...

-We have an identical switch (same model, same hardware rev, same
firmware) for our internal network (lots of log analysis over NFS mounts,
a ton of internal DNS (upwards of 10K queries/sec at peak), and
occasional large file transfers).  On this host and all others, the
dropped packet count on the switch ports is at worst around 5000 packets.
The counters have not been reset on it and it's been up for 460 days.

-A bunch of legacy servers that have fxp interfaces on the external
switch and em on the internal switch show *no* significant drops, nor do
the switch ports they are connected to.

-To see if forcing the ports to 100/FD was causing a problem, I set the
host and switch to 1000/FD.  Over roughly 24 hours, the switch reported
197346 dropped packets out of 52166986 packets received.

-Tonight's change was to turn off spanning tree.
This is a long shot based on some Dell bug I saw discussed on their
forums.  Given our simple network layout, I don't really see spanning
tree as being at all necessary.

One of the first replies I got to my original post was private and
amounted to "Dell is garbage".  That may be true, but the excellent
performance on the more heavily loaded internal network makes me doubt
there's a fundamental shortcoming in the switch.  It would have to be
real garbage to fall over under a combined load of 45Mb/s.

I am somewhat curious whether some weird buffering issue is possible with
a mix of 100/FD and 1000/FD ports.  Any thoughts on that?  It's the only
thing that differs between the two switches.

Before replacing the switch I'm also going to cycle through turning off
TSO, rxcsum, and txcsum, since that seems to have been the fix for some
people with otherwise unexplained network issues.  I assume those
features all depend on the firmware of the NIC being bug-free, and I'm
not quite ready to accept that.

Thanks,

Charles
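For reference, the driver and mbuf counters discussed above can be polled
from the shell.  This is only a minimal sketch, assuming bce1 (the
dev.bce.1 sysctl tree quoted earlier) is the interface in question and
that the counter names match those mentioned in the thread:

    # list the bce1 driver statistics and pull out the drop-related
    # counters (exact OID names may vary between bce(4) revisions)
    sysctl dev.bce.1 | egrep 'no_buffers|mbuf_alloc_failed|IfHCInBadOctets'

    # system-wide mbuf/cluster allocation failures, as suggested above
    netstat -m | grep -i denied

Sampling these at the same interval as the switch's dropped-packet
counter should help show whether the loss is happening in the NIC/driver
or out on the wire.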
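Similarly, a minimal sketch of the TSO/checksum-offload test mentioned
above, again assuming bce1 is the interface to change; these are the
standard ifconfig(8) toggles, and each can be re-enabled later by
dropping the leading "-":

    # disable TSO and hardware RX/TX checksum offload on bce1
    ifconfig bce1 -tso -rxcsum -txcsum

    # verify the options no longer appear in the interface's options list
    ifconfig bce1

If one of these turns out to be the culprit, the same flags can be
carried into the existing ifconfig_bce1 line in /etc/rc.conf so the
setting survives a reboot.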