From owner-freebsd-net@FreeBSD.ORG  Wed Feb  2 18:07:38 2011
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1C872106564A
	for <freebsd-net@freebsd.org>; Wed,  2 Feb 2011 18:07:38 +0000 (UTC)
	(envelope-from seanbru@yahoo-inc.com)
Received: from mrout1-b.corp.re1.yahoo.com (mrout1-b.corp.re1.yahoo.com
	[69.147.107.20])
	by mx1.freebsd.org (Postfix) with ESMTP id E40868FC08
	for <freebsd-net@freebsd.org>; Wed,  2 Feb 2011 18:07:37 +0000 (UTC)
Received: from [127.0.0.1] (rideseveral.corp.yahoo.com [10.73.160.231])
	by mrout1-b.corp.re1.yahoo.com (8.14.4/8.14.4/y.out) with ESMTP id
	p12I7EYa011192; Wed, 2 Feb 2011 10:07:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=yahoo-inc.com;
	s=cobra; t=1296670034;
	bh=YXPFPkjW4DN7z1MsRc7EIdrVRUYIO4zBhOHr7HRSHjQ=;
	h=Subject:From:To:Cc:In-Reply-To:References:Content-Type:Date:
	Message-ID:Mime-Version:Content-Transfer-Encoding;
	b=vSJhyBjm8bDfA9tMU4lLIn76jm0LrCDXPcq5g7er9nR+AfdTf7EqWVxP1TNwSeVzz
	ixP4gX1MgJINIBS+qQrZW2eaux2aX386ALcPmiRwektDxPkNVE4dmnz1H7VZjXep8V
	Q6gBlA3wXHed3goIyA0p5LZmuMmu/Ij5Ggcne0no=
From: Sean Bruno <seanbru@yahoo-inc.com>
To: Mike Carlson <carlson39@llnl.gov>
In-Reply-To: <4D48721A.5040906@llnl.gov>
References: <4D48721A.5040906@llnl.gov>
Content-Type: text/plain; charset="UTF-8"
Date: Wed, 02 Feb 2011 10:07:13 -0800
Message-ID: <1296670033.2286.0.camel@hitfishpass-lx.corp.yahoo.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 (2.32.1-1.fc14) 
Content-Transfer-Encoding: 7bit
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: A flood of bacula traffic causes igb interface to go offline.
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Feb 2011 18:07:38 -0000

On Tue, 2011-02-01 at 12:50 -0800, Mike Carlson wrote:
> Hey net@,
> 
> I have a FreeBSD 8.2-RC2 system running on a HP DL180 G6, using the 
> onboard Intel controller, and it is our primary Bacula storage node and 
> director node.
> 
> We have 96 clients that are scheduled to run at 8:30pm. After about 9 - 
> 10 minutes of activity (mrtg graphs show about 50-60MB/sec incoming 
> traffic), the igb1 interface is no longer able to communicate with the 
> Cisco switch.
> 
> The interesting part is, the interface is still "up", there is nothing 
> in the kernel message buffer, and nothing relevant in the log file (just 
> syslogd and ldap errors because they cannot reach their respective 
> network servers). The system stops responding on the network until I either 
> reboot or run 'ifconfig igb1 down; ifconfig igb1 up'. There is no 
> firewall loaded/configured.
> 
> Thankfully, I have a KVM over IP, so when this happens I can at least 
> run script(1) and capture some useful information.
> ifconfig igb1
> igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>      options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
>      ether 1c:c1:de:e9:fb:af
>      inet 128.15.136.105 netmask 0xffffff00 broadcast 128.15.136.255
>      inet 128.15.136.108 netmask 0xffffff00 broadcast 128.15.136.255
>      inet 128.15.136.102 netmask 0xffffff00 broadcast 128.15.136.255
>      media: Ethernet autoselect (1000baseT <full-duplex>)
>      status: active
> 
> I can ping the internal IP (but I realize that is probably a useless
> test...)
> root@write /etc]> ping 128.15.136.105
> PING 128.15.136.105 (128.15.136.105): 56 data bytes
> 64 bytes from 128.15.136.105: icmp_seq=0 ttl=64 time=0.024 ms
> 64 bytes from 128.15.136.105: icmp_seq=1 ttl=64 time=0.015 ms
> ^C
> --- 128.15.136.105 ping statistics ---
> 2 packets transmitted, 2 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.015/0.019/0.024/0.005 ms
> 
> Attempting to ping the router:
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ^C
> --- 128.15.136.254 ping statistics ---
> 9 packets transmitted, 0 packets received, 100.0% packet loss
> 
> 
> The only thing that seems to solve this problem is to either reboot, or
> do an "ifconfig down/up":
> 
> root@write /etc]> ifconfig igb1 down
> root@write /etc]> ifconfig igb1 up
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> 64 bytes from 128.15.136.254: icmp_seq=1 ttl=255 time=1.015 ms
> 64 bytes from 128.15.136.254: icmp_seq=2 ttl=255 time=0.217 ms
> 64 bytes from 128.15.136.254: icmp_seq=3 ttl=255 time=0.278 ms
> 64 bytes from 128.15.136.254: icmp_seq=4 ttl=255 time=0.238 ms
> ^C
> --- 128.15.136.254 ping statistics ---
> 5 packets transmitted, 4 packets received, 20.0% packet loss
> round-trip min/avg/max/stddev = 0.217/0.437/1.015/0.334 ms
> 
> I was able to run tcpdump during all of this, and it showed *nothing* between 
> the system and the switch until I ran ifconfig igb1 down/up, at which point 
> the CDP and Spanning Tree traffic appeared.
> 
> The networking team here has told me there are no errors on the switch, 
> or the port I am on, and they even moved me from one port to another, 
> but this is still happening on a fairly regular basis now that I've 
> added more backup clients.
> 
> Is this a possible bug with my hardware and the Intel driver? I have a 
> pcap file and further system details that might help, but I don't want to 
> send them out to a mailing list.

You may want to follow the current discussions regarding the Intel
drivers (em and igb).

Can you post the output of lspci -vvv?
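
Until the cause is found, the down/up workaround you describe could be
scripted so the box recovers without the KVM. A rough sketch, assuming
/bin/sh on FreeBSD. The interface name and router address come from your
mail; the function names, the log directory and the 30-second interval
are made up for illustration:

```shell
#!/bin/sh
# Sketch only: cycle an interface when the router stops answering pings.
# IFACE and GATEWAY come from the report; LOGDIR, the function names and
# the 30-second interval are hypothetical.
IFACE="${IFACE:-igb1}"
GATEWAY="${GATEWAY:-128.15.136.254}"
LOGDIR="${LOGDIR:-/var/tmp}"

gateway_alive() {
    # ping exits non-zero when none of the three probes is answered
    ping -c 3 "$GATEWAY" >/dev/null 2>&1
}

capture_state() {
    # Save interface and mbuf statistics before the interface is cycled
    out="$LOGDIR/$IFACE-hang.$(date +%Y%m%d-%H%M%S)"
    { ifconfig "$IFACE"; netstat -m; netstat -i; } >"$out" 2>&1
}

bounce_iface() {
    logger -t iface-watchdog "no reply from $GATEWAY, cycling $IFACE"
    ifconfig "$IFACE" down
    sleep 2
    ifconfig "$IFACE" up
}

check_once() {
    if ! gateway_alive; then
        capture_state
        bounce_iface
    fi
}

# Loop only when asked to, so the functions can also be sourced and called by hand
if [ "${1:-}" = "--loop" ]; then
    while :; do
        check_once
        sleep 30
    done
fi
```

Running it with --loop from the console, or calling check_once from
cron, would at least keep the backups going, and the saved files might
show whether anything looks unusual at the moment of the hang.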

Sean