From: Sean Bruno
To: Mike Carlson
Cc: "freebsd-net@freebsd.org"
Subject: Re: A flood of bacula traffic causes igb interface to go offline.
Date: Wed, 02 Feb 2011 10:07:13 -0800
Message-ID: <1296670033.2286.0.camel@hitfishpass-lx.corp.yahoo.com>
In-Reply-To: <4D48721A.5040906@llnl.gov>
References: <4D48721A.5040906@llnl.gov>
List-Id: Networking and TCP/IP with FreeBSD

On Tue, 2011-02-01 at 12:50 -0800, Mike Carlson wrote:
> Hey net@,
>
> I have a FreeBSD 8.2-RC2 system running on a HP DL180 G6, using the
> onboard Intel controller, and it is our primary Bacula storage node and
> director node.
>
> We have 96 clients that are scheduled to run at 8:30pm. After about 9 -
> 10 minutes of activity (mrtg graphs show about 50-60MB/sec incoming
> traffic), the igb1 interface is no longer able to communicate with the
> Cisco switch.
>
> The interesting part is, the interface is still "up", there is nothing
> in the kernel message buffer, and nothing relevant in the log file (just
> syslogd and ldap errors because they cannot reach their respective
> network servers). The system only responds to the network until I either
> reboot, or run 'ifconfig igb1 down ; ifconfig igb1 up'. There is no
> firewall loaded/configured.
>
> Thankfully, I have a KVM over IP, so when this happens I can at least
> run script(1) and capture some useful information.
> ifconfig igb1
> igb1: flags=8843 metric 0 mtu 1500
>
> options=1bb
> ether 1c:c1:de:e9:fb:af
> inet 128.15.136.105 netmask 0xffffff00 broadcast 128.15.136.255
> inet 128.15.136.108 netmask 0xffffff00 broadcast 128.15.136.255
> inet 128.15.136.102 netmask 0xffffff00 broadcast 128.15.136.255
> media: Ethernet autoselect (1000baseT )
> status: active
>
> I can ping the internal IP (but I realize that is probably a useless
> test...)
> root@write /etc]> ping 128.15.136.105
> PING 128.15.136.105 (128.15.136.105): 56 data bytes
> 64 bytes from 128.15.136.105: icmp_seq=0 ttl=64 time=0.024 ms
> 64 bytes from 128.15.136.105: icmp_seq=1 ttl=64 time=0.015 ms
> ^C
> --- 128.15.136.105 ping statistics ---
> 2 packets transmitted, 2 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.015/0.019/0.024/0.005 ms
>
> Attempting to ping the router:
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ^C
> --- 128.15.136.254 ping statistics ---
> 9 packets transmitted, 0 packets received, 100.0% packet loss
>
>
> The only thing that seems to solve this problem is to either reboot, or
> do an "ifconfig down/up":
>
> root@write /etc]> ifconfig igb1 down
> root@write /etc]> ifconfig igb1
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> 64 bytes from 128.15.136.254: icmp_seq=1 ttl=255 time=1.015 ms
> 64 bytes from 128.15.136.254: icmp_seq=2 ttl=255 time=0.217 ms
> 64 bytes from 128.15.136.254: icmp_seq=3 ttl=255 time=0.278 ms
> 64 bytes from 128.15.136.254: icmp_seq=4 ttl=255 time=0.238 ms
> ^C
> --- 128.15.136.254 ping statistics ---
> 5 packets transmitted, 4 packets received, 20.0% packet loss
> round-trip min/avg/max/stddev = 0.217/0.437/1.015/0.334 ms
>
> I was able to run tcpdump during all of this, and it *nothing* between
> the system and the switch until I run ifconfig igb1 down/up, and then
> you see the CDP and Tree Spanning traffic.
>
> The networking team here has told me there are no errors on the switch,
> or the port I am on, and they even moved me from one port to another,
> but this is still happening on a fairly regular basis now that I've
> added more backup clients.
>
> Is this a possible bug with my hardware and the intel driver? I have a
> pcap file and more system information that might provide a lot more
> information, but I don't want to send that out to a mailing list.

You may want to pay attention to the current discussions regarding the
intel driver (em and igb).

Can you post the output of lspci -vvv ?

Sean