Date: Wed, 02 Feb 2011 10:07:13 -0800
From: Sean Bruno <seanbru@yahoo-inc.com>
To: Mike Carlson <carlson39@llnl.gov>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: A flood of bacula traffic causes igb interface to go offline.
Message-ID: <1296670033.2286.0.camel@hitfishpass-lx.corp.yahoo.com>
In-Reply-To: <4D48721A.5040906@llnl.gov>
References: <4D48721A.5040906@llnl.gov>
On Tue, 2011-02-01 at 12:50 -0800, Mike Carlson wrote:
> Hey net@,
>
> I have a FreeBSD 8.2-RC2 system running on a HP DL180 G6, using the
> onboard Intel controller, and it is our primary Bacula storage node and
> director node.
>
> We have 96 clients that are scheduled to run at 8:30pm. After about 9 -
> 10 minutes of activity (mrtg graphs show about 50-60MB/sec incoming
> traffic), the igb1 interface is no longer able to communicate with the
> Cisco switch.
>
> The interesting part is, the interface is still "up", there is nothing
> in the kernel message buffer, and nothing relevant in the log file (just
> syslogd and ldap errors because they cannot reach their respective
> network servers). The system does not respond to the network until I
> either reboot, or run 'ifconfig igb1 down ; ifconfig igb1 up'. There is
> no firewall loaded/configured.
>
> Thankfully, I have a KVM over IP, so when this happens I can at least
> run script(1) and capture some useful information.
>
> ifconfig igb1
> igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>     options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
>     ether 1c:c1:de:e9:fb:af
>     inet 128.15.136.105 netmask 0xffffff00 broadcast 128.15.136.255
>     inet 128.15.136.108 netmask 0xffffff00 broadcast 128.15.136.255
>     inet 128.15.136.102 netmask 0xffffff00 broadcast 128.15.136.255
>     media: Ethernet autoselect (1000baseT <full-duplex>)
>     status: active
>
> I can ping the internal IP (but I realize that is probably a useless
> test...)
>
> root@write /etc]> ping 128.15.136.105
> PING 128.15.136.105 (128.15.136.105): 56 data bytes
> 64 bytes from 128.15.136.105: icmp_seq=0 ttl=64 time=0.024 ms
> 64 bytes from 128.15.136.105: icmp_seq=1 ttl=64 time=0.015 ms
> ^C
> --- 128.15.136.105 ping statistics ---
> 2 packets transmitted, 2 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.015/0.019/0.024/0.005 ms
>
> Attempting to ping the router:
>
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ping: sendto: Host is down
> ^C
> --- 128.15.136.254 ping statistics ---
> 9 packets transmitted, 0 packets received, 100.0% packet loss
>
> The only thing that seems to solve this problem is to either reboot, or
> do an "ifconfig down/up":
>
> root@write /etc]> ifconfig igb1 down
> root@write /etc]> ifconfig igb1
> root@write /etc]> ping 128.15.136.254
> PING 128.15.136.254 (128.15.136.254): 56 data bytes
> 64 bytes from 128.15.136.254: icmp_seq=1 ttl=255 time=1.015 ms
> 64 bytes from 128.15.136.254: icmp_seq=2 ttl=255 time=0.217 ms
> 64 bytes from 128.15.136.254: icmp_seq=3 ttl=255 time=0.278 ms
> 64 bytes from 128.15.136.254: icmp_seq=4 ttl=255 time=0.238 ms
> ^C
> --- 128.15.136.254 ping statistics ---
> 5 packets transmitted, 4 packets received, 20.0% packet loss
> round-trip min/avg/max/stddev = 0.217/0.437/1.015/0.334 ms
>
> I was able to run tcpdump during all of this, and it shows *nothing*
> between the system and the switch until I run ifconfig igb1 down/up,
> and then you see the CDP and Spanning Tree traffic.
>
> The networking team here has told me there are no errors on the switch,
> or the port I am on, and they even moved me from one port to another,
> but this is still happening on a fairly regular basis now that I've
> added more backup clients.
>
> Is this a possible bug with my hardware and the Intel driver? I have a
> pcap file and more system information that might provide a lot more
> information, but I don't want to send that out to a mailing list.

You may want to pay attention to the current discussions regarding the
Intel driver (em and igb).

Can you post the output of lspci -vvv?

Sean
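
The workaround in the quoted report (taking igb1 down and up once the router stops answering) could be automated as a stopgap while the driver problem is investigated. What follows is only a minimal sketch, not something proposed in the thread: the interface name and router address come from the report, while the function names, the failure threshold, and the polling interval are hypothetical choices made for illustration.

```python
"""Stopgap watchdog sketch: bounce a NIC when the router stops answering."""
import subprocess
import time

IFACE = "igb1"              # interface named in the report
GATEWAY = "128.15.136.254"  # router address pinged in the report
MAX_FAILURES = 3            # hypothetical: consecutive failed probes before bouncing
INTERVAL = 10               # hypothetical: seconds between probes


def gateway_reachable(gateway, run=subprocess.run):
    """Return True if a single ping to the gateway exits with status 0."""
    result = run(["ping", "-c", "1", gateway],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def bounce_commands(iface):
    """Return the two ifconfig invocations used as the manual workaround."""
    return [["ifconfig", iface, "down"], ["ifconfig", iface, "up"]]


def next_failure_count(failures, reachable):
    """Reset the counter on a successful probe, otherwise increment it."""
    return 0 if reachable else failures + 1


def main():
    """Probe forever; bounce the interface after MAX_FAILURES misses in a row."""
    failures = 0
    while True:
        failures = next_failure_count(failures, gateway_reachable(GATEWAY))
        if failures >= MAX_FAILURES:
            for cmd in bounce_commands(IFACE):
                subprocess.run(cmd, check=True)  # needs root privileges
            failures = 0
        time.sleep(INTERVAL)

# Call main() as root to start the loop; it is left uncalled here on purpose.
```

Requiring several consecutive failures keeps a single dropped ping from triggering a bounce, which would interrupt running backup jobs. This only masks the symptom and does not replace finding the cause in the driver or hardware.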