Date: Tue, 01 Feb 2011 12:50:34 -0800
From: Mike Carlson <carlson39@llnl.gov>
To: freebsd-net@freebsd.org
Subject: A flood of bacula traffic causes igb interface to go offline.
Message-ID: <4D48721A.5040906@llnl.gov>
Hey net@,

I have a FreeBSD 8.2-RC2 system running on an HP DL180 G6, using the onboard Intel controller, and it is our primary Bacula storage node and director node.

We have 96 clients that are scheduled to run at 8:30pm. After about 9-10 minutes of activity (mrtg graphs show about 50-60MB/sec incoming traffic), the igb1 interface is no longer able to communicate with the Cisco switch.

The interesting part is, the interface is still "up", there is nothing in the kernel message buffer, and nothing relevant in the log file (just syslogd and ldap errors because they cannot reach their respective network servers).

The system does not respond on the network again until I either reboot, or run 'ifconfig igb1 down ; ifconfig igb1 up'. There is no firewall loaded/configured.

Thankfully, I have a KVM over IP, so when this happens I can at least run script(1) and capture some useful information.

ifconfig igb1
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
        ether 1c:c1:de:e9:fb:af
        inet 128.15.136.105 netmask 0xffffff00 broadcast 128.15.136.255
        inet 128.15.136.108 netmask 0xffffff00 broadcast 128.15.136.255
        inet 128.15.136.102 netmask 0xffffff00 broadcast 128.15.136.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

I can ping the internal IP (but I realize that is probably a useless test...)
root@write /etc]> ping 128.15.136.105
PING 128.15.136.105 (128.15.136.105): 56 data bytes
64 bytes from 128.15.136.105: icmp_seq=0 ttl=64 time=0.024 ms
64 bytes from 128.15.136.105: icmp_seq=1 ttl=64 time=0.015 ms
^C
--- 128.15.136.105 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.015/0.019/0.024/0.005 ms

Attempting to ping the router:

root@write /etc]> ping 128.15.136.254
PING 128.15.136.254 (128.15.136.254): 56 data bytes
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
ping: sendto: Host is down
^C
--- 128.15.136.254 ping statistics ---
9 packets transmitted, 0 packets received, 100.0% packet loss

The only thing that seems to solve this problem is to either reboot, or do an "ifconfig down/up":

root@write /etc]> ifconfig igb1 down
root@write /etc]> ifconfig igb1
root@write /etc]> ping 128.15.136.254
PING 128.15.136.254 (128.15.136.254): 56 data bytes
64 bytes from 128.15.136.254: icmp_seq=1 ttl=255 time=1.015 ms
64 bytes from 128.15.136.254: icmp_seq=2 ttl=255 time=0.217 ms
64 bytes from 128.15.136.254: icmp_seq=3 ttl=255 time=0.278 ms
64 bytes from 128.15.136.254: icmp_seq=4 ttl=255 time=0.238 ms
^C
--- 128.15.136.254 ping statistics ---
5 packets transmitted, 4 packets received, 20.0% packet loss
round-trip min/avg/max/stddev = 0.217/0.437/1.015/0.334 ms

I was able to run tcpdump during all of this, and it shows *nothing* between the system and the switch until I run ifconfig igb1 down/up, and then you see the CDP and Spanning Tree traffic.

The networking team here has told me there are no errors on the switch, or the port I am on, and they even moved me from one port to another, but this is still happening on a fairly regular basis now that I've added more backup clients.

Is this a possible bug with my hardware and the Intel driver?
I have a pcap file and more system information that might shed more light on this, but I don't want to send them out to a mailing list.
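
Until the cause is found, the manual down/up workaround described above could be automated as a stopgap. What follows is a minimal sketch, not a fix for the underlying problem. The interface name and gateway address are the ones from this report. The function names, the three-probe check, the 30-second interval, and the overridable PING/IFCONFIG variables are assumptions made for illustration.

```shell
#!/bin/sh
# Stopgap watchdog sketch: bounce an interface when the gateway stops answering.
# IFACE and GATEWAY come from the report; all other names and values
# (function names, probe count, interval) are illustrative assumptions.

IFACE="${IFACE:-igb1}"
GATEWAY="${GATEWAY:-128.15.136.254}"
INTERVAL="${INTERVAL:-30}"        # seconds between checks (assumed value)
PING="${PING:-ping}"              # overridable so the logic can be exercised
IFCONFIG="${IFCONFIG:-ifconfig}"  # without touching a real interface

# Succeeds if the gateway answers at least one of three echo requests.
gateway_reachable() {
    "$PING" -c 3 "$GATEWAY" >/dev/null 2>&1
}

# Take the interface down and bring it back up, as done by hand in the report.
bounce_interface() {
    "$IFCONFIG" "$IFACE" down && "$IFCONFIG" "$IFACE" up
}

# One check: bounce only when the gateway is unreachable.
check_once() {
    if gateway_reachable; then
        return 0
    fi
    echo "$GATEWAY unreachable, bouncing $IFACE" >&2
    bounce_interface
}

# Loop forever only when explicitly asked to (first argument "run").
if [ "${1:-}" = "run" ]; then
    while :; do
        check_once
        sleep "$INTERVAL"
    done
fi
```

Saved under a hypothetical name such as igb_watchdog.sh, it would be started as root with 'sh igb_watchdog.sh run'. Because each bounce hides the failure, it is worth capturing the interface state before enabling something like this on a machine that is still being debugged.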