Date: Mon, 28 Apr 2008 14:13:21 GMT
From: Josh <josh@endries.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/123172: Watchdog timeout problems with if_bce
Message-ID: <200804281413.m3SEDLgT038973@www.freebsd.org>
Resent-Message-ID: <200804281420.m3SEK19s076162@freefall.freebsd.org>

>Number: 123172
>Category: kern
>Synopsis: Watchdog timeout problems with if_bce
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Mon Apr 28 14:20:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Josh
>Release: 7.0-RELEASE
>Organization:
>Environment:
FreeBSD 7.0-RELEASE/amd64, custom kernel (SMP with SCHED_ULE and MAC)
>Description:
The machine doesn't lock up, but becomes unusable. The network is completely unusable, and anything involving networking "parts" is also. E.g., if I run ifconfig it locks up my shell. When that happened I logged into another tty and ran "sysctl -a | grep watchdog" and it locked the whole machine up. I couldn't go back to the initial tty, ctrl-alt-del, or anything; had to hard reset.

These are the messages in syslog:

Apr 28 00:00:04 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:04 hathor kernel: bce0: link state changed to DOWN
Apr 28 00:00:07 hathor kernel: bce0: link state changed to UP
Apr 28 00:00:14 hathor kernel: bce1: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:14 hathor kernel: bce1: link state changed to DOWN
Apr 28 00:00:16 hathor kernel: bce1: link state changed to UP
Apr 28 00:00:18 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:18 hathor kernel: bce0: link state changed to DOWN
Apr 28 00:00:21 hathor kernel: bce0: link state changed to UP
Apr 28 00:00:23 hathor kernel: bce1: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:23 hathor kernel: bce1: link state changed to DOWN
Apr 28 00:00:25 hathor kernel: bce1: link state changed to UP
Apr 28 00:00:28 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
..

This just repeats. It seems to happen when there is a significant amount of traffic, possibly based on or more affected by UDP traffic. That machine currently runs a MySQL slave jail and a BIND jail, and it worked fine until I started using BIND, but the slave isn't very bandwidth intensive. It was fine for a few days, then died, and now it seems to die much more often (possibly because BIND is being used).

I can't get into it right now to get a uname (it's remote, and broke again a few minutes ago), but I did get a dmesg (below) before it broke.

I currently have it set up to use LACP via lagg and vlan devices on top of that. I'm doing some funky things with pf (route-to/reply-to/NAT for jails). I'm going to change it to be more basic: one NIC external and one internal, real jail IPs, to see if that helps any.

Unfortunately this is pretty much a showstopper. :( If there are any tests/info/shell/contact info that would help someone work on this please let me know.

---

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE #0: Tue Mar 24 13:36:33 EDT 2009
    root@hathor.production.pyramid:/jails/src/usr/obj/jails/src/usr/src/sys/ULEMAC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2500.11-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x10676 Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xce3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,<b19>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 4
usable memory = 8575201280 (8177 MB)
avail memory = 8287870976 (7903 MB)
ACPI APIC Table: <HP ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
 cpu4 (AP): APIC ID: 4
 cpu5 (AP): APIC ID: 5
 cpu6 (AP): APIC ID: 6
 cpu7 (AP): APIC ID: 7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Mar 24 2009 13:36:25)
acpi0: <HP ProLiant> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est0 attach returned 6
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est1 attach returned 6
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est2 attach returned 6
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est3 attach returned 6
p4tcc3: <CPU Frequency Thermal Control> on cpu3
cpu4: <ACPI CPU> on acpi0
est4: <Enhanced SpeedStep Frequency Control> on cpu4
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est4 attach returned 6
p4tcc4: <CPU Frequency Thermal Control> on cpu4
cpu5: <ACPI CPU> on acpi0
est5: <Enhanced SpeedStep Frequency Control> on cpu5
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est5 attach returned 6
p4tcc5: <CPU Frequency Thermal Control> on cpu5
cpu6: <ACPI CPU> on acpi0
est6: <Enhanced SpeedStep Frequency Control> on cpu6
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est6 attach returned 6
p4tcc6: <CPU Frequency Thermal Control> on cpu6
cpu7: <ACPI CPU> on acpi0
est7: <Enhanced SpeedStep Frequency Control> on cpu7
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 4720472006004720
device_attach: est7 attach returned 6
p4tcc7: <CPU Frequency Thermal Control> on cpu7
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci9: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci9
pci10: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.0 on pci10
pci11: <ACPI PCI bus> on pcib3
pcib4: <PCI-PCI bridge> at device 1.0 on pci10
pci14: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> at device 2.0 on pci10
pci15: <PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.3 on pci9
pci16: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci6: <ACPI PCI bus> on pcib7
ciss0: <HP Smart Array P400i> port 0x4000-0x40ff mem 0xfde00000-0xfdefffff,0xfddf0000-0xfddf0fff irq 16 at device 0.0 on pci6
ciss0: [ITHREAD]
pcib8: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci19: <ACPI PCI bus> on pcib8
pcib9: <PCI-PCI bridge> at device 5.0 on pci0
pci22: <PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib11
bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xf8000000-0xf9ffffff irq 18 at device 0.0 on pci3
miibus0: <MII bus> on bce0
brgphy0: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce0: Ethernet address: 00:1f:29:06:d9:e2
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x01090605); Flags( MFW MSI )
pcib12: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> at device 0.0 on pci4
pci5: <ACPI PCI bus> on pcib13
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xfa000000-0xfbffffff irq 19 at device 0.0 on pci5
miibus1: <MII bus> on bce1
brgphy1: <BCM5708C 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bce1: Ethernet address: 00:1f:29:06:d9:e0
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x01090605); Flags( MFW MSI )
uhci0: <Intel 631XESB/632XESB/3100 USB controller USB-1> port 0x1000-0x101f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <Intel 631XESB/632XESB/3100 USB controller USB-1> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 631XESB/632XESB/3100 USB controller USB-2> port 0x1020-0x103f irq 17 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <Intel 631XESB/632XESB/3100 USB controller USB-2> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 631XESB/632XESB/3100 USB controller USB-3> port 0x1040-0x105f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb2: <Intel 631XESB/632XESB/3100 USB controller USB-3> on uhci2
usb2: USB revision 1.0
uhub2: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb2
uhub2: 2 ports with 2 removable, self powered
uhci3: <Intel 631XESB/632XESB/3100 USB controller USB-4> port 0x1060-0x107f irq 19 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
uhci3: [ITHREAD]
usb3: <Intel 631XESB/632XESB/3100 USB controller USB-4> on uhci3
usb3: USB revision 1.0
uhub3: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb3
uhub3: 2 ports with 2 removable, self powered
ehci0: <Intel 63XXESB USB 2.0 controller> mem 0xf7df0000-0xf7df03ff irq 16 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb4: waiting for BIOS to give up control
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <Intel 63XXESB USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb4
uhub4: 8 ports with 8 removable, self powered
pcib14: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib14
vgapci0: <VGA-compatible display> port 0x3000-0x30ff mem 0xd8000000-0xdfffffff,0xf7ff0000-0xf7ffffff irq 23 at device 3.0 on pci1
pci1: <base peripheral> at device 4.0 (no driver attached)
pci1: <base peripheral> at device 4.2 (no driver attached)
uhci4: <UHCI (generic) USB controller> port 0x3800-0x381f irq 22 at device 4.4 on pci1
uhci4: [GIANT-LOCKED]
uhci4: [ITHREAD]
usb5: <UHCI (generic) USB controller> on uhci4
usb5: USB revision 1.0
uhub5: <(0x103c) UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb5
uhub5: 2 ports with 2 removable, self powered
pci1: <serial bus> at device 4.6 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f irq 17 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse, device ID 3
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xe6000-0xe7fff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
sio1: [FILTER]
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: <HP Virtual Keyboard, class 0/0, rev 1.10/0.02, addr 2> on uhub5
kbd2 at ukbd0
ums0: <HP Virtual Keyboard, class 0/0, rev 1.10/0.02, addr 2> on uhub5
ums0: 3 buttons.
uhub6: <HP Virtual Hub, class 9/0, rev 1.10/0.01, addr 3> on uhub5
uhub6: 7 ports with 7 removable, self powered
NULL mp in getnewvnode()
Timecounters tick every 1.000 msec
hptrr: no controller detected.
acd0: DVDROM <HL-DT-STDVD-ROM GDR-D10N/3.00> at ata0-master UDMA33
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #7 Launched!
da0 at ciss0 bus 0 target 0 lun 0
da0: <COMPAQ RAID 5 VOLUME OK> Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: 419946MB (860051248 512 byte sectors: 255H 32S/T 65535C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
bce0: link state changed to UP
lagg0: link state changed to UP
vlan2: link state changed to UP
vlan8: link state changed to UP
vlan11: link state changed to UP
vlan12: link state changed to UP
bce1: link state changed to UP
>How-To-Repeat:
Not sure yet...generate UDP traffic, it seems.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
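
Editorial note, not part of the original report: when triaging a log like the syslog excerpt in the description, it can help to tally how often each interface hits the watchdog. The following is a minimal Python sketch under stated assumptions. The regular expression, the function name count_watchdog_resets, and the SAMPLE constant are illustrative and do not come from FreeBSD or the bce driver. It assumes syslog lines in exactly the format quoted above.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in the report:
#   "Apr 28 00:00:04 hathor kernel: bce0: <file>(<line>): Watchdog timeout occurred, resetting!"
WATCHDOG_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s+"
    r"(?P<ifname>bce\d+):.*Watchdog timeout occurred"
)


def count_watchdog_resets(lines):
    """Return a Counter mapping interface name to number of watchdog resets."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match:
            counts[match.group("ifname")] += 1
    return counts


# Lines copied from the syslog excerpt in the description.
SAMPLE = """\
Apr 28 00:00:04 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:04 hathor kernel: bce0: link state changed to DOWN
Apr 28 00:00:07 hathor kernel: bce0: link state changed to UP
Apr 28 00:00:14 hathor kernel: bce1: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:18 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:23 hathor kernel: bce1: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
Apr 28 00:00:28 hathor kernel: bce0: /jails/src/usr/src/sys/dev/bce/if_bce.c(5244): Watchdog timeout occurred, resetting!
"""

if __name__ == "__main__":
    for ifname, resets in sorted(count_watchdog_resets(SAMPLE.splitlines()).items()):
        print(ifname, resets)
```

On a live system the same function could be fed the lines of the system log file instead of SAMPLE; the log path is left out here because the report does not state it.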