Date: Sun, 23 Apr 2006 13:41:52 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Lars Erik Gullerud <lerik@nolink.net>
Cc: freebsd-net@freebsd.org
Subject: Re: Watchdog timeouts and dead network on bge - 6.1-RC1
Message-ID: <20060423133913.T56433@fledge.watson.org>
In-Reply-To: <20060423114810.P36951@electra.nolink.net>
References: <20060423114810.P36951@electra.nolink.net>

On Sun, 23 Apr 2006, Lars Erik Gullerud wrote:

> We recently upgraded one of our 4.11 servers to 6.1-RC1. The server is a
> Dell PE2650, dual Xeons, and has two onboard Broadcom BCM5701 cards, using
> the bge driver.
>
> Some older threads on -net and -current led me to believe that most issues
> with bge driver in FreeBSD >4 had been sorted. However, after our upgrade,
> we are seeing errors like this:

There's a Dell 2650 in the FreeBSD netperf cluster. When working with 5.x on
the box quite a long time ago, I saw similar problems, in which the network
interface stalled and required kicking to reset. Unfortunately, this is not
an issue I have time to work on currently, but if it would help a FreeBSD
developer track down and debug this problem, I can provide remote access to a
box that has had the problem in the past, along with serial console, remote
power, and network booting. I'll run some tests on it today and see if that
box still has the same problem or not. I've never been entirely convinced it
was actually a bge problem as opposed to an interrupt delivery problem,
however. Dmesg fragment below.

Robert N M Watson

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.0-CURRENT #1: Sat Jan 29 21:32:42 EST 2005
rwatson@zoo.freebsd.org:/usr/obj/zoo/rwatson/netperf/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.20GHz (2192.90-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf24 Stepping = 4
Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
real memory = 2147352576 (2047 MB)
avail memory = 2096799744 (1999 MB)
ACPI APIC Table: <DELL PE2650 >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 6
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
...
ACPI APIC Table: <DELL PE2650 >
acpi0: <DELL PE2650> on motherboard
aac0: <Dell PERC 3/Di> mem 0xf0000000-0xf7ffffff irq 30 at device 8.1 on pci4
...
bge0: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xfcd10000-0xfcd1ffff irq 28 at device 6.0 on pci3
miibus0: <MII bus> on bge0
bge0: Ethernet address: 00:06:5b:8e:b9:8d
bge1: <Broadcom BCM5701 Gigabit Ethernet, ASIC rev. 0x105> mem 0xfcd00000-0xfcd0ffff irq 29 at device 8.0 on pci3
miibus1: <MII bus> on bge1
bge1: Ethernet address: 00:06:5b:8e:b9:8e

>
> Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting
> Apr 22 18:44:01 nebula kernel: bge0: link state changed to DOWN
> Apr 22 18:44:03 nebula kernel: bge0: link state changed to UP
>
> ...and more importantly - when this happens, the network connection does NOT
> in fact come back up. Logging into the box locally (or via a different
> network interface) and manually issuing "ifconfig bge0 down ; ifconfig bge0
> up" DOES get the interface going again, however.
>
> We have only seen this on very high network loads - the particular message
> included above occurred while transferring some 120GB of data from a 4.11
> NFS-server to this 6.1-RC1 box.
>
> Is this a known issue in bge? If so, is anyone working on it? Can we provide
> some useful information to whoever this might be?
>
> We have never had any issues with bge in 4.x, but we really need to get this
> server up to 5.x/6.x at this point in time, any other suggestions on knobs or
> workarounds that can give us bge stability?
>
> Thanks in advance,
>
> /leg
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
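
For anyone wanting to check how often the stall occurs, the quoted log lines and the manual workaround could be sketched roughly as follows. This is only an illustration, not something from either poster: the function names `find_watchdog_resets` and `bounce_commands` are hypothetical, the regular expression assumes the syslog layout shown in the quoted message, and the code only builds the `ifconfig` command lists without running them.

```python
import re

# Assumed syslog layout, taken from the quoted message, e.g.:
#   Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>[a-z]+\d+): watchdog timeout"
)


def find_watchdog_resets(log_text):
    """Return (timestamp, interface) pairs, one per watchdog timeout line."""
    hits = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("ifname")))
    return hits


def bounce_commands(ifname):
    """Build (but do not execute) the manual down/up workaround."""
    return [["ifconfig", ifname, "down"], ["ifconfig", ifname, "up"]]


# Sample input copied from the quoted report.
sample = """\
Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting
Apr 22 18:44:01 nebula kernel: bge0: link state changed to DOWN
Apr 22 18:44:03 nebula kernel: bge0: link state changed to UP
"""

for stamp, ifname in find_watchdog_resets(sample):
    print(stamp, ifname, bounce_commands(ifname))
```

Keeping the command construction separate from execution means the parsing can be checked against saved logs on any machine; actually running the commands would need root on the affected box and is left out deliberately.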