Date: Wed, 04 Oct 2006 11:03:04 +0300
From: Stefan Lambrev <stefan.lambrev@sun-fish.com>
To: freebsd-stable@freebsd.org
Subject: Re: Watchdog Timeout - bge devices
Message-ID: <45236AB8.3070102@sun-fish.com>
In-Reply-To: <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>
References: <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>

Hi all,

I have a few servers with Intel and Broadcom (em & bge) gigabit NICs
running FreeBSD RELENG_6 (from 6.1-R to 6.2-PRERELEASE), and (luckily)
they show no problems such as watchdog timeouts. So maybe something is
different in our configurations. Do you want my kernel configs or
something else?

I have USB enabled on 3 servers (serial hubs and USB DVD burners
connected), but the load on the servers rarely goes above 2.x.

Do you want me to check something else? :)

John Marshall wrote:
> $ dmesg | grep bge
> bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem
> 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:51:ba
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
>
> I initially pronounced the network cable dead and replaced it. Then I
> suspected the FastEthernet switch port and relocated to a different
> port. Watchdog timeouts persisted. I concluded that the bge hardware
> must be flaky until I read a recent thread on em device watchdog
> timeouts which led me to wonder about CPU scheduling.
>
> The server experiencing the bge timeouts was using SCHED_ULE. I built
> 6.2-PRERELEASE on a spare disk and booted the problem server from that
> disk - bge problem persisted.
>
> We have a second (identical) problem-free server configured with
> SCHED_4BSD. I reconfigured both machines so that the first machine (now
> 6.2-PRERELEASE) used SCHED_4BSD and the second machine (6.1-RELEASE)
> uses SCHED_ULE. Both machines are configured with PREEMPTION.
>
> +-----------------------------------------------+
> | THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
> +-----------------------------------------------+
>
> The machines are hp ProLiant ML110 servers.
>
> There is nothing sharing the interrupt with the bge device. No USB
> drivers are loaded.
>
>
> $ vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                          70          0
> irq6: fdc0                             9          0
> irq14: ata0                      1234430          6
> irq15: ata1                           47          0
> irq17: bge0                     17543591         93
> irq26: fxp0                        70832          0
> cpu0: timer                    376381765       1999
> Total                          395230744       2099
>
>
> $ sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
> kern.version: FreeBSD 6.1-RELEASE-p10 #1: Mon Oct 2 08:36:56 AEST 2006
>
> kern.sched.name: ule
> kern.sched.slice_min: 10
> kern.sched.slice_max: 142
> kern.sched.preemption: 1
> kern.smp.maxcpus: 1
> kern.smp.active: 0
> kern.smp.disabled: 0
> kern.smp.cpus: 1
> hw.machine: i386
> hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
> dev.bge.0.%desc: Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=4 function=0
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c
> subdevice=0x1654 class=0x020000
> dev.bge.0.%parent: pci4
>
> Is there any other information I ought to post to help with diagnosis -
> or is this a known problem? (I've only subscribed recently)
>
> John Marshall.
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>

-- 
Best Wishes,
Stefan Lambrev
ICQ# 24134177
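
For comparing a problem machine against a problem-free one, as discussed in this thread, a rough sketch like the following could tally "watchdog timeout" kernel messages per device. The function name count_watchdog_timeouts is hypothetical, and the sketch assumes the message format shown in the quoted dmesg output ("bge0: watchdog timeout -- resetting"); other drivers may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: count "watchdog timeout" lines per device.
# Reads kernel log text on stdin and prints "<device> <count>" per line,
# sorted by device name. Assumes lines of the form "bge0: watchdog timeout ...".
count_watchdog_timeouts() {
    awk -F': ' '/watchdog timeout/ { n[$1]++ }
                END { for (d in n) print d, n[d] }' | sort
}

# Possible usage on each machine being compared:
#   sysctl -n kern.sched.name        # which scheduler is in use (e.g. ule)
#   dmesg | count_watchdog_timeouts  # timeouts seen per device
```

Running both commands on each server gives a quick side-by-side view of the scheduler in use and the number of timeouts logged; the kernel message buffer is finite, so the counts only cover messages still held in it.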