Date: Tue, 03 Oct 2006 23:18:16 -0600
From: Scott Long <scottl@samsco.org>
To: John Marshall <John.Marshall@riverwillow.com.au>
Cc: freebsd-stable@freebsd.org
Subject: Re: Watchdog Timeout - bge devices
Message-ID: <45234418.7000205@samsco.org>
In-Reply-To: <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>
References: <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>
John Marshall wrote:
> $ dmesg | grep bge
> bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem
> 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:51:ba
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
>
> I initially pronounced the network cable dead and replaced it. Then I
> suspected the FastEthernet switch port and relocated to a different
> port. Watchdog timeouts persisted. I concluded that the bge hardware
> must be flaky until I read a recent thread on em device watchdog
> timeouts which led me to wonder about CPU scheduling.
>
> The server experiencing the bge timeouts was using SCHED_ULE. I built
> 6.2-PRERELEASE on a spare disk and booted the problem server from that
> disk - bge problem persisted.
>
> We have a second (identical) problem-free server configured with
> SCHED_4BSD. I reconfigured both machines so that the first machine (now
> 6.2-PRERELEASE) used SCHED_4BSD and the second machine (6.1-RELEASE)
> uses SCHED_ULE. Both machines are configured with PREEMPTION.
>
> +-----------------------------------------------+
> | THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
> +-----------------------------------------------+
>
> The machines are hp ProLiant ML110 servers.
>
> There is nothing sharing the interrupt with the bge device. No USB
> drivers are loaded.
>
>
> $ vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                          70          0
> irq6: fdc0                             9          0
> irq14: ata0                      1234430          6
> irq15: ata1                           47          0
> irq17: bge0                     17543591         93
> irq26: fxp0                        70832          0
> cpu0: timer                    376381765       1999
> Total                          395230744       2099
>
>
> $ sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
> kern.version: FreeBSD 6.1-RELEASE-p10 #1: Mon Oct 2 08:36:56 AEST 2006
>
> kern.sched.name: ule
> kern.sched.slice_min: 10
> kern.sched.slice_max: 142
> kern.sched.preemption: 1
> kern.smp.maxcpus: 1
> kern.smp.active: 0
> kern.smp.disabled: 0
> kern.smp.cpus: 1
> hw.machine: i386
> hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
> dev.bge.0.%desc: Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=4 function=0
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c
> subdevice=0x1654 class=0x020000
> dev.bge.0.%parent: pci4
>
> Is there any other information I ought to post to help with diagnosis -
> or is this a known problem? (I've only subscribed recently)
>
> John Marshall.

Very interesting data point. I wonder if this accounts for some of the
inconsistency in the reporting from others. In any case, SCHED_ULE is
still considered to be highly experimental. Hopefully it will get some
more attention in the near future to bring it closer to production
quality.

Scott