Date:      Tue, 03 Oct 2006 23:18:16 -0600
From:      Scott Long <scottl@samsco.org>
To:        John Marshall <John.Marshall@riverwillow.com.au>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Watchdog Timeout - bge devices
Message-ID:  <45234418.7000205@samsco.org>
In-Reply-To: <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>

John Marshall wrote:
> $ dmesg | grep bge
> bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:51:ba
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
> 
> I initially pronounced the network cable dead and replaced it. Then I
> suspected the FastEthernet switch port and relocated to a different
> port. Watchdog timeouts persisted. I concluded that the bge hardware
> must be flaky until I read a recent thread on em device watchdog
> timeouts which led me to wonder about CPU scheduling.
> 
> The server experiencing the bge timeouts was using SCHED_ULE. I built
> 6.2-PRERELEASE on a spare disk and booted the problem server from that
> disk - bge problem persisted.
> 
> We have a second (identical) problem-free server configured with
> SCHED_4BSD. I reconfigured both machines so that the first machine (now
> 6.2-PRERELEASE) used SCHED_4BSD and the second machine (6.1-RELEASE)
> used SCHED_ULE. Both machines are configured with PREEMPTION.
> 
> +-----------------------------------------------+
> | THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
> +-----------------------------------------------+
> 
> The machines are HP ProLiant ML110 servers.
> 
> There is nothing sharing the interrupt with the bge device. No USB
> drivers are loaded.
> 
> 
> $ vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                          70          0
> irq6: fdc0                             9          0
> irq14: ata0                      1234430          6
> irq15: ata1                           47          0
> irq17: bge0                     17543591         93
> irq26: fxp0                        70832          0
> cpu0: timer                    376381765       1999
> Total                          395230744       2099
> 
> 
> $ sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
> kern.version: FreeBSD 6.1-RELEASE-p10 #1: Mon Oct  2 08:36:56 AEST 2006
> 
> kern.sched.name: ule
> kern.sched.slice_min: 10
> kern.sched.slice_max: 142
> kern.sched.preemption: 1
> kern.smp.maxcpus: 1
> kern.smp.active: 0
> kern.smp.disabled: 0
> kern.smp.cpus: 1
> hw.machine: i386
> hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
> dev.bge.0.%desc: Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=4 function=0
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c subdevice=0x1654 class=0x020000
> dev.bge.0.%parent: pci4
> 
> Is there any other information I ought to post to help with diagnosis -
> or is this a known problem? (I've only subscribed recently)
> 
> John Marshall.

Very interesting data point.  I wonder if this accounts for some of the
inconsistency in the reporting from others.  In any case, SCHED_ULE is
still considered to be highly experimental.  Hopefully it will get some
more attention in the near future to bring it closer to production
quality.

Scott
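
For anyone wanting to repeat the comparison John describes, the following is a minimal sketch of how the watchdog resets could be counted on each machine. The helper name count_watchdog is hypothetical and not part of FreeBSD; it only assumes the "bge0: watchdog timeout" message format shown in the quoted dmesg output.

```sh
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): count watchdog timeout
# messages logged for one interface.
# Reads kernel messages on stdin; the interface name is the first argument.
count_watchdog() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are no matches, so "|| true" keeps the status clean.
    grep -c "^$1: watchdog timeout" || true
}

# Possible use on a live system (sysctl name taken from the quoted output):
#   sysctl -n kern.sched.name
#   dmesg | count_watchdog bge0
```

Recording the scheduler name alongside the count on both machines would make it easier to see whether the timeouts really do follow SCHED_ULE.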


