From: Stefan Lambrev
Date: Wed, 04 Oct 2006 11:03:04 +0300
To: freebsd-stable@freebsd.org
Subject: Re: Watchdog Timeout - bge devices
Message-ID: <45236AB8.3070102@sun-fish.com>
In-Reply-To: <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>

Hi all,

I have a few servers with Intel and Broadcom (em and bge) gigabit NICs
running FreeBSD RELENG_6 (from 6.1-R to 6.2-PRERELEASE), and (luckily)
they show no problems like these watchdog timeouts.

So maybe something is different in our configurations. Do you want my
kernel configs or something else?
I have USB enabled on 3 servers (serial hubs and USB DVD burners
connected), but the load on the servers rarely goes above 2.x.

Do you want me to check something else? :)

John Marshall wrote:
> $ dmesg | grep bge
> bge0: mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
> miibus1: on bge0
> bge0: Ethernet address: 00:0b:cd:e7:51:ba
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
>
> I initially pronounced the network cable dead and replaced it. Then I
> suspected the FastEthernet switch port and relocated to a different
> port. Watchdog timeouts persisted. I concluded that the bge hardware
> must be flaky until I read a recent thread on em device watchdog
> timeouts which led me to wonder about CPU scheduling.
>
> The server experiencing the bge timeouts was using SCHED_ULE. I built
> 6.2-PRERELEASE on a spare disk and booted the problem server from that
> disk - bge problem persisted.
>
> We have a second (identical) problem-free server configured with
> SCHED_4BSD. I reconfigured both machines so that the first machine (now
> 6.2-PRERELEASE) used SCHED_4BSD and the second machine (6.1-RELEASE)
> uses SCHED_ULE. Both machines are configured with PREEMPTION.
>
> +-----------------------------------------------+
> | THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
> +-----------------------------------------------+
>
> The machines are hp ProLiant ML110 servers.
>
> There is nothing sharing the interrupt with the bge device. No USB
> drivers are loaded.
>
>
> $ vmstat -i
> interrupt              total       rate
> irq1: atkbd0              70          0
> irq6: fdc0                 9          0
> irq14: ata0          1234430          6
> irq15: ata1               47          0
> irq17: bge0         17543591         93
> irq26: fxp0            70832          0
> cpu0: timer        376381765       1999
> Total              395230744       2099
>
>
> $ sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
> kern.version: FreeBSD 6.1-RELEASE-p10 #1: Mon Oct 2 08:36:56 AEST 2006
>
> kern.sched.name: ule
> kern.sched.slice_min: 10
> kern.sched.slice_max: 142
> kern.sched.preemption: 1
> kern.smp.maxcpus: 1
> kern.smp.active: 0
> kern.smp.disabled: 0
> kern.smp.cpus: 1
> hw.machine: i386
> hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
> dev.bge.0.%desc: Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=4 function=0
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c subdevice=0x1654 class=0x020000
> dev.bge.0.%parent: pci4
>
> Is there any other information I ought to post to help with diagnosis -
> or is this a known problem? (I've only subscribed recently)
>
> John Marshall.

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177