From owner-freebsd-stable@FreeBSD.ORG Mon Apr 13 12:55:42 2009
Date: Fri, 10 Apr 2009 09:03:20 -0500 (CDT)
From: James Wyatt
To: Pyun YongHyeon
Cc: xer, freebsd-stable@freebsd.org
Subject: Re: watchdog timeout
Message-ID: <20090410083808.F88691@extra.rwsystems.net>
In-Reply-To: <20090410044340.GJ37714@michelle.cdnetworks.co.kr>
References: <20090407120032.633E410656D5@hub.freebsd.org> <20090410044340.GJ37714@michelle.cdnetworks.co.kr>
List-Id: Production branch of FreeBSD source code

On Fri, 10 Apr 2009, Pyun YongHyeon wrote:

> On Wed, Apr 08, 2009 at 10:41:44AM +0200, xer wrote:
>> Hello
>> I have some problems with 3Com nics,
>> after a upgrade from 5.5-STABLE to
>> 6.4-STABLE.
>>
>> This machine has two 3com nics (one is LAN other is WAN) and i see too much
>> "watchdog timeout" on both cards.
>> This on/off up/down on cards, affect the interrupt to clients that are
>> downloading from apache web server, especially on large files.
>>
>> --------------------------------------------
>> xer:/root# dmesg
>> xl1: watchdog timeout
>> xl1: link state changed to DOWN
>> xl1: link state changed to UP
>> xl1: watchdog timeout
>> xl1: link state changed to DOWN
>> xl1: link state changed to UP
>> xl1: watchdog timeout
>> xl1: link state changed to DOWN
>> xl1: link state changed to UP
>> ---------------------------------------------
[ . . . ]
>> As you can see, the cards are 3c905C-TX model.
>> Someone told me to change drivers, but i cannot understand this advice.
>> I got same errors with same cards but with another mainboard, same problem,
>> watchdog appears after an upgrade from 5.4-STABLE to 6.4-STABLE.
>>
>> I don't think that to change nic's pci slots, will solve the problem, i
>> think that maybe change the nics would resolve the matter, but i cannot
>> access to both FreeBSD phisically, cause the boxes are too far from me
>> (about 3500 km).
>>
>> I'm asking you some advices, and i can i fix this problem.
>> p.s. with both 5.4 or 5.5 old kernel, the nics was fine.
>
> I vaguely remember there were a couple of reports on xl(4) watchdog
> timeouts. I'm not sure this came from missing Tx interrupts but
> would you try attached patch?
> Note, it was generated against HEAD and you should experiment the
> attached patch on local box prior to applying to your production
> server.

Perhaps the case can convey the amount of hair I lost over this:

HAVE YOU CHECKED YOUR BIOS AND ONBOARD IO SETTINGS?

I have been swapping boards for days for another firewall project and
finally figured it out last night.
I would get messages like these, depending on the 3rd card used:

xl0: Watchdog Timeout
pcn0: Watchdog timeout

Finally the Sierra Nevada Porter kicked in and an old idea came back to
me: I was running out of interrupts! 1/2 (^_^) The hint came from a
combination of the earlier advice to "set the 'PNP OS' to false" failing
and a Tom's Hardware mobo review complaining about 5-slot PCI mobos
having IRQ sharing issues. Thanks to you both, wherever you are!

So I went in and disabled all the onboard IO I wasn't using to free up
IRQs. Disable the onboard serial ports if you aren't using them. If you
are, then an 8-port board or USB serial can save the COM1 and COM2 IRQs.
No video IRQ is needed for simple console work. No parallel port needed,
so I saved that one. No floppy nowadays, either. Lots of options for you!

I hadn't had IRQ issues in 15 years or so, so this sure reminded me that
we still have the legacy x86 IRQ limitations. It has pushed me to think
more about shifting to trunked VLANs to save ENet ports and make recovery
easier.

FWIW: I tend to use different types of cards so the network ports don't
"float" to other numbers if one dies. This made the problem worse because
the drivers assign IRQs when they load, so it looked like the xl0 cards
were the issue when 'de*' and 'dc*' worked. Changing slots changed things
too! This explains some of what your experience shows.

There is no way I can give back to this list what it's given me in
technical support, but if this makes things work for you, then I've given
*something* back and it really is a Good Friday - Jy@
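
For anyone wanting to check for the IRQ crowding described above without rebooting into the BIOS, here is a minimal sketch that groups devices by IRQ from boot-time probe lines. It assumes the usual FreeBSD probe format ("name: <description> ... irq N ..."). The function name shared_irqs and the sample text are made up for illustration and are not taken from the machines in this thread. A shared IRQ is only a hint, since PCI devices are allowed to share interrupt lines and often do so without trouble.

```python
import re
from collections import defaultdict

# Matches FreeBSD-style probe lines such as
#   xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xd800-0xd87f irq 11 at device 9.0 on pci0
PROBE_LINE = re.compile(r"^(\w+): <[^>]*>.*\birq (\d+)\b")


def shared_irqs(dmesg_text):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = PROBE_LINE.match(line.strip())
        if m:
            device, irq = m.group(1), int(m.group(2))
            if device not in by_irq[irq]:
                by_irq[irq].append(device)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


# Hypothetical sample, NOT output from the systems discussed in this thread.
SAMPLE = """\
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xd800-0xd87f irq 11 at device 9.0 on pci0
pcn0: <AMD PCnet/FAST III 10/100BaseTX> port 0xdc00-0xdc1f irq 11 at device 10.0 on pci0
dc0: <Intel 21143 10/100BaseTX> port 0xe000-0xe07f irq 10 at device 11.0 on pci0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
xl0: watchdog timeout
"""

if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE).items()):
        print(f"irq {irq}: {', '.join(devices)}")
```

To try it on a real box, feed it the saved output of dmesg instead of SAMPLE. On FreeBSD, vmstat -i gives a complementary view by listing interrupt counts per IRQ.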