Date: Wed, 23 Apr 2008 11:11:27 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Pyun YongHyeon <pyunyh@gmail.com>
Cc: current@FreeBSD.org, bug-followup@FreeBSD.org, yongari@FreeBSD.org
Subject: Re: amd64/115126: [nfe] nfe0: watchdog timeout (missed Tx interrupts) -- recovering (UP with SCHED_ULE)
Message-ID: <20080423091127.GB36580@onelab2.iet.unipi.it>
In-Reply-To: <20080423082240.GF54715@cdnetworks.co.kr>
References: <20080422072839.GA85728@onelab2.iet.unipi.it> <20080423082240.GF54715@cdnetworks.co.kr>

On Wed, Apr 23, 2008 at 05:22:40PM +0900, Pyun YongHyeon wrote:
> On Tue, Apr 22, 2008 at 09:28:39AM +0200, Luigi Rizzo wrote:
> > related to this bug, i am seeing similar problems with RELENG_7 and amd64,
> > with an ASUS M2N-VM DVI motherboard
> > http://www.asus.com/products.aspx?modelmenu=1&model=1841&l1=3&l2=101&l3=567&l4=0
> > and an Athlon64-BE2400 dual core CPU .
> >
> > Under heavy load, e.g. scp-ing a large file over the local network,
> > and at the same time doing a buildkernel or building a port,
> > and with X11 active (using the 'vesa' xorg driver)
> > the network card stalls and doesn't recover - i waited over 10 minutes
> > hoping for the watchdog or some timeout to kick in, the only way
> > to bring the link back up was
> >
> >     ifconfig nfe0 down ; ifconfig nfe0 up
> >     dhclient nfe0
> >
> > doing only ifconfig down/up or only dhclient did not help, i needed both.
...
> Your BIOS may have an option for ASF related one for onboard NIC.
> Try toggling that option and see how it goes.
...
> Just vague guess, how about disabling MSI/MSI-X in loader.conf?
> (hw.nfe.msi_disable = "1", hw.nfe.msix_disable = "1")
> If you are using jumbo frame, try disabling it too.
>
> > Hope this helps...
> >
> It would be even better if you can post verbose boot messages
> related with nfe(4) and PHY driver.

will try to do all the above, but upon further investigation
the problem appears even on i386 and really seems related to
the receive queue filling up and the condition not being detected
due to a race.

Things like this used to happen in the past in several network
drivers, and there is a comment suggesting the same thing in one
of the commit logs for the openbsd nfe driver.

So that's the part i am going to investigate (i have strong
motivations with 5 such machines in my lab...)

My preliminary question is the following: is the 'nfe' driver
just an adaptation from some other driver (possibly trying to
guess the way the NIC synchronizes with the CPU), or is there
someone who carefully studied that specific issue ?

	cheers
	luigi