Date: Wed, 23 Apr 2008 17:22:40 +0900
From: Pyun YongHyeon <pyunyh@gmail.com>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Cc: current@FreeBSD.org, bug-followup@FreeBSD.org, yongari@FreeBSD.org
Subject: Re: amd64/115126: [nfe] nfe0: watchdog timeout (missed Tx interrupts) -- recovering (UP with SCHED_ULE)
Message-ID: <20080423082240.GF54715@cdnetworks.co.kr>
In-Reply-To: <20080422072839.GA85728@onelab2.iet.unipi.it>
References: <20080422072839.GA85728@onelab2.iet.unipi.it>
On Tue, Apr 22, 2008 at 09:28:39AM +0200, Luigi Rizzo wrote:
> related to this bug, i am seeing similar problems with RELENG_7 and amd64,
> with an ASUS M2N-VM DVI motherboard
> http://www.asus.com/products.aspx?modelmenu=1&model=1841&l1=3&l2=101&l3=567&l4=0
> and an Athlon64-BE2400 dual core CPU.
>
> Under heavy load, e.g. scp-ing a large file over the local network,
> and at the same time doing a buildkernel or building a port,
> and with X11 active (using the 'vesa' xorg driver)
> the network card stalls and doesn't recover - i waited over 10 minutes
> hoping for the watchdog or some timeout to kick in, the only way
> to bring the link back up was
>
>   ifconfig nfe0 down ; ifconfig nfe0 up
>   dhclient nfe0
>
> doing only ifconfig down/up or only dhclient did not help, i needed both.
>
> vmstat -i says the network card has irq256 (???) and it is not shared with
> other devices. Ehci, sound, ohci, ata, and others have low irq numbers
> (6, 14, 20, 21, 22), some shared, some not.
>
> Changing the bios setting for PnP OS from 'yes' to 'no' or vice versa
> does not change the situation.
>

Your BIOS may have an ASF-related option for the onboard NIC. Try
toggling that option and see how it goes.

> The stall seems related to the presence of other activity - if i
> let the bulk scp transfer alone, i get a happy 10-10.5Mbytes/s
> (over a 100meg link).
>
> When the stall occurs, i see no interrupts (the vmstat -i count
> for irq256 stays the same).
> Packets are still transmitted and received on the other side, it's
> the rx side of the card that becomes deaf. I don't see any
> watchdog timeout or other error messages in /var/log/messages.
>
> Also, enabling polling does not help getting traffic in
> (with a kernel built with DEVICE_POLLING,
> doing sysctl kern.polling.enable=1 and "ifconfig nfe0 polling").
>
> So i suspect that for some reason the rx ring becomes confused
> and does not recover.
>

Just a vague guess, but how about disabling MSI/MSI-X in loader.conf?
(hw.nfe.msi_disable="1", hw.nfe.msix_disable="1")
If you are using jumbo frames, try disabling them too.

> Hope this helps...
>

It would be even better if you could post the verbose boot messages
related to nfe(4) and the PHY driver.

> cheers
> luigi

--
Regards,
Pyun YongHyeon