Date: Fri, 25 Apr 2008 18:00:39 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: net@freebsd.org, current@freebsd.org
Subject: 'nfe' stalls (analysis and partial solution)
Message-ID: <20080425160039.GA65918@onelab2.iet.unipi.it>
Just for the record and the mail archives: I have been experiencing a lot of unrecovered stalls of the network card with the 'nfe' driver under heavy load (this was on 7.0-i386 and 7.0-amd64, but it is hardware related, so it is cross-platform).

After 2-3 days of investigation, and with the help of Pyun YongHyeon (yongari), I finally managed to pin down the problem and start working on a solution. I would be grateful if others could report similar problems with the 'nfe' driver, so we can see whether the patch we come up with also fixes their problem.

THE PROBLEM: under heavy load (e.g. full speed ssh transfers, disk activity, Xwindows...) causing the receive ring to fill up, it seems that some nfe-supported cards (at least the MCP67) enter a state where they stop looking at the ring buffers and drop incoming packets. The driver does not recover from the error, so you have to manually 'ifconfig down; ifconfig up' the interface to restart receiving.

SOLUTION: I have not yet determined the exact conditions causing the error, so as a temporary workaround I am calling nfe_init_locked() from the watchdog routine every time a receive error of some kind is experienced. I definitely need to apply stricter checks on the error condition, but a few extra card resets are certainly better than losing contact with the machine.

Unfortunately there is no documentation on this behaviour of the card, and the Linux driver (forcedeth) has no error checking/recovery at all, so it is of no help.

cheers
luigi
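For readers of the archive, the workaround described under SOLUTION can be sketched roughly as follows. This is a standalone model, not the actual nfe driver code or the patch: `struct mock_softc`, `mock_rx_error()`, `mock_watchdog()` and `mock_init_locked()` are hypothetical names invented for illustration, with `mock_init_locked()` standing in for nfe_init_locked(). It also assumes, pessimistically, that every receive error stalls the card, since the exact conditions are not yet known.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's per-device state. */
struct mock_softc {
    uint32_t rx_errors;      /* running count of receive errors */
    uint32_t rx_errors_seen; /* rx_errors as of the last watchdog tick */
    uint32_t reinit_count;   /* number of reinitialisations so far */
    bool     rx_running;     /* false once the card ignores the RX ring */
};

/* Stand-in for nfe_init_locked(): reset the card and restart reception. */
static void mock_init_locked(struct mock_softc *sc)
{
    sc->rx_running = true;
    sc->reinit_count++;
}

/*
 * Called when a receive error of any kind is reported.
 * Assumption: every error is treated as a possible stall.
 */
static void mock_rx_error(struct mock_softc *sc)
{
    sc->rx_errors++;
    sc->rx_running = false;
}

/*
 * Periodic watchdog: if any receive error was recorded since the
 * previous tick, reinitialise the card. Returns true when a reset
 * was performed, false when nothing needed to be done.
 */
static bool mock_watchdog(struct mock_softc *sc)
{
    if (sc->rx_errors == sc->rx_errors_seen)
        return false;
    sc->rx_errors_seen = sc->rx_errors;
    mock_init_locked(sc);
    return true;
}
```

Comparing the error counter against its value at the previous tick means the reset happens at most once per watchdog period, however many errors arrived in between. Stricter checks on the error condition would replace the unconditional comparison in `mock_watchdog()`.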