Date: Fri, 25 Apr 2008 18:00:39 +0200
From: Luigi Rizzo
To: net@freebsd.org, current@freebsd.org
Subject: 'nfe' stalls (analysis and partial solution)
Message-ID: <20080425160039.GA65918@onelab2.iet.unipi.it>

Just for the record and the mail archives: I have been experiencing a lot of unrecovered stalls of the network card with the 'nfe' driver under heavy load (this was on 7.0-i386 and 7.0-amd64, but it is hardware related, so it is cross-platform).

After 2-3 days of investigation, and with the help of Pyun YongHyeon (yongari), I finally managed to pin down the problem and start working on a solution. I would be grateful if others could report similar problems with the 'nfe' driver, so we can see whether the patch we come up with also fixes their problem.

THE PROBLEM: under heavy load (e.g. full speed ssh transfers, disk activity, Xwindows...)
causing the receive ring to fill up, it seems that some nfe-supported cards (at least the MCP67) enter a state where they stop looking at the ring buffers and drop incoming packets. The driver does not recover from the error, so you have to manually 'ifconfig down; ifconfig up' the interface to restart receiving.

SOLUTION: I have not yet determined the exact conditions causing the error, so as a temporary workaround I am calling nfe_init_locked() from the watchdog routine every time a receive error of some kind is experienced. I definitely need to apply stricter checks on the error condition, but a few extra card resets are certainly better than losing contact with the machine.

Unfortunately there is no documentation on this behaviour of the card, and the Linux driver (forcedeth) has no error checking/recovery at all, so it is of no help.

cheers
luigi
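
P.S. For the archives, the shape of the workaround can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the actual driver change: the struct and every mock_* name are hypothetical stand-ins, mock_init_locked() plays the role of nfe_init_locked(), and "any receive error since the last tick" is the deliberately loose trigger described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-device state; NOT the real nfe softc. */
struct mock_softc {
    bool     rx_running;    /* false once the card stops servicing the RX ring */
    unsigned rx_errors;     /* receive errors seen since the last reinit */
    unsigned reinit_count;  /* how many times the watchdog forced a reinit */
};

/* Stand-in for nfe_init_locked(): reset the card and restart reception. */
static void mock_init_locked(struct mock_softc *sc)
{
    assert(sc != NULL);
    sc->rx_running = true;
    sc->rx_errors = 0;
    sc->reinit_count++;
}

/* Simulate the RX ring filling up: an error is recorded and the card stalls. */
static void mock_rx_overflow(struct mock_softc *sc)
{
    assert(sc != NULL);
    sc->rx_errors++;
    sc->rx_running = false;
}

/* Periodic watchdog: any receive error since the last tick forces a reinit. */
static void mock_watchdog(struct mock_softc *sc)
{
    assert(sc != NULL);
    if (sc->rx_errors != 0)
        mock_init_locked(sc);
}
```

A watchdog tick with no recorded errors leaves the device alone; after a simulated overflow the next tick reinitializes it. A stricter version would narrow the condition in mock_watchdog() so that harmless receive errors do not trigger a reset.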