Date: Sat, 26 Apr 2008 14:33:37 +0900
From: Pyun YongHyeon
To: Luigi Rizzo
Cc: current@freebsd.org, net@freebsd.org
Subject: Re: 'nfe' stalls (analysis and partial solution)
Message-ID: <20080426053337.GE67361@cdnetworks.co.kr>
In-Reply-To: <20080425160039.GA65918@onelab2.iet.unipi.it>
Reply-To: pyunyh@gmail.com

On Fri, Apr 25, 2008 at 06:00:39PM +0200, Luigi Rizzo wrote:
> just for the record and the mail archives - i have been experiencing
> a lot of unrecovered stalls of the network card with the 'nfe'
> driver under heavy load (this was on 7.0-i386 and 7.0-amd64, but
> it is hardware related so it cross-platform).
>
> After 2-3 days of investigation, and with the help of
> Pyun YongHyeon (yongari) i finally managed to pin down the
> problem and start working on a solution.
>
> I would be grateful if others can report of similar problems
> with the 'nfe' driver so we can see if the patch we can come
> up with also fix their problem.
>
> THE PROBLEM:
> under heavy load (e.g. full speed ssh transfers, disk activity,
> Xwindows...) causing the receive ring to fill up, it seems that
> some nfe-supported cards (at least the MCP67) enter a state where
> they stop looking at the ring buffers and drop incoming packets.
>
> The driver does not recover from the error so you manually have
> to 'ifconfig down; ifconfig up' the interface to restart
> receiving.
>

I tried to reproduce this on CK804 MCP9 hardware, but nfe(4) recovered
successfully from the Rx ring full condition. I still don't know how
to reliably reproduce the Rx stalls, but an Rx ring full condition
alone doesn't seem to trigger them on CK804 MCP9. As Luigi said, it is
possible that only some NVIDIA chips have this issue. If you see this
issue, please let us know what chip/model you have.

The Rx ring full condition can be triggered easily by sending lots of
UDP packets with a network benchmark program. Running buildworld while
the benchmark is in progress makes the condition more likely and would
certainly trigger it.

> SOLUTION:
> I have not yet determined the exact conditions causing the error,
> so as a temporary workaround i am calling nfe_init_locked() every
> from the watchdog routine every time a receive error of some kind
> is experienced.
>
> I definitely need to apply stricter checks on the error condition,
> but some more extra card reset is certainly better than losing contact
> with the machine. Unfortunately there is no documentation on this
> behaviour of the card, and the linux driver (forcedeth) has no
> error checking/recovery at all so it is of no help.
>
> cheers
> luigi

-- 
Regards,
Pyun YongHyeon
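
For anyone who wants to generate the UDP load described above without
a benchmark program at hand, here is a minimal sketch in Python. It is
not the tool used in this thread. The function name send_udp_burst,
its parameters, and the 1472-byte default payload (sized to fit a
1500-byte Ethernet MTU over IPv4) are assumptions made for
illustration. An interpreted sender may not reach line rate, so a
dedicated benchmark tool is likely still the better way to fill the
Rx ring.

```python
import socket


def send_udp_burst(host, port, count, payload_size=1472):
    """Send `count` UDP datagrams to (host, port) as fast as possible.

    Hypothetical helper for illustration only. Returns the number of
    datagrams the local stack accepted for transmission.
    """
    payload = b"\x00" * payload_size
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            try:
                sock.sendto(payload, (host, port))
                sent += 1
            except OSError:
                # The local send queue may be full (for example ENOBUFS);
                # skip this datagram and keep going.
                continue
    return sent
```

Run from a second machine, a call such as
send_udp_burst("192.0.2.10", 9, 1000000) would aim one million
datagrams at the box with the nfe(4) interface. The address and port
here are placeholders, not values from this thread.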