From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Bachilo Dmitry, David Greenman-Lawrence, Sergey Matveychuk
Date: Thu, 27 Apr 2006 17:30:46 -0400
Subject: Re: nve0: device timeout (1)

On Tuesday 28 March 2006 01:39, David Greenman-Lawrence wrote:
> > Bachilo Dmitry wrote:
> > > Patch, by the way, was rejected. I have edited if_nve.c by hand, just changed
> >
> > Yep. It looks like a workaround, not a fix.
>
>    Right. It's a reasonable work-around, however, so people shouldn't be
> afraid of using it. Here is my original message on this subject:
>
>
> In reply to...
>
> > It doesn't only run into timeouts, during some of these timeouts the
> > machine or at least the keyboard hangs for about a minute.
> >
> > Is there anything I can do to help debug this?
>
>    I ran into this problem recently as well and spent some time diagnosing
> it. It's not that the cable isn't plugged in - rather it happens whenever
> the traffic levels are low.
>    The problem is that the nvidia-supplied portion of the driver is deferring
> the releasing of the completed transmit buffers and this occasionally
> results in if_timer expiring, causing the driver watchdog routine to be
> called ("device timeout"). The watchdog routine resets the card and the
> nvidia-supplied code sits in a high-priority loop waiting for the card
> to reset. This can take many seconds and your system will be hung until
> it completes.
>    I have a work-around patch for the problem that I've attached to this
> email. It simply disables the watchdog. A real fix would involve accounting
> for the outstanding transmit buffers differently (or perhaps not at all -
> e.g. always attempt to call the nvidia-supplied code and if a queue-full
> error occurs, then wait for an interrupt before trying to queue more
> transmit packets).

What about the patch just posted to amd64@?  It looks like a patch for this
issue.  It changes the watchdog() routine to detect this condition and, if it
happens, exit the routine early without emitting a printf or resetting the
chip.

-- 
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
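
For readers following the thread, the early-exit idea can be illustrated
with a small user-space sketch in C. This is not the patch posted to amd64@
and it does not use the real if_nve.c structures: struct mock_softc, its
fields, mock_reclaim_completed() and mock_watchdog() are all hypothetical
names invented for illustration. It also assumes that "this condition" means
every outstanding transmit had in fact completed and only its release was
deferred; the actual patch may test something different.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for per-device driver state (not the real nve softc). */
struct mock_softc {
    int pending_txs;   /* transmit buffers queued and not yet released        */
    int deferred_done; /* buffers already completed, release merely deferred  */
    int resets;        /* how many times the chip was reset                   */
    int timeout_msgs;  /* how many "device timeout" messages were emitted     */
};

enum watchdog_result { WD_FALSE_ALARM, WD_RESET };

/* Release every buffer that has already completed; return how many. */
static int mock_reclaim_completed(struct mock_softc *sc)
{
    assert(sc->deferred_done <= sc->pending_txs);
    int reclaimed = sc->deferred_done;
    sc->pending_txs -= reclaimed;
    sc->deferred_done = 0;
    return reclaimed;
}

/*
 * Watchdog sketch: called when the transmit timer expires.
 * If reclaiming the deferred buffers leaves nothing outstanding, the
 * timeout was a false alarm, so return without a message or a reset.
 */
static enum watchdog_result mock_watchdog(struct mock_softc *sc)
{
    assert(sc != NULL);
    (void)mock_reclaim_completed(sc);

    if (sc->pending_txs == 0)
        return WD_FALSE_ALARM;      /* early exit: no printf, no reset */

    /* Transmits are really stuck: report and reset (assumed behaviour). */
    fprintf(stderr, "mock: device timeout\n");
    sc->timeout_msgs++;
    sc->resets++;
    sc->pending_txs = 0;            /* a reset discards the queue */
    return WD_RESET;
}
```

With pending_txs = 3 and deferred_done = 3 the sketch returns WD_FALSE_ALARM
and leaves the reset counter untouched; with deferred_done = 1 two buffers
remain stuck, so it reports the timeout and resets. The point is only that
the expensive reset path is reached when transmits are truly outstanding.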