Date:      Thu, 27 Apr 2006 17:30:46 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Bachilo Dmitry <root@solink.ru>, David Greenman-Lawrence <dg@dglawrence.com>, Sergey Matveychuk <sem@freebsd.org>
Subject:   Re: nve0: device timeout (1)
Message-ID:  <200604271730.49268.jhb@freebsd.org>
In-Reply-To: <20060328063930.GC12815@tnn.dglawrence.com>
References:  <20060328044432.152CD45047@ptavv.es.net> <4428D0EE.6080603@FreeBSD.org> <20060328063930.GC12815@tnn.dglawrence.com>

On Tuesday 28 March 2006 01:39, David Greenman-Lawrence wrote:
> > Bachilo Dmitry wrote:
> > > Patch, by the way, was rejected. I have edited if_nve.c by hand, just changed 
> > 
> > Yep. It looks like a workaround, not a fix.
> 
> 
>    Right. It's a reasonable work-around, however, so people shouldn't be
> afraid of using it. Here is my original message on this subject:
> 
> 
>   In reply to...
> 
>   > It doesn't only run into timeouts, during some of these timeouts the
>   > machine or at least the keyboard hangs for about a minute.
>   > 
>   > Is there anything I can do to help debug this?
> 
>      I ran into this problem recently as well and spent some time diagnosing
>   it. It's not that the cable isn't plugged in - rather it happens whenever
>   the traffic levels are low.
>      The problem is that the nvidia-supplied portion of the driver is deferring
>   the releasing of the completed transmit buffers and this occasionally
>   results in if_timer expiring, causing the driver watchdog routine to be
>   called ("device timeout"). The watchdog routine resets the card and the
>   nvidia-supplied code sits in a high-priority loop waiting for the card
>   to reset. This can take many seconds and your system will be hung until
>   it completes.
>      I have a work-around patch for the problem that I've attached to this
>   email. It simply disables the watchdog. A real fix would involve accounting
>   for the outstanding transmit buffers differently (or perhaps not at all -
>   e.g. always attempt to call the nvidia-supplied code and if a queue-full
>   error occurs, then wait for an interrupt before trying to queue more
>   transmit packets).

What about the patch just posted to amd64@?  It looks like a patch for
this issue.  It changes the watchdog() routine to detect this condition
and if it happens exit the routine early without emitting a printf or
resetting the chip.
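
To make the idea concrete, here is a minimal sketch of a watchdog that
behaves as described: it first reclaims any transmit buffers whose
release was merely deferred, and only reports a timeout and resets if
something is still outstanding afterwards. This is not the patch from
amd64@ and not the real if_nve.c code; the struct, its fields, and the
sketch_* function names are all hypothetical, and the "reset" is just a
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified driver state; NOT the real nve(4) softc. */
struct sketch_softc {
	int tx_pending;   /* buffers queued for transmit, not yet released */
	int tx_completed; /* buffers the chip finished, release deferred */
	int resets;       /* how many times the chip was "reset" */
};

/* Release buffers whose completion was deferred; returns the count. */
static int
sketch_reclaim_tx(struct sketch_softc *sc)
{
	int n = sc->tx_completed;

	/* Completed buffers are always a subset of the pending ones. */
	assert(n >= 0 && n <= sc->tx_pending);
	sc->tx_pending -= n;
	sc->tx_completed = 0;
	return (n);
}

/*
 * Watchdog sketch: returns true if the chip was reset, false if the
 * "timeout" was only caused by deferred releases.
 */
static bool
sketch_watchdog(struct sketch_softc *sc)
{
	sketch_reclaim_tx(sc);
	if (sc->tx_pending == 0) {
		/* Nothing is stuck: exit early, no printf, no reset. */
		return (false);
	}

	/* Something really is outstanding: report it and reset. */
	printf("sketch0: device timeout (%d)\n", sc->tx_pending);
	sc->resets++;
	sc->tx_pending = 0; /* a reset drops whatever was queued */
	return (true);
}
```

With tx_pending == tx_completed the routine returns quietly; with
buffers pending and none completed it still takes the reset path, so a
genuinely wedged transmitter would not be masked.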

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


