From owner-freebsd-net  Fri Jun 28  0:33: 1 2002
Date: Fri, 28 Jun 2002 02:35:20 -0500 (CDT)
From: Mike Silbersack <silby@silby.com>
To: Luigi Rizzo <rizzo@icir.org>
Cc: net@freebsd.org
Subject: Re: interface stalling on tx ?
In-Reply-To: <20020627230348.A54937@iguana.icir.org>
Message-ID: <20020628022611.K70821-100000@patrocles.silby.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


On Thu, 27 Jun 2002, Luigi Rizzo wrote:

> I thought that upon transmission the driver somehow registered a
> timeout to take care of these events, but maybe I am wrong ?
>
> Have other people seen this problem too ?
>
>         cheers
>         luigi

The watchdog timer code in most of the drivers is rather conservative, and
may not detect mid-transfer stalls.  I'll use the dc driver as an example:

dc_start sets if_timer = 5.  dc_txeof then sets if_timer = 0, which
disables the watchdog timer.  This means that once a _single_ frame has
been sent, the watchdog will not recover from any subsequent stall.

In the vr driver, we were having problems where such stalls could be
caused by high load, and the ifconfig up / down process was getting
annoying to users.  I worked around this by setting if_timer = 5 every
time vr_txeof was entered, only setting if_timer = 0 at the point when the
_entire_ transmit buffer list was emptied.

(See if_vr.c rev 1.49 to see how I did it in that driver.)
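
To make the difference concrete, here is a rough user-space sketch of the
two policies.  It is not the code from if_dc.c or if_vr.c: the sim_* names,
the struct and the one-tick-per-second model are all made up for
illustration, and only the if_timer = 5 / if_timer = 0 behaviour comes from
the description above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a driver's transmit state. */
struct sim_softc {
    int if_timer;    /* watchdog countdown in seconds; 0 means disabled */
    int tx_pending;  /* frames still sitting in the transmit list */
    int resets;      /* number of times the watchdog fired */
};

/* Queue frames and arm the watchdog, as a start routine would. */
static void sim_start(struct sim_softc *sc, int frames)
{
    sc->tx_pending += frames;
    sc->if_timer = 5;
}

/* Conservative policy: one completed frame disarms the watchdog. */
static void sim_txeof_conservative(struct sim_softc *sc)
{
    if (sc->tx_pending > 0)
        sc->tx_pending--;
    sc->if_timer = 0;
}

/* Rearm policy: rearm on every entry, disarm only when the list is empty. */
static void sim_txeof_rearm(struct sim_softc *sc)
{
    sc->if_timer = 5;
    if (sc->tx_pending > 0)
        sc->tx_pending--;
    if (sc->tx_pending == 0)
        sc->if_timer = 0;
}

/* One-second tick: count down and fire the watchdog when it reaches 0. */
static void sim_tick(struct sim_softc *sc)
{
    if (sc->if_timer > 0 && --sc->if_timer == 0)
        sc->resets++;  /* a real watchdog routine would reset the card here */
}

/*
 * Queue `queued` frames, let `completed` of them finish, then simulate a
 * stall: no further completions arrive while ten seconds tick by.
 * Returns how many times the watchdog fired.
 */
static int sim_run(bool rearm, int queued, int completed)
{
    struct sim_softc sc = { 0, 0, 0 };

    sim_start(&sc, queued);
    for (int i = 0; i < completed; i++) {
        if (rearm)
            sim_txeof_rearm(&sc);
        else
            sim_txeof_conservative(&sc);
    }
    for (int tick = 0; tick < 10; tick++)
        sim_tick(&sc);
    return sc.resets;
}
```

With three frames queued and a stall after the first completion,
sim_run(false, 3, 1) returns 0 (the watchdog was already disarmed), while
sim_run(true, 3, 1) returns 1.  When all three frames complete, both
policies return 0, so the rearm variant costs nothing on a healthy card.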

You should be able to do something similar in all of the drivers, and I
have indeed thought of doing so.  Could you code up and test such a patch
for whatever card you are using in your test environment to see if it
is a successful workaround?

Of course, in an ideal world all drivers would recover in a graceful
fashion.  However, taking advantage of the watchdog timer to reset stuck
cards seems like an adequate workaround.  So far, I can't see any downside
to this approach.  If the card never locks up, then the change is
superfluous.  When it does, the change is a lifesaver.

Apologies if parts of this message sound like babbling; I should be
sleeping at this moment in time. :)

Mike "Silby" Silbersack

