Date:      Thu, 8 Apr 2004 00:17:06 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Palle Girgensohn <girgen@pingpong.net>
Cc:        net@FreeBSD.org
Subject:   Re: sk ethernet driver: watchdog timeout
Message-ID:  <20040407235838.K11719@gamplex.bde.org>
In-Reply-To: <3810000.1081299464@palle.girgensohn.se>
References:  <20240000.1079394807@palle.girgensohn.se> <wpy8q04buf.fsf@heho.snv.jussieu.fr> <3810000.1081299464@palle.girgensohn.se>

On Wed, 7 Apr 2004, Palle Girgensohn wrote:

> --On Wednesday, March 17, 2004 00.21.44 +0100 "Arno J. Klaassen"
> <arno@heho.snv.jussieu.fr> wrote:
>
> > Hello,
> >
> >> I have an ASUS motherboard A7V8X-E Deluxe with onboard 10/100/1000
> >> Mbit/s NIC from Marvell Semiconductor.
> >>
> >> My problem is that it sometimes locks up with the error message
> >>
> >>  sk0: watchdog timeout
> >
> > I have a similar problem with 3Com cards on an ASUS A7N266;
> > I just post in case this might be related (and in hope for
> > a hint for a solution )
>
> Hi again,
>
> Since this thread started I've tried this on several other systems, with
> exactly the same results. Is anyone else experiencing this? Is there
> anything I can do to help fix it?

The following patch reduces the problem on the A7V8X-E a little.  It
limits the tx queue to 1 packet and fixes the handling of the watchdog
timeout in txeof.  The first part probably makes the second part a no-op,
since with at most 1 packet queued, reclaiming it always leaves the queue
empty.  Without the patch, my A7V8X-E hangs on even light nfs activity
(e.g., copying a 1MB file to an nfs mount).  With it, it takes heavier
nfs activity to hang (makeworld still never completes, and a flood ping
always hangs).

I first suspected an interrupt-related bug, but the bug seems to be
more hardware-specific.  Examination of the output queues shows that
the transmitter sometimes just stops before processing all packets.
Resetting in sk_watchdog() doesn't always fix the problem, and the
timeout usually stops firing after a couple of unsuccessful resets,
leaving the device completely hung.  But the problem may still be
related to interrupt timing, since it occurs much less often under
RELENG_4: RELENG_4 hangs about as often without this hack as -current
does with it.

nv0 hangs similarly.  fxp0 just works.

%%%
Index: if_sk.c
===================================================================
RCS file: /home/ncvs/src/sys/pci/if_sk.c,v
retrieving revision 1.78
diff -u -2 -r1.78 if_sk.c
--- if_sk.c	31 Mar 2004 12:35:51 -0000	1.78
+++ if_sk.c	1 Apr 2004 07:33:58 -0000
@@ -1830,4 +1830,9 @@
 	SK_IF_LOCK(sc_if);

+	if (sc_if->sk_cdata.sk_tx_cnt > 0) {
+		SK_IF_UNLOCK(sc_if);
+		return;
+	}
+
 	idx = sc_if->sk_cdata.sk_tx_prod;

@@ -1853,4 +1858,5 @@
 		 */
 		BPF_MTAP(ifp, m_head);
+		break;
 	}

@@ -2000,5 +2031,4 @@
 		sc_if->sk_cdata.sk_tx_cnt--;
 		SK_INC(idx, SK_TX_RING_CNT);
-		ifp->if_timer = 0;
 	}

@@ -2007,4 +2037,6 @@
 	if (cur_tx != NULL)
 		ifp->if_flags &= ~IFF_OACTIVE;
+
+	ifp->if_timer = (sc_if->sk_cdata.sk_tx_cnt == 0) ? 0 : 5;

 	return;
%%%
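
For anyone trying to follow what the if_timer change does, here is a
minimal standalone sketch of the two txeof rules.  It is a toy model,
not driver code: struct tx_model, TX_TIMEOUT and the model_*() functions
are names made up for illustration, and arming the timer in
model_start() is an assumption about what sk_start() does outside the
hunks shown above.  The `done' argument stands for the number of
descriptors the chip has completed.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_TIMEOUT	5	/* seconds; same value as in the patch */

/* Hypothetical stand-in for the driver state (not struct sk_if_softc). */
struct tx_model {
	int	tx_cnt;		/* packets queued and not yet reclaimed */
	int	if_timer;	/* watchdog countdown; 0 means disarmed */
};

/* Patched start rule: refuse to queue while a packet is outstanding. */
bool
model_start(struct tx_model *m)
{
	if (m->tx_cnt > 0)
		return (false);
	m->tx_cnt++;
	m->if_timer = TX_TIMEOUT;	/* assumption: start arms the timer */
	return (true);
}

/* Old txeof rule: any reclaimed descriptor disarms the watchdog. */
void
model_txeof_old(struct tx_model *m, int done)
{
	while (done-- > 0 && m->tx_cnt > 0) {
		m->tx_cnt--;
		m->if_timer = 0;
	}
}

/* Patched txeof rule: disarm only when nothing is left outstanding. */
void
model_txeof_new(struct tx_model *m, int done)
{
	while (done-- > 0 && m->tx_cnt > 0)
		m->tx_cnt--;
	m->if_timer = (m->tx_cnt == 0) ? 0 : TX_TIMEOUT;
}

/* Usage: 2 packets outstanding, the chip completes 1 and then stalls. */
void
model_selfcheck(void)
{
	struct tx_model old = { 2, TX_TIMEOUT };
	struct tx_model new = { 2, TX_TIMEOUT };

	model_txeof_old(&old, 1);
	model_txeof_new(&new, 1);
	assert(old.tx_cnt == 1 && old.if_timer == 0);		/* disarmed */
	assert(new.tx_cnt == 1 && new.if_timer == TX_TIMEOUT);	/* armed */
}
```

With 2 packets outstanding and only 1 completed, the old rule disarms
the watchdog while a packet is still pending, and the new rule keeps it
armed.  With the 1-packet limit the count never exceeds 1, which is why
the second part of the patch is probably a no-op once the first part is
applied.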

Bruce


