Date: Tue, 29 Aug 2006 17:52:34 +0100
From: Sam Eaton <sam@fqdn.net>
To: freebsd-stable@freebsd.org
Message-ID: <20060829165234.GA15988@host.fqdn.net>
Subject: bce0 watchdog timeout errors

I'm still seeing an ongoing problem with the bce(4) device on my Dell PowerEdge 1950.

I'm running FreeBSD 6-STABLE on AMD64 with the stock SMP kernel and the
most recent version of the bce driver, which did cure the other errors
we were seeing (the mbuf-related ones).

The card is currently connected at an auto-negotiated 100BaseTX full
duplex (rather than gigabit) as we don't currently have a gigabit switch
to test on (the machine is under test rather than deployed).

I can consistently put the system into a 'Watchdog timeout occurred,
resetting!' loop by doing any reasonable amount of work over an
NFS-mounted filesystem.

An easy way for me to reproduce this is to build a reasonably large port
from our NFS-mounted copy of the ports tree.

I can also trigger it by running bonnie++ against an NFS-mounted
filesystem.

So far I've failed to find a simpler, network-only test that triggers
the problem (I've tried pushing large amounts of data back and forth
over ssh, iperf, ping floods, etc.). NFS triggers it every time, though.

Once it has reported the watchdog timeout, networking on the box never
recovers.
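
To put numbers on the failure during testing, it may help to count how often the reset message appears per interface. The helper below is a hypothetical sketch (`count_watchdog_resets` is an illustrative name). It assumes each log line contains the interface name followed, somewhere later on the line, by the exact message text quoted above; the prefix the driver prints between the two may differ between driver versions.

```python
import re
from collections import Counter

# Matches lines such as "bce0: ... Watchdog timeout occurred, resetting!".
# Assumption: only the message text comes from the error quoted above;
# anything between the interface name and the message is ignored.
WATCHDOG_RE = re.compile(r"\b(bce\d+):.*Watchdog timeout occurred, resetting!")


def count_watchdog_resets(log_text: str) -> Counter:
    """Return a Counter of watchdog-reset messages keyed by bce interface."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the contents of /var/log/messages (or saved dmesg output) before and after a test run would show whether the resets line up with the NFS load.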

Is anyone else seeing anything similar? And does anyone have suggestions
for how I can diagnose this further so we can get to the bottom of it?

Thanks,

Sam.
-- 
"Fortified with Essential Bitterness and Sarcasm"
    Matt Groening, "Binky's Guide to Love".