From: ming fu
Date: Wed, 16 Jun 2004 09:32:56 -0400
To: freebsd-net@freebsd.org
Subject: FreeBSD em ether driver lockup
Message-ID: <40D04C08.2080703@borderware.com>

Hi,

I have experienced an em driver lockup. One port on a multi-port Intel
Gigabit card would lock up. It can be unlocked with:

    # ifconfig em2 down
    # ifconfig em2 up

I believe I have been hit by the same bug reported as kern/66634.

Looking through the em driver code, I noticed that the watchdog function
is somewhat strange:

static void
em_watchdog(struct ifnet *ifp)
{
	struct adapter * adapter;
	adapter = ifp->if_softc;

	/* If we are in this routine because of pause frames, then
	 * don't reset the hardware.
	 */
	if (E1000_READ_REG(&adapter->hw, STATUS) & E1000_STATUS_TXOFF) {
		ifp->if_timer = EM_TX_TIMEOUT;
		return;
	}

	if (em_check_for_link(&adapter->hw))
		printf("em%d: watchdog timeout -- resetting\n", adapter->unit);

	ifp->if_flags &= ~IFF_RUNNING;

	em_stop(adapter);
	em_init(adapter);

	ifp->if_oerrors++;
	return;
}

Would the

	if (E1000_READ_REG(&adapter->hw, STATUS) & E1000_STATUS_TXOFF)

ever be false on a configured device?

I checked the watchdog functions of several other Ethernet device drivers
(fxp, bge). They all pretty much go straight to stopping and
reinitializing the device. I think the watchdog is the kernel's last
attempt to bring back an interface, so why make this desperate action
depend on a bit in the device's hardware? The hardware could be insane at
that moment.

Is there a suggestion on how to trigger the watchdog? It is really time
consuming to diagnose this, as it takes hours or days for the em to lock
up once.

Regards,
Ming
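
P.S. To make the concern about the TXOFF check concrete, here is a minimal user-space sketch of one possible alternative: keep the pause-frame deferral, but cap how many consecutive times it may postpone the reset. This is not the em driver's code. The struct, the function, the bit value and the cap of 3 are all hypothetical stand-ins chosen for illustration, and whether such a cap is appropriate for real hardware is an open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the em driver's definitions. */
#define SKETCH_STATUS_TXOFF  0x00000010u /* simulated "transmit paused" bit */
#define SKETCH_MAX_DEFERRALS 3           /* arbitrary cap on consecutive deferrals */

struct sketch_adapter {
	uint32_t status;    /* simulated STATUS register contents */
	int      deferrals; /* consecutive watchdog calls deferred because of TXOFF */
	int      resets;    /* number of simulated stop/init cycles */
};

/*
 * Simulated watchdog. Returns true if it "reset" the device and false if
 * it deferred because the pause bit was set. Unlike an unconditional
 * early return, the deferral is bounded, so a status bit that stays set
 * cannot postpone recovery forever.
 */
static bool
sketch_watchdog(struct sketch_adapter *sc)
{
	assert(sc != NULL);

	if ((sc->status & SKETCH_STATUS_TXOFF) != 0 &&
	    sc->deferrals < SKETCH_MAX_DEFERRALS) {
		sc->deferrals++;   /* a real driver would re-arm its timer here */
		return false;
	}

	/* Either no pause condition, or we have deferred too many times. */
	sc->deferrals = 0;
	sc->resets++;              /* stands in for the stop/init sequence */
	return true;
}
```

With the status bit stuck on, the first three calls defer and the fourth performs the reset; with the bit clear, every call resets immediately.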