From owner-freebsd-bugs@FreeBSD.ORG Thu Apr 27 03:50:16 2006
Message-Id: <200604270340.k3R3e7ac043237@www.freebsd.org>
Date: Thu, 27 Apr 2006 03:40:07 GMT
From: Nathan Whitehorn
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/96391: Device timeouts on nve(4) [PATCH]

>Number:         96391
>Category:       kern
>Synopsis:       Device timeouts on nve(4) [PATCH]
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Apr 27 03:50:14 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Nathan Whitehorn
>Release:        6.1-RC
>Organization:
University of Chicago
>Environment:
FreeBSD munuc.uchicago.edu 6.1-RC FreeBSD 6.1-RC #9: Wed Apr 26 22:02:06 CDT 2006 root@munuc.uchicago.edu:/usr/obj/usr/src/sys/MUNUC amd64
>Description:
On some systems with nVidia NICs, especially nForce4, nve(4) reports frequent device timeouts (every 5-10 minutes) under low load. According to a note in the forcedeth source, this appears to happen because the nve MAC randomly fails to raise transmit-acknowledgement interrupts. Under load, tx interrupts for other packets or rx interrupts soon cause the interrupt routine to run anyway, and it then registers the missed transmit completion. Under low load, the watchdog timer expires before this happens, causing a device timeout and a MAC reset, which also briefly hangs the machine.
>How-To-Repeat:
Place an affected nve controller on a low-traffic network and watch the errors come rolling in.
>Fix:
We can fix the problem by calling the nVidia HAL's interrupt service routine from nve_watchdog(), in effect forcing the interrupt to be handled if we are expecting one and it has not shown up yet.
If the pending-transmit counter is still non-zero afterwards, we conclude, as before, that the NIC has crashed and reset it; if the counter has dropped to zero, the problem has resolved itself and the watchdog simply returns.

--- if_nve_original.c	Wed Apr 26 22:23:14 2006
+++ if_nve.c	Wed Apr 26 21:52:34 2006
@@ -1270,6 +1270,18 @@
 nve_watchdog(struct ifnet *ifp)
 {
 	struct nve_softc *sc = ifp->if_softc;
+
+	NVE_LOCK(sc);
+	/* Check for lost interrupts -- happens on nForce4 */
+	sc->hwapi->pfnDisableInterrupts(sc->hwapi->pADCX);
+	sc->hwapi->pfnHandleInterrupt(sc->hwapi->pADCX);
+	sc->hwapi->pfnEnableInterrupts(sc->hwapi->pADCX);
+
+	if (sc->pending_txs == 0) {
+		NVE_UNLOCK(sc);
+		return; /* Problem went away */
+	}
+	NVE_UNLOCK(sc);
 	device_printf(sc->dev, "device timeout (%d)\n", sc->pending_txs);

>Release-Note:
>Audit-Trail:
>Unformatted:
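The lost-interrupt failure mode and the watchdog change described in this report can be illustrated without hardware. What follows is a hypothetical user-space sketch, not driver code: struct model_nic, model_handle_interrupt(), model_reset(), model_watchdog_old(), model_watchdog_new() and simulate() are invented names that merely stand in for sc->pending_txs, the HAL's pfnHandleInterrupt() and the MAC reset. Locking (NVE_LOCK/NVE_UNLOCK) and interrupt masking are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a NIC's transmit bookkeeping.  None of these
 * names exist in if_nve.c or the nVidia HAL; they only stand in for
 * sc->pending_txs, pfnHandleInterrupt() and the MAC reset.
 */
struct model_nic {
	int pending_txs;	/* transmits queued, not yet acknowledged */
	int completed_hw;	/* transmits the hardware has finished */
	int resets;		/* simulated MAC resets */
};

/* Model ISR: acknowledge every transmit the hardware has completed. */
void
model_handle_interrupt(struct model_nic *nic)
{
	/* Hardware cannot complete more transmits than were queued. */
	assert(nic->completed_hw <= nic->pending_txs);
	nic->pending_txs -= nic->completed_hw;
	nic->completed_hw = 0;
}

/* Model MAC reset: drop all outstanding transmit state. */
void
model_reset(struct model_nic *nic)
{
	nic->resets++;
	nic->pending_txs = 0;
	nic->completed_hw = 0;
}

/* Unpatched behaviour: any pending transmit at timeout forces a reset. */
void
model_watchdog_old(struct model_nic *nic)
{
	if (nic->pending_txs != 0)
		model_reset(nic);
}

/* Patched behaviour: run the ISR first, reset only if work is still pending. */
void
model_watchdog_new(struct model_nic *nic)
{
	model_handle_interrupt(nic);
	if (nic->pending_txs == 0)
		return;		/* the lost interrupt was recovered */
	model_reset(nic);
}

/*
 * Queue one transmit whose completion interrupt never arrives, let the
 * watchdog fire, and return how many resets it performed.  When
 * hw_finished is false the hardware never sends the packet, which
 * models a genuinely hung MAC.
 */
int
simulate(void (*watchdog)(struct model_nic *), bool hw_finished)
{
	struct model_nic nic = { 0, 0, 0 };

	nic.pending_txs = 1;
	if (hw_finished)
		nic.completed_hw = 1;
	/* No tx interrupt is delivered, so the ISR never runs on its own. */
	watchdog(&nic);
	return nic.resets;
}
```

In this model, with a lost completion interrupt, simulate(model_watchdog_old, true) returns 1 (a spurious reset) while simulate(model_watchdog_new, true) returns 0. If the hardware never finishes the transmit, simulate(model_watchdog_new, false) still returns 1, mirroring the fallback to a reset when pending_txs stays non-zero.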