From: Don Bowman
To: 'John Polstra', net@freebsd.org
Date: Tue, 22 Apr 2003 17:47:32 -0400
Subject: RE: em net (optical GigE) driver hangs?
List-Id: Networking and TCP/IP with FreeBSD

> From: John Polstra [mailto:jdp@polstra.com]
> Sent: April 22, 2003 16:12
> To: net@freebsd.org
> Subject: Re: em net (optical GigE) driver hangs?
>
>
> In article
> ,
> Dave Dolson wrote:
> >
> > Has anyone experienced em interface hangs after approx several days of heavy
> > operation?
> >
> > We are using a system which is mostly RELENG_4_7, using multiple optical em
> > GigE devices.
> >
> > The symptom is that the interface stops transmitting or receiving, reporting
> > drops on output (no tx descriptors) and input errors (MPC stat-->no receive
> > descriptors).
> >
> > It turns out that all but 64 transmit descriptors are in use. The driver is
> > waiting for the "done" flag to be set so it can clean the descriptors.
> > The device is also in the OACTIVE state at this time.
> >
> > After the interface is brought down (or unplugged), the em watchdog timer
> > goes off 5s later.
> >
> > We are trying to figure out two things:
> > 1. why did the driver lock up?
> > 2. why didn't the watchdog timer go off earlier?
> >
> > I think we would be happy to solve #2 given the rarity of the event.
> > Is the RELENG_4 version likely to fix the problem?
>
> I think the RELENG_4 version is likely to eliminate the problem. See
> the comment near the define of EM_RDTR in if_em.h (in the RELENG_4
> version of that file, of course).

We saw that, but we are using DEVICE_POLLING, so we assumed it was not
the issue. We think instead it's another problem, which is also solved
in the RELENG_4 driver: em_poll() calls em_start() if the device is
running and there are packets on the queue, and em_start() re-arms the
timer, holding off the watchdog forever.

--don
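
To make the suspected hold-off concrete, here is a rough user-space sketch of it. This is not the driver code: the sim_* names, the struct fields, and the rearm_always switch are all made up for illustration, and the "re-arm only when a descriptor was actually queued" variant is just an assumed alternative, not necessarily what RELENG_4 does. The only things taken from the thread are that em_poll() calls em_start() when packets are queued, that em_start() re-arms the timer, and that the watchdog period is 5s. The countdown is modelled on the usual once-per-second if_timer decrement.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_TX_TIMEOUT 5        /* seconds; the 5s watchdog period from the thread */

/* Hypothetical stand-in for the interface state; not the real struct ifnet. */
struct sim_if {
    int if_timer;               /* watchdog countdown in seconds, 0 = disarmed */
    int queued;                 /* packets waiting on the send queue */
    int free_desc;              /* free transmit descriptors */
    int watchdog_fired;         /* number of times the watchdog ran */
};

/* Model of the start routine. With rearm_always set, the timer is re-armed
 * on every call (the suspected behaviour); otherwise it is re-armed only
 * when at least one packet was handed to the hardware (assumed variant). */
static void sim_start(struct sim_if *ifp, bool rearm_always)
{
    bool queued_one = false;

    while (ifp->queued > 0 && ifp->free_desc > 0) {
        ifp->queued--;
        ifp->free_desc--;
        queued_one = true;
    }
    if (rearm_always || queued_one)
        ifp->if_timer = SIM_TX_TIMEOUT;
}

/* Model of the poll routine: call start whenever packets are queued. */
static void sim_poll(struct sim_if *ifp, bool rearm_always)
{
    if (ifp->queued > 0)
        sim_start(ifp, rearm_always);
}

/* Once-per-second tick: count down and fire the watchdog on reaching zero. */
static void sim_slowtimo(struct sim_if *ifp)
{
    if (ifp->if_timer == 0 || --ifp->if_timer != 0)
        return;
    ifp->watchdog_fired++;
}

/* Simulate a hung transmitter (no free descriptors, nothing ever reclaimed)
 * for the given number of seconds; return how often the watchdog fired. */
int sim_run(bool rearm_always, int seconds)
{
    assert(seconds >= 0);

    struct sim_if ifp = {
        .if_timer = SIM_TX_TIMEOUT,   /* armed by the last successful transmit */
        .queued = 10,
        .free_desc = 0,
        .watchdog_fired = 0,
    };

    for (int s = 0; s < seconds; s++) {
        sim_poll(&ifp, rearm_always);
        sim_slowtimo(&ifp);
    }
    return ifp.watchdog_fired;
}
```

In this model, sim_run(true, 60) never sees the watchdog fire, because every poll pushes the countdown back to 5 before the tick can take it to zero. sim_run(false, 60) fires once, 5 seconds in. That matches what we see: the watchdog only goes off after the interface is brought down, presumably because the start routine is no longer being called.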