From: Greg Byshenk
To: freebsd-stable@freebsd.org
Date: Wed, 13 May 2009 18:44:38 +0200
Subject: Re: em0 watchdog timeout 7-stable
Message-ID: <20090513164438.GE67116@core.byshenk.net>
In-Reply-To: <20090513164207.GD67116@core.byshenk.net>
References: <20090426125008.GK1550@core.byshenk.net> <20090513164207.GD67116@core.byshenk.net>

On Wed, May 13, 2009 at 06:42:07PM +0200, Greg Byshenk wrote:

> As a followup to my own previous message, I continue to have annoying
> problems with "em?: watchdog timeout" on one of my machines (now running
> 7.2-STABLE as of 2009-05-08).
>
> I have discontinued using the on-board (em, copper) NICs, and replaced
> the original fibre NIC with a newer model, but the problem persists.
> I've also set
>
>    hw.pci.enable_msix=0
>    hw.pci.enable_msi=0
>    hw.em.rxd=1024
>    hw.em.txd=1024
>    net.inet.tcp.tso=0
>
> ...as suggested in some discussions of this problem, and set the em1
> interface to 'polling', all to no avail.  Frequently, though irregularly
> (once or twice a day), the console begins to display
>
>    em1: watchdog timeout -- resetting
>    em1: watchdog timeout -- resetting
>    em1: watchdog timeout -- resetting
>
> the network is down, and the machine locks up.
>
> [Note: I am getting 'em1' now instead of 'em0' as previously, but this
> is due to changing all of the NICs, which led to a different numbering;
> the timeout is still occurring on the (main) interface, the fibre
> gigabit connection.]
>
> What is particularly perverse (IMO) is that, since changing the NIC to
> the newer model (and updating the kernel), I can no longer break to the
> debugger when the lockup occurs (there is no response to the break) --
> but I _can_ shut the machine down cleanly via hardware (a touch of the
> power switch sends 'shutdown', and the machine shuts down cleanly --
> after killing off processes waiting on network i/o).
>
> The machine is running nfs and samba (3.2.10, from ports), and pretty
> much nothing else.
>
>
> Anyone have any ideas about this...?  I'm going mad with this.

Just as an FYI, the drive errors I described in my previous message
appear to have been due to a bad BBU on the RAID controller, and to
have been resolved.

-- 
greg byshenk  -  gbyshenk@byshenk.net  -  Leiden, NL