From: Barney Wolff
To: Gleb Smirnoff
Cc: stable@FreeBSD.org
Date: Fri, 8 Sep 2006 20:13:23 -0400
Subject: Re: em watchdog timeout on UP, 6-stable
Message-ID: <20060909001323.GA96663@pit.databus.com>
In-Reply-To: <20060908172543.GL40020@FreeBSD.org>
References: <20060905183352.GA56243@pit.databus.com> <20060908172543.GL40020@FreeBSD.org>

On Fri, Sep 08, 2006 at 09:25:43PM +0400, Gleb Smirnoff wrote:
> Barney,
>
> On Tue, Sep 05, 2006 at 02:33:52PM -0400, Barney Wolff wrote:
> B> Updated my Athlon-xp 6-stable system last night, got an em watchdog
> B> timeout for the first time a few hours later, during a fairly
> B> high-traffic period. System is UP but does have device apic in
> B> the config. Any chance this is the recent race condition?
> B> Workaround? ifconfig em0 down, ifconfig em0 up seemed to cure it,
> B> at least for the moment.
>
> Not clear from your mail whether interface was working after the
> event occurred.

In the watchdog timer case it was not.

Looking further, I had several cases where nfs-over-tcp failed under
heavy load, but the interface did not report failure and continued to
work. The system sending nfs writes logged "nfs send error 35" and gzip
died with "resource temporarily unavailable". (I haven't looked at the
code - EAGAIN?)

In the watchdog timer case the cpu was very busy with portbuilding and
the system was receiving nfs writes. But the nfs failures happened in
both directions (I have two systems which back up each other, at
different times). Before updating from a 6/14/06 6-stable to 9/04/06,
such nfs failures were unknown unless I tried to run both backups
simultaneously. Systems are on a cheap netgear gb switch, other system
is current but a couple of months old.

After the watchdog timer, the link was unidirectional - sending worked
(packets were correctly received on the other system) but receiving did
not work. Then, after another 9 minutes, it seemed to stop working in
either direction, until manually down/up'd hours later.

I can put logs on a webserver if that would be useful.

-- 
Barney Wolff         I never met a computer I didn't like.