From: "Kevin Oberman"
To: Ion-Mihai Tetcu
Cc: "Bjoern A. Zeeb", freebsd-stable@freebsd.org, JoaoBR
Subject: Re: nve timeout (and down) regression?
In-reply-to: Your message of "Fri, 24 Mar 2006 22:33:17 +0200." <20060324223317.2069564f@it.buh.tecnik93.com>
Date: Fri, 24 Mar 2006 12:55:41 -0800
Message-Id: <20060324205541.BEB1745047@ptavv.es.net>
List-Id: Production branch of FreeBSD source code

> Date: Fri, 24 Mar 2006 22:33:17 +0200
> From: Ion-Mihai Tetcu
>
> On Thu, 23 Mar 2006 14:34:24 -0800
> "Kevin Oberman" wrote:
>
> > > Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
> > > From: "Bjoern A. Zeeb"
> > >
> > > On Thu, 23 Mar 2006, JoaoBR wrote:
> > >
> > > > On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
> > > >
> > > > nve did not worked on 6.0R (for me) but cvsup to stable resolved the case (for
> > > > me) in end of dezember
> > > >
> > > > since a month or so with recent releng_6 the problem came back, timeouts and
> > > > stopping rx/tx
> > >
> > > did you do more updates in the timeframe from december to about a
> > > month ago?
> > >
> > > if the problem was gone and is back now any (exact) dates to narrow
> > > down the timeframe where the problem came back would be very helpful.
>
> nve0: port 0xbc00-0xbc07 mem 0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
> nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
> nve0: Ethernet address 00:0a:48:1d:c6:97
> miibus1: on nve0
> nve0: bpf attached
> nve0: Ethernet address: 00:0a:48:1d:c6:97
> nve0: [MPSAFE]
>
> This happens w/o any "real" activity on that interface (which goes into
> an Allied Telesyn switch):
> .......
> Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
> Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
> Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
> Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
> Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
> Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
> .........
>
>
> FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 21 01:39:15 EET 2006 itetcu@worf.tecnik93.com:/usr/obj/usr/src/sys/GENERIC amd64

Note that we are running i386 on an AMD64 platform. I updated my system
(which was happy on Feb. 15 code) to March 13 code and I am still
running fine. No errors at all.
Also, another system was updated to RELENG_6 yesterday and it is also
running clean. Again, all systems are identical dual core AMD64 systems
running i386 code. (We would like to run amd64, but OpenOffice.org still
does not run on it and we need that.)

Only the system in Iowa with the AT switch is seeing problems. Even if
there is no traffic, it is possible that something that is negotiated by
the switch is triggering the problem.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net
Phone: +1 510 486-8634