From owner-freebsd-net@FreeBSD.ORG Sun Apr 23 12:41:53 2006
Date: Sun, 23 Apr 2006 13:41:52 +0100 (BST)
From: Robert Watson
To: Lars Erik Gullerud
Cc: freebsd-net@freebsd.org
Subject: Re: Watchdog timeouts and dead network on bge - 6.1-RC1
Message-ID: <20060423133913.T56433@fledge.watson.org>
In-Reply-To: <20060423114810.P36951@electra.nolink.net>
References: <20060423114810.P36951@electra.nolink.net>

On Sun, 23 Apr 2006, Lars Erik Gullerud wrote:

> We recently upgraded one of our 4.11 servers to 6.1-RC1. The server is a
> Dell PE2650, dual Xeons, and has two onboard Broadcom BCM5701 cards, using
> the bge driver.
>
> Some older threads on -net and -current led me to believe that most issues
> with bge driver in FreeBSD >4 had been sorted. However, after our upgrade,
> we are seeing errors like this:

There's a Dell 2650 in the FreeBSD netperf cluster.
When working with 5.x on the box quite a long time ago, I saw similar
problems, in which the network interface stalled and required kicking to
reset. Unfortunately, this is not an issue I have time to work on currently,
but if it would help a FreeBSD developer track down and debug this problem, I
can provide remote access to a box that has had the problem in the past, along
with serial console, remote power, and network booting. I'll run some tests on
it today and see if that box still has the same problem or not. I've never
been entirely convinced it was actually a bge problem as opposed to an
interrupt delivery problem, however. Dmesg fragment below.

Robert N M Watson

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 6.0-CURRENT #1: Sat Jan 29 21:32:42 EST 2005
    rwatson@zoo.freebsd.org:/usr/obj/zoo/rwatson/netperf/src/sys/GENERIC
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.20GHz (2192.90-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf24 Stepping = 4
  Features=0x3febfbff
real memory = 2147352576 (2047 MB)
avail memory = 2096799744 (1999 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 6
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 irqs 0-15 on motherboard
ioapic1 irqs 16-31 on motherboard
ioapic2 irqs 32-47 on motherboard
...
ACPI APIC Table:
acpi0: on motherboard
aac0: mem 0xf0000000-0xf7ffffff irq 30 at device 8.1 on pci4
...
bge0: mem 0xfcd10000-0xfcd1ffff irq 28 at device 6.0 on pci3
miibus0: on bge0
bge0: Ethernet address: 00:06:5b:8e:b9:8d
bge1: mem 0xfcd00000-0xfcd0ffff irq 29 at device 8.0 on pci3
miibus1: on bge1
bge1: Ethernet address: 00:06:5b:8e:b9:8e

>
> Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting
> Apr 22 18:44:01 nebula kernel: bge0: link state changed to DOWN
> Apr 22 18:44:03 nebula kernel: bge0: link state changed to UP
>
> ...and more importantly - when this happens, the network connection does NOT
> in fact come back up. Logging into the box locally (or via a different
> network interface) and manually issuing "ifconfig bge0 down ; ifconfig bge0
> up" DOES get the interface going again, however.
>
> We have only seen this on very high network loads - the particular message
> included above occurred while transferring some 120GB of data from a 4.11
> NFS-server to this 6.1-RC1 box.
>
> Is this a known issue in bge? If so, is anyone working on it? Can we provide
> some useful information to whoever this might be?
>
> We have never had any issues with bge in 4.x, but we really need to get this
> server up to 5.x/6.x at this point in time, any other suggestions on knobs or
> workarounds that can give us bge stability?
>
> Thanks in advance,
>
> /leg
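
The manual workaround quoted above (spot the watchdog timeout in the kernel log, then bounce the interface with ifconfig) could be sketched roughly as follows. This is only an illustration and does nothing about the underlying stall. The log line format is assumed from the three quoted messages, the function names are hypothetical, and nothing here was part of the original thread.

```python
import re
import subprocess

# Assumed format, taken from the quoted log lines, e.g.:
#   "Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting"
WATCHDOG_RE = re.compile(r"kernel: (bge\d+): watchdog timeout -- resetting")


def interfaces_with_watchdog_timeouts(log_lines):
    """Return bge interface names that logged a watchdog timeout, in first-seen order."""
    seen = []
    for line in log_lines:
        match = WATCHDOG_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def bounce_commands(interface):
    """Build the two ifconfig invocations from the report: down, then up."""
    return [["ifconfig", interface, "down"], ["ifconfig", interface, "up"]]


def bounce(interface):
    """Run the down/up pair; this needs root and briefly interrupts traffic."""
    for command in bounce_commands(interface):
        subprocess.run(command, check=True)
```

A caller would feed it lines from wherever syslog writes kernel messages on the machine (the exact file is an assumption that depends on the local syslog configuration) and call bounce() for each interface returned. Keeping the command construction separate from the execution lets the parsing be checked without touching a live interface.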