From owner-freebsd-stable@FreeBSD.ORG Tue May 28 06:55:28 2013
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by hub.freebsd.org (Postfix) with ESMTP id DCC83BC9
	for ; Tue, 28 May 2013 06:55:28 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84])
	by mx1.freebsd.org (Postfix) with ESMTP id 66752DA7
	for ; Tue, 28 May 2013 06:55:28 +0000 (UTC)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
	by kabab.cs.huji.ac.il with esmtp
	id 1UhDoa-000ElU-2U; Tue, 28 May 2013 09:55:24 +0300
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.3
To: pyunyh@gmail.com
Subject: Re: SunFire X2200 ilo's bge1 DOWN/UP
In-reply-to: <20130528064850.GB1457@michelle.cdnetworks.com>
References: <20130528052953.GA1457@michelle.cdnetworks.com>
	<20130528064850.GB1457@michelle.cdnetworks.com>
Comments: In-reply-to YongHyeon PYUN
	message dated "Tue, 28 May 2013 15:48:50 +0900."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 28 May 2013 09:55:24 +0300
From: Daniel Braniss
Cc: freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 28 May 2013 06:55:28 -0000

> On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
> > > > >
> > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
> > > > >
> > > >
> > > > bge0: mem
> > > > 0xfdff0000-0xfdffffff,0xfdfe0000-0xfdfeffff irq 17 at device 4.0 on pci6
> > > > bge0: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > miibus2: on bge0
> > > > brgphy0: PHY 1 on miibus2
> > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > > bge1: mem
> > > > 0xfdfc0000-0xfdfcffff,0xfdfb0000-0xfdfbffff irq 18 at device 4.1 on pci6
> > > > bge1: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > miibus3: on bge1
> > > > brgphy1: PHY 1 on miibus3
> > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > >
> > > > sf-10> ifconfig bge1
> > > > bge1: flags=8802 metric 0 mtu 1500
> > > >         options=8009b
> > > TE>
> > > >         ether 00:1b:24:5d:5b:be
> > > >         nd6 options=21
> > > >         media: Ethernet autoselect (100baseTX )
> > > >         status: active
> > > >
> > > >
> > > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > > Do you have some network script run by cron?
> >
> > no scripts.
> > this port is shared with the ILO/IPMI, and back in March you fixed a problem
> > that it was hanging soon after it was initialized by the driver,
> > (r248226 - but I'm not sure if it was ever MFC'ed).
>
> It was MFCed.
>
> > Initially I thought it could be caused by connections to it from other
> > hosts (either via the web, or ssh) so I killed them, but it didn't help.
> > without that patch the connection fails, and I don't see any DOWN/UP.
>
> Could you check how many interrupts you get from bge1?
> Ideally you shouldn't get any interrupts for bge1.

it's not even mentioned :-)

sf-04> vmstat -i
interrupt                          total       rate
irq3: uart1                          964          0
irq4: uart0                            6          0
irq14: ata0                       227354          0
irq17: bge0                      1021981          2
irq21: ohci0                          28          0
irq22: ehci0                           2          0
irq23: atapci1                    293228          0
cpu0:timer                     383244076       1124
cpu1:timer                       2225144          6
cpu2:timer                       2056087          6
cpu3:timer                       2093943          6
Total                          391162813       1147

>
> > > > >
> > > > > > is toggling bge1 DOWN/UP every few hours, this port is being used by the ILO.
> > > > > > To check, I upgraded another identical host, and the same problem appears.
> > > > >
> > > > > What is the last known working revision?
> > > >
> > > > I have no idea, but I have older versions, and I'll start from the oldest
> > > > (9.1-prerelease), but
> > > > it will take time, since it takes hours till it happens.
> > > >
> > >
> > > ok.
> > >
>
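
The check requested in this thread (whether bge1 shows up in the `vmstat -i` listing at all, and with what total) can be scripted when it has to be repeated on several hosts. What follows is only a minimal sketch in Python, assuming the three-column layout shown in the output above. The function names `parse_vmstat_i` and `device_total` are hypothetical helpers made up for this illustration, not part of any FreeBSD tool.

```python
# Sample text copied from the vmstat -i output posted in this message.
SAMPLE = """\
interrupt                          total       rate
irq3: uart1                          964          0
irq4: uart0                            6          0
irq14: ata0                       227354          0
irq17: bge0                      1021981          2
irq21: ohci0                          28          0
irq22: ehci0                           2          0
irq23: atapci1                    293228          0
cpu0:timer                     383244076       1124
cpu1:timer                       2225144          6
cpu2:timer                       2056087          6
cpu3:timer                       2093943          6
Total                          391162813       1147
"""


def parse_vmstat_i(text):
    """Map each interrupt source name to its (total, rate) pair."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows end in two integers (total, rate); skip header and blanks.
        if len(fields) < 3:
            continue
        if not (fields[-2].isdigit() and fields[-1].isdigit()):
            continue
        name = " ".join(fields[:-2])
        counts[name] = (int(fields[-2]), int(fields[-1]))
    return counts


def device_total(counts, device):
    """Sum the totals of all rows whose last name token is `device`.

    Returns None when the device has no row at all, which is the
    "not even mentioned" case from this message.
    """
    totals = [total for name, (total, _rate) in counts.items()
              if name.split()[-1] == device]
    return sum(totals) if totals else None


if __name__ == "__main__":
    counts = parse_vmstat_i(SAMPLE)
    for dev in ("bge0", "bge1"):
        print(dev, device_total(counts, dev))
```

With the figures posted above, bge0 reports a total of 1021981 and bge1 has no row, so the helper returns None for it. On a live host the text would come from running `vmstat -i` rather than from the embedded sample.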