FreeBSD Mail Archives

Date:      Mon, 27 May 2013 23:49:31 -0700
From:      Jeremy Chadwick <jdc@koitsu.org>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        pyunyh@gmail.com, freebsd-stable@freebsd.org
Subject:   Re: SunFire X2200 ilo's bge1 DOWN/UP
Message-ID:  <20130528064931.GA61056@icarus.home.lan>
In-Reply-To: <E1UhDO4-000Dr7-PJ@kabab.cs.huji.ac.il>
References:  <E1UgsL2-000DBa-El@kabab.cs.huji.ac.il> <20130528052953.GA1457@michelle.cdnetworks.com> <E1UhDO4-000Dr7-PJ@kabab.cs.huji.ac.il>

On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
> > > > 
> > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
> > > > 
> > > 
> > > bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 
> > > 0xfdff0000-0xfdffffff,0xfdfe0000-0xfdfeffff irq 17 at device 4.0 on pci6
> > > bge0: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > miibus2: <MII bus> on bge0
> > > brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 
> > > 0xfdfc0000-0xfdfcffff,0xfdfb0000-0xfdfbffff irq 18 at device 4.1 on pci6
> > > bge1: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > miibus3: <MII bus> on bge1
> > > brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
> > > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
> > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > 
> > > sf-10> ifconfig bge1
> > > bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > >         options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
> > >         ether 00:1b:24:5d:5b:be
> > >         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> > >         media: Ethernet autoselect (100baseTX <full-duplex>)
> > >         status: active
> > > 
> > 
> > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > Do you have some network script run by cron?
> 
> no scripts.
> This port is shared with the iLO/IPMI, and back in March you fixed a
> problem where it hung soon after being initialized by the driver
> (r248226, but I'm not sure if it was ever MFC'ed).
> Initially I thought it could be caused by connections to it from other
> hosts (either via the web or ssh), so I killed them, but it didn't help.
> Without that patch the connection fails, and I don't see any DOWN/UP.
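The "bge1 is not UP" observation quoted above can be read directly from the flags=8802 field of the ifconfig output: 0x8802 is BROADCAST (0x2) | SIMPLEX (0x800) | MULTICAST (0x8000), and the IFF_UP bit (0x1) is absent. A minimal Python sketch that decodes such a value, assuming the FreeBSD 9.x <net/if.h> bit values listed in the comment (the helper names are hypothetical):

```python
from typing import List

# Common interface flag bits as printed by FreeBSD's ifconfig(8),
# taken from <net/if.h> (FreeBSD 9.x era). Not an exhaustive list.
IFF_BITS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x4: "DEBUG",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x400: "OACTIVE",
    0x800: "SIMPLEX",
    0x8000: "MULTICAST",
}


def decode_flags(value: int) -> List[str]:
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_BITS.items()) if value & bit]


def is_up(value: int) -> bool:
    """True if the administrative IFF_UP bit (0x1) is set."""
    return bool(value & 0x1)


if __name__ == "__main__":
    flags = 0x8802  # from "flags=8802<BROADCAST,SIMPLEX,MULTICAST>"
    print(decode_flags(flags))  # ['BROADCAST', 'SIMPLEX', 'MULTICAST']
    print(is_up(flags))         # False
```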

Two things:

1. r248226 in head was MFC'd to stable/9 as r248858.  Validation:

http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log

So the answer: whether or not you have that MFC in stable/9 depends on
which SVN revision your kernel was built from.
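One way to check this on the running machine is to pull the revision out of `uname -v`. The sketch below assumes the kernel was built from an svn checkout of stable/9, in which case the build normally stamps a token such as rNNNNNN into the version string; the function names and sample strings are hypothetical. For a modified or mixed-revision tree the token may look like r250123M or r248000:250000, and the regex simply takes the first number, which errs on the conservative side.

```python
import re
from typing import Optional

# Revision at which r248226 (head) was merged to stable/9,
# per the svnweb log referenced above.
MFC_REV = 248858


def kernel_svn_rev(version: str) -> Optional[int]:
    """Extract the SVN revision (e.g. 'r250123') from a `uname -v` string.

    Returns None if no rNNNNNN token is present, e.g. for kernels not
    built from an svn checkout.
    """
    m = re.search(r"\br(\d+)", version)
    return int(m.group(1)) if m else None


def has_mfc(version: str) -> Optional[bool]:
    """True/False if the revision is known (assumes a stable/9 kernel), else None."""
    rev = kernel_svn_rev(version)
    return None if rev is None else rev >= MFC_REV
```

For example, with a hypothetical version string "FreeBSD 9.1-STABLE #0 r250123: Mon May 27 10:00:00 IDT 2013", `has_mfc()` returns True.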

2. Is there some way to verify that the ASF/iLO/IPMI bits (i.e. the IPMI
firmware itself) are not shutting down bge1's PHY intentionally?  Unless
the IPMI module chooses to log something useful (e.g. "I'm doing this"),
I'm not sure how you'd figure that out.

Other question: is there any correlation between the interval between
events and, say, ARP/MAC address expiry as seen in "arp -a"?  I
mention this because I know some of the ASF methods have historically
shown two MAC addresses on the same physical interface, and I can see
how this might confuse some stacks.
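To look for such a correlation, one could extract the kernel's "link state changed to DOWN/UP" messages for bge1 from syslog and compute the spacing between DOWN events, then compare it with whatever ARP/MAC ageing timer applies. For FreeBSD's own ARP cache that is the sysctl net.link.ether.inet.max_age (1200 seconds by default); switches and other hosts have their own ageing timers. A minimal sketch, assuming the usual /var/log/messages line format shown in the comment; the helper names are hypothetical, and year rollover is ignored:

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Matches syslog lines such as
#   "May 27 10:59:28 sf-10 kernel: bge1: link state changed to DOWN"
# Syslog timestamps carry no year, so the caller supplies one.
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+(\S+): "
    r"link state changed to (UP|DOWN)"
)


def link_events(lines: Iterable[str], ifname: str, year: int) -> List[Tuple[datetime, str]]:
    """Return (timestamp, state) pairs for link state changes on ifname."""
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if m and m.group(2) == ifname:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(3)))
    return events


def down_intervals(events: List[Tuple[datetime, str]]) -> List[float]:
    """Seconds elapsed between consecutive DOWN events."""
    downs = [ts for ts, state in events if state == "DOWN"]
    return [(later - earlier).total_seconds() for earlier, later in zip(downs, downs[1:])]
```

Usage would be along the lines of `down_intervals(link_events(open("/var/log/messages"), "bge1", 2013))`; intervals clustering near a fixed ageing timeout would support the ARP/MAC theory.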

<rant>
That "piggybacking" crap never should have been invented.  All it has
done is cause problems for every OS I know of (including Windows) since
its inception, and it is exactly why today almost all vendors I've
seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface.
It's an admission that the "piggybacking" method doesn't work.  May it
rot in hell for all I care; meanwhile I feel very sorry for those who
have to suffer/deal with it.

This is just another reason why I've always been very picky about what
hardware I'd buy for server deployments.  Vendors never actually
disclose this crap until you've shelled out money for the hardware, by
which point it's too late and you're suffering.  Really great model --
for the pocketbook.  :/
</rant>

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130528064931.GA61056>
