Date:      Fri, 17 Nov 2006 13:13:13 -0800
From:      "Jack Vogel" <jfvogel@gmail.com>
To:        "John Polstra" <jdp@polstra.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Serious em problems under -current on two different platforms
Message-ID:  <2a41acea0611171313k56d19031kca505b8b2117a7e3@mail.gmail.com>
In-Reply-To: <XFMail.20061117124158.jdp@polstra.com>
References:  <XFMail.20061117124158.jdp@polstra.com>

On 11/17/06, John Polstra <jdp@polstra.com> wrote:
> Folks, I'm using -current from 2006-11-16 05:00 UTC and find that my
> em interfaces are unusable on two quite different platforms.  I've
> tried a lot of things to make sure it's not a local fubar here,
> including doing a "make release" using a virgin source tree and
> installing fresh from the resulting CD (with GENERIC kernel).  I also
> have a netbootable CD image that is part of the project I'm working
> on, and it admittedly has some minor mods to the kernel.  I booted
> that exact same image on two different platforms with em devices in
> them, and got the same results as when I used the virgin FreeBSD CD.
>
> I don't think this is caused by the recent MSI support.  I get the
> same results when I disable it by adding "hw.pci.enable_msi=0" and
> "hw.pci.enable_msix=0" to my /boot/loader.conf file.  (And I confirmed
> that MSI wasn't being used when I did that.)
>
> The symptoms are complicated, so let's focus on one of the machines.
> It's a Dell 1950 with two dual-core 3.0 GHz Xeons in it.  The em
> devices look like this (it's a dual-port PCI-Express card):
>
> em0@pci11:0:0:  class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'PRO/1000 PT'
>     class    = network
>     subclass = ethernet
> em1@pci11:0:1:  class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'PRO/1000 PT'
>     class    = network
>     subclass = ethernet
>
> Starting with a freshly-booted system, we see this ifconfig output,
> as expected:
>
> em0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>         options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
>         ether 00:0e:0c:6f:0e:18
>         media: Ethernet autoselect (1000baseTX <full-duplex>)
>         status: active
> em1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>         options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
>         ether 00:0e:0c:6f:0e:19
>         media: Ethernet autoselect (1000baseTX <full-duplex>)
>         status: active
>
> Now I do "ifconfig em0 10.5.1.1/24" and then ping that address from
> another machine on the LAN:
>
> thin# ping 10.5.1.1
> PING 10.5.1.1 (10.5.1.1): 56 data bytes
> 64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.524 ms
>
> Then nothing after the first reply.  Leaving the ping running on the
> other machine, I configure the address a 2nd time on the Dell with
> "ifconfig em0 10.5.1.1/24".  Still no response.  Next, ifconfig em0
> down and then up again.  After a few seconds, the ping responses
> start coming in and continue to work.  Try a flood ping from the
> other machine: it works fine.
>
> I kill the flood ping and go have lunch for a half-hour, then start
> up a normal 1-per-second ping from the other machine:
>
> thin# ping 10.5.1.1
> PING 10.5.1.1 (10.5.1.1): 56 data bytes
> 64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.612 ms
> [then nothing]
>
> This time, I check the vmstat -i output a few times, and see that
> em0 isn't generating any interrupts.  I ifconfig em0 down and then
> up, and the pings start working again.
>
> Now, leaving that 1-per-second ping running, I start messing with
> em1.  I do "ifconfig em1 10.6.1.1/24", and within a few seconds, the
> pings on em0 stop responding.  Again em0 isn't generating
> interrupts.  Pings to em1 aren't working, either.  I ifconfig em1
> down and then up.  The pings still aren't working.  I set em1's
> address again with "ifconfig em1 10.6.1.1/24", and the pings start
> working.  Now I ping em0 from the other machine and find that it
> works, too.  Hallelujah!  Now both interfaces are working at the
> same time.  But what's the key to getting to this point?
>
> I let the pings run for a while.  Pretty soon, both of them stop
> working again.
>
> The other machine is a Tyan 2721 with dual Xeons in it.  Its
> dual-port NIC is on the motherboard, and it looks like this:
>
> em0@pci7:1:0:   class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class    = network
>     subclass = ethernet
> em1@pci7:1:1:   class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class    = network
>     subclass = ethernet
>
> I can't get either port to send any packets at all.  When I try, the
> driver reports transmit watchdog timeouts.
>
> Is this stuff working for anybody at all?

This sounds bizarrely broken. Can you try backing off the deltas of
if_em.[ch] to find a point where it works? I have not been the one
making the changes to CURRENT, and I am busy with some important
Intel tasks that I must get done, so it would help to know when it
broke.

Thanks,

Jack
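
Editor's note: one way to "back off the deltas" systematically is to
bisect the driver's revision history rather than step back one change
at a time. The following is only a minimal sketch of that idea, not
anything from the thread. The revision labels and the works() callback
are hypothetical; in practice works() would mean checking out that
revision of if_em.[ch], rebuilding the kernel, and repeating the ping
test by hand. It also assumes the oldest revision is good, the newest
is bad, and the driver stays broken once it breaks.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], works: Callable[[str], bool]) -> str:
    """Return the earliest revision for which works() is False.

    Assumes revisions are ordered oldest to newest, revisions[0] works,
    revisions[-1] is broken, and the breakage persists once introduced.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(revisions[mid]):
            lo = mid  # breakage was introduced after mid
        else:
            hi = mid  # breakage was introduced at or before mid
    return revisions[hi]


if __name__ == "__main__":
    # Hypothetical labels; real ones would come from the if_em.c history.
    revs = [f"r{n}" for n in range(1, 9)]
    # Stand-in for "rebuild and run the ping test": pretend r6 broke it.
    print(first_bad(revs, lambda r: int(r[1:]) < 6))
```

With N candidate revisions this needs roughly log2(N) rebuild-and-test
cycles instead of up to N.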


