Date:      Fri, 17 Nov 2006 12:41:58 -0800 (PST)
From:      John Polstra <jdp@polstra.com>
To:        freebsd-net@freebsd.org
Cc:        Jack Vogel <jfvogel@gmail.com>
Subject:   Serious em problems under -current on two different platforms
Message-ID:  <XFMail.20061117124158.jdp@polstra.com>

Folks, I'm using -current from 2006-11-16 05:00 UTC and find that my
em interfaces are unusable on two quite different platforms.  I've
tried a lot of things to make sure it's not a local fubar here,
including doing a "make release" using a virgin source tree and
installing fresh from the resulting CD (with GENERIC kernel).  I also
have a netbootable CD image that is part of the project I'm working
on, and it admittedly has some minor mods to the kernel.  I booted
that exact same image on two different platforms with em devices in
them, and got the same results as when I used the virgin FreeBSD CD.

I don't think this is caused by the recent MSI support.  I get the
same results when I disable it by adding "hw.pci.enable_msi=0" and
"hw.pci.enable_msix=0" to my /boot/loader.conf file.  (And I confirmed
that MSI wasn't being used when I did that.)

The symptoms are complicated, so let's focus on one of the machines.
It's a Dell 1950 with two dual-core 3.0 GHz Xeons in it.  The em
devices look like this (it's a dual-port PCI-Express card):

em0@pci11:0:0:  class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/1000 PT'
    class    = network
    subclass = ethernet
em1@pci11:0:1:  class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/1000 PT'
    class    = network
    subclass = ethernet
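To compare the two machines, note that in "pciconf -lv" output "chip=" packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits. "card=" does the same for the subsystem device and subsystem vendor IDs, and 0x8086 is Intel's vendor ID. Below is a minimal Python sketch that splits those fields. It assumes the "key=0xXXXXXXXX" layout shown above, and the function name is hypothetical:

```python
import re
from typing import Dict, Tuple


def parse_pciconf_ids(line: str) -> Dict[str, Tuple[int, int]]:
    """Split the card= and chip= fields of one pciconf -l line.

    Hypothetical helper. Each 32-bit value is returned as
    (device_or_subdevice_id, vendor_or_subvendor_id).
    """
    ids: Dict[str, Tuple[int, int]] = {}
    for key in ("card", "chip"):
        # Match e.g. "chip=0x105e8086" as a whole word.
        match = re.search(rf"\b{key}=0x([0-9a-fA-F]{{8}})", line)
        if match:
            value = int(match.group(1), 16)
            # Upper 16 bits: (sub)device ID; lower 16 bits: (sub)vendor ID.
            ids[key] = ((value >> 16) & 0xFFFF, value & 0xFFFF)
    return ids
```

For the Dell's em0 line above, this gives chip (0x105e, 0x8086) and card (0x125e, 0x8086). The Tyan's 82546EB lines further down give chip (0x1010, 0x8086).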

Starting with a freshly-booted system, we see this ifconfig output,
as expected:

em0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
        ether 00:0e:0c:6f:0e:18
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
em1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
        options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
        ether 00:0e:0c:6f:0e:19
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active

Now I do "ifconfig em0 10.5.1.1/24" and then ping that address from
another machine on the LAN:

thin# ping 10.5.1.1
PING 10.5.1.1 (10.5.1.1): 56 data bytes
64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.524 ms

Then nothing after the first reply.  Leaving the ping running on the
other machine, I configure the address a second time on the Dell with
"ifconfig em0 10.5.1.1/24".  Still no response.  Next, I ifconfig em0
down and then up again.  After a few seconds, the ping responses
start coming in and continue to work.  Then I try a flood ping from the
other machine: it works fine.

I kill the flood ping and go have lunch for a half-hour, then start
up a normal 1-per-second ping from the other machine:

thin# ping 10.5.1.1
PING 10.5.1.1 (10.5.1.1): 56 data bytes
64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.612 ms
[then nothing]

This time, I check the vmstat -i output a few times, and see that
em0 isn't generating any interrupts.  I ifconfig em0 down and then
up, and the pings start working again.
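The "is em0 still interrupting?" check above amounts to comparing the em0 counter in two "vmstat -i" snapshots taken a few seconds apart. Below is a minimal Python sketch of that comparison. It assumes each "vmstat -i" row ends with the total-count and rate columns and lists device names as separate words before them. The helper names are hypothetical:

```python
from typing import Optional


def interrupt_total(vmstat_text: str, device: str) -> Optional[int]:
    """Sum the 'total' column of every vmstat -i row that lists `device`.

    Assumes rows look like "irq16: em0 uhci0   1234   5", i.e. the last two
    fields are total and rate. Returns None if no row mentions the device.
    """
    total: Optional[int] = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Device names sit between the vector name and the two counters.
        if device in fields[1:-2]:
            try:
                count = int(fields[-2])
            except ValueError:
                continue
            total = count if total is None else total + count
    return total


def interrupts_stalled(before: str, after: str, device: str) -> bool:
    """True if `device`'s interrupt total did not increase between snapshots."""
    first = interrupt_total(before, device)
    second = interrupt_total(after, device)
    if first is None or second is None:
        raise ValueError(f"{device} not found in vmstat -i output")
    return second <= first
```

One way to feed it is to capture subprocess.run(["vmstat", "-i"], capture_output=True, text=True).stdout twice, a few seconds apart, while the ping is running. In the state described above, interrupts_stalled(before, after, "em0") would be expected to return True.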

Now, leaving that 1-per-second ping running, I start messing with
em1.  I do "ifconfig em1 10.6.1.1/24", and within a few seconds, the
pings on em0 stop responding.  Again em0 isn't generating
interrupts.  Pings to em1 aren't working, either.  I ifconfig em1
down and then up.  The pings still aren't working.  I set em1's
address again with "ifconfig em1 10.6.1.1/24", and the pings start
working.  Now I ping em0 from the other machine and find that it
works, too.  Hallelujah!  Now both interfaces are working at the
same time.  But what's the key to getting to this point?

I let the pings run for a while.  Pretty soon, both of them stop
working again.

The other machine is a Tyan 2721 with dual Xeons in it.  Its
dual-port NIC is on the motherboard, and it looks like this:

em0@pci7:1:0:   class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class    = network
    subclass = ethernet
em1@pci7:1:1:   class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class    = network
    subclass = ethernet

I can't get either port to send any packets at all.  When I try, the
driver reports transmit watchdog timeouts.
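A transmit watchdog timeout generally means the driver handed frames to the hardware and then saw no transmit completions within a timeout period. Below is a generic, hypothetical Python model of that mechanism. The class, method, and parameter names are invented for illustration, and this is not the em(4) driver's code:

```python
class TxWatchdog:
    """Generic transmit-watchdog model (hypothetical, not em(4) source).

    The timer is armed when frames are queued while it is idle, re-armed
    whenever the hardware reports progress, and disarmed once nothing is
    pending. If it counts down to zero with frames still pending, a
    timeout is recorded.
    """

    def __init__(self, timeout_ticks: int = 5) -> None:
        self.timeout_ticks = timeout_ticks
        self.pending = 0   # frames handed to the hardware, not yet completed
        self.timer = 0     # 0 means disarmed
        self.timeouts = 0  # number of watchdog timeouts observed

    def enqueue(self, frames: int = 1) -> None:
        self.pending += frames
        if self.timer == 0:
            self.timer = self.timeout_ticks  # arm only if idle

    def complete(self, frames: int = 1) -> None:
        self.pending = max(0, self.pending - frames)
        # Progress was made: disarm if drained, otherwise restart the countdown.
        self.timer = 0 if self.pending == 0 else self.timeout_ticks

    def tick(self) -> bool:
        """Advance one tick; return True if a watchdog timeout fired."""
        if self.timer == 0:
            return False
        self.timer -= 1
        if self.timer == 0 and self.pending > 0:
            self.timeouts += 1
            return True
        return False
```

In this model, a timeout fires only when frames stay pending with no completions for timeout_ticks consecutive ticks. A real driver would typically reset the controller at that point rather than just counting.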

Is this stuff working for anybody at all?

John


