Date: Fri, 17 Nov 2006 13:13:13 -0800
From: "Jack Vogel" <jfvogel@gmail.com>
To: "John Polstra" <jdp@polstra.com>
Cc: freebsd-net@freebsd.org
Subject: Re: Serious em problems under -current on two different platforms
Message-ID: <2a41acea0611171313k56d19031kca505b8b2117a7e3@mail.gmail.com>
In-Reply-To: <XFMail.20061117124158.jdp@polstra.com>
References: <XFMail.20061117124158.jdp@polstra.com>

On 11/17/06, John Polstra <jdp@polstra.com> wrote:
> Folks, I'm using -current from 2006-11-16 05:00 UTC and find that my
> em interfaces are unusable on two quite different platforms. I've
> tried a lot of things to make sure it's not a local fubar here,
> including doing a "make release" using a virgin source tree and
> installing fresh from the resulting CD (with GENERIC kernel). I also
> have a netbootable CD image that is part of the project I'm working
> on, and it admittedly has some minor mods to the kernel. I booted
> that exact same image on two different platforms with em devices in
> them, and got the same results as when I used the virgin FreeBSD CD.
>
> I don't think this is caused by the recent MSI support. I get the
> same results when I disable it by adding "hw.pci.enable_msi=0" and
> "hw.pci.enable_msix=0" to my /boot/loader.conf file. (And I confirmed
> that MSI wasn't being used when I did that.)
>
> The symptoms are complicated, so let's focus on one of the machines.
> It's a Dell 1950 with two dual-core 3.0 GHz Xeons in it. The em
> devices look like this (it's a dual-port card PCI-Express card):
>
> em0@pci11:0:0: class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'PRO/1000 PT'
>     class = network
>     subclass = ethernet
> em1@pci11:0:1: class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = 'PRO/1000 PT'
>     class = network
>     subclass = ethernet
>
> Starting with a freshly-booted system, we see this ifconfig output,
> as expected:
>
> em0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>     options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
>     ether 00:0e:0c:6f:0e:18
>     media: Ethernet autoselect (1000baseTX <full-duplex>)
>     status: active
> em1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>     options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
>     ether 00:0e:0c:6f:0e:19
>     media: Ethernet autoselect (1000baseTX <full-duplex>)
>     status: active
>
> Now I do "ifconfig em0 10.5.1.1/24" and then ping that address from
> another machine on the LAN:
>
> thin# ping 10.5.1.1
> PING 10.5.1.1 (10.5.1.1): 56 data bytes
> 64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.524 ms
>
> Then nothing after the first reply. Leaving the ping running on the
> other machine, I configure the address a 2nd time on the Dell with
> "ifconfig em0 10.5.1.1/24". Still no response. Next, ifconfig em0
> down and then up again. After a few seconds, the ping responses
> start coming in and continue to work. Try a flood ping from the
> other machine: it works fine.
>
> I kill the flood ping and go have lunch for a half-hour, then start
> up a normal 1-per-second ping from the other machine:
>
> thin# ping 10.5.1.1
> PING 10.5.1.1 (10.5.1.1): 56 data bytes
> 64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.612 ms
> [then nothing]
>
> This time, I check the vmstat -i output a few times, and see that
> em0 isn't generating any interrupts. I ifconfig em0 down and then
> up, and the pings start working again.
>
> Now, leaving that 1-per-second ping running, I start messing with
> em1. I do "ifconfig em1 10.6.1.1/24", and within a few seconds, the
> pings on em0 stop responding. Again em0 isn't generating
> interrupts. Pings to em1 aren't working, either. I ifconfig em1
> down and then up. The pings still aren't working. I set em1's
> address again with "ifconfig em1 10.6.1.1/24", and the pings start
> working. Now I ping em0 from the other machine and find that it
> works, too. Hallelujah! Now both interfaces are working at the
> same time. But what's the key to getting to this point?
>
> I let the pings run for awhile. Pretty soon, both of them stop
> working again.
>
> The other machine is a Tyan 2721 with dual Xeons in it. Its
> dual-port NIC is on the motherboard, and it looks like this:
>
> em0@pci7:1:0: class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class = network
>     subclass = ethernet
> em1@pci7:1:1: class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class = network
>     subclass = ethernet
>
> I can't get either port to send any packets at all. When I try, the
> driver reports transmit watchdog timeouts.
>
> Is this stuff working for anybody at all?

This sounds bizarrely broken. Can you try backing off the deltas of
if_em.[ch] to find a point where it works? I have not been making the
changes into CURRENT, and I am busy with some important Intel tasks
that I must get done, so it would help to know when it broke.

Thanks,

Jack
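
The "em0 isn't generating any interrupts" check in the quoted report was done by eyeballing vmstat -i a few times. A minimal sketch of automating that comparison is below. It assumes the usual three-column vmstat -i layout (source name, total, rate); the function names and the sample text are hypothetical and are not part of the em driver or of any FreeBSD tool.

```python
def parse_vmstat_i(text):
    """Map each interrupt source name to its running total.

    Assumes each data line ends with two numeric columns (total, rate),
    as in typical `vmstat -i` output; anything else is skipped.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the "interrupt total rate" header.
        if len(parts) < 3 or parts[0] == "interrupt":
            continue
        try:
            total = int(parts[-2])
        except ValueError:
            continue
        counts[" ".join(parts[:-2])] = total
    return counts


def stalled_sources(before, after, device):
    """Return sources naming `device` whose totals did not advance."""
    return [
        name
        for name, total in after.items()
        if device in name.split()
        and name in before
        and total == before[name]
    ]


if __name__ == "__main__":
    # Hypothetical samples, taken a few seconds apart while a ping runs.
    first = "interrupt total rate\nirq16: em0 100 1\nirq17: em1 50 0\n"
    second = "interrupt total rate\nirq16: em0 100 1\nirq17: em1 60 0\n"
    print(stalled_sources(parse_vmstat_i(first), parse_vmstat_i(second), "em0"))
```

On a live system the two text samples would come from running vmstat -i twice with a short pause in between. A source that appears in the result while traffic is arriving matches the symptom described above.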