From owner-freebsd-net@FreeBSD.ORG  Sat Nov 18 07:52:41 2006
Message-ID: <2a41acea0611171313k56d19031kca505b8b2117a7e3@mail.gmail.com>
Date: Fri, 17 Nov 2006 13:13:13 -0800
From: "Jack Vogel" <jfvogel@gmail.com>
To: "John Polstra" <jdp@polstra.com>
In-Reply-To: <XFMail.20061117124158.jdp@polstra.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <XFMail.20061117124158.jdp@polstra.com>
Cc: freebsd-net@freebsd.org
Subject: Re: Serious em problems under -current on two different platforms
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>

On 11/17/06, John Polstra <jdp@polstra.com> wrote:
> Folks, I'm using -current from 2006-11-16 05:00 UTC and find that my
> em interfaces are unusable on two quite different platforms.  I've
> tried a lot of things to make sure it's not a local fubar here,
> including doing a "make release" using a virgin source tree and
> installing fresh from the resulting CD (with GENERIC kernel).  I also
> have a netbootable CD image that is part of the project I'm working
> on, and it admittedly has some minor mods to the kernel.  I booted
> that exact same image on two different platforms with em devices in
> them, and got the same results as when I used the virgin FreeBSD CD.
>
> I don't think this is caused by the recent MSI support.  I get the
> same results when I disable it by adding "hw.pci.enable_msi=0" and
> "hw.pci.enable_msix=0" to my /boot/loader.conf file.  (And I confirmed
> that MSI wasn't being used when I did that.)
>
> The symptoms are complicated, so let's focus on one of the machines.
> It's a Dell 1950 with two dual-core 3.0 GHz Xeons in it.  The em
> devices look like this (it's a dual-port PCI-Express card):
>
> em0@pci11:0:0:  class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'PRO/1000 PT'
>     class    = network
>     subclass = ethernet
> em1@pci11:0:1:  class=0x020000 card=0x125e8086 chip=0x105e8086 rev=0x04 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'PRO/1000 PT'
>     class    = network
>     subclass = ethernet
>
> Starting with a freshly-booted system, we see this ifconfig output,
> as expected:
>
> em0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>         options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
>         ether 00:0e:0c:6f:0e:18
>         media: Ethernet autoselect (1000baseTX <full-duplex>)
>         status: active
> em1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>         options=18b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWCSUM,TSO4>
>         ether 00:0e:0c:6f:0e:19
>         media: Ethernet autoselect (1000baseTX <full-duplex>)
>         status: active
>
> Now I do "ifconfig em0 10.5.1.1/24" and then ping that address from
> another machine on the LAN:
>
> thin# ping 10.5.1.1
> PING 10.5.1.1 (10.5.1.1): 56 data bytes
> 64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.524 ms
>
> Then nothing after the first reply.  Leaving the ping running on the
> other machine, I configure the address a 2nd time on the Dell with
> "ifconfig em0 10.5.1.1/24".  Still no response.  Next, ifconfig em0
> down and then up again.  After a few seconds, the ping responses
> start coming in and continue to work.  Try a flood ping from the
> other machine: it works fine.
>
> I kill the flood ping and go have lunch for a half-hour, then start
> up a normal 1-per-second ping from the other machine:
>
> thin# ping 10.5.1.1
> PING 10.5.1.1 (10.5.1.1): 56 data bytes
> 64 bytes from 10.5.1.1: icmp_seq=0 ttl=64 time=0.612 ms
> [then nothing]
>
> This time, I check the vmstat -i output a few times, and see that
> em0 isn't generating any interrupts.  I ifconfig em0 down and then
> up, and the pings start working again.
>
> Now, leaving that 1-per-second ping running, I start messing with
> em1.  I do "ifconfig em1 10.6.1.1/24", and within a few seconds, the
> pings on em0 stop responding.  Again em0 isn't generating
> interrupts.  Pings to em1 aren't working, either.  I ifconfig em1
> down and then up.  The pings still aren't working.  I set em1's
> address again with "ifconfig em1 10.6.1.1/24", and the pings start
> working.  Now I ping em0 from the other machine and find that it
> works, too.  Hallelujah!  Now both interfaces are working at the
> same time.  But what's the key to getting to this point?
>
> I let the pings run for a while.  Pretty soon, both of them stop
> working again.
>
> The other machine is a Tyan 2721 with dual Xeons in it.  Its
> dual-port NIC is on the motherboard, and it looks like this:
>
> em0@pci7:1:0:   class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class    = network
>     subclass = ethernet
> em1@pci7:1:1:   class=0x020000 card=0x10118086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class    = network
>     subclass = ethernet
>
> I can't get either port to send any packets at all.  When I try, the
> driver reports transmit watchdog timeouts.
>
> Is this stuff working for anybody at all?

This sounds bizarrely broken. Can you try backing out the deltas to
if_em.[ch] and find a point where it works? I have not been the one
making the changes in CURRENT, and I am busy with some important
Intel tasks that I must get done, so it would help to know when it
broke.
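
One way to narrow down when it broke is a binary search over the
if_em.[ch] revisions rather than stepping back one delta at a time.
The sketch below is only an illustration under stated assumptions: the
revision labels are hypothetical placeholders, and is_good() stands in
for the manual step of checking out that revision, rebuilding the
kernel, rebooting, and repeating the ping test. It also assumes the
oldest revision works, the newest is broken, and the driver stays
broken once it breaks.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_good() is False.

    Assumes revisions are ordered oldest to newest, revisions[0] is
    known to work, and revisions[-1] is known to be broken.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid  # breakage was introduced after mid
        else:
            hi = mid  # breakage was introduced at or before mid
    return revisions[hi]


if __name__ == "__main__":
    # Hypothetical labels; real ones would come from the if_em.[ch] history.
    revisions = [f"rev-{i:02d}" for i in range(1, 9)]
    # Stand-in for the manual build-and-ping test: pretend rev-06 broke it.
    print(first_bad(revisions, lambda r: r < "rev-06"))
```

With eight candidate revisions this needs three build-and-test cycles
instead of up to seven.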

Thanks,

Jack