From owner-freebsd-net@FreeBSD.ORG Wed May 12 14:28:11 2004
Date: Wed, 12 May 2004 14:28:09 -0700
From: "Scott T. Smith" <scott@gelatinous.com>
To: freebsd-net@freebsd.org
Subject: em driver losing packets
Message-Id: <1084397289.8017.30.camel@tinny.home.foo>
X-Mailer: Ximian Evolution 1.4.6
List-Id: Networking and TCP/IP with FreeBSD

I have a Sun 1U server with 2 built-in Intel Pro/1000 "LOMs" (though I had
the exact same problem with a previous machine using a standalone Intel
NIC). I notice that after the machine has been up for 12-20 hours, the
network card starts dropping packets. Here is the relevant dmesg info:

em0: port 0x2040-0x207f mem 0xfe680000-0xfe69ffff irq 30 at device 7.0 on pci3
em0: Speed:N/A Duplex:N/A
em1: port 0x2000-0x203f mem 0xfe6a0000-0xfe6bffff irq 31 at device 7.1 on pci3
em1: Speed:N/A Duplex:N/A
....
em0: Link is up 100 Mbps Full Duplex
em1: Link is up 1000 Mbps Full Duplex
....
Limiting icmp unreach response from 1770 to 200 packets/sec
  ^^^ Not sure what this is, but I received a bunch of them after
  everything was working and before everything stopped working
....
em1: Excessive collisions = 0
em1: Symbol errors = 0
em1: Sequence errors = 0
em1: Defer count = 0
em1: Missed Packets = 1682
em1: Receive No Buffers = 75
em1: Receive length errors = 0
em1: Receive errors = 0
em1: Crc errors = 0
em1: Alignment errors = 0
em1: Carrier extension errors = 0
em1: XON Rcvd = 0
em1: XON Xmtd = 0
em1: XOFF Rcvd = 0
em1: XOFF Xmtd = 0
em1: Good Packets Rcvd = 119975570
em1: Good Packets Xmtd = 164
em1: Adapter hardware address = 0xc76262ec
em1: tx_int_delay = 66, tx_abs_int_delay = 66
em1: rx_int_delay = 488, rx_abs_int_delay = 977
em1: fifo workaround = 0, fifo_reset = 0
em1: hw tdh = 170, hw tdt = 170
em1: Num Tx descriptors avail = 256
em1: Tx Descriptors not avail1 = 0
em1: Tx Descriptors not avail2 = 0
em1: Std mbuf failed = 0
em1: Std mbuf cluster failed = 0
em1: Driver dropped packets = 0

I was running 5.2.1-RELEASE with em driver version 1.7.19 or 1.7.17 (I
forget which it comes with). I had these problems, so I backported 1.7.25
from 5.2.1-STABLE as of May 10. Same issue.

Notice the "missed packets" and "receive no buffers". I assume that means
the network card ran out of memory? How much memory does it have? If it
uses the mainboard memory, can I make that amount any bigger?

The odd thing (which is why I think this is a driver issue) is that it
works just fine when the machine is first booted. I am driving
approximately 680 Mbit/s of UDP traffic in 1316-byte packets. The only
other traffic is ARP traffic (em1 has a netmask of 255.255.255.255).
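For reference, at 1316 bytes per datagram, 680 Mbit/s works out to roughly
65,000 packets per second hitting em1. If anyone wants to reproduce that
kind of stream, the sender boils down to a sendto() loop along these lines
(a rough sketch only, not my actual traffic generator; the destination
address and port are made up, and there is no pacing, so a real run would
need to be throttled to ~680 Mbit/s):

    /*
     * Sketch of a sender that produces the kind of load described above:
     * back-to-back 1316-byte UDP datagrams to a single destination.
     * The destination address/port below are made up for illustration.
     */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        char payload[1316];
        struct sockaddr_in dst;
        int s;

        memset(payload, 0xa5, sizeof(payload));

        s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0) {
            perror("socket");
            exit(1);
        }

        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9000);                  /* made-up port */
        dst.sin_addr.s_addr = inet_addr("10.0.0.2"); /* made-up address */

        /* No pacing here; a real sender would throttle to ~680 Mbit/s. */
        for (;;) {
            if (sendto(s, payload, sizeof(payload), 0,
                (struct sockaddr *)&dst, sizeof(dst)) < 0)
                perror("sendto");
        }
        /* NOTREACHED */
        return (0);
    }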
I have this problem whether I use kernel polling (HZ=1000),
rx_abs_int_delay=1000, or rx_abs_int_delay=500. If I shut off the
rx_*int_delay values entirely, CPU load goes to 100% and I still have the
same problem. With the abs delay at 1000, CPU load is 90% (split about
evenly between user and system).

I'm thinking of trying to backport 1.7.31. If you have any ideas I'd
really appreciate it. Thanks!

Scott