From: Matthew Grooms
Organization: Seton Healthcare Network
Date: Tue, 17 May 2005 10:30:10 -0500
To: freebsd-net@freebsd.org
Subject: 5.4 amd64 kernel and em driver issue ...
Message-ID: <428A0E02.2010607@seton.org>
List-Id: Networking and TCP/IP with FreeBSD

All,

Has anyone done any extensive testing with the em driver on a 5.4 release amd64 SMP kernel? I have two boxes in a firewall setup that contain 6 em interfaces each. The public interface on both of them ( em0 ) will simply stop transmitting and then start working again after some time, all by itself. I did a lot of testing with 5.3 release candidates and did not see this behavior. Did anything go into the 5.4 kernel late in the release cycle that could have affected this?

My kernel config is GENERIC with the following modifications ...

1) removal of IPV6 and faith
2) removal of USB, USB Ethernet and Firewire
3) addition of SMP support
4) addition of pf, pflog, pfsync, carp and ALTQ

Detailed description of the problem ...

From the firewall itself I could be pinging www.google.com and it will just stop. Anywhere from a few minutes to an hour or so later it will just start working again. The really odd thing is that I can always ssh into the box on a private em interface. The really really odd thing is that I can run tcpdump on the public interface ( that I can't talk out of ) while the problem occurs and see traffic on the wire like ...

1) ICMP packets still coming from ping on my firewall to google ( maybe BPF picks it up early and the interface is dropping it ? )
2) ARP requests
3) CDP advertisements
4) Misc other broadcast traffic

What I have tried so far to diagnose the issue ...

1) disabling pf using pfctl -d
2) disabling SMP in kernel
3) disabling carp in kernel
4) disabling ALTQ in kernel
5) hard coding the link speed to either half or full duplex
6) trimming down my route table
7) replacing both network cables
8) moving to different ports on the switch
9) moving to a different switch altogether
10) running with mpsafenet disabled

What I am testing right now ...

1) disabling pf in the kernel
2) disabling HTT in hardware
3) disabling USB & Firewire in hardware
4) sacrificing a chicken on the altar of the Ethernet gods

Any help is _GREATLY_ appreciated as I have to get these boxes out into production quickly. Am I missing something obvious? Could I be having a resource conflict somehow? Could I be missing a lock assertion or LOR for lack of witness or invariants? I will do whatever I can to provide any info to help diagnose this problem. For starters, here is my kernel config and dmesg output.

http://hole.shrew.net/~mgrooms/files/freebsd/custom.txt
http://hole.shrew.net/~mgrooms/files/freebsd/dmesg.txt

Thanks in advance,

-Matthew