To: freebsd-net@freebsd.org
From: Kevin Bowling
Subject: Re: FreeBSD 10 network flapping, ix driver unreliable?
Date: Mon, 17 Feb 2014 14:41:17 -0700

On 2/16/2014 9:04 PM, George Neville-Neil wrote:
>
> On Feb 15, 2014, at 21:32 , Kevin Bowling wrote:
>
>> On 2/15/2014 4:43 PM, George Neville-Neil wrote:
>>>
>>> On Feb 15, 2014, at 15:14 , Kevin Bowling wrote:
>>>
>>>> Hi,
>>>>
>>>> I have FreeBSD 10.0-RELEASE installed on two Dell C6100 nodes. Each node has an Intel X520-DA2 dual port 10gig card. One of the ports on each go to a switch using direct attach coaxial cables. The other port is directly connected between the two nodes (think crossover in twisted pair terminology) again using direct attach coaxial cables.
>>>>
>>>> On both machines, and on both ports (including the "crossover"), the links flap several times per day.
>>>>
>>>> I've pasted the output of lspci -vv and dmesg here:
>>>> https://gist.github.com/kev009/9024442
>>>>
>>>> There's nothing outstanding about the setup otherwise. I suspected some interaction with the switch initially but the "crossover" has eliminated that suspicion.
>>>>
>>>> It seems the ix driver is not very reliable under common conditions, i.e. https://forums.freebsd.org/viewtopic.php?f=7&t=44570 and a search of this list. Any recommendations or tests?
>>>>
>>>
>>> Can you post (to your gist link) the output of sysctl dev.ix ?
>>
>> Hi George,
>>
>> sysctl info added to gist link.
>> ix0 has been up for around 27 days. ix1 for about 24hrs.
>>
>
> I think this has something to do with it.
>
> dev.ix.0.mac_stats.local_faults: 314
> dev.ix.0.mac_stats.remote_faults: 41
>
> The device is seeing errors at the MAC layer, which I don’t think a driver bug would
> cause, though there is always the possibility of a misconfiguration causing flapping.
> Can you try different cables?
>
> When you hook it to the switch does the switch give better diagnostics? Reading
> over the Intel 82599 chip manual is not, shall we say, illuminating,
> "Number of faults in the local MAC. This register is valid only when the link speed is 10 Gb/s.”

I appreciate your help. This led me to some new info, although it doesn't fully explain what the local_faults mean in my case:
http://grouper.ieee.org/groups/802/3/ae/public/nov00/taborek_2_1100.pdf

I may have spoken too soon: the "crossover" ix1 seems to be holding steady, so the local and remote faults must have occurred during negotiation, while I was bringing up the interfaces.

On the other system's ix0, the faults are almost all local and much more frequent:

dev.ix.0.mac_stats.local_faults: 10752
dev.ix.0.mac_stats.remote_faults: 2

I then noticed that the switch had flow control set as mandatory for both send and receive on the 10gig ports, while the FreeBSD box was only negotiating receive flow control. I disabled both on the switch and rebooted, but local_faults is still incrementing occasionally.

Could it be a switch STP problem? The switch is a Cisco 4948-10GE. The configs look like the following, which works well on some copper gigabit interfaces:

spanning-tree mode pvst
spanning-tree portfast default
spanning-tree extend system-id
!
interface TenGigabitEthernet1/49
 switchport trunk encapsulation dot1q
 switchport mode trunk
 spanning-tree portfast trunk
!
interface TenGigabitEthernet1/50
 switchport trunk encapsulation dot1q
 switchport mode trunk
 flowcontrol receive desired
 flowcontrol send desired
 spanning-tree portfast trunk
!
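To check whether local_faults is really still climbing after a change such as the flow-control one, one option is to compare two `sysctl dev.ix` snapshots taken some time apart. The Python below is a minimal sketch of that idea, assuming the `dev.ix.N.mac_stats.local_faults: VALUE` line format shown above; `parse_faults` and `fault_deltas` are hypothetical helpers, not part of FreeBSD or the ix driver.

```python
import re
from typing import Dict, Tuple

# Matches lines like "dev.ix.0.mac_stats.local_faults: 10752" as printed by
# `sysctl dev.ix`; any other sysctl line is ignored.
FAULT_RE = re.compile(
    r"^dev\.ix\.(\d+)\.mac_stats\.(local_faults|remote_faults):\s*(\d+)$"
)

# (interface unit number, counter name) -> counter value
Counters = Dict[Tuple[int, str], int]


def parse_faults(sysctl_output: str) -> Counters:
    """Extract the ix MAC fault counters from raw `sysctl dev.ix` text."""
    counters: Counters = {}
    for line in sysctl_output.splitlines():
        match = FAULT_RE.match(line.strip())
        if match:
            unit, name, value = match.groups()
            counters[(int(unit), name)] = int(value)
    return counters


def fault_deltas(before: str, after: str) -> Counters:
    """Report how much each fault counter grew between two snapshots."""
    old, new = parse_faults(before), parse_faults(after)
    deltas: Counters = {}
    for key, current in new.items():
        previous = old.get(key, 0)  # counter absent earlier: assume it started at 0
        # Assumption: a smaller value means the counter was reset (e.g. by a
        # reboot), so everything seen since the reset counts as new faults.
        deltas[key] = current - previous if current >= previous else current
    return deltas


if __name__ == "__main__":
    # The first snapshot uses the ix0 figures quoted above; the second is
    # hypothetical data, made up only to show the output format.
    first = """dev.ix.0.mac_stats.local_faults: 10752
dev.ix.0.mac_stats.remote_faults: 2"""
    second = """dev.ix.0.mac_stats.local_faults: 10760
dev.ix.0.mac_stats.remote_faults: 2"""
    for (unit, name), delta in sorted(fault_deltas(first, second).items()):
        print(f"ix{unit} {name}: +{delta}")
```

With the sample data it prints `ix0 local_faults: +8` and `ix0 remote_faults: +0`. Feeding it the text of two real `sysctl dev.ix` runs would show which unit is still accumulating faults; treating a smaller count as a reset is an assumption about how the counters behave across reboots.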
It will be hard for me to source SFPs and fiber, but I can try that to see whether it's a physical layer problem. In the meantime, I might try imaging one of the systems with a different OS to see whether the problem persists.

Regards,
Kevin Bowling