From: Kevin Bowling
To: freebsd-net@freebsd.org
Subject: Re: FreeBSD 10 network flapping, ix driver unreliable?
Date: Wed, 19 Feb 2014 11:28:57 -0700

On 2/18/2014 7:16 AM, George Neville-Neil wrote:
>
> On Feb 17, 2014, at 16:41 , Kevin Bowling wrote:
>
>> On 2/16/2014 9:04 PM, George Neville-Neil wrote:
>>>
>>> On Feb 15, 2014, at 21:32 , Kevin Bowling wrote:
>>>
>>>> On 2/15/2014 4:43 PM, George Neville-Neil wrote:
>>>>>
>>>>> On Feb 15, 2014, at 15:14 , Kevin Bowling wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have FreeBSD 10.0-RELEASE installed on two Dell C6100 nodes. Each node has an Intel X520-DA2 dual-port 10gig card. One of the ports on each goes to a switch using direct attach coaxial cables. The other port is directly connected between the two nodes (think crossover in twisted pair terminology), again using direct attach coaxial cables.
>>>>>>
>>>>>> On both machines, and on both ports (including the "crossover"), the links flap several times per day.
>>>>>>
>>>>>> I've pasted the output of lspci -vv and dmesg here:
>>>>>> https://gist.github.com/kev009/9024442
>>>>>>
>>>>>> There's nothing outstanding about the setup otherwise. I suspected some interaction with the switch initially, but the "crossover" has eliminated that suspicion.
>>>>>>
>>>>>> It seems the ix driver is not very reliable under common conditions, i.e. https://forums.freebsd.org/viewtopic.php?f=7&t=44570 and a search of this list. Any recommendations or tests?
>>>>>>
>>>>>
>>>>> Can you post (to your gist link) the output of sysctl dev.ix?
>>>>
>>>> Hi George,
>>>>
>>>> sysctl info added to gist link. ix0 has been up for around 27 days, ix1 for about 24 hrs.
>>>>
>>>
>>> I think this has something to do with it.
>>>
>>> dev.ix.0.mac_stats.local_faults: 314
>>> dev.ix.0.mac_stats.remote_faults: 41
>>>
>>> The device is seeing errors at the MAC layer, which I don’t think a driver bug would
>>> cause, though there is always the possibility of a misconfiguration causing flapping.
>>> Can you try different cables?
>>>
>>> When you hook it to the switch, does the switch give better diagnostics? Reading
>>> over the Intel 82599 chip manual is not, shall we say, illuminating:
>>> "Number of faults in the local MAC. This register is valid only when the link speed is 10 Gb/s."
>>
>> Appreciate your help; this led me to find some new info, although it doesn't entirely answer what local_faults are for me: http://grouper.ieee.org/groups/802/3/ae/public/nov00/taborek_2_1100.pdf
>>
>> I may have spoken too soon: the "crossover" ix1 seems to be holding steady, so the local and remote faults must have occurred during negotiation and while I was bringing up the interfaces.
>>
>> On the other system's ix0, the faults are almost all local and quite a bit more frequent:
>> dev.ix.0.mac_stats.local_faults: 10752
>> dev.ix.0.mac_stats.remote_faults: 2
>>
>> I then noticed the switch had mandatory flow control on both send and receive for 10gig, but the FreeBSD box was only negotiating receive flow control. I disabled both on the switch and rebooted, but am still seeing some increments of local_faults.
>>
>> Could it be a switch STP problem? The switch is a Cisco 4948-10GE. The configs look like the ones below, which work well on some copper gigabit interfaces:
>>
>> spanning-tree mode pvst
>> spanning-tree portfast default
>> spanning-tree extend system-id
>> !
>> interface TenGigabitEthernet1/49
>>  switchport trunk encapsulation dot1q
>>  switchport mode trunk
>>  spanning-tree portfast trunk
>> !
>> interface TenGigabitEthernet1/50
>>  switchport trunk encapsulation dot1q
>>  switchport mode trunk
>>  flowcontrol receive desired
>>  flowcontrol send desired
>>  spanning-tree portfast trunk
>> !
>>
>> It will be hard for me to source SFPs and fiber, but I can try to see if it's a physical layer problem. In the meantime I might try imaging one of the systems with a different OS and seeing if the problem persists.
>>
>
> Another possibility is flow control.
>
> Can you try this setting?
>
> sysctl dev.ix.0.fc=0

No luck with flow control disabled on both the switch and the interface :(. I'll continue to look into problems on the switch side.
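Since the question after each change (cables, flow control, dev.ix.0.fc=0) is whether local_faults keeps climbing, one way to check is to save the output of `sysctl dev.ix` twice, some time apart, and diff only the fault counters. Below is a minimal sketch of that comparison, assuming the `name: value` line format of the counters quoted in this thread; the helper names `parse_faults` and `fault_deltas` are hypothetical, and the script does not call sysctl itself.

```python
import re

# Matches lines like "dev.ix.0.mac_stats.local_faults: 314" (format assumed
# from the counters quoted in this thread).
FAULT_RE = re.compile(
    r"^(dev\.ix\.\d+\.mac_stats\.(?:local|remote)_faults):\s*(\d+)$"
)


def parse_faults(sysctl_output: str) -> dict[str, int]:
    """Return {sysctl name: value} for every ix local/remote fault counter."""
    counters = {}
    for line in sysctl_output.splitlines():
        m = FAULT_RE.match(line.strip())
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def fault_deltas(before: str, after: str) -> dict[str, int]:
    """Report only counters that increased between two snapshots.

    A counter that went down (e.g. after a reboot) or stayed flat is ignored;
    a counter missing from the first snapshot is treated as starting at 0.
    """
    old, new = parse_faults(before), parse_faults(after)
    return {
        name: value - old.get(name, 0)
        for name, value in new.items()
        if value > old.get(name, 0)
    }


if __name__ == "__main__":
    before = (
        "dev.ix.0.mac_stats.local_faults: 10752\n"
        "dev.ix.0.mac_stats.remote_faults: 2"
    )
    # Hypothetical later snapshot, for illustration only (not measured).
    after = (
        "dev.ix.0.mac_stats.local_faults: 10760\n"
        "dev.ix.0.mac_stats.remote_faults: 2"
    )
    print(fault_deltas(before, after))  # {'dev.ix.0.mac_stats.local_faults': 8}
```

Typical use would be to save `sysctl dev.ix` output to a file before and after a test window and feed both files' contents to `fault_deltas`; an empty result means no new faults were recorded in that window.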