Subject: Re: FreeBSD 10 network flapping, ix driver unreliable?
From: George Neville-Neil
Date: Tue, 18 Feb 2014 09:16:32 -0500
Message-Id: <11F52C6F-1A9C-4D5B-8364-AFB62322CB91@neville-neil.com>
To: Kevin Bowling
Cc: FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD

On Feb 17, 2014, at 16:41, Kevin Bowling wrote:

> On 2/16/2014 9:04 PM, George Neville-Neil wrote:
>>
>> On Feb 15, 2014, at 21:32, Kevin Bowling wrote:
>>
>>> On 2/15/2014 4:43 PM, George Neville-Neil wrote:
>>>>
>>>> On Feb 15, 2014, at 15:14, Kevin Bowling wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have FreeBSD 10.0-RELEASE installed on two Dell C6100 nodes. Each node has an Intel X520-DA2 dual port 10gig card. One of the ports on each goes to a switch using direct attach coaxial cables. The other port is directly connected between the two nodes (think crossover in twisted pair terminology), again using direct attach coaxial cables.
>>>>>
>>>>> On both machines, and on both ports (including the "crossover"), the links flap several times per day.
>>>>>
>>>>> I've pasted the output of lspci -vv and dmesg here:
>>>>> https://gist.github.com/kev009/9024442
>>>>>
>>>>> There's nothing outstanding about the setup otherwise.
>>>>> I suspected some interaction with the switch initially but the "crossover" has eliminated that suspicion.
>>>>>
>>>>> It seems the ix driver is not very reliable under common conditions, i.e. https://forums.freebsd.org/viewtopic.php?f=7&t=44570 and a search of this list. Any recommendations or tests?
>>>>>
>>>>
>>>> Can you post (to your gist link) the output of sysctl dev.ix ?
>>>
>>> Hi George,
>>>
>>> sysctl info added to gist link. ix0 has been up for around 27 days. ix1 for about 24hrs.
>>>
>>
>> I think this has something to do with it.
>>
>> dev.ix.0.mac_stats.local_faults: 314
>> dev.ix.0.mac_stats.remote_faults: 41
>>
>> The device is seeing errors at the MAC layer, which I don't think a driver bug would
>> cause, though there is always the possibility of a misconfiguration causing flapping.
>> Can you try different cables?
>>
>> When you hook it to the switch does the switch give better diagnostics? Reading
>> over the Intel 82599 chip manual is not, shall we say, illuminating,
>> "Number of faults in the local MAC. This register is valid only when the link speed is 10 Gb/s."
>
> Appreciate your help, this led me to find some new info although it doesn't entirely answer what local_faults are for me: http://grouper.ieee.org/groups/802/3/ae/public/nov00/taborek_2_1100.pdf
>
> I may have spoken too soon, the "crossover" ix1 seems to be holding steady, so the local and remote faults must have been during negotiation and me bringing up the interfaces.
>
> On the other system's ix0, the faults are almost all local and quite a bit more frequent:
> dev.ix.0.mac_stats.local_faults: 10752
> dev.ix.0.mac_stats.remote_faults: 2
>
> I then noticed the switch had mandatory flow control on both send and receive for 10gig, but the FreeBSD box was only negotiating receive flow control. I disabled both on the switch and rebooted but am still seeing some increments of local_faults.
>
> Could it be a switch STP problem? Switch is a Cisco 4948-10ge. Configs look like below, which is working well on some copper gigabit interfaces:
>
> spanning-tree mode pvst
> spanning-tree portfast default
> spanning-tree extend system-id
> !
> interface TenGigabitEthernet1/49
>  switchport trunk encapsulation dot1q
>  switchport mode trunk
>  spanning-tree portfast trunk
> !
> interface TenGigabitEthernet1/50
>  switchport trunk encapsulation dot1q
>  switchport mode trunk
>  flowcontrol receive desired
>  flowcontrol send desired
>  spanning-tree portfast trunk
> !
>
> It will be hard for me to source SFPs and fiber, but I can try to see if it's a physical layer problem. In the meantime I might try imaging one of the systems with a different OS and seeing if the problem persists.
>

Another possibility is flow control. Can you try this setting?

sysctl dev.ix.0.fc=0

Best,
George
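
To judge whether a change such as the flow control setting above has any effect, it may help to compare the fault counters before and after rather than reading the raw totals. What follows is only a rough sketch in Python, assuming the `sysctl dev.ix.0.mac_stats` output has been saved as text at two points in time. The helper names `parse_counters` and `counter_deltas` are hypothetical, the first snapshot reuses the values quoted in this thread, and the second snapshot is made up purely for illustration.

```python
import re

# Matches lines such as "dev.ix.0.mac_stats.local_faults: 314".
COUNTER_RE = re.compile(r"^(dev\.ix\.\d+\.mac_stats\.\w+):\s*(\d+)\s*$")


def parse_counters(sysctl_text):
    """Return {counter_name: value} for every mac_stats line in the text."""
    counters = {}
    for line in sysctl_text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_deltas(before_text, after_text):
    """Return how much each counter present in both snapshots grew."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    return {name: after[name] - before[name] for name in before if name in after}


# First snapshot: values quoted earlier in this thread.
BEFORE = """dev.ix.0.mac_stats.local_faults: 314
dev.ix.0.mac_stats.remote_faults: 41
"""

# Second snapshot: hypothetical values, invented for this example only.
AFTER = """dev.ix.0.mac_stats.local_faults: 320
dev.ix.0.mac_stats.remote_faults: 41
"""

if __name__ == "__main__":
    for name, delta in sorted(counter_deltas(BEFORE, AFTER).items()):
        print(f"{name}: +{delta}")
```

A local_faults delta that stays at zero over a period in which the link used to flap would suggest the change helped; a delta that keeps growing points back toward cabling or the physical layer.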