From: Karl Pielorz
To: freebsd-net@FreeBSD.org
Date: Thu, 23 Jun 2016 11:53:47 +0100
Subject: Problem with VLAN config and traffic after 10.1-R -> 10.3-R-p5 Upgrade?
Message-ID: <2ED5D9FEB55641BF734C14F3@[10.12.30.106]>

Hi,

We're in the process of updating our boxes from 10.1 to 10.3. This has gone OK for the simpler cases, but I seem to have found a couple of issues with the way 10.3 handles both configuring VLANs and actual traffic on VLANs.
On the box to be upgraded, our /etc/rc.conf has:

  cloned_interfaces="lagg0 lagg1 lagg1.30 lagg1.35"
  ifconfig_bge0="up"
  ifconfig_bge1="up"
  ifconfig_lagg0="laggproto failover laggport bge0 laggport bge1 172.16.50.1 netmask 255.255.255.0"
  ifconfig_em3="mtu 1504 up"
  ifconfig_em0="mtu 1504 up"
  ifconfig_lagg1="laggproto failover laggport em3 laggport em0 192.168.0.2 netmask 255.255.255.0 mtu 1504"
  ifconfig_lagg1_30="inet 192.168.200.2 netmask 255.255.255.0 mtu 1500"
  ifconfig_lagg1_35="inet 192.168.210.2 netmask 255.255.255.0 mtu 1500"

The MTU 'hackery' (1504 on the physical ports and lagg1, i.e. 1500 plus the 4-byte 802.1Q tag) is needed to avoid MTU issues with the VLAN interfaces.

The above worked fine under 10.1, but the same config under 10.3:

- Creates lagg0 correctly, and assigns the 172.16.50.1 IP to it
- Creates lagg1 and its VLANs
- Does not assign 192.168.0.2 to lagg1 (it fails silently, i.e. no errors are logged or shown)

So when the system has finished booting you end up with:

  lagg0    = 172.16.50.1
  lagg1    = no IP assigned
  lagg1.30 = 192.168.200.2
  lagg1.35 = 192.168.210.2

The other thing I've found is that, once the box is up:

  # ping 192.168.200.1
  PING 192.168.200.1 (192.168.200.1): 56 data bytes
  ping: sendto: Host is down
  ^C
  --- 192.168.200.1 ping statistics ---
  6 packets transmitted, 0 packets received, 100.0% packet loss

Hmm, not good. 192.168.200.1 is a host on the VLAN 30 network (and it is up; I'm logged into it in another session). The same happens for the 192.168.210.0/24 network.

Running tcpdump on 192.168.200.1, I see lots of:

  11:31:52.956094 ARP, Request who-has 192.168.200.1 tell 192.168.200.2, length 46
  11:31:52.956102 ARP, Reply 192.168.200.1 is-at x:x:x:x:x:x, length 28
  11:31:53.969140 ARP, Request who-has 192.168.200.1 tell 192.168.200.2, length 46
  11:31:53.969148 ARP, Reply 192.168.200.1 is-at x:x:x:x:x:x, length 28

OK, so the other box can see the ARP requests from the 10.3 box and issues a reply, but the 10.3 box still can't ping it. The "Host is down" error suggests the 10.3 box never completes ARP resolution for 192.168.200.1, i.e. the replies are not getting through on its side.

This gets increasingly weird if I run tcpdump on the 10.3 box.
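A minimal sketch of checks that might narrow this down on the 10.3 box, assuming the interface names and addresses from the rc.conf above and a root shell. The commands use standard ifconfig(8) and arp(8) options; step 4 (turning off hardware VLAN filtering on the member ports) is only a guess at a possible cause, not a confirmed fix.

```shell
#!/bin/sh
# Sketch of diagnostics for the two symptoms described above (run as root).
# Interface names and addresses come from the rc.conf in this message.

# 1. Does the lagg1 address stick when it is assigned by hand after boot?
ifconfig lagg1 inet 192.168.0.2 netmask 255.255.255.0
ifconfig lagg1

# 2. Is an ARP entry for 192.168.200.1 ever completed on the VLAN interface?
ping -c 3 192.168.200.1
arp -an -i lagg1.30

# 3. Which VLAN offload options are active on the lagg and its member ports?
#    Look for VLAN_HWTAGGING / VLAN_HWFILTER on each "options=" line.
for ifn in lagg1 em0 em3; do
    ifconfig "$ifn" | grep options=
done

# 4. Guess only: disable hardware VLAN filtering on the member ports,
#    then retry the ping with no tcpdump running.
ifconfig em0 -vlanhwfilter
ifconfig em3 -vlanhwfilter
ping -c 3 192.168.200.1
```

If step 1 leaves the address in place, the problem is more likely in how the boot-time configuration is applied than in lagg itself; step 4 only tests one hypothesis about where the ARP replies are being dropped.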
The act of running 'tcpdump -i lagg1.30 -n' actually fixes the problem:

  # ping 192.168.200.1
  PING 192.168.200.1 (192.168.200.1): 56 data bytes
  64 bytes from 192.168.200.1: icmp_seq=0 ttl=64 time=0.257 ms
  64 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=0.168 ms
  64 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=0.320 ms

If I Ctrl-C the tcpdump on the 10.3 box at this point, the pings stop dead. Restart the tcpdump and the pings resume.

Restoring 10.1 on the box fixes this, but I'd obviously rather be using 10.3 now.

Any ideas?

Thanks,

-Karl