Date: Sun, 17 May 2009 16:08:17 -0400
From: Chris Buechler <freebsd@chrisbuechler.com>
To: net@freebsd.org
Subject: multi-homed systems stop answering ARP on local addresses w/ifconfig aliases
Message-ID: <4A106EB1.1070709@chrisbuechler.com>

There seems to be a regression between 6.x and 7.0/7.1 related to ifconfig aliases on multi-homed hosts. I'm not sure about anything newer than 7.1 (this is pfSense, and we're just starting to test 7.2 builds). For periods of time, the system will stop answering ARP on some of its own addresses, and hence anything on that network completely stops functioning. The same setup worked fine on 6.2.

The particular system illustrated here is a router on part of an ISP's network. The IPs are all public; in the info provided here they've been replaced with 10.x IPs. The subnets on the inside interfaces are routed to the outside interface.

When this problem occurs, the IPs assigned locally on the system will still respond from the Internet, but the system itself loses all connectivity with that subnet, and nothing on that subnet can communicate with the host due to the lack of ARP. That makes some sense: I presume that when routing to a locally assigned address via another interface, the system doesn't need ARP on the address to respond. But while it still responds from the Internet, even the host itself can't initiate a ping to that IP.

It behaves the same whether pf is enabled or disabled.

I see two similar issues in the past. One has a PR:
http://www.freebsd.org/cgi/query-pr.cgi?pr=121437&cat=

That's exactly the same issue, but it's not limited to VLANs; any multi-homed host is affected.

And another:
http://thread.gmane.org/gmane.os.freebsd.stable/57125

fxp0 is the outside interface. It doesn't make any difference whether the ifconfig aliases are on the em0 or fxp1 interface; they both behave the same if they have any ifconfig aliases assigned.

# ifconfig
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 00:90:27:86:8b:9d
        inet6 fe80::290:27ff:fe86:8b9d%fxp0 prefixlen 64 scopeid 0x1
        inet 10.11.185.146 netmask 0xfffffff8 broadcast 10.11.185.151
        media: Ethernet 100baseTX <full-duplex>
        status: active
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 00:11:43:2c:62:03
        inet 10.10.0.1 netmask 0xffffff00 broadcast 10.10.0.255
        inet6 fe80::211:43ff:fe2c:6203%em0 prefixlen 64 scopeid 0x2
        inet 10.13.40.1 netmask 0xffffff00 broadcast 10.13.40.255
        inet 10.13.41.1 netmask 0xffffff00 broadcast 10.13.41.255
        inet 10.13.42.1 netmask 0xffffff00 broadcast 10.13.42.255
        inet 10.13.43.1 netmask 0xffffff00 broadcast 10.13.43.255
        inet 10.13.44.1 netmask 0xffffff00 broadcast 10.13.44.255
        inet 10.13.45.1 netmask 0xffffff00 broadcast 10.13.45.255
        inet 10.13.46.1 netmask 0xffffff00 broadcast 10.13.46.255
        inet 10.13.47.1 netmask 0xffffff00 broadcast 10.13.47.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
fxp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 00:d0:b7:5d:25:9f
        inet 10.1.242.1 netmask 0xffffff00 broadcast 10.1.242.255
        inet6 fe80::2d0:b7ff:fe5d:259f%fxp1 prefixlen 64 scopeid 0x3
        inet 10.1.243.1 netmask 0xffffff00 broadcast 10.1.243.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active

When the problem is occurring, you can't even ping the affected locally assigned addresses from the box itself:

# ping 10.10.0.1
PING 10.10.0.1 (10.10.0.1): 56 data bytes
ping: sendto: Network is unreachable
ping: sendto: Network is unreachable
ping: sendto: Network is unreachable

And when trying to ping something on one of the affected attached subnets, you get:

# ping 10.10.0.30
PING 10.10.0.30 (10.10.0.30): 56 data bytes
ping: sendto: Invalid argument
ping: sendto: Invalid argument

In the logs, you get a flood of these messages:

May 14 02:55:12 kernel: arpresolve: can't allocate route for 10.10.0.1
May 14 02:55:12 kernel: arplookup 10.10.0.1 failed: host is not on local network
May 14 02:55:12 kernel: arpresolve: can't allocate route for 10.10.0.1
May 14 02:55:12 kernel: arplookup 10.10.0.1 failed: host is not on local network

It happens both with the primary IP assigned to the interface, and the aliases assigned, but not all at once. Some of the addresses will continue to work when others are failing. Somehow it thinks IPs that are locally assigned are not on a local network... after a couple minutes, it just starts working again without making any changes or even touching the system.

If I can provide any additional information, please let me know.

thanks,
Chris
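
For reference, the "host is not on local network" message contradicts the configured netmasks. The following is a minimal Python sketch, not the kernel's lookup logic, that checks which interface's subnet an address falls in using a few of the address/netmask pairs from the ifconfig output above. The names LOCAL_ADDRS, to_interface and on_link_interface are made up for this illustration.

```python
import ipaddress

# A subset of the address / hex-netmask pairs from the ifconfig output above.
LOCAL_ADDRS = {
    "fxp0": [("10.11.185.146", "0xfffffff8")],
    "em0": [("10.10.0.1", "0xffffff00"), ("10.13.40.1", "0xffffff00")],
    "fxp1": [("10.1.242.1", "0xffffff00"), ("10.1.243.1", "0xffffff00")],
}


def to_interface(addr, hex_mask):
    """Combine an address with an ifconfig-style hex netmask."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.IPv4Interface(f"{addr}/{mask}")


def on_link_interface(target):
    """Return the interface whose configured subnet contains target, else None."""
    ip = ipaddress.IPv4Address(target)
    for ifname, addrs in LOCAL_ADDRS.items():
        for addr, hex_mask in addrs:
            if ip in to_interface(addr, hex_mask).network:
                return ifname
    return None


if __name__ == "__main__":
    # Both addresses from the failing pings above.
    for target in ("10.10.0.1", "10.10.0.30"):
        print(target, "->", on_link_interface(target))
```

By these netmasks, both 10.10.0.1 and 10.10.0.30 belong to the em0 subnet, so one would expect them to be treated as on-link.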