From owner-svn-src-all@freebsd.org Wed Nov 30 21:53:08 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 044E7C5E09A; Wed, 30 Nov 2016 21:53:08 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D314A19D5; Wed, 30 Nov 2016 21:53:07 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id uAULr7t7057217; Wed, 30 Nov 2016 21:53:07 GMT (envelope-from vangyzen@FreeBSD.org) Received: (from vangyzen@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id uAULr6MB057213; Wed, 30 Nov 2016 21:53:06 GMT (envelope-from vangyzen@FreeBSD.org) Message-Id: <201611302153.uAULr6MB057213@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: vangyzen set sender to vangyzen@FreeBSD.org using -f From: Eric van Gyzen Date: Wed, 30 Nov 2016 21:53:06 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r309337 - in stable/11: sys/netinet usr.sbin/arp X-SVN-Group: stable-11 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Nov 2016 21:53:08 -0000 Author: vangyzen Date: Wed Nov 30 21:53:06 2016 New Revision: 309337 URL: https://svnweb.freebsd.org/changeset/base/309337 Log: MFC r306577 r306652 306830 Add GARP retransmit capability A single gratuitous ARP (GARP) is always transmitted when an IPv4 address is added to an interface, and that is usually sufficient. However, in some circumstances, such as when a shared address is passed between cluster nodes, this single GARP may occasionally be dropped or lost. This can lead to neighbors on the network link working with a stale ARP cache and sending packets destined for that address to the node that previously owned the address, which may not respond. To avoid this situation, GARP retransmissions can be enabled by setting the net.link.ether.inet.garp_rexmit_count sysctl to a value greater than zero. The setting represents the maximum number of retransmissions. The interval between retransmissions is calculated using an exponential backoff algorithm, doubling each time, so the retransmission intervals are: {1, 2, 4, 8, 16, ...} (seconds). Due to the exponential backoff algorithm used for the interval between GARP retransmissions, the maximum number of retransmissions is limited to 16 for sanity. This limit corresponds to a maximum interval between retransmissions of 2^16 seconds ~= 18 hours. Increasing this limit is possible, but sending out GARPs spaced days apart would be of little use. Update arp(4) to document the net.link.ether.inet.garp_rexmit_count sysctl. Submitted by: dab Relnotes: yes Sponsored by: Dell EMC Modified: stable/11/sys/netinet/if_ether.c stable/11/sys/netinet/in.c stable/11/sys/netinet/in_var.h stable/11/usr.sbin/arp/arp.4 Directory Properties: stable/11/ (props changed) Modified: stable/11/sys/netinet/if_ether.c ============================================================================== --- stable/11/sys/netinet/if_ether.c Wed Nov 30 20:51:51 2016 (r309336) +++ stable/11/sys/netinet/if_ether.c Wed Nov 30 21:53:06 2016 (r309337) @@ -137,6 +137,28 @@ SYSCTL_INT(_net_link_ether_inet, OID_AUT "Maximum number of remotely triggered ARP messages that can be " "logged per second"); +/* + * Due to the exponential backoff algorithm used for the interval between GARP + * retransmissions, the maximum number of retransmissions is limited for + * sanity. This limit corresponds to a maximum interval between retransmissions + * of 2^16 seconds ~= 18 hours. + * + * Making this limit more dynamic is more complicated than worthwhile, + * especially since sending out GARPs spaced days apart would be of little + * use. A maximum dynamic limit would look something like: + * + * const int max = fls(INT_MAX / hz) - 1; + */ +#define MAX_GARP_RETRANSMITS 16 +static int sysctl_garp_rexmit(SYSCTL_HANDLER_ARGS); +static int garp_rexmit_count = 0; /* GARP retransmission setting. */ + +SYSCTL_PROC(_net_link_ether_inet, OID_AUTO, garp_rexmit_count, + CTLTYPE_INT|CTLFLAG_RW|CTLFLAG_MPSAFE, + &garp_rexmit_count, 0, sysctl_garp_rexmit, "I", + "Number of times to retransmit GARP packets;" + " 0 to disable, maximum of 16"); + #define ARP_LOG(pri, ...) do { \ if (ppsratecheck(&arp_lastlog, &arp_curpps, arp_maxpps)) \ log((pri), "arp: " __VA_ARGS__); \ @@ -1287,6 +1309,109 @@ arp_add_ifa_lle(struct ifnet *ifp, const lltable_free_entry(LLTABLE(ifp), lle_tmp); } +/* + * Handle the garp_rexmit_count. Like sysctl_handle_int(), but limits the range + * of valid values. + */ +static int +sysctl_garp_rexmit(SYSCTL_HANDLER_ARGS) +{ + int error; + int rexmit_count = *(int *)arg1; + + error = sysctl_handle_int(oidp, &rexmit_count, 0, req); + + /* Enforce limits on any new value that may have been set. */ + if (!error && req->newptr) { + /* A new value was set. */ + if (rexmit_count < 0) { + rexmit_count = 0; + } else if (rexmit_count > MAX_GARP_RETRANSMITS) { + rexmit_count = MAX_GARP_RETRANSMITS; + } + *(int *)arg1 = rexmit_count; + } + + return (error); +} + +/* + * Retransmit a Gratuitous ARP (GARP) and, if necessary, schedule a callout to + * retransmit it again. A pending callout owns a reference to the ifa. + */ +static void +garp_rexmit(void *arg) +{ + struct in_ifaddr *ia = arg; + + if (callout_pending(&ia->ia_garp_timer) || + !callout_active(&ia->ia_garp_timer)) { + IF_ADDR_WUNLOCK(ia->ia_ifa.ifa_ifp); + ifa_free(&ia->ia_ifa); + return; + } + + /* + * Drop lock while the ARP request is generated. + */ + IF_ADDR_WUNLOCK(ia->ia_ifa.ifa_ifp); + + arprequest(ia->ia_ifa.ifa_ifp, &IA_SIN(ia)->sin_addr, + &IA_SIN(ia)->sin_addr, IF_LLADDR(ia->ia_ifa.ifa_ifp)); + + /* + * Increment the count of retransmissions. If the count has reached the + * maximum value, stop sending the GARP packets. Otherwise, schedule + * the callout to retransmit another GARP packet. + */ + ++ia->ia_garp_count; + if (ia->ia_garp_count >= garp_rexmit_count) { + ifa_free(&ia->ia_ifa); + } else { + int rescheduled; + IF_ADDR_WLOCK(ia->ia_ifa.ifa_ifp); + rescheduled = callout_reset(&ia->ia_garp_timer, + (1 << ia->ia_garp_count) * hz, + garp_rexmit, ia); + IF_ADDR_WUNLOCK(ia->ia_ifa.ifa_ifp); + if (rescheduled) { + ifa_free(&ia->ia_ifa); + } + } +} + +/* + * Start the GARP retransmit timer. + * + * A single GARP is always transmitted when an IPv4 address is added + * to an interface and that is usually sufficient. However, in some + * circumstances, such as when a shared address is passed between + * cluster nodes, this single GARP may occasionally be dropped or + * lost. This can lead to neighbors on the network link working with a + * stale ARP cache and sending packets destined for that address to + * the node that previously owned the address, which may not respond. + * + * To avoid this situation, GARP retransmits can be enabled by setting + * the net.link.ether.inet.garp_rexmit_count sysctl to a value greater + * than zero. The setting represents the maximum number of + * retransmissions. The interval between retransmissions is calculated + * using an exponential backoff algorithm, doubling each time, so the + * retransmission intervals are: {1, 2, 4, 8, 16, ...} (seconds). + */ +static void +garp_timer_start(struct ifaddr *ifa) +{ + struct in_ifaddr *ia = (struct in_ifaddr *) ifa; + + IF_ADDR_WLOCK(ia->ia_ifa.ifa_ifp); + ia->ia_garp_count = 0; + if (callout_reset(&ia->ia_garp_timer, (1 << ia->ia_garp_count) * hz, + garp_rexmit, ia) == 0) { + ifa_ref(ifa); + } + IF_ADDR_WUNLOCK(ia->ia_ifa.ifa_ifp); +} + void arp_ifinit(struct ifnet *ifp, struct ifaddr *ifa) { @@ -1302,6 +1427,9 @@ arp_ifinit(struct ifnet *ifp, struct ifa if (ntohl(dst_in->sin_addr.s_addr) == INADDR_ANY) return; arp_announce_ifaddr(ifp, dst_in->sin_addr, IF_LLADDR(ifp)); + if (garp_rexmit_count > 0) { + garp_timer_start(ifa); + } arp_add_ifa_lle(ifp, dst); } Modified: stable/11/sys/netinet/in.c ============================================================================== --- stable/11/sys/netinet/in.c Wed Nov 30 20:51:51 2016 (r309336) +++ stable/11/sys/netinet/in.c Wed Nov 30 21:53:06 2016 (r309337) @@ -397,6 +397,8 @@ in_aifaddr_ioctl(u_long cmd, caddr_t dat ifa->ifa_addr = (struct sockaddr *)&ia->ia_addr; ifa->ifa_dstaddr = (struct sockaddr *)&ia->ia_dstaddr; ifa->ifa_netmask = (struct sockaddr *)&ia->ia_sockmask; + callout_init_rw(&ia->ia_garp_timer, &ifp->if_addr_lock, + CALLOUT_RETURNUNLOCKED); ia->ia_ifp = ifp; ia->ia_addr = *addr; @@ -635,6 +637,12 @@ in_difaddr_ioctl(caddr_t data, struct if IN_MULTI_UNLOCK(); } + IF_ADDR_WLOCK(ifp); + if (callout_stop(&ia->ia_garp_timer) == 1) { + ifa_free(&ia->ia_ifa); + } + IF_ADDR_WUNLOCK(ifp); + EVENTHANDLER_INVOKE(ifaddr_event, ifp); ifa_free(&ia->ia_ifa); /* in_ifaddrhead */ Modified: stable/11/sys/netinet/in_var.h ============================================================================== --- stable/11/sys/netinet/in_var.h Wed Nov 30 20:51:51 2016 (r309336) +++ stable/11/sys/netinet/in_var.h Wed Nov 30 21:53:06 2016 (r309337) @@ -82,6 +82,8 @@ struct in_ifaddr { struct sockaddr_in ia_dstaddr; /* reserve space for broadcast addr */ #define ia_broadaddr ia_dstaddr struct sockaddr_in ia_sockmask; /* reserve space for general netmask */ + struct callout ia_garp_timer; /* timer for retransmitting GARPs */ + int ia_garp_count; /* count of retransmitted GARPs */ }; /* Modified: stable/11/usr.sbin/arp/arp.4 ============================================================================== --- stable/11/usr.sbin/arp/arp.4 Wed Nov 30 20:51:51 2016 (r309336) +++ stable/11/usr.sbin/arp/arp.4 Wed Nov 30 21:53:06 2016 (r309337) @@ -28,7 +28,7 @@ .\" @(#)arp4.4 6.5 (Berkeley) 4/18/94 .\" $FreeBSD$ .\" -.Dd November 5, 2013 +.Dd October 7, 2016 .Dt ARP 4 .Os .Sh NAME @@ -121,49 +121,65 @@ of the MIB. .Bl -tag -width "log_arp_permanent_modify" .It Va allow_multicast -Should the kernel install ARP entries with multicast bit set in -the hardware address. -Installing such entries is RFC 1812 violation, but some prorietary -load balancing techniques require routers on network to do so. +Install ARP entries with the multicast bit set in the hardware address. +Installing such entries is an RFC 1812 violation, but some proprietary load +balancing techniques require routers to do so. Turned off by default. +.It Va garp_rexmit_count +Retransmit gratuitous ARP (GARP) packets when an IPv4 address is added to an +interface. +A GARP is always transmitted when an IPv4 address is added to an interface. +A non-zero value causes the GARP packet to be retransmitted the stated number +of times. +The interval between retransmissions is doubled each time, so the +retransmission intervals are: {1, 2, 4, 8, 16, ...} (seconds). +The default value of zero means only the initial GARP is sent; no +additional GARP packets are retransmitted. +The maximum value is sixteen. +.Pp +The default behavior of a single GARP packet is usually sufficient. +However, a single GARP might be dropped or lost in some circumstances. +This is particularly harmful when a shared address is passed between cluster +nodes. +Neighbors on the network link might then work with a stale ARP cache and send +packets destined for that address to the node that previously owned the +address, which might not respond. .It Va log_arp_movements -Should the kernel log movements of IP addresses from one hardware -address to an other. +Log movements of IP addresses from one hardware address to another. See .Sx DIAGNOSTICS below. Turned on by default. .It Va log_arp_permanent_modify -Should the kernel log attempts of remote host on network to modify a -permanent ARP entry. +Log attempts by a remote host to modify a permanent ARP entry. See .Sx DIAGNOSTICS below. Turned on by default. .It Va log_arp_wrong_iface -Should the kernel log attempts to insert an ARP entry on an interface -when the IP network the address belongs to is connected to an other -interface. +Log attempts to insert an ARP entry on an interface when the IP network to +which the address belongs is connected to another interface. See .Sx DIAGNOSTICS below. Turned on by default. .It Va max_log_per_second -Limit number of remotely triggered logging events to a configured value -per second. +Limit the number of remotely triggered logging events to a configured value per +second. Default is 1 log message per second. .It Va max_age How long an ARP entry is held in the cache until it needs to be refreshed. Default is 1200 seconds. .It Va maxhold -How many packets hold in the per-entry output queue while the entry +How many packets to hold in the per-entry output queue while the entry is being resolved. Default is one packet. .It Va maxtries -Number of retransmits before host is considered down and error is returned. +Number of retransmits before a host is considered down and an error is +returned. Default is 5 tries. .It Va proxyall -Enables ARP proxying for all hosts on net. +Enables ARP proxying. Turned off by default. .It Va wait Lifetime of an incomplete ARP entry.