Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 9 Mar 2009 17:53:05 +0000 (UTC)
From:      Bruce M Simpson <bms@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r189592 - in head: . share/man/man4 sys/netinet sys/sys usr.bin/netstat usr.sbin/ifmcstat
Message-ID:  <200903091753.n29Hr5Jm033456@svn.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: bms
Date: Mon Mar  9 17:53:05 2009
New Revision: 189592
URL: http://svn.freebsd.org/changeset/base/189592

Log:
  Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
  IPv4 stack.
  
  Diffs are minimized against p4.
  PCS has been used for some protocol verification, more widespread
  testing of recorded sources in Group-and-Source queries is needed.
  sizeof(struct igmpstat) has changed.
  
  __FreeBSD_version is bumped to 800070.

Modified:
  head/UPDATING
  head/share/man/man4/Makefile
  head/share/man/man4/ip.4
  head/share/man/man4/multicast.4
  head/sys/netinet/if_ether.c
  head/sys/netinet/igmp.c
  head/sys/netinet/igmp_var.h
  head/sys/netinet/in.c
  head/sys/netinet/in.h
  head/sys/netinet/in_mcast.c
  head/sys/netinet/in_proto.c
  head/sys/netinet/in_var.h
  head/sys/netinet/ip_input.c
  head/sys/netinet/ip_var.h
  head/sys/netinet/raw_ip.c
  head/sys/netinet/udp_usrreq.c
  head/sys/netinet/vinet.h
  head/sys/sys/param.h
  head/sys/sys/vimage.h
  head/usr.bin/netstat/inet.c
  head/usr.sbin/ifmcstat/Makefile
  head/usr.sbin/ifmcstat/ifmcstat.8
  head/usr.sbin/ifmcstat/ifmcstat.c

Modified: head/UPDATING
==============================================================================
--- head/UPDATING	Mon Mar  9 17:42:34 2009	(r189591)
+++ head/UPDATING	Mon Mar  9 17:53:05 2009	(r189592)
@@ -23,6 +23,45 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 8.
 	ln -s aj /etc/malloc.conf.)
 
 20090309:
+	IGMPv3 and Source-Specific Multicast (SSM) have been merged
+	to the IPv4 stack. VIMAGE hooks are in but not yet used.
+
+	For kernel developers, the most important changes are that the
+	ip_output() and ip_input() paths no longer take the IN_MULTI_LOCK(),
+	and this lock has been downgraded to a non-recursive mutex.
+
+	Transport protocols (UDP, Raw IP) are now responsible for filtering
+	inbound multicast traffic according to group membership and source
+	filters. The imo_multicast_filter() KPI exists for this purpose.
+	Transports which do not use multicast (SCTP, TCP) already reject
+	multicast by default. Forwarding and receive performance may improve
+	as a mutex acquisition is no longer needed in the ip_input()
+	low-level input path.  in_addmulti() and in_delmulti() are shimmed
+	to new KPIs which exist to support SSM in-kernel.
+
+	For application developers, it is recommended that loopback of
+	multicast datagrams be disabled for best performance, as this
+	will still cause the lock to be taken for each looped-back
+	datagram transmission. The net.inet.ip.mcast.loop sysctl may
+	be tuned to 0 to disable loopback by default; it defaults to 1
+	to preserve the existing behaviour.
+
+	For systems administrators, to obtain best performance with
+	multicast reception and multiple groups, it is always recommended
+	that a card with a suitably precise hash filter is used. Hash
+	collisions will still result in the lock being taken within the
+	transport protocol input path to check group membership.
+
+	If deploying FreeBSD in an environment with IGMP snooping switches,
+	it is recommended that the net.inet.igmp.sendlocal sysctl remain
+	enabled; this forces 224.0.0.0/24 group membership to be announced
+	via IGMP.
+
+	The size of 'struct igmpstat' has changed; netstat needs to be
+	recompiled to reflect this.
+	Bump __FreeBSD_version to 800070.
+
+20090309:
 	libusb20.so.1 is now installed as libusb.so.1 and the ports system
 	updated to use it. This requires a buildworld/installworld in order to
 	update the library and dependencies (usbconfig, etc). Its advisable to

Modified: head/share/man/man4/Makefile
==============================================================================
--- head/share/man/man4/Makefile	Mon Mar  9 17:42:34 2009	(r189591)
+++ head/share/man/man4/Makefile	Mon Mar  9 17:53:05 2009	(r189592)
@@ -128,6 +128,7 @@ MAN=	aac.4 \
 	if_bridge.4 \
 	ifmib.4 \
 	igb.4 \
+	igmp.4 \
 	iic.4 \
 	iicbb.4 \
 	iicbus.4 \

Modified: head/share/man/man4/ip.4
==============================================================================
--- head/share/man/man4/ip.4	Mon Mar  9 17:42:34 2009	(r189591)
+++ head/share/man/man4/ip.4	Mon Mar  9 17:53:05 2009	(r189592)
@@ -32,7 +32,7 @@
 .\"     @(#)ip.4	8.2 (Berkeley) 11/30/93
 .\" $FreeBSD$
 .\"
-.Dd February 13, 2009
+.Dd March 9, 2009
 .Dt IP 4
 .Os
 .Sh NAME
@@ -466,13 +466,19 @@ setsockopt(s, IPPROTO_IP, IP_MULTICAST_L
 .Pp
 This option
 improves performance for applications that may have no more than one
-instance on a single host (such as a router daemon), by eliminating
+instance on a single host (such as a routing daemon), by eliminating
 the overhead of receiving their own transmissions.
 It should generally not
 be used by applications for which there may be more than one instance on a
 single host (such as a conferencing program) or for which the sender does
 not belong to the destination group (such as a time querying program).
 .Pp
+The sysctl setting
+.Va net.inet.ip.mcast.loop
+controls the default setting of the
+.Dv IP_MULTICAST_LOOP
+socket option for new sockets.
+.Pp
 A multicast datagram sent with an initial TTL greater than 1 may be delivered
 to the sending host on a different interface from that on which it was sent,
 if the host belongs to the destination group on that other interface.
@@ -650,6 +656,13 @@ documented in RFC 3678.
 For management of source filter lists using this API,
 please refer to
 .Xr sourcefilter 3 .
+.Pp
+The sysctl settings
+.Va net.inet.ip.mcast.maxsocksrc
+and
+.Va net.inet.ip.mcast.maxgrpsrc
+are used to specify an upper limit on the number of per-socket and per-group
+source filter entries which the kernel may allocate.
 .\"-----------------------
 .Ss "Raw IP Sockets"
 .Pp
@@ -795,6 +808,7 @@ field was not equal to the length of the
 .Xr send 2 ,
 .Xr byteorder 3 ,
 .Xr icmp 4 ,
+.Xr igmp 4 ,
 .Xr inet 4 ,
 .Xr intro 4 ,
 .Xr multicast 4 ,

Modified: head/share/man/man4/multicast.4
==============================================================================
--- head/share/man/man4/multicast.4	Mon Mar  9 17:42:34 2009	(r189591)
+++ head/share/man/man4/multicast.4	Mon Mar  9 17:53:05 2009	(r189592)
@@ -956,6 +956,7 @@ after the previous upcall.
 .Xr socket 2 ,
 .Xr sourcefilter 3 ,
 .Xr icmp6 4 ,
+.Xr igmp 4 ,
 .Xr inet 4 ,
 .Xr inet6 4 ,
 .Xr intro 4 ,

Modified: head/sys/netinet/if_ether.c
==============================================================================
--- head/sys/netinet/if_ether.c	Mon Mar  9 17:42:34 2009	(r189591)
+++ head/sys/netinet/if_ether.c	Mon Mar  9 17:53:05 2009	(r189592)
@@ -81,7 +81,8 @@ __FBSDID("$FreeBSD$");
 
 #define SIN(s) ((struct sockaddr_in *)s)
 #define SDL(s) ((struct sockaddr_dl *)s)
-#define LLTABLE(ifp)	((struct lltable *)(ifp)->if_afdata[AF_INET])
+#define LLTABLE(ifp)	\
+	((struct in_ifinfo *)(ifp)->if_afdata[AF_INET])->ii_llt
 
 SYSCTL_DECL(_net_link_ether);
 SYSCTL_NODE(_net_link_ether, PF_INET, inet, CTLFLAG_RW, 0, "");

Modified: head/sys/netinet/igmp.c
==============================================================================
--- head/sys/netinet/igmp.c	Mon Mar  9 17:42:34 2009	(r189591)
+++ head/sys/netinet/igmp.c	Mon Mar  9 17:53:05 2009	(r189592)
@@ -1,4 +1,5 @@
 /*-
+ * Copyright (c) 2007-2009 Bruce Simpson.
  * Copyright (c) 1988 Stephen Deering.
  * Copyright (c) 1992, 1993
  *	The Regents of the University of California.  All rights reserved.
@@ -35,11 +36,13 @@
 
 /*
  * Internet Group Management Protocol (IGMP) routines.
+ * [RFC1112, RFC2236, RFC3376]
  *
  * Written by Steve Deering, Stanford, May 1988.
  * Modified by Rosen Sharma, Stanford, Aug 1994.
  * Modified by Bill Fenner, Xerox PARC, Feb 1995.
  * Modified to fully comply to IGMPv2 by Bill Fenner, Oct 1995.
+ * Significantly rewritten for IGMPv3, VIMAGE, and SMP by Bruce Simpson.
  *
  * MULTICAST Revision: 3.5.1.4
  */
@@ -52,6 +55,7 @@ __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/module.h>
 #include <sys/malloc.h>
 #include <sys/mbuf.h>
 #include <sys/socket.h>
@@ -59,8 +63,11 @@ __FBSDID("$FreeBSD$");
 #include <sys/kernel.h>
 #include <sys/sysctl.h>
 #include <sys/vimage.h>
+#include <sys/ktr.h>
+#include <sys/condvar.h>
 
 #include <net/if.h>
+#include <net/netisr.h>
 #include <net/route.h>
 #include <net/vnet.h>
 
@@ -78,464 +85,3608 @@ __FBSDID("$FreeBSD$");
 
 #include <security/mac/mac_framework.h>
 
-static MALLOC_DEFINE(M_IGMP, "igmp", "igmp state");
+#ifndef KTR_IGMPV3
+#define KTR_IGMPV3 KTR_SUBSYS
+#endif
+
+static struct igmp_ifinfo *
+		igi_alloc_locked(struct ifnet *);
+static void	igi_delete_locked(const struct ifnet *);
+static void	igmp_dispatch_queue(struct ifqueue *, int, const int);
+static void	igmp_fasttimo_vnet(void);
+static void	igmp_final_leave(struct in_multi *, struct igmp_ifinfo *);
+static int	igmp_handle_state_change(struct in_multi *,
+		    struct igmp_ifinfo *);
+static int	igmp_initial_join(struct in_multi *, struct igmp_ifinfo *);
+static int	igmp_input_v1_query(struct ifnet *, const struct ip *);
+static int	igmp_input_v2_query(struct ifnet *, const struct ip *,
+		    const struct igmp *);
+static int	igmp_input_v3_query(struct ifnet *, const struct ip *,
+		    /*const*/ struct igmpv3 *);
+static int	igmp_input_v3_group_query(struct in_multi *,
+		    struct igmp_ifinfo *, int, /*const*/ struct igmpv3 *);
+static int	igmp_input_v1_report(struct ifnet *, /*const*/ struct ip *,
+		    /*const*/ struct igmp *);
+static int	igmp_input_v2_report(struct ifnet *, /*const*/ struct ip *,
+		    /*const*/ struct igmp *);
+static void	igmp_intr(struct mbuf *);
+static int	igmp_isgroupreported(const struct in_addr);
+static struct mbuf *
+		igmp_ra_alloc(void);
+#ifdef KTR
+static char *	igmp_rec_type_to_str(const int);
+#endif
+static void	igmp_set_version(struct igmp_ifinfo *, const int);
+static void	igmp_slowtimo_vnet(void);
+static void	igmp_sysinit(void);
+static int	igmp_v1v2_queue_report(struct in_multi *, const int);
+static void	igmp_v1v2_process_group_timer(struct in_multi *, const int);
+static void	igmp_v1v2_process_querier_timers(struct igmp_ifinfo *);
+static void	igmp_v2_update_group(struct in_multi *, const int);
+static void	igmp_v3_cancel_link_timers(struct igmp_ifinfo *);
+static void	igmp_v3_dispatch_general_query(struct igmp_ifinfo *);
+static struct mbuf *
+		igmp_v3_encap_report(struct ifnet *, struct mbuf *);
+static int	igmp_v3_enqueue_group_record(struct ifqueue *,
+		    struct in_multi *, const int, const int, const int);
+static int	igmp_v3_enqueue_filter_change(struct ifqueue *,
+		    struct in_multi *);
+static void	igmp_v3_process_group_timers(struct igmp_ifinfo *,
+		    struct ifqueue *, struct ifqueue *, struct in_multi *,
+		    const int);
+static int	igmp_v3_merge_state_changes(struct in_multi *,
+		    struct ifqueue *);
+static void	igmp_v3_suppress_group_record(struct in_multi *);
+static int	sysctl_igmp_default_version(SYSCTL_HANDLER_ARGS);
+static int	sysctl_igmp_gsr(SYSCTL_HANDLER_ARGS);
+static int	sysctl_igmp_ifinfo(SYSCTL_HANDLER_ARGS);
+
+#ifdef VIMAGE
+static vnet_attach_fn	vnet_igmp_iattach;
+static vnet_detach_fn	vnet_igmp_idetach;
+#else
+static int	vnet_igmp_iattach(const void *);
+static int	vnet_igmp_idetach(const void *);
+#endif /* VIMAGE */
+
+/*
+ * System-wide globals.
+ *
+ * Unlocked access to these is OK, except for the global IGMP output
+ * queue. The IGMP subsystem lock ends up being system-wide for the moment,
+ * because all VIMAGEs have to share a global output queue, as netisrs
+ * themselves are not virtualized.
+ *
+ * Locking:
+ *  * The permitted lock order is: IN_MULTI_LOCK, IGMP_LOCK, IF_ADDR_LOCK.
+ *    Any may be taken independently; if any are held at the same
+ *    time, the above lock order must be followed.
+ *  * All output is delegated to the netisr to handle IFF_NEEDSGIANT.
+ *    Most of the time, direct dispatch will be fine.
+ *  * IN_MULTI_LOCK covers in_multi.
+ *  * IGMP_LOCK covers igmp_ifinfo and any global variables in this file,
+ *    including the output queue.
+ *  * IF_ADDR_LOCK covers if_multiaddrs, which is used for a variety of
+ *    per-link state iterators.
+ *  * igmp_ifinfo is valid as long as PF_INET is attached to the interface,
+ *    therefore it is not refcounted.
+ *    We allow unlocked reads of igmp_ifinfo when accessed via in_multi.
+ *
+ * Reference counting
+ *  * IGMP acquires its own reference every time an in_multi is passed to
+ *    it and the group is being joined for the first time.
+ *  * IGMP releases its reference(s) on in_multi in a deferred way,
+ *    because the operations which process the release run as part of
+ *    a loop whose control variables are directly affected by the release
+ *    (that, and not recursing on the IF_ADDR_LOCK).
+ *
+ * VIMAGE: Each in_multi corresponds to an ifp, and each ifp corresponds
+ * to a vnet in ifp->if_vnet.
+ *
+ */
+struct mtx		 igmp_mtx;
+int			 mpsafe_igmp = 0;
+SYSCTL_INT(_debug, OID_AUTO, mpsafe_igmp, CTLFLAG_RDTUN, &mpsafe_igmp, 0,
+    "Enable SMP-safe IGMPv3");
+
+struct mbuf		*m_raopt;		 /* Router Alert option */
+MALLOC_DEFINE(M_IGMP, "igmp", "igmp state");
+
+/*
+ * Global netisr output queue.
+ * This is only used as a last resort if we cannot directly dispatch.
+ * As IN_MULTI_LOCK is no longer in the bottom half of IP, we can do
+ * this, providing mpsafe_igmp is set. If it is not, we take Giant,
+ * and queueing is forced.
+ */
+struct ifqueue		 igmpoq;
+
+/*
+ * VIMAGE-wide globals.
+ *
+ * The IGMPv3 timers themselves need to run per-image, however,
+ * protosw timers run globally (see tcp).
+ * An ifnet can only be in one vimage at a time, and the loopback
+ * ifnet, loif, is itself virtualized.
+ * It would otherwise be possible to seriously hose IGMP state,
+ * and create inconsistencies in upstream multicast routing, if you have
+ * multiple VIMAGEs running on the same link joining different multicast
+ * groups, UNLESS the "primary IP address" is different. This is because
+ * IGMP for IPv4 does not force link-local addresses to be used for each
+ * node, unlike MLD for IPv6.
+ * Obviously the IGMPv3 per-interface state has per-vimage granularity
+ * also as a result.
+ *
+ * FUTURE: Stop using IFP_TO_IA/INADDR_ANY, and use source address selection
+ * policy to control the address used by IGMP on the link.
+ */
+#ifdef VIMAGE_GLOBALS
+int	 interface_timers_running;	 /* IGMPv3 general query response */
+int	 state_change_timers_running;	 /* IGMPv3 state-change retransmit */
+int	 current_state_timers_running;	 /* IGMPv1/v2 host report;
+					  * IGMPv3 g/sg query response */
+
+LIST_HEAD(, igmp_ifinfo)	 igi_head;
+struct igmpstat			 igmpstat;
+struct timeval			 igmp_gsrdelay;
+
+int	 igmp_recvifkludge;
+int	 igmp_sendra;
+int	 igmp_sendlocal;
+int	 igmp_v1enable;
+int	 igmp_v2enable;
+int	 igmp_legacysupp;
+int	 igmp_default_version;
+#endif /* VIMAGE_GLOBALS */
+
+/*
+ * Virtualized sysctls.
+ */
+SYSCTL_V_STRUCT(V_NET, vnet_inet, _net_inet_igmp, IGMPCTL_STATS, stats,
+    CTLFLAG_RW, igmpstat, igmpstat, "");
+SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_igmp, OID_AUTO, recvifkludge,
+    CTLFLAG_RW, igmp_recvifkludge, 0,
+    "Rewrite IGMPv1/v2 reports from 0.0.0.0 to contain subnet address");
+SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_igmp, OID_AUTO, sendra,
+    CTLFLAG_RW, igmp_sendra, 0,
+    "Send IP Router Alert option in IGMPv2/v3 messages");
+SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_igmp, OID_AUTO, sendlocal,
+    CTLFLAG_RW, igmp_sendlocal, 0,
+    "Send IGMP membership reports for 224.0.0.0/24 groups");
+SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_igmp, OID_AUTO, v1enable,
+    CTLFLAG_RW, igmp_v1enable, 0,
+    "Enable backwards compatibility with IGMPv1");
+SYSCTL_V_INT(V_NET, vnet_inet,  _net_inet_igmp, OID_AUTO, v2enable,
+    CTLFLAG_RW, igmp_v2enable, 0,
+    "Enable backwards compatibility with IGMPv2");
+SYSCTL_V_INT(V_NET, vnet_inet, _net_inet_igmp, OID_AUTO, legacysupp,
+    CTLFLAG_RW, igmp_legacysupp, 0,
+    "Allow v1/v2 reports to suppress v3 group responses");
+SYSCTL_V_PROC(V_NET, vnet_inet, _net_inet_igmp, OID_AUTO, default_version,
+    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, igmp_default_version, 0,
+    sysctl_igmp_default_version, "I",
+    "Default version of IGMP to run on each interface");
+SYSCTL_V_PROC(V_NET, vnet_inet, _net_inet_igmp, OID_AUTO, gsrdelay,
+    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, igmp_gsrdelay.tv_sec, 0,
+    sysctl_igmp_gsr, "I",
+    "Rate limit for IGMPv3 Group-and-Source queries in seconds");
+
+/*
+ * Non-virtualized sysctls.
+ */
+SYSCTL_NODE(_net_inet_igmp, OID_AUTO, ifinfo, CTLFLAG_RD | CTLFLAG_MPSAFE,
+    sysctl_igmp_ifinfo, "Per-interface IGMPv3 state");
+
+static __inline void
+igmp_save_context(struct mbuf *m, struct ifnet *ifp)
+{
+
+#ifdef VIMAGE
+	m->m_pkthdr.header = ifp->if_vnet;
+#endif /* VIMAGE */
+	m->m_pkthdr.flowid = ifp->if_index;
+}
+
+static __inline void
+igmp_scrub_context(struct mbuf *m)
+{
+
+	m->m_pkthdr.header = NULL;
+	m->m_pkthdr.flowid = 0;
+}
+
+#ifdef KTR
+static __inline char *
+inet_ntoa_haddr(in_addr_t haddr)
+{
+	struct in_addr ia;
+
+	ia.s_addr = htonl(haddr);
+	return (inet_ntoa(ia));
+}
+#endif
+
+/*
+ * Restore context from a queued IGMP output chain.
+ * Return saved ifindex.
+ *
+ * VIMAGE: The assertion is there to make sure that we
+ * actually called CURVNET_SET() with what's in the mbuf chain.
+ */
+static __inline uint32_t
+igmp_restore_context(struct mbuf *m)
+{
+
+#ifdef notyet
+#if defined(VIMAGE) && defined(INVARIANTS)
+	KASSERT(curvnet == (m->m_pkthdr.header),
+	    ("%s: called when curvnet was not restored", __func__));
+#endif
+#endif
+	return (m->m_pkthdr.flowid);
+}
+
+/*
+ * Retrieve or set default IGMP version.
+ *
+ * VIMAGE: Assume curvnet set by caller.
+ * SMPng: NOTE: Serialized by IGMP lock.
+ */
+static int
+sysctl_igmp_default_version(SYSCTL_HANDLER_ARGS)
+{
+	int	 error;
+	int	 new;
+
+	error = sysctl_wire_old_buffer(req, sizeof(int));
+	if (error)
+		return (error);
+
+	IGMP_LOCK();
+
+	new = V_igmp_default_version;
+
+	error = sysctl_handle_int(oidp, &new, 0, req);
+	if (error || !req->newptr)
+		goto out_locked;
+
+	if (new < IGMP_VERSION_1 || new > IGMP_VERSION_3) {
+		error = EINVAL;
+		goto out_locked;
+	}
+
+	CTR2(KTR_IGMPV3, "change igmp_default_version from %d to %d",
+	     V_igmp_default_version, new);
+
+	V_igmp_default_version = new;
+
+out_locked:
+	IGMP_UNLOCK();
+	return (error);
+}
+
+/*
+ * Retrieve or set threshold between group-source queries in seconds.
+ *
+ * VIMAGE: Assume curvnet set by caller.
+ * SMPng: NOTE: Serialized by IGMP lock.
+ */
+static int
+sysctl_igmp_gsr(SYSCTL_HANDLER_ARGS)
+{
+	int error;
+	int i;
+
+	error = sysctl_wire_old_buffer(req, sizeof(int));
+	if (error)
+		return (error);
+
+	IGMP_LOCK();
+
+	i = V_igmp_gsrdelay.tv_sec;
+
+	error = sysctl_handle_int(oidp, &i, 0, req);
+	if (error || !req->newptr)
+		goto out_locked;
+
+	if (i < -1 || i >= 60) {
+		error = EINVAL;
+		goto out_locked;
+	}
+
+	CTR2(KTR_IGMPV3, "change igmp_gsrdelay from %d to %d",
+	     V_igmp_gsrdelay.tv_sec, i);
+	V_igmp_gsrdelay.tv_sec = i;
+
+out_locked:
+	IGMP_UNLOCK();
+	return (error);
+}
+
+/*
+ * Expose struct igmp_ifinfo to userland, keyed by ifindex.
+ * For use by ifmcstat(8).
+ *
+ * SMPng: NOTE: Does an unlocked ifindex space read.
+ * VIMAGE: Assume curvnet set by caller. The node handler itself
+ * is not directly virtualized.
+ */
+static int
+sysctl_igmp_ifinfo(SYSCTL_HANDLER_ARGS)
+{
+	INIT_VNET_NET(curvnet);
+	int			*name;
+	int			 error;
+	u_int			 namelen;
+	struct ifnet		*ifp;
+	struct igmp_ifinfo	*igi;
+
+	name = (int *)arg1;
+	namelen = arg2;
+
+	if (req->newptr != NULL)
+		return (EPERM);
+
+	if (namelen != 1)
+		return (EINVAL);
+
+	error = sysctl_wire_old_buffer(req, sizeof(struct igmp_ifinfo));
+	if (error)
+		return (error);
+
+	IN_MULTI_LOCK();
+	IGMP_LOCK();
+
+	if (name[0] <= 0 || name[0] > V_if_index) {
+		error = ENOENT;
+		goto out_locked;
+	}
+
+	error = ENOENT;
+
+	ifp = ifnet_byindex(name[0]);
+	if (ifp == NULL)
+		goto out_locked;
+
+	LIST_FOREACH(igi, &V_igi_head, igi_link) {
+		if (ifp == igi->igi_ifp) {
+			error = SYSCTL_OUT(req, igi,
+			    sizeof(struct igmp_ifinfo));
+			break;
+		}
+	}
+
+out_locked:
+	IGMP_UNLOCK();
+	IN_MULTI_UNLOCK();
+	return (error);
+}
+
+/*
+ * Dispatch an entire queue of pending packet chains
+ * using the netisr.
+ * VIMAGE: Assumes the vnet pointer has been set.
+ */
+static void
+igmp_dispatch_queue(struct ifqueue *ifq, int limit, const int loop)
+{
+	struct mbuf *m;
+
+	for (;;) {
+		_IF_DEQUEUE(ifq, m);
+		if (m == NULL)
+			break;
+		CTR3(KTR_IGMPV3, "%s: dispatch %p from %p", __func__, ifq, m);
+		if (loop)
+			m->m_flags |= M_IGMP_LOOP;
+		netisr_dispatch(NETISR_IGMP, m);
+		if (--limit == 0)
+			break;
+	}
+}
+
+/*
+ * Filter outgoing IGMP report state by group.
+ *
+ * Reports are ALWAYS suppressed for ALL-HOSTS (224.0.0.1).
+ * If the net.inet.igmp.sendlocal sysctl is 0, then IGMP reports are
+ * disabled for all groups in the 224.0.0.0/24 link-local scope. However,
+ * this may break certain IGMP snooping switches which rely on the old
+ * report behaviour.
+ *
+ * Return zero if the given group is one for which IGMP reports
+ * should be suppressed, or non-zero if reports should be issued.
+ */
+static __inline int
+igmp_isgroupreported(const struct in_addr addr)
+{
+
+	if (in_allhosts(addr) ||
+	    ((!V_igmp_sendlocal && IN_LOCAL_GROUP(ntohl(addr.s_addr)))))
+		return (0);
+
+	return (1);
+}
+
+/*
+ * Construct a Router Alert option to use in outgoing packets.
+ */
+static struct mbuf *
+igmp_ra_alloc(void)
+{
+	struct mbuf	*m;
+	struct ipoption	*p;
+
+	MGET(m, M_DONTWAIT, MT_DATA);
+	p = mtod(m, struct ipoption *);
+	p->ipopt_dst.s_addr = INADDR_ANY;
+	p->ipopt_list[0] = IPOPT_RA;	/* Router Alert Option */
+	p->ipopt_list[1] = 0x04;	/* 4 bytes long */
+	p->ipopt_list[2] = IPOPT_EOL;	/* End of IP option list */
+	p->ipopt_list[3] = 0x00;	/* pad byte */
+	m->m_len = sizeof(p->ipopt_dst) + p->ipopt_list[1];
+
+	return (m);
+}
+
+/*
+ * Attach IGMP when PF_INET is attached to an interface.
+ *
+ * VIMAGE: Currently we set the vnet pointer, although it is
+ * likely that it was already set by our caller.
+ */
+struct igmp_ifinfo *
+igmp_domifattach(struct ifnet *ifp)
+{
+	struct igmp_ifinfo *igi;
+
+	CTR3(KTR_IGMPV3, "%s: called for ifp %p(%s)",
+	    __func__, ifp, ifp->if_xname);
+
+	CURVNET_SET(ifp->if_vnet);
+	IGMP_LOCK();
+
+	igi = igi_alloc_locked(ifp);
+	if (!(ifp->if_flags & IFF_MULTICAST))
+		igi->igi_flags |= IGIF_SILENT;
+
+	IGMP_UNLOCK();
+	CURVNET_RESTORE();
+
+	return (igi);
+}
+
+/*
+ * VIMAGE: assume curvnet set by caller.
+ */
+static struct igmp_ifinfo *
+igi_alloc_locked(/*const*/ struct ifnet *ifp)
+{
+	struct igmp_ifinfo *igi;
+
+	IGMP_LOCK_ASSERT();
+
+	igi = malloc(sizeof(struct igmp_ifinfo), M_IGMP, M_NOWAIT|M_ZERO);
+	if (igi == NULL)
+		goto out;
+
+	igi->igi_ifp = ifp;
+	igi->igi_version = V_igmp_default_version;
+	igi->igi_flags = 0;
+	igi->igi_rv = IGMP_RV_INIT;
+	igi->igi_qi = IGMP_QI_INIT;
+	igi->igi_qri = IGMP_QRI_INIT;
+	igi->igi_uri = IGMP_URI_INIT;
+
+	SLIST_INIT(&igi->igi_relinmhead);
+
+	/*
+	 * Responses to general queries are subject to bounds.
+	 */
+	IFQ_SET_MAXLEN(&igi->igi_gq, IGMP_MAX_RESPONSE_PACKETS);
+
+	LIST_INSERT_HEAD(&V_igi_head, igi, igi_link);
+
+	CTR2(KTR_IGMPV3, "allocate igmp_ifinfo for ifp %p(%s)",
+	     ifp, ifp->if_xname);
+
+out:
+	return (igi);
+}
+
+/*
+ * Hook for ifdetach.
+ *
+ * NOTE: Some finalization tasks need to run before the protocol domain
+ * is detached, but also before the link layer does its cleanup.
+ *
+ * SMPNG: igmp_ifdetach() needs to take IF_ADDR_LOCK().
+ *
+ * VIMAGE: curvnet should have been set by caller, but let's not assume
+ * that for now.
+ */
+void
+igmp_ifdetach(struct ifnet *ifp)
+{
+	struct igmp_ifinfo	*igi;
+	struct ifmultiaddr	*ifma;
+	struct in_multi		*inm, *tinm;
+
+	CTR3(KTR_IGMPV3, "%s: called for ifp %p(%s)", __func__, ifp,
+	    ifp->if_xname);
+
+	CURVNET_SET(ifp->if_vnet);
+
+	IGMP_LOCK();
+
+	igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp;
+	if (igi->igi_version == IGMP_VERSION_3) {
+		IF_ADDR_LOCK(ifp);
+		TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
+			if (ifma->ifma_addr->sa_family != AF_INET)
+				continue;
+			inm = (struct in_multi *)ifma->ifma_protospec;
+			if (inm->inm_state == IGMP_LEAVING_MEMBER) {
+				SLIST_INSERT_HEAD(&igi->igi_relinmhead,
+				    inm, inm_nrele);
+			}
+			inm_clear_recorded(inm);
+		}
+		IF_ADDR_UNLOCK(ifp);
+		/*
+		 * Free the in_multi reference(s) for this IGMP lifecycle.
+		 */
+		SLIST_FOREACH_SAFE(inm, &igi->igi_relinmhead, inm_nrele,
+		    tinm) {
+			SLIST_REMOVE_HEAD(&igi->igi_relinmhead, inm_nrele);
+			inm_release_locked(inm);
+		}
+	}
+
+	IGMP_UNLOCK();
+
+#ifdef VIMAGE
+	/*
+	 * Plug the potential race which may occur when a VIMAGE
+	 * is detached and we are forced to queue pending IGMP output for
+	 * output netisr processing due to !mpsafe_igmp. In this case it
+	 * is possible that igmp_intr() is about to see mbuf chains with
+	 * invalid cached curvnet pointers.
+	 * This is a rare condition, so just blow them all away.
+	 * FUTURE: This may in fact not be needed, because IFF_NEEDSGIANT
+	 * is being removed in 8.x and the netisr may then be eliminated;
+	 * it is needed only if VIMAGE and IFF_NEEDSGIANT need to co-exist
+	 */
+	if (!mpsafe_igmp) {
+		int drops;
+
+		IF_LOCK(&igmpoq);
+		drops = igmpoq.ifq_len;
+		_IF_DRAIN(&igmpoq);
+		IF_UNLOCK(&igmpoq);
+		if (bootverbose && drops) {
+			printf("%s: dropped %d pending IGMP output packets\n",
+			    __func__, drops);
+		}
+	}
+#endif /* VIMAGE */
+
+	CURVNET_RESTORE();
+}
+
+/*
+ * Hook for domifdetach.
+ *
+ * VIMAGE: curvnet should have been set by caller, but let's not assume
+ * that for now.
+ */
+void
+igmp_domifdetach(struct ifnet *ifp)
+{
+	struct igmp_ifinfo *igi;
+
+	CTR3(KTR_IGMPV3, "%s: called for ifp %p(%s)",
+	    __func__, ifp, ifp->if_xname);
+
+	CURVNET_SET(ifp->if_vnet);
+	IGMP_LOCK();
+
+	igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp;
+	igi_delete_locked(ifp);
+
+	IGMP_UNLOCK();
+	CURVNET_RESTORE();
+}
+
+static void
+igi_delete_locked(const struct ifnet *ifp)
+{
+	struct igmp_ifinfo *igi, *tigi;
+
+	CTR3(KTR_IGMPV3, "%s: freeing igmp_ifinfo for ifp %p(%s)",
+	    __func__, ifp, ifp->if_xname);
+
+	IGMP_LOCK_ASSERT();
+
+	LIST_FOREACH_SAFE(igi, &V_igi_head, igi_link, tigi) {
+		if (igi->igi_ifp == ifp) {
+			/*
+			 * Free deferred General Query responses.
+			 */
+			_IF_DRAIN(&igi->igi_gq);
+
+			LIST_REMOVE(igi, igi_link);
+
+			KASSERT(SLIST_EMPTY(&igi->igi_relinmhead),
+			    ("%s: there are dangling in_multi references",
+			    __func__));
+
+			free(igi, M_IGMP);
+			return;
+		}
+	}
+
+#ifdef INVARIANTS
+	panic("%s: igmp_ifinfo not found for ifp %p\n", __func__,  ifp);
+#endif
+}
+
+/*
+ * Process a received IGMPv1 query.
+ * Return non-zero if the message should be dropped.
+ *
+ * VIMAGE: The curvnet pointer is derived from the input ifp.
+ */
+static int
+igmp_input_v1_query(struct ifnet *ifp, const struct ip *ip)
+{
+	INIT_VNET_INET(ifp->if_vnet);
+	struct ifmultiaddr	*ifma;
+	struct igmp_ifinfo	*igi;
+	struct in_multi		*inm;
+
+	/*
+	 * IGMPv1 General Queries SHOULD always addressed to 224.0.0.1.
+	 * igmp_group is always ignored. Do not drop it as a userland
+	 * daemon may wish to see it.
+	 */
+	if (!in_allhosts(ip->ip_dst)) {
+		++V_igmpstat.igps_rcv_badqueries;
+		return (0);
+	}
+
+	++V_igmpstat.igps_rcv_gen_queries;
+
+	/*
+	 * Switch to IGMPv1 host compatibility mode.
+	 */
+	IN_MULTI_LOCK();
+	IGMP_LOCK();
+
+	igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp;
+	KASSERT(igi != NULL, ("%s: no igmp_ifinfo for ifp %p", __func__, ifp));
+
+	if (igi->igi_flags & IGIF_LOOPBACK) {
+		CTR2(KTR_IGMPV3, "ignore v1 query on IGIF_LOOPBACK ifp %p(%s)",
+		    ifp, ifp->if_xname);
+		goto out_locked;
+	}
+
+	igmp_set_version(igi, IGMP_VERSION_1);
+
+	CTR2(KTR_IGMPV3, "process v1 query on ifp %p(%s)", ifp, ifp->if_xname);
+
+	/*
+	 * Start the timers in all of our group records
+	 * for the interface on which the query arrived,
+	 * except those which are already running.
+	 */
+	IF_ADDR_LOCK(ifp);
+	TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
+		if (ifma->ifma_addr->sa_family != AF_INET)
+			continue;
+		inm = (struct in_multi *)ifma->ifma_protospec;
+		if (inm->inm_timer != 0)
+			continue;
+		switch (inm->inm_state) {
+		case IGMP_NOT_MEMBER:
+		case IGMP_SILENT_MEMBER:
+			break;
+		case IGMP_G_QUERY_PENDING_MEMBER:
+		case IGMP_SG_QUERY_PENDING_MEMBER:
+		case IGMP_REPORTING_MEMBER:
+		case IGMP_IDLE_MEMBER:
+		case IGMP_LAZY_MEMBER:
+		case IGMP_SLEEPING_MEMBER:
+		case IGMP_AWAKENING_MEMBER:
+			inm->inm_state = IGMP_REPORTING_MEMBER;
+			inm->inm_timer = IGMP_RANDOM_DELAY(
+			    IGMP_V1V2_MAX_RI * PR_FASTHZ);
+			V_current_state_timers_running = 1;
+			break;
+		case IGMP_LEAVING_MEMBER:
+			break;
+		}
+	}
+	IF_ADDR_UNLOCK(ifp);
+
+out_locked:
+	IGMP_UNLOCK();
+	IN_MULTI_UNLOCK();
+
+	return (0);
+}
+
+/*
+ * Process a received IGMPv2 general or group-specific query.
+ */
+static int
+igmp_input_v2_query(struct ifnet *ifp, const struct ip *ip,
+    const struct igmp *igmp)
+{
+	struct ifmultiaddr	*ifma;
+	struct igmp_ifinfo	*igi;
+	struct in_multi		*inm;
+	uint16_t		 timer;
+
+	/*
+	 * Perform lazy allocation of IGMP link info if required,
+	 * and switch to IGMPv2 host compatibility mode.
+	 */
+	IN_MULTI_LOCK();
+	IGMP_LOCK();
+
+	igi = ((struct in_ifinfo *)ifp->if_afdata[AF_INET])->ii_igmp;
+	KASSERT(igi != NULL, ("%s: no igmp_ifinfo for ifp %p", __func__, ifp));
+
+	if (igi->igi_flags & IGIF_LOOPBACK) {
+		CTR2(KTR_IGMPV3, "ignore v2 query on IGIF_LOOPBACK ifp %p(%s)",
+		    ifp, ifp->if_xname);
+		goto out_locked;
+	}
+
+	igmp_set_version(igi, IGMP_VERSION_2);
+
+	timer = igmp->igmp_code * PR_FASTHZ / IGMP_TIMER_SCALE;
+	if (timer == 0)
+		timer = 1;
+
+	if (!in_nullhost(igmp->igmp_group)) {
+		/*
+		 * IGMPv2 Group-Specific Query.
+		 * If this is a group-specific IGMPv2 query, we need only
+		 * look up the single group to process it.
+		 */
+		inm = inm_lookup(ifp, igmp->igmp_group);
+		if (inm != NULL) {
+			CTR3(KTR_IGMPV3, "process v2 query %s on ifp %p(%s)",
+			    inet_ntoa(igmp->igmp_group), ifp, ifp->if_xname);
+			igmp_v2_update_group(inm, timer);
+		}
+		++V_igmpstat.igps_rcv_group_queries;
+	} else {
+		/*
+		 * IGMPv2 General Query.
+		 * If this was not sent to the all-hosts group, ignore it.
+		 */
+		if (in_allhosts(ip->ip_dst)) {
+			/*
+			 * For each reporting group joined on this
+			 * interface, kick the report timer.
+			 */
+			CTR2(KTR_IGMPV3,
+			    "process v2 general query on ifp %p(%s)",
+			    ifp, ifp->if_xname);
+
+			IF_ADDR_LOCK(ifp);
+			TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
+				if (ifma->ifma_addr->sa_family != AF_INET)
+					continue;
+				inm = (struct in_multi *)ifma->ifma_protospec;
+				igmp_v2_update_group(inm, timer);
+			}
+			IF_ADDR_UNLOCK(ifp);
+		}
+		++V_igmpstat.igps_rcv_gen_queries;
+	}
+
+out_locked:
+	IGMP_UNLOCK();
+	IN_MULTI_UNLOCK();
+
+	return (0);
+}
+
+/*
+ * Update the report timer on a group in response to an IGMPv2 query.
+ *
+ * If we are becoming the reporting member for this group, start the timer.
+ * If we already are the reporting member for this group, and timer is
+ * below the threshold, reset it.
+ *

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200903091753.n29Hr5Jm033456>