Date:      Fri, 24 Jan 2014 11:32:49 +0400
From:      "Alexander V. Chernikov" <melifaro@yandex-team.ru>
To:        "net@freebsd.org" <net@freebsd.org>
Cc:        arch@freebsd.org, hackers@freebsd.org, "Andrey V. Elsukov" <ae@freebsd.org>
Subject:   "slow path" in network code || IPv6 panic on inteface removal
Message-ID:  <52E21721.5010309@yandex-team.ru>

This is a multi-part message in MIME format.
--------------050804080408080705010802
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Hello guys!

We usually concentrate on making the "fast" paths in our code run
faster. However, it seems it is time to take care of code that is
either called rarely or is quite complex in terms of relative code
size and/or locking.

Some good examples from the current codebase are probably:
* L3->L2 mapping such as ARP handling: while doing arpresolve we
discover there is no valid entry, so we start doing complex locking and
ARP request preparation/sending in the same piece of code. This washes
out both i/d caches and makes the sending process _more_ unpredictable.
  Here we can queue the given mbuf for delayed processing and return quickly.

* ip_fastfwd() handling corner cases. This is already optimized in terms
of splitting "fast" and "slow" code paths for all cases.

* ipfw(4) (and probably other pfil consumers) generating/sending various 
icmp/icmp6 packets for inbound mbuf


What exactly is proposed:
- One more netisr queue for handling different types of packets
- metainfo is stored in an mbuf_tag attached to the packet
- an ifnet departure handler taking care of packets queued from/to a killed ifnet
- an API to register/unregister/dispatch a given type of traffic
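
To make the four points above more concrete, here is a minimal userspace
sketch of the idea. It is only a model under stated assumptions: struct pkt
stands in for an mbuf plus its tag, an integer ifindex stands in for the
ifnet pointer, and every name in it (slow_register, slow_queue,
slow_purge_ifp, slow_run, sum_handler) is hypothetical and not part of the
attached patch or of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel API. */
#define SLOW_MAX_TYPES	8

struct pkt {			/* stands in for an mbuf plus its tag */
	struct pkt *next;
	int type;		/* handler id returned by slow_register() */
	int ifindex;		/* interface the packet is bound to */
	int payload;
};

typedef void slow_handler_t(struct pkt *);

static slow_handler_t *slow_handlers[SLOW_MAX_TYPES];
static struct pkt *slow_head, *slow_tail;
static int slow_len;
static int slow_sum;		/* touched only by the sample handler */

/* Sample handler used for illustration: accumulates payloads. */
static void
sum_handler(struct pkt *p)
{
	slow_sum += p->payload;
}

/* Register a handler; returns its id, or -1 if the table is full. */
static int
slow_register(slow_handler_t *h)
{
	for (int i = 0; i < SLOW_MAX_TYPES; i++) {
		if (slow_handlers[i] == NULL) {
			slow_handlers[i] = h;
			return (i);
		}
	}
	return (-1);
}

/* Called from the "fast" path: append to the FIFO and return at once. */
static int
slow_queue(int type, int ifindex, int payload)
{
	struct pkt *p;

	if (type < 0 || type >= SLOW_MAX_TYPES || slow_handlers[type] == NULL)
		return (-1);
	if ((p = malloc(sizeof(*p))) == NULL)
		return (-1);
	p->next = NULL;
	p->type = type;
	p->ifindex = ifindex;
	p->payload = payload;
	if (slow_tail == NULL)
		slow_head = p;
	else
		slow_tail->next = p;
	slow_tail = p;
	slow_len++;
	return (0);
}

/* Interface departure: drop every queued packet bound to ifindex. */
static int
slow_purge_ifp(int ifindex)
{
	struct pkt **pp = &slow_head, *p, *last = NULL;
	int deleted = 0;

	while ((p = *pp) != NULL) {
		if (p->ifindex == ifindex) {
			*pp = p->next;		/* unlink and free */
			free(p);
			slow_len--;
			deleted++;
		} else {
			last = p;
			pp = &p->next;
		}
	}
	slow_tail = last;
	return (deleted);
}

/* "Slow" path worker: drain the queue, one handler call per packet. */
static int
slow_run(void)
{
	struct pkt *p;
	int n = 0;

	while ((p = slow_head) != NULL) {
		slow_head = p->next;
		if (slow_head == NULL)
			slow_tail = NULL;
		slow_len--;
		slow_handlers[p->type](p);
		free(p);
		n++;
	}
	assert(slow_len == 0);
	return (n);
}
```

A caller would register once, call slow_queue() wherever it would otherwise
start heavy processing inline, and let a separate context call slow_run().
The model ignores locking and per-CPU queues entirely.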



Real problem which is solved by this approach (traced by ae@):

We're using per-LLE IPv6 timers for various purposes. Most of them
require LLE modifications, so the timer function starts with the LLE
write lock held.

Some timer events require us to send neighbour solicitation messages,
which involves a) source address selection (requiring the LLE lock to be
held) and b) calling ip6_output(), which requires the LLE lock not to be
held. It is solved exactly as in the IPv4 ARP handling code: the timer
function drops the write lock before calling nd6_ns_output().

Dropping and reacquiring the lock is error-prone. For example, the
following scenario is possible (traced by ae@):

we're calling if_detach(ifp) (thread 1, T1) and nd6_llinfo_timer
(thread 2, T2). Then the following can happen:

#1 T2 releases LLE lock and runs nd6_ns_output().
#2 T1 proceeds with detaching: in6_ifdetach() -> in6_purgeaddr() -> 
nd6_rem_ifa_lle() -> in6_lltable_prefix_free() which removes all LLEs 
for given prefix acquiring each LLE write lock. "Our" LLE is not 
destroyed since it is refcounted by nd6_llinfo_settimer_locked().

#3 T2 proceeds with nd6_ns_output() selecting source address (which 
involves acquiring LLE read lock)

#4 T1 finishes with detaching interface addresses and sets ifp->if_addr 
to NULL

#5 T2 calls nd6_ifptomac() which reads interface MAC from ifp->if_addr,
which is now NULL

#6 User inspects core generated by previous call

Using the new API, we can avoid #6 by making the following code changes:
* LLE timer does not drop/reacquire LLE lock
* we require nd6_ns_output callers to lock LLE if it is provided
* nd6_ns_output() uses "slow" path instead of sending mbuf to 
ip6_output() immediately if LLE is not NULL.
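
The locking pattern behind these changes can be sketched in a few lines of
userspace C. This is a simplified, single-threaded model rather than the
patch itself: fake_ifnet and fake_lle only mimic the roles of ifnet and
llentry, the "lock" is a plain flag, and all names (output_now, timer_tick,
purge_ifp, run_deferred) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: these types only mimic ifnet and llentry. */
struct fake_ifnet {
	bool detached;
	int sent;
};

struct fake_lle {
	bool wlocked;
	int asked;
	struct fake_ifnet *ifp;
};

#define PENDING_MAX	4
static struct fake_ifnet *pending[PENDING_MAX];
static int npending;

/* Stands in for ip6_output(): must run without the entry lock held. */
static void
output_now(struct fake_ifnet *ifp, const struct fake_lle *ln)
{
	assert(ln == NULL || !ln->wlocked);
	ifp->sent++;
}

/* Timer body: the entry stays write-locked from start to finish. */
static void
timer_tick(struct fake_lle *ln)
{
	ln->wlocked = true;
	ln->asked++;
	/* Defer the send instead of unlocking, sending and relocking. */
	if (npending < PENDING_MAX)
		pending[npending++] = ln->ifp;
	ln->wlocked = false;
}

/* Interface departure: forget deferred work that points at ifp. */
static int
purge_ifp(struct fake_ifnet *ifp)
{
	int i, kept = 0, dropped = 0;

	for (i = 0; i < npending; i++) {
		if (pending[i] == ifp)
			dropped++;
		else
			pending[kept++] = pending[i];
	}
	npending = kept;
	ifp->detached = true;
	return (dropped);
}

/* Deferred worker: runs later, with no entry lock held. */
static int
run_deferred(void)
{
	int i, n = npending;

	for (i = 0; i < n; i++) {
		/* Purging on departure guarantees a live interface here. */
		assert(!pending[i]->detached);
		output_now(pending[i], NULL);
	}
	npending = 0;
	return (n);
}
```

The point of the sketch is that the timer never has a window in which the
entry is unlocked, and that work queued for an interface is discarded when
that interface goes away instead of being run against stale state.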


What do you think?


--------------050804080408080705010802
Content-Type: text/x-patch;
 name="dly_fin2.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="dly_fin2.diff"

Index: sys/conf/files
===================================================================
--- sys/conf/files	(revision 260983)
+++ sys/conf/files	(working copy)
@@ -3044,6 +3044,7 @@ net/bpf_filter.c		optional bpf | netgraph_bpf
 net/bpf_zerocopy.c		optional bpf
 net/bridgestp.c			optional bridge | if_bridge
 net/flowtable.c			optional flowtable inet | flowtable inet6
+net/delayed_dispatch.c		standard
 net/ieee8023ad_lacp.c		optional lagg
 net/if.c			standard
 net/if_arcsubr.c		optional arcnet
Index: sys/net/netisr.c
===================================================================
--- sys/net/netisr.c	(revision 260983)
+++ sys/net/netisr.c	(working copy)
@@ -555,6 +555,81 @@ netisr_setqlimit(const struct netisr_handler *nhp,
 }
 
 /*
+ * Scan workqueue and delete the mbufs selected by the scan handler.
+ */
+static int
+netisr_scan_workqueue(struct netisr_work *npwp, netisr_scan_t *scan_f,
+    void *data)
+{
+	struct mbuf *m, *m_prev;
+	int deleted;
+
+	deleted = 0;
+	m_prev = NULL;
+	m = npwp->nw_head;
+	while (m != NULL) {
+		if (scan_f(m, data) == 0) {
+			m_prev = m;
+			m = m->m_nextpkt;
+			continue;
+		}
+
+		/* Handler requested item deletion */
+		if (m_prev == NULL)
+			npwp->nw_head = m->m_nextpkt;
+		else
+			m_prev->m_nextpkt = m->m_nextpkt;
+
+		if (m->m_nextpkt == NULL)
+			npwp->nw_tail = m_prev;
+
+		npwp->nw_len--;
+		m_freem(m);
+		deleted++;
+
+		if (m_prev == NULL)
+			m = npwp->nw_head;
+		else
+			m = m_prev->m_nextpkt;
+	}
+
+	return (deleted);
+}
+
+int
+netisr_scan(unsigned int proto, netisr_scan_t *scan_f, void *data)
+{
+#ifdef NETISR_LOCKING
+	struct rm_priotracker tracker;
+#endif
+	struct netisr_proto *np;
+	struct netisr_work *npwp;
+	unsigned int i;
+	int deleted;
+
+#ifdef NETISR_LOCKING
+	NETISR_RLOCK(&tracker);
+#endif
+
+	deleted = 0;
+
+	KASSERT(scan_f != NULL, ("%s: scan function is NULL", __func__));
+
+	np = &netisr_proto[proto];
+
+	CPU_FOREACH(i) {
+		npwp = &(DPCPU_ID_PTR(i, nws))->nws_work[proto];
+		deleted += netisr_scan_workqueue(npwp, scan_f, data);
+	}
+
+#ifdef NETISR_LOCKING
+	NETISR_RUNLOCK(&tracker);
+#endif
+
+	return (deleted);
+}
+
+/*
  * Drain all packets currently held in a particular protocol work queue.
  */
 static void
Index: sys/net/netisr.h
===================================================================
--- sys/net/netisr.h	(revision 260983)
+++ sys/net/netisr.h	(working copy)
@@ -61,6 +61,7 @@
 #define	NETISR_IPV6	10
 #define	NETISR_NATM	11
 #define	NETISR_EPAIR	12		/* if_epair(4) */
+#define	NETISR_SLOWPATH	13		/* delayed dispatch */
 
 /*
  * Protocol ordering and affinity policy constants.  See the detailed
@@ -178,6 +179,7 @@ struct sysctl_netisr_work {
  */
 struct mbuf;
 typedef void		 netisr_handler_t(struct mbuf *m);
+typedef int		 netisr_scan_t(struct mbuf *m, void *);
 typedef struct mbuf	*netisr_m2cpuid_t(struct mbuf *m, uintptr_t source,
 			 u_int *cpuid);
 typedef	struct mbuf	*netisr_m2flow_t(struct mbuf *m, uintptr_t source);
@@ -212,6 +214,7 @@ void	netisr_getqlimit(const struct netisr_handler
 void	netisr_register(const struct netisr_handler *nhp);
 int	netisr_setqlimit(const struct netisr_handler *nhp, u_int qlimit);
 void	netisr_unregister(const struct netisr_handler *nhp);
+int	netisr_scan(u_int proto, netisr_scan_t *, void *);
 
 /*
  * Process a packet destined for a protocol, and attempt direct dispatch.
Index: sys/netinet6/nd6.c
===================================================================
--- sys/netinet6/nd6.c	(revision 260983)
+++ sys/netinet6/nd6.c	(working copy)
@@ -153,6 +153,8 @@ nd6_init(void)
 	callout_init(&V_nd6_slowtimo_ch, 0);
 	callout_reset(&V_nd6_slowtimo_ch, ND6_SLOWTIMER_INTERVAL * hz,
 	    nd6_slowtimo, curvnet);
+
+	nd6_nbr_init();
 }
 
 #ifdef VIMAGE
@@ -160,6 +162,7 @@ void
 nd6_destroy()
 {
 
+	nd6_nbr_destroy();
 	callout_drain(&V_nd6_slowtimo_ch);
 	callout_drain(&V_nd6_timer_ch);
 }
@@ -500,9 +503,7 @@ nd6_llinfo_timer(void *arg)
 		if (ln->la_asked < V_nd6_mmaxtries) {
 			ln->la_asked++;
 			nd6_llinfo_settimer_locked(ln, (long)ndi->retrans * hz / 1000);
-			LLE_WUNLOCK(ln);
 			nd6_ns_output(ifp, NULL, dst, ln, 0);
-			LLE_WLOCK(ln);
 		} else {
 			struct mbuf *m = ln->la_hold;
 			if (m) {
@@ -547,9 +548,7 @@ nd6_llinfo_timer(void *arg)
 			ln->la_asked = 1;
 			ln->ln_state = ND6_LLINFO_PROBE;
 			nd6_llinfo_settimer_locked(ln, (long)ndi->retrans * hz / 1000);
-			LLE_WUNLOCK(ln);
 			nd6_ns_output(ifp, dst, dst, ln, 0);
-			LLE_WLOCK(ln);
 		} else {
 			ln->ln_state = ND6_LLINFO_STALE; /* XXX */
 			nd6_llinfo_settimer_locked(ln, (long)V_nd6_gctimer * hz);
@@ -559,9 +558,7 @@ nd6_llinfo_timer(void *arg)
 		if (ln->la_asked < V_nd6_umaxtries) {
 			ln->la_asked++;
 			nd6_llinfo_settimer_locked(ln, (long)ndi->retrans * hz / 1000);
-			LLE_WUNLOCK(ln);
 			nd6_ns_output(ifp, dst, dst, ln, 0);
-			LLE_WLOCK(ln);
 		} else {
 			EVENTHANDLER_INVOKE(lle_event, ln, LLENTRY_EXPIRED);
 			(void)nd6_free(ln, 0);
Index: sys/netinet6/nd6.h
===================================================================
--- sys/netinet6/nd6.h	(revision 260983)
+++ sys/netinet6/nd6.h	(working copy)
@@ -421,6 +421,8 @@ int nd6_storelladdr(struct ifnet *, struct mbuf *,
 	const struct sockaddr *, u_char *, struct llentry **);
 
 /* nd6_nbr.c */
+void nd6_nbr_init(void);
+void nd6_nbr_destroy(void);
 void nd6_na_input(struct mbuf *, int, int);
 void nd6_na_output(struct ifnet *, const struct in6_addr *,
 	const struct in6_addr *, u_long, int, struct sockaddr *);
Index: sys/netinet6/nd6_nbr.c
===================================================================
--- sys/netinet6/nd6_nbr.c	(revision 260983)
+++ sys/netinet6/nd6_nbr.c	(working copy)
@@ -74,6 +74,7 @@ __FBSDID("$FreeBSD$");
 #include <netinet/icmp6.h>
 #include <netinet/ip_carp.h>
 #include <netinet6/send.h>
+#include <net/delayed_dispatch.h>
 
 #define SDL(s) ((struct sockaddr_dl *)s)
 
@@ -87,12 +88,37 @@ static void nd6_dad_ns_input(struct ifaddr *);
 static void nd6_dad_na_input(struct ifaddr *);
 static void nd6_na_output_fib(struct ifnet *, const struct in6_addr *,
     const struct in6_addr *, u_long, int, struct sockaddr *, u_int);
+static int nd6_ns_output2(struct mbuf *, int, uintptr_t, struct ifnet *);
 
 VNET_DEFINE(int, dad_ignore_ns) = 0;	/* ignore NS in DAD - specwise incorrect*/
 VNET_DEFINE(int, dad_maxtry) = 15;	/* max # of *tries* to transmit DAD packet */
 #define	V_dad_ignore_ns			VNET(dad_ignore_ns)
 #define	V_dad_maxtry			VNET(dad_maxtry)
 
+static struct dly_dispatcher dly_d = {
+	.name = "nd6_ns",
+	.dly_dispatch = nd6_ns_output2,
+};
+
+static int nd6_dlyid;
+
+void
+nd6_nbr_init()
+{
+
+	if (IS_DEFAULT_VNET(curvnet))
+		nd6_dlyid = dly_register(&dly_d);
+}
+
+void
+nd6_nbr_destroy()
+{
+
+	if (IS_DEFAULT_VNET(curvnet))
+		dly_unregister(nd6_dlyid);
+}
+
+
 /*
  * Input a Neighbor Solicitation Message.
  *
@@ -366,11 +392,34 @@ nd6_ns_input(struct mbuf *m, int off, int icmp6len
 	m_freem(m);
 }
 
+static int
+nd6_ns_output2(struct mbuf *m, int dad, uintptr_t _data, struct ifnet *ifp)
+{
+	struct ip6_moptions im6o;
+
+	if (m->m_flags & M_MCAST) {
+		im6o.im6o_multicast_ifp = ifp;
+		im6o.im6o_multicast_hlim = 255;
+		im6o.im6o_multicast_loop = 0;
+	}
+
+	/* Zero ingress interface not to fool PFIL consumers */
+	m->m_pkthdr.rcvif = NULL;
+
+	ip6_output(m, NULL, NULL, dad ? IPV6_UNSPECSRC : 0, &im6o, NULL, NULL);
+	icmp6_ifstat_inc(ifp, ifs6_out_msg);
+	icmp6_ifstat_inc(ifp, ifs6_out_neighborsolicit);
+	ICMP6STAT_INC(icp6s_outhist[ND_NEIGHBOR_SOLICIT]);
+
+	return (0);
+}
+
 /*
  * Output a Neighbor Solicitation Message. Caller specifies:
  *	- ICMP6 header source IP6 address
  *	- ND6 header target IP6 address
  *	- ND6 header source datalink address
+ * Note llentry has to be locked if specified
  *
  * Based on RFC 2461
  * Based on RFC 2462 (duplicate address detection)
@@ -386,11 +435,9 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 	struct m_tag *mtag;
 	struct ip6_hdr *ip6;
 	struct nd_neighbor_solicit *nd_ns;
-	struct ip6_moptions im6o;
 	int icmp6len;
 	int maxlen;
 	caddr_t mac;
-	struct route_in6 ro;
 
 	if (IN6_IS_ADDR_MULTICAST(taddr6))
 		return;
@@ -413,13 +460,8 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 	if (m == NULL)
 		return;
 
-	bzero(&ro, sizeof(ro));
-
 	if (daddr6 == NULL || IN6_IS_ADDR_MULTICAST(daddr6)) {
 		m->m_flags |= M_MCAST;
-		im6o.im6o_multicast_ifp = ifp;
-		im6o.im6o_multicast_hlim = 255;
-		im6o.im6o_multicast_loop = 0;
 	}
 
 	icmp6len = sizeof(*nd_ns);
@@ -468,7 +510,6 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 
 		hsrc = NULL;
 		if (ln != NULL) {
-			LLE_RLOCK(ln);
 			if (ln->la_hold != NULL) {
 				struct ip6_hdr *hip6;		/* hold ip6 */
 
@@ -483,7 +524,6 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 					hsrc = &hip6->ip6_src;
 				}
 			}
-			LLE_RUNLOCK(ln);
 		}
 		if (hsrc && (ifa = (struct ifaddr *)in6ifa_ifpwithaddr(ifp,
 		    hsrc)) != NULL) {
@@ -502,7 +542,7 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 
 			oifp = ifp;
 			error = in6_selectsrc(&dst_sa, NULL,
-			    NULL, &ro, NULL, &oifp, &src_in);
+			    NULL, NULL, NULL, &oifp, &src_in);
 			if (error) {
 				char ip6buf[INET6_ADDRSTRLEN];
 				nd6log((LOG_DEBUG,
@@ -572,20 +612,16 @@ nd6_ns_output(struct ifnet *ifp, const struct in6_
 		m_tag_prepend(m, mtag);
 	}
 
-	ip6_output(m, NULL, &ro, dad ? IPV6_UNSPECSRC : 0, &im6o, NULL, NULL);
-	icmp6_ifstat_inc(ifp, ifs6_out_msg);
-	icmp6_ifstat_inc(ifp, ifs6_out_neighborsolicit);
-	ICMP6STAT_INC(icp6s_outhist[ND_NEIGHBOR_SOLICIT]);
+	if (ln == NULL)
+		nd6_ns_output2(m, dad, 0, ifp);
+	else {
+		m->m_pkthdr.rcvif = ifp; /* Save VNET */
+		dly_queue(nd6_dlyid, m, dad, 0, ifp);
+	}
 
-	/* We don't cache this route. */
-	RO_RTFREE(&ro);
-
 	return;
 
   bad:
-	if (ro.ro_rt) {
-		RTFREE(ro.ro_rt);
-	}
 	m_freem(m);
 	return;
 }
Index: sys/sys/mbuf.h
===================================================================
--- sys/sys/mbuf.h	(revision 260983)
+++ sys/sys/mbuf.h	(working copy)
@@ -1022,6 +1022,7 @@ struct mbuf	*m_unshare(struct mbuf *, int);
 #define	PACKET_TAG_CARP				28 /* CARP info */
 #define	PACKET_TAG_IPSEC_NAT_T_PORTS		29 /* two uint16_t */
 #define	PACKET_TAG_ND_OUTGOING			30 /* ND outgoing */
+#define	PACKET_TAG_DISPATCH_INFO		31 /* Netisr slow dispatch */
 
 /* Specific cookies and tags. */
 
--- /dev/null	2014-01-24 00:33:00.000000000 +0400
+++ sys/net/delayed_dispatch.c	2014-01-24 00:17:05.573964680 +0400
@@ -0,0 +1,364 @@
+/*-
+ * Copyright (c) 2014 Alexander V. Chernikov
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD: head/sys/net/delayed_dispatch.c$");
+
+/*
+ * Delayed dispatch is a so-called "slowpath" packet path which permits you
+ * to enqueue mbufs requiring complex dispatch (and/or possibly complex locking)
+ * into a separate netisr queue instead of dealing with them in the "fast" code path.
+ */
+
+#include <sys/param.h>
+#include <sys/kernel.h>
+#include <sys/lock.h>
+#include <sys/mbuf.h>
+#include <sys/socket.h>
+#include <sys/mutex.h>
+#include <sys/rmlock.h>
+#include <sys/syslog.h>
+#include <sys/types.h>
+#include <sys/taskqueue.h>
+#include <sys/eventhandler.h>
+
+#include <net/delayed_dispatch.h>
+#include <net/netisr.h>
+
+#include <net/if.h>
+#include <net/if_var.h>
+
+struct dly_info {
+	struct dly_dispatcher *index;
+	int alloc;
+	int count;
+	struct rmlock lock;
+};
+#define	DLY_ALLOC_ITEMS	16
+
+static struct dly_info dly;
+
+#define	DLY_LOCK_INIT()		rm_init(&dly.lock, "dly_lock")
+#define	DLY_RLOCK()		rm_rlock(&dly.lock, &tracker)
+#define	DLY_RUNLOCK()		rm_runlock(&dly.lock, &tracker)
+#define	DLY_WLOCK()		rm_wlock(&dly.lock)
+#define	DLY_WUNLOCK()		rm_wunlock(&dly.lock)
+#define	DLY_READER		struct rm_priotracker tracker
+
+static eventhandler_tag ifdetach_tag;
+
+/*
+ * Adds mbuf to slowpath queue. Additional information
+ * is stored in PACKET_TAG_DISPATCH_INFO mbuf tag.
+ * Returns 0 if successful, error code otherwise.
+ */
+int
+dly_queue(int dtype, struct mbuf *m, int dsubtype, uintptr_t data,
+    struct ifnet *ifp)
+{
+	struct dly_item *item;
+	struct m_tag *dtag;
+	DLY_READER;
+
+	/* Ensure we're not going to cycle packet */
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		printf("tag found: %p\n", dtag);
+		return (EINVAL);
+	}
+
+	DLY_RLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_RUNLOCK();
+		printf("invalid dtype: 0..%d..%d\n", dtype, dly.alloc);
+		return (EINVAL);
+	}
+	DLY_RUNLOCK();
+
+	VNET_ASSERT(m->m_pkthdr.rcvif != NULL,
+	    ("%s:%d rcvif == NULL: m=%p", __func__, __LINE__, m));
+
+	/*
+	 * Do not allocate tag for basic IPv4/IPv6 output
+	 */
+	if (dtype != 0) {
+		dtag = m_tag_get(PACKET_TAG_DISPATCH_INFO,
+		    sizeof(struct dly_item), M_NOWAIT);
+	
+		if (dtag == NULL)
+			return (ENOBUFS);
+	
+		item = (struct dly_item *)(dtag + 1);
+	
+		item->type = dtype;
+		item->subtype = dsubtype;
+		item->data = data;
+		item->ifp = ifp;
+	
+		m_tag_prepend(m, dtag);
+	}
+
+	netisr_queue(NETISR_SLOWPATH, m);
+
+	return (0);
+}
+
+/*
+ * Adds mbuf to slowpath queue. User-provided buffer
+ * of size @size is stored inside PACKET_TAG_DISPATCH_INFO
+ * mbuf tag. Buffer structure needs to embed properly filled
+ * dly_item structure at the beginning of buffer. Such buffers
+ * need to be dispatched by the dly_pdispatch() handler.
+ *
+ * Returns 0 if successful, error code otherwise.
+ */
+int
+dly_pqueue(int dtype, struct mbuf *m, struct dly_item *item, size_t size)
+{
+	struct m_tag *dtag;
+	DLY_READER;
+
+	/* Ensure we're not going to cycle packet */
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		return (EINVAL);
+	}
+
+	DLY_RLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_RUNLOCK();
+		return (EINVAL);
+	}
+	DLY_RUNLOCK();
+
+	VNET_ASSERT(m->m_pkthdr.rcvif != NULL,
+	    ("%s:%d rcvif == NULL: m=%p", __func__, __LINE__, m));
+
+	dtag = m_tag_get(PACKET_TAG_DISPATCH_INFO, size, M_NOWAIT);
+	
+	if (dtag == NULL)
+		return (ENOBUFS);
+
+	memcpy(dtag + 1, item, size);
+	m_tag_prepend(m, dtag);
+	netisr_queue(NETISR_SLOWPATH, m);
+
+	return (0);
+}
+
+/*
+ * Base netisr handler for slowpath
+ */
+static void
+dly_dispatch_item(struct mbuf *m)
+{
+	struct m_tag *dtag;
+	struct dly_item *item;
+	int dtype;
+	struct dly_dispatcher *dld;
+	DLY_READER;
+
+	item = NULL;
+	dtype = 0;
+
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		item = (struct dly_item *)(dtag + 1);
+		dtype = item->type;
+	}
+
+	DLY_RLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_RUNLOCK();
+		return;
+	}
+
+	dld = &dly.index[dtype];
+
+	if (dld->dly_dispatch != NULL)
+		dld->dly_dispatch(m, item->subtype, item->data, item->ifp);
+	else
+		dld->dly_pdispatch(m, item);
+
+	DLY_RUNLOCK();
+
+	return;
+}
+
+
+/*
+ * Check if the queued item was received on, or is going to be transmitted
+ * via, the interface being destroyed.
+ */
+static int
+dly_scan_ifp(struct mbuf *m, void *_data)
+{
+	struct m_tag *dtag;
+	struct dly_item *item;
+	struct ifnet *difp;
+
+	difp = (struct ifnet *)_data;
+
+	if (m->m_pkthdr.rcvif == difp)
+		return (1);
+
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		item = (struct dly_item *)(dtag + 1);
+		if (item->ifp == difp)
+			return (1);
+	}
+
+	return (0);
+}
+
+/*
+ * Registers new slowpath handler.
+ * Returns handler id to use in dly_queue() or
+ * dly_pqueue() functions.
+ */
+int
+dly_register(struct dly_dispatcher *dld)
+{
+	int i, alloc;
+	struct dly_dispatcher *dd, *tmp;
+
+again:
+	DLY_WLOCK();
+
+	if (dly.count < dly.alloc) {
+		i = dly.count++;
+		dly.index[i] = *dld;
+		DLY_WUNLOCK();
+		return (i);
+	}
+
+	alloc = dly.alloc + DLY_ALLOC_ITEMS;
+
+	DLY_WUNLOCK();
+
+	/* No spare room, need to increase */
+	dd = malloc(sizeof(struct dly_dispatcher) * alloc, M_TEMP,
+	    M_ZERO|M_WAITOK);
+
+	DLY_WLOCK();
+	if (dly.alloc >= alloc) {
+		/* Lost the race, try again */
+		DLY_WUNLOCK();
+		free(dd, M_TEMP);
+		goto again;
+	}
+
+	memcpy(dd, dly.index, sizeof(struct dly_dispatcher) * dly.alloc);
+	tmp = dly.index;
+	dly.index = dd;
+	dly.alloc = alloc;
+	i = dly.count++;
+	dly.index[i] = *dld;
+	DLY_WUNLOCK();
+
+	free(tmp, M_TEMP);
+
+	return (i);
+}
+
+/*
+ * Checks if given netisr queue item is of type which
+ * needs to be unregistered.
+ */
+static int
+dly_scan_unregistered(struct mbuf *m, void *_data)
+{
+	struct m_tag *dtag;
+	struct dly_item *item;
+	int i;
+
+	i = *((int *)(intptr_t)_data);
+
+	if ((dtag = m_tag_find(m, PACKET_TAG_DISPATCH_INFO, NULL)) != NULL) {
+		item = (struct dly_item *)(dtag + 1);
+		if (item->type == i)
+			return (1);
+	}
+
+	return (0);
+}
+
+/*
+ * Unregisters slow handler registered previously by dly_register().
+ * Caller needs to ensure that no new items of given type can be queued
+ * prior to calling this function.
+ */
+void
+dly_unregister(int dtype)
+{
+
+	netisr_scan(NETISR_SLOWPATH, dly_scan_unregistered, &dtype);
+
+	DLY_WLOCK();
+	if (dtype < 0 || dtype >= dly.alloc || dly.index[dtype].name == NULL) {
+		DLY_WUNLOCK();
+		return;
+	}
+
+	KASSERT(dly.index[dtype].name != NULL,
+	    ("%s: unregistering non-existent protocol %d", __func__, dtype));
+
+	memset(&dly.index[dtype], 0, sizeof(struct dly_dispatcher));
+	DLY_WUNLOCK();
+}
+
+
+static void
+dly_ifdetach(void *arg __unused, struct ifnet *ifp)
+{
+
+	netisr_scan(NETISR_SLOWPATH, dly_scan_ifp, ifp);
+}
+
+static struct netisr_handler	dly_nh = {
+	.nh_name = "slow",
+	.nh_handler = dly_dispatch_item,
+	.nh_proto = NETISR_SLOWPATH,
+	.nh_policy = NETISR_POLICY_SOURCE,
+};
+
+static void
+dly_init(__unused void *arg)
+{
+
+	memset(&dly, 0, sizeof(dly));
+	dly.index = malloc(sizeof(struct dly_dispatcher) * DLY_ALLOC_ITEMS,
+	    M_TEMP, M_ZERO|M_WAITOK);
+	dly.alloc = DLY_ALLOC_ITEMS;
+	dly.count = 1;
+
+	DLY_LOCK_INIT();
+
+	netisr_register(&dly_nh);
+	ifdetach_tag = EVENTHANDLER_REGISTER(ifnet_departure_event,
+	    dly_ifdetach, NULL, EVENTHANDLER_PRI_ANY);
+}
+
+/* Exactly after netisr */
+SYSINIT(dly_init, SI_SUB_SOFTINTR, SI_ORDER_SECOND, dly_init, NULL);
+
--- /dev/null	2014-01-24 00:33:00.000000000 +0400
+++ sys/net/delayed_dispatch.h	2014-01-23 23:54:50.166594749 +0400
@@ -0,0 +1,57 @@
+/*-
+ * Copyright (c) 2014 Alexander V. Chernikov
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD: head/sys/net/netisr.h 222249 2011-05-24 12:34:19Z rwatson $
+ */
+
+#ifndef _NET_DELAYED_DISPATCH_H_
+#define _NET_DELAYED_DISPATCH_H_
+
+struct dly_item {
+	int type;
+	int subtype;
+	struct ifnet *ifp;
+	uintptr_t data;
+};
+
+typedef int dly_dispatch_t(struct mbuf *, int, uintptr_t, struct ifnet *);
+typedef int dly_pdispatch_t(struct mbuf *, struct dly_item *);
+typedef int dly_free_t(struct mbuf *, int, uintptr_t, struct ifnet *);
+
+struct dly_dispatcher {
+	const char	*name;
+	dly_dispatch_t	*dly_dispatch;
+	dly_pdispatch_t	*dly_pdispatch;
+	dly_free_t	*dly_free;
+};
+
+
+int dly_register(struct dly_dispatcher *);
+void dly_unregister(int);
+int dly_queue(int, struct mbuf *, int, uintptr_t, struct ifnet *);
+int dly_pqueue(int, struct mbuf *, struct dly_item *, size_t);
+
+#endif
+

--------------050804080408080705010802--


