From owner-svn-src-projects@FreeBSD.ORG Mon May 18 17:08:57 2009
Message-Id: <200905181708.n4IH8vPP068227@svn.freebsd.org>
From: Robert Watson <rwatson@FreeBSD.org>
Date: Mon, 18 May 2009 17:08:57 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-projects@freebsd.org
Subject: svn commit: r192309 - projects/pnet/sys/net
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: rwatson
Date: Mon May 18 17:08:57 2009
New Revision: 192309

URL: http://svn.freebsd.org/changeset/base/192309

Log:
  Further refinement of netisr2:

  - Add netisr2_getqlimit() and netisr2_setqlimit() interfaces to allow
    protocols to query and manipulate per-workstream queue depth limits.
    This is required for routing socket and IPv4 netisrs which currently
    offer this functionality.

  - Add netisr2_getqdrops() and netisr2_clearqdrops() interfaces to allow
    protocols to query drops across CPUs, as well as clear drop
    statistics.  This is required for IPv4.

  - Use u_int64_t rather than u_int for stats.

  - Rather than passing in each parameter individually for
    netisr2_register(), netisr2_unregister(), define a public struct
    netisr_handler, with padding, to describe protocols.

  - Explicitly enumerate policies supported by netisr2, rather than
    deriving them from implemented function pointers; this allows multiple
    policies to depend on the same function pointers if desired.  We
    implement three policies now: NETISR_POLICY_SOURCE, NETISR_POLICY_FLOW,
    NETISR_POLICY_CPU.

  - Now that we use swi's, we can acquire the netisr lock around processing
    runs, since the wakeup can be waited for without holding the workstream
    lock.

  - Garbage collect NWS_SWI_BOUND and manual binding with sched_bind(), use
    intr_event_bind() now that this is supported for software interrupt
    threads.

Modified:
  projects/pnet/sys/net/netisr2.c
  projects/pnet/sys/net/netisr2.h
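To make the new registration interface concrete before the diff: under this
change a protocol fills out a struct netisr_handler and hands it to
netisr2_register(), rather than passing each parameter separately.  A minimal
sketch follows; NETISR_FOO, foo_input(), foo_init(), and the include paths are
invented for illustration, while the structure fields, the NETISR_POLICY_*
constant, and the rule that nh_qlimit must be non-zero come from the code
below.

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/mbuf.h>
    #include <net/netisr2.h>    /* Assumed include path in this branch. */

    /* netisr_t handler: receives the packet and consumes it. */
    static void
    foo_input(struct mbuf *m)
    {

            /* ... protocol processing ... */
            m_freem(m);
    }

    static const struct netisr_handler foo_nh = {
            .nh_name = "foo",
            .nh_handler = foo_input,
            .nh_proto = NETISR_FOO,            /* Hypothetical protocol ID. */
            .nh_qlimit = 256,                  /* Per-CPU queue depth limit. */
            .nh_policy = NETISR_POLICY_SOURCE, /* No m2flow/m2cpu needed. */
    };

    static void
    foo_init(void)
    {

            netisr2_register(&foo_nh);
    }

Because the policy is NETISR_POLICY_SOURCE, nh_m2flow and nh_m2cpu are
deliberately left NULL; netisr2_register() asserts exactly that pairing.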
Modified: projects/pnet/sys/net/netisr2.c
==============================================================================
--- projects/pnet/sys/net/netisr2.c	Mon May 18 16:00:18 2009	(r192308)
+++ projects/pnet/sys/net/netisr2.c	Mon May 18 17:08:57 2009	(r192309)
@@ -33,7 +33,7 @@ __FBSDID("$FreeBSD$");
  * registered protocol handlers.  Callers pass a protocol identifier and
  * packet to netisr2, along with a direct dispatch hint, and work will either
  * be immediately processed with the registered handler, or passed to a
- * kernel worker thread for deferred dispatch.
+ * kernel software interrupt (SWI) thread for deferred dispatch.
 *
 * Maintaining ordering for protocol streams is a critical design concern.
 * Enforcing ordering limits the opportunity for concurrency, but maintains
@@ -46,23 +46,10 @@ __FBSDID("$FreeBSD$");
 * to avoid lock migration and contention where locks are associated with
 * more than one flow.
 *
- * There are two cases:
- *
- * - The packet has a flow ID, query the protocol to map it to a CPU and
- *   execute there if not direct dispatching.
- *
- * - The packet has no flowid, query the protocol to generate a flow ID, then
- *   query a CPU and execute there if not direct dispatching.
- *
- * We guarantee that if two packets from the same source have the same
- * protocol, and the source provides an ordering, that ordering will be
- * maintained *unless* the policy is changing between queued and direct
- * dispatch in which case minor re-ordering might occur.
- *
- * Some possible sources of flow identifiers for packets:
- * - Hardware-generated hash from RSS
- * - Software-generated hash from addresses and ports identifying the flow
- * - Interface identifier the packet came from
+ * netisr2 supports several policy variations, represented by the
+ * NETISR_POLICY_* constants, allowing protocols to play a varying role in
+ * identifying flows, assigning work to CPUs, etc.  These are described in
+ * detail in netisr2.h.
 */

 #include "opt_ddb.h"

@@ -79,6 +66,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include

@@ -86,6 +74,8 @@ __FBSDID("$FreeBSD$");
 #include
 #endif

+#include
+#include
 #include
 #include

@@ -130,7 +120,8 @@ SYSCTL_INT(_net_isr2, OID_AUTO, direct,
 * Allow the administrator to limit the number of threads (CPUs) to use for
 * netisr2.  Notice that we don't check netisr_maxthreads before creating the
 * thread for CPU 0, so in practice we ignore values <= 1.  This must be set
- * as a tunable, no run-time reconfiguration yet.
+ * as a tunable, no run-time reconfiguration yet.  We will create at most one
+ * thread per available CPU.
 */
static int netisr_maxthreads = 1;		/* Max number of threads. */
TUNABLE_INT("net.isr2.maxthreads", &netisr_maxthreads);
@@ -142,16 +133,24 @@ TUNABLE_INT("net.isr2.bindthreads", &net
 SYSCTL_INT(_net_isr2, OID_AUTO, bindthreads, CTLFLAG_RD, &netisr_bindthreads,
     0, "Bind netisr2 threads to CPUs.");

+#define	NETISR_MAXQLIMIT	10240
+static int netisr_maxqlimit = NETISR_MAXQLIMIT;
+TUNABLE_INT("net.isr2.maxqlimit", &netisr_maxqlimit);
+SYSCTL_INT(_net_isr2, OID_AUTO, maxqlimit, CTLFLAG_RD, &netisr_maxqlimit,
+    0, "Maximum netisr2 per-protocol, per-CPU queue depth.");
+
 /*
 * Each protocol is described by an instance of netisr_proto, which holds all
 * global per-protocol information.  This data structure is set up by
- * netisr_register().
+ * netisr_register(), and derived from the public struct netisr_handler.
 */
struct netisr_proto {
-	const char	*np_name;	/* Protocol name. */
-	netisr_t	*np_func;	/* Protocol handler. */
-	netisr_m2flow_t	*np_m2flow;	/* mbuf -> flow ID. */
-	netisr_flow2cpu_t *np_flow2cpu;	/* Flow ID -> CPU ID. */
+	const char	*np_name;	/* Character string protocol name. */
+	netisr_t	*np_handler;	/* Protocol handler. */
+	netisr_m2flow_t	*np_m2flow;	/* Query flow for untagged packet. */
+	netisr_m2cpu_t	*np_m2cpu;	/* Query CPU to process packet on. */
+	u_int		 np_qlimit;	/* Maximum per-CPU queue depth. */
+	u_int		 np_policy;	/* Work placement policy. */
};

#define	NETISR_MAXPROT		32	/* Compile-time limit. */
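A usage note on the tunables touched above: because net.isr2.maxthreads and
net.isr2.bindthreads are read-only sysctls backed by TUNABLE_INT, they can
only be set at boot.  A loader.conf sketch, with values invented purely for
illustration:

    # /boot/loader.conf
    net.isr2.maxthreads=4   # limit netisr2 to four worker SWIs
    net.isr2.bindthreads=1  # pin each netisr2 SWI to its CPU

(Note that the original commit accidentally re-registered the
net.isr2.bindthreads tunable for the maxqlimit variable; the rendering above
shows the evidently intended net.isr2.maxqlimit registration.)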
@@ -177,16 +176,16 @@ struct netisr_work {
 	struct mbuf	*nw_head;
 	struct mbuf	*nw_tail;
 	u_int		 nw_len;
-	u_int		 nw_max;
+	u_int		 nw_qlimit;
 	u_int		 nw_watermark;

 	/*
 	 * Statistics -- written unlocked, but mostly from curcpu.
 	 */
-	u_int		 nw_dispatched; /* Number of direct dispatches. */
-	u_int		 nw_dropped;	/* Number of drops. */
-	u_int		 nw_queued;	/* Number of enqueues. */
-	u_int		 nw_handled;	/* Number passed into handler. */
+	u_int64_t	 nw_dispatched; /* Number of direct dispatches. */
+	u_int64_t	 nw_qdrops;	/* Number of drops. */
+	u_int64_t	 nw_queued;	/* Number of enqueues. */
+	u_int64_t	 nw_handled;	/* Number passed into handler. */
 };

 /*
@@ -197,7 +196,7 @@ struct netisr_work {
 * concurrent processing is prevented by the NWS_RUNNING flag, which
 * indicates that a thread is already processing the work queue.
 *
- * Currently, #workstreams must equal #CPUs.
+ * #workstreams must be <= #CPUs.
 */
struct netisr_workstream {
	struct intr_event *nws_intr_event;	/* Handler for stream. */
@@ -205,9 +204,8 @@ struct netisr_workstream {
 	struct mtx	 nws_mtx;		/* Synchronize work. */
 	u_int		 nws_cpu;		/* CPU pinning. */
 	u_int		 nws_flags;		/* Wakeup flags. */
-	u_int		 nws_swi_flags;		/* Flags used in swi. */
-
 	u_int		 nws_pendingwork;	/* Across all protos. */
+
 	/*
 	 * Each protocol has per-workstream data.
 	 */

@@ -239,11 +237,6 @@ static u_int nws_count;
 #define	NWS_SIGNALED	0x00000002	/* Signal issued. */

 /*
- * Flags used internally to the SWI handler -- no locking required.
- */
-#define	NWS_SWI_BOUND	0x00000001	/* SWI bound to CPU. */
-
-/*
 * Synchronization for each workstream: a mutex protects all mutable fields
 * in each stream, including per-protocol state (mbuf queues).  The SWI is
 * woken up if asynchronous dispatch is required.

@@ -268,20 +261,23 @@
 u_int
 netisr2_get_cpuid(u_int cpunumber)
 {

+	KASSERT(cpunumber < nws_count, ("netisr2_get_cpuid: %u > %u",
+	    cpunumber, nws_count));
+
 	return (nws_array[cpunumber]);
 }

 /*
- * The default implementation of (source, flow ID) -> CPU ID mapping.
+ * The default implementation of flow ID -> CPU ID mapping.
+ *
 * Non-static so that protocols can use it to map their own work to specific
 * CPUs in a manner consistent to netisr2 for affinity purposes.
 */
 u_int
-netisr2_default_flow2cpu(uintptr_t source, u_int flowid)
+netisr2_default_flow2cpu(u_int flowid)
 {

-	return (netisr2_get_cpuid((source ^ flowid) %
-	    netisr2_get_cpucount()));
+	return (netisr2_get_cpuid(flowid % nws_count));
 }
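Because netisr2_default_flow2cpu() is deliberately non-static, a protocol can
reuse it to co-locate its own per-flow work with netisr2's placement.  A
minimal sketch of such a use, assuming the caller has already verified that
the mbuf carries a flow ID (the variable names are invented):

    u_int cpu;

    /* Affine per-flow state to the CPU netisr2 would pick for this flow. */
    if (m->m_flags & M_FLOWID)
            cpu = netisr2_default_flow2cpu(m->m_pkthdr.flowid);
    else
            cpu = curcpu;           /* No flow ID: fall back to here. */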

 /*
@@ -290,46 +286,195 @@ netisr2_default_flow2cpu(uintptr_t sourc
 * the protocol is installed.
 */
 void
-netisr2_register(u_int proto, const char *name, netisr_t func,
-    netisr_m2flow_t m2flow, netisr_flow2cpu_t flow2cpu, u_int max)
+netisr2_register(const struct netisr_handler *nhp)
 {
 	struct netisr_work *npwp;
-	int i;
+	const char *name;
+	u_int i, proto;

-	NETISR_WLOCK();
+	proto = nhp->nh_proto;
+	name = nhp->nh_name;
 	KASSERT(proto < NETISR_MAXPROT,
-	    ("netisr2_register(%d, %s): too many protocols", proto, name));
+	    ("netisr2_register(%d, %s): protocol too big", proto, name));
+	NETISR_WLOCK();
+
+	/*
+	 * Test that no existing registration exists for this protocol.
+	 */
 	KASSERT(np[proto].np_name == NULL,
 	    ("netisr2_register(%d, %s): name present", proto, name));
-	KASSERT(np[proto].np_func == NULL,
-	    ("netisr2_register(%d, %s): func present", proto, name));
-	KASSERT(np[proto].np_m2flow == NULL,
-	    ("netisr2_register(%d, %s): m2flow present", proto, name));
-	KASSERT(np[proto].np_flow2cpu == NULL,
-	    ("netisr2_register(%d, %s): flow2cpu present", proto, name));
+	KASSERT(np[proto].np_handler == NULL,
+	    ("netisr2_register(%d, %s): handler present", proto, name));

-	KASSERT(name != NULL, ("netisr2_register: name NULL for %d", proto));
-	KASSERT(func != NULL, ("netisr2_register: func NULL for %s", name));
+	/*
+	 * Test that the requested registration is valid.
+	 */
+	KASSERT(nhp->nh_name != NULL,
+	    ("netisr2_register: nh_name NULL for %d", proto));
+	KASSERT(nhp->nh_handler != NULL,
+	    ("netisr2_register: nh_handler NULL for %s", name));
+	KASSERT(nhp->nh_policy == NETISR_POLICY_SOURCE ||
+	    nhp->nh_policy == NETISR_POLICY_FLOW ||
+	    nhp->nh_policy == NETISR_POLICY_CPU,
+	    ("netisr2_register: unsupported nh_policy %u for %s",
+	    nhp->nh_policy, name));
+	KASSERT(nhp->nh_policy == NETISR_POLICY_FLOW ||
+	    nhp->nh_m2flow == NULL,
+	    ("netisr2_register: nh_policy != FLOW but m2flow defined for %s",
+	    name));
+	KASSERT(nhp->nh_policy == NETISR_POLICY_CPU || nhp->nh_m2cpu == NULL,
+	    ("netisr2_register: nh_policy != CPU but m2cpu defined for %s",
+	    name));
+	KASSERT(nhp->nh_policy != NETISR_POLICY_CPU || nhp->nh_m2cpu != NULL,
+	    ("netisr2_register: nh_policy == CPU but m2cpu not defined for "
+	    "%s", name));
+	KASSERT(nhp->nh_qlimit != 0,
+	    ("netisr2_register: nh_qlimit 0 for %s", name));
+	KASSERT(nhp->nh_qlimit <= netisr_maxqlimit,
+	    ("netisr2_register: nh_qlimit (%u) above max %u for %s",
+	    nhp->nh_qlimit, netisr_maxqlimit, name));

 	/*
 	 * Initialize global and per-workstream protocol state.
 	 */
 	np[proto].np_name = name;
-	np[proto].np_func = func;
-	np[proto].np_m2flow = m2flow;
-	if (flow2cpu != NULL)
-		np[proto].np_flow2cpu = flow2cpu;
-	else
-		np[proto].np_flow2cpu = netisr2_default_flow2cpu;
+	np[proto].np_handler = nhp->nh_handler;
+	np[proto].np_m2flow = nhp->nh_m2flow;
+	np[proto].np_m2cpu = nhp->nh_m2cpu;
+	np[proto].np_qlimit = nhp->nh_qlimit;
+	np[proto].np_policy = nhp->nh_policy;
 	for (i = 0; i < MAXCPU; i++) {
 		npwp = &nws[i].nws_work[proto];
 		bzero(npwp, sizeof(*npwp));
-		npwp->nw_max = max;
+		npwp->nw_qlimit = nhp->nh_qlimit;
 	}
 	NETISR_WUNLOCK();
 }

+/*
+ * Clear drop counters across all workstreams for a protocol.
+ */
+void
+netisr2_clearqdrops(const struct netisr_handler *nhp)
+{
+	struct netisr_work *npwp;
+#ifdef INVARIANTS
+	const char *name;
+#endif
+	u_int i, proto;
+
+	proto = nhp->nh_proto;
+#ifdef INVARIANTS
+	name = nhp->nh_name;
+#endif
+	KASSERT(proto < NETISR_MAXPROT,
+	    ("netisr_clearqdrops(%d): protocol too big for %s", proto, name));
+	NETISR_WLOCK();
+	KASSERT(np[proto].np_handler != NULL,
+	    ("netisr_clearqdrops(%d): protocol not registered for %s", proto,
+	    name));
+
+	for (i = 0; i < MAXCPU; i++) {
+		npwp = &nws[i].nws_work[proto];
+		npwp->nw_qdrops = 0;
+	}
+	NETISR_WUNLOCK();
+}
+
+/*
+ * Query the current drop counters across all workstreams for a protocol.
+ */
+void
+netisr2_getqdrops(const struct netisr_handler *nhp, u_int64_t *qdropp)
+{
+	struct netisr_work *npwp;
+#ifdef INVARIANTS
+	const char *name;
+#endif
+	u_int i, proto;
+
+	*qdropp = 0;
+	proto = nhp->nh_proto;
+#ifdef INVARIANTS
+	name = nhp->nh_name;
+#endif
+	KASSERT(proto < NETISR_MAXPROT,
+	    ("netisr_getqdrops(%d): protocol too big for %s", proto, name));
+	NETISR_RLOCK();
+	KASSERT(np[proto].np_handler != NULL,
+	    ("netisr_getqdrops(%d): protocol not registered for %s", proto,
+	    name));
+
+	for (i = 0; i < MAXCPU; i++) {
+		npwp = &nws[i].nws_work[proto];
+		*qdropp += npwp->nw_qdrops;
+	}
+	NETISR_RUNLOCK();
+}
+
+/*
+ * Query the current queue limit for per-workstream queues for a protocol.
+ */
+void
+netisr2_getqlimit(const struct netisr_handler *nhp, u_int *qlimitp)
+{
+#ifdef INVARIANTS
+	const char *name;
+#endif
+	u_int proto;
+
+	proto = nhp->nh_proto;
+#ifdef INVARIANTS
+	name = nhp->nh_name;
+#endif
+	KASSERT(proto < NETISR_MAXPROT,
+	    ("netisr_getqlimit(%d): protocol too big for %s", proto, name));
+	NETISR_RLOCK();
+	KASSERT(np[proto].np_handler != NULL,
+	    ("netisr_getqlimit(%d): protocol not registered for %s", proto,
+	    name));
+	*qlimitp = np[proto].np_qlimit;
+	NETISR_RUNLOCK();
+}
+
+/*
+ * Update the queue limit across per-workstream queues for a protocol.  We
+ * simply change the limits, and don't drain overflowed packets as they will
+ * (hopefully) take care of themselves shortly.
+ */
+int
+netisr2_setqlimit(const struct netisr_handler *nhp, u_int qlimit)
+{
+	struct netisr_work *npwp;
+#ifdef INVARIANTS
+	const char *name;
+#endif
+	u_int i, proto;
+
+	if (qlimit > netisr_maxqlimit)
+		return (EINVAL);
+
+	proto = nhp->nh_proto;
+#ifdef INVARIANTS
+	name = nhp->nh_name;
+#endif
+	KASSERT(proto < NETISR_MAXPROT,
+	    ("netisr_setqlimit(%d): protocol too big for %s", proto, name));
+	NETISR_WLOCK();
+	KASSERT(np[proto].np_handler != NULL,
+	    ("netisr_setqlimit(%d): protocol not registered for %s", proto,
+	    name));
+
+	np[proto].np_qlimit = qlimit;
+	for (i = 0; i < MAXCPU; i++) {
+		npwp = &nws[i].nws_work[proto];
+		npwp->nw_qlimit = qlimit;
+	}
+	NETISR_WUNLOCK();
+	return (0);
+}
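(The committed text assigned np_qlimit from nhp->nh_qlimit rather than from
the qlimit argument it had just validated; the rendering above shows the
evidently intended assignment.)  Taken together, the four management calls
operate on the same netisr_handler that was passed to netisr2_register().  A
sketch using the hypothetical foo_nh from earlier:

    u_int qlimit;
    u_int64_t qdrops;

    /* Raise the per-CPU queue depth if it is below an invented target. */
    netisr2_getqlimit(&foo_nh, &qlimit);
    if (qlimit < 512 && netisr2_setqlimit(&foo_nh, 512) != 0)
            printf("foo: requested qlimit above net.isr2.maxqlimit\n");

    /* Sum drops across all CPUs, then reset the counters. */
    netisr2_getqdrops(&foo_nh, &qdrops);
    if (qdrops != 0)
            netisr2_clearqdrops(&foo_nh);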

 /*
 * Drain all packets currently held in a particular protocol work queue.
 */
 static void
@@ -356,21 +501,31 @@ netisr2_drain_proto(struct netisr_work *
 * suspended while this takes place.
 */
 void
-netisr2_unregister(u_int proto)
+netisr2_unregister(const struct netisr_handler *nhp)
 {
 	struct netisr_work *npwp;
-	int i;
+#ifdef INVARIANTS
+	const char *name;
+#endif
+	u_int i, proto;

-	NETISR_WLOCK();
+	proto = nhp->nh_proto;
+#ifdef INVARIANTS
+	name = nhp->nh_name;
+#endif
 	KASSERT(proto < NETISR_MAXPROT,
-	    ("netisr_unregister(%d): protocol too big", proto));
-	KASSERT(np[proto].np_func != NULL,
-	    ("netisr_unregister(%d): protocol not registered", proto));
+	    ("netisr_unregister(%d): protocol too big for %s", proto, name));
+	NETISR_WLOCK();
+	KASSERT(np[proto].np_handler != NULL,
+	    ("netisr_unregister(%d): protocol not registered for %s", proto,
+	    name));

 	np[proto].np_name = NULL;
-	np[proto].np_func = NULL;
+	np[proto].np_handler = NULL;
 	np[proto].np_m2flow = NULL;
-	np[proto].np_flow2cpu = NULL;
+	np[proto].np_m2cpu = NULL;
+	np[proto].np_qlimit = 0;
+	np[proto].np_policy = 0;
 	for (i = 0; i < MAXCPU; i++) {
 		npwp = &nws[i].nws_work[proto];
 		netisr2_drain_proto(npwp);
@@ -380,24 +535,15 @@ netisr2_unregister(u_int proto)
 }

 /*
- * Look up the correct stream for a requested flowid.  There are two cases:
- * one in which the caller has requested execution on the current CPU (i.e.,
- * source ordering is sufficient, perhaps because the underlying hardware has
- * generated multiple input queues with sufficient order), or the case in
- * which we ask the protocol to generate a flowid.  In the latter case, we
- * rely on the protocol generating a reasonable distribution across the
- * flowid space, and hence use a very simple mapping from flowids to workers.
- *
- * Because protocols may need to call m_pullup(), they may rewrite parts of
- * the mbuf chain.  As a result, we must return an mbuf chain that is either
- * the old chain (if there is no update) or the new chain (if there is).  NULL
- * is returned if there is a failure in the protocol portion of the lookup
- * (i.e., out of mbufs and a rewrite is required).
+ * Look up the workstream given a packet and source identifier.  Do this by
+ * checking the protocol's policy, and optionally call out to the protocol
+ * for assistance if required.
 */
 static struct mbuf *
 netisr2_selectcpu(struct netisr_proto *npp, uintptr_t source, struct mbuf *m,
     u_int *cpuidp)
 {
+	struct ifnet *ifp;

 	NETISR_LOCK_ASSERT();

@@ -409,19 +555,41 @@ netisr2_selectcpu(struct netisr_proto *n
 		*cpuidp = nws_array[0];
 		return (m);
 	}
-	if (!(m->m_flags & M_FLOWID) && npp->np_m2flow != NULL) {
-		m = npp->np_m2flow(m);
-		if (m == NULL)
-			return (NULL);
-		KASSERT(m->m_flags & M_FLOWID, ("netisr2_selectcpu: protocol"
-		    " %s failed to return flowid on mbuf",
-		    npp->np_name));
+
+	/*
+	 * What happens next depends on the policy selected by the protocol.
+	 * If we want to support per-interface policies, we should do that
+	 * here first.
+	 */
+	switch (npp->np_policy) {
+	case NETISR_POLICY_CPU:
+		return (npp->np_m2cpu(m, source, cpuidp));
+
+	case NETISR_POLICY_FLOW:
+		if (!(m->m_flags & M_FLOWID) && npp->np_m2flow != NULL) {
+			m = npp->np_m2flow(m, source);
+			if (m == NULL)
+				return (NULL);
+		}
+		if (m->m_flags & M_FLOWID) {
+			*cpuidp =
+			    netisr2_default_flow2cpu(m->m_pkthdr.flowid);
+			return (m);
+		}
+		/* FALLTHROUGH */
+
+	case NETISR_POLICY_SOURCE:
+		ifp = m->m_pkthdr.rcvif;
+		if (ifp != NULL)
+			*cpuidp = (ifp->if_index + source) % nws_count;
+		else
+			*cpuidp = source % nws_count;
+		return (m);
+
+	default:
+		panic("netisr2_selectcpu: invalid policy %u for %s",
+		    npp->np_policy, npp->np_name);
 	}
-	if (m->m_flags & M_FLOWID)
-		*cpuidp = npp->np_flow2cpu(source, m->m_pkthdr.flowid);
-	else
-		*cpuidp = npp->np_flow2cpu(source, 0);
-	return (m);
 }

 /*
@@ -471,7 +639,7 @@ netisr2_process_workstream_proto(struct
 		if (local_npw.nw_head == NULL)
 			local_npw.nw_tail = NULL;
 		local_npw.nw_len--;
-		np[proto].np_func(m);
+		np[proto].np_handler(m);
 	}
 	KASSERT(local_npw.nw_len == 0, ("netisr_process_proto(%d): len %d",
 	    proto, local_npw.nw_len));
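For NETISR_POLICY_CPU, the switch in netisr2_selectcpu() above hands placement
entirely to the protocol's nh_m2cpu callback.  A minimal sketch of such a
callback; foo_m2cpu is invented, while the typedef and the
netisr2_get_cpuid()/netisr2_get_cpucount() utility routines come from this
commit:

    static struct mbuf *
    foo_m2cpu(struct mbuf *m, uintptr_t source, u_int *cpuid)
    {

            /* Keep each source on one netisr2 CPU to preserve ordering. */
            *cpuid = netisr2_get_cpuid(source % netisr2_get_cpucount());
            return (m);     /* May return NULL to abort dispatch. */
    }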

@@ -520,23 +688,14 @@ swi_net(void *arg)

 	nwsp = arg;

-	/*
-	 * On first execution, force the ithread to the desired CPU.  There
-	 * should be a better way to do this.
-	 */
-	if (netisr_bindthreads && !(nwsp->nws_swi_flags & NWS_SWI_BOUND)) {
-		thread_lock(curthread);
-		sched_bind(curthread, nwsp->nws_cpu);
-		thread_unlock(curthread);
-		nwsp->nws_swi_flags |= NWS_SWI_BOUND;
-	}
-
+	NETISR_RLOCK();
 	NWS_LOCK(nwsp);
 	nwsp->nws_flags |= NWS_RUNNING;
 	while (nwsp->nws_pendingwork != 0)
 		netisr2_process_workstream(nwsp, NETISR_ALLPROT);
 	nwsp->nws_flags &= ~(NWS_SIGNALED | NWS_RUNNING);
 	NWS_UNLOCK(nwsp);
+	NETISR_RUNLOCK();
 }

 static int
@@ -553,7 +712,7 @@ netisr2_queue_internal(u_int proto, stru
 	nwsp = &nws[cpuid];
 	npwp = &nwsp->nws_work[proto];
 	NWS_LOCK(nwsp);
-	if (npwp->nw_len < npwp->nw_max) {
+	if (npwp->nw_len < npwp->nw_qlimit) {
 		m->m_nextpkt = NULL;
 		if (npwp->nw_head == NULL) {
 			npwp->nw_head = m;

@@ -577,23 +736,23 @@ netisr2_queue_internal(u_int proto, stru
 	if (dosignal)
 		NWS_SIGNAL(nwsp);
 	if (error)
-		npwp->nw_dropped++;
+		npwp->nw_qdrops++;
 	else
 		npwp->nw_queued++;
 	return (error);
 }

 int
-netisr2_queue(u_int proto, uintptr_t source, struct mbuf *m)
+netisr2_queue_src(u_int proto, uintptr_t source, struct mbuf *m)
 {
 	u_int cpuid, error;

 	KASSERT(proto < NETISR_MAXPROT,
-	    ("netisr2_dispatch: invalid proto %d", proto));
+	    ("netisr2_queue_src: invalid proto %d", proto));
 	NETISR_RLOCK();
-	KASSERT(np[proto].np_func != NULL,
-	    ("netisr2_dispatch: invalid proto %d", proto));
+	KASSERT(np[proto].np_handler != NULL,
+	    ("netisr2_queue_src: invalid proto %d", proto));

 	m = netisr2_selectcpu(&np[proto], source, m, &cpuid);
 	if (m != NULL)
@@ -605,33 +764,35 @@ netisr2_queue(u_int proto, uintptr_t sou
 }

 int
-netisr2_queue_if(u_int proto, struct ifnet *ifp, struct mbuf *m)
+netisr2_queue(u_int proto, struct mbuf *m)
 {

-	return (netisr2_queue(proto, (uintptr_t)ifp, m));
+	return (netisr2_queue_src(proto, 0, m));
 }

 int
 netisr_queue(int proto, struct mbuf *m)
 {

-	return (netisr2_queue_if(proto, m->m_pkthdr.rcvif, m));
+	KASSERT(proto >= 0, ("netisr_queue: proto < 0"));
+
+	return (netisr2_queue(proto, m));
 }
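The _src variants let a caller fold its own ordering key into work placement.
For example, a protocol that wants per-connection ordering might queue as
follows; conn is an invented stand-in for whatever kernel pointer identifies
the source, and NETISR_FOO remains hypothetical:

    int error;

    /* Defer work to a netisr2 SWI, keyed by connection for ordering. */
    error = netisr2_queue_src(NETISR_FOO, (uintptr_t)conn, m);
    if (error != 0) {
            /* Queue full: the per-CPU nw_qdrops counter was bumped. */
    }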

 int
-netisr2_dispatch(u_int proto, uintptr_t source, struct mbuf *m)
+netisr2_dispatch_src(u_int proto, uintptr_t source, struct mbuf *m)
 {
 	struct netisr_workstream *nwsp;
 	struct netisr_work *npwp;

 	if (!netisr_direct)
-		return (netisr2_queue(proto, source, m));
-	KASSERT(proto < NETISR_MAXPROT,
-	    ("netisr2_dispatch: invalid proto %d", proto));
+		return (netisr2_queue_src(proto, source, m));
+	KASSERT(proto < NETISR_MAXPROT,
+	    ("netisr2_dispatch_src: invalid proto %u", proto));
 	NETISR_RLOCK();
-	KASSERT(np[proto].np_func != NULL,
-	    ("netisr2_dispatch: invalid proto %d", proto));
+	KASSERT(np[proto].np_handler != NULL,
+	    ("netisr2_dispatch_src: invalid proto %u", proto));

 	/*
 	 * Borrow current CPU's stats, even if there's no worker.
@@ -640,23 +801,25 @@ netisr2_dispatch(u_int proto, uintptr_t
 	npwp = &nwsp->nws_work[proto];
 	npwp->nw_dispatched++;
 	npwp->nw_handled++;
-	np[proto].np_func(m);
+	np[proto].np_handler(m);
 	NETISR_RUNLOCK();
 	return (0);
 }

 int
-netisr2_dispatch_if(u_int proto, struct ifnet *ifp, struct mbuf *m)
+netisr2_dispatch(u_int proto, struct mbuf *m)
 {

-	return (netisr2_dispatch(proto, (uintptr_t)ifp, m));
+	return (netisr2_dispatch_src(proto, 0, m));
 }

 void
 netisr_dispatch(int proto, struct mbuf *m)
 {

-	(void)netisr2_dispatch_if(proto, m->m_pkthdr.rcvif, m);
+	KASSERT(proto >= 0, ("netisr_dispatch: proto < 0"));
+
+	(void)netisr2_dispatch(proto, m);
 }

 static void
@@ -669,14 +832,22 @@ netisr2_start_swi(u_int cpuid, struct pc
 	nwsp = &nws[cpuid];
 	mtx_init(&nwsp->nws_mtx, "netisr2_mtx", NULL, MTX_DEF);
 	nwsp->nws_cpu = cpuid;
-	snprintf(swiname, sizeof(swiname), "netisr2: %d", cpuid);
+	snprintf(swiname, sizeof(swiname), "netisr %d", cpuid);
 	error = swi_add(&nwsp->nws_intr_event, swiname, swi_net, nwsp,
 	    SWI_NET, INTR_MPSAFE, &nwsp->nws_swi_cookie);
 	if (error)
 		panic("netisr2_init: swi_add %d", error);
 	pc->pc_netisr2 = nwsp->nws_intr_event;
+	if (netisr_bindthreads) {
+		error = intr_event_bind(nwsp->nws_intr_event, cpuid);
+		if (error != 0)
+			printf("netisr2_start_swi cpu %d: intr_event_bind: %d",
+			    cpuid, error);
+	}
+	NETISR_WLOCK();
 	nws_array[nws_count] = nwsp->nws_cpu;
 	nws_count++;
+	NETISR_WUNLOCK();
 }

 /*
@@ -741,7 +912,7 @@ DB_SHOW_COMMAND(netisr2, db_show_netisr2
 			continue;
 		first = 1;
 		for (proto = 0; proto < NETISR_MAXPROT; proto++) {
-			if (np[proto].np_func == NULL)
+			if (np[proto].np_handler == NULL)
 				continue;
 			nwp = &nwsp->nws_work[proto];
 			if (first) {
@@ -750,10 +921,10 @@ DB_SHOW_COMMAND(netisr2, db_show_netisr2
 				first = 0;
 			} else
 				db_printf("%6s %6s ", "", "");
-			db_printf("%6s %6d %6d %6d %8d %8d %8d %8d\n",
+			db_printf("%6s %6d %6d %6d %8ju %8ju %8ju %8ju\n",
 			    np[proto].np_name, nwp->nw_len,
-			    nwp->nw_watermark, nwp->nw_max,
-			    nwp->nw_dispatched, nwp->nw_dropped,
+			    nwp->nw_watermark, nwp->nw_qlimit,
+			    nwp->nw_dispatched, nwp->nw_qdrops,
 			    nwp->nw_queued, nwp->nw_handled);
 		}
 	}
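A usage note on the DB_SHOW_COMMAND above: the renamed, now 64-bit counters
remain inspectable from the in-kernel debugger, e.g.:

    db> show netisr2

which prints, per CPU and protocol, the current queue length, watermark, and
qlimit, followed by the dispatched, qdrops, queued, and handled counters.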

Modified: projects/pnet/sys/net/netisr2.h
==============================================================================
--- projects/pnet/sys/net/netisr2.h	Mon May 18 16:00:18 2009	(r192308)
+++ projects/pnet/sys/net/netisr2.h	Mon May 18 17:08:57 2009	(r192309)
@@ -34,59 +34,87 @@
 #endif

 /*-
- * Prototocols express flow and CPU affinities by implementing two functions:
+ * Protocols express ordering constraints and affinity preferences by
+ * implementing one or neither of nh_m2flow and nh_m2cpu, which are used by
+ * netisr2 to determine which per-CPU workstream to assign mbufs to.
 *
- * netisr_m2flow_t - When a packet without M_FLOWID is processed by netisr2,
- *                   it may call into the protocol to generate the missing
- *                   flow ID which should be installed in the packet header.
- *                   If a flow ID cannot be generated, or failure occurs,
- *                   NULL should be returned.
- *
- * netisr_flow2cpu_t - Given a flowid, possibly generated by netisr_m2flow,
- *                     and an optional source identifier (possibly a tid, pcb,
- *                     or kernel pointer), select a CPU to execute the packet
- *                     handler on.  If source isn't used, it will be 0/NULL.
+ * The following policies may be used by protocols:
+ *
+ * NETISR_POLICY_SOURCE - netisr2 should maintain source ordering without
+ *                        advice from the protocol.  netisr2 will ignore any
+ *                        flow IDs present on the mbuf for the purposes of
+ *                        work placement.
+ *
+ * NETISR_POLICY_FLOW - netisr2 should maintain flow ordering as defined by
+ *                      the mbuf header flow ID field.  If the protocol
+ *                      implements nh_m2flow, then netisr2 will query the
+ *                      protocol in the event that the mbuf doesn't have a
+ *                      flow ID, falling back on source ordering.
+ *
+ * NETISR_POLICY_CPU - netisr2 will delegate all work placement decisions to
+ *                     the protocol, querying nh_m2cpu for each packet.
+ *
+ * Protocols might make decisions about work placement based on an existing
+ * calculated flow ID on the mbuf, such as one provided in hardware, the
+ * receive interface pointed to by the mbuf (if any), the optional source
+ * identifier passed at some dispatch points, or even parse packet headers to
+ * calculate a flow.  Both protocol handlers may return a new mbuf pointer
+ * for the chain, or NULL if the packet proves invalid or m_pullup() fails.
 *
 * XXXRW: If we eventually support dynamic reconfiguration, there should be
 * protocol handlers to notify them of CPU configuration changes so that they
 * can rebalance work.
 */
-typedef struct mbuf	*netisr_m2flow_t(struct mbuf *m);
-typedef u_int		 netisr_flow2cpu_t(uintptr_t source, u_int flowid);
+typedef struct mbuf	*netisr_m2cpu_t(struct mbuf *m, uintptr_t source,
+			    u_int *cpuid);
+typedef struct mbuf	*netisr_m2flow_t(struct mbuf *m, uintptr_t source);
+
+#define	NETISR_POLICY_SOURCE	1	/* Maintain source ordering. */
+#define	NETISR_POLICY_FLOW	2	/* Maintain flow ordering. */
+#define	NETISR_POLICY_CPU	3	/* Protocol determines CPU placement. */

-/*-
- * Register a new netisr2 handler for a given protocol.  No previous
- * registration may exist.
- *
- * proto - Integer protocol identifier.
- * name - Character string describing the protocol handler.
- * func - Protocol handler.
- * m2flow - Generate [missing] flowid for mbuf.
- * flow2cpu - Convert a flowid to a CPU affinity.
- * max - Maximum queue depth.
+/*
+ * Data structure describing a protocol handler.
 */
-void	netisr2_register(u_int proto, const char *name, netisr_t func,
-	    netisr_m2flow_t m2flow, netisr_flow2cpu_t flow2cpu, u_int max);
+struct netisr_handler {
+	const char	*nh_name;	/* Character string protocol name. */
+	netisr_t	*nh_handler;	/* Protocol handler. */
+	netisr_m2flow_t	*nh_m2flow;	/* Query flow for untagged packet. */
+	netisr_m2cpu_t	*nh_m2cpu;	/* Query CPU to process packet on. */
+	u_int		 nh_proto;	/* Integer protocol ID. */
+	u_int		 nh_qlimit;	/* Maximum per-CPU queue depth. */
+	u_int		 nh_policy;	/* Work placement policy. */
+	u_int		 nh_ispare[5];	/* For future use. */
+	void		*nh_pspare[4];	/* For future use. */
+};

 /*
- * Unregister a protocol handler.
+ * Register, unregister, and other netisr2 handler management functions.
 */
-void	netisr2_unregister(u_int proto);
+void	netisr2_clearqdrops(const struct netisr_handler *nhp);
+void	netisr2_getqdrops(const struct netisr_handler *nhp,
+	    u_int64_t *qdropsp);
+void	netisr2_getqlimit(const struct netisr_handler *nhp, u_int *qlimitp);
+void	netisr2_register(const struct netisr_handler *nhp);
+int	netisr2_setqlimit(const struct netisr_handler *nhp, u_int qlimit);
+void	netisr2_unregister(const struct netisr_handler *nhp);
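To illustrate NETISR_POLICY_FLOW end to end: a protocol supplies nh_m2flow
only as a fallback for packets arriving without a hardware-assigned flow ID.
A sketch, where foo_input, foo_software_hash, and NETISR_FOO are invented
names standing in for a real protocol's handler, hash function, and ID:

    static struct mbuf *
    foo_m2flow(struct mbuf *m, uintptr_t source)
    {

            /* Invented software hash standing in for a real one. */
            m->m_pkthdr.flowid = foo_software_hash(m);
            m->m_flags |= M_FLOWID;
            return (m);     /* Return NULL if m_pullup() fails. */
    }

    static const struct netisr_handler foo_nh = {
            .nh_name = "foo",
            .nh_handler = foo_input,
            .nh_m2flow = foo_m2flow,
            .nh_proto = NETISR_FOO,
            .nh_qlimit = 256,
            .nh_policy = NETISR_POLICY_FLOW,
    };

With this registration, mbufs that already carry M_FLOWID bypass foo_m2flow
entirely and are placed by netisr2_default_flow2cpu().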

 /*
 * Process a packet destined for a protocol, and attempt direct dispatch.
+ * Supplemental source ordering information can be passed using the _src
+ * variant.
 */
 //int netisr_dispatch(u_int proto, struct mbuf *m);
 //int netisr_queue(u_int proto, struct mbuf *m);
-int	netisr2_dispatch(u_int proto, uintptr_t source, struct mbuf *m);
-int	netisr2_dispatch_if(u_int proto, struct ifnet *ifp, struct mbuf *m);
-int	netisr2_queue(u_int proto, uintptr_t source, struct mbuf *m);
-int	netisr2_queue_if(u_int proto, struct ifnet *ifp, struct mbuf *m);
+int	netisr2_dispatch(u_int proto, struct mbuf *m);
+int	netisr2_dispatch_src(u_int proto, uintptr_t source, struct mbuf *m);
+int	netisr2_queue(u_int proto, struct mbuf *m);
+int	netisr2_queue_src(u_int proto, uintptr_t source, struct mbuf *m);

 /*
- * Provide a default implementation of "map a (source, flow ID) to a CPU ID".
+ * Provide a default implementation of "map a flow ID to a CPU ID".
 */
-u_int	netisr2_default_flow2cpu(uintptr_t source, u_int flowid);
+u_int	netisr2_default_flow2cpu(u_int flowid);

 /*
 * Utility routines to return the number of CPUs participating in netisr2, and