Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 18 Jul 2012 18:34:01 +0400
From:      "Alexander V. Chernikov" <melifaro@FreeBSD.org>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Doug Barton <dougb@freebsd.org>, Gleb Smirnoff <glebius@FreeBSD.org>, net@freebsd.org
Subject:   Re: FreeBSD 10G forwarding performance @Intel
Message-ID:  <5006C959.5060506@FreeBSD.org>
In-Reply-To: <50053297.8060008@FreeBSD.org>
References:  <4FF36438.2030902@FreeBSD.org> <4FF3E2C4.7050701@FreeBSD.org> <4FF3FB14.8020006@FreeBSD.org> <4FF402D1.4000505@FreeBSD.org> <20120704091241.GA99164@onelab2.iet.unipi.it> <4FF412B9.3000406@FreeBSD.org> <20120704154856.GC3680@onelab2.iet.unipi.it> <4FF59955.5090406@FreeBSD.org> <20120706061126.GA65432@onelab2.iet.unipi.it> <500452A5.3070501@FreeBSD.org> <20120716212249.GA14607@onelab2.iet.unipi.it> <50052419.7010601@FreeBSD.org> <50053297.8060008@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------090809000406030208080301
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 17.07.2012 13:38, Alexander V. Chernikov wrote:
> On 17.07.2012 12:36, Alexander V. Chernikov wrote:
>> On 17.07.2012 01:22, Luigi Rizzo wrote:
>>> On Mon, Jul 16, 2012 at 09:43:01PM +0400, Alexander V. Chernikov wrote:
>>>> On 06.07.2012 10:11, Luigi Rizzo wrote:
>>>>> On Thu, Jul 05, 2012 at 05:40:37PM +0400, Alexander V. Chernikov
>>>>> wrote:
>>>>>> On 04.07.2012 19:48, Luigi Rizzo wrote:
>>
>>> well, it seems that the counters are costing some 10% which is
>>> not negligible (60ns per packet according to your test).
>>> Also i'd be curious if you get better savings if you
>>> have actual conflicts on the rulesets (e.g. what happens
>>> with a ruleset that has, say, ten "count ip from any to any" rules) ?
>>
>> It is a bit difficult to get _exact_ performance numbers since 0.5% of
>> linerate is ~ 70kpps, however
>>
>> 1.98 MPPS
>> >> net.inet.ip.fw.update_counters=1
>> >> net.inet.ip.fw.enable=1
>> 1.67 MPPS
>>
>> .. And here it is time to check ipfw rmlock performance another time,
>> since we're acquiring recursive rmlock (pfil) and rwlock (ipfw) twice.

Try 2:

We're acquiring pfil read lock every time while we run ipfw L3 hooks.

Idea is to
1) make single pfil lock per vnet (instead of locking IPv4/IPv6 parts 
separately)
2) Use this lock in every ipfw vnet instance. Use spare u32 (on amd64) 
in ip_fw_args for indicating lock status.
If we're called from pfil_run_hooks than we don't need to acquire read 
lock at all.
If we're called from L2 code, than we use the same lock to achieve read 
locking (so ipfw effectively moves to rmlock).

Advantage is the following:
    2616383     0     0  172782796    2618097     0  104959874     0
             input          (ix0)           output
    packets  errs idrops      bytes    packets  errs      bytes colls
    2616555     0     0  172800298    2614022     0  240882076     0
sysctl net.inet.ip.fw.enable=1
    2579013 15020     0  172117070    2573027     0  103303742     0
    2482512 133018     0  172839764    2482611     0  100384486     0
    2488521 128616     0  172833196    2488436     0  100658014     0
    2484547 132775     0  172998262    2485739     0  359364892     0
..
    2484005     0     0  200574354    2490405     0  164732208     0

So, now overhead is 120-130kpps instead of ~240 with dual locking.

Patch attached.

>
> Merged r234648 for optimized rmlocks + conversion code attached.
>
> .. No significant improvement:
>
>
> The same 240kpps loss as in first test.
>
> 2123542 0 0 140309220 2125793 0 191227258 0
> 2125316 0 0 140167332 2119988 0 140427764 0
> 2121195 0 0 140353100 2126753 0 140264154 0
> 1995783 64677 0 137285216 1987194 0 132939934 0
> 1929111 170532 0 139248154 1927044 0 127285506 0
> 1905240 213804 0 140059092 1906731 0 78969668 0
> 1908064 219536 0 140444988 1907710 0 173344532 0
> 1906948 218554 0 140365550 1906791 0 79015778 0
> 1910961 214023 0 140395374 1911046 0 173126686 0
> 1913418 211265 0 140275990 1913272 0 126596628 0
>
> Another try without these changes:
>
> 2141665 0 0 141533920 2143295 0 141662810 0
> 2141647 0 0 141536144 2142662 0 89849076 0
> 2143523 0 0 113655814 2142859 0 89734352 0
> 2024970 1528 0 164095778 2011710 0 241024066 0
> 1905563 239180 0 141749498 1909889 0 83090452 0
> 1906343 237516 0 141518736 1903779 0 168894600 0
> 1907890 235413 0 141558608 1907365 0 126036784 0
> ...
> 1894339 1267 0 125099880 1892903 0 125032806 0
> 1885989 0 0 124627036 1894571 0 82456512 0
> 1889637 0 0 124735294 1890231 0 167681224 0
>
>
>
>
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"


-- 
WBR, Alexander

--------------090809000406030208080301
Content-Type: text/plain; charset=UTF-8;
 name="ipfw_rmlock2.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="ipfw_rmlock2.diff"

diff --git a/sys/net/if_ethersubr.c b/sys/net/if_ethersubr.c
index 3231a8b..a357e87 100644
--- a/sys/net/if_ethersubr.c
+++ b/sys/net/if_ethersubr.c
@@ -41,6 +41,7 @@
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/lock.h>
+#include <sys/rmlock.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
 #include <sys/mbuf.h>
diff --git a/sys/net/pfil.c b/sys/net/pfil.c
index df0e30f..0819ddc 100644
--- a/sys/net/pfil.c
+++ b/sys/net/pfil.c
@@ -61,6 +61,8 @@ static int pfil_list_remove(pfil_list_t *,
 LIST_HEAD(pfilheadhead, pfil_head);
 VNET_DEFINE(struct pfilheadhead, pfil_head_list);
 #define	V_pfil_head_list	VNET(pfil_head_list)
+VNET_DEFINE(struct rmlock, pfil_lock);
+#define	V_pfil_lock	VNET(pfil_lock)
 
 /*
  * pfil_run_hooks() runs the specified packet filter hooks.
@@ -153,6 +155,12 @@ pfil_head_get(int type, u_long val)
 	return (ph);
 }
 
+struct rmlock *
+pfil_lock_get()
+{
+	return &V_pfil_lock;
+}
+
 /*
  * pfil_add_hook() adds a function to the packet filter hook.  the
  * flags are:
@@ -294,6 +302,7 @@ static int
 vnet_pfil_init(const void *unused)
 {
 	LIST_INIT(&V_pfil_head_list);
+	rm_init_flags(&V_pfil_lock, "PFil hook read/write mutex", RM_RECURSE);
 	return (0);
 }
 
@@ -304,6 +313,7 @@ static int
 vnet_pfil_uninit(const void *unused)
 {
 	/*  XXX should panic if list is not empty */
+	rm_destroy(&V_pfil_lock);
 	return 0;
 }
 
diff --git a/sys/net/pfil.h b/sys/net/pfil.h
index 6ac750a..6fd49fd 100644
--- a/sys/net/pfil.h
+++ b/sys/net/pfil.h
@@ -43,6 +43,7 @@ struct mbuf;
 struct ifnet;
 struct inpcb;
 
+
 /*
  * The packet filter hooks are designed for anything to call them to
  * possibly intercept the packet.
@@ -69,7 +70,7 @@ struct pfil_head {
 	pfil_list_t	ph_out;
 	int		ph_type;
 	int		ph_nhooks;
-	struct rmlock	ph_lock;
+	struct rmlock	*ph_plock;
 	union {
 		u_long		phu_val;
 		void		*phu_ptr;
@@ -90,8 +91,10 @@ int	pfil_head_register(struct pfil_head *);
 int	pfil_head_unregister(struct pfil_head *);
 
 struct pfil_head *pfil_head_get(int, u_long);
+struct rmlock *pfil_lock_get(void);
 
 #define	PFIL_HOOKED(p) ((p)->ph_nhooks > 0)
+#if 0
 #define	PFIL_LOCK_INIT(p) \
     rm_init_flags(&(p)->ph_lock, "PFil hook read/write mutex", RM_RECURSE)
 #define	PFIL_LOCK_DESTROY(p) rm_destroy(&(p)->ph_lock)
@@ -99,6 +102,15 @@ struct pfil_head *pfil_head_get(int, u_long);
 #define PFIL_WLOCK(p) rm_wlock(&(p)->ph_lock)
 #define PFIL_RUNLOCK(p, t) rm_runlock(&(p)->ph_lock, (t))
 #define PFIL_WUNLOCK(p) rm_wunlock(&(p)->ph_lock)
+#endif
+#define	PFIL_LOCK_INIT(p) \
+    (p)->ph_plock = &V_pfil_lock
+#define	PFIL_LOCK_DESTROY(p)
+#define PFIL_RLOCK(p, t) rm_rlock((p)->ph_plock, (t))
+#define PFIL_WLOCK(p) rm_wlock((p)->ph_plock)
+#define PFIL_RUNLOCK(p, t) rm_runlock((p)->ph_plock, (t))
+#define PFIL_WUNLOCK(p) rm_wunlock((p)->ph_plock)
+
 #define PFIL_LIST_LOCK() mtx_lock(&pfil_global_lock)
 #define PFIL_LIST_UNLOCK() mtx_unlock(&pfil_global_lock)
 
diff --git a/sys/netinet/ipfw/ip_fw2.c b/sys/netinet/ipfw/ip_fw2.c
index 7bb73b0..f995eda 100644
--- a/sys/netinet/ipfw/ip_fw2.c
+++ b/sys/netinet/ipfw/ip_fw2.c
@@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/priv.h>
 #include <sys/proc.h>
 #include <sys/rwlock.h>
+#include <sys/rmlock.h>
 #include <sys/socket.h>
 #include <sys/socketvar.h>
 #include <sys/sysctl.h>
@@ -62,6 +63,7 @@ __FBSDID("$FreeBSD$");
 #include <net/route.h>
 #include <net/pf_mtag.h>
 #include <net/vnet.h>
+#include <net/pfil.h>
 
 #include <netinet/in.h>
 #include <netinet/in_var.h>
@@ -912,6 +914,8 @@ ipfw_chk(struct ip_fw_args *args)
 
 	int done = 0;		/* flag to exit the outer loop */
 
+	struct rm_priotracker	tracker;
+
 	if (m->m_flags & M_SKIP_FIREWALL || (! V_ipfw_vnet_ready))
 		return (IP_FW_PASS);	/* accept */
 
@@ -1178,9 +1182,11 @@ do {								\
 		args->f_id.dst_port = dst_port = ntohs(dst_port);
 	}
 
-	IPFW_RLOCK(chain);
+	if (args->locked == 0)
+		IPFW_RLOCK(chain);
 	if (! V_ipfw_vnet_ready) { /* shutting down, leave NOW. */
-		IPFW_RUNLOCK(chain);
+		if (args->locked == 0)
+			IPFW_RUNLOCK(chain);
 		return (IP_FW_PASS);	/* accept */
 	}
 	if (args->rule.slot) {
@@ -2394,7 +2400,8 @@ do {								\
 		retval = IP_FW_DENY;
 		printf("ipfw: ouch!, skip past end of rules, denying packet\n");
 	}
-	IPFW_RUNLOCK(chain);
+	if (args->locked == 0)
+		IPFW_RUNLOCK(chain);
 #ifdef __FreeBSD__
 	if (ucred_cache != NULL)
 		crfree(ucred_cache);
diff --git a/sys/netinet/ipfw/ip_fw_nat.c b/sys/netinet/ipfw/ip_fw_nat.c
index dbeb254..67faa44 100644
--- a/sys/netinet/ipfw/ip_fw_nat.c
+++ b/sys/netinet/ipfw/ip_fw_nat.c
@@ -35,6 +35,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/lock.h>
 #include <sys/module.h>
 #include <sys/rwlock.h>
+#include <sys/rmlock.h>
 
 #define        IPFW_INTERNAL   /* Access to protected data structures in ip_fw.h. */
 
@@ -210,6 +211,7 @@ ipfw_nat(struct ip_fw_args *args, struct cfg_nat *t, struct mbuf *m)
 	int ldt, retval, found;
 	struct ip_fw_chain *chain;
 	char *c;
+	struct rm_priotracker	tracker;
 
 	ldt = 0;
 	retval = 0;
@@ -268,7 +270,8 @@ ipfw_nat(struct ip_fw_args *args, struct cfg_nat *t, struct mbuf *m)
 
 		found = 0;
 		chain = &V_layer3_chain;
-		IPFW_RLOCK(chain);
+		if (args->locked == 0)
+			IPFW_RLOCK(chain);
 		/* Check every nat entry... */
 		LIST_FOREACH(t, &chain->nat, _next) {
 			if ((t->mode & PKT_ALIAS_SKIP_GLOBAL) != 0)
@@ -281,7 +284,8 @@ ipfw_nat(struct ip_fw_args *args, struct cfg_nat *t, struct mbuf *m)
 				break;
 			}
 		}
-		IPFW_RUNLOCK(chain);
+		if (args->locked == 0)
+			IPFW_RUNLOCK(chain);
 		if (found != 1) {
 			/* No instance found, return ignore */
 			args->m = mcl;
@@ -493,6 +497,7 @@ ipfw_nat_get_cfg(struct sockopt *sopt)
 	struct cfg_spool *s;
 	char *data;
 	int gencnt, nat_cnt, len, error;
+	struct rm_priotracker	tracker;
 
 	nat_cnt = 0;
 	len = sizeof(nat_cnt);
@@ -551,6 +556,7 @@ ipfw_nat_get_log(struct sockopt *sopt)
 	struct cfg_nat *ptr;
 	int i, size;
 	struct ip_fw_chain *chain;
+	struct rm_priotracker	tracker;
 
 	chain = &V_layer3_chain;
 
diff --git a/sys/netinet/ipfw/ip_fw_pfil.c b/sys/netinet/ipfw/ip_fw_pfil.c
index c6d7688..6e6764e 100644
--- a/sys/netinet/ipfw/ip_fw_pfil.c
+++ b/sys/netinet/ipfw/ip_fw_pfil.c
@@ -143,6 +143,8 @@ again:
 	args.m = *m0;
 	args.oif = dir == DIR_OUT ? ifp : NULL;
 	args.inp = inp;
+	/* We are locked by VNET pfil lock */
+	args.locked = 1;
 
 	ipfw = ipfw_chk(&args);
 	*m0 = args.m;
diff --git a/sys/netinet/ipfw/ip_fw_private.h b/sys/netinet/ipfw/ip_fw_private.h
index 46d011c..977661f 100644
--- a/sys/netinet/ipfw/ip_fw_private.h
+++ b/sys/netinet/ipfw/ip_fw_private.h
@@ -95,6 +95,7 @@ struct ip_fw_args {
 	 * Otherwise, we locate the first rule >= rulenum:rule_id
 	 */
 	struct ipfw_rule_ref rule;	/* match/restart info		*/
+	uint32_t	locked;		/* True if chain is already locked */
 
 	struct ether_header *eh;	/* for bridged packets		*/
 
@@ -226,7 +227,7 @@ struct ip_fw_chain {
 	spinlock_t rwmtx;
 	spinlock_t uh_lock;
 #else
-	struct rwlock	rwmtx;
+	struct rmlock	*rm_plock;	/* Pointer to pfil lock */
 	struct rwlock	uh_lock;	/* lock for upper half */
 #endif
 	uint32_t	id;		/* ruleset id */
@@ -240,6 +241,7 @@ struct sockopt;	/* used by tcp_var.h */
  * so the variable and the macros must be here.
  */
 
+#if 0
 #define	IPFW_LOCK_INIT(_chain) do {			\
 	rw_init(&(_chain)->rwmtx, "IPFW static rules");	\
 	rw_init(&(_chain)->uh_lock, "IPFW UH lock");	\
@@ -256,6 +258,23 @@ struct sockopt;	/* used by tcp_var.h */
 #define IPFW_RUNLOCK(p) rw_runlock(&(p)->rwmtx)
 #define IPFW_WLOCK(p) rw_wlock(&(p)->rwmtx)
 #define IPFW_WUNLOCK(p) rw_wunlock(&(p)->rwmtx)
+#endif
+#define	IPFW_LOCK_INIT(_chain) do {			\
+	(_chain)->rm_plock = pfil_lock_get();		\
+	rw_init(&(_chain)->uh_lock, "IPFW UH lock");	\
+	} while (0)
+
+#define	IPFW_LOCK_DESTROY(_chain) do {			\
+	(_chain)->rm_plock = NULL;			\
+	rw_destroy(&(_chain)->uh_lock);			\
+	} while (0)
+
+#define	IPFW_WLOCK_ASSERT(_chain)	rm_wowned((_chain)->rm_plock)
+
+#define IPFW_RLOCK(p) rm_rlock((p)->rm_plock, &tracker)
+#define IPFW_RUNLOCK(p) rm_runlock((p)->rm_plock, &tracker)
+#define IPFW_WLOCK(p) rm_wlock((p)->rm_plock)
+#define IPFW_WUNLOCK(p) rm_wunlock((p)->rm_plock)
 
 #define IPFW_UH_RLOCK(p) rw_rlock(&(p)->uh_lock)
 #define IPFW_UH_RUNLOCK(p) rw_runlock(&(p)->uh_lock)
diff --git a/sys/netinet/ipfw/ip_fw_sockopt.c b/sys/netinet/ipfw/ip_fw_sockopt.c
index 0aecc2c..e744429 100644
--- a/sys/netinet/ipfw/ip_fw_sockopt.c
+++ b/sys/netinet/ipfw/ip_fw_sockopt.c
@@ -49,6 +49,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/priv.h>
 #include <sys/proc.h>
 #include <sys/rwlock.h>
+#include <sys/rmlock.h>
 #include <sys/socket.h>
 #include <sys/socketvar.h>
 #include <sys/sysctl.h>
@@ -943,6 +944,7 @@ ipfw_ctl(struct sockopt *sopt)
 	u_int32_t rulenum[2];
 	uint32_t opt;
 	char xbuf[128];
+	struct rm_priotracker	tracker;
 	ip_fw3_opheader *op3 = NULL;
 
 	error = priv_check(sopt->sopt_td, PRIV_NETINET_IPFW);
diff --git a/sys/netinet/ipfw/ip_fw_table.c b/sys/netinet/ipfw/ip_fw_table.c
index 7545a29..4519472 100644
--- a/sys/netinet/ipfw/ip_fw_table.c
+++ b/sys/netinet/ipfw/ip_fw_table.c
@@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/rwlock.h>
+#include <sys/rmlock.h>
 #include <sys/socket.h>
 #include <net/if.h>	/* ip_fw.h requires IFNAMSIZ */
 #include <net/radix.h>

--------------090809000406030208080301--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5006C959.5060506>