Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 16 Dec 2006 13:19:08 -0600
From:      "Christian S.J. Peron" <csjp@FreeBSD.ORG>
To:        Max Laier <max@love2party.net>
Cc:        avatar@mmlab.cse.yzu.edu.tw, julian@FreeBSD.org, freebsd-pf@freebsd.org
Subject:   Re: debug.mpsafenet=1 vs. user/group rules [Re: kern/106805: ...]
Message-ID:  <458446AC.6080309@FreeBSD.ORG>
In-Reply-To: <200612161709.48875.max@love2party.net>
References:  <200612161335.kBGDZkMj012022@freefall.freebsd.org> <200612161709.48875.max@love2party.net>

next in thread | previous in thread | raw e-mail | index | archive | help
Max,

I thought about doing the exact same thing. There is only one problem, you just
move the LOR from pf<->inpcb to pfil<->inpcb.  One main difference is that pfil
currently uses an rw_lock, which allows for current read access of the filter lists.

(So, in this path, the in/out pcb->pfil/pfil->pcb: one thread will never be
blocked waiting on the firewall chain lock).  Unless of course there is a write
involved. But I can't see where a chain modifier picks up a the inpcb lock.

That said, we still wind up with a nasty LOR message, however it should be
completely harmless in principal. Currently, ipfw uses rw(9) so the LOR
should be completely harmless there, too.

I think the correct approach to this problem is to:

(1) Don't do that

(2) If we are going to do that, drop the pfil(9) locks (somehow) and move the
    pcb lookups "unconditionally" to before we pickup the firewall chain lock.
    Unconditionally might just mean "do this lookup if we have uid/gid rules).

(3) Introduce a PFIL_STATIC kernel option which requires that packet filtering
    modules be registered at boot-time via /boot/loader.conf. This would allow
    us to drop the pfil(9) lock.

I think that the implementation of a PFIL_STATIC kernel option would be useful
regardless, as it would allow us to eliminate two mutex acquisitions in the
fast path for those of us with 'boot-time-static' setups. A similar mechanism
was used in the Mandatory Access Control (MAC) framework.

I have also included Julian on the CC list, as we had some conversations
regarding this too. I think we have enough momentum to get this fixed now.
It's been bugging me for years! :)



Max Laier wrote:
> Okay, spoken too quick ... I just had an idea (enlightment you might say - 
> given the time of year), that might finally get us rid of this symptom 
> (not of the problem though).
>
> Short recap on why this is happening:  Checking socket credentials on the 
> IP layer (where pf lives) is a layering violation.  A useful one, you may 
> argue, but nontheless - it causes a lock order reversal:  In order to 
> walk the pf rules we need to hold the pf lock, in order to walk the 
> socket hash we need to hold the "inp" lock.
>
> The attached diff circumvents the problem by **always** doing the 
> credential lookup *before* walking the pf rules.  This has the benefit, 
> that it works (at least I think it should), but there is a price to pay.  
> Now we have to pay for the socket lookup for *every* tcp and udp packet 
> instead of just for those that really hit uid/gid rules.  That's why I 
> decided to make is a config option "PF_MPFSAFE_UGID" which you can turn 
> on if you are running a setup that will benefit.  The patch turns it on 
> for the module-built by default.
>
> A possible scenario that should benefit is a big iron SMP box running lot 
> of services that you want to filter using *stateful* uid/gid rules.  For 
> this setup where a huge percentage of the packets that are not captured 
> by states eventually match a uid/gid rule, you will even get added 
> parallelism with this patch.
>
> On every other typical setup, it should be better to avoid user/group 
> rules or to disable mpsafenet.
>
> In order for this to hit the tree, I need tests confirming that it really 
> helps and possibly benchmarks that qualify the impact of it.  Thanks.
>
>   
> ------------------------------------------------------------------------
>
> Index: conf/options
> ===================================================================
> RCS file: /usr/store/mlaier/fcvs/src/sys/conf/options,v
> retrieving revision 1.567
> diff -u -r1.567 options
> --- conf/options	10 Dec 2006 04:23:23 -0000	1.567
> +++ conf/options	16 Dec 2006 15:36:08 -0000
> @@ -349,6 +349,7 @@
>  DEV_PF			opt_pf.h
>  DEV_PFLOG		opt_pf.h
>  DEV_PFSYNC		opt_pf.h
> +PF_MPSAFE_UGID		opt_pf.h
>  ETHER_II		opt_ef.h
>  ETHER_8023		opt_ef.h
>  ETHER_8022		opt_ef.h
> Index: contrib/pf/net/pf.c
> ===================================================================
> RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/pf.c,v
> retrieving revision 1.42
> diff -u -r1.42 pf.c
> --- contrib/pf/net/pf.c	22 Oct 2006 11:52:11 -0000	1.42
> +++ contrib/pf/net/pf.c	16 Dec 2006 15:34:52 -0000
> @@ -3032,6 +3032,12 @@
>  		return (PF_DROP);
>  	}
>  
> +#if defined(__FreeBSD__) && defined(PF_MPSAFE_UGID)
> +	PF_UNLOCK();
> +	lookup = pf_socket_lookup(&uid, &gid, direction, pd, inp);
> +	PF_LOCK();
> +#endif
> +
>  	r = TAILQ_FIRST(pf_main_ruleset.rules[PF_RULESET_FILTER].active.ptr);
>  
>  	if (direction == PF_OUT) {
> @@ -3428,6 +3434,12 @@
>  		return (PF_DROP);
>  	}
>  
> +#if defined(__FreeBSD__) && defined(PF_MPSAFE_UGID)
> +	PF_UNLOCK();
> +	lookup = pf_socket_lookup(&uid, &gid, direction, pd, inp);
> +	PF_LOCK();
> +#endif
> +
>  	r = TAILQ_FIRST(pf_main_ruleset.rules[PF_RULESET_FILTER].active.ptr);
>  
>  	if (direction == PF_OUT) {
> Index: modules/pf/Makefile
> ===================================================================
> RCS file: /usr/store/mlaier/fcvs/src/sys/modules/pf/Makefile,v
> retrieving revision 1.12
> diff -u -r1.12 Makefile
> --- modules/pf/Makefile	12 Sep 2006 04:25:12 -0000	1.12
> +++ modules/pf/Makefile	16 Dec 2006 15:41:00 -0000
> @@ -10,7 +10,7 @@
>  	in4_cksum.c \
>  	opt_pf.h opt_inet.h opt_inet6.h opt_bpf.h opt_mac.h
>  
> -CFLAGS+=  -I${.CURDIR}/../../contrib/pf
> +CFLAGS+=  -I${.CURDIR}/../../contrib/pf -DPF_MPSAFE_UGID
>  
>  .if !defined(KERNBUILDDIR)
>  opt_inet.h:
>   




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?458446AC.6080309>