Date: Fri, 29 Dec 2006 17:38:09 +0100 From: Max Laier <max@love2party.net> To: "Christian S.J. Peron" <csjp@freebsd.org> Cc: avatar@mmlab.cse.yzu.edu.tw, Robert Nicholas Maxwell Watson <rwatson@FreeBSD.org>, freebsd-pf@freebsd.org Subject: Re: debug.mpsafenet=1 vs. user/group rules [Re: kern/106805: ...] Message-ID: <200612291738.15541.max@love2party.net> In-Reply-To: <45953727.7020405@FreeBSD.ORG> References: <200612161335.kBGDZkMj012022@freefall.freebsd.org> <200612161709.48875.max@love2party.net> <45953727.7020405@FreeBSD.ORG>
next in thread | previous in thread | raw e-mail | index | archive | help
--nextPart2555463.v5q0yGz9D8 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Friday 29 December 2006 16:41, Christian S.J. Peron wrote: > I have replied to this mail and I guess it has been lost, as I have had > no response. Although this technically makes > the problem harmless, all you are doing is moving the lock order > reversal from pf+inpcb to pfil+inpcb. The > only major difference is that we can have multiple readers of the pfil > lock, making the LOR harmless, in this path. I don't necessarily agree that you can have a LOR with non-exclusive=20 locks. As you rightly point out, it will never result in a deadlock=20 situation - as long as there isn't a code path that obtains the exclusive=20 version of the rw_lock and the inpcb lock. > In IPFW, I changed the chain locks to use rw_lock(9), so we can get rid > of the mpsafenet requirement for IPFW. I'm not 100% sure IPFW is safe yet. As we configure via raw sockets there= =20 might be some common locks that are held when we enter ip_fw_ctl, which=20 then picks up the exclusive version of the chain lock. I'm not sure what=20 kind of locks are in play really, though - CC'ing Robert. > I've thought about doing this in IPFW (looking up the ucred if there > are any uid/gid/jail rules) prior to picking up the > chain locks, but realized we could remove the lock ordering issue all > together if we anihilated the pfil lock. As long as you are holding the IPFW rw_lock over the lookup there is a=20 potential problem as described above. The pfil rw_lock doesn't have any=20 exclusive interaction with any of the inpcb locks, so there is no=20 problem - as far as I understand at least. > One idea I had was introducing PFIL_STATIC which requires that modules > wishing to register pfil hooks did so at > boot-time only. Something similar was done for the MAC framework to > avoid having to pickup policy list locks > for each security decision. Hmmm ... how would we implement something like this? Can we easily keep=20 the non-static version? I'd like to be able to develop using modules ;) > This would also have the nice side effect of eliminating a couple of > atomic instructions per packet in the fast path. > Taking this approach along with moving the inpcb lookup prior to the > firewall chain locks allows us to actually > eliminate the lock ordering issues. However, the layering violation > still exists. But it's harmless. =46rom the point of view of the locks the layering violation would be gone,= =20 that's why it is harmless. > However, having said all that. This works, too. > > Max Laier wrote: > > Okay, spoken too quick ... I just had an idea (enlightment you might > > say - given the time of year), that might finally get us rid of this > > symptom (not of the problem though). > > > > Short recap on why this is happening: Checking socket credentials on > > the IP layer (where pf lives) is a layering violation. A useful one, > > you may argue, but nontheless - it causes a lock order reversal: In > > order to walk the pf rules we need to hold the pf lock, in order to > > walk the socket hash we need to hold the "inp" lock. > > > > The attached diff circumvents the problem by **always** doing the > > credential lookup *before* walking the pf rules. This has the > > benefit, that it works (at least I think it should), but there is a > > price to pay. Now we have to pay for the socket lookup for *every* > > tcp and udp packet instead of just for those that really hit uid/gid > > rules. That's why I decided to make is a config option > > "PF_MPFSAFE_UGID" which you can turn on if you are running a setup > > that will benefit. The patch turns it on for the module-built by > > default. > > > > A possible scenario that should benefit is a big iron SMP box running > > lot of services that you want to filter using *stateful* uid/gid > > rules. For this setup where a huge percentage of the packets that > > are not captured by states eventually match a uid/gid rule, you will > > even get added parallelism with this patch. > > > > On every other typical setup, it should be better to avoid user/group > > rules or to disable mpsafenet. > > > > In order for this to hit the tree, I need tests confirming that it > > really helps and possibly benchmarks that qualify the impact of it.=20 > > Thanks. > > > > > > --------------------------------------------------------------------- > >--- > > > > Index: conf/options > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > RCS file: /usr/store/mlaier/fcvs/src/sys/conf/options,v > > retrieving revision 1.567 > > diff -u -r1.567 options > > --- conf/options 10 Dec 2006 04:23:23 -0000 1.567 > > +++ conf/options 16 Dec 2006 15:36:08 -0000 > > @@ -349,6 +349,7 @@ > > DEV_PF opt_pf.h > > DEV_PFLOG opt_pf.h > > DEV_PFSYNC opt_pf.h > > +PF_MPSAFE_UGID opt_pf.h > > ETHER_II opt_ef.h > > ETHER_8023 opt_ef.h > > ETHER_8022 opt_ef.h > > Index: contrib/pf/net/pf.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/pf.c,v > > retrieving revision 1.42 > > diff -u -r1.42 pf.c > > --- contrib/pf/net/pf.c 22 Oct 2006 11:52:11 -0000 1.42 > > +++ contrib/pf/net/pf.c 16 Dec 2006 15:34:52 -0000 > > @@ -3032,6 +3032,12 @@ > > return (PF_DROP); > > } > > > > +#if defined(__FreeBSD__) && defined(PF_MPSAFE_UGID) > > + PF_UNLOCK(); > > + lookup =3D pf_socket_lookup(&uid, &gid, direction, pd, inp); > > + PF_LOCK(); > > +#endif > > + > > r =3D > > TAILQ_FIRST(pf_main_ruleset.rules[PF_RULESET_FILTER].active.ptr); > > > > if (direction =3D=3D PF_OUT) { > > @@ -3428,6 +3434,12 @@ > > return (PF_DROP); > > } > > > > +#if defined(__FreeBSD__) && defined(PF_MPSAFE_UGID) > > + PF_UNLOCK(); > > + lookup =3D pf_socket_lookup(&uid, &gid, direction, pd, inp); > > + PF_LOCK(); > > +#endif > > + > > r =3D > > TAILQ_FIRST(pf_main_ruleset.rules[PF_RULESET_FILTER].active.ptr); > > > > if (direction =3D=3D PF_OUT) { > > Index: modules/pf/Makefile > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > RCS file: /usr/store/mlaier/fcvs/src/sys/modules/pf/Makefile,v > > retrieving revision 1.12 > > diff -u -r1.12 Makefile > > --- modules/pf/Makefile 12 Sep 2006 04:25:12 -0000 1.12 > > +++ modules/pf/Makefile 16 Dec 2006 15:41:00 -0000 > > @@ -10,7 +10,7 @@ > > in4_cksum.c \ > > opt_pf.h opt_inet.h opt_inet6.h opt_bpf.h opt_mac.h > > > > -CFLAGS+=3D -I${.CURDIR}/../../contrib/pf > > +CFLAGS+=3D -I${.CURDIR}/../../contrib/pf -DPF_MPSAFE_UGID > > > > .if !defined(KERNBUILDDIR) > > opt_inet.h: =2D-=20 /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News --nextPart2555463.v5q0yGz9D8 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (FreeBSD) iD8DBQBFlUR3XyyEoT62BG0RAiA3AJ4qb0gqEIsb3AIj640qZRDMLi/oDgCfZdtc QIO//w7adUsYFgpiszzA5lM= =V0yt -----END PGP SIGNATURE----- --nextPart2555463.v5q0yGz9D8--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200612291738.15541.max>