Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 30 Nov 2011 10:00:32 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Daan Vreeken <Daan@vitsch.nl>, Gleb Smirnoff <glebius@freebsd.org>
Subject:   Re: if_clone.c allows creating multiple interfaces with the same name
Message-ID:  <201111301000.32787.jhb@freebsd.org>
In-Reply-To: <20111129142842.GC44498@glebius.int.ru>
References:  <201111240928.51162.Daan@vitsch.nl> <201111290107.13631.Daan@vitsch.nl> <20111129142842.GC44498@glebius.int.ru>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tuesday, November 29, 2011 9:28:42 am Gleb Smirnoff wrote:
>   Daan,
> 
> On Tue, Nov 29, 2011 at 01:07:13AM +0100, Daan Vreeken wrote:
> D> Thanks for the looking into this and for your quick commit. I like your twist
> D> on the patch with the move from the unit bitmap to allocating unit numbers
> D> with alloc_unr(9).
> D> 
> D> I do have two comments on the new code though.
> D> Before, in the !wildcard case, ifc_alloc_unit() would return ENOSPC when the
> D> user requested a unit number larger than ifc->ifc_maxunit. Now the function
> D> returns EEXIST in that case because alloc_unr_specific() returns -1 both
> D> when a number is already allocated and when the number exceeds it's limits.
> D> This leads to the somewhat confusing:
> D> 
> D>         # ifconfig bridge123456 create
> D>         ifconfig: SIOCIFCREATE2: File exists
> D> 
> D> Apart from that I'm a bit worried about the "/* XXXGL: yep, it's a unit leak
> D> */" part of your change. Once a unit number is leaked, there currently seems
> D> to be no way to ever get it back. In a worst case scenario, where the names of
> D> multiple interfaces in a row collides with the numbers alloc_unr() returns, an
> D> entire row of unit numbers could be leaked during the creation of a single
> D> cloned interface.
> D> We have a product where cloned interfaces come and go very frequently. I'd
> D> hate to run out of unit numbers after 'just a few' years of uptime ;-)
> D> 
> D> I've created a new patch on top of your change that fixes both of these
> D> problems. Could you have a look at the attached diff?
> 
> Thanks, I will work on applying it.
> 
> D> > Considering the second part, that adds locking. Unfortunately, right now we
> D> > have numerous races in the network configuration ocde. Many SIOCSsomething
> D> > ioctls can race with each other producing unpredictable results and kernel
> D> > panics. So, running two ifconfig(8) in parallel is a bad idea today. :(
> D> > Your patch with IFNET_NAMING_LOCK() just plumbs one race case: a race
> D> > between two SIOCSIFNAME ioctls. And it doesn't plumb a race between
> D> > SIOCSIFNAME vs SIOCIFCREATE, because IFNET_NAMING_LOCK() is dropped after
> D> > unit allocation, but prior to interface attachement to global interface
> D> > list.
> D> 
> D> Are you sure? With the patch in the PR, the relevant code in 
> D> ifc_simple_create() would become :
> D> 
> D> 	...
> D> 	IFNET_NAMING_LOCK();
> D> 	err = ifc_alloc_unit(ifc, &unit);
> D> 	if (err != 0) {
> D> 		IFNET_NAMING_UNLOCK();
> D> 		return (err);
> D> 	}
> D> 
> D> 	err = ifcs->ifcs_create(ifc, unit, params);
> D> 	IFNET_NAMING_UNLOCK();
> D> 	if (err != 0) {
> D> 		ifc_free_unit(ifc, unit);
> D> 		return (err);
> D> 	}
> D> 	...
> D> 
> D> The lock is acquired before the call to ifc_alloc_unit().
> D> In the non-error case (e.g. when creating an if_bridge interface) the call
> D> ends up calling bridge_clone_create(), which calls ether_ifattach(), which
> D> calls if_attach() -> if_attach_internal() where the ifp is added to the ifnet
> D> list.
> D> Only when that's done, the lock is dropped.
> D> 
> D> Am I overlooking something obvious here?
> 
> The code above isn't correct since it holds mutex when calling ifcs->ifcs_create().
> These methods may (they actually do) call malloc() with M_WAITOK.
> 
> In general FreeBSD uses M_WAITOK on the configuration code paths, like
> any SIOCSsomething ioctl. So to do correct protection, you first need to
> allocate every kind of memory needed, then lock mutex, then run through configuration
> sequence, then release mutex and in case of error free all allocated memory.
> Sounds easy, but isn't in practice, especially when several network modules
> are involved.
> 
> So I'm still thinking about...
> 
> D> > From my point of view, we need a generic approach to ioctl() vs ioctl()
> D> > races, may be some global serializer of all re-configuration requests of
> D> > interfaces and addresses.
> 
> ... but several developers have noted that this approach may have some hidden
> problems.  When experimenting with new CARP, I have tried it on the carp_ioctl()
> without any problems. The idea is simple:
> 
> static int serializer = 0;
> static struct mtx serializer_mtx;
> MTX_SYSINIT("sermtx", &serializer_mtx, "ioctl serializer mutex", MTX_DEF);
> 
> int
> foo_ioctl()
> {
> 	mtx_lock(&serializer_mtx);
> 	if (serializer != 0)
> 		msleep(&serializer, &serializer_mtx, 0, "ioctl", 0);
> 	serializer = 1;
> 	mtx_unlock(&serializer_mtx);
> 
> 	... any code here, uncluding calls to different lower layers...
> 
> 	mtx_lock(&serializer_mtx);
> 	serializer = 0;
> 	wakeup(&serializer);
> 	mtx_unlock(&serializer_mtx);
> 
> 	return (error);
> }
> 
> I have tried this for carp_ioctl() and it worked fine. You can try it for
> entire ifioctl() and all its descendants, but be aware of hidden problems :)

Hmm, is this much different in effect than:

static struct sx serializer_sx;
SX_SYSINIT(...);

int
foo_ioctl()
{
	sx_xlock(&serializer_sx);

	... any code here

	sx_xunlock(&serializer_sx);
}

?

Using an sx lock is shorter and also allows WITNESS to catch possible cycles
that can be otherwise missed with home-rolled locks.

Also, you should really use 'while (serializer)', not if.  Currently this doesn't
really serialize since you can have this:

- three threads all try to run foo_ioctl()
- first thread doesn't block and sets serializer 1
- next two threads both block
- first thread finishes and does a wakeup
- both threads resume and run the 'any code here' bits in parallel
  (not serialized)

If that is not the desired behavior then you need to use a while().  Also, if
this really is a bug and not the desired behavior it's even more reason to use
existing primitives like sx(9) rather than trying to roll your own locks.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201111301000.32787.jhb>