From owner-freebsd-current@FreeBSD.ORG  Wed Nov 30 15:00:34 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 89AB81065672;
	Wed, 30 Nov 2011 15:00:34 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 4FC548FC17;
	Wed, 30 Nov 2011 15:00:34 +0000 (UTC)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170])
	by cyrus.watson.org (Postfix) with ESMTPSA id DFE4246B09;
	Wed, 30 Nov 2011 10:00:33 -0500 (EST)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6AA67B945;
	Wed, 30 Nov 2011 10:00:33 -0500 (EST)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Date: Wed, 30 Nov 2011 10:00:32 -0500
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p8; KDE/4.5.5; amd64; ; )
References: <201111240928.51162.Daan@vitsch.nl>
	<201111290107.13631.Daan@vitsch.nl>
	<20111129142842.GC44498@glebius.int.ru>
In-Reply-To: <20111129142842.GC44498@glebius.int.ru>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="koi8-r"
Content-Transfer-Encoding: 7bit
Message-Id: <201111301000.32787.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(bigwig.baldwin.cx); Wed, 30 Nov 2011 10:00:33 -0500 (EST)
Cc: Daan Vreeken <Daan@vitsch.nl>, Gleb Smirnoff <glebius@freebsd.org>
Subject: Re: if_clone.c allows creating multiple interfaces with the same
	name
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Nov 2011 15:00:34 -0000

On Tuesday, November 29, 2011 9:28:42 am Gleb Smirnoff wrote:
>   Daan,
> 
> On Tue, Nov 29, 2011 at 01:07:13AM +0100, Daan Vreeken wrote:
> D> Thanks for the looking into this and for your quick commit. I like your twist
> D> on the patch with the move from the unit bitmap to allocating unit numbers
> D> with alloc_unr(9).
> D> 
> D> I do have two comments on the new code though.
> D> Before, in the !wildcard case, ifc_alloc_unit() would return ENOSPC when the
> D> user requested a unit number larger than ifc->ifc_maxunit. Now the function
> D> returns EEXIST in that case because alloc_unr_specific() returns -1 both
> D> when a number is already allocated and when the number exceeds it's limits.
> D> This leads to the somewhat confusing:
> D> 
> D>         # ifconfig bridge123456 create
> D>         ifconfig: SIOCIFCREATE2: File exists
> D> 
> D> Apart from that I'm a bit worried about the "/* XXXGL: yep, it's a unit leak
> D> */" part of your change. Once a unit number is leaked, there currently seems
> D> to be no way to ever get it back. In a worst case scenario, where the names of
> D> multiple interfaces in a row collides with the numbers alloc_unr() returns, an
> D> entire row of unit numbers could be leaked during the creation of a single
> D> cloned interface.
> D> We have a product where cloned interfaces come and go very frequently. I'd
> D> hate to run out of unit numbers after 'just a few' years of uptime ;-)
> D> 
> D> I've created a new patch on top of your change that fixes both of these
> D> problems. Could you have a look at the attached diff?
> 
> Thanks, I will work on applying it.
> 
> D> > Considering the second part, that adds locking. Unfortunately, right now we
> D> > have numerous races in the network configuration ocde. Many SIOCSsomething
> D> > ioctls can race with each other producing unpredictable results and kernel
> D> > panics. So, running two ifconfig(8) in parallel is a bad idea today. :(
> D> > Your patch with IFNET_NAMING_LOCK() just plumbs one race case: a race
> D> > between two SIOCSIFNAME ioctls. And it doesn't plumb a race between
> D> > SIOCSIFNAME vs SIOCIFCREATE, because IFNET_NAMING_LOCK() is dropped after
> D> > unit allocation, but prior to interface attachement to global interface
> D> > list.
> D> 
> D> Are you sure? With the patch in the PR, the relevant code in 
> D> ifc_simple_create() would become :
> D> 
> D> 	...
> D> 	IFNET_NAMING_LOCK();
> D> 	err = ifc_alloc_unit(ifc, &unit);
> D> 	if (err != 0) {
> D> 		IFNET_NAMING_UNLOCK();
> D> 		return (err);
> D> 	}
> D> 
> D> 	err = ifcs->ifcs_create(ifc, unit, params);
> D> 	IFNET_NAMING_UNLOCK();
> D> 	if (err != 0) {
> D> 		ifc_free_unit(ifc, unit);
> D> 		return (err);
> D> 	}
> D> 	...
> D> 
> D> The lock is acquired before the call to ifc_alloc_unit().
> D> In the non-error case (e.g. when creating an if_bridge interface) the call
> D> ends up calling bridge_clone_create(), which calls ether_ifattach(), which
> D> calls if_attach() -> if_attach_internal() where the ifp is added to the ifnet
> D> list.
> D> Only when that's done, the lock is dropped.
> D> 
> D> Am I overlooking something obvious here?
> 
> The code above isn't correct since it holds mutex when calling ifcs->ifcs_create().
> These methods may (they actually do) call malloc() with M_WAITOK.
> 
> In general FreeBSD uses M_WAITOK on the configuration code paths, like
> any SIOCSsomething ioctl. So to do correct protection, you first need to
> allocate every kind of memory needed, then lock mutex, then run through configuration
> sequence, then release mutex and in case of error free all allocated memory.
> Sounds easy, but isn't in practice, especially when several network modules
> are involved.
> 
> So I'm still thinking about...
> 
> D> > From my point of view, we need a generic approach to ioctl() vs ioctl()
> D> > races, may be some global serializer of all re-configuration requests of
> D> > interfaces and addresses.
> 
> ... but several developers have noted that this approach may have some hidden
> problems.  When experimenting with new CARP, I have tried it on the carp_ioctl()
> without any problems. The idea is simple:
> 
> static int serializer = 0;
> static struct mtx serializer_mtx;
> MTX_SYSINIT("sermtx", &serializer_mtx, "ioctl serializer mutex", MTX_DEF);
> 
> int
> foo_ioctl()
> {
> 	mtx_lock(&serializer_mtx);
> 	if (serializer != 0)
> 		msleep(&serializer, &serializer_mtx, 0, "ioctl", 0);
> 	serializer = 1;
> 	mtx_unlock(&serializer_mtx);
> 
> 	... any code here, uncluding calls to different lower layers...
> 
> 	mtx_lock(&serializer_mtx);
> 	serializer = 0;
> 	wakeup(&serializer);
> 	mtx_unlock(&serializer_mtx);
> 
> 	return (error);
> }
> 
> I have tried this for carp_ioctl() and it worked fine. You can try it for
> entire ifioctl() and all its descendants, but be aware of hidden problems :)

Hmm, is this much different in effect than:

static struct sx serializer_sx;
SX_SYSINIT(...);

int
foo_ioctl()
{
	sx_xlock(&serializer_sx);

	... any code here

	sx_xunlock(&serializer_sx);
}

?

Using an sx lock is shorter and also allows WITNESS to catch possible cycles
that can be otherwise missed with home-rolled locks.

Also, you should really use 'while (serializer)', not if.  Currently this doesn't
really serialize since you can have this:

- three threads all try to run foo_ioctl()
- first thread doesn't block and sets serializer 1
- next two threads both block
- first thread finishes and does a wakeup
- both threads resume and run the 'any code here' bits in parallel
  (not serialized)

If that is not the desired behavior then you need to use a while().  Also, if
this really is a bug and not the desired behavior it's even more reason to use
existing primitives like sx(9) rather than trying to roll your own locks.

-- 
John Baldwin