From owner-freebsd-current@FreeBSD.ORG Wed Nov 30 15:00:34 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 89AB81065672; Wed, 30 Nov 2011 15:00:34 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 4FC548FC17; Wed, 30 Nov 2011 15:00:34 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id DFE4246B09; Wed, 30 Nov 2011 10:00:33 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6AA67B945; Wed, 30 Nov 2011 10:00:33 -0500 (EST) From: John Baldwin To: freebsd-current@freebsd.org Date: Wed, 30 Nov 2011 10:00:32 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p8; KDE/4.5.5; amd64; ; ) References: <201111240928.51162.Daan@vitsch.nl> <201111290107.13631.Daan@vitsch.nl> <20111129142842.GC44498@glebius.int.ru> In-Reply-To: <20111129142842.GC44498@glebius.int.ru> MIME-Version: 1.0 Content-Type: Text/Plain; charset="koi8-r" Content-Transfer-Encoding: 7bit Message-Id: <201111301000.32787.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 30 Nov 2011 10:00:33 -0500 (EST) Cc: Daan Vreeken , Gleb Smirnoff Subject: Re: if_clone.c allows creating multiple interfaces with the same name X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Nov 2011 15:00:34 -0000 On Tuesday, November 29, 2011 9:28:42 am Gleb Smirnoff wrote: > Daan, > > On Tue, Nov 29, 2011 at 01:07:13AM +0100, Daan Vreeken wrote: > D> Thanks for the looking into this and for your quick commit. I like your twist > D> on the patch with the move from the unit bitmap to allocating unit numbers > D> with alloc_unr(9). > D> > D> I do have two comments on the new code though. > D> Before, in the !wildcard case, ifc_alloc_unit() would return ENOSPC when the > D> user requested a unit number larger than ifc->ifc_maxunit. Now the function > D> returns EEXIST in that case because alloc_unr_specific() returns -1 both > D> when a number is already allocated and when the number exceeds it's limits. > D> This leads to the somewhat confusing: > D> > D> # ifconfig bridge123456 create > D> ifconfig: SIOCIFCREATE2: File exists > D> > D> Apart from that I'm a bit worried about the "/* XXXGL: yep, it's a unit leak > D> */" part of your change. Once a unit number is leaked, there currently seems > D> to be no way to ever get it back. In a worst case scenario, where the names of > D> multiple interfaces in a row collides with the numbers alloc_unr() returns, an > D> entire row of unit numbers could be leaked during the creation of a single > D> cloned interface. > D> We have a product where cloned interfaces come and go very frequently. I'd > D> hate to run out of unit numbers after 'just a few' years of uptime ;-) > D> > D> I've created a new patch on top of your change that fixes both of these > D> problems. Could you have a look at the attached diff? > > Thanks, I will work on applying it. > > D> > Considering the second part, that adds locking. Unfortunately, right now we > D> > have numerous races in the network configuration ocde. Many SIOCSsomething > D> > ioctls can race with each other producing unpredictable results and kernel > D> > panics. So, running two ifconfig(8) in parallel is a bad idea today. :( > D> > Your patch with IFNET_NAMING_LOCK() just plumbs one race case: a race > D> > between two SIOCSIFNAME ioctls. And it doesn't plumb a race between > D> > SIOCSIFNAME vs SIOCIFCREATE, because IFNET_NAMING_LOCK() is dropped after > D> > unit allocation, but prior to interface attachement to global interface > D> > list. > D> > D> Are you sure? With the patch in the PR, the relevant code in > D> ifc_simple_create() would become : > D> > D> ... > D> IFNET_NAMING_LOCK(); > D> err = ifc_alloc_unit(ifc, &unit); > D> if (err != 0) { > D> IFNET_NAMING_UNLOCK(); > D> return (err); > D> } > D> > D> err = ifcs->ifcs_create(ifc, unit, params); > D> IFNET_NAMING_UNLOCK(); > D> if (err != 0) { > D> ifc_free_unit(ifc, unit); > D> return (err); > D> } > D> ... > D> > D> The lock is acquired before the call to ifc_alloc_unit(). > D> In the non-error case (e.g. when creating an if_bridge interface) the call > D> ends up calling bridge_clone_create(), which calls ether_ifattach(), which > D> calls if_attach() -> if_attach_internal() where the ifp is added to the ifnet > D> list. > D> Only when that's done, the lock is dropped. > D> > D> Am I overlooking something obvious here? > > The code above isn't correct since it holds mutex when calling ifcs->ifcs_create(). > These methods may (they actually do) call malloc() with M_WAITOK. > > In general FreeBSD uses M_WAITOK on the configuration code paths, like > any SIOCSsomething ioctl. So to do correct protection, you first need to > allocate every kind of memory needed, then lock mutex, then run through configuration > sequence, then release mutex and in case of error free all allocated memory. > Sounds easy, but isn't in practice, especially when several network modules > are involved. > > So I'm still thinking about... > > D> > From my point of view, we need a generic approach to ioctl() vs ioctl() > D> > races, may be some global serializer of all re-configuration requests of > D> > interfaces and addresses. > > ... but several developers have noted that this approach may have some hidden > problems. When experimenting with new CARP, I have tried it on the carp_ioctl() > without any problems. The idea is simple: > > static int serializer = 0; > static struct mtx serializer_mtx; > MTX_SYSINIT("sermtx", &serializer_mtx, "ioctl serializer mutex", MTX_DEF); > > int > foo_ioctl() > { > mtx_lock(&serializer_mtx); > if (serializer != 0) > msleep(&serializer, &serializer_mtx, 0, "ioctl", 0); > serializer = 1; > mtx_unlock(&serializer_mtx); > > ... any code here, uncluding calls to different lower layers... > > mtx_lock(&serializer_mtx); > serializer = 0; > wakeup(&serializer); > mtx_unlock(&serializer_mtx); > > return (error); > } > > I have tried this for carp_ioctl() and it worked fine. You can try it for > entire ifioctl() and all its descendants, but be aware of hidden problems :) Hmm, is this much different in effect than: static struct sx serializer_sx; SX_SYSINIT(...); int foo_ioctl() { sx_xlock(&serializer_sx); ... any code here sx_xunlock(&serializer_sx); } ? Using an sx lock is shorter and also allows WITNESS to catch possible cycles that can be otherwise missed with home-rolled locks. Also, you should really use 'while (serializer)', not if. Currently this doesn't really serialize since you can have this: - three threads all try to run foo_ioctl() - first thread doesn't block and sets serializer 1 - next two threads both block - first thread finishes and does a wakeup - both threads resume and run the 'any code here' bits in parallel (not serialized) If that is not the desired behavior then you need to use a while(). Also, if this really is a bug and not the desired behavior it's even more reason to use existing primitives like sx(9) rather than trying to roll your own locks. -- John Baldwin