From owner-freebsd-current@FreeBSD.ORG Tue Nov 29 14:28:44 2011 Return-Path: Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C3581065675 for ; Tue, 29 Nov 2011 14:28:44 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.64.117]) by mx1.freebsd.org (Postfix) with ESMTP id 997D48FC12 for ; Tue, 29 Nov 2011 14:28:43 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.5/8.14.5) with ESMTP id pATESgtX077233; Tue, 29 Nov 2011 18:28:42 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.5/8.14.5/Submit) id pATESgUm077232; Tue, 29 Nov 2011 18:28:42 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Tue, 29 Nov 2011 18:28:42 +0400 From: Gleb Smirnoff To: Daan Vreeken Message-ID: <20111129142842.GC44498@glebius.int.ru> References: <201111240928.51162.Daan@vitsch.nl> <20111125131935.GP96616@FreeBSD.org> <20111125143257.GR96616@FreeBSD.org> <201111290107.13631.Daan@vitsch.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <201111290107.13631.Daan@vitsch.nl> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: FreeBSD Current Subject: Re: if_clone.c allows creating multiple interfaces with the same name X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 29 Nov 2011 14:28:44 -0000 Daan, On Tue, Nov 29, 2011 at 01:07:13AM +0100, Daan Vreeken wrote: D> Thanks for the looking into this and for your quick commit. I like your twist D> on the patch with the move from the unit bitmap to allocating unit numbers D> with alloc_unr(9). D> D> I do have two comments on the new code though. D> Before, in the !wildcard case, ifc_alloc_unit() would return ENOSPC when the D> user requested a unit number larger than ifc->ifc_maxunit. Now the function D> returns EEXIST in that case because alloc_unr_specific() returns -1 both D> when a number is already allocated and when the number exceeds it's limits. D> This leads to the somewhat confusing: D> D> # ifconfig bridge123456 create D> ifconfig: SIOCIFCREATE2: File exists D> D> Apart from that I'm a bit worried about the "/* XXXGL: yep, it's a unit leak D> */" part of your change. Once a unit number is leaked, there currently seems D> to be no way to ever get it back. In a worst case scenario, where the names of D> multiple interfaces in a row collides with the numbers alloc_unr() returns, an D> entire row of unit numbers could be leaked during the creation of a single D> cloned interface. D> We have a product where cloned interfaces come and go very frequently. I'd D> hate to run out of unit numbers after 'just a few' years of uptime ;-) D> D> I've created a new patch on top of your change that fixes both of these D> problems. Could you have a look at the attached diff? Thanks, I will work on applying it. D> > Considering the second part, that adds locking. Unfortunately, right now we D> > have numerous races in the network configuration ocde. Many SIOCSsomething D> > ioctls can race with each other producing unpredictable results and kernel D> > panics. So, running two ifconfig(8) in parallel is a bad idea today. :( D> > Your patch with IFNET_NAMING_LOCK() just plumbs one race case: a race D> > between two SIOCSIFNAME ioctls. And it doesn't plumb a race between D> > SIOCSIFNAME vs SIOCIFCREATE, because IFNET_NAMING_LOCK() is dropped after D> > unit allocation, but prior to interface attachement to global interface D> > list. D> D> Are you sure? With the patch in the PR, the relevant code in D> ifc_simple_create() would become : D> D> ... D> IFNET_NAMING_LOCK(); D> err = ifc_alloc_unit(ifc, &unit); D> if (err != 0) { D> IFNET_NAMING_UNLOCK(); D> return (err); D> } D> D> err = ifcs->ifcs_create(ifc, unit, params); D> IFNET_NAMING_UNLOCK(); D> if (err != 0) { D> ifc_free_unit(ifc, unit); D> return (err); D> } D> ... D> D> The lock is acquired before the call to ifc_alloc_unit(). D> In the non-error case (e.g. when creating an if_bridge interface) the call D> ends up calling bridge_clone_create(), which calls ether_ifattach(), which D> calls if_attach() -> if_attach_internal() where the ifp is added to the ifnet D> list. D> Only when that's done, the lock is dropped. D> D> Am I overlooking something obvious here? The code above isn't correct since it holds mutex when calling ifcs->ifcs_create(). These methods may (they actually do) call malloc() with M_WAITOK. In general FreeBSD uses M_WAITOK on the configuration code paths, like any SIOCSsomething ioctl. So to do correct protection, you first need to allocate every kind of memory needed, then lock mutex, then run through configuration sequence, then release mutex and in case of error free all allocated memory. Sounds easy, but isn't in practice, especially when several network modules are involved. So I'm still thinking about... D> > From my point of view, we need a generic approach to ioctl() vs ioctl() D> > races, may be some global serializer of all re-configuration requests of D> > interfaces and addresses. ... but several developers have noted that this approach may have some hidden problems. When experimenting with new CARP, I have tried it on the carp_ioctl() without any problems. The idea is simple: static int serializer = 0; static struct mtx serializer_mtx; MTX_SYSINIT("sermtx", &serializer_mtx, "ioctl serializer mutex", MTX_DEF); int foo_ioctl() { mtx_lock(&serializer_mtx); if (serializer != 0) msleep(&serializer, &serializer_mtx, 0, "ioctl", 0); serializer = 1; mtx_unlock(&serializer_mtx); ... any code here, uncluding calls to different lower layers... mtx_lock(&serializer_mtx); serializer = 0; wakeup(&serializer); mtx_unlock(&serializer_mtx); return (error); } I have tried this for carp_ioctl() and it worked fine. You can try it for entire ifioctl() and all its descendants, but be aware of hidden problems :) -- Totus tuus, Glebius.