From owner-freebsd-net@FreeBSD.ORG Thu Dec 29 17:13:12 2011 Return-Path: Delivered-To: net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CC5A41065670; Thu, 29 Dec 2011 17:13:12 +0000 (UTC) (envelope-from glebius@FreeBSD.org) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.64.117]) by mx1.freebsd.org (Postfix) with ESMTP id 41D4E8FC14; Thu, 29 Dec 2011 17:13:12 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.5/8.14.5) with ESMTP id pBTHDBar019188; Thu, 29 Dec 2011 21:13:11 +0400 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.5/8.14.5/Submit) id pBTHDApD019187; Thu, 29 Dec 2011 21:13:10 +0400 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Thu, 29 Dec 2011 21:13:10 +0400 From: Gleb Smirnoff To: John Baldwin Message-ID: <20111229171310.GB12721@glebius.int.ru> References: <201112221130.01823.jhb@freebsd.org> <20111227041728.GI8035@FreeBSD.org> <201112291136.01889.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <201112291136.01889.jhb@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Robert Watson , net@FreeBSD.org Subject: Re: Transitioning if_addr_lock to an rwlock X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Dec 2011 17:13:13 -0000 On Thu, Dec 29, 2011 at 11:36:01AM -0500, John Baldwin wrote: J> On Monday, December 26, 2011 11:17:28 pm Gleb Smirnoff wrote: J> > On Thu, Dec 22, 2011 at 11:30:01AM -0500, John Baldwin wrote: J> > J> You can find the patch for 8.x at J> > J> http://www.freebsd.org/~jhb/patches/if_addr_rwlock.patch J> > J> > Just my two pennies: for head/ patching if ip_carp.c should J> > be straightforward: J> > J> > 1) Using W in carp_alloc_if() and carp_free_if(). J> > 2) Using R everywhere else. J> J> Yes, this is what I did, but with an extra XXX: J> J> @@ -1512,10 +1512,11 @@ carp_alloc_if(struct ifnet *ifp) J> cif->cif_ifp = ifp; J> TAILQ_INIT(&cif->cif_vrs); J> J> - IF_ADDR_LOCK(ifp); J> + /* XXX: Race, shouldn't this be checking for concurrent calls? */ J> + IF_ADDR_WLOCK(ifp); J> ifp->if_carp = cif; J> if_ref(ifp); J> - IF_ADDR_UNLOCK(ifp); J> + IF_ADDR_WUNLOCK(ifp); J> J> return (cif); J> J> @@ -1534,10 +1535,10 @@ carp_free_if(struct carp_if *cif) J> KASSERT(TAILQ_EMPTY(&cif->cif_vrs), ("%s: softc list not empty", J> __func__)); J> J> - IF_ADDR_LOCK(ifp); J> + IF_ADDR_WLOCK(ifp); J> ifp->if_carp = NULL; J> if_rele(ifp); J> - IF_ADDR_UNLOCK(ifp); J> + IF_ADDR_WUNLOCK(ifp); J> J> CIF_LOCK_DESTROY(cif); J> J> J> Specifically, if two threads both call carp_alloc() at the same time and both J> see a value of NULL for ifp->if_carp (and I really do not like side effects J> like assignments in conditional expressions of if() and while()), then both J> threads can call carp_if_alloc(). Instead, carp_if_alloc() should be doing J> something like this: J> J> IF_ADDR_LOCK(ifp); J> if (ifp->if_carp != NULL) { J> CIF_LOCK_DESTROY(cif); J> free(cif, M_CARP); J> cif = ifp->if_carp; J> } else J> ifp->if_carp = cif; J> IF_ADDR_UNLOCK(ifp); J> J> return (cif); J> J> Similarly, you have a race in the SIOCSVH ioctl handling code. You check J> for a duplicate carp device for a specific (ifnet, vhid), tuple, but J> carp_alloc() doesn't do a recheck when adding the new softc to the cif_vrs J> list. Rather, what it should do is move that loop into carp_alloc() while J> holding the CIF_LOCK() and free the already-created softc and fail J> carp_alloc() if it finds a duplicate. J> J> You also have a race between concurrent carp_alloc() and carp_destroy(). J> Specifically, carp_alloc() might find a cif and be in the process of building J> a new carp softc when a carp_destroy() tears down the cif. The right way to J> fix this is to add a reference count to the cif and have carp_alloc_if() J> always bump the reference count. carp_destroy() would need to drop the J> reference count, but under IF_ADDR_LOCK() and only do a carp_free_if() if the J> count drops to zero. You'd have to grab IF_ADDR_LOCK() in the caller and let J> carp_free_if() unlock it internally. I know all these races. When hacking on new CARP, I understood that trying to avoid them would make code much more hairy. Also, races between two threads that change configuration of networking exist not in CARP only. So I decided to make a sleepable serializer of the ioctl, like I suggested on mail thread about if_cloners, and then you noticed that sx_lock() is already invented and works better :) So the question is still open, I think that we need some clear and generic approach like serializing ifioctl(), instead of making zillions of re-checking and rollbacking when we temporary drop some nerworking lock for malloc(M_WAITOK). As temporary measure we can use this serializing lock not in ifioctl() but in more specific branches of ioctl() call tree, for example in carp_ioctl(). -- Totus tuus, Glebius.