From owner-freebsd-net@FreeBSD.ORG Mon May 3 14:40:25 2010 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5CCDB1065675; Mon, 3 May 2010 14:40:25 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2049E8FC28; Mon, 3 May 2010 14:40:25 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id B84A046B0C; Mon, 3 May 2010 10:40:24 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id C89A88A01F; Mon, 3 May 2010 10:40:23 -0400 (EDT) From: John Baldwin To: net@freebsd.org Date: Mon, 3 May 2010 10:40:05 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201005031040.05100.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Mon, 03 May 2010 10:40:23 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Brooks Davis Subject: Fix deadlock in vlan detach X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 May 2010 14:40:25 -0000 When a vlan interface is torn down in vlan_unconfig(), it removes references to any multicast addresses from the parent device. If any of those attempts fail, then vlan_unconfig() fails and doesn't tear down the device. However, when a "normal" ifnet is detached, it destroys all its multicast addresses in if_detach() before it invokes the ifnet_departure eventhandler. This means that when the vlan eventhandler tries to call vlan_unconfig(), it will fail if multicast has ever been used on the vlan interface as the attempts to release a reference on the multicast address on the parent interface will fail with ENOENT. However, the code does not expect vlan_unconfig() to ever fail, and in fact it will loop forever here: restart: for (i = 0; i < (1 << ifp->if_vlantrunk->hwidth); i++) if ((ifv = LIST_FIRST(&ifp->if_vlantrunk->hash[i]))) { vlan_unconfig_locked(ifv->ifv_ifp); if (ifp->if_vlantrunk) goto restart; /* trunk->hwidth can change */ else break; } due to the 'goto restart'. The fix I came up with was to make vlan_unconfig() simply ignore errors from removing a multicast reference from the parent device and always succeed. I think this is probably more robust anyway. None of the callers of vlan_unconfig() ever check the return value to handle failure anyway. You can trigger this hang by kldunload'ing a network driver where at least one instance has a sub-interface with a multicast address registered. Thoughts? Index: if_vlan.c =================================================================== --- if_vlan.c (revision 207329) +++ if_vlan.c (working copy) @@ -187,8 +187,8 @@ int (*func)(struct ifnet *, int)); static int vlan_setflags(struct ifnet *ifp, int status); static int vlan_setmulti(struct ifnet *ifp); -static int vlan_unconfig(struct ifnet *ifp); -static int vlan_unconfig_locked(struct ifnet *ifp); +static void vlan_unconfig(struct ifnet *ifp); +static void vlan_unconfig_locked(struct ifnet *ifp); static int vlan_config(struct ifvlan *ifv, struct ifnet *p, uint16_t tag); static void vlan_link_state(struct ifnet *ifp); static void vlan_capabilities(struct ifvlan *ifv); @@ -1128,25 +1128,22 @@ return (error); } -static int +static void vlan_unconfig(struct ifnet *ifp) { - int ret; VLAN_LOCK(); - ret = vlan_unconfig_locked(ifp); + vlan_unconfig_locked(ifp); VLAN_UNLOCK(); - return (ret); } -static int +static void vlan_unconfig_locked(struct ifnet *ifp) { struct ifvlantrunk *trunk; struct vlan_mc_entry *mc; struct ifvlan *ifv; struct ifnet *parent; - int error; VLAN_LOCK_ASSERT(); @@ -1175,9 +1172,15 @@ while ((mc = SLIST_FIRST(&ifv->vlan_mc_listhead)) != NULL) { bcopy((char *)&mc->mc_addr, LLADDR(&sdl), ETHER_ADDR_LEN); - error = if_delmulti(parent, (struct sockaddr *)&sdl); - if (error) - return (error); + + /* + * This may fail if the parent interface is + * being detached. Regardless, we should do a + * best effort to free this interface as much + * as possible as all callers expect vlan + * destruction to succeed. + */ + (void)if_delmulti(parent, (struct sockaddr *)&sdl); SLIST_REMOVE_HEAD(&ifv->vlan_mc_listhead, mc_entries); free(mc, M_VLAN); } @@ -1223,8 +1226,6 @@ */ if (parent != NULL) EVENTHANDLER_INVOKE(vlan_unconfig, parent, ifv->ifv_tag); - - return (0); } /* Handle a reference counted flag that should be set on the parent as well */ -- John Baldwin