From: Hans Petter Selasky
To: "Alexander V. Chernikov", Adrian Chadd, "freebsd-net@freebsd.org"
Subject: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]
Date: Fri, 11 Dec 2015 10:17:21 +0100
Message-ID: <566A94A1.60400@selasky.org>
In-Reply-To: <2739461446298483@web2h.yandex.ru>

Hi,

Pulling the nail out of the haystack hopefully.

>> Any ideas on where next to look?

Adrian: In your dump as well I see:

la_flags = 1

That means there was a race between calling arptimer() and removing the "lle".

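To make the reading of la_flags = 1 concrete, here is a minimal userspace sketch. It assumes the flag values LLE_DELETED = 0x0001 and LLE_LINKED = 0x0040, modeled on sys/net/if_llatbl.h; check them against your tree. The struct fake_lle type and both helper functions are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Assumed values, modeled on sys/net/if_llatbl.h; verify against your tree. */
#define	LLE_DELETED	0x0001	/* entry must be deleted */
#define	LLE_LINKED	0x0040	/* entry is linked to the lookup structure */

/* Hypothetical stand-in for the single struct llentry field used here. */
struct fake_lle {
	unsigned int la_flags;
};

/* True if the entry has been marked for deletion. */
static bool
lle_is_deleted(const struct fake_lle *lle)
{
	return ((lle->la_flags & LLE_DELETED) != 0);
}

/*
 * True if a timer callback may still dereference the table pointer,
 * i.e. the entry is still linked into its lookup table.
 */
static bool
timer_may_use_table(const struct fake_lle *lle)
{
	return ((lle->la_flags & LLE_LINKED) != 0);
}
```

Under these assumed values, an entry with la_flags = 1 is marked deleted and is no longer linked, so a timer callback that still fires for it must not follow the table pointer.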
Alexander: Can you comment on the following patch:

> Index: netinet/if_ether.c
> ===================================================================
> --- netinet/if_ether.c	(revision 291256)
> +++ netinet/if_ether.c	(working copy)
> @@ -185,7 +185,13 @@
>  		LLE_WUNLOCK(lle);
>  		return;
>  	}
> -	ifp = lle->lle_tbl->llt_ifp;
> +	if (lle->la_flags & LLE_LINKED) {
> +		ifp = lle->lle_tbl->llt_ifp;
> +	} else {
> +		/* XXX RACE entry has been freed */
> +		llentry_free(lle);
> +		return;
> +	}
>  	CURVNET_SET(ifp->if_vnet);
>  
>  	if ((lle->la_flags & LLE_DELETED) == 0) {

From what I can see, we need a check in arptimer() that the lle is still linked before proceeding. Because the callback is not protected by a mutex, it is not atomically stopped by callout_stop().

--HPS