From owner-freebsd-net@freebsd.org Fri Dec 11 12:10:44 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 396F49D6EBB for ; Fri, 11 Dec 2015 12:10:44 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mail.turbocat.net (heidi.turbocat.net [88.198.202.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C09AF1D97; Fri, 11 Dec 2015 12:10:43 +0000 (UTC) (envelope-from hps@selasky.org) Received: from laptop015.home.selasky.org (unknown [62.141.129.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id E3C1A1FE023; Fri, 11 Dec 2015 13:10:40 +0100 (CET) Subject: Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937] To: "Alexander V. Chernikov" , Adrian Chadd , "freebsd-net@freebsd.org" , Randall Stewart , Gleb Smirnoff References: null <2739461446298483@web2h.yandex.ru> <566A94A1.60400@selasky.org> <2850091449828775@web21o.yandex.ru> <566AB081.8050100@selasky.org> From: Hans Petter Selasky Message-ID: <566ABDAF.7060208@selasky.org> Date: Fri, 11 Dec 2015 13:12:31 +0100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <566AB081.8050100@selasky.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Dec 2015 12:10:44 -0000 On 12/11/15 12:16, Hans Petter Selasky wrote: > On 12/11/15 11:12, Alexander V. Chernikov wrote: >> 11.12.2015, 12:15, "Hans Petter Selasky" : >>> Hi, >>> >>> Pulling the nail out of the haystack hopefully. >>> >>>>> Any ideas on where next to look? >>> >>> Adrian: In your dump aswell I see: >>> >>> la_flags = 1 >>> >>> That means there was a race calling arptimer() and removing the "lle". >> Yes. The interesting part here is why lle is removed. There are quite >> a few reasons: either interface address deleted or interface going >> down, or explicit delete request. >> That's why I asked Adrian about interface stuff (and haven't got a >> reply). >>> >>> Alexander: Can you comment on the following patch: >>> >>> > Index: netinet/if_ether.c >>> > =================================================================== >>> > --- netinet/if_ether.c (revision 291256) >>> > +++ netinet/if_ether.c (working copy) >>> > @@ -185,7 +185,13 @@ >>> > LLE_WUNLOCK(lle); >>> > return; >>> > } >>> > - ifp = lle->lle_tbl->llt_ifp; >>> > + if (lle->la_flags & LLE_LINKED) { >>> > + ifp = lle->lle_tbl->llt_ifp; >>> > + } else { >>> > + /* XXX RACE entry has been freed */ >>> > + llentry_free(lle); >>> > + return; >>> > + } >>> > CURVNET_SET(ifp->if_vnet); >>> > >>> > if ((lle->la_flags & LLE_DELETED) == 0) { >>> >>> We need a check in arptimer() that the lle is still linked before >> Yes, I had exactly that approach in mind. (And nd6_llinfo_timer() >> needs the same fix). >> So, would you commit it or should I? >>> proceeding, in there from what I can see. Because the callback is not >>> protected by a mutex, it is not atomically stopped by callout_stop(). > > Hi, > > Talking to Randall offlist, I see some more trouble. Let's get > everything straight before making a fix. There is one more race I see: > > The start of the arptimer() callback looks like this: > > > static void > > arptimer(void *arg) > > { > POINT0 > > LLE_WLOCK(lle); > > if (callout_pending(&lle->lle_timer)) { > POINT1 > > LLE_WUNLOCK(lle); > > return; > > } > > The code starting the callback looks like this: > > > LLE_ADDREF(la); > > la->la_expire = time_uptime; > > canceled = callout_reset(&la->lle_timer, hz * > V_arpt_down, > > arptimer, la); > > if (canceled) > > LLE_REMREF(la); > > Which can be written like this: > > > la->la_expire = time_uptime; > > canceled = callout_reset(&la->lle_timer, hz * V_arpt_down, > > arptimer, la); > > if (canceled == 0) > > LLE_ADDREF(la); > > In case we are at POINT0 in arptimer, callout_reset() will not be able > to cancel the callout and will return 0. We should also drop one ref at > POINT1, or rewrite the code a bit, which might need Randall's help in > the callout subsystem area. > Hi, Looking at the version history, I see Gleb Smirnoff and Randall, heavily involved with these code paths. Gleb and Randall, do you have any comments on the above findings? Gleb+Randall: Do you agree or disagree there is a race as pointed out above? Ways forward: 1) Revert r278472 (done by Randall) and fix r238990 (done by Gleb) to use the new callout_async_drain() instead of callout_stop() to avoid using the rm lock after free. 2) Use callout_stop() before callout_reset() and add a reference when the callout was not pending nor completing. We need to use callout_stop() in this case because callout_reset() only has two return values and cannot be used to distinguish the three callout states in use. 3) Modify callout_reset() to have three return values and fix the arptimer code to not leak references. --HPS