From owner-freebsd-net@freebsd.org Fri Dec 11 23:26:34 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B3919D7C31 for ; Fri, 11 Dec 2015 23:26:34 +0000 (UTC) (envelope-from rrs@netflix.com) Received: from mail-pa0-x22d.google.com (mail-pa0-x22d.google.com [IPv6:2607:f8b0:400e:c03::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CE90A17FC for ; Fri, 11 Dec 2015 23:26:33 +0000 (UTC) (envelope-from rrs@netflix.com) Received: by padhk6 with SMTP id hk6so32272259pad.2 for ; Fri, 11 Dec 2015 15:26:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=netflix.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to; bh=8TwPPowbt3sfq3u8xwNAibubJmu1iDNGbhaOCZ58I1s=; b=Ly4PVvQ+Hjrhc11PYVKwiMu0PeQ93JTuRrNQF2EPN+GLP2K+m/sN7LeRm+v5Oljm3B BmcvPNk1OSJA6GRIrYMo/x5LMAI96QfYhxu7UlXuc29CWHfjMyOPFGi2QB7Fw6tYq19h VDIVlu6P+9+rNldky16NjG6uQMMiSfFjXVxjc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:cc:message-id:references:to; bh=8TwPPowbt3sfq3u8xwNAibubJmu1iDNGbhaOCZ58I1s=; b=RymojcRnFbf3XfUKTUbAnSM+6IVuI9yovl3kgMLhy5vSdGPy/q7PMjjyWIe48elgco xQo1d5qPlFnUxaaPMZEEBfMjvh1vxKJzglums2kS2ekfD50fYvOkp9YCMwx/aSaQEO9B EUWFgo7zvhG8K1JI8H2QKK6TX5ee19/ljSRFv0t5PzAUGe8cHC32ibWr9UkJxQbaiiyJ A9XFU4xevlVQxxnKKxLZi5+H9q43Rjqthvxgta10NeNp7alLNTFVDni2LQDXOIRGCLs5 xXRXH/ao0rVgdEC6vwJoJeBK3ceMA18bl9J9PJC3suiudI+AJFAL0NpAup8ZrbDHbEVU okuw== X-Gm-Message-State: ALoCoQmbKLqa0BcpPVtoKL0uvs/Y7pq5OrcK4fHHJKYlce8CYEzBivnTVojDr3VqPmeOBhq04RnLaWDxLlXQxFOhSL065lUlMA== X-Received: by 10.66.122.105 with SMTP id lr9mr29042238pab.8.1449876393190; Fri, 11 Dec 2015 15:26:33 -0800 (PST) Received: from [172.20.1.11] ([40.140.7.61]) by smtp.gmail.com with ESMTPSA id qy7sm27698577pab.37.2015.12.11.15.26.30 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 11 Dec 2015 15:26:31 -0800 (PST) Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937] From: Randall Stewart In-Reply-To: Date: Fri, 11 Dec 2015 15:26:29 -0800 Cc: "Alexander V. Chernikov" , Adrian Chadd , freebsd-net , Gleb Smirnoff Message-Id: <91332B46-8CD3-45C0-80D0-AAD29ADD2DE0@netflix.com> References: null <2739461446298483@web2h.yandex.ru> <566A94A1.60400@selasky.org> <2850091449828775@web21o.yandex.ru> <566AB081.8050100@selasky.org> <566ABDAF.7060208@selasky.org> To: Hans Petter Selasky X-Mailer: Apple Mail (2.1878.6) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Dec 2015 23:26:34 -0000 Hans: After talking with Gleb he tells me part of your test is to kldunload a = module. Now I think that is the source of the problem. Probably the cleanup code failed to stop the timer and did the remove.. = thus when the timer expires it blows up. This is not a callout issue.. I think you need to start looking at the = cleanup if you want to pursue this.=20 R On Dec 11, 2015, at 11:13 AM, Randall Stewart wrote: > Hans: >=20 > I don=92t think you are getting a 1 back from the callout_reset()..=20 >=20 > If the pending bit is set, you get a 1 back. But if you have a race = where > the arp-timer is blocked on the lock (held by arp resolve) your going = to > have the pending bit off.. since before calling the function the = callout > code removes pending. >=20 > Callout-reset() is going to return 0 here since it is running and = cannot > be stopped. That should make you *not* decrement your reference. >=20 > The only time you get a 1 back (where you will decrement your = reference) is if > the PENDING flag is set or if you are migrating and the current one = running. The ARP code > does not migrate that I can see. Which means that once you start = running > and block a return of 0 will be done by the callout system. >=20 > Since you check the PENDING flag at the top of the callout, that would = get you=20 > to return if its been reset. >=20 > However I notice down in the arptimer code a unlock/lock/lock is done. = Now again > the pending flag will be gone when that set of calls is made. So we = should > be getting a zero return out of the callout_reset() code. >=20 > Hmm. >=20 > let me mull on this a bit.. any time there is a unlock/lock/lock that = could be > a problem.. >=20 > R >=20 >=20 >=20 >=20 >=20 >=20 >=20 > On Dec 11, 2015, at 4:12 AM, Hans Petter Selasky = wrote: >=20 >> On 12/11/15 12:16, Hans Petter Selasky wrote: >>> On 12/11/15 11:12, Alexander V. Chernikov wrote: >>>> 11.12.2015, 12:15, "Hans Petter Selasky" : >>>>> Hi, >>>>>=20 >>>>> Pulling the nail out of the haystack hopefully. >>>>>=20 >>>>>>> Any ideas on where next to look? >>>>>=20 >>>>> Adrian: In your dump aswell I see: >>>>>=20 >>>>> la_flags =3D 1 >>>>>=20 >>>>> That means there was a race calling arptimer() and removing the = "lle". >>>> Yes. The interesting part here is why lle is removed. There are = quite >>>> a few reasons: either interface address deleted or interface going >>>> down, or explicit delete request. >>>> That's why I asked Adrian about interface stuff (and haven't got a >>>> reply). >>>>>=20 >>>>> Alexander: Can you comment on the following patch: >>>>>=20 >>>>> > Index: netinet/if_ether.c >>>>> > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> > --- netinet/if_ether.c (revision 291256) >>>>> > +++ netinet/if_ether.c (working copy) >>>>> > @@ -185,7 +185,13 @@ >>>>> > LLE_WUNLOCK(lle); >>>>> > return; >>>>> > } >>>>> > - ifp =3D lle->lle_tbl->llt_ifp; >>>>> > + if (lle->la_flags & LLE_LINKED) { >>>>> > + ifp =3D lle->lle_tbl->llt_ifp; >>>>> > + } else { >>>>> > + /* XXX RACE entry has been freed */ >>>>> > + llentry_free(lle); >>>>> > + return; >>>>> > + } >>>>> > CURVNET_SET(ifp->if_vnet); >>>>> > >>>>> > if ((lle->la_flags & LLE_DELETED) =3D=3D 0) { >>>>>=20 >>>>> We need a check in arptimer() that the lle is still linked before >>>> Yes, I had exactly that approach in mind. (And nd6_llinfo_timer() >>>> needs the same fix). >>>> So, would you commit it or should I? >>>>> proceeding, in there from what I can see. Because the callback is = not >>>>> protected by a mutex, it is not atomically stopped by = callout_stop(). >>>=20 >>> Hi, >>>=20 >>> Talking to Randall offlist, I see some more trouble. Let's get >>> everything straight before making a fix. There is one more race I = see: >>>=20 >>> The start of the arptimer() callback looks like this: >>>=20 >>> > static void >>> > arptimer(void *arg) >>> > { >>> POINT0 >>> > LLE_WLOCK(lle); >>> > if (callout_pending(&lle->lle_timer)) { >>> POINT1 >>> > LLE_WUNLOCK(lle); >>> > return; >>> > } >>>=20 >>> The code starting the callback looks like this: >>>=20 >>> > LLE_ADDREF(la); >>> > la->la_expire =3D time_uptime; >>> > canceled =3D callout_reset(&la->lle_timer, hz * >>> V_arpt_down, >>> > arptimer, la); >>> > if (canceled) >>> > LLE_REMREF(la); >>>=20 >>> Which can be written like this: >>>=20 >>> > la->la_expire =3D time_uptime; >>> > canceled =3D callout_reset(&la->lle_timer, hz * V_arpt_down, >>> > arptimer, la); >>> > if (canceled =3D=3D 0) >>> > LLE_ADDREF(la); >>>=20 >>> In case we are at POINT0 in arptimer, callout_reset() will not be = able >>> to cancel the callout and will return 0. We should also drop one ref = at >>> POINT1, or rewrite the code a bit, which might need Randall's help = in >>> the callout subsystem area. >>>=20 >>=20 >> Hi, >>=20 >> Looking at the version history, I see Gleb Smirnoff and Randall, = heavily involved with these code paths. Gleb and Randall, do you have = any comments on the above findings? >>=20 >> Gleb+Randall: Do you agree or disagree there is a race as pointed out = above? >>=20 >> Ways forward: >>=20 >> 1) Revert r278472 (done by Randall) and fix r238990 (done by Gleb) to = use the new callout_async_drain() instead of callout_stop() to avoid = using the rm lock after free. >>=20 >> 2) Use callout_stop() before callout_reset() and add a reference when = the callout was not pending nor completing. We need to use = callout_stop() in this case because callout_reset() only has two return = values and cannot be used to distinguish the three callout states in = use. >>=20 >> 3) Modify callout_reset() to have three return values and fix the = arptimer code to not leak references. >>=20 >> --HPS >=20 > -------- > Randall Stewart > rrs@netflix.com > 803-317-4952 >=20 >=20 >=20 >=20 >=20 -------- Randall Stewart rrs@netflix.com 803-317-4952