Date: Fri, 11 Dec 2015 11:13:58 -0800 From: Randall Stewart <rrs@netflix.com> To: Hans Petter Selasky <hps@selasky.org> Cc: "Alexander V. Chernikov" <melifaro@ipfw.ru>, Adrian Chadd <adrian@freebsd.org>, freebsd-net <freebsd-net@freebsd.org>, Gleb Smirnoff <glebius@FreeBSD.org> Subject: Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937] Message-ID: <FC434E75-442B-4E3D-9FFE-936C54823E19@netflix.com> In-Reply-To: <566ABDAF.7060208@selasky.org> References: null <CAJ-VmonvVyTNuYv_as41yPCFdfR5T3FE45DP9MKAc-eyzXzPUg@mail.gmail.com> <2739461446298483@web2h.yandex.ru> <566A94A1.60400@selasky.org> <2850091449828775@web21o.yandex.ru> <566AB081.8050100@selasky.org> <566ABDAF.7060208@selasky.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Hans: I don=92t think you are getting a 1 back from the callout_reset()..=20 If the pending bit is set, you get a 1 back. But if you have a race = where the arp-timer is blocked on the lock (held by arp resolve) your going to have the pending bit off.. since before calling the function the callout code removes pending. Callout-reset() is going to return 0 here since it is running and cannot be stopped. That should make you *not* decrement your reference. The only time you get a 1 back (where you will decrement your reference) = is if the PENDING flag is set or if you are migrating and the current one = running. The ARP code does not migrate that I can see. Which means that once you start = running and block a return of 0 will be done by the callout system. Since you check the PENDING flag at the top of the callout, that would = get you=20 to return if its been reset. However I notice down in the arptimer code a unlock/lock/lock is done. = Now again the pending flag will be gone when that set of calls is made. So we = should be getting a zero return out of the callout_reset() code. Hmm. let me mull on this a bit.. any time there is a unlock/lock/lock that = could be a problem.. R On Dec 11, 2015, at 4:12 AM, Hans Petter Selasky <hps@selasky.org> = wrote: > On 12/11/15 12:16, Hans Petter Selasky wrote: >> On 12/11/15 11:12, Alexander V. Chernikov wrote: >>> 11.12.2015, 12:15, "Hans Petter Selasky" <hps@selasky.org>: >>>> Hi, >>>>=20 >>>> Pulling the nail out of the haystack hopefully. >>>>=20 >>>>>> Any ideas on where next to look? >>>>=20 >>>> Adrian: In your dump aswell I see: >>>>=20 >>>> la_flags =3D 1 >>>>=20 >>>> That means there was a race calling arptimer() and removing the = "lle". >>> Yes. The interesting part here is why lle is removed. There are = quite >>> a few reasons: either interface address deleted or interface going >>> down, or explicit delete request. >>> That's why I asked Adrian about interface stuff (and haven't got a >>> reply). >>>>=20 >>>> Alexander: Can you comment on the following patch: >>>>=20 >>>> > Index: netinet/if_ether.c >>>> > = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>> > --- netinet/if_ether.c (revision 291256) >>>> > +++ netinet/if_ether.c (working copy) >>>> > @@ -185,7 +185,13 @@ >>>> > LLE_WUNLOCK(lle); >>>> > return; >>>> > } >>>> > - ifp =3D lle->lle_tbl->llt_ifp; >>>> > + if (lle->la_flags & LLE_LINKED) { >>>> > + ifp =3D lle->lle_tbl->llt_ifp; >>>> > + } else { >>>> > + /* XXX RACE entry has been freed */ >>>> > + llentry_free(lle); >>>> > + return; >>>> > + } >>>> > CURVNET_SET(ifp->if_vnet); >>>> > >>>> > if ((lle->la_flags & LLE_DELETED) =3D=3D 0) { >>>>=20 >>>> We need a check in arptimer() that the lle is still linked before >>> Yes, I had exactly that approach in mind. (And nd6_llinfo_timer() >>> needs the same fix). >>> So, would you commit it or should I? >>>> proceeding, in there from what I can see. Because the callback is = not >>>> protected by a mutex, it is not atomically stopped by = callout_stop(). >>=20 >> Hi, >>=20 >> Talking to Randall offlist, I see some more trouble. Let's get >> everything straight before making a fix. There is one more race I = see: >>=20 >> The start of the arptimer() callback looks like this: >>=20 >> > static void >> > arptimer(void *arg) >> > { >> POINT0 >> > LLE_WLOCK(lle); >> > if (callout_pending(&lle->lle_timer)) { >> POINT1 >> > LLE_WUNLOCK(lle); >> > return; >> > } >>=20 >> The code starting the callback looks like this: >>=20 >> > LLE_ADDREF(la); >> > la->la_expire =3D time_uptime; >> > canceled =3D callout_reset(&la->lle_timer, hz * >> V_arpt_down, >> > arptimer, la); >> > if (canceled) >> > LLE_REMREF(la); >>=20 >> Which can be written like this: >>=20 >> > la->la_expire =3D time_uptime; >> > canceled =3D callout_reset(&la->lle_timer, hz * V_arpt_down, >> > arptimer, la); >> > if (canceled =3D=3D 0) >> > LLE_ADDREF(la); >>=20 >> In case we are at POINT0 in arptimer, callout_reset() will not be = able >> to cancel the callout and will return 0. We should also drop one ref = at >> POINT1, or rewrite the code a bit, which might need Randall's help in >> the callout subsystem area. >>=20 >=20 > Hi, >=20 > Looking at the version history, I see Gleb Smirnoff and Randall, = heavily involved with these code paths. Gleb and Randall, do you have = any comments on the above findings? >=20 > Gleb+Randall: Do you agree or disagree there is a race as pointed out = above? >=20 > Ways forward: >=20 > 1) Revert r278472 (done by Randall) and fix r238990 (done by Gleb) to = use the new callout_async_drain() instead of callout_stop() to avoid = using the rm lock after free. >=20 > 2) Use callout_stop() before callout_reset() and add a reference when = the callout was not pending nor completing. We need to use = callout_stop() in this case because callout_reset() only has two return = values and cannot be used to distinguish the three callout states in = use. >=20 > 3) Modify callout_reset() to have three return values and fix the = arptimer code to not leak references. >=20 > --HPS -------- Randall Stewart rrs@netflix.com 803-317-4952
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FC434E75-442B-4E3D-9FFE-936C54823E19>