Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 11 Dec 2015 15:26:29 -0800
From:      Randall Stewart <rrs@netflix.com>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        "Alexander V. Chernikov" <melifaro@ipfw.ru>, Adrian Chadd <adrian@freebsd.org>, freebsd-net <freebsd-net@freebsd.org>, Gleb Smirnoff <glebius@FreeBSD.org>
Subject:   Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]
Message-ID:  <91332B46-8CD3-45C0-80D0-AAD29ADD2DE0@netflix.com>
In-Reply-To: <FC434E75-442B-4E3D-9FFE-936C54823E19@netflix.com>
References:  null <CAJ-VmonvVyTNuYv_as41yPCFdfR5T3FE45DP9MKAc-eyzXzPUg@mail.gmail.com> <2739461446298483@web2h.yandex.ru> <566A94A1.60400@selasky.org> <2850091449828775@web21o.yandex.ru> <566AB081.8050100@selasky.org> <566ABDAF.7060208@selasky.org> <FC434E75-442B-4E3D-9FFE-936C54823E19@netflix.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hans:

After talking with Gleb he tells me part of your test is to kldunload a =
module.

Now I think that is the source of the problem.

Probably the cleanup code failed to stop the timer and did the remove.. =
thus
when the timer expires it blows up.

This is not a callout issue.. I think you need to start looking at the =
cleanup if you
want to pursue this.=20

R
On Dec 11, 2015, at 11:13 AM, Randall Stewart <rrs@netflix.com> wrote:

> Hans:
>=20
> I don=92t think you are getting a 1 back from the callout_reset()..=20
>=20
> If the pending bit is set, you get a 1 back. But if you have a race =
where
> the arp-timer is blocked on the lock (held by arp resolve) your going =
to
> have the pending bit off.. since before calling the function the =
callout
> code removes pending.
>=20
> Callout-reset() is going to return 0 here since it is running and =
cannot
> be stopped. That should make you *not* decrement your reference.
>=20
> The only time you get a 1 back (where you will decrement your =
reference) is if
> the PENDING flag is set or if you are migrating and the current one =
running. The ARP code
> does not migrate that I can see.  Which means that once you start =
running
> and block a return of 0 will be done by the callout system.
>=20
> Since you check the PENDING flag at the top of the callout, that would =
get you=20
> to return if its been reset.
>=20
> However I notice down in the arptimer code a unlock/lock/lock is done. =
Now again
> the pending flag will be gone when that set of calls is made. So we =
should
> be getting a zero return out of the callout_reset() code.
>=20
> Hmm.
>=20
> let me mull on this a bit.. any time there is a unlock/lock/lock that =
could be
> a problem..
>=20
> R
>=20
>=20
>=20
>=20
>=20
>=20
>=20
> On Dec 11, 2015, at 4:12 AM, Hans Petter Selasky <hps@selasky.org> =
wrote:
>=20
>> On 12/11/15 12:16, Hans Petter Selasky wrote:
>>> On 12/11/15 11:12, Alexander V. Chernikov wrote:
>>>> 11.12.2015, 12:15, "Hans Petter Selasky" <hps@selasky.org>:
>>>>> Hi,
>>>>>=20
>>>>> Pulling the nail out of the haystack hopefully.
>>>>>=20
>>>>>>>  Any ideas on where next to look?
>>>>>=20
>>>>> Adrian: In your dump aswell I see:
>>>>>=20
>>>>> la_flags =3D 1
>>>>>=20
>>>>> That means there was a race calling arptimer() and removing the =
"lle".
>>>> Yes. The interesting part here is why lle is removed. There are =
quite
>>>> a few reasons: either interface address deleted or interface going
>>>> down, or explicit delete request.
>>>> That's why I asked Adrian about interface stuff (and haven't got a
>>>> reply).
>>>>>=20
>>>>> Alexander: Can you comment on the following patch:
>>>>>=20
>>>>>  > Index: netinet/if_ether.c
>>>>>  > =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>>>>>  > --- netinet/if_ether.c (revision 291256)
>>>>>  > +++ netinet/if_ether.c (working copy)
>>>>>  > @@ -185,7 +185,13 @@
>>>>>  > LLE_WUNLOCK(lle);
>>>>>  > return;
>>>>>  > }
>>>>>  > - ifp =3D lle->lle_tbl->llt_ifp;
>>>>>  > + if (lle->la_flags & LLE_LINKED) {
>>>>>  > + ifp =3D lle->lle_tbl->llt_ifp;
>>>>>  > + } else {
>>>>>  > + /* XXX RACE entry has been freed */
>>>>>  > + llentry_free(lle);
>>>>>  > + return;
>>>>>  > + }
>>>>>  > CURVNET_SET(ifp->if_vnet);
>>>>>  >
>>>>>  > if ((lle->la_flags & LLE_DELETED) =3D=3D 0) {
>>>>>=20
>>>>> We need a check in arptimer() that the lle is still linked before
>>>> Yes, I had exactly that approach in mind. (And nd6_llinfo_timer()
>>>> needs the same fix).
>>>> So, would you commit it or should I?
>>>>> proceeding, in there from what I can see. Because the callback is =
not
>>>>> protected by a mutex, it is not atomically stopped by =
callout_stop().
>>>=20
>>> Hi,
>>>=20
>>> Talking to Randall offlist, I see some more trouble. Let's get
>>> everything straight before making a fix. There is one more race I =
see:
>>>=20
>>> The start of the arptimer() callback looks like this:
>>>=20
>>> > static void
>>> > arptimer(void *arg)
>>> > {
>>> POINT0
>>> >         LLE_WLOCK(lle);
>>> >         if (callout_pending(&lle->lle_timer)) {
>>> POINT1
>>> >                 LLE_WUNLOCK(lle);
>>> >                 return;
>>> >         }
>>>=20
>>> The code starting the callback looks like this:
>>>=20
>>> >                 LLE_ADDREF(la);
>>> >                 la->la_expire =3D time_uptime;
>>> >                 canceled =3D callout_reset(&la->lle_timer, hz *
>>> V_arpt_down,
>>> >                     arptimer, la);
>>> >                 if (canceled)
>>> >                         LLE_REMREF(la);
>>>=20
>>> Which can be written like this:
>>>=20
>>> > la->la_expire =3D time_uptime;
>>> > canceled =3D callout_reset(&la->lle_timer, hz * V_arpt_down,
>>> >     arptimer, la);
>>> > if (canceled =3D=3D 0)
>>> >     LLE_ADDREF(la);
>>>=20
>>> In case we are at POINT0 in arptimer, callout_reset() will not be =
able
>>> to cancel the callout and will return 0. We should also drop one ref =
at
>>> POINT1, or rewrite the code a bit, which might need Randall's help =
in
>>> the callout subsystem area.
>>>=20
>>=20
>> Hi,
>>=20
>> Looking at the version history, I see Gleb Smirnoff and Randall, =
heavily involved with these code paths. Gleb and Randall, do you have =
any comments on the above findings?
>>=20
>> Gleb+Randall: Do you agree or disagree there is a race as pointed out =
above?
>>=20
>> Ways forward:
>>=20
>> 1) Revert r278472 (done by Randall) and fix r238990 (done by Gleb) to =
use the new callout_async_drain() instead of callout_stop() to avoid =
using the rm lock after free.
>>=20
>> 2) Use callout_stop() before callout_reset() and add a reference when =
the callout was not pending nor completing. We need to use =
callout_stop() in this case because callout_reset() only has two return =
values and cannot be used to distinguish the three callout states in =
use.
>>=20
>> 3) Modify callout_reset() to have three return values and fix the =
arptimer code to not leak references.
>>=20
>> --HPS
>=20
> --------
> Randall Stewart
> rrs@netflix.com
> 803-317-4952
>=20
>=20
>=20
>=20
>=20

--------
Randall Stewart
rrs@netflix.com
803-317-4952








Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?91332B46-8CD3-45C0-80D0-AAD29ADD2DE0>