Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 13 Feb 2015 17:27:40 -0500
From:      Randall Stewart <rrs@netflix.com>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Eric van Gyzen <eric_van_gyzen@dell.com>, svn-src-all@freebsd.org, Randall Stewart <rrs@freebsd.org>, svn-src-head@freebsd.org, zont@FreeBSD.org, rstone@FreeBSD.org
Subject:   Re: svn commit: r278472 - in head/sys: netinet netinet6
Message-ID:  <89BA29F2-C617-40DE-90E5-F3EC9C17AEC8@netflix.com>
In-Reply-To: <20150213212128.GG15484@FreeBSD.org>
References:  <201502091928.t19JSC5P066293@svn.freebsd.org> <1903622.i3cFFNVYcL@ralph.baldwin.cx> <20150213212128.GG15484@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Gleb:

Ok here is the diff of the arp timer function that this changed made =
(238990):
****************************
 arptimer(void *arg)
 {
+       struct llentry *lle =3D (struct llentry *)arg;
        struct ifnet *ifp;
-       struct llentry   *lle;
-       int pkts_dropped;
+       size_t pkts_dropped;
=20
-       KASSERT(arg !=3D NULL, ("%s: arg NULL", __func__));
-       lle =3D (struct llentry *)arg;
+       if (lle->la_flags & LLE_STATIC) {
+               LLE_WUNLOCK(lle);
+               return;
+       }
+
        ifp =3D lle->lle_tbl->llt_ifp;
        CURVNET_SET(ifp->if_vnet);
+
+       if (lle->la_flags !=3D LLE_DELETED) {
+               int evt;
+
+               if (lle->la_flags & LLE_VALID)
+                       evt =3D LLENTRY_EXPIRED;
+               else
+                       evt =3D LLENTRY_TIMEDOUT;
+               EVENTHANDLER_INVOKE(lle_event, lle, evt);
+       }
+
+       callout_stop(&lle->la_timer);
+
+       /* XXX: LOR avoidance. We still have ref on lle. */
+       LLE_WUNLOCK(lle);
        IF_AFDATA_LOCK(ifp);
        LLE_WLOCK(lle);
-       if (lle->la_flags & LLE_STATIC)
-               LLE_WUNLOCK(lle);
-       else {
-               if (!callout_pending(&lle->la_timer) &&
-                   callout_active(&lle->la_timer)) {
-                       callout_stop(&lle->la_timer);
-                       LLE_REMREF(lle);
=20
-                       if (lle->la_flags !=3D LLE_DELETED) {
-                               int evt;
-
-                               if (lle->la_flags & LLE_VALID)
-                                       evt =3D LLENTRY_EXPIRED;
-                               else
-                                       evt =3D LLENTRY_TIMEDOUT;
-                               EVENTHANDLER_INVOKE(lle_event, lle, =
evt);
-                       }
-
-                       pkts_dropped =3D llentry_free(lle);
-                       ARPSTAT_ADD(dropped, pkts_dropped);
-                       ARPSTAT_INC(timeouts);
-               } else {
-#ifdef DIAGNOSTIC
-                       struct sockaddr *l3addr =3D L3_ADDR(lle);
-                       log(LOG_INFO,
-                           "arptimer issue: %p, IPv4 address: =
\"%s\"\n", lle,
-                           inet_ntoa(
-                               ((const struct sockaddr_in =
*)l3addr)->sin_addr));
-#endif
-                       LLE_WUNLOCK(lle);
-               }
-       }
+       LLE_REMREF(lle);
+       pkts_dropped =3D llentry_free(lle);
        IF_AFDATA_UNLOCK(ifp);
+       ARPSTAT_ADD(dropped, pkts_dropped);
+       ARPSTAT_INC(timeouts);
        CURVNET_RESTORE();
 }
******************************

And I can see *what* the problem was.. If you look at the lines:
-               if (!callout_pending(&lle->la_timer) &&
-                   callout_active(&lle->la_timer)) {

This is the *incorrect* way to write this code it should have been:
              if (callout_pending(&lle->la_timer) &&
                  !callout_active(&lle->la_timer)) {

To properly do the MPSAFE thing.. take a look at the callout manual..
So the original author just mixed up the test..=20

The idea is after you get the lock you check if its pending again, if
so someone has restarted it.. and if its not active, then someone has
stopped it.

They check if it was *not* pending.. which is almost always the case
since the callout code removes the pending flag and thats what you want
to be there. So not pending would always be true..=20

I don=92t see that this old code did the callout_deactive().. but I =
think
the callout_stop() does that I guess..

I think the problem was that the original code was wrong and that
the fix yes, avoided one race but put in another.

I do think a callout_async_drain is the right thing, but that will take =
a while
to get right. Peter Holm (thank you so much) has been pounding on this, =
if you
could find another test to add.. that would be great. I think this will =
work
the way it is..

R






On Feb 13, 2015, at 4:21 PM, Gleb Smirnoff <glebius@freebsd.org> wrote:

> 165863

--------
Randall Stewart
rrs@netflix.com
803-317-4952








Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?89BA29F2-C617-40DE-90E5-F3EC9C17AEC8>