Date: Fri, 13 Feb 2015 17:27:40 -0500 From: Randall Stewart <rrs@netflix.com> To: Gleb Smirnoff <glebius@freebsd.org> Cc: src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Eric van Gyzen <eric_van_gyzen@dell.com>, svn-src-all@freebsd.org, Randall Stewart <rrs@freebsd.org>, svn-src-head@freebsd.org, zont@FreeBSD.org, rstone@FreeBSD.org Subject: Re: svn commit: r278472 - in head/sys: netinet netinet6 Message-ID: <89BA29F2-C617-40DE-90E5-F3EC9C17AEC8@netflix.com> In-Reply-To: <20150213212128.GG15484@FreeBSD.org> References: <201502091928.t19JSC5P066293@svn.freebsd.org> <1903622.i3cFFNVYcL@ralph.baldwin.cx> <20150213212128.GG15484@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Gleb: Ok here is the diff of the arp timer function that this changed made = (238990): **************************** arptimer(void *arg) { + struct llentry *lle =3D (struct llentry *)arg; struct ifnet *ifp; - struct llentry *lle; - int pkts_dropped; + size_t pkts_dropped; =20 - KASSERT(arg !=3D NULL, ("%s: arg NULL", __func__)); - lle =3D (struct llentry *)arg; + if (lle->la_flags & LLE_STATIC) { + LLE_WUNLOCK(lle); + return; + } + ifp =3D lle->lle_tbl->llt_ifp; CURVNET_SET(ifp->if_vnet); + + if (lle->la_flags !=3D LLE_DELETED) { + int evt; + + if (lle->la_flags & LLE_VALID) + evt =3D LLENTRY_EXPIRED; + else + evt =3D LLENTRY_TIMEDOUT; + EVENTHANDLER_INVOKE(lle_event, lle, evt); + } + + callout_stop(&lle->la_timer); + + /* XXX: LOR avoidance. We still have ref on lle. */ + LLE_WUNLOCK(lle); IF_AFDATA_LOCK(ifp); LLE_WLOCK(lle); - if (lle->la_flags & LLE_STATIC) - LLE_WUNLOCK(lle); - else { - if (!callout_pending(&lle->la_timer) && - callout_active(&lle->la_timer)) { - callout_stop(&lle->la_timer); - LLE_REMREF(lle); =20 - if (lle->la_flags !=3D LLE_DELETED) { - int evt; - - if (lle->la_flags & LLE_VALID) - evt =3D LLENTRY_EXPIRED; - else - evt =3D LLENTRY_TIMEDOUT; - EVENTHANDLER_INVOKE(lle_event, lle, = evt); - } - - pkts_dropped =3D llentry_free(lle); - ARPSTAT_ADD(dropped, pkts_dropped); - ARPSTAT_INC(timeouts); - } else { -#ifdef DIAGNOSTIC - struct sockaddr *l3addr =3D L3_ADDR(lle); - log(LOG_INFO, - "arptimer issue: %p, IPv4 address: = \"%s\"\n", lle, - inet_ntoa( - ((const struct sockaddr_in = *)l3addr)->sin_addr)); -#endif - LLE_WUNLOCK(lle); - } - } + LLE_REMREF(lle); + pkts_dropped =3D llentry_free(lle); IF_AFDATA_UNLOCK(ifp); + ARPSTAT_ADD(dropped, pkts_dropped); + ARPSTAT_INC(timeouts); CURVNET_RESTORE(); } ****************************** And I can see *what* the problem was.. If you look at the lines: - if (!callout_pending(&lle->la_timer) && - callout_active(&lle->la_timer)) { This is the *incorrect* way to write this code it should have been: if (callout_pending(&lle->la_timer) && !callout_active(&lle->la_timer)) { To properly do the MPSAFE thing.. take a look at the callout manual.. So the original author just mixed up the test..=20 The idea is after you get the lock you check if its pending again, if so someone has restarted it.. and if its not active, then someone has stopped it. They check if it was *not* pending.. which is almost always the case since the callout code removes the pending flag and thats what you want to be there. So not pending would always be true..=20 I don=92t see that this old code did the callout_deactive().. but I = think the callout_stop() does that I guess.. I think the problem was that the original code was wrong and that the fix yes, avoided one race but put in another. I do think a callout_async_drain is the right thing, but that will take = a while to get right. Peter Holm (thank you so much) has been pounding on this, = if you could find another test to add.. that would be great. I think this will = work the way it is.. R On Feb 13, 2015, at 4:21 PM, Gleb Smirnoff <glebius@freebsd.org> wrote: > 165863 -------- Randall Stewart rrs@netflix.com 803-317-4952
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?89BA29F2-C617-40DE-90E5-F3EC9C17AEC8>