Date: Sat, 14 Feb 2015 00:21:28 +0300 From: Gleb Smirnoff <glebius@FreeBSD.org> To: John Baldwin <jhb@freebsd.org> Cc: src-committers@freebsd.org, Eric van Gyzen <eric_van_gyzen@dell.com>, svn-src-all@freebsd.org, Randall Stewart <rrs@freebsd.org>, svn-src-head@freebsd.org, zont@FreeBSD.org, rstone@FreeBSD.org Subject: Re: svn commit: r278472 - in head/sys: netinet netinet6 Message-ID: <20150213212128.GG15484@FreeBSD.org> In-Reply-To: <1903622.i3cFFNVYcL@ralph.baldwin.cx> References: <201502091928.t19JSC5P066293@svn.freebsd.org> <1903622.i3cFFNVYcL@ralph.baldwin.cx>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Feb 09, 2015 at 03:11:21PM -0500, John Baldwin wrote: J> On Monday, February 09, 2015 07:28:12 PM Randall Stewart wrote: J> > Author: rrs J> > Date: Mon Feb 9 19:28:11 2015 J> > New Revision: 278472 J> > URL: https://svnweb.freebsd.org/changeset/base/278472 J> > J> > Log: J> > This fixes a bug in the way that the LLE timers for nd6 J> > and arp were being used. They basically would pass in the J> > mutex to the callout_init. Because they used this method J> > to the callout system, it was possible to "stop" the callout. J> > When flushing the table and you stopped the running callout, the J> > callout_stop code would return 1 indicating that it was going J> > to stop the callout (that was about to run on the callout_wheel blocked J> > by the function calling the stop). Now when 1 was returned, it would J> > lower the reference count one extra time for the stopped timer, then J> > a few lines later delete the memory. Of course the callout_wheel was J> > stuck in the lock code and would then crash since it was accessing J> > freed memory. By using callout_init(c, 1) we always get a 0 back J> > and the reference counting bug does not rear its head. We do have J> > to make a few adjustments to the callouts themselves though to make J> > sure it does the proper thing if rescheduled as well as gets the lock. J> > J> > Commented upon by hiren and sbruno J> > See Phabricator D1777 for more details. J> > J> > Commented upon by hiren and sbruno J> > Reviewed by: adrian, jhb and bz J> > Sponsored by: Netflix Inc. J> J> Eh, I looked at it, but I really, really don't like it. I think J> callout_init_*() should be preferred to CALLOUT_MPSAFE whenever possible as it J> is less race-prone. I think this should probably be fixed by adding Hans' J> callout_drain_async() instead, though this is fine as a temporary workaround. I second concerns. Please look at kern/165863 and r238990 that fixed it. Transition from CALLOUT_MPSAFE to callout_init_rw() was intentional and fixed races. I added to Cc guys who helped to track down that races. May be someone still has test scripts at hand. AFAIR, there were some that allowed to put a box down quite quickly. -- Totus tuus, Glebius.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150213212128.GG15484>