Date: Mon, 17 Mar 2008 11:27:20 -0400 From: John Baldwin <jhb@freebsd.org> To: freebsd-smp@freebsd.org Cc: stable@freebsd.org, Alfred Perlstein <alfred@freebsd.org> Subject: Re: timeout/untimeout race conditions/crash [patch] Message-ID: <200803171127.20561.jhb@freebsd.org> In-Reply-To: <20080315024114.GD67856@elvis.mu.org> References: <20080315024114.GD67856@elvis.mu.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Friday 14 March 2008 10:41:14 pm Alfred Perlstein wrote: > We think we tracked down a defect in timeout/untimeout in > FreeBSD. > > We have reduced the problem to the following scenario: > > 2+ cpu system, one cpu is running softclock at the same time > another thread is running on another cpu which makes use of > timeout/untimeout. > > CPU 0 is running "softclock" > CPU 1 is running "driver" with Giant held. > > softclock: mtx_lock_spin(&callout_lock) > softclock: CACHES the callout structure's fields. > softclock: sees that it's a CALLOUT_LOCAL_ALLOC > softclock: executes this code: > if (c->c_flags & CALLOUT_LOCAL_ALLOC) { > c->c_func = NULL; > c->c_flags = CALLOUT_LOCAL_ALLOC; > SLIST_INSERT_HEAD(&callfree, c, > c_links.sle); > curr_callout = NULL; > } else { > > NOTE: that c->c_func has been set to NULL and curr_callout > is also NULL. > softclock: mtx_unlock_spin(&callout_lock) > driver: calls untimeout(), the following sequence happens: > mtx_lock_spin(&callout_lock); > if (handle.callout->c_func == ftn && handle.callout->c_arg == arg) > callout_stop(handle.callout); > mtx_unlock_spin(&callout_lock); > > NOTE: untimeout() sees that handle.callout->c_func is not set > to the function so it does NOT call callout_stop(9)! > driver: free's backing structure for c->c_arg. > softclock: executes callout. > softclock: likely crashes at this point due to access after free. > > I have a patch I'm trying out here, but I need feedback on it. > > The way the patch works is to treat CALLOUT_LOCAL_ALLOC (timeout/untimeout) > callouts the same as ~CALLOUT_LOCAL_ALLOC allocs, and moves the > freelist manipulation to the end of the callout dispatch. > > Some light testing seems to have the system work. > > We are doing some testing in-house to also make sure this works. > > Please provide feedback. > > See attached delta. This is not a bug. Don't use untimeout(9) as it is not guaranteed to be reliable. Instead, use callout_*(). Your patch doesn't solve any races as the driver detach routine needs to use callout_drain() and not just callout_stop/untimeout anyways. Fix your broken drivers. -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200803171127.20561.jhb>