Date: Thu, 4 Jul 2002 21:02:30 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: David Xu <davidx@viasoft.com.cn> Cc: Julian Elischer <julian@elischer.org>, <freebsd-current@FreeBSD.ORG> Subject: Re: Timeout and SMP race Message-ID: <20020704202730.X21619-100000@gamplex.bde.org> In-Reply-To: <00f401c22337$c6c94d90$ef01a8c0@davidwnt>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 4 Jul 2002, David Xu wrote:
> ----- Original Message -----
> From: "Julian Elischer" <julian@elischer.org>
> To: "David Xu" <davidx@viasoft.com.cn>
> Cc: <freebsd-current@FreeBSD.ORG>
> Sent: Thursday, July 04, 2002 4:36 PM
> Subject: Re: Timeout and SMP race
> >
> > On Thu, 4 Jul 2002, David Xu wrote:
> >
> > > while we are getting rid of Giant, current race condition between softclock()
> > > and callout_stop() is unacceptable. the race causes two many places in source
> > > code would be modified to fit this new behaviour, besides this, everywhere
> > > callout_stop() is used need to hold sched_lock and do a mi_switch() and
> > > modify td_flags is also unacceptable, this SMP race should be resolved in
> > > kern_timeout.c.
> > >
> > > David Xu
> >
> > This is probably true..
> > the current hacks for this are rather horrible. I think there msut be
> > better ways. Your suggestion sounds plausible.
> >
> >
>
> if another thread other than softclock itself is calling callout_stop(),
> and callout_stop() detected that softclock is currently running the
> callout, it should wait until softclock finishes the work, then return.
softclock() intentionally releases callout_lock() to allow other processes
to manipulate callouts. What is the race exactly? Concurrent calls to
softclock() seem to be possible but seem to be handled correctly (internal
locking prevents problems). Well, I can see one race in softclock():
% c_func = c->c_func;
% c_arg = c->c_arg;
% c_flags = c->c_flags;
This caches some values, as is needed since the 'c' pointer may become
invalid after we release the lock ... but the things pointed to may become
invalid too.
% c->c_func = NULL;
% if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
% c->c_flags = CALLOUT_LOCAL_ALLOC;
% SLIST_INSERT_HEAD(&callfree, c,
% c_links.sle);
% } else
% c->c_flags &= ~CALLOUT_PENDING;
% mtx_unlock_spin(&callout_lock);
callout_stop() may stop 'c' here. It won't do much, since we have already
set c->c_func to NULL, but its caller may want the callout actually stopped
so that it can do things like unloading the old c->c_func.
% if ((c_flags & CALLOUT_MPSAFE) == 0)
% mtx_lock(&Giant);
% c_func(c_arg);
This calls through a possibly-invalid function pointer.
% if ((c_flags & CALLOUT_MPSAFE) == 0)
% mtx_unlock(&Giant);
% mtx_lock_spin(&callout_lock);
This seems to be an old bug. In RELENG_4, splsoftclock() gives a more
global lock, but there is nothing to prevent callout_stop() being run
at splsoftclock(). In fact, it must be able to run when called nested
from inside softclock(), since it might be called from the handler.
Waiting in callout_stop() for softclock() to finish would deadlock in
this case. It's interesting that this case is (always?) avoided in
untimeout() by not calling callout_stop() when c->c_func == NULL.
softclock() can't do anything about c->c_func going away after it is
called. Clients must somehow avoid killing it.
I think c->c_func rarely goes away, and the race that you noticed is
lost more often.
Bruce
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020704202730.X21619-100000>
