Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 4 Jul 2002 21:02:30 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Xu <davidx@viasoft.com.cn>
Cc:        Julian Elischer <julian@elischer.org>, <freebsd-current@FreeBSD.ORG>
Subject:   Re: Timeout and SMP race
Message-ID:  <20020704202730.X21619-100000@gamplex.bde.org>
In-Reply-To: <00f401c22337$c6c94d90$ef01a8c0@davidwnt>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 4 Jul 2002, David Xu wrote:

> ----- Original Message -----
> From: "Julian Elischer" <julian@elischer.org>
> To: "David Xu" <davidx@viasoft.com.cn>
> Cc: <freebsd-current@FreeBSD.ORG>
> Sent: Thursday, July 04, 2002 4:36 PM
> Subject: Re: Timeout and SMP race
> >
> > On Thu, 4 Jul 2002, David Xu wrote:
> >
> > > while we are getting rid of Giant,  current race condition between softclock()
> > > and callout_stop() is unacceptable. the race causes two many places in source
> > > code would be modified to fit this new behaviour,  besides this, everywhere
> > > callout_stop() is used need to hold sched_lock and do a mi_switch() and
> > > modify td_flags is also unacceptable, this SMP race should be resolved in
> > > kern_timeout.c.
> > >
> > > David Xu
> >
> > This is probably true..
> > the current hacks for this are rather horrible. I think there msut be
> > better ways. Your suggestion sounds plausible.
> >
> >
>
> if another thread other than softclock itself is calling callout_stop(),
> and callout_stop() detected that softclock is currently running the
> callout,  it should wait until softclock finishes the work, then return.

softclock() intentionally releases callout_lock() to allow other processes
to manipulate callouts.  What is the race exactly?  Concurrent calls to
softclock() seem to be possible but seem to be handled correctly (internal
locking prevents problems).  Well, I can see one race in softclock():

%				c_func = c->c_func;
%				c_arg = c->c_arg;
%				c_flags = c->c_flags;

This caches some values, as is needed since the 'c' pointer may become
invalid after we release the lock ... but the things pointed to may become
invalid too.

%				c->c_func = NULL;
%				if (c->c_flags & CALLOUT_LOCAL_ALLOC) {
%					c->c_flags = CALLOUT_LOCAL_ALLOC;
%					SLIST_INSERT_HEAD(&callfree, c,
%					    c_links.sle);
%				} else
%					c->c_flags &= ~CALLOUT_PENDING;
%				mtx_unlock_spin(&callout_lock);

callout_stop() may stop 'c' here.  It won't do much, since we have already
set c->c_func to NULL, but its caller may want the callout actually stopped
so that it can do things like unloading the old c->c_func.

%				if ((c_flags & CALLOUT_MPSAFE) == 0)
%					mtx_lock(&Giant);
%				c_func(c_arg);

This calls through a possibly-invalid function pointer.

%				if ((c_flags & CALLOUT_MPSAFE) == 0)
%					mtx_unlock(&Giant);
%				mtx_lock_spin(&callout_lock);

This seems to be an old bug.  In RELENG_4, splsoftclock() gives a more
global lock, but there is nothing to prevent callout_stop() being run
at splsoftclock().  In fact, it must be able to run when called nested
from inside softclock(), since it might be called from the handler.
Waiting in callout_stop() for softclock() to finish would deadlock in
this case.  It's interesting that this case is (always?) avoided in
untimeout() by not calling callout_stop() when c->c_func == NULL.

softclock() can't do anything about c->c_func going away after it is
called.  Clients must somehow avoid killing it.

I think c->c_func rarely goes away, and the race that you noticed is
lost more often.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020704202730.X21619-100000>