Date: Thu, 4 Jul 2002 21:02:30 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: David Xu <davidx@viasoft.com.cn> Cc: Julian Elischer <julian@elischer.org>, <freebsd-current@FreeBSD.ORG> Subject: Re: Timeout and SMP race Message-ID: <20020704202730.X21619-100000@gamplex.bde.org> In-Reply-To: <00f401c22337$c6c94d90$ef01a8c0@davidwnt>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 4 Jul 2002, David Xu wrote: > ----- Original Message ----- > From: "Julian Elischer" <julian@elischer.org> > To: "David Xu" <davidx@viasoft.com.cn> > Cc: <freebsd-current@FreeBSD.ORG> > Sent: Thursday, July 04, 2002 4:36 PM > Subject: Re: Timeout and SMP race > > > > On Thu, 4 Jul 2002, David Xu wrote: > > > > > while we are getting rid of Giant, current race condition between softclock() > > > and callout_stop() is unacceptable. the race causes two many places in source > > > code would be modified to fit this new behaviour, besides this, everywhere > > > callout_stop() is used need to hold sched_lock and do a mi_switch() and > > > modify td_flags is also unacceptable, this SMP race should be resolved in > > > kern_timeout.c. > > > > > > David Xu > > > > This is probably true.. > > the current hacks for this are rather horrible. I think there msut be > > better ways. Your suggestion sounds plausible. > > > > > > if another thread other than softclock itself is calling callout_stop(), > and callout_stop() detected that softclock is currently running the > callout, it should wait until softclock finishes the work, then return. softclock() intentionally releases callout_lock() to allow other processes to manipulate callouts. What is the race exactly? Concurrent calls to softclock() seem to be possible but seem to be handled correctly (internal locking prevents problems). Well, I can see one race in softclock(): % c_func = c->c_func; % c_arg = c->c_arg; % c_flags = c->c_flags; This caches some values, as is needed since the 'c' pointer may become invalid after we release the lock ... but the things pointed to may become invalid too. % c->c_func = NULL; % if (c->c_flags & CALLOUT_LOCAL_ALLOC) { % c->c_flags = CALLOUT_LOCAL_ALLOC; % SLIST_INSERT_HEAD(&callfree, c, % c_links.sle); % } else % c->c_flags &= ~CALLOUT_PENDING; % mtx_unlock_spin(&callout_lock); callout_stop() may stop 'c' here. It won't do much, since we have already set c->c_func to NULL, but its caller may want the callout actually stopped so that it can do things like unloading the old c->c_func. % if ((c_flags & CALLOUT_MPSAFE) == 0) % mtx_lock(&Giant); % c_func(c_arg); This calls through a possibly-invalid function pointer. % if ((c_flags & CALLOUT_MPSAFE) == 0) % mtx_unlock(&Giant); % mtx_lock_spin(&callout_lock); This seems to be an old bug. In RELENG_4, splsoftclock() gives a more global lock, but there is nothing to prevent callout_stop() being run at splsoftclock(). In fact, it must be able to run when called nested from inside softclock(), since it might be called from the handler. Waiting in callout_stop() for softclock() to finish would deadlock in this case. It's interesting that this case is (always?) avoided in untimeout() by not calling callout_stop() when c->c_func == NULL. softclock() can't do anything about c->c_func going away after it is called. Clients must somehow avoid killing it. I think c->c_func rarely goes away, and the race that you noticed is lost more often. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020704202730.X21619-100000>