From owner-freebsd-current Thu Jul 4 9:54:22 2002 Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 07B1C37B400 for ; Thu, 4 Jul 2002 09:54:17 -0700 (PDT) Received: from prism.flugsvamp.com (66-191-112-47.mad.wi.charter.com [66.191.112.47]) by mx1.FreeBSD.org (Postfix) with ESMTP id DB5F343E09 for ; Thu, 4 Jul 2002 09:54:16 -0700 (PDT) (envelope-from jlemon@flugsvamp.com) Received: (from jlemon@localhost) by prism.flugsvamp.com (8.11.6/8.11.6) id g64GrQN51105; Thu, 4 Jul 2002 11:53:26 -0500 (CDT) (envelope-from jlemon) Date: Thu, 4 Jul 2002 11:53:26 -0500 From: Jonathan Lemon To: Bruce Evans Cc: Jonathan Lemon , bsddiy@yahoo.com, current@freebsd.org Subject: Re: Timeout and SMP race Message-ID: <20020704115326.R1020@prism.flugsvamp.com> References: <200207041530.g64FUwE48481@prism.flugsvamp.com> <20020705022155.L475-100000@gamplex.bde.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: <20020705022155.L475-100000@gamplex.bde.org> Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, Jul 05, 2002 at 02:38:08AM +1000, Bruce Evans wrote: > On Thu, 4 Jul 2002, Jonathan Lemon wrote: > > > In article you write: > > >in RELENG_4, when one calls callout_stop() (not nested in softclock execute > > >path > > >, I am not talking about this case), after it returns, he can believe that the > > >callout is truely stopped, however in CURRENT, this assumption is false, now we > > > > > >must care if callout_stop() truely stopped the callout when it returned, this > > >is all difference I see, we bring in this race which not exists in RELENG_4, > > >see what hacking code put in kern_condvar.c and kern_synch.c in CURRENT source, > > > > I don't believe this is true. There is a corresponding race in -stable, > > where the spl() level is dropped before performing the callout, thus > > allowing either a call to callout_stop() or callout_reset() to come in > > immediately before the callout is actually made. > > I think Giant locking everything prevents problems in RELENG_4, at least > when callout_stop() is called in process context (if it is called in > interrupt context then it could easily be interrupting a callout even in > the UP case). In the network stack at least, callout_stop() is called in interrupt context, so this case actually happens, and has to be handled. > The race window extends from when the ipl or lock is dropped across the > whole callout until the ipl or lock is regained. (The ipl is only dropped > to splstatclock(); this prevents interruption by timeouts but not by > other interrupts. In -current there is nothing much to prevent softclock() > itself being called concurrently, but in theory softclock()'s internal > locking should prevent problems.) > > > The callout function is responsible for checking to see if it has lost > > the race, and perform the appropriate action. This is done with the > > CALLOUT_PENDING and CALLOUT_ACTIVE flags: > > > > > s = splnet(); > > if (callout_pending(tp->tt_delack) || !callout_active(tp->tt_delack)) { > > splx(s); > > return; > > } > > callout_deactivate(tp->tt_delack); > > I think David is objecting to this complicating all callers that do the > check and breaking all that don't. The callers in kern_synch.c and > kern_condvar.c have an mi_switch() and other complications to handle > this sinc they can't just return. I believe I had this conversation with Justin Gibbs earlier; he told me that the callout consumers (network, cam) had to be aware of the race and handle this if it matters. I don't particularly like complicating the callout handlers as illustrated above, though, so if a better scheme is possible, that would be nice. I originally wanted something equivalent to an atomic spl downgrade (splhigh -> splnet), so the timeout code could obtain the target ipl/lock that the callout handler wanted before dropping splhigh(), but was told this would unnecessarily complicate things. > > If 'CALLOUT_PENDING' is set, then we lost a race with callout_reset(), > > and should not perform the callout. If 'CALLOUT_ACTIVE' is clear, then > > we lost a race with callout_stop(). > > > > Either way, on both -current and -stable, you cannot assume that the > > timer callback is completely gone immediately after calling callout_stop(). > > tsleep() seems to assume this in RELENG_4. tsleep() is only called from process context, and appears to be safe because p->wchan has already been cleared by a previous call to unsleep(). The races only matter if you actually care about them. -- Jonathan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message