From owner-freebsd-current  Thu Jul  4  9:54:22 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 07B1C37B400
	for <current@freebsd.org>; Thu,  4 Jul 2002 09:54:17 -0700 (PDT)
Received: from prism.flugsvamp.com (66-191-112-47.mad.wi.charter.com [66.191.112.47])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DB5F343E09
	for <current@freebsd.org>; Thu,  4 Jul 2002 09:54:16 -0700 (PDT)
	(envelope-from jlemon@flugsvamp.com)
Received: (from jlemon@localhost)
	by prism.flugsvamp.com (8.11.6/8.11.6) id g64GrQN51105;
	Thu, 4 Jul 2002 11:53:26 -0500 (CDT)
	(envelope-from jlemon)
Date: Thu, 4 Jul 2002 11:53:26 -0500
From: Jonathan Lemon <jlemon@flugsvamp.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Jonathan Lemon <jlemon@flugsvamp.com>, bsddiy@yahoo.com,
	current@freebsd.org
Subject: Re: Timeout and SMP race
Message-ID: <20020704115326.R1020@prism.flugsvamp.com>
References: <200207041530.g64FUwE48481@prism.flugsvamp.com> <20020705022155.L475-100000@gamplex.bde.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <20020705022155.L475-100000@gamplex.bde.org>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

On Fri, Jul 05, 2002 at 02:38:08AM +1000, Bruce Evans wrote:
> On Thu, 4 Jul 2002, Jonathan Lemon wrote:
> 
> > In article <local.mail.freebsd-current/20020704132044.7338.qmail@web20909.mail.yahoo.com> you write:
> > >in RELENG_4, when one calls callout_stop() (not nested in softclock execute
> > >path
> > >, I am not talking about this case), after it returns, he can believe that the
> > >callout is truely stopped, however in CURRENT, this assumption is false, now we
> > >
> > >must care if callout_stop() truely stopped the callout when it returned, this
> > >is all difference I see, we bring in this race which not exists in RELENG_4,
> > >see what hacking code put in kern_condvar.c and kern_synch.c in CURRENT source,
> >
> > I don't believe this is true.  There is a corresponding race in -stable,
> > where the spl() level is dropped before performing the callout, thus
> > allowing either a call to callout_stop() or callout_reset() to come in
> > immediately before the callout is actually made.
> 
> I think Giant locking everything prevents problems in RELENG_4, at least
> when callout_stop() is called in process context (if it is called in
> interrupt context then it could easily be interrupting a callout even in
> the UP case).

In the network stack at least, callout_stop() is called in interrupt
context, so this case actually happens, and has to be handled.

> The race window extends from when the ipl or lock is dropped across the
> whole callout until the ipl or lock is regained. (The ipl is only dropped
> to splstatclock(); this prevents interruption by timeouts but not by
> other interrupts.  In -current there is nothing much to prevent softclock()
> itself being called concurrently, but in theory softclock()'s internal
> locking should prevent problems.)
> 
> > The callout function is responsible for checking to see if it has lost
> > the race, and perform the appropriate action.  This is done with the
> > CALLOUT_PENDING and CALLOUT_ACTIVE flags:
> 
> >
> >         s = splnet();
> >         if (callout_pending(tp->tt_delack) || !callout_active(tp->tt_delack)) {
> >                 splx(s);
> >                 return;
> >         }
> >         callout_deactivate(tp->tt_delack);
> 
> I think David is objecting to this complicating all callers that do the
> check and breaking all that don't.  The callers in kern_synch.c and
> kern_condvar.c have an mi_switch() and other complications to handle
> this sinc they can't just return.

I believe I had this conversation with Justin Gibbs earlier; he told 
me that the callout consumers (network, cam) had to be aware of the 
race and handle this if it matters.  I don't particularly like complicating
the callout handlers as illustrated above, though, so if a better scheme
is possible, that would be nice.

I originally wanted something equivalent to an atomic spl downgrade
(splhigh -> splnet), so the timeout code could obtain the target
ipl/lock that the callout handler wanted before dropping splhigh(), but
was told this would unnecessarily complicate things.


> > If 'CALLOUT_PENDING' is set, then we lost a race with callout_reset(),
> > and should not perform the callout.  If 'CALLOUT_ACTIVE' is clear, then
> > we lost a race with callout_stop().
> >
> > Either way, on both -current and -stable, you cannot assume that the
> > timer callback is completely gone immediately after calling callout_stop().
> 
> tsleep() seems to assume this in RELENG_4.

tsleep() is only called from process context, and appears to be safe
because p->wchan has already been cleared by a previous call to unsleep().
The races only matter if you actually care about them.
-- 
Jonathan

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message