Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 20 Jan 2000 16:09:13 +0800
From:      Peter Wemm <peter@netplex.com.au>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Jason Evans <jasone@canonware.com>, Bruce Evans <bde@zeta.org.au>, Alfred Perlstein <bright@wintelcom.net>, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject:   Re: cvs commit: src/sys/kern vfs_aio.c 
Message-ID:  <20000120080913.0F35B1CA0@overcee.netplex.com.au>
In-Reply-To: Message from Matthew Dillon <dillon@apollo.backplane.com>  of "Wed, 19 Jan 2000 23:15:25 PST." <200001200715.XAA43425@apollo.backplane.com> 

next in thread | previous in thread | raw e-mail | index | archive | help
Matthew Dillon wrote:
> :> And usually necessary to avoid races (Locking 101, part 3: an spl is
> :> not a lock...).  However, it may not be necessary in vfs_aio.c, since
> :> the wakeups may only come from other processes and not from interrupt
> :> handlers.  vfs_aio.c sleeps outside of an spl'ed section in about 6
> :> other places, including 2 in very similar code.
> :
> :The person who sent me the patch was convinced that it was a necessary
> :change due to problems he had run into, and he convinced me at the time,
> :though my understanding of the code isn't great.
> :
> :Contrary to a previous statement, I'm going to leave the code as is for now
> :and actually write some test code to try to produce a problem with/without
> :the change.
> :
> :Jason
> 
>     I've reviewed the code carefully.    The KAIO_WAKEUP/tsleep combination
>     can be woken up by aio_physwakeup, which is called from biodone() which
>     itself may be called from an interrupt.  To avoid the race you *MUST*
>     be at splbio() when setting the flag and calling tsleep().
> 
>     There appears to be two places where the flag is set and tsleep is
>     called while running at splbio() and one place where it isn't.  The
>     place where it isn't is incorrect.  All instances where this code
>     sequence occurs:
> 
> 	    ki->kaio_flags |= KAIO_WAKEUP;
> 	    error = tsleep(p, PRIBIO | PCATCH, "aiowc", timo);
> 
>     *MUST* be called while at splbio() to avoid a potential race.

Jason, Matt is right here.  If there is a problem solved by this change,
then it's been done at the expense of introducing a timebomb waiting to 
go off.

>     I will also note that in the past year I found a number of similar
>     races in the VM system and that due to the small window of opportunity
>     they were often very hard, but not impossible, to reproduce.  These
>     are the kinds of races that might occur once every few months in a 
>     medium-loaded system.  Reproduction can depend on many factors including
>     system load and device driver latencies.  There were some horrendous 
>     windows in the VM system that were never hit and some one or two
>     instruction windows that were fairly easy to reproduce.  Go figure.

We need to seed in the new SPLASSERT() calls for checking this sort of thing.
IMHO, everywhere that says "This function must be called at splxxx()" we should
have a SPLASSERT() nearby.

> 					-Matt
> 					Matthew Dillon 
> 					<dillon@backplane.com>

Cheers,
-Peter




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20000120080913.0F35B1CA0>