Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 3 Jan 2003 10:38:19 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        alc@freebsd.org, tanimura@freebsd.org, jake@freebsd.org, dillon@freebsd.org, smp@freebsd.org, arch@freebsd.org
Subject:   Re: Need help fixing lock ordering with filedesc, proc and pipe.
Message-ID:  <20030103183819.GC33821@elvis.mu.org>
In-Reply-To: <XFMail.20030103132242.jhb@FreeBSD.org>
References:  <20030103030805.GS26140@elvis.mu.org> <XFMail.20030103132242.jhb@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
* John Baldwin <jhb@FreeBSD.org> [030103 10:24] wrote:
> 
> On 03-Jan-2003 Alfred Perlstein wrote:
> > The problem is that we have a lock order reversal because of these three
> > transitions.  Thanks to John Baldwin and his Witness subsystem it was
> > initially made apparent and finally tracked down.
> > 
> > In a sysctl we have: allproc -> proc -> filedesc
> >   This access is done in a sideways manner, meaning that another proc is
> >   examining the filedesc of a running process.  The proc lock is grabbed
> >   because there is a mistaken impression that this is the safe way to
> >   grab a filedesc in a 'sideways' manner.  (actually if you look at the
> >   code you'll realize that this is a mistaken assumption and is unsafe
> >   as fdfree() does not actually lock the proc lock when releasing the
> >   filedesc structure. (oops!))
> 
> Hmm, we should fix fdfree() then. :-/  Note that the rest of the kernel
> assumes it is ok to access p_fd directly without a lock for curproc but
> that should be safe.
> 
> This function has other problems though.  It does a SYSCTL_OUT while
> holding locks which is potentially ugly.  Probably what it should be
> doing is something like:

It looks like it uses sysctl_wire_old_buffer() to ensure that copyout
won't block.

> > In select we have: filedesc -> object
> >   This access is done as an efficiency mechanism, if we do not hold
> >   the filedesc locked while calling the fo_poll() routine on each
> >   object we must perform an order of magnitude more mutex operations
> >   per file being poll()'d/select()'d on.
> >   We would have to allocate a temporary array to hold all the file
> >   pointers, atomically bump and release thier reference counts each
> >   time we looped through the file scan.
> > 
> > In each object's write/read routine we have: object -> proc
> >   This is because traditionally wakeup()/psignal() didn't block the
> >   process, so we had "mutual exclusion" or "Giant" theoretically protecting
> >   all of our data.  The lock ordering here is done whenever someone requests
> >   SIGIO when data arrives on an object.  Data arrives which locks the object
> >   and it then calls into pgsigio to post SIGIO which grabs the proc lock.
> > 
> > All that said, we wind up with an ordering that's broken:
> >   proc -> filedesc -> object -> proc
> > 
> > Here's my solutions:  (implicit "fix the problem" with each 'pro')
> > 
> > .) add another mutex to struct proc, this will protect the filedesc reference.
> >    pros: simple fix works nicely.
> >    cons: bloating proc with yet another mutex,
> >          yet another lock aquisition in the exit() path.
> 
> Alternative to this (not that bad when you think about it, and really kind of
> like your last solution) is to use one global lock to protect all the p_fd
> pointers in the !curproc-read case.  I.e. instead of adding a mutex to each
> proc, just have a global pfd_mutex lock.

Yes, a global mutex in the process fork/exit/exec path.  Not sure if I
like that, but I could live with it.  I had thought of it, but felt
guilty proposing it so it must have stuck in the back of my head.

> > .) drop the object lock when performing pgsigio
> >    pros: none.
> >    cons: possible race conditions added to code paths, it's hard to
> >            mpsafe something that used to run under Giant if you must 
> >            make due with a lock you can't hold onto all the time,
> >          this sort of issue will happen with sockets, vnodes and kqueues
> >            as well.
> 
> If you defer the sending of sigio until after you are done messing with
> the object then I think you can safely do it w/o needing the object to be
> locked.  SIGIO has inherent userland races anyways.

This idea is deceptively simple.  Look at the pipe code and tell me where
this is possible.  Sometimes you have to post sigio in a loop where sleeping
occurs, so you can't just postpone it until just before you return, you
have to drop the object lock.

> > .) don't allow fdfree() to NULL out the filedesc in the proc struct, instead
> >    split it into a routine that close()'s all the files in the first pass
> >    and return whether or not we were the last accessor to this particular
> >    filedesc.
> >    Later in kern_exit() right before moving it to the zombie list,
> >    while we have the allproc lock, we NULL out the filedesc pointer and free
> >    it if we were the last accessor.  We also need to handle the case
> >    of exec() unsharing the filedescriptor table, I think we can do this
> >    the same way at the cost of a possible allproc lock piggybacked or
> >    placed in the uncommon case of exec with shared descriptor tables.
> 
> I would rather not leave bogus filedesc's lying arond that have zero reference
> counts.

This is a really bad assesment of this option.  Please give it some more
thought, it wasn't completetly apparent when I first started coming up
with solutions but it's still my favorite.  And the zero refcount filedesc
is the _only_ problem with it. :)

The filedesc is only held onto for a short period after it is unlinked
from it's parent proc.

-- 
-Alfred Perlstein [alfred@freebsd.org]
'Instead of asking why a piece of software is using "1970s technology,"
 start asking why software is ignoring 30 years of accumulated wisdom.'

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030103183819.GC33821>