Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 2 Jan 2003 19:08:05 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        arch@freebsd.org
Cc:        smp@freebsd.org, jhb@freebsd.org, dillon@freebsd.org, jake@freebsd.org, tanimura@freebsd.org, alc@freebsd.org
Subject:   Need help fixing lock ordering with filedesc, proc and pipe.
Message-ID:  <20030103030805.GS26140@elvis.mu.org>

next in thread | raw e-mail | index | archive | help
[please keep me on cc list, I'm not subscribed to most lists]

So far I have several fixes, for the lock order reversal with pipes
and sigio.  I'm not happy with most of them and the one I am happy 
with I came up with while writing this email, so please pick on it
the most. :)

Anyhow, I'd really appreciate feedback from those that understand
lock order and smp issues. :)

I'm going to outline and explain them here and hopefully someone
will come up with an alternate or convince me that one of them
is right.

First a refresher/explanation.

Each proc has a filedesc that holds the process's file descriptors.
This structure needs locking because other threads within the process
or other processes (via rfork()) may want to manipulate the array of
file pointers and needs exclusive access to do so safely.

Each proc has a lock, this lock covers many, many, many fields of the
proc struct including the signal state.

Each pipe (or any other object behind a 'struct file') has a lock, this
lock is user to maintain consitancy over the object itself, examples:
bytes to be transfered, how many readers/writers, etc.  From now on
I will refer to pipe as 'object'.

The problem is that we have a lock order reversal because of these three
transitions.  Thanks to John Baldwin and his Witness subsystem it was
initially made apparent and finally tracked down.

In a sysctl we have: allproc -> proc -> filedesc
  This access is done in a sideways manner, meaning that another proc is
  examining the filedesc of a running process.  The proc lock is grabbed
  because there is a mistaken impression that this is the safe way to
  grab a filedesc in a 'sideways' manner.  (actually if you look at the
  code you'll realize that this is a mistaken assumption and is unsafe
  as fdfree() does not actually lock the proc lock when releasing the
  filedesc structure. (oops!))

In select we have: filedesc -> object
  This access is done as an efficiency mechanism, if we do not hold
  the filedesc locked while calling the fo_poll() routine on each
  object we must perform an order of magnitude more mutex operations
  per file being poll()'d/select()'d on.
  We would have to allocate a temporary array to hold all the file
  pointers, atomically bump and release thier reference counts each
  time we looped through the file scan.

In each object's write/read routine we have: object -> proc
  This is because traditionally wakeup()/psignal() didn't block the
  process, so we had "mutual exclusion" or "Giant" theoretically protecting
  all of our data.  The lock ordering here is done whenever someone requests
  SIGIO when data arrives on an object.  Data arrives which locks the object
  and it then calls into pgsigio to post SIGIO which grabs the proc lock.

All that said, we wind up with an ordering that's broken:
  proc -> filedesc -> object -> proc

Here's my solutions:  (implicit "fix the problem" with each 'pro')

.) add another mutex to struct proc, this will protect the filedesc reference.
   pros: simple fix works nicely.
   cons: bloating proc with yet another mutex,
         yet another lock aquisition in the exit() path.

.) don't hold the filedesc lock over the fo_poll routines in select()/poll()
   pros: fo_poll routines can now block,
         our routine is somewhat closer to linux's implementation so
           we have a reference.
   cons: fo_poll routines should not block,
         expensive mutex operations at least 4 lock/unlock and 2 atomic ops
           per file,
         extra malloc in the select/poll codepath,
         extra complexity,
         our routine is somewhat closer to linux's implementation. :)

.) drop the object lock when performing pgsigio
   pros: none.
   cons: possible race conditions added to code paths, it's hard to
           mpsafe something that used to run under Giant if you must 
           make due with a lock you can't hold onto all the time,
         this sort of issue will happen with sockets, vnodes and kqueues
           as well.

.) don't allow fdfree() to NULL out the filedesc in the proc struct, instead
   split it into a routine that close()'s all the files in the first pass
   and return whether or not we were the last accessor to this particular
   filedesc.
   Later in kern_exit() right before moving it to the zombie list,
   while we have the allproc lock, we NULL out the filedesc pointer and free
   it if we were the last accessor.  We also need to handle the case
   of exec() unsharing the filedescriptor table, I think we can do this
   the same way at the cost of a possible allproc lock piggybacked or
   placed in the uncommon case of exec with shared descriptor tables.
   pros: should work,
         minimal change,
         we pigggyback the locking operations on existing ones (allproc).
   cons: fdfree now needs to lock the filedesc while closing each file,
           not such a big deal.
         allproc needed whenever sideways access to a single proc's filedesc
           is required,
         allproc must be held whenever performing any sideways filedesc
           operation

I really like the last solution (piggyback on allproc).  But since
this _will_ dictate how we continue to do locking in the future I'd
like to open it for discussion.

Of course alternate solutions would be nice to hear.

-- 
-Alfred Perlstein [alfred@freebsd.org]
'Instead of asking why a piece of software is using "1970s technology,"
 start asking why software is ignoring 30 years of accumulated wisdom.'

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030103030805.GS26140>