Date: Fri, 03 Jan 2003 13:22:42 -0500 (EST) From: John Baldwin <jhb@FreeBSD.org> To: Alfred Perlstein <bright@mu.org> Cc: alc@freebsd.org, tanimura@freebsd.org, jake@freebsd.org, dillon@freebsd.org, smp@freebsd.org, arch@freebsd.org Subject: RE: Need help fixing lock ordering with filedesc, proc and pipe. Message-ID: <XFMail.20030103132242.jhb@FreeBSD.org> In-Reply-To: <20030103030805.GS26140@elvis.mu.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 03-Jan-2003 Alfred Perlstein wrote: > The problem is that we have a lock order reversal because of these three > transitions. Thanks to John Baldwin and his Witness subsystem it was > initially made apparent and finally tracked down. > > In a sysctl we have: allproc -> proc -> filedesc > This access is done in a sideways manner, meaning that another proc is > examining the filedesc of a running process. The proc lock is grabbed > because there is a mistaken impression that this is the safe way to > grab a filedesc in a 'sideways' manner. (actually if you look at the > code you'll realize that this is a mistaken assumption and is unsafe > as fdfree() does not actually lock the proc lock when releasing the > filedesc structure. (oops!)) Hmm, we should fix fdfree() then. :-/ Note that the rest of the kernel assumes it is ok to access p_fd directly without a lock for curproc but that should be safe. This function has other problems though. It does a SYSCTL_OUT while holding locks which is potentially ugly. Probably what it should be doing is something like: PROC_LOCK(p); ... (pid and uid) fdp = p->p_fd; if (fdp == NULL) { PROC_UNLOCK(p); } FILEDESC_LOCK(fdp); bump filedesc ref count PROC_UNLOCK(). Then I think you can drop the FILEDESC_LOCK around SYSCTL_OUT and if files close out from under you at the end of the list, so be it. This still has the same lock order problem however. You wouldn't have to wire in the user buffer for the sysctl though. *shrug* > In select we have: filedesc -> object > This access is done as an efficiency mechanism, if we do not hold > the filedesc locked while calling the fo_poll() routine on each > object we must perform an order of magnitude more mutex operations > per file being poll()'d/select()'d on. > We would have to allocate a temporary array to hold all the file > pointers, atomically bump and release thier reference counts each > time we looped through the file scan. > > In each object's write/read routine we have: object -> proc > This is because traditionally wakeup()/psignal() didn't block the > process, so we had "mutual exclusion" or "Giant" theoretically protecting > all of our data. The lock ordering here is done whenever someone requests > SIGIO when data arrives on an object. Data arrives which locks the object > and it then calls into pgsigio to post SIGIO which grabs the proc lock. > > All that said, we wind up with an ordering that's broken: > proc -> filedesc -> object -> proc > > Here's my solutions: (implicit "fix the problem" with each 'pro') > > .) add another mutex to struct proc, this will protect the filedesc reference. > pros: simple fix works nicely. > cons: bloating proc with yet another mutex, > yet another lock aquisition in the exit() path. Alternative to this (not that bad when you think about it, and really kind of like your last solution) is to use one global lock to protect all the p_fd pointers in the !curproc-read case. I.e. instead of adding a mutex to each proc, just have a global pfd_mutex lock. > .) drop the object lock when performing pgsigio > pros: none. > cons: possible race conditions added to code paths, it's hard to > mpsafe something that used to run under Giant if you must > make due with a lock you can't hold onto all the time, > this sort of issue will happen with sockets, vnodes and kqueues > as well. If you defer the sending of sigio until after you are done messing with the object then I think you can safely do it w/o needing the object to be locked. SIGIO has inherent userland races anyways. > .) don't allow fdfree() to NULL out the filedesc in the proc struct, instead > split it into a routine that close()'s all the files in the first pass > and return whether or not we were the last accessor to this particular > filedesc. > Later in kern_exit() right before moving it to the zombie list, > while we have the allproc lock, we NULL out the filedesc pointer and free > it if we were the last accessor. We also need to handle the case > of exec() unsharing the filedescriptor table, I think we can do this > the same way at the cost of a possible allproc lock piggybacked or > placed in the uncommon case of exec with shared descriptor tables. I would rather not leave bogus filedesc's lying arond that have zero reference counts. -- John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?XFMail.20030103132242.jhb>