Date: Fri, 3 Jan 2003 10:38:19 -0800 From: Alfred Perlstein <bright@mu.org> To: John Baldwin <jhb@FreeBSD.org> Cc: alc@freebsd.org, tanimura@freebsd.org, jake@freebsd.org, dillon@freebsd.org, smp@freebsd.org, arch@freebsd.org Subject: Re: Need help fixing lock ordering with filedesc, proc and pipe. Message-ID: <20030103183819.GC33821@elvis.mu.org> In-Reply-To: <XFMail.20030103132242.jhb@FreeBSD.org> References: <20030103030805.GS26140@elvis.mu.org> <XFMail.20030103132242.jhb@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
* John Baldwin <jhb@FreeBSD.org> [030103 10:24] wrote: > > On 03-Jan-2003 Alfred Perlstein wrote: > > The problem is that we have a lock order reversal because of these three > > transitions. Thanks to John Baldwin and his Witness subsystem it was > > initially made apparent and finally tracked down. > > > > In a sysctl we have: allproc -> proc -> filedesc > > This access is done in a sideways manner, meaning that another proc is > > examining the filedesc of a running process. The proc lock is grabbed > > because there is a mistaken impression that this is the safe way to > > grab a filedesc in a 'sideways' manner. (actually if you look at the > > code you'll realize that this is a mistaken assumption and is unsafe > > as fdfree() does not actually lock the proc lock when releasing the > > filedesc structure. (oops!)) > > Hmm, we should fix fdfree() then. :-/ Note that the rest of the kernel > assumes it is ok to access p_fd directly without a lock for curproc but > that should be safe. > > This function has other problems though. It does a SYSCTL_OUT while > holding locks which is potentially ugly. Probably what it should be > doing is something like: It looks like it uses sysctl_wire_old_buffer() to ensure that copyout won't block. > > In select we have: filedesc -> object > > This access is done as an efficiency mechanism, if we do not hold > > the filedesc locked while calling the fo_poll() routine on each > > object we must perform an order of magnitude more mutex operations > > per file being poll()'d/select()'d on. > > We would have to allocate a temporary array to hold all the file > > pointers, atomically bump and release thier reference counts each > > time we looped through the file scan. > > > > In each object's write/read routine we have: object -> proc > > This is because traditionally wakeup()/psignal() didn't block the > > process, so we had "mutual exclusion" or "Giant" theoretically protecting > > all of our data. The lock ordering here is done whenever someone requests > > SIGIO when data arrives on an object. Data arrives which locks the object > > and it then calls into pgsigio to post SIGIO which grabs the proc lock. > > > > All that said, we wind up with an ordering that's broken: > > proc -> filedesc -> object -> proc > > > > Here's my solutions: (implicit "fix the problem" with each 'pro') > > > > .) add another mutex to struct proc, this will protect the filedesc reference. > > pros: simple fix works nicely. > > cons: bloating proc with yet another mutex, > > yet another lock aquisition in the exit() path. > > Alternative to this (not that bad when you think about it, and really kind of > like your last solution) is to use one global lock to protect all the p_fd > pointers in the !curproc-read case. I.e. instead of adding a mutex to each > proc, just have a global pfd_mutex lock. Yes, a global mutex in the process fork/exit/exec path. Not sure if I like that, but I could live with it. I had thought of it, but felt guilty proposing it so it must have stuck in the back of my head. > > .) drop the object lock when performing pgsigio > > pros: none. > > cons: possible race conditions added to code paths, it's hard to > > mpsafe something that used to run under Giant if you must > > make due with a lock you can't hold onto all the time, > > this sort of issue will happen with sockets, vnodes and kqueues > > as well. > > If you defer the sending of sigio until after you are done messing with > the object then I think you can safely do it w/o needing the object to be > locked. SIGIO has inherent userland races anyways. This idea is deceptively simple. Look at the pipe code and tell me where this is possible. Sometimes you have to post sigio in a loop where sleeping occurs, so you can't just postpone it until just before you return, you have to drop the object lock. > > .) don't allow fdfree() to NULL out the filedesc in the proc struct, instead > > split it into a routine that close()'s all the files in the first pass > > and return whether or not we were the last accessor to this particular > > filedesc. > > Later in kern_exit() right before moving it to the zombie list, > > while we have the allproc lock, we NULL out the filedesc pointer and free > > it if we were the last accessor. We also need to handle the case > > of exec() unsharing the filedescriptor table, I think we can do this > > the same way at the cost of a possible allproc lock piggybacked or > > placed in the uncommon case of exec with shared descriptor tables. > > I would rather not leave bogus filedesc's lying arond that have zero reference > counts. This is a really bad assesment of this option. Please give it some more thought, it wasn't completetly apparent when I first started coming up with solutions but it's still my favorite. And the zero refcount filedesc is the _only_ problem with it. :) The filedesc is only held onto for a short period after it is unlinked from it's parent proc. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-smp" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030103183819.GC33821>