Date: Thu, 2 Jan 2003 19:08:05 -0800 From: Alfred Perlstein <bright@mu.org> To: arch@freebsd.org Cc: smp@freebsd.org, jhb@freebsd.org, dillon@freebsd.org, jake@freebsd.org, tanimura@freebsd.org, alc@freebsd.org Subject: Need help fixing lock ordering with filedesc, proc and pipe. Message-ID: <20030103030805.GS26140@elvis.mu.org>
next in thread | raw e-mail | index | archive | help
[please keep me on cc list, I'm not subscribed to most lists] So far I have several fixes, for the lock order reversal with pipes and sigio. I'm not happy with most of them and the one I am happy with I came up with while writing this email, so please pick on it the most. :) Anyhow, I'd really appreciate feedback from those that understand lock order and smp issues. :) I'm going to outline and explain them here and hopefully someone will come up with an alternate or convince me that one of them is right. First a refresher/explanation. Each proc has a filedesc that holds the process's file descriptors. This structure needs locking because other threads within the process or other processes (via rfork()) may want to manipulate the array of file pointers and needs exclusive access to do so safely. Each proc has a lock, this lock covers many, many, many fields of the proc struct including the signal state. Each pipe (or any other object behind a 'struct file') has a lock, this lock is user to maintain consitancy over the object itself, examples: bytes to be transfered, how many readers/writers, etc. From now on I will refer to pipe as 'object'. The problem is that we have a lock order reversal because of these three transitions. Thanks to John Baldwin and his Witness subsystem it was initially made apparent and finally tracked down. In a sysctl we have: allproc -> proc -> filedesc This access is done in a sideways manner, meaning that another proc is examining the filedesc of a running process. The proc lock is grabbed because there is a mistaken impression that this is the safe way to grab a filedesc in a 'sideways' manner. (actually if you look at the code you'll realize that this is a mistaken assumption and is unsafe as fdfree() does not actually lock the proc lock when releasing the filedesc structure. (oops!)) In select we have: filedesc -> object This access is done as an efficiency mechanism, if we do not hold the filedesc locked while calling the fo_poll() routine on each object we must perform an order of magnitude more mutex operations per file being poll()'d/select()'d on. We would have to allocate a temporary array to hold all the file pointers, atomically bump and release thier reference counts each time we looped through the file scan. In each object's write/read routine we have: object -> proc This is because traditionally wakeup()/psignal() didn't block the process, so we had "mutual exclusion" or "Giant" theoretically protecting all of our data. The lock ordering here is done whenever someone requests SIGIO when data arrives on an object. Data arrives which locks the object and it then calls into pgsigio to post SIGIO which grabs the proc lock. All that said, we wind up with an ordering that's broken: proc -> filedesc -> object -> proc Here's my solutions: (implicit "fix the problem" with each 'pro') .) add another mutex to struct proc, this will protect the filedesc reference. pros: simple fix works nicely. cons: bloating proc with yet another mutex, yet another lock aquisition in the exit() path. .) don't hold the filedesc lock over the fo_poll routines in select()/poll() pros: fo_poll routines can now block, our routine is somewhat closer to linux's implementation so we have a reference. cons: fo_poll routines should not block, expensive mutex operations at least 4 lock/unlock and 2 atomic ops per file, extra malloc in the select/poll codepath, extra complexity, our routine is somewhat closer to linux's implementation. :) .) drop the object lock when performing pgsigio pros: none. cons: possible race conditions added to code paths, it's hard to mpsafe something that used to run under Giant if you must make due with a lock you can't hold onto all the time, this sort of issue will happen with sockets, vnodes and kqueues as well. .) don't allow fdfree() to NULL out the filedesc in the proc struct, instead split it into a routine that close()'s all the files in the first pass and return whether or not we were the last accessor to this particular filedesc. Later in kern_exit() right before moving it to the zombie list, while we have the allproc lock, we NULL out the filedesc pointer and free it if we were the last accessor. We also need to handle the case of exec() unsharing the filedescriptor table, I think we can do this the same way at the cost of a possible allproc lock piggybacked or placed in the uncommon case of exec with shared descriptor tables. pros: should work, minimal change, we pigggyback the locking operations on existing ones (allproc). cons: fdfree now needs to lock the filedesc while closing each file, not such a big deal. allproc needed whenever sideways access to a single proc's filedesc is required, allproc must be held whenever performing any sideways filedesc operation I really like the last solution (piggyback on allproc). But since this _will_ dictate how we continue to do locking in the future I'd like to open it for discussion. Of course alternate solutions would be nice to hear. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030103030805.GS26140>