Date: Tue, 29 Jul 2003 00:04:50 -0700 From: Terry Lambert <tlambert2@mindspring.com> To: Robert Watson <rwatson@FreeBSD.org> Cc: Kris Kennaway <kris@obsecurity.org> Subject: Re: LOR with filedesc structure and Giant Message-ID: <3F261C92.2FB1AFCD@mindspring.com> References: <Pine.NEB.3.96L.1030728110437.57321A-100000@fledge.watson.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Robert Watson wrote: > On Sun, 27 Jul 2003, Kris Kennaway wrote: > > After upgrading last night, one of the package machines found this: > > I've bumped into some similar problems -- it's a property of how we > current lock select(). We hold the file descriptor lock for the duration > of polling each object being "selected", and if any of those objects has > to grab a lock for any reason, it has to implicitly fall after the file > descriptor lock. I actually run into this in some of our MAC code, > because I need to grab a vnode lock to authorize polling the vnode using > VOP_POLL(), and since the vnode lock is a sleep lock, this generates a > WITNESS warning. Unfortunately, it's not immediately clear what a better > locking scheme would look like without going overboard on the fine-grained > side. We probably need to grab Giant before entering the select code > since it's highly likely something in there will require Giant -- it > reaches down into VFS, the device stuff, socket code, tc. This is basically an instance of the race I warned against being an impending danger to FreeBSD last week, now that we have the possibility of multiple application threads in the kernel simutaneously. The easiest fix is to take a reference on the descriptors in the select itself around the calls to selscan(). This is annoying, because it's moderately expensive. The problem is that you have to avoid false positives, and you could sleep as a result of some of the underylying implementation stuff. Effectively, this means that you have to allocate a copy of the descriptor table to hold a reference over the whole thing, so that a close of the descriptors out from under you doesn't cause you to panic after coming back. The tricky part is that you have to change the semantics of the selscan() (this part is ugly), which you do by verifying that the descriptor you are using is the same one you were using. You have to do this because there's no way to safely avoid the race of the first and second selscan, where someone closes a descriptor out from under you, and then opens another one in its place in a separate thread in the same application. The way Solaris deals with this issue is that all sleeps are at the same level, so what it can do is force the call to fail out (basically, a cancellation point) for outstanding read() on an fd in one thread, with the descriptor being closed in another. It turns out that parts of the Java netwoking implementation in the full-on Java implementation actually *depends* on this behaviour, so we will have to bite the bullet on it eventually, or rewrite that code when/if it makes it to FreeBSD. The only other things I've thought of that could let you fix this is to be Evil(tm) and fail the close() with the only error that would cause the application to retry it later -- EINTR. This is truly evil, since it's likely the application doesn't check the return value from close, and even if it did, an EINTR would likely cause it to try again immediately. The only other semi-sane approach is Too Evil For Words(tm), which would be to delay the return on the close until the reference count goes from 2->1; the reason this is Too Evil For Words(tm) is this: what do you do when one thread is blocked in the close(), and one is blocked in a read(), and a third thread *also* calls close()? You'd pretty much be guaranteeing that applications that assume the Solaris semantics will all break (arguably, they are already broken, with one thread closing files out from under another, but they appear to function. I guess there's always "psignal(p,SIGABRT);" 8-) 8-)). I don't think FreeBSD could really implement the Solaris approach, without a lot of code needing to be rewritten to support the idea of cancellation of blocked operations pending completion (basically, a wake-up-any-slept-on-this-fd-process). Maybe Dragonfly BSD will be able to solve this elegantly, but I really doubt it's going to be able to do it. You might also be able to kludge up an "internal signal", but the slept thread is probably not sleeping on e.g. the descriptor address (maybe it's slept in a malloc, etc.), so I don't know how you'd know that it had a reference to the thing your trying to take away from it without a rendevous. Maybe it's time to give each blocking context a kqueue or something equally gross. 8-(. -- Terry
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3F261C92.2FB1AFCD>