Date: Wed, 16 Dec 1998 19:33:55 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: hackers@FreeBSD.ORG
Subject: Async blocking on temporary failures
Message-ID: <199812170333.TAA83886@apollo.backplane.com>
This is a general idea I'm throwing out that I believe could be integrated into FreeBSD pretty easily. It is the notion of not blocking on 'trivial' conditions deep, deep in the kernel, but instead flagging the task for an asynchronous wait and returning NULL. This gives us the ability to pop a temporary failure (typically NULL) up to a higher-level routine. The higher-level routine can then decide what to do: block on the asynchronous wait queued by the subroutine, pop itself out to a yet higher-level routine and let it deal with it, or do something else.

An asynchronous wait capability would allow a deep, low-level routine to propagate the blocking condition up through multiple procedural levels, undoing any temporary locks made by those procedures before the blocking condition is actually acted upon.

Why do we need this? Well, we have several serious deadlock problems within the kernel, and these problems are only going to get worse with SMP as master locks are propagated inward. By my reading of the kernel, most of these deadlock situations occur when something deep within the kernel finds it necessary to block on some temporary situation, such as trying to allocate memory or a buffer. Most of these blocking situations already incorporate hysteresis, which means that we *can* 'abort' the routine by initiating the async wait, returning NULL instead of blocking, and allowing some higher-level procedure to determine when to actually block.

An asynchronous wait capability thus allows a process to block without holding major locks (spinlocks, bp locks, vm_page locks, vnode locks, etc.). I use asynchronous waits in some of my other OS projects and I have found them invaluable for avoiding deadlocks and even for greatly simplifying code.

Here's an example to illustrate the idea:

The (FS)->bread()->getblk()->allocbuf() chain would benefit greatly from such a mechanism.
Assuming an async wait capability exists, allocbuf() could be adjusted so that it never blocks but instead returns 0 if an async wait is queued, allowing the chain to 'undo' itself back through getblk() and then have the blocking actually occur in bread(). The mechanism could then eventually be extended up past bread() and be directly supported by FS code, and thus avoid holding locks on (for example) vnodes due to a synchronous I/O request. That would massively increase parallelism on simultaneous VFS/VNODE ops to the same descriptor (I'm thinking of mmap page faults specifically, but it applies to any lseek()/read() combo).

How it would work:

Instead of tsleep()ing on a structure, we call asleep() and return a temporary failure. For example, a routine that allocates or returns a bp would call asleep() and return NULL rather than tsleep(), retry internally, and eventually return a valid bp.

The higher-level parent procedure can propagate the failure up by undoing whatever locks it held and returning a failure condition (note: without calling asleep() itself). Eventually you get to a parent procedure which decides it must block waiting for the temporary failure to clear and then retry the call that failed. This routine blocks by calling await().

Now, asleep() and await() do not nest. There is a single embedded asyncwait structure in the process structure. An asleep() call *replaces* any previous async sleep. await() blocks the process on whatever the most recent asyncwait structure was. A wakeup on the associated address clears any queued asyncwaits. In the case where the async wait address is woken up prior to await() being called, the async wait structure is cleared by the wakeup and await() becomes a NOP. The async wait can also be cleared by calling asleep(NULL). Thus, *ALL* potential race conditions can be handled without any fancy coding.

How to deal with race conditions:

There are two ways to deal with potential race conditions.
The traditional way is to call splbio() or equivalent to prevent other processes from waking up the object you are about to sleep on. You can still do this with asleep().

asleep() gives us another option: call asleep() BEFORE testing the condition in the structure being waited on. Then test the condition and, if you determine that you do not need to block, call asleep(NULL) to clear the async wait and continue as if nothing had happened. Specifically:

    /*
     * Block waiting for blah
     */
    if (structure->flags & somecondition) {
        asleep(structure, ...);
        if (structure->flags & somecondition) {
            return failure....
        }
        asleep(NULL, ...);
    }

I invite discussion on this feature. I would be pleased to develop it for FreeBSD. I think it would be extremely useful, especially with SMP but also with non-SMP kernels, for avoiding deadlock situations in the kernel. I believe that the feature could be implemented easily and folded into major subsystems incrementally, 'fixing' the kernel from the inside out without having to make wholesale changes all in one shot.

					-Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
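
As a rough illustration of the rules described in the message (one async wait per process, asleep() replacing any earlier wait, a wakeup or asleep(NULL) clearing it, and failures propagating up the bread()->getblk()->allocbuf() chain with locks undone), here is a small user-space model in C. It is only a sketch under stated assumptions, not kernel code: the names asleep_model(), wakeup_model(), await_would_block(), struct proc_model, struct bufpool and the *_model chain functions are hypothetical, the real argument lists are elided in the message, and a real await() would put the process to sleep instead of reporting whether it would block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the single async wait embedded in a process. */
struct asyncwait {
    const void *chan;                 /* address waited on, NULL if none */
};

struct proc_model {
    struct asyncwait aw;
};

/* Queue an async wait. Replaces any previous one; NULL clears it. */
static void asleep_model(struct proc_model *p, const void *chan)
{
    p->aw.chan = chan;
}

/* A wakeup on chan clears a queued async wait for that address. */
static void wakeup_model(struct proc_model *p, const void *chan)
{
    if (chan != NULL && p->aw.chan == chan)
        p->aw.chan = NULL;
}

/* 1 if await() would have to block, 0 if it would be a NOP. */
static int await_would_block(const struct proc_model *p)
{
    return p->aw.chan != NULL;
}

/* Hypothetical resource standing in for the buffer pool. */
struct bufpool {
    int nfree;
};

/* Lowest level: never blocks. Queues the wait BEFORE testing. */
static int allocbuf_model(struct proc_model *p, struct bufpool *pool)
{
    asleep_model(p, pool);
    if (pool->nfree == 0)
        return 0;                     /* temporary failure, wait queued */
    asleep_model(p, NULL);            /* no need to wait after all */
    pool->nfree--;
    return 1;
}

/* Middle level: undoes its own lock before passing the failure up. */
static int getblk_model(struct proc_model *p, struct bufpool *pool, int *lock)
{
    *lock = 1;
    if (!allocbuf_model(p, pool)) {
        *lock = 0;
        return 0;                     /* note: no asleep call here */
    }
    return 1;                         /* success: lock still held */
}

/* Top level: the only place where blocking would be allowed. */
static int bread_model(struct proc_model *p, struct bufpool *pool, int *lock)
{
    if (!getblk_model(p, pool, lock)) {
        assert(*lock == 0);           /* nothing is held at this point */
        return 0;                     /* a real caller would await() and retry */
    }
    *lock = 0;                        /* done with the buffer */
    return 1;
}
```

In this model a caller at the bread() level would, on a 0 return, call the real await() and then retry; if the wakeup has already happened, await_would_block() reports 0, which corresponds to await() being a NOP.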