Date: Wed, 16 Dec 1998 19:33:55 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: hackers@FreeBSD.ORG
Subject: Async blocking on temporary failures
Message-ID: <199812170333.TAA83886@apollo.backplane.com>
This is a general idea I'm throwing out that I believe could be integrated
into FreeBSD pretty easily.
It is the notion of not blocking on 'trivial' conditions deep, deep in
the kernel, but instead flagging the task for an asynchronous wait and
returning NULL. This gives us the ability to pop a temporary failure
(typically NULL) to a higher level routine. The higher level routine
can then decide what to do: i.e. block on the asynchronous wait queued
by the subroutine or pop itself out to yet a higher level routine and let
it deal with it, or do something else.
An asynchronous wait capability would allow a deep, low level routine
to propagate the blocking condition up through multiple procedural levels,
undoing any temporary locks made by those procedures before the blocking
condition is actually acted upon.
Why do we need this? Well, we have several serious deadlock problems
within the kernel and these problems are only going to get worse with SMP
as master locks are propagated inward. By my reading of the kernel,
most of these deadlock situations occur when something deep within the
kernel finds it necessary to block on some temporary situation such as
trying to allocate memory or a buffer or something like that. Most of
these blocking situations already incorporate hysteresis, which means
that we *can* 'abort' the routine by initiating the async wait, returning
NULL instead of blocking, and allowing some higher level procedure to
determine when to actually block.
An asynchronous wait capability thus allows a process to block without
holding major locks (spinlocks, bp locks, vm_page locks, vnode locks,
etc etc etc).
I use asynchronous waits in some of my other OS projects and I have
found them to be invaluable in their ability to avoid deadlocks and
to even greatly simplify code.
Here's an example to illustrate the idea:
The (FS)->bread()->getblk()->allocbuf() chain would benefit greatly
from such a mechanism. Assuming an async wait capability exists,
allocbuf() could be adjusted such that it never blocks but instead
returns 0 if an async wait occurs, allowing the chain to 'undo'
itself back through getblk() and then have the blocking condition
actually occur in the bread(). The mechanism could then eventually
be extended on up past the bread() and be directly supported by
FS code and thus avoid holding locks on (for example) vnodes due
to a synchronous I/O request, which would massively increase
parallelism on simultaneous VFS/VNODE ops to the same descriptor
(I'm thinking of mmap page faults specifically but it applies to
any lseek()/read() combo).
How it would work:
Instead of tsleep()ing on a structure, we call asleep() and return a
temporary failure. For example, a routine that allocates or returns a
bp would call asleep() and return NULL, rather than calling tsleep(),
retrying internally, and eventually returning a valid bp.
The higher level parent procedure can either block on the failure
itself or propagate it up by undoing whatever locks it holds and
returning a failure condition (note: without calling asleep() again).
Eventually you get to a parent procedure which decides it must block
waiting for the temporary failure to clear, then retry the call that
failed. This routine blocks by calling await().
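The propagation described above can be sketched in user space. This is only a toy model under stated assumptions: toy_asleep(), toy_await(), toy_allocbuf(), toy_getblk() and toy_bread() are hypothetical stand-ins named after the chain in the earlier example, not the real kernel routines, and toy_await() simply pretends the wakeup has arrived instead of really blocking.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not FreeBSD kernel APIs. */
struct buf { int data; };

static struct buf the_buf;           /* the only buffer in this toy model */
static int buf_free;                 /* 0 = temporarily unavailable */
static const void *pending_wait;     /* the single queued async wait */
static int lock_held;                /* lock taken by the mid-level routine */
static int lock_held_while_blocked;  /* set if we ever "block" holding it */

/* Queue an async wait, replacing any previous one. */
static void toy_asleep(const void *ident) { pending_wait = ident; }

/* Stand-in for await(): note the lock state, then pretend we were woken. */
static void toy_await(void)
{
    if (pending_wait == NULL)
        return;                      /* already woken: no-op */
    if (lock_held)
        lock_held_while_blocked = 1;
    buf_free = 1;                    /* simulate the resource being released */
    pending_wait = NULL;
}

/* Low level: never blocks; queues an async wait and reports failure. */
static struct buf *toy_allocbuf(void)
{
    if (!buf_free) {
        toy_asleep(&the_buf);
        return NULL;
    }
    buf_free = 0;
    return &the_buf;
}

/* Mid level: takes a lock, and always drops it before returning. */
static struct buf *toy_getblk(void)
{
    struct buf *bp;

    lock_held = 1;
    bp = toy_allocbuf();
    lock_held = 0;                   /* undo the lock before propagating NULL */
    return bp;                       /* note: no second asleep() here */
}

/* Top level: the only place that blocks, and it holds no lock when it does. */
static struct buf *toy_bread(int *retries)
{
    struct buf *bp;

    assert(retries != NULL);
    *retries = 0;
    while ((bp = toy_getblk()) == NULL) {
        toy_await();
        (*retries)++;
    }
    return bp;
}
```

Starting with buf_free set to 0, a call to toy_bread() fails once inside toy_allocbuf(), unwinds through toy_getblk() with the lock released, waits at the top level, and succeeds on the retry.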
Now, asleep() and await() do not nest. There is a single embedded
asyncwait structure in the struct process. An asleep() call
*replaces* any previous async sleep. await() blocks the process
on whatever the most recent asyncwait structure was. A wakeup on
the associated address clears any queued asyncwaits.
In the case where the async wait address is woken up prior to await()
being called, the async wait structure is cleared by the wakeup and
await() becomes a NOP. The async wait can also be cleared by calling
asleep(NULL). Thus, *ALL* potential race conditions can be handled
without any fancy coding.
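The rules above (one slot per process, replace on asleep(), clear on wakeup or asleep(NULL), await() as a NOP once cleared) can be written down as a small single-threaded model. This is a sketch only: struct proc_model and the model_* functions are hypothetical names, and model_await() reports whether the process would block rather than actually blocking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a process with one embedded async wait slot. */
struct proc_model {
    const void *wait_ident;  /* address queued by the last asleep(), or NULL */
};

/* Queue an async wait, replacing any previous one; NULL clears the slot. */
static void model_asleep(struct proc_model *p, const void *ident)
{
    assert(p != NULL);
    p->wait_ident = ident;
}

/* Wakeup on an address: clears the queued async wait if it matches. */
static void model_wakeup(struct proc_model *p, const void *ident)
{
    assert(p != NULL);
    if (p->wait_ident != NULL && p->wait_ident == ident)
        p->wait_ident = NULL;
}

/* Returns 1 if await() would block, 0 if it would be a no-op. */
static int model_await(const struct proc_model *p)
{
    assert(p != NULL);
    return p->wait_ident != NULL;
}
```

In this model a wakeup on an address that was replaced by a later asleep() has no effect, because only the most recent async wait is remembered.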
How to deal with race conditions:
There are two ways to deal with potential race conditions. The
traditional way is to call splbio() or equivalent to prevent other
processes from waking up the object you are about to sleep on.
You can still do this with asleep().
asleep() gives us another option: Call asleep() BEFORE testing the
condition in the structure being waited on. Then test the condition
and if you determine that you do not need to block, call asleep(NULL)
to clear the async wait and continue as if nothing had happened.
Specifically:
    /*
     * Block waiting for blah
     */
    if (structure->flags & somecondition) {
        asleep(structure, ...);
        if (structure->flags & somecondition) {
            return failure....
        }
        asleep(NULL, ...);
    }
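To see why the ordering matters, here is a toy single-threaded model that forces a release to land at the worst possible moment. Everything in it is an assumption made for illustration: OBJ_BUSY stands in for "somecondition", sim_asleep() and sim_release() are hypothetical helpers, and interleave_hook simulates another process running between two statements. try_racy() is a contrast case added here, not something from the proposal; try_safe() follows the pattern shown above.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_BUSY 0x01   /* hypothetical stand-in for "somecondition" */

struct object { int flags; };

static const void *queued_wait;                   /* single async wait slot */
static void (*interleave_hook)(struct object *);  /* simulated concurrent event */

static void sim_asleep(const void *ident) { queued_wait = ident; }

/* The other party clears the condition and wakes any waiter on obj. */
static void sim_release(struct object *obj)
{
    obj->flags &= ~OBJ_BUSY;
    if (queued_wait == obj)
        queued_wait = NULL;
}

/* Fire the simulated concurrent event once, if one is armed. */
static void run_hook(struct object *obj)
{
    if (interleave_hook != NULL) {
        void (*hook)(struct object *) = interleave_hook;
        interleave_hook = NULL;
        hook(obj);
    }
}

/* Racy order: test, then queue. Returns -1 on temporary failure. */
static int try_racy(struct object *obj)
{
    assert(obj != NULL);
    if (obj->flags & OBJ_BUSY) {
        run_hook(obj);          /* release slips in before the wait is queued */
        sim_asleep(obj);
        return -1;
    }
    return 0;
}

/* Order from the text: queue the wait first, then re-test the condition. */
static int try_safe(struct object *obj)
{
    assert(obj != NULL);
    if (obj->flags & OBJ_BUSY) {
        sim_asleep(obj);
        run_hook(obj);          /* release now also clears the queued wait */
        if (obj->flags & OBJ_BUSY)
            return -1;
        sim_asleep(NULL);       /* no need to block after all */
    }
    return 0;
}

/* 1 if a later await() would block, 0 if it would be a no-op. */
static int would_block(void) { return queued_wait != NULL; }
```

With the release armed, try_racy() returns a failure and leaves a wait queued that nobody will ever wake, while try_safe() notices the condition has cleared and continues with no wait pending.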
I invite discussion on this feature. I would be pleased to develop it
for FreeBSD. I think it would be extremely useful, especially with SMP
but also with non-SMP kernels with regard to avoiding deadlock situations
in the kernel. I believe that the feature could be implemented easily
and folded into major subsystems incrementally, 'fixing' the kernel from
the inside out without having to make wholesale changes all in one shot.
-Matt
Matthew Dillon Engineering, HiWay Technologies, Inc. & BEST Internet
Communications & God knows what else.
<dillon@backplane.com> (Please include original email in any response)
