Date: Thu, 19 Sep 1996 12:24:31 -0700 (MST) From: Terry Lambert <terry@lambert.org> To: gwr@mc.com (Gordon W. Ross) Cc: michaelh@cet.co.jp, freebsd-hackers@FreeBSD.org, tech-kern@netbsd.org Subject: Re: spl models and smp (was Re: Some interesting papers on BSD ...) Message-ID: <199609191924.MAA01259@phaeton.artisoft.com> In-Reply-To: <9609191326.AA03526@bach> from "Gordon W. Ross" at Sep 19, 96 09:26:42 am
[ ... SVR4/MP mutex/spl interaction ... ]

> Note that you MUST hold a mutex lock on some object that has both
> the mutex and a condition variable, and the cv_timedwait_sig()
> does an atomic "block and release the mutex" while making you
> non-runnable, and later does an atomic "resume and take mutex."
> Interesting scheme, eh?

The one problem with this is that condition variables, as required, must
be synchronized across all processors in a shared area mapped into the
kernel address space of all of them. This seems like a big hit on
concurrency, to me.

I prefer a design which would include the ability to transparently
localize a mutex/condition-variable pair to a single CPU, based on
resource locality. This implies that mutex allocation is hierarchical,
in the same way that all processes are descendents of "init" (a "group
leader" is equivalent to a CPU locality in this analogy). This allows
the use of mutexes and condition variables which invoke bus arbitration
only *as necessary*.

The key is the ability to predict deadlock conditions. You can do this
by computing the transitive closure over the hierarchy as if it were a
directed acyclic graph. There is code in both "tsort" and "gprof" that
gives an example of how this works in actual practice.

Because it is a hierarchy, you can "inherit to root" hints to allow the
computation to be achieved more quickly (or-and-xor-and in the simplest
case). The main enabling technology for making hints so inexpensive is
the use of lock intention modes, so that once intention is established,
a resync of the shared objects is not required to prevent conflicting
requests.

Clearly, we want to compromise and not propagate inheritance of hints
over the boundary from per-CPU objects to the common system. This
trades propagation bus overhead for run-time overhead. In effect, the
"expensive" object references become slightly more expensive because
the propagation is not interleaved.
In trade, we get vastly increased access concurrency without bus
arbitration for "inexpensive" (local to a single CPU context) objects.
It is then incumbent upon us to design using as few critical-path
"expensive" objects as possible.

For instance, we should use the Dynix (Sequent) per-CPU page pool
design for local VM allocations, and only go to a shared
(bus-arbitrating) mutex to refill the local page pool for a CPU from
the global page pool when we hit a low water mark (or, conversely,
return pages to the global pool only when we hit a high water mark).
Unlike file system reentrancy, Sequent did this right: it is
intuitively scalable to N processors. (It turns out that Unisys
has/had the best -- IMO -- FS reentrancy mechanisms.)

Other issues, such as per-FS reentrancy for FS's in a transition
kernel, can be handled by allocating an expensive global mutex for the
VFS subsystem (and one for each other kernel subsystem, to achieve
per-subsystem granularity). At that point, it's possible to "push
down" the interfaces and kernel reentrancy through the trap,
exception, and interrupt code, to gradually increase concurrency. It
also allows import of "foreign" file systems, drivers, and other
components by causing them to use the global mutex until such time as
they can be made "safely kernel reentrant" and "safely kernel context
reentrant", for kernel multithreading and SMP reentrancy,
respectively.

I *highly* recommend "UNIX for Modern Architectures"; it is basically
a handbook on how to build SMP UNIX systems.

SMP should be the target, since the context handling necessary for SMP
buys you the ability to reenter on a single CPU context (for kernel
multithreading) for free. It also buys you the ability to support
kernel preemption (because of the multithreading contexts), which is
necessary to support true RealTime scheduling algorithms (like
deadlining) and related issues (like priority lending or RT
event-based preemption).
So this discussion probably belongs on the SMP list...


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
