Date: Sat, 25 Oct 2003 21:37:28 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Marcel Moolenaar <marcel@xcllnt.net>
Subject: Re: FreeBSD mail list etiquette
Message-ID: <Pine.NEB.3.96L.1031025211002.83249C-100000@fledge.watson.org>
In-Reply-To: <200310252213.h9PMDCHq032546@apollo.backplane.com>
On Sat, 25 Oct 2003, Matthew Dillon wrote:

> It's a lot easier lockup path then the direction 5.x is going, and
> a whole lot more maintainable IMHO because most of the coding doesn't
> have to worry about mutexes or LORs or anything like that.

You still have to be pretty careful, though, when relying on implicit synchronization: while it works well deep inside a subsystem, it can break down at subsystem boundaries. One of the challenges I've been bumping into recently when working with Darwin has been the split between their Giant kernel lock and their network lock.

To give a high-level summary of the architecture: they have two Funnels, which behave similarly to the Giant lock in -STABLE/-CURRENT. When a thread blocks, its funnel is released, allowing other threads to enter the kernel, and it is reacquired when the thread starts to execute again. They then have fine-grained locking for the Mach-derived components, such as memory allocation, VM, et al.

Deep inside a particular subsystem (say, the network stack), all works fine. The problem is at the boundaries, where structures are shared between multiple compartments. For example, process credentials are referenced by both "halves" of the Darwin BSD kernel code, and are insufficiently protected in the current implementation: they have a write lock but no read lock, so it looks like it should be possible to get stale references when the pointers are read under two different locks. Similarly, there's the potential for serious problems at the surprisingly frequent boundaries between the network subsystem and the remainder of the kernel: file-descriptor-related code, fifos, BPF, et al. By using two large subsystem locks, they do simplify locking inside each subsystem, but the result rests on a web of implicit assumptions and boundary synchronization that carries most of the risks of explicit locking.
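The release-on-block behavior of a funnel (or of Giant) can be modeled in user space, and the model also shows why implicit synchronization is fragile. What follows is a minimal sketch, not Darwin or FreeBSD code: it assumes POSIX threads, uses a single mutex named giant as a stand-in for a funnel, and relies on pthread_cond_wait() atomically releasing the mutex while the thread sleeps and reacquiring it before returning. The names giant, resource_valid, other_thread(), deep_call_that_blocks() and demo_blocking_race() are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a funnel / Giant: one big kernel lock. */
static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wakeup = PTHREAD_COND_INITIALIZER;

static int resource_valid;  /* state the caller believes giant protects */
static int other_ran;       /* set once the second thread has run */

/* A second thread; it can only get in while the first thread sleeps. */
static void *other_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&giant);
    resource_valid = 0;            /* invalidates the caller's earlier check */
    other_ran = 1;
    pthread_cond_signal(&wakeup);
    pthread_mutex_unlock(&giant);
    return NULL;
}

/* Models a routine deep in the call stack that unexpectedly sleeps.
 * Must be called with giant held; pthread_cond_wait() drops giant while
 * asleep and reacquires it before returning, much as a funnel does.
 * The loop also tolerates spurious wakeups. */
static void deep_call_that_blocks(void)
{
    while (!other_ran)
        pthread_cond_wait(&wakeup, &giant);
}

/* Reports resource_valid as seen before and after the blocking call;
 * giant is held at both observation points. */
void demo_blocking_race(int *before, int *after)
{
    pthread_t t;
    int error;

    pthread_mutex_lock(&giant);
    resource_valid = 1;
    other_ran = 0;
    *before = resource_valid;      /* "checked" while holding giant */

    /* The new thread blocks on giant until we go to sleep below. */
    error = pthread_create(&t, NULL, other_thread, NULL);
    assert(error == 0);
    (void)error;
    deep_call_that_blocks();

    *after = resource_valid;       /* the earlier check is now stale */
    pthread_mutex_unlock(&giant);
    pthread_join(t, NULL);
}
```

Called from main(), demo_blocking_race() reports before = 1 and after = 0: the value the caller validated while holding giant no longer holds once deep_call_that_blocks() returns, even though giant is held at both points. Under this model, any code above a call that might sleep has to re-check the state it validated before the call.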
It's also worth noting that there have been some serious bugs associated with a lack of explicit synchronization in the non-concurrent kernel model used in RELENG_4 (and in a host of other early UNIX systems relying on a single kernel lock). These arise from unexpected blocking deep in a call stack that a developer writing code higher in the stack did not anticipate, resulting in race conditions. In the past, there have been a number of exploitable security vulnerabilities due to races opened up in low-memory conditions, during paging, etc.

One solution I was exploring was using the compiler to help track the potential for functions to block, similar to the const qualifier, combined with blocking/non-blocking assertions evaluated at compile time. However, some of our current APIs (M_NOWAIT, M_WAITOK, et al.) make that approach somewhat difficult to apply, and would have to be revised to use a compiler solution. These potential weaknesses very much exist in an explicit model too, but with explicit locking, we have a clearer notion of how to express assertions.

In -CURRENT, we use thread-based serialization in a number of places to avoid explicit synchronization costs (such as in GEOM for processing work queues), and we should make more use of this practice. I'm particularly interested in using interface interrupt threads that perform direct dispatch as a means to preserve per-interface ordering of packets arriving on network interfaces while still allowing parallelism in network processing (this is currently in use in Sam's netperf branch).

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories