Date: Wed, 5 Jul 2000 23:00:49 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: cp@bsdi.com (Chuck Paterson)
Cc: grog@lemis.com (Greg Lehey), eischen@vigrid.com (Daniel Eischen),
    jasone@canonware.com (Jason Evans), luoqi@watermarkgroup.com (Luoqi Chen),
    smp@FreeBSD.ORG
Subject: Re: SMP meeting summary
Message-ID: <200007052300.QAA26840@usr05.primenet.com>
In-Reply-To: <200007040218.UAA01169@berserker.bsdi.com> from "Chuck Paterson" at Jul 03, 2000 08:18:00 PM
> }I'm not sure we're talking about the same thing, but if so I must be
> }missing something.  If I'm waiting on a mutex, I still need to
> }reacquire it on wakeup, don't I?  In that case, only the first process
> }to be scheduled will actually get the mutex, and the others will block
> }again.
>
> 	Yes, you need to acquire the mutex on wakeup, but likely
> one process will run acquiring and releasing the mutex in an
> uncontested fashion before other processes run and do the same
> thing.

You can assume in SVR4, in the wake_one case, that you will be the
only process awake, and so your acquisition will not be contested,
and will not result in a sleep.

Logically, you can consider that there is one waiter and N-1 sleepers
for every N processes trying to acquire a mutex.

This is normally handled [in the literature] by using a hybrid lock
in a hierarchy.  That is, you attempt a fast lock, and if that fails,
then you attempt a slow ("sleeping") lock.  You are guaranteed a
wakeup on release of a fast lock, and on release of a sleeping lock,
so it's sixes.  Of course, it's a lot easier to just use a critical
section.  (A sketch of the fast/slow hierarchy follows below.)

> }In my experience, I've seen mutexes used for long-term waits, and I
> }don't see any a priori reason not to do so.  Of course, if we make
> }design decisions based on the assumption that all waits will be short,
> }then we will have a reason, but it won't be a good one.
> }
> }Before you say that long-term waits are evil, note that we're probably
> }talking about different kinds of waits.  Obviously anything that
> }threatens to keep the system idle while it waits is bad, but a
> }replacement for tsleep(), say, can justifiably wait for a long time.
>
> 	A replacement for tsleep is not a mutex, but in Solaris
> parlance a conditional variable.  The uses are different: one is
> for locking a resource, the other is waiting on a synch event.  A
> conditional variable, like the sleep queues, has a mutex associated
> with it.  This mutex is not held except while processing the event,
> both by the process waiting and the process doing the activation.
> I don't think it is a good idea to assume that the heuristics for
> waking up tsleep / conditional variables are going to be
> anything like those seen with mutexes.

Effectively, condition variables are critical sectioned in their
manipulation through the use of a mutex (see the condition variable
sketch below).  In practice, there are some ugly areas in the Solaris
SMP reentrant VFS code that necessitate treating the cond variable as
if it were a mutex on a larger structure.  This reduces concurrency
considerably.

The main point about wake_one that's problematic is the deadly
embrace deadlock, not the priority inversion deadlock, which can
always be "opted out of" by priority lending (or by making the
wake_one more choosy about who it wakes, above and beyond the head
of the wait queue).

The thing that makes a thundering herd expensive is less the herd
than it is the traversal of the list; think about it: if I have the
cycles to burn in the scheduler to pick someone to run, then I wasn't
doing important other work anyway, and I might as well burn them in
the herd, as opposed to other places I could burn them.

A spinlock fixes this by implementing back-off + retry, at least for
sets of two locks.  Sets of more locks are really problematic.
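As a rough illustration of the back-off + retry idea for a pair of
locks, here is a hedged sketch in userland pthreads terms (not kernel
code; the function and variable names are made up for illustration):

	#include <pthread.h>
	#include <sched.h>

	/*
	 * Back-off + retry for a set of two locks: hold the first, try
	 * the second, and on failure drop everything and start over
	 * rather than sleeping while holding a lock (the deadly
	 * embrace scenario).
	 */
	static void
	lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
	{
		for (;;) {
			pthread_mutex_lock(a);
			if (pthread_mutex_trylock(b) == 0)
				return;			/* got both */
			pthread_mutex_unlock(a);	/* back off... */
			sched_yield();			/* ...and retry */
		}
	}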
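And a hedged sketch of the fast/slow ("hybrid") lock hierarchy
mentioned above, again in pthreads terms; the spin limit is an
arbitrary, made-up tuning knob:

	#include <pthread.h>
	#include <sched.h>

	#define	SPIN_LIMIT	1000	/* arbitrary tuning knob */

	/*
	 * Try the fast (non-sleeping) acquire some bounded number of
	 * times; if that fails, fall back to the slow ("sleeping")
	 * acquire.
	 */
	static void
	hybrid_lock(pthread_mutex_t *m)
	{
		int i;

		for (i = 0; i < SPIN_LIMIT; i++) {
			if (pthread_mutex_trylock(m) == 0)
				return;		/* fast path: uncontested */
			sched_yield();
		}
		pthread_mutex_lock(m);		/* slow path: sleep until released */
	}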
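For reference, the condition variable usage being described, sketched
with POSIX primitives (the event/predicate names are illustrative
only): the associated mutex protects only the predicate, and is not
held while actually sleeping, since pthread_cond_wait drops and
reacquires it.

	#include <pthread.h>

	static pthread_mutex_t	ev_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t	ev_cv = PTHREAD_COND_INITIALIZER;
	static int		ev_ready;	/* the "synch event" predicate */

	static void
	wait_for_event(void)
	{
		pthread_mutex_lock(&ev_lock);
		while (!ev_ready)		/* recheck after wakeup */
			pthread_cond_wait(&ev_cv, &ev_lock);
		ev_ready = 0;
		pthread_mutex_unlock(&ev_lock);
	}

	static void
	post_event(void)
	{
		pthread_mutex_lock(&ev_lock);
		ev_ready = 1;
		pthread_cond_signal(&ev_cv);	/* the wake_one flavor */
		pthread_mutex_unlock(&ev_lock);
	}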
A lot of work was done in SVR4 ES/MP to, effectively, resolve the
many-lock problem using Dijkstra's "Banker's Algorithm" (that is, all
the resources for sets of greater than two members, and in some
cases, one member -- usually the parent directory in a descending
path lookup -- are allocated "up front", which is to say "at the same
stack depth/in the same function", to permit state to be backed out
easily in the case of a deadlock detection).  A rough sketch of the
up-front pattern follows at the end of this message.

This stuff is really unsatisfying from the point of view of someone
trying to write a reentrant ("kernel thread safe" or "kernel
preemption safe") VFS provider of some kind, since it's really hard
to know when the semantics applied by an upper level function might
result in a problem.

Other subsystems have similar issues, but most of my experience was
with VFS providers, so I can't give you the PCMCIA device attach
issues in SVR4 (maybe we can track down Kurt Mahon, though).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
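A minimal sketch of the "allocate everything up front, back out on
failure" pattern described above, in pthreads terms; the lock array
and its ordering are assumptions for illustration, not SVR4 ES/MP
code:

	#include <pthread.h>
	#include <stddef.h>

	/*
	 * Take every lock the operation will need at the same stack
	 * depth, in a fixed order; any failure backs out the locks
	 * already held so the whole attempt can be abandoned or
	 * retried cleanly.
	 */
	static int
	lock_all(pthread_mutex_t *locks[], size_t n)
	{
		size_t i, j;

		for (i = 0; i < n; i++) {
			if (pthread_mutex_trylock(locks[i]) != 0) {
				for (j = 0; j < i; j++)
					pthread_mutex_unlock(locks[j]);
				return (-1);	/* caller backs off and retries */
			}
		}
		return (0);			/* holds all n locks */
	}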