Date: Wed, 11 Sep 2002 09:41:30 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Luigi Rizzo <rizzo@icir.org>
Cc: smp@freebsd.org
Subject: Re: wakeup handling on SMP boxes
Message-ID: <3D7F723A.7B49E8E0@mindspring.com>
References: <20020911083854.A88921@iguana.icir.org>

Luigi Rizzo wrote:
> I have a question about the handling of wakeup on SMP machines.
> (I am looking at RELENG_4, but I believe the same thing happens
> on CURRENT with threads instead of processes).
>
> Imagine the following situation -- all CPUs are running cpu-bound
> processes (say, all with the same priority), and at some point a
> wakeup() is invoked which awakens one or more sleeping processes.
>
> My understanding of the behaviour is that:
>
> + the processor handling the wakeup will suspend the
>   curproc and, eventually, invoke need_resched();
>
> + on this same processor, the priority of the newly awakened process
>   is compared with that of the suspended process;
>
> + if the comparison succeeds, the suspended process is preempted
>   and the new one runs; otherwise, the new process will have a
>   chance at the next voluntary descheduling or roundrobin();
>
> Am I correct ?
> This seems to suggest that the priority ordering might be violated
> for as much as kern.quantum, after which the roundrobin() and
> forward_roundrobin() will do the right thing.

Yes, this is correct.

This is a problem implicit in a pull-based CPU migration model for
processes, along with requiring locking in the scheduler in the
common code path.

The only way to get around this is to convert to a push-based CPU
migration model instead.

A push-based model will still introduce latency for the migrated
process if the average per-process CPU utilization varies between
CPUs by more than 50% of a quantum around the mean value. This can
be mitigated statistically by giving a CPU an artificial figure of
merit boost following a migration, to reduce the probability of it
becoming the common target of additional migrations until it
recalculates its figure of merit at the next rescheduling event.
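
The figure of merit boost can be sketched roughly as follows. This is only an illustration of the idea under stated assumptions, not FreeBSD scheduler code: struct cpu_merit, MIGRATION_BOOST, pick_target(), push_migrate() and resched_recalc() are all hypothetical names, and the sketch assumes the figure of merit is a load-like number where a lower value makes a CPU a more attractive migration target.

```c
/* Hypothetical boost size, in the same (assumed) units as the load figure. */
#define MIGRATION_BOOST 50

/* Hypothetical per-CPU bookkeeping; not an actual FreeBSD structure. */
struct cpu_merit {
    int load;   /* measured figure of merit; lower is a better target */
    int boost;  /* artificial boost applied after receiving a migration */
};

/* Return the index of the CPU with the lowest effective figure. */
static int pick_target(const struct cpu_merit cpus[], int ncpu)
{
    int best = 0;

    for (int i = 1; i < ncpu; i++) {
        if (cpus[i].load + cpus[i].boost <
            cpus[best].load + cpus[best].boost)
            best = i;
    }
    return best;
}

/* Push one process to the best CPU and mark that CPU as a recent target. */
static int push_migrate(struct cpu_merit cpus[], int ncpu)
{
    int target = pick_target(cpus, ncpu);

    cpus[target].boost = MIGRATION_BOOST;
    return target;
}

/* At a CPU's next rescheduling event: take a fresh figure, drop the boost. */
static void resched_recalc(struct cpu_merit *cpu, int measured_load)
{
    cpu->load = measured_load;
    cpu->boost = 0;
}
```

With three CPUs at load figures 10, 30 and 40, three back-to-back calls to push_migrate() without the boost would all pick CPU 0, because its measured figure does not change until it next reschedules. With the boost, the same three calls pick CPUs 0, 1 and 2 in turn.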

> The only reason why this more or less works in practice is that the
> sleeping process likely has raised its priority in the tsleep()
> call, so it will preempt the process running on the processor
> handling the wakeup(). On the other hand, there is no guarantee
> that this process is the one with the lowest priority among those
> currently running.

In practice, where you have, on average, N+1 processes in a
ready-to-run state and N CPUs, there is still a likelihood of
starvation occurring. The same likelihood is present at every
modulus of the number of ready-to-run processes-plus-one whose
remainder is zero.

It "works" on average because no one really notices the stalls in
practice, given how poorly humans notice things at 100ms
granularity. 8-).

> I guess to fix this one would need to determine if one of the
> processes needs to be kicked out and replaced with the new one,
> by invoking an Xcpuast IPI on the specific processor.
>
> Any reason why this is not done ? Is the call too expensive so
> one prefers to tolerate the temporary inconsistency ?

This would not fix the problem, because you would introduce stalls
by triggering the IPI. The only effect this would have is to
average out the pain, so that it's less noticeable, rather than
actually getting rid of it.

I rather expect that this isn't done because optimizing the current
code, when there is a publicly stated architectural intent to
replace it, is kind of a waste of time.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message