Date:      Wed, 11 Sep 2002 09:41:30 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        smp@freebsd.org
Subject:   Re: wakeup handling on SMP boxes
Message-ID:  <3D7F723A.7B49E8E0@mindspring.com>
References:  <20020911083854.A88921@iguana.icir.org>

Luigi Rizzo wrote:
> I have a question about the handling of wakeup on SMP machine.
> (I am looking at RELENG_4, but I believe the same thing happens
> on CURRENT, with threads instead of processes).
> 
> Imagine the following situation -- all CPUs are running cpu-bound
> processes (say, all with the same priority), and at some point a
> wakeup() is invoked which awakes one or more sleeping processes.
> 
> My understanding of the behaviour is that:
> 
>   + the processor handling the wakeup will suspend the
>     curproc and, eventually, invoke need_resched();
> 
>   + on this same processor, the priority of the newly awakened
>     process is compared with that of the suspended process;
> 
>   + if the comparison succeeds, the suspended process is preempted
>     and the new one runs; otherwise, the new process will have a
>     chance at the next voluntary descheduling or roundrobin();
> 
> Am I correct?
> This seems to suggest that the priority ordering might be violated
> for as much as kern.quantum, after which the roundrobin() and
> forward_roundrobin() will do the right thing.

Yes, this is correct.  This problem is inherent in a pull-based CPU
migration model for processes, as is the need for locking in the
scheduler's common code path.
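
A minimal C sketch of the behaviour described above may help.  It is
a user-space model, not kernel code: `struct cpu`, `NCPU`,
`local_wakeup_preempts()` and `priority_inverted()` are hypothetical
names, and it assumes the BSD convention that a larger numeric
priority means a less important process.  Only the CPU executing
wakeup() compares against its own curproc, so the process left on
the run queue can still be more important than something running
on another CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; larger numeric priority = less important. */
#define NCPU 4

struct cpu {
    int cur_priority;   /* priority of the process running on this CPU */
};

/*
 * Pull-style wakeup: only the CPU executing wakeup() compares the
 * woken process against its own curproc (the need_resched() test).
 */
static bool
local_wakeup_preempts(const struct cpu *self, int woken_priority)
{
    return woken_priority < self->cur_priority;
}

/*
 * Perform the local-only wakeup on CPU self_id, then report whether
 * the priority ordering is violated: some CPU is running a process
 * less important than the one left waiting on the run queue.
 */
static bool
priority_inverted(struct cpu cpus[NCPU], int self_id, int woken_priority)
{
    int waiting;    /* priority of the process left on the run queue */

    if (local_wakeup_preempts(&cpus[self_id], woken_priority)) {
        waiting = cpus[self_id].cur_priority;      /* displaced curproc */
        cpus[self_id].cur_priority = woken_priority;
    } else {
        waiting = woken_priority;                  /* woken process waits */
    }
    for (int i = 0; i < NCPU; i++)
        if (cpus[i].cur_priority > waiting)
            return true;
    return false;
}
```

Even when the woken process does preempt locally, the displaced
process may be more important than one running elsewhere, which is
the case Luigi raises further down.
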

The only way to get around this is to convert to a push-based CPU
migration model, instead.

Even in that case, the migrated process will still see added
latency if the mean per-process CPU utilization differs between
CPUs by more than half a quantum.  This can be mitigated
statistically by giving a CPU an artificial figure-of-merit boost
following a migration, which lowers the probability of it becoming
the common target of additional migrations until it recalculates
its figure of merit at its next rescheduling event.
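
A rough C sketch of that mitigation, under stated assumptions: the
figure of merit is modelled as a load-like number where lower means
a more attractive migration target, and `struct cpu_load`,
`MIGRATION_BOOST`, `pick_target()`, `note_migration()` and
`recalculate()` are all hypothetical names with an arbitrary boost
value.  The pushing CPU picks the best target, the target is made to
look busier, and the boost is dropped when the target recalculates.

```c
#include <assert.h>

#define NCPU 4
#define MIGRATION_BOOST 50   /* hypothetical penalty, same units as merit */

struct cpu_load {
    int merit;   /* hypothetical figure of merit: lower = better target */
    int boost;   /* artificial boost applied after receiving a migration */
};

static int
effective_merit(const struct cpu_load *c)
{
    return c->merit + c->boost;
}

/* Push model: the source CPU chooses the most attractive target. */
static int
pick_target(const struct cpu_load cpus[NCPU])
{
    int best = 0;

    for (int i = 1; i < NCPU; i++)
        if (effective_merit(&cpus[i]) < effective_merit(&cpus[best]))
            best = i;
    return best;
}

/* After a migration, make the target look busier so it is less
 * likely to become the common target of further migrations. */
static void
note_migration(struct cpu_load cpus[NCPU], int target)
{
    cpus[target].boost += MIGRATION_BOOST;
}

/* At the target's next rescheduling event the boost is discarded. */
static void
recalculate(struct cpu_load *c, int new_merit)
{
    c->merit = new_merit;
    c->boost = 0;
}
```

With merits {30, 10, 20, 40}, CPU 1 is chosen first; after the boost
CPU 2 wins until CPU 1 recalculates.
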


> The only reason why this more or less works in practice is that the
> sleeping process likely has raised its priority in the tsleep()
> call, so it will preempt the process running on the processor
> handling the wakeup(). On the other hand, there is no guarantee
> that this process is the one with the lowest priority among those
> currently running.

In practice, when you have, on average, N+1 processes in a
ready-to-run state and N CPUs, there is still a likelihood of
starvation occurring.  The same likelihood recurs whenever the
number of ready-to-run processes is one more than a multiple of
the number of CPUs.

It "works" on average because, in practice, no one really notices
the stalls: humans are relatively poor at noticing things at 100ms
granularity.  8-).
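
The 100ms figure can be checked with a little arithmetic.  This C
sketch assumes the RELENG_4 convention that the round-robin quantum
is hz / 10 clock ticks (kern.quantum reports it in microseconds);
`quantum_ms()` is a hypothetical helper, and the result is the
worst-case window for the priority violation Luigi describes.

```c
#include <assert.h>

/*
 * Assumes the quantum is hz / 10 ticks, as in RELENG_4.
 * Returns the quantum, i.e. the worst-case stall, in milliseconds.
 */
static int
quantum_ms(int hz)
{
    int ticks = hz / 10;            /* quantum length in clock ticks */
    int tick_us = 1000000 / hz;     /* one tick, in microseconds */

    return ticks * tick_us / 1000;  /* microseconds -> milliseconds */
}
```

At hz=100 this is 10 ticks of 10ms; at hz=1000 it is 100 ticks of
1ms.  Either way the stall is about 100ms.
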


> I guess to fix this one would need to determine if one of the
> processes needs to be kicked out and replaced with the new one,
> by invoking an Xcpuast IPI on the specific processor.
> 
> Is there any reason why this is not done?  Is the call so expensive
> that one prefers to tolerate the temporary inconsistency?

This would not fix the problem, because triggering the IPI would
itself introduce stalls.  The only effect this would have is to
average out the pain so that it's less noticeable, rather than
actually getting rid of it.
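
For concreteness, the targeted-IPI scheme proposed above might look
roughly like this C sketch.  It is a user-space model only:
`struct cpu_state`, `wakeup_kick()` and the `ast_pending` flag are
hypothetical, and setting the flag stands in for actually delivering
an Xcpuast IPI.  A larger numeric priority is again assumed to mean
a less important process.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4

struct cpu_state {
    int cur_priority;    /* priority of the process running here */
    bool ast_pending;    /* stands in for delivering an Xcpuast IPI */
};

/*
 * On wakeup, find the CPU running the least important process and,
 * if the woken process beats it, post an AST to that CPU only.
 * Returns the chosen CPU, or -1 if no preemption is warranted.
 */
static int
wakeup_kick(struct cpu_state cpus[NCPU], int woken_priority)
{
    int victim = 0;

    for (int i = 1; i < NCPU; i++)
        if (cpus[i].cur_priority > cpus[victim].cur_priority)
            victim = i;
    if (woken_priority >= cpus[victim].cur_priority)
        return -1;
    cpus[victim].ast_pending = true;   /* a real kernel sends the IPI here */
    return victim;
}
```

Note that the scan reads every CPU's state, which in a real kernel
needs the scheduler locking mentioned earlier, and the IPI itself
interrupts the victim CPU.  Those are the costs that only get
averaged out rather than removed.
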

I rather expect that this isn't done because optimizing the
current code is kind of a waste of time when there is a publicly
stated architectural intent to replace it.

-- Terry
