Date:      Thu, 12 Sep 2002 05:18:34 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        smp@FreeBSD.ORG
Subject:   Re: wakeup handling on SMP boxes
Message-ID:  <20020912041647.U3346-100000@gamplex.bde.org>
In-Reply-To: <20020911083854.A88921@iguana.icir.org>

On Wed, 11 Sep 2002, Luigi Rizzo wrote:

> I have a question about the handling of wakeup on SMP machine.
> ...
> My understanding of the behaviour is that:
>
>   + the processor handling the wakeup will suspend the
>     curproc and, eventually, invoke need_resched();

The suspension is only logical.  Physically, the current process just
runs wakeup() to completion.  Any context switching before wakeup()
returns is accidental in -current and a bug in RELENG_4.

>   + on this same processor, the priority of the newly awaken process
>     is compared with the one of the suspended process;
>
>   + if the comparison succeeds, the suspended process is preempted
>     and the new one runs; otherwise, the new process will have a
>     chance at the next voluntary descheduling or roundrobin();

If the newly awakened process has a higher priority than the current process,
then the current process's rescheduling flag is set and the current
process is switched away from some time later, often (typically I think)
not until it returns to user mode.  In -current, a switch may also occur
as a side effect of locking or in response to an interrupt.  In RELENG_4,
it is fairly fundamental that processes in the kernel don't get preempted.
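
The deferred switch described above can be sketched roughly as follows.
This is only an illustration, not kernel code: the struct and both
function names are hypothetical, and the only convention borrowed from
BSD is that a numerically lower priority value means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified process record; not the kernel's struct proc. */
struct proc_sketch {
    int  pri;           /* numerically lower value = higher priority */
    bool need_resched;  /* set at wakeup time, examined at the user boundary */
    bool runnable;
};

/*
 * Wake `p` while `cur` is running.  No context switch happens here:
 * the most that happens is that `cur` gets its rescheduling flag set.
 */
static void wakeup_sketch(struct proc_sketch *cur, struct proc_sketch *p)
{
    assert(cur != NULL && p != NULL);
    p->runnable = true;
    if (p->pri < cur->pri)
        cur->need_resched = true;
}

/*
 * Called when `cur` is about to return to user mode.  Returns true if
 * `cur` should be switched away from now; the flag is consumed.
 */
static bool return_to_user_sketch(struct proc_sketch *cur)
{
    bool must_switch;

    assert(cur != NULL);
    must_switch = cur->need_resched;
    cur->need_resched = false;
    return must_switch;
}
```

The point of the two-function split is that everything between the
wakeup and the return to user mode runs with the flag set but ignored,
which is where the priority inversion in the question comes from.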

> Am I correct ?
> This seems to suggest that the priority ordering might be violated
> for as much as kern.quantum, after which the roundrobin() and
> forward_roundrobin() will do the right thing.

Only if the process stays in the kernel that long.  The quantum is not
very relevant here, since it doesn't affect processes in the kernel
(until they leave the kernel).  The "hogticks" hack that I added to
uiomove() works around this in some cases.  It is possible for a
process to spend a long time in uiomove() (e.g., 1 second for reading
10-20MB from /dev/zero on a slow 486).
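
The general idea of such a workaround can be sketched like this.  It is
not the actual uiomove() code: the struct, the chunk size, the simulated
tick counter and the function name are all assumptions made for the
example, and the "yield" is just a counter standing in for a voluntary
context switch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SKETCH 4096u  /* hypothetical copy granularity */

/* Hypothetical bookkeeping for one long-running copy. */
struct copy_state_sketch {
    unsigned ticks;        /* simulated clock; advances once per chunk */
    unsigned switchticks;  /* tick at which we last gave up the CPU */
    unsigned hogticks;     /* tick budget before a voluntary yield */
    unsigned yields;       /* number of yields taken so far */
};

/*
 * Copy `len` bytes in chunks.  After each chunk, check how long we have
 * held the CPU and yield once the budget is used up, so that a long
 * copy cannot keep a higher priority process waiting indefinitely.
 */
static void long_copy_sketch(struct copy_state_sketch *st,
                             char *dst, const char *src, size_t len)
{
    assert(st != NULL && st->hogticks > 0);
    while (len > 0) {
        size_t n = len < CHUNK_SKETCH ? len : CHUNK_SKETCH;

        memcpy(dst, src, n);
        dst += n;
        src += n;
        len -= n;

        st->ticks++;  /* pretend each chunk costs one clock tick */
        if (st->ticks - st->switchticks >= st->hogticks) {
            st->yields++;                 /* stand-in for a context switch */
            st->switchticks = st->ticks;  /* a new budget starts here */
        }
    }
}
```

The check sits inside the loop because that is the only place a
non-preemptible kernel path can notice that it has been running too long.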

Things are better in -current.  Some interrupts cause context switching
(fairly soon if not immediately), and after the interrupt is handled
the kernel just runs the highest priority runnable process in a round
robin fashion (instead of the previous one; it really should run the
previous process again if it has equally highest priority and its
quantum has not expired, since not doing so bogotifies the quantum).
Blocking on locks also causes context switching.

> The only reason why this more or less works in practice is that the
> sleeping process likely has raised its priority in the tsleep()
> call, so it will preempt the process running on the processor
> handling the wakeup(). On the other hand, there is no guarantee
> that this process is the one with the lowest priority among those
> currently running.

Processes are supposed to sleep at a high (or suitable) priority so that
they run soon (enough) after they wake up.  This works OK except for the
above problem and the bugfeature that processes retain their high sleep
priority for a long time after they wake up (typically all the way back
to user mode!).
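
A rough model of that behaviour, with hypothetical names throughout and
the usual BSD convention that a lower number is a higher priority:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority bookkeeping for one process. */
struct pri_sketch {
    int pri;     /* effective priority; numerically lower = higher */
    int usrpri;  /* priority the process should have in user mode */
};

/* Going to sleep: adopt the (high) priority requested by the sleeper. */
static void sleep_at_sketch(struct pri_sketch *p, int sleep_pri)
{
    assert(p != NULL);
    p->pri = sleep_pri;
}

/*
 * Returning to user mode: only here is the sleep priority dropped.
 * Nothing between wakeup and this point lowers it again.
 */
static void back_to_user_sketch(struct pri_sketch *p)
{
    assert(p != NULL);
    p->pri = p->usrpri;
}
```

In this model the woken process keeps competing at its sleep priority
for however long it stays in the kernel after the wakeup.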

> I guess to fix this one would need to determine if one of the
> processes needs to be kicked out and replaced with the new one,
> by invoking an  Xcpuast IPI on the specific processor.
>
> Any reason why this is not done ? Is the call too expensive so
> one prefers to tolerate the temporary inconsistency ?

I think it would be too expensive to switch contexts on every significant
priority change even under !SMP.  Rescheduling under SMP now works much
the same as under !SMP -- a flag is set but doesn't cause the
current process to give up control until it reaches the user boundary.
But there is a difference if the process is in user mode when the wakeup
occurs.  This can only happen in the SMP case.  An Xcpuast IPI would work
like any other interrupt for kicking the process into kernel mode so that
it checks the flag on the way back.
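
For concreteness, the scheme being discussed might look something like
the sketch below.  It illustrates the suggestion only and describes no
existing code: the per-CPU struct, its fields and the function name are
hypothetical, and "sending an IPI" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view of what is currently running there. */
struct cpu_sketch {
    int  cur_pri;       /* priority of the running process; lower = higher */
    bool in_user;       /* true if that process is executing in user mode */
    bool need_resched;  /* checked on the way back to user mode */
    bool ipi_pending;   /* stand-in for an Xcpuast-style IPI being sent */
};

/*
 * A process with priority `woken_pri` has become runnable.  Find the
 * CPU running the lowest priority process; if the woken process beats
 * it, flag that CPU for rescheduling.  An IPI is needed only when the
 * victim is in user mode, since otherwise it will see the flag by
 * itself when it reaches the user boundary.
 * Returns the index of the flagged CPU, or -1 if none was flagged.
 */
static int kick_lowest_pri_cpu_sketch(struct cpu_sketch *cpus, int ncpu,
                                      int woken_pri)
{
    int victim = -1;

    assert(cpus != NULL && ncpu > 0);
    for (int i = 0; i < ncpu; i++)
        if (victim < 0 || cpus[i].cur_pri > cpus[victim].cur_pri)
            victim = i;

    if (woken_pri >= cpus[victim].cur_pri)
        return -1;  /* nothing running is worse than the woken process */

    cpus[victim].need_resched = true;
    if (cpus[victim].in_user)
        cpus[victim].ipi_pending = true;
    return victim;
}
```

The scan over all CPUs on every wakeup is part of the cost being weighed
against tolerating a temporary priority inversion.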

Bruce





