Date:      Fri, 28 Apr 2000 21:08:06 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dillon@apollo.backplane.com (Matthew Dillon)
Cc:        jgowdy@home.com (Jeremiah Gowdy), smp@csn.net (Steve Passe), jim@thehousleys.net (James Housley), freebsd-smp@FreeBSD.ORG
Subject:   Re: hlt instructions and temperature issues
Message-ID:  <200004282108.OAA01313@usr08.primenet.com>
In-Reply-To: <200004280142.SAA07744@apollo.backplane.com> from "Matthew Dillon" at Apr 27, 2000 06:42:48 PM

> :In this piece of code:
> :------------------------------------
> :ENTRY(default_halt)
> :sti
> :#ifndef SMP
> :hlt                                     /* XXX:  until a wakeup IPI */
> :#ifdef SMP
> :#ifdef CHEAP_TPR
> :movl    $0, lapic_tpr
> :#else
> :andl    $~APIC_TPR_PRIO, lapic_tpr
> :#endif /** CHEAP_TPR */
> :#endif
> :hlt
> :ret
> 
>     Umm... where'd you get the above code?   This is not the current
>     halt code for 3.x, 4.x, or 5.x.


This was Loqui's patch; in it he suggested replacing (in swtch.s):

	ENTRY(default_halt)
		sti
	#ifndef SMP
		hlt			/* XXX:  until a wakeup IPI */
	#endif
		ret

With:

	ENTRY(default_halt)
		sti
	#ifdef SMP
	#ifdef CHEAP_TPR
		movl    $0, lapic_tpr
	#else
		andl    $~APIC_TPR_PRIO, lapic_tpr
	#endif /** CHEAP_TPR */
	#endif
		hlt
		ret

Some people have (correctly) pointed out that this would slow down
SMP operations, since it reduces halted CPUs to "wake on int".

This is correct.

Some people have also pointed out that the TPR is already 0 when
the "hlt" would have been executed.  I'm not positive about this
in the "just finished handling a fastintr" case.


Others have complained about the "air gap" between the "sti" and
the "hlt".  I think that this is not really an issue, but it would
be very easy to rectify if it were.  It's clearly not an issue
if the TPR claims are correct, in which case the new code merely
removes the "#ifndef SMP/#endif" directives around the "hlt".



The comment "until a wakeup IPI" applies to the case:

	when releasing the BGL while leaving the scheduler,
	with a process still on the ready-to-run queue, and
	a CPU that could take it having been halted

...at least in the scheduling code as it currently sits.  So
it's pretty trivial to fix the "slows to a crawl" problem, and
for the person with the 8-processor system to verify that it
is fixed for us (having seen the "slows to a crawl" problem in
person).


The comment about the TPR level for the lock holder vs. the
"hlt"'ed processor is a valid point.

I think, however, that on an NCPU > 2 machine there is a new
"thundering herd" problem if all halted CPUs have a TPR of 0
and the IPI is a broadcast IPI that wakes them all.

I would be very tempted to have broadcast IPIs of a high level,
with the lock holder at a higher level (2 * NCPU + 1), and an
unblocked processor at yet a higher level, and then an entry
count for the TPR for processors, as they "get in line" for the
"hlt".

Then you could IPI with the min of the number of processes
waiting in the ready-to-run state and the number of processors.
Each CPU would subtract the IPI level from its TPR and, if the
result were zero or less, "go live" on one of the ready-to-run
processes after setting the highest ("running") TPR.  Otherwise,
the CPU would keep the remainder as its new TPR and go back to
sleep.

This would provide a generic "wakeone/waken/wakeup" mechanism,
which should be the most efficient for a single, system-wide
ready-to-run queue.  We don't care that we wake up the CPUs
with no work to do and send them right back to sleep, since
they weren't doing anything valuable anyway.

This scheme would not provide completely optimal "hlt"-ness,
but it would provide the largest amount of "hlt"-ness which
would not unduly slow the system relative to no "hlt" at all.

It seems to me the best trade-off between running temperature
and the optimal amount of work you can squeeze out of the
system.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.





