From: Terry Lambert <tlambert2@mindspring.com>
Date: Sat, 01 Feb 2003 13:28:45 -0800
To: Bosko Milekic
Cc: Matthew Dillon, "Daniel C. Sobral", Trish Lynch, freebsd-current@FreeBSD.ORG
Subject: Re: Hyperthreading and machdep.cpu_idle_hlt

Bosko Milekic wrote:
> > > Or, as I explained in my previous post, only HLT the [virtual] CPU
> > > if the other [virtual] CPU that is sharing the same execution &
> > > cache units is not HLT'd itself.  If the other one is HLT'd, then
> > > do not do the HLT.
> >
> > Actually, why is that?  Why would you not want to HLT all the
> > units that are not being used?
>
>   Because, as the comment explains, a halted CPU will not pick up a
>   new thread off the run queue until the next timer tick.  So if all
>   your logical units are idle, you can afford to just loop, checking
>   whether something is runnable, without interfering with the
>   performance of threads running on a different logical CPU that
>   shares your execution unit (because the other logical units are
>   idle anyway).  That way you don't necessarily have to wait for the
>   next timer tick to notice that something is runnable, especially if
>   it was made runnable earlier.  The disadvantage is that you don't
>   really economize on power consumption.

There's an assumption in there of a shared scheduler queue, and a lack
of CPU affinity (or negaffinity, for multiple threads in a single
process), isn't there?

Or are you talking about processes that are ready-to-run as a result
of an event that was handled by another CPU?

It seems to me that a non-shared queue would need to signal wakeups
with IPIs, which would wake up a HLT'ed processor, and that would make
it a non-problem (there's no way to avoid per-CPU queue locking if you
don't have an IPI-based mechanism available).
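For concreteness, here is a minimal sketch of the sibling-aware idle
policy described above.  The helper names (sched_runnable(),
sibling_halted(), cpu_halt(), cpu_spinwait()) are hypothetical
stand-ins, not the actual kernel interfaces:

	/*
	 * Idle loop for one logical CPU of a hyperthreaded core.
	 * Only HLT if the sibling logical CPU is doing real work;
	 * if the sibling is halted too, the execution unit is ours,
	 * so poll the run queue instead of waiting for the tick.
	 */
	static void
	cpu_idle_ht(int cpu)
	{
		for (;;) {
			if (sched_runnable())	/* hypothetical run queue test */
				return;		/* scheduler picks up the thread */
			if (!sibling_halted(cpu)) {
				/*
				 * Sibling is busy; spinning here would
				 * steal its execution resources.  Halt
				 * until the next interrupt (IPI or tick).
				 */
				cpu_halt();	/* sti; hlt */
			} else {
				/* Sibling idle too; poll the queue. */
				cpu_spinwait();	/* PAUSE hint to the core */
			}
		}
	}

This is just the policy decision; the trade-off is exactly the one
noted above: polling picks up work before the tick, halting saves
power.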
> The ideal situation would be to have, as Matt (and the comment,
> actually) says, a cpu mask of idle cpus, and to generate an IPI to
> wake up CPUs sitting in HLT when something hits the run queue.  Then
> you can just hlt all of them and rely on the IPI, or the next timer
> tick, whichever comes first, to wake you up, and you really get the
> best of both worlds.

I think it's more complicated than that; you don't want anything other
than the CPU that owns the per-CPU run queue touching that queue, which
means that it's the wakeup event, not the arrival on the run queue,
that needs to be signalled.  The CPU in question then has to do its own
processing of pending wakeup events, placing the process on its run
queue itself, rather than having another CPU do it.

This also implies per-CPU wait queues, and a reliable message delivery
mechanism for the wakeup messages.

It may be enough, for a first rev., to simply mark everything on the
wait queue as "wakeup pending" and run the wait queue, but that's
probably not a good idea for a production system, since it brings back
the Giant Scheduler Lock for the wait queue.  (On the plus side, items
awakened could be moved to the head of the queue when they were marked,
with the lock held anyway; that would shorten the list of traversed
items per CPU to "all entries with wakeup processing pending" rather
than "all queue entries".)  But it's still too ugly for words.

I think something like wakeup signalling, as a message abstraction, is
required in any case, considering support for clustering or NUMA going
forward, to deal with the slower signal paths between much more loosely
coupled CPUs in a single system image.  Directly modifying queues in
the memory of other CPUs is unlikely to scale well, if it can be made
to work at all.

-- Terry
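As a rough illustration of the idle-CPU-mask-plus-IPI scheme quoted
above, a minimal sketch follows.  All names here (idle_cpus,
runq_add(), ipi_wakeup()) are hypothetical, and the atomics use GCC
builtins for self-containment; the real kernel's run queue and IPI
plumbing differ:

	#include <strings.h>			/* ffs() */

	struct thread;				/* opaque */
	extern void runq_add(struct thread *);	/* hypothetical enqueue */
	extern void ipi_wakeup(int cpu);	/* hypothetical wakeup IPI */

	typedef unsigned int cpumask_t;
	static volatile cpumask_t idle_cpus;	/* one bit per halted CPU */

	/* Called by a CPU just before it executes HLT. */
	static void
	idle_enter(int cpu)
	{
		__sync_fetch_and_or(&idle_cpus, 1u << cpu);
	}

	/* Called by a CPU when an interrupt or IPI wakes it. */
	static void
	idle_exit(int cpu)
	{
		__sync_fetch_and_and(&idle_cpus, ~(1u << cpu));
	}

	/* Called when a thread becomes runnable. */
	static void
	setrun_and_kick(struct thread *td)
	{
		cpumask_t idle;

		runq_add(td);
		idle = idle_cpus;
		if (idle != 0)
			ipi_wakeup(ffs((int)idle) - 1);	/* wake one halted CPU */
	}

Note that this sketch assumes a shared run queue; with per-CPU run
queues, as argued above, it is the wakeup event that has to be
delivered, and the target CPU does the enqueue itself.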