Date: Sat, 1 Feb 2003 13:57:50 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Bosko Milekic <bmilekic@unixdaemons.com>, "Daniel C. Sobral" <dcs@tcoip.com.br>,
	Trish Lynch <trish@bsdunix.net>, freebsd-current@FreeBSD.ORG
Subject: Re: Hyperthreading and machdep.cpu_idle_hlt
Message-ID: <200302012157.h11Lvo7f017280@apollo.backplane.com>
References: <20030131125804.E1357-100000@femme>
	<200301311824.h0VIOtmF095380@apollo.backplane.com>
	<3E3AC33E.9060204@tcoip.com.br>
	<200301311908.h0VJ8cNZ007396@apollo.backplane.com>
	<20030131141700.A7526@unixdaemons.com>
	<200301311952.h0VJqrMB076135@apollo.backplane.com>
	<20030201100412.B11945@unixdaemons.com>
	<3E3C327F.FD9E26F7@mindspring.com>
	<20030201160547.A13169@unixdaemons.com>
	<3E3C3C0D.15722918@mindspring.com>
:> The ideal situation would be to have, as Matt (and the comment
:> actually) says, a cpu mask of idle cpus and generate an IPI to wake
:> up CPUs sitting in HLT when something hits the runqueue.  Then you
:> can just hlt all of them and rely on the IPI or the next timer
:> tick, whichever comes first, to wake you up, and you really get the
:> best of both worlds.
:
:I think it's more complicated than that; you don't want to have
:anything other than the CPU that owns the per-CPU run queue doing
:anything with it, which means that it's the wakeup event, not the
:arrival on the run queue, which needs to be signalled.  Then the
:CPU in question has to do its own processing of pending wakeup
:events in order to place the process on the run queue itself,
:rather than having another CPU do it.
:
:This also implies per-CPU wait queues, and a reliable message
:delivery mechanism for wakeup messages.
:
:Though it may be enough, for a first rev., to simply mark everything
:on the wait queue as "wakeup pending" and run the wait queue, that's
:probably not a good idea for a production system, since it brings
:back the Giant Scheduler Lock for the wait queue (on the plus side,
:awakened items could be moved to the head of the queue when they
:were marked, with the lock held anyway, and that would shorten the
:list of traversed items per CPU to "all entries with pending
:wakeups" rather than "all queue entries").
:But it's still too ugly for words.
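
(For concreteness, the rejected scheme would look something like the
sketch below: one global wait queue, one scheduler lock, awakened
entries flagged and moved to the head.  All of these names are made
up for illustration; none of this is in the tree.)

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/* Hypothetical single-lock wait queue (the "Giant Scheduler Lock"). */
struct waitent {
	TAILQ_ENTRY(waitent)	we_link;
	struct proc		*we_proc;
	int			we_pending;	/* wakeup pending flag */
};
static TAILQ_HEAD(, waitent) waitq = TAILQ_HEAD_INITIALIZER(waitq);
static struct mtx waitq_mtx;			/* mtx_init'd elsewhere */

void
mark_wakeup(struct waitent *we)
{
	mtx_lock_spin(&waitq_mtx);
	we->we_pending = 1;
	/* Move to the head so each cpu traverses only entries with
	 * pending wakeups instead of the whole queue. */
	TAILQ_REMOVE(&waitq, we, we_link);
	TAILQ_INSERT_HEAD(&waitq, we, we_link);
	mtx_unlock_spin(&waitq_mtx);
}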
    The HLT/clock interrupt issue is precisely what I describe in the
    idle_hlt comments in i386/i386/machdep.c (last July).  I wish we had a
    better mechanism than the stupid IPI stuff, like a simple per-cpu
    latch/acknowledge level interrupt (softint), but we don't.
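
    Schematically the mask/IPI dance looks like the fragment below.
    This is an illustration, not the machdep.c code, and the names are
    made up.  The "sti; hlt" pair is the important detail: STI holds off
    interrupts for one more instruction, so a wakeup IPI cannot slip in
    between the final runqueue check and the HLT.

static volatile u_int idle_cpus;	/* one bit per halted cpu */

void
cpu_idle_hlt(void)
{
	u_int mask = 1 << PCPU_GET(cpuid);

	atomic_set_int(&idle_cpus, mask);
	disable_intr();
	if (procrunnable())		/* re-check with interrupts off */
		enable_intr();
	else
		__asm __volatile("sti; hlt");	/* atomic enable+halt */
	atomic_clear_int(&idle_cpus, mask);
}

/* Called after putting a task on any run queue. */
void
runq_kick_idle(void)
{
	if (idle_cpus != 0)
		ipi_selected(idle_cpus, IPI_AST);
}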
    I don't think we want to over-engineer per-cpu scheduling.  The
    system really doesn't know which cpu a task is going to wind up running
    on until a cpu scheduler (sched_choose()) comes along and needs to
    locate the next task to run.  Too many things can happen between the
    initiation of the wait, the wakeup, and the task actually getting the
    cpu.  Introducing a complex per-cpu wait queue, or trying to do
    something complex at wakeup time instead of at sched_choose() time, is
    just going to be a waste of time.  I think it is best to wake up a task
    by placing it on the same cpu run queue it was previously on (which is
    what Jeff's code does for the most part), and deal with task stealing
    in sched_choose().  The scheduler, when it comes time to actually
    switch in the next runnable task, then deals with the complexities of
    misbalancing (i.e. cpu A is idle and ready to accept a new task, while
    cpu B's run queue has a task ready to be run).
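
    In sketch form the stealing belongs in sched_choose() itself,
    something like the fragment below (the kseq/runq names are modeled
    loosely on Jeff's sched_ule.c but are hypothetical here, and the
    hunt is simplified to a dumb linear scan):

struct kse *
sched_choose(void)
{
	struct kseq *ksq = KSEQ_SELF();	/* this cpu's run queue */
	struct kse *ke;
	int cpu;

	/* The common case: take the next task from our own queue. */
	mtx_lock_spin(&ksq->ksq_mtx);
	ke = runq_choose(&ksq->ksq_runq);
	if (ke != NULL)
		runq_remove(&ksq->ksq_runq, ke);
	mtx_unlock_spin(&ksq->ksq_mtx);
	if (ke != NULL)
		return (ke);

	/* We are otherwise idle: steal from the first non-empty remote
	 * queue.  This only burns cycles that would have been spent in
	 * HLT anyway. */
	for (cpu = 0; cpu < mp_ncpus; cpu++) {
		struct kseq *rsq = KSEQ_CPU(cpu);

		if (rsq == ksq)
			continue;
		mtx_lock_spin(&rsq->ksq_mtx);
		ke = runq_choose(&rsq->ksq_runq);
		if (ke != NULL)
			runq_remove(&rsq->ksq_runq, ke);
		mtx_unlock_spin(&rsq->ksq_mtx);
		if (ke != NULL)
			return (ke);
	}
	return (NULL);
}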
While it is true that we would like a cpu to predominantly use the
per-cpu run-queue that it owns, we don't really lose anything in the
way of performance by allowing cpu A to add a task to cpu B's
run queue or for cpu A to steal a task from cpu B's run queue. Sure
we have the overhead of a per-cpu mutex, but the reason we don't lose
anything is that this sort of mechanism will *STILL* scale linearly
with the number of cpus in the system (whereas the global run queue in
sched_4bsd.c constricts at a single sched_mtx and does not scale). The
overhead of a per-cpu run-queue with a per-cpu mutex is *STILL*
effectively O(1) and the more complex overheads involved with locating
a new task to schedule from some other cpu's run queue when the current
cpu's run-queue is empty are irrelevant because you are only eating
into cycles which would otherwise be idle anyway.
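
    The enqueue side of the same argument, again as a hypothetical
    sketch: waking a task onto the per-cpu queue it last ran on, even
    from another cpu, costs one mutex round-trip on that one queue no
    matter how many cpus exist, which is the linear-scaling point.

void
setrunqueue_cpu(struct kse *ke, int cpu)
{
	struct kseq *ksq = KSEQ_CPU(cpu);

	/* Only this one queue's mutex is touched; contention does not
	 * grow with the number of cpus in the system. */
	mtx_lock_spin(&ksq->ksq_mtx);
	runq_add(&ksq->ksq_runq, ke);
	mtx_unlock_spin(&ksq->ksq_mtx);

	if (idle_cpus & (1 << cpu))	/* target may be sitting in HLT */
		ipi_selected(1 << cpu, IPI_AST);
}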
-Matt
Matthew Dillon
<dillon@backplane.com>
:I think something like wakeup signalling, as a message abstraction,
:is required, in any case, considering support for clustering or NUMA,
:going forward, to deal with slower signal paths on a single system
:image for much more loosely coupled CPUs. Directly modifying queues
:in the memory of other CPUs is unlikely to scale well, if it can even be
:made to work at all.
:
:-- Terry
:
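
(A purely hypothetical sketch of the message abstraction Terry
describes: remote cpus post wakeup events to a per-cpu mailbox and
send an IPI, and the owning cpu drains the mailbox and moves tasks
onto its own run queue itself, so nobody ever touches another cpu's
queues directly.  None of these names exist in the tree.)

struct wakeup_msg {
	STAILQ_ENTRY(wakeup_msg) wm_link;
	struct kse	*wm_ke;
};
static STAILQ_HEAD(, wakeup_msg) wakeup_mbox[MAXCPU];
static struct mtx mbox_mtx[MAXCPU];

void
post_wakeup(int cpu, struct wakeup_msg *wm)
{
	mtx_lock_spin(&mbox_mtx[cpu]);
	STAILQ_INSERT_TAIL(&wakeup_mbox[cpu], wm, wm_link);
	mtx_unlock_spin(&mbox_mtx[cpu]);
	ipi_selected(1 << cpu, IPI_AST);	/* owner drains in its AST */
}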
