Date: Sat, 1 Feb 2003 13:57:50 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Bosko Milekic <bmilekic@unixdaemons.com>, "Daniel C. Sobral" <dcs@tcoip.com.br>,
	Trish Lynch <trish@bsdunix.net>, freebsd-current@FreeBSD.ORG
Subject: Re: Hyperthreading and machdep.cpu_idle_hlt
Message-ID: <200302012157.h11Lvo7f017280@apollo.backplane.com>
References: <20030131125804.E1357-100000@femme>
	<200301311824.h0VIOtmF095380@apollo.backplane.com>
	<3E3AC33E.9060204@tcoip.com.br>
	<200301311908.h0VJ8cNZ007396@apollo.backplane.com>
	<20030131141700.A7526@unixdaemons.com>
	<200301311952.h0VJqrMB076135@apollo.backplane.com>
	<20030201100412.B11945@unixdaemons.com>
	<3E3C327F.FD9E26F7@mindspring.com>
	<20030201160547.A13169@unixdaemons.com>
	<3E3C3C0D.15722918@mindspring.com>
:> The ideal situation would be to have, as Matt (and the comment
:> actually) says, a cpu mask of idle cpus and generate an IPI to wake
:> up CPUs sitting in HLT when something hits the runqueue.  Then you
:> can just hlt all of them and rely on the IPI or the next timer
:> tick, whichever comes first, to wake you up, and you really get the
:> best of both worlds.
:
:I think it's more complicated than that; you don't want to have
:anything other than the CPU that owns the per-CPU run queue doing
:anything with it, which means that it's the wakeup event, not the
:arrival on the run queue, which needs to be signalled.  Then the
:CPU in question has to do its own processing of pending wakeup
:events in order to place the process on the run queue itself,
:rather than having another CPU do it.
:
:This also implies per-CPU wait queues, and a reliable message
:delivery mechanism for wakeup messages.
:
:Though it may be enough, for a first rev., to simply mark everything
:on the wait queue as "wakeup pending" and run the wait queue, that's
:probably not a good idea for a production system, since it brings
:back the Giant Scheduler Lock for the wait queue (on the plus side,
:awakened items could be moved to the head of the queue when they
:were marked, with the lock held anyway, and that would shorten the
:list of traversed items per CPU to "all entries with pending
:wakeups" rather than "all queue entries").
:But it's still too ugly for words.
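
(For concreteness, the rejected scheme would look something like the
sketch below: one global wait queue, one scheduler lock, awakened
entries flagged and moved to the head.  All of these names are made
up for illustration; none of this is in the tree.)

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/* Hypothetical single-lock wait queue (the "Giant Scheduler Lock"). */
struct waitent {
	TAILQ_ENTRY(waitent)	we_link;
	struct proc		*we_proc;
	int			we_pending;	/* wakeup pending flag */
};
static TAILQ_HEAD(, waitent) waitq = TAILQ_HEAD_INITIALIZER(waitq);
static struct mtx waitq_mtx;			/* mtx_init'd elsewhere */

void
mark_wakeup(struct waitent *we)
{
	mtx_lock_spin(&waitq_mtx);
	we->we_pending = 1;
	/* Move to the head so each cpu traverses only entries with
	 * pending wakeups instead of the whole queue. */
	TAILQ_REMOVE(&waitq, we, we_link);
	TAILQ_INSERT_HEAD(&waitq, we, we_link);
	mtx_unlock_spin(&waitq_mtx);
}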
    The HLT/clock interrupt issue is precisely what I describe in the
    idle_hlt comments in i386/i386/machdep.c (last July).  I wish we had a
    better mechanism than the stupid IPI stuff, like a simple per-cpu
    latch/acknowledge level interrupt (softint), but we don't.
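
    Schematically the mask/IPI dance looks like the fragment below.
    This is an illustration, not the machdep.c code, and the names are
    made up.  The "sti; hlt" pair is the important detail: STI holds off
    interrupts for one more instruction, so a wakeup IPI cannot slip in
    between the final runqueue check and the HLT.

static volatile u_int idle_cpus;	/* one bit per halted cpu */

void
cpu_idle_hlt(void)
{
	u_int mask = 1 << PCPU_GET(cpuid);

	atomic_set_int(&idle_cpus, mask);
	disable_intr();
	if (procrunnable())		/* re-check with interrupts off */
		enable_intr();
	else
		__asm __volatile("sti; hlt");	/* atomic enable+halt */
	atomic_clear_int(&idle_cpus, mask);
}

/* Called after putting a task on any run queue. */
void
runq_kick_idle(void)
{
	if (idle_cpus != 0)
		ipi_selected(idle_cpus, IPI_AST);
}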
    I don't think we want to over-engineer per-cpu scheduling.  The
    system really doesn't know which cpu a task is going to wind up running
    on until a cpu scheduler (sched_choose()) comes along and needs to
    locate the next task to run.  Too many things can happen between the
    initiation of the wait, the wakeup, and the task actually getting the
    cpu.  Introducing a complex per-cpu wait queue, or trying to do
    something complex at wakeup time instead of at sched_choose() time, is
    just going to be a waste of time.  I think it is best to wake up a task
    by placing it on the same cpu run queue it was previously on (which is
    what Jeff's code does for the most part), and deal with task stealing
    in sched_choose().  The scheduler, when it comes time to actually
    switch in the next runnable task, then deals with the complexities of
    misbalancing (i.e. cpu A is idle and ready to accept a new task, while
    cpu B's run queue has a task ready to be run).
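
    In sketch form the stealing belongs in sched_choose() itself,
    something like the fragment below (the kseq/runq names are modeled
    loosely on Jeff's sched_ule.c but are hypothetical here, and the
    hunt is simplified to a dumb linear scan):

struct kse *
sched_choose(void)
{
	struct kseq *ksq = KSEQ_SELF();	/* this cpu's run queue */
	struct kse *ke;
	int cpu;

	/* The common case: take the next task from our own queue. */
	mtx_lock_spin(&ksq->ksq_mtx);
	ke = runq_choose(&ksq->ksq_runq);
	if (ke != NULL)
		runq_remove(&ksq->ksq_runq, ke);
	mtx_unlock_spin(&ksq->ksq_mtx);
	if (ke != NULL)
		return (ke);

	/* We are otherwise idle: steal from the first non-empty remote
	 * queue.  This only burns cycles that would have been spent in
	 * HLT anyway. */
	for (cpu = 0; cpu < mp_ncpus; cpu++) {
		struct kseq *rsq = KSEQ_CPU(cpu);

		if (rsq == ksq)
			continue;
		mtx_lock_spin(&rsq->ksq_mtx);
		ke = runq_choose(&rsq->ksq_runq);
		if (ke != NULL)
			runq_remove(&rsq->ksq_runq, ke);
		mtx_unlock_spin(&rsq->ksq_mtx);
		if (ke != NULL)
			return (ke);
	}
	return (NULL);
}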
While it is true that we would like a cpu to predominantly use the
per-cpu run-queue that it owns, we don't really lose anything in the
way of performance by allowing cpu A to add a task to cpu B's
run queue or for cpu A to steal a task from cpu B's run queue. Sure
we have the overhead of a per-cpu mutex, but the reason we don't lose
anything is that this sort of mechanism will *STILL* scale linearly
with the number of cpus in the system (whereas the global run queue in
sched_4bsd.c constricts at a single sched_mtx and does not scale). The
overhead of a per-cpu run-queue with a per-cpu mutex is *STILL*
effectively O(1) and the more complex overheads involved with locating
a new task to schedule from some other cpu's run queue when the current
cpu's run-queue is empty are irrelevant because you are only eating
into cycles which would otherwise be idle anyway.
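
    The enqueue side of the same argument, again as a hypothetical
    sketch: waking a task onto the per-cpu queue it last ran on, even
    from another cpu, costs one mutex round-trip on that one queue no
    matter how many cpus exist, which is the linear-scaling point.

void
setrunqueue_cpu(struct kse *ke, int cpu)
{
	struct kseq *ksq = KSEQ_CPU(cpu);

	/* Only this one queue's mutex is touched; contention does not
	 * grow with the number of cpus in the system. */
	mtx_lock_spin(&ksq->ksq_mtx);
	runq_add(&ksq->ksq_runq, ke);
	mtx_unlock_spin(&ksq->ksq_mtx);

	if (idle_cpus & (1 << cpu))	/* target may be sitting in HLT */
		ipi_selected(1 << cpu, IPI_AST);
}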
-Matt
Matthew Dillon
<dillon@backplane.com>
:I think something like wakeup signalling, as a message abstraction,
:is required, in any case, considering support for clustering or NUMA,
:going forward, to deal with slower signal paths on a single system
:image for much more loosely coupled CPUs. Directly modifying queues
:in the memory of other CPUs is unlikely to scale well, if it can even be
:made to work at all.
:
:-- Terry
:
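
(A purely hypothetical sketch of the message abstraction Terry
describes: remote cpus post wakeup events to a per-cpu mailbox and
send an IPI, and the owning cpu drains the mailbox and moves tasks
onto its own run queue itself, so nobody ever touches another cpu's
queues directly.  None of these names exist in the tree.)

struct wakeup_msg {
	STAILQ_ENTRY(wakeup_msg) wm_link;
	struct kse	*wm_ke;
};
static STAILQ_HEAD(, wakeup_msg) wakeup_mbox[MAXCPU];
static struct mtx mbox_mtx[MAXCPU];

void
post_wakeup(int cpu, struct wakeup_msg *wm)
{
	mtx_lock_spin(&mbox_mtx[cpu]);
	STAILQ_INSERT_TAIL(&wakeup_mbox[cpu], wm, wm_link);
	mtx_unlock_spin(&mbox_mtx[cpu]);
	ipi_selected(1 << cpu, IPI_AST);	/* owner drains in its AST */
}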
