Date: Sat, 1 Feb 2003 13:57:50 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Bosko Milekic <bmilekic@unixdaemons.com>, "Daniel C. Sobral" <dcs@tcoip.com.br>,
    Trish Lynch <trish@bsdunix.net>, freebsd-current@FreeBSD.ORG
Subject: Re: Hyperthreading and machdep.cpu_idle_hlt
Message-ID: <200302012157.h11Lvo7f017280@apollo.backplane.com>
References: <20030131125804.E1357-100000@femme>
    <200301311824.h0VIOtmF095380@apollo.backplane.com>
    <3E3AC33E.9060204@tcoip.com.br>
    <200301311908.h0VJ8cNZ007396@apollo.backplane.com>
    <20030131141700.A7526@unixdaemons.com>
    <200301311952.h0VJqrMB076135@apollo.backplane.com>
    <20030201100412.B11945@unixdaemons.com>
    <3E3C327F.FD9E26F7@mindspring.com>
    <20030201160547.A13169@unixdaemons.com>
    <3E3C3C0D.15722918@mindspring.com>
:> The ideal situation would be to have, as Matt (and the comment,
:> actually) says, a cpu mask of idle cpus, and generate an IPI to wake up
:> CPUs sitting in HLT when something hits the runqueue.  Then you can
:> just hlt all of them and rely on the IPI to wake you up, or the next
:> timer tick, whichever comes first, and you really get the best of
:> both worlds.
:
:I think it's more complicated than that; you don't want to have
:anything other than the CPU that owns the per-CPU run queue doing
:anything with it, which means that it's the wakeup event, not the
:arrival on the run queue, which needs to be signalled.  Then the
:CPU in question has to do its own processing of pending wakeup
:events in order to handle the placing of the process on the run
:queue itself, rather than it being handled by another CPU.
:
:This also implies per-CPU wait queues, and a reliable message
:delivery mechanism for wakeup messages.
:
:Though it may be enough to simply mark everything on the wait
:queue as "wakeup pending" for a first rev., and run the wait
:queue, it's probably not a good idea for a production system,
:since it brings back the Giant Scheduler Lock for the wait queue
:(on the plus side, items awakened could be moved to the head of
:the queue when they were marked, with the lock held anyway, and
:that would shorten the list of traversed items per CPU to "all
:pending wakeup processing", rather than "all queue entries").
:But it's still too ugly for words.

    The HLT/clock interrupt issue is precisely what I describe in the
    idle_hlt comments in i386/i386/machdep.c (last July).  I wish we
    had a better mechanism than the stupid IPI stuff, like a simple
    per-cpu latch/acknowledge level interrupt (softint), but we don't.
    (A rough sketch of the mask+IPI scheme follows below.)

    I don't think we want to over-engineer per-cpu scheduling.  The
    system really doesn't know what cpu a task is going to wind up
    running on until a cpu scheduler (sched_choose()) comes along and
    needs to locate the next task to run.  Too many things can happen
    in between the initiation of the wait, the wakeup, and the task
    actually getting the cpu.  Introducing a complex per-cpu wait
    queue, or trying to do something complex at wakeup time instead of
    at sched_choose() time, is just going to be a waste of time.

    I think it is best to wake up a task by placing it on the same cpu
    run queue it was previously on (which is what Jeff's code does,
    for the most part), and to deal with task stealing in
    sched_choose().  The scheduler, when it comes time to actually
    switch in the next runnable task, then deals with the complexities
    associated with misbalancing (i.e. cpu A is idle and ready to
    accept a new task, while cpu B's run queue has a task ready to be
    run).

    While it is true that we would like a cpu to predominantly use the
    per-cpu run queue that it owns, we don't really lose anything in
    the way of performance by allowing cpu A to add a task to cpu B's
    run queue, or by allowing cpu A to steal a task from cpu B's run
    queue.  Sure, we have the overhead of a per-cpu mutex, but the
    reason we don't lose anything is that this sort of mechanism will
    *STILL* scale linearly with the number of cpus in the system
    (whereas the global run queue in sched_4bsd.c constricts at a
    single sched_mtx and does not scale).  The overhead of a per-cpu
    run queue with a per-cpu mutex is *STILL* effectively O(1), and
    the more complex overheads involved in locating a new task to
    schedule from some other cpu's run queue, when the current cpu's
    run queue is empty, are irrelevant because you are only eating
    into cycles which would otherwise be idle anyway.
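    To make the mask+IPI idea concrete, here is a minimal sketch.
    This is an illustration, not the code actually in the tree:
    idle_cpus_mask, cpu_idle_enter(), and runq_wakeup_idle() are
    made-up names, though ipi_selected() and IPI_AST are modeled on
    the real i386 SMP primitives.

	#include <machine/atomic.h>	/* atomic_set_int() et al */

	static volatile u_int idle_cpus_mask;	/* one bit per HLTed cpu */

	/*
	 * Idle loop: advertise that we are idle, then HLT.  There is a
	 * window between setting the bit and halting in which a task
	 * can hit a run queue unnoticed; the next timer tick bounds
	 * that latency, and the IPI covers the common case.
	 */
	void
	cpu_idle_enter(int cpuid)
	{
		atomic_set_int(&idle_cpus_mask, 1 << cpuid);
		__asm __volatile("sti; hlt");
		atomic_clear_int(&idle_cpus_mask, 1 << cpuid);
	}

	/* Called after a task has been placed on a run queue. */
	void
	runq_wakeup_idle(void)
	{
		u_int mask = idle_cpus_mask;

		if (mask != 0)
			ipi_selected(mask, IPI_AST);	/* kick the HLTed cpus */
	}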
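    And a similarly hand-waved sketch of per-cpu run queues with task
    stealing done at sched_choose() time.  struct task, runqs[],
    sched_add(), and sched_choose() here are illustrative stand-ins,
    not Jeff's actual scheduler code; mtx_lock()/mtx_unlock(),
    mp_ncpus, and MAXCPU are the stock kernel facilities.

	#include <sys/queue.h>

	struct task {
		TAILQ_ENTRY(task) t_link;
		int t_lastcpu;		/* run queue we last lived on */
	};

	struct percpu_runq {
		struct mtx rq_mtx;	/* per-cpu mutex, not a global sched_mtx */
		TAILQ_HEAD(, task) rq_head;
	};

	static struct percpu_runq runqs[MAXCPU];

	/* Wakeup: put the task back on the run queue it came from. */
	void
	sched_add(struct task *t)
	{
		struct percpu_runq *rq = &runqs[t->t_lastcpu];

		mtx_lock(&rq->rq_mtx);
		TAILQ_INSERT_TAIL(&rq->rq_head, t, t_link);
		mtx_unlock(&rq->rq_mtx);
	}

	/* Check our own queue first; steal round-robin only if it is empty. */
	struct task *
	sched_choose(int mycpu)
	{
		struct task *t = NULL;
		int i, cpu;

		for (i = 0; i < mp_ncpus && t == NULL; i++) {
			cpu = (mycpu + i) % mp_ncpus;
			mtx_lock(&runqs[cpu].rq_mtx);
			t = TAILQ_FIRST(&runqs[cpu].rq_head);
			if (t != NULL)
				TAILQ_REMOVE(&runqs[cpu].rq_head, t, t_link);
			mtx_unlock(&runqs[cpu].rq_mtx);
		}
		if (t != NULL)
			t->t_lastcpu = mycpu;	/* it runs here now */
		return (t);			/* NULL: go idle */
	}

    Note that the common path (our own queue is non-empty) touches
    exactly one uncontended per-cpu mutex, which is why it stays
    effectively O(1) as cpus are added, and the steal loop only runs
    on a cpu whose own queue is empty, i.e. on cycles that would
    otherwise be spent idle.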
						-Matt
						Matthew Dillon
						<dillon@backplane.com>

:I think something like wakeup signalling, as a message abstraction,
:is required, in any case, considering support for clustering or NUMA,
:going forward, to deal with slower signal paths on a single system
:image for much more loosely coupled CPUs.  Directly modifying queues
:in the memory of other CPUs is unlikely to scale well, if it can even
:be made to work at all.
:
:-- Terry
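    For what it's worth, the message abstraction Terry describes might
    look roughly like the following.  Every name here (wakeup_msg,
    wakeup_post(), wakeup_drain(), sched_add_local()) is hypothetical,
    sketched only to show the shape of the idea: cpu A never touches
    cpu B's queues directly, it only appends to B's mailbox, and B
    moves tasks onto its own run queue when it drains that mailbox.

	#include <sys/queue.h>

	struct wakeup_msg {
		STAILQ_ENTRY(wakeup_msg) wm_link;
		struct task *wm_task;
	};

	struct wakeup_mbox {
		struct mtx mb_mtx;
		STAILQ_HEAD(, wakeup_msg) mb_head;
	};

	static struct wakeup_mbox mboxes[MAXCPU];

	/* Any cpu may post a wakeup aimed at the task's owning cpu. */
	void
	wakeup_post(int cpu, struct wakeup_msg *m)
	{
		struct wakeup_mbox *mb = &mboxes[cpu];

		mtx_lock(&mb->mb_mtx);
		STAILQ_INSERT_TAIL(&mb->mb_head, m, wm_link);
		mtx_unlock(&mb->mb_mtx);
		ipi_selected(1 << cpu, IPI_AST);	/* ask cpu to drain */
	}

	/* Each cpu drains only its own mailbox. */
	void
	wakeup_drain(int mycpu)
	{
		struct wakeup_mbox *mb = &mboxes[mycpu];
		struct wakeup_msg *m;

		mtx_lock(&mb->mb_mtx);
		while ((m = STAILQ_FIRST(&mb->mb_head)) != NULL) {
			STAILQ_REMOVE_HEAD(&mb->mb_head, wm_link);
			mtx_unlock(&mb->mb_mtx);
			sched_add_local(m->wm_task);	/* own run queue only */
			mtx_lock(&mb->mb_mtx);
		}
		mtx_unlock(&mb->mb_mtx);
	}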