From: Terry Lambert
Message-Id: <200004290120.SAA11745@usr08.primenet.com>
Subject: Re: hlt instructions and temperature issues
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Sat, 29 Apr 2000 01:20:13 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), jgowdy@home.com (Jeremiah Gowdy),
    smp@csn.net (Steve Passe), jim@thehousleys.net (James Housley),
    freebsd-smp@FreeBSD.ORG
In-Reply-To: <200004282240.PAA14200@apollo.backplane.com>
    from "Matthew Dillon" at Apr 28, 2000 03:40:33 PM

> :Others have complained about the "air gap" between the "sti" and
> :the "hlt". I think that this is not really an issue, but it's
> :very easy to rectify this, if it were. It's clearly not an issue
> :if the TPR claims are correct, and the new code merely removes
> :the "#ifdef SMP/#endif" directives.
>
> This is definitely an issue. If both cpu's go idle the interrupt
> that's supposed to wake one of them up (e.g. some event that causes
> a process to be woken up) can get lost in the air-gap.

The reason I said that is that I think that access to that code
might be serialized, until the priority is dropped.

> It's trivial to rectify, so we should just do it :-). Besides, you
> can't mess with the APIC stuff with interrupts enabled anyway because,
> again, an interrupt might occur that alters the state of the system
> just before or just after you modify the APIC priority. The sequence
> of events should be: (1) mess with apic (2) sti (3) hlt.

Right, very easy to rectify.

> These windows are small, but as we have seen an ample number of times
> even one-instruction windows can get hit when the code in
> question is being run thousands of times a second.

Ouch! Still smarting from the lock stuff. 8-) 8-).

> I like the HLT + IPI idea, but none of the patches to date
> really cover the bases and switching performance is not going
> to be as good as when you don't have the HLT due to the
> overhead of sending the IPIs

This is a non-issue, I think. The IPIs will be sent at a time when
the sending processor would otherwise be going idle. The need to do
this is no more of a hit, I think, than the hit FreeBSD normally
takes from "hlt" in the non-SMP case.

> and having to keep track of which cpu's are in a HLT'd state
> and which are not (so you don't send IPI's to all cpu's
> gratuitously).

A trivial gross approximation here would be to have a 32-bit bitmap,
one bit per processor, where each processor XORs only its own bit in
memory. The only danger here is a window in which someone (holding
the BGL) leaving the scheduler sends a spurious IPI between the
wakeup and the XOR operation. You could fix this by having the bit
set when the processor is going to sleep, and cleared when the IPI
is about to be sent.

> This is not a trivial problem because we cannot afford to
> have N cpu's all trying to do locked bitset instructions on
> the same memory location in order to go idle -- that alone
> will create big latencies.

There is a lot of current SMP code that assumes MESI cache coherency.
Adding to this will not be an issue.
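
To make the bitmap idea concrete, here is a rough user-space sketch.
Everything named in it (idle_cpus, cpu_mark_idle(), cpu_claim_idle())
is made up for illustration and is not existing FreeBSD code. It
follows the second variant above: a CPU sets its own bit before going
to sleep, and whoever is about to send the wakeup IPI clears it. Note
that it uses C11 atomic read-modify-write operations, which on x86
compile to locked instructions; that is the conservative choice for a
sketch, not the unlocked XOR argued for in this message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CPUS 32	/* one bit per processor in a 32-bit map */

/* Hypothetical global: bit n set means CPU n is halted or about to halt. */
static _Atomic uint32_t idle_cpus;

/* Called by a CPU just before it goes to sleep: publish its own bit. */
void
cpu_mark_idle(unsigned cpu)
{
	assert(cpu < MAX_CPUS);
	atomic_fetch_or(&idle_cpus, UINT32_C(1) << cpu);
}

/*
 * Called by the CPU that is about to send a wakeup IPI.  Picks the
 * lowest-numbered idle CPU, clears its bit, and returns its number,
 * or returns -1 if no CPU is marked idle (so no IPI is needed).
 */
int
cpu_claim_idle(void)
{
	uint32_t map = atomic_load(&idle_cpus);

	while (map != 0) {
		unsigned cpu = 0;

		/* Find the lowest set bit in our snapshot of the map. */
		while (((map >> cpu) & 1) == 0)
			cpu++;

		/* Clear it only if the map has not changed underneath us. */
		if (atomic_compare_exchange_weak(&idle_cpus, &map,
		    map & ~(UINT32_C(1) << cpu)))
			return (int)cpu;
		/* On failure, map now holds the current value; retry. */
	}
	return -1;
}
```

In this sketch a CPU would call cpu_mark_idle() with interrupts still
disabled, just ahead of the sti/hlt pair, and the wakeup path would
send an IPI only when cpu_claim_idle() returns a CPU number.
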
The XOR instructions will not need to be locked, I believe, since the
cache coherency notifications should handle synchronization. As I
said, the bit will only ever be cleared in the BGL case.

If you want to get gross, you can set the bit in the scheduler with
the BGL held on the processor that's about to go idle, which would
take care of your objection: the bitmap is only ever manipulated with
the BGL held, and the manipulation is done opportunistically, so
there is no additional locking overhead.

You would, of course, have to undo this hack when you went to per-CPU
ready-to-run queues. But realize that per-CPU ready-to-run queues
already magically have an IPI call location reserved in the code
which migrates processes from one CPU ready-to-run queue to another.
8-).

> We should consider testing other possible solutions, such as having a
> really tight idle loop that stays in the same cacheline and thus does
> not greatly exercise the cpu's circuitry, resulting in less heat without
> having to HLT.

I think that going outside is the least of the heat dissipation
worries; it strikes me that line drivers are not where the heat is
coming from, and that running over the same cache line would be a
very bad thing.

The other problem with this idea is that you rely on a shootdown
notification for a data change in order to exit the loop, and that
is, de facto, an IPI in all but name.

> For example, if we can remove *ALL* memory writes from the
> best-case idle loop it should make a huge difference in heat
> dissipation without having to resort to HLT! Right now we
> make a number of subroutine calls (such as to procrunnable())
> which will result in external bus cycles. If those can be
> inlined it should have a noticeable effect.

You can give it a try, but I don't think it will have the effect you
think that it will.
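
For reference, the kind of write-free idle loop under discussion
might look roughly like the sketch below. The names
(run_queue_nonempty, post_runnable(), idle_spin()) and the poll limit
are hypothetical; the limit exists only so the sketch terminates when
run standalone. The loop only reads, so nothing changes from its
point of view until another CPU writes the flag, and at that point
the coherency protocol has to invalidate the spinning CPU's copy of
the line. That invalidation is the notification the loop depends on
in order to exit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set by whichever CPU makes a process runnable. */
static atomic_bool run_queue_nonempty;

/* Writer side, e.g. the wakeup path running on another CPU. */
void
post_runnable(void)
{
	atomic_store_explicit(&run_queue_nonempty, true,
	    memory_order_release);
}

/*
 * Idle side: spin using loads only, with no stores to memory and no
 * subroutine calls inside the loop.  Returns the number of polls made
 * before work appeared or the (demo-only) limit was reached.
 */
unsigned long
idle_spin(unsigned long max_polls)
{
	unsigned long polls = 0;

	while (polls < max_polls &&
	    !atomic_load_explicit(&run_queue_nonempty, memory_order_acquire))
		polls++;
	return polls;
}
```
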
I think the numbers for a 2 CPU system with Loqui's patch were
extremely exaggerated by the CPU stalling-until-interrupt issue, and
that the heat numbers will not be nearly so good, even on a totally
"correct" solution, because of this. I expect your approach would
result in temperatures nearly as high as, if not downright
indistinguishable from, the measured numbers for an unmodified
system.

This is really not an issue, anyway, except in environments where
power consumption and heat dissipation are critical. That said, if
it's for an SMP box going into a colocation server room rack
somewhere in a 1U case, this could be significant for some percentage
of users, so maybe it's still worth talking about. 8-).

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.