Date: Thu, 30 May 1996 20:28:41 +0000
From: Poul-Henning Kamp <phk@critter.tfs.com>
To: erich@uruk.org
Cc: freebsd-smp@freebsd.org
Subject: Re: How do you get the SMP code
Message-ID: <777.833488121@critter.tfs.com>
In-Reply-To: Your message of "Thu, 30 May 1996 12:05:31 MST." <199605301905.MAA07776@uruk.org>

> You have to do at least one level of pointer indirection for indexing
> which CPU you're using anyway, so this is mostly a redesign of how the
> mapping works.

I mostly agree with this, I just want to make sure that we don't
overengineer it.

> > Nasty question time:  Can part of the local apic registers be used
> > as "per-cpu registers" without too much performance penalty ?
>
> It is particular to each processor.  On the Pentium and Pentium Pro,
> accesses are part of the external bus and L2 DCU path respectively.
> I.e. I think it is slower than the L1.

but faster than doing too much arithmetic on the apic_id.

My point being that any attempt to find the per-cpu data starts out
with trying to read the APIC_ID, so we might as well cache a pointer
in the APIC and read that instead...

I thought quite a bit about the fact that we don't use %[gf]s in the
kernel: one could make a segment per cpu and have the CPUs differ
only in %gs's contents.  That way we just need to set %gs on entry to
the kernel (in trap/syscall/irq &c) and everything is (moderately)
downhill from there, with the footnote that we have no way of
explaining to CC that it should use the "gs:" prefix, so a lot of ugly
inline assembler is needed for it.

> > I would really love to have one or two 32bit registers local per CPU
> > to speed up all this stuff...
>
> Yes, yes.  We've heard that quite a bit.

Oh, and while you're at it: add a nano-second clock.  It doesn't have
to have nano-sec increments, just units of nano-secs.

And if you have more space on your silicon, we have more ideas as well :-)

> FWIW:  GCC has a lot of room for improvement...  the "Pentium GCC" work
> showed that just taking advantage of some x86-isms can get you
> BOTH 10-30% denser code that's also 10-30% faster.

Oh, sure!  But performance by design is even better :-)

If I can design it so that I only have one level of indirection, then
any version of any compiler will do better :-)

--
Poul-Henning Kamp           | phk@FreeBSD.ORG       FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk    Private mailbox.
whois: [PHK]                | phk@ref.tfs.com       TRW Financial Systems, Inc.
Future will arrive by its own means, progress not so.
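
For illustration only, the two lookup schemes discussed above can be
sketched in plain C.  This is not kernel code: every name below
(struct percpu, cpu_data, read_apic_id, percpu_cached, enter_kernel_on)
is hypothetical, the "registers" are ordinary variables standing in
for hardware, and the assumption that the APIC ID sits in bits 24-27
of the ID register is there only to show the shift-and-mask
arithmetic.  Scheme A derives the per-cpu pointer from the APIC ID on
every access; scheme B loads a pointer that was cached once on kernel
entry, which is the role a spare APIC register or a per-cpu %gs
segment would play.

```c
#include <assert.h>
#include <stdint.h>

#define MAXCPU 4                      /* hypothetical CPU limit */

struct percpu {                       /* hypothetical per-cpu data block */
    int           cpuid;
    unsigned long ticks;
};

static struct percpu cpu_data[MAXCPU];

/* Stand-in for the local APIC ID register; a real kernel would read
 * a memory-mapped register here instead of a variable. */
static uint32_t fake_apic_id_reg;

/* Assumed layout: the ID lives in bits 24-27 of the register. */
static uint32_t read_apic_id(void)
{
    return (fake_apic_id_reg >> 24) & 0xf;
}

/* Scheme A: read the APIC ID, do arithmetic on it, index the table. */
static struct percpu *percpu_by_apic_id(void)
{
    return &cpu_data[read_apic_id()];
}

/* Stand-in for a per-cpu register (or a %gs-based segment) that holds
 * the pointer directly. */
static struct percpu *fake_percpu_reg;

/* Scheme B: a single load of the cached pointer, no arithmetic. */
static struct percpu *percpu_cached(void)
{
    return fake_percpu_reg;
}

/* Simulates the trap/syscall/irq entry path on a given CPU, where the
 * cached pointer is set up once before any per-cpu access. */
static void enter_kernel_on(int cpu)
{
    assert(cpu >= 0 && cpu < MAXCPU);
    fake_apic_id_reg = (uint32_t)cpu << 24;
    fake_percpu_reg  = &cpu_data[cpu];
    cpu_data[cpu].cpuid = cpu;
}
```

Both functions return the same pointer; the difference is only in how
much work each access costs once the pointer has been cached on entry.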