Date:      Fri, 29 Jan 2010 07:25:31 -0800
From:      Neel Natu <neelnatu@gmail.com>
To:        Juli Mallett <jmallett@freebsd.org>
Cc:        freebsd-mips@freebsd.org
Subject:   Re: Code review: groundwork for SMP
Message-ID:  <dffe84831001290725g2ca2574ap22b82f2ad38af2d6@mail.gmail.com>
In-Reply-To: <eaa228be1001282242q1f78fff2w9804da6cdadb3d1f@mail.gmail.com>
References:  <dffe84831001262336l1978797g8b12fab815f4eb52@mail.gmail.com> <20100128.132114.1004138037722505681.imp@bsdimp.com> <dffe84831001281401n7c9fb64bjb38260943448f315@mail.gmail.com> <66207A08-F691-4603-A6C5-9C675414C91E@lakerest.net> <eaa228be1001282040r416f3764lde577786347a4d5e@mail.gmail.com> <85D9D383-29A3-4F09-A2FE-61E4EA85CE9B@lakerest.net> <eaa228be1001282242q1f78fff2w9804da6cdadb3d1f@mail.gmail.com>

Thanks Juli, Randall and JC for the comments.

I think it is fair to ask that we don't burn another TLB entry to
store the pcpu data. It may help if I go through the options I
considered before settling on this one:

- One of the first things I investigated was using per-cpu scratch
registers, but the Sibyte did not have any and they are not part of
the MIPS architecture.

- The second thing I considered was using a platform-specific
getcpuid() to index into the struct pcpu pcpu[MAXCPU] array to compute
the KSEG0 address of pcpu at runtime. However, this turned out to be a
bit messy because there are consumers of getcpuid() in exception
context, where we are restricted to using only k0 and k1 (and sometimes
only one of them). Also, as Juli pointed out, getcpuid() is slow on
some cpus, and I did not want to assume that one could write
getcpuid() using a single k0/k1 register.
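
For illustration, the array-indexing option might look roughly like the
sketch below. It is not code from the patch under review: MAXCPU's
value, the pc_cpuid field, the fake_cpuid variable and pcpu_lookup()
are all assumptions made for the example, and getcpuid() here is a
stand-in for whatever the platform would provide. In the kernel the
result would be a KSEG0 address; here it is an ordinary pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCPU 4                    /* assumed value for this sketch */

struct pcpu {
    unsigned int pc_cpuid;          /* illustrative field only */
};

static struct pcpu pcpu[MAXCPU];

/*
 * Hypothetical stand-in: a real getcpuid() is platform-specific and,
 * as noted in the thread, may be slow on some CPUs.
 */
static unsigned int fake_cpuid;

static unsigned int
getcpuid(void)
{
    return (fake_cpuid);
}

/* Compute the address of the current CPU's pcpu at run time. */
static struct pcpu *
pcpu_lookup(void)
{
    unsigned int id = getcpuid();

    assert(id < MAXCPU);
    return (&pcpu[id]);
}
```

The cost is visible here: every lookup needs the cpuid plus an index
computation, which is awkward when only k0/k1 are available.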

So, having the pcpu pointer in a TLB entry divorces us from any
assumptions about the CPU we are running on.

I think that there is a legitimate concern about this on the XLR, but
given that you are sharing the TLB among 4 threads, I think the wired
kstack entries are the bigger issue that you need to solve before even
thinking about the pcpu mapping.

I did not consider the approach suggested by Juli where the pcpu and
kstack pointers can be stashed in a single wired TLB entry. I need
some time to chew on it and prototype it.
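
As a rough sketch of that idea (based only on the description quoted
below, not on any existing code), the page behind the single wired
entry could hold two pointers at fixed offsets 0 and 8. The names
percpu_mux, mux_kstack() and mux_pcpu() are hypothetical. In the kernel
the structure would sit at the fixed wired address; the sketch takes a
pointer argument instead so that it can be compiled anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the page mapped by the single wired entry. */
struct percpu_mux {
    uint64_t kstack;    /* offset 0: corresponds to "ld sp, 0(base)" */
    uint64_t pcpu;      /* offset 8: corresponds to "ld t0, 8(base)" */
};

/* The offsets are compile-time constants, the same on every CPU. */
_Static_assert(offsetof(struct percpu_mux, kstack) == 0, "kstack offset");
_Static_assert(offsetof(struct percpu_mux, pcpu) == 8, "pcpu offset");

/* Each accessor is a single load from a constant offset. */
static inline uint64_t
mux_kstack(const struct percpu_mux *m)
{
    return (m->kstack);
}

static inline uint64_t
mux_pcpu(const struct percpu_mux *m)
{
    return (m->pcpu);
}
```

No cpuid and no array index is needed; the price is one extra
indirection to reach the pcpu data itself.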

I would still like to commit this so as to keep making progress on the
SMP support. This is a small piece of the bigger goal of getting SMP
functional and can be replaced in the future if need be.

best
Neel

On Thu, Jan 28, 2010 at 10:42 PM, Juli Mallett <jmallett@freebsd.org> wrote:
> On Thu, Jan 28, 2010 at 21:28, Randall Stewart <rrs@lakerest.net> wrote:
>>> [ Using a single wired TLB entry for kstack and pcpu ]
>>
>> Which means you have a big array that you are offsetting.
>
> Not really - you can have a structure at 0xc000000000000000u (or the
> same >> 32) with two pointers in it, even, one to pcpu and one to
> KSTACK_PAGES direct-mapped, contiguous pages.  Then you can load the
> kstack address or the pcpu base very quickly.  Of course, you can even
> have a single wired entry consisting of the pcpu data and then put a
> pointer to the top of the kstack in it.  I don't think you can get by
> with no wired TLB entries, but you also don't have to index a big
> array.  The nice thing about setting up a per-CPU TLB entry (you have
> to set up at least one, the kstack, in order to be able to handle
> exceptions) is that then you need only access offsets into it that are
> known at compile time and constant no matter what CPU you're running
> on.  Load the kstack by doing "ld sp, 0(0xc...)" and load the pcpu
> address by doing "ld t0, 8(0xc....)".  Two wired entries lets you get
> rid of the indirection, but you can get by with one and still not have
> to do (1) run-time computation of the index into some array (2)
> possibly very expensive getting of the cpuid.
>
>> I was even thinking get a LARGE entry.. one that is say 8 Meg
>> for the kernel.. covering all text/data etc... with this
>> new super page stuff. of course I have never looked into how
>> its implemented..
>
> That would be easy to do, but what would be the benefits of accessing
> that data through a wired TLB entry instead of the direct map?
>
>> Yes, you pay an index reference for every access .. or at
>> least one to setup a pointer.. but I think that it much cheaper
>> than a TLB miss is... (words for imp to think about)...
>
> Yes, TLB misses are very slow.  Your desire to avoid adding another
> wired entry seems pretty reasonable.  I think that using a single
> wired TLB entry for a mux or for both the kstack and pcpu is easy and
> usable.  I feel like just wiring the kstack and putting a
> direct-mapped, sometimes-recomputed pointer to the pcpu into gp is the
> best combination in the long run - even just loading an immediate
> 64-bit address is pretty slow wrt how often things in the PCPU are
> accessed in SMP kernels.
>
> Juli.
>


