Date:      Tue, 30 Apr 2019 21:45:00 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: How many segments does it take to span from VM_MIN_KERNEL_ADDRESS through VM_MAX_SAFE_KERNEL_ADDRESS? 128 in moea64_late_bootstrap
Message-ID:  <6159F4A6-9431-4B99-AA62-451B8DF08A6E@yahoo.com>
In-Reply-To: <3C69CF7C-7F33-4C79-92C0-3493A1294996@yahoo.com>
References:  <3C69CF7C-7F33-4C79-92C0-3493A1294996@yahoo.com>

[I realized that there is another point of potential
slb-misses in cpudep_ap_bootstrap: the address in
sprg0 on the cpu might end up not being able to be
dereferenced.]

On 2019-Apr-30, at 20:58, Mark Millard <marklmi at yahoo.com> wrote:

> [At the end this note shows why the old VM_MAX_KERNEL_ADDRESS
> led to no slb-miss exceptions in cpudep_ap_bootstrap.]
> 
> There is code in moea64_late_bootstrap that looks like:
> 
>        virtual_avail = VM_MIN_KERNEL_ADDRESS;
>        virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
> 
>        /*
>         * Map the entire KVA range into the SLB. We must not fault there.
>         */
>        #ifdef __powerpc64__
>        for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
>                moea64_bootstrap_slb_prefault(va, 0);
>        #endif
> 
> where (modern):
> 
> #define VM_MIN_KERNEL_ADDRESS           0xe000000000000000UL
> #define VM_MAX_SAFE_KERNEL_ADDRESS      VM_MAX_KERNEL_ADDRESS
> #define VM_MAX_KERNEL_ADDRESS           0xe0000007ffffffffUL
> #define       SEGMENT_LENGTH  0x10000000UL
> 
> So:
> 
> 0xe000000000000000UL: VM_MIN_KERNEL_ADDRESS
> 0x0000000010000000UL: SEGMENT_LENGTH
> 0xe0000007ffffffffUL: VM_MAX_KERNEL_ADDRESS
> 
> So I see the loop as doing moea64_bootstrap_slb_prefault
> 128 times (decimal, 0x00..0x7f at the appropriate
> byte in va).
> 
> (I do not see why this loop keeps going once the slb
> kernel slots are all full. Nor is it obvious to me
> why the larger va values should be the ones more
> likely to still be covered. But I'm going a different
> direction below.)
> 
> That also means that, with n_slbs being 64, the code does
> random replacement (based on mftb()%n_slbs, but avoiding
> USER_SLB_SLOT) 128-(64-1), or 65, times. The slb_insert_kernel
> use in moea64_bootstrap_slb_prefault does that:
> 
> moea64_bootstrap_slb_prefault(vm_offset_t va, int large)
> {
>        struct slb *cache;
>        struct slb entry;
>        uint64_t esid, slbe;
>        uint64_t i;
> 
>        cache = PCPU_GET(aim.slb);
>        esid = va >> ADDR_SR_SHFT;
>        slbe = (esid << SLBE_ESID_SHIFT) | SLBE_VALID;
> 
>        for (i = 0; i < 64; i++) {
>                if (cache[i].slbe == (slbe | i))
>                        return;
>        }
> 
>        entry.slbe = slbe;
>        entry.slbv = KERNEL_VSID(esid) << SLBV_VSID_SHIFT;
>        if (large)
>                entry.slbv |= SLBV_L;
> 
>        slb_insert_kernel(entry.slbe, entry.slbv);
> }
> 
> where slb_insert_kernel in turn has the code that
> will do replacements:
> 
> void
> slb_insert_kernel(uint64_t slbe, uint64_t slbv)
> {
>        struct slb *slbcache;
>        int i;
> 
>        /* We don't want to be preempted while modifying the kernel map */
>        critical_enter();
> 
>        slbcache = PCPU_GET(aim.slb);
> 
>        /* Check for an unused slot, abusing the user slot as a full flag */
>        if (slbcache[USER_SLB_SLOT].slbe == 0) {
>                for (i = 0; i < n_slbs; i++) {
>                        if (i == USER_SLB_SLOT)
>                                continue;
>                        if (!(slbcache[i].slbe & SLBE_VALID))
>                                goto fillkernslb;
>                }
> 
>                if (i == n_slbs)
>                        slbcache[USER_SLB_SLOT].slbe = 1;
>        }
> 
>        i = mftb() % n_slbs;
>        if (i == USER_SLB_SLOT)
>                i = (i+1) % n_slbs;
> 
> fillkernslb:
>        KASSERT(i != USER_SLB_SLOT,
>            ("Filling user SLB slot with a kernel mapping"));
>        slbcache[i].slbv = slbv;
>        slbcache[i].slbe = slbe | (uint64_t)i;
> 
>        /* If it is for this CPU, put it in the SLB right away */
>        if (pmap_bootstrapped) {
>                /* slbie not required */
>                __asm __volatile ("slbmte %0, %1" ::
>                    "r"(slbcache[i].slbv), "r"(slbcache[i].slbe));
>        }
> 
>        critical_exit();
> }
> 
> [The USER_SLB_SLOT handling makes selection of slot
> USER_SLB_SLOT+1 for what to replace more likely than
> the other kernel slots.]
> 
> I expect that the above explains the variability in
> whether cpudep_ap_bootstrap's:
> 
> sp = pcpup->pc_curpcb->pcb_sp
> 
> gets a slb fault at the stage of dereferencing
> pc_curpcb or not.
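
The bias toward USER_SLB_SLOT+1 noted in the quoted text can be
checked outside the kernel. The following is only a minimal
user-space sketch, not kernel code: pick_replacement_slot and
residues_mapping_to are names made up here, the tb argument stands
in for the mftb() value, and USER_SLB_SLOT being 0 is an assumption
on my part.

```c
#include <assert.h>
#include <stdint.h>

#define USER_SLB_SLOT   0       /* assumed value, for illustration */

/*
 * Same selection rule as the replacement path quoted above from
 * slb_insert_kernel, with the timebase value passed in as tb.
 */
static int
pick_replacement_slot(uint64_t tb, int n_slbs)
{
        int i;

        assert(n_slbs > 1);
        i = (int)(tb % (uint64_t)n_slbs);
        if (i == USER_SLB_SLOT)
                i = (i + 1) % n_slbs;
        return (i);
}

/*
 * Count how many of the n_slbs possible residues of tb end up
 * selecting the given slot.
 */
static int
residues_mapping_to(int slot, int n_slbs)
{
        uint64_t tb;
        int count = 0;

        for (tb = 0; tb < (uint64_t)n_slbs; tb++) {
                if (pick_replacement_slot(tb, n_slbs) == slot)
                        count++;
        }
        return (count);
}
```

For n_slbs of 64, residues_mapping_to gives 0 for USER_SLB_SLOT,
2 for USER_SLB_SLOT+1, and 1 for each of the other slots. So, if
the low bits of the timebase are roughly uniform, USER_SLB_SLOT+1
is picked about twice as often as any other kernel slot.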


Note: the random replacements could also make
dereferencing pcpup-> (aka (get_pcpu())->) end
up with a slb-miss, where:

static __inline struct pcpu *
get_pcpu(void)
{
        struct pcpu *ret;
 
        __asm __volatile("mfsprg %0, 0" : "=r"(ret));
        
        return (ret);
}

If the slb entry covering address ranges accessed
based on sprg0 is ever replaced, no code based on
getting sprg0's value to find the matching pcpu
information is going to work, *including in the
slb spill trap code* [GET_CPUINFO(%r?)].

Does a kernel slb entry that is never replaced need
to be reserved on each CPU, so that sprg0 can always
be used to find the pcpu information?

This is something my hack did not deal with.
And, in fact, trying to force 2 entries to exist
at the same time, one for "dereferencing sprg0"
and one for dereferencing the pc_curpcb so found
is currently messy, given the way other things
work.

The lack of handling for this may explain the (rare)
hangups with the existing hack present.
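
To make the "reserved entry" question concrete, here is a minimal
user-space sketch of a replacement-slot choice that also skips one
extra slot. It is not existing FreeBSD code and not my hack:
PCPU_SLB_SLOT and pick_victim_keeping_pcpu are hypothetical names,
USER_SLB_SLOT being 0 is an assumption, and tb stands in for the
mftb() value. It also says nothing about how the segment that
pc_curpcb points into would be kept around.

```c
#include <assert.h>
#include <stdint.h>

#define USER_SLB_SLOT   0       /* assumed value, for illustration */
#define PCPU_SLB_SLOT   1       /* hypothetical slot for the sprg0 segment */

/*
 * Pick a slot to replace, never returning the user slot or the
 * (hypothetical) slot reserved for the segment holding the pcpu data.
 */
static int
pick_victim_keeping_pcpu(uint64_t tb, int n_slbs)
{
        int i;

        assert(n_slbs > 2);
        i = (int)(tb % (uint64_t)n_slbs);
        while (i == USER_SLB_SLOT || i == PCPU_SLB_SLOT)
                i = (i + 1) % n_slbs;
        return (i);
}
```

With these assumed slot numbers, tb residues 0, 1 and 2 all select
slot 2, so the bias toward one kernel slot gets somewhat worse in
exchange for the reserved slot never being replaced.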


> I also expect that the old VM_MAX_KERNEL_ADDRESS value
> explains the lack of slb-misses in old times:
> 
> 0xe000000000000000UL: VM_MIN_KERNEL_ADDRESS
> 0x0000000010000000UL: SEGMENT_LENGTH
> 0xe0000001c7ffffffUL: VM_MAX_KERNEL_ADDRESS
> 
> So 0x00..0x1c is 29 alternatives (decimal). That
> fits in 64-1 slots, or even 32-1 slots: no
> random replacements happened above or elsewhere.
> That, in turn, meant no testing of the handling
> of any slb-misses back then.
> 
> 
> [Other list messages suggest missing context synchronizing
> instructions for slbmte and related instructions. The
> history is not evidence about that, given the lack of
> slb-misses.]
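
The segment counts for the old and modern VM_MAX_KERNEL_ADDRESS
values can be rechecked with a small user-space sketch. The
address and SEGMENT_LENGTH values are the ones quoted above; the
macro names with OLD_ and MODERN_ prefixes and the two function
names are made up here. The replacement count assumes that the
kernel slots start out empty and that each loop iteration inserts
a new entry.

```c
#include <assert.h>
#include <stdint.h>

#define VM_MIN_KERNEL_ADDRESS           0xe000000000000000ULL
#define MODERN_VM_MAX_KERNEL_ADDRESS    0xe0000007ffffffffULL
#define OLD_VM_MAX_KERNEL_ADDRESS       0xe0000001c7ffffffULL
#define SEGMENT_LENGTH                  0x10000000ULL

/*
 * Number of iterations of:
 *     for (va = start; va < end; va += seg_len)
 */
static uint64_t
count_prefault_calls(uint64_t start, uint64_t end, uint64_t seg_len)
{
        assert(seg_len != 0 && start <= end);
        return (((end - start) + seg_len - 1) / seg_len);
}

/*
 * Number of calls that have to replace an existing entry, given
 * n_slbs slots of which one is set aside as the user slot.
 */
static uint64_t
count_replacements(uint64_t calls, uint64_t n_slbs)
{
        uint64_t kernel_slots;

        assert(n_slbs > 0);
        kernel_slots = n_slbs - 1;
        return (calls > kernel_slots ? calls - kernel_slots : 0);
}
```

With the modern bound this gives 128 calls and, for 64 slots, 65
replacements. With the old bound it gives 29 calls and no
replacements for either 64 or 32 slots.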




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



