Date: Wed, 1 May 2019 11:51:44 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Justin Hibbits <chmeeedalf@gmail.com>
Cc: FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: How many segments does it take to span from VM_MIN_KERNEL_ADDRESS through VM_MAX_SAFE_KERNEL_ADDRESS? 128 in moea64_late_bootstrap
Message-ID: <212E50E5-7EB1-4381-A662-D5EACB1E5D46@yahoo.com>
In-Reply-To: <20190501094029.542c5f46@titan.knownspace>
References: <3C69CF7C-7F33-4C79-92C0-3493A1294996@yahoo.com> <6159F4A6-9431-4B99-AA62-451B8DF08A6E@yahoo.com> <20190501094029.542c5f46@titan.knownspace>
On 2019-May-1, at 07:40, Justin Hibbits <chmeeedalf at gmail.com> wrote:

> On Tue, 30 Apr 2019 21:45:00 -0700
> Mark Millard <marklmi@yahoo.com> wrote:
>
>> [I realized another implication about another point of
>> potential slb-misses in cpudep_ap_bootstrap: the
>> address in sprg0 on the cpu might end up not able to be
>> dereferenced.]
>>
>> On 2019-Apr-30, at 20:58, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> [At the end this note shows why the old VM_MAX_KERNEL_ADDRESS
>>> led to no slb-miss exceptions in cpudep_ap_bootstrap.]
>>>
>>> There is code in moea64_late_bootstrap that looks like:
>>>
>>>     virtual_avail = VM_MIN_KERNEL_ADDRESS;
>>>     virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
>>>
>>>     /*
>>>      * Map the entire KVA range into the SLB. We must not fault
>>>      * there.
>>>      */
>>>     #ifdef __powerpc64__
>>>     for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
>>>         moea64_bootstrap_slb_prefault(va, 0);
>>>     #endif
>
> What happens if you revert all your patches,

Most of the patches in Bugzilla 233863 are not for this issue
at all and are not tied to starting the non-bsp cpus. (The one
for improving how close the Time Base registers are is tied to
starting these cpus.) Only the aim/mp_cpudep.c and aim/slb.c
changes seem relevant.

Are you worried about some form of interaction that means I
need to avoid patches for other issues?

Note: for now I'm staying with head -r345758 as the basis for
my experiments.

> and change this loop to
> stop at n_slb? So something more akin to:
>
>     int i = 0;
>
>     for (va = virtual_avail; va < virtual_end && i < n_slb - 1;
>         va += SEGMENT_LENGTH, i++);
>     ...
>
> If it reliably boots with that, then that's fine. We can prefault as
> much as we can and leave the rest for on-demand.

I'm happy to experiment with this loop without my hack for
forcing the slb entry to exist in cpudep_ap_bootstrap.
But, it seems to presume that the pc_curpcb's will all always
point into the lower address range spanned when
cpudep_ap_bootstrap is executing on the cpu. Does some known
property limit the pc_curpcb-> references to such? Only that
would be sure to avoid an slb-miss at that stage. Or is this
just an alternate hack or a means of getting evidence, not a
proposed solution? (Again, I'm happy to disable my hack that
forces the slb entry and to try the loop suggested.)

>>>
>>> where (modern):
>>>
>>> #define VM_MIN_KERNEL_ADDRESS       0xe000000000000000UL
>>> #define VM_MAX_SAFE_KERNEL_ADDRESS  VM_MAX_KERNEL_ADDRESS
>>> #define VM_MAX_KERNEL_ADDRESS       0xe0000007ffffffffUL
>>> #define SEGMENT_LENGTH              0x10000000UL
>>>
>>> So:
>>>
>>> 0xe000000000000000UL: VM_MIN_KERNEL_ADDRESS
>>> 0x0000000010000000UL: SEGMENT_LENGTH
>>> 0xe0000007ffffffffUL: VM_MAX_KERNEL_ADDRESS
>>>
>>> So I see the loop as doing moea64_bootstrap_slb_prefault
>>> 128 times (decimal, 0x00..0x7f at the appropriate
>>> byte in va).
>>>
>>> (I do not see why this loop keeps going once the slb
>>> kernel slots are all full. Nor is it obvious to me
>>> why the larger va values should be the ones more
>>> likely to still be covered. But I'm going a different
>>> direction below.)
>>>
>>> That also means that the code does random replacement (based
>>> on mftb()%n_slbs, but avoiding USER_SLB_SLOT) 128-(64-1),
>>> or 65 times. The slb_insert_kernel use in
>>> moea64_bootstrap_slb_prefault does that:
>>>
> ...
>>>
>>> I expect that the above explains the variability in
>>> whether cpudep_ap_bootstrap's:
>>>
>>>     sp = pcpup->pc_curpcb->pcb_sp
>>>
>>> gets a slb fault for dereferencing the pc_curpcb stage
>>> of that vs. not.
>>
>>
>> Note: the random replacements could also make
>> dereferencing pcpup-> (aka (get_pcpu())->) end
>> up with a slb-miss, where:
>>
>> static __inline struct pcpu *
>> get_pcpu(void)
>> {
>>         struct pcpu *ret;
>>
>>         __asm __volatile("mfsprg %0, 0" : "=r"(ret));
>>
>>         return (ret);
>> }
>>
>> If the slb entry covering address ranges accessed
>> based on sprg0 is ever replaced, no code based on
>> getting sprg0's value to find the matching pcpu
>> information is going to work, *including in the
>> slb spill trap code* [GET_CPUINFO(%r?)].
>
> Keep in mind that the PCPU pointer is in the DMAP, since it's in the
> kernel image. It's not in KVA. However, some structures pointed to by
> pcpu are in KVA, and those are what are faulting.

Ahh, the slb is not involved for the DMAP's address range.
Good to know. Also, no virtual address range is set up to
map to the same memory and then used. Also good to know.

(As is probably clear, I'm figuring things out as I go.
Some things are easier to figure out from what I see
than others. Thanks for the notes!)

>>
>> Does a kernel entry need to be reserved for the
>> CPU that never is replaced so that sprg0 can always
>> be used to find pcpu information via sprg0?
>>
>> This is something my hack did not deal with.
>> And, in fact, trying to force 2 entries to exist
>> at the same time, one for "dereferencing sprg0"
>> and one for dereferencing the pc_curpcb so found
>> is currently messy, given the way other things
>> work.
>
> It should not. The kernel SLB miss handler runs entirely in real
> mode, and __pcpu[] is in the DMAP, so real addresses match up directly
> with DMAP addresses (modulo 0xc000000000000000).

Good to know the limited usage context (for example,
DMAP-only addresses in sprg0).

Thanks again for the notes. They definitely help.

>>
>> The lack of handling may explain the (rare) hangups
>> with the existing hack present.
>>
>>
>>> I also expect that the old VM_MAX_KERNEL_ADDRESS value
>>> explains the lack of slb-misses in old times:
>>>
>>> 0xe000000000000000UL: VM_MIN_KERNEL_ADDRESS
>>> 0x0000000010000000UL: SEGMENT_LENGTH
>>> 0xe0000001c7ffffffUL: VM_MAX_KERNEL_ADDRESS
>>>
>>> So 0x00..0x1c is 29 alternatives (decimal). That
>>> fits in 64-1 slots, or even 32-1 slots: no
>>> random replacements happened above or elsewhere.
>>> That, in turn, meant no testing of the handling
>>> of any slb-misses back then.
>>>
>>>
>>> [Other list messages suggest missing context synchronizing
>>> instructions for slbmte and related instructions. The
>>> history is not evidence about that, given the lack of
>>> slb-misses.]
>
> Possibly. However, we may want direct control of the slots at boot
> time so as to make sure none of the range we're prefaulting gets
> replaced by the pseudo-random slot chooser.

Sounds like a potential direction.

Thanks again.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)