Date: Fri, 3 May 2019 14:52:02 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Justin Hibbits <chmeeedalf@gmail.com>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: 970MP PowerMac G5s: What printf's show about cpu_mp_unleash hangups on the test variant of head -r347003 (found cpu_mp_unleash counterexample)
Message-ID: <9B05FB92-D1E5-4DD5-BB04-53088E528F15@yahoo.com>
In-Reply-To: <4D659851-8731-4116-A6B6-33A75E9F0B76@yahoo.com>
References: <EF44A358-CC6D-4244-A911-6D4DACFF4B21@yahoo.com> <4D659851-8731-4116-A6B6-33A75E9F0B76@yahoo.com>
[I've tried a different moea64_bootstrap_slb_prefault usage variation.
Also, I reverted all the Rx,Rx,Rx no-op changes. I put in the change to
protect the mttb(...) implementation from interrupts messing up the
assignment to the TBR, just so that would not be a worry. I still have
my isync additions and my libc string compare code changes in what I'm
working with for head -r347003 .]

On 2019-May-3, at 11:09, Mark Millard <marklmi at yahoo.com> wrote:

> [Experiment with disabling the code in each of the nop_prio_
> routines instead of commenting out specific calls. But mostly:
> more test runs. The results do not support what I thought
> yesterday's cpu_mp_unleash evidence suggested.]
> 
> On 2019-May-2, at 22:23, Mark Millard <marklmi at yahoo.com> wrote:
> 
>> [Note: I still have your requested loop change, my
>> isync additions, and my libc string compare code
>> change in what I'm working with for head -r347003 .]
>> 
>> I started using printf to help identify more about what
>> code managed to execute vs. what code did not during
>> hang-ups.
>> 
>> This note is just about cpu_mp_unleash observations and
>> experiments related to what the printf's showed.
>> 
>> I did:
>> 
>> static void
>> cpu_mp_unleash(void *dummy)
>> {
>> . . . (omitted, as all the earlier printf's printed) . . .
>> printf("cpu_mp_unleash: before DELAY\n");
>> 	/* Let the APs get into the scheduler */
>> 	DELAY(10000);
>> printf("cpu_mp_unleash: after DELAY\n");
>> 
>> }
>> 
>> What I saw was that only the first of the two DELAY printf's
>> shown above printed when cpu_mp_unleash hung up, such a
>> hang-up being the normal case when vt_upgrade did not
>> hang up first.
>> 
>> So I looked at /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> and its DELAY routine and came up with only one thing
>> that looked like a useful experiment. Note what I
>> then commented out:
>> 
>> # svnlite diff /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> Index: /mnt/usr/src/sys/powerpc/powerpc/clock.c
>> ===================================================================
>> --- /mnt/usr/src/sys/powerpc/powerpc/clock.c	(revision 347003)
>> +++ /mnt/usr/src/sys/powerpc/powerpc/clock.c	(working copy)
>> @@ -309,10 +309,10 @@
>>  	TSENTER();
>>  	tb = mftb();
>>  	ttb = tb + howmany((uint64_t)n * 1000000, ps_per_tick);
>> -	nop_prio_vlow();
>> +	//nop_prio_vlow();
>>  	while (tb < ttb)
>>  		tb = mftb();
>> -	nop_prio_medium();
>> +	//nop_prio_medium();
>>  	TSEXIT();
>>  }
>> 
>> After this change I've not (yet?) seen another cpu_mp_unleash
>> hang-up in my test context.
>> 
>> Even if not documented to do so, it appears to me that the
>> or Rx,Rx,Rx code behind nop_prio_vlow() does something
>> specific on the 970MPs in the 2-socket/2-core-each G5
>> PowerMac11,2's -- and what it does interferes with making
>> progress in DELAY, in at least that specific use of it and/or
>> any others on the APs during cpu_mp_unleash.
>> 
>> Of course, this testing process is probabilistic and I do not
>> have hundreds or more examples of any specific condition at
>> this point. But, so far, the change in behavior seems clear:
>> I went from always-hanging-up-so-far to always-booting-so-far
>> (when vt_upgrade did not prevent the test in each context).
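[Aside, for context: the nop_prio_* routines quoted above are just
inline wrappers around the PowerPC "or Rx,Rx,Rx" SMT thread-priority
hint nops. A minimal sketch of the idea, assuming the usual cpufunc.h
shape -- the register encodings match the egrep output quoted below,
but treat the exact wrapper bodies as my reconstruction rather than
verbatim source:

static __inline void
nop_prio_vlow(void)
{
	/* "or 31,31,31": architecturally a no-op, but SMT-capable
	 * implementations treat it as a very-low hardware-thread
	 * priority hint. */
	__asm __volatile("or 31,31,31");
}

static __inline void
nop_prio_medium(void)
{
	/* "or 2,2,2": restore medium hardware-thread priority. */
	__asm __volatile("or 2,2,2");
}

DELAY drops to very low priority around its busy-wait on mftb() and
restores medium priority afterwards. Since the 970MP is not an SMT
design, these should act as plain no-ops there -- which is what makes
the DELAY evidence above so odd.]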
> 
> So I uncommented the 2 calls and commented out the contents of
> the nop_prio_ routines, summarized here via:
> 
> # egrep -r 'or.*(31,31,31|1,1,1|6,6,6|2,2,2|5,5,5|3,3,3)' /mnt/usr/src/sys/powerpc/ | more
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:	//__asm __volatile("or 31,31,31");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:	//__asm __volatile("or 1,1,1");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:	//__asm __volatile("or 6,6,6");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:	//__asm __volatile("or 2,2,2");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:	//__asm __volatile("or 5,5,5");
> /mnt/usr/src/sys/powerpc/include/cpufunc.h:	//__asm __volatile("or 3,3,3");

[I had forgotten to list cpu.h's use of the or 27,27,27 form in
cpu_spinwait.]

I've reverted all such "or Rx,Rx,Rx" changes, since they produced no
overall change in the observed behavior for any of the 3 hang-up
areas.

The new experiment . . .

I've tried changing the loop to moea64_bootstrap_slb_prefault() only
about half of the kernel SLB slots, leaving the others to be filled
before the mftb()%n_slbs based "random" replacements would start:

 	/*
-	 * Map the entire KVA range into the SLB. We must not fault there.
+	 * Map about 1/2 the kernel slb slots, leaving the others available
+	 * without mftb()%n_slbs replacement being involved.
 	 */
 #ifdef __powerpc64__
-	for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+	i = 0;
+	for (va = virtual_avail; va < virtual_end && i < (n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
 		moea64_bootstrap_slb_prefault(va, 0);
 #endif

This change still gets hang-ups in the same 3 places but, so far, has
booted more often. (Not that I have a large sampling at this point.)
Definitely still probabilistic.
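[For context on the mftb()%n_slbs wording: once every kernel SLB slot
holds a valid entry, the slot to evict for a new kernel mapping is
picked pseudo-randomly from the low bits of the time base. A rough
sketch of the idiom, approximating what I understand the kernel-insert
path in sys/powerpc/aim/slb.c to do -- the helper name
slb_pick_kernel_slot is mine for illustration, not the kernel's:

static int
slb_pick_kernel_slot(struct slb *slbcache, int n_slbs)
{
	int i;

	/* Prefer a free (invalid) slot, skipping the slot reserved
	 * for the user segment. */
	for (i = 0; i < n_slbs; i++) {
		if (i == USER_SLB_SLOT)
			continue;
		if (!(slbcache[i].slbe & SLBE_VALID))
			return (i);
	}

	/* All slots valid: evict a pseudo-random victim, again
	 * avoiding the user-segment slot. */
	i = mftb() % n_slbs;
	if (i == USER_SLB_SLOT)
		i = (i + 1) % n_slbs;
	return (i);
}

With the prefault loop stopping at about half the slots, early boot
should stay on the free-slot path for a while instead of the eviction
path -- that is the variable I was trying to isolate.]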
Other status . . .

I've still not figured out a way to discover what happens at each of
the 3 hang-up points in the areas identified. I'm also running out of
ideas for gathering other types of evidence.

One hypothesis about why I've not seen these 3 hang-up places in my
normal environment (currently based on head -r345758) is that I
normally run non-debug kernels in that context. Here it has all been
debug kernels, to better match what artifacts.ci provides, among
other reasons.

For reference, the current -r347003 patches that I was last exploring,
including the extra printf's for the 3 hang-up areas:

# svnlite diff /mnt/usr/src/sys/ | more
Index: /mnt/usr/src/sys/dev/vt/vt_core.c
===================================================================
--- /mnt/usr/src/sys/dev/vt/vt_core.c	(revision 347003)
+++ /mnt/usr/src/sys/dev/vt/vt_core.c	(working copy)
@@ -2720,6 +2720,7 @@
 
 	/* Init 25 Hz timer. */
 	callout_init_mtx(&vd->vd_timer, &vd->vd_lock, 0);
+printf("vt_upgrade after callout_init_mtx / before sequence leading to callout_reset\n");
 
 	/*
 	 * Start timer when everything ready.
@@ -2732,6 +2733,7 @@
 		atomic_add_acq_int(&vd->vd_timer_armed, 1);
 		vd->vd_flags |= VDF_ASYNC;
 		callout_reset(&vd->vd_timer, hz / VT_TIMERFREQ, vt_timer, vd);
+printf("vt_upgrade after callout_reset\n");
 		register_handlers = 1;
 	}
 
Index: /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c
===================================================================
--- /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c	(revision 347003)
+++ /mnt/usr/src/sys/fs/nfsserver/nfs_fha_new.c	(working copy)
@@ -93,9 +93,11 @@
 	 * Initialize the sysctl context list for the fha module.
 	 */
 	sysctl_ctx_init(&softc->sysctl_ctx);
+printf("fhanew_init: after sysctl_ctx_init / before SYSCTL_ADD_NODE\n");
 	softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx,
 	    SYSCTL_STATIC_CHILDREN(_vfs_nfsd), OID_AUTO, "fha", CTLFLAG_RD,
 	    0, "NFS File Handle Affinity (FHA)");
+printf("fhanew_init: after SYSCTL_ADD_NODE\n");
 	if (softc->sysctl_tree == NULL) {
 		printf("%s: unable to allocate sysctl tree\n", __func__);
 		return;
Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c	(revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c	(working copy)
@@ -956,10 +956,12 @@
 	virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
 
 	/*
-	 * Map the entire KVA range into the SLB. We must not fault there.
+	 * Map about 1/2 the kernel slb slots, leaving the others available
+	 * without mftb()%n_slbs replacement being involved.
 	 */
 #ifdef __powerpc64__
-	for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+	i = 0;
+	for (va = virtual_avail; va < virtual_end && i < (n_slbs-1)/2; va += SEGMENT_LENGTH, i++)
 		moea64_bootstrap_slb_prefault(va, 0);
 #endif
 
@@ -1090,8 +1092,8 @@
 	CPU_SET(PCPU_GET(cpuid), &pm->pm_active);
 
 #ifdef __powerpc64__
-	PCPU_SET(aim.userslb, pm->pm_slb);
-	__asm __volatile("slbmte %0, %1; isync" ::
+	PCPU_SET(aim.userslb, pm->pm_slb); // no slbie needed?
+ __asm __volatile("isync; slbmte %0, %1; isync" :: "r"(td->td_pcb->pcb_cpu.aim.usr_vsid), "r"(USER_SLB_SLBE)); #else PCPU_SET(curpmap, pm->pmap_phys); @@ -1104,7 +1106,7 @@ { pmap_t pm; =20 - __asm __volatile("isync; slbie %0" :: "r"(USER_ADDR)); + __asm __volatile("isync; slbie %0; isync" :: "r"(USER_ADDR)); =20 pm =3D &td->td_proc->p_vmspace->vm_pmap; CPU_CLR(PCPU_GET(cpuid), &pm->pm_active); @@ -1956,7 +1958,7 @@ (uintptr_t)uaddr >> ADDR_SR_SHFT; curthread->td_pcb->pcb_cpu.aim.usr_vsid =3D slbv; #ifdef __powerpc64__ - __asm __volatile ("slbie %0; slbmte %1, %2; isync" :: + __asm __volatile ("isync; slbie %0; slbmte %1, %2; isync" :: "r"(USER_ADDR), "r"(slbv), "r"(USER_SLB_SLBE)); #else __asm __volatile("mtsr %0,%1; isync" :: "n"(USER_SR), = "r"(slbv)); Index: /mnt/usr/src/sys/powerpc/aim/moea64_native.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /mnt/usr/src/sys/powerpc/aim/moea64_native.c (revision = 347003) +++ /mnt/usr/src/sys/powerpc/aim/moea64_native.c (working copy) @@ -406,7 +406,7 @@ */ =20 #ifdef __powerpc64__ - __asm __volatile ("slbia"); + __asm __volatile ("isync; slbia"); __asm __volatile ("slbmfee %0,%1; slbie %0;" : = "=3Dr"(seg0) : "r"(0)); =20 @@ -417,6 +417,7 @@ __asm __volatile ("slbmte %0, %1" ::=20 "r"(slb[i].slbv), "r"(slb[i].slbe));=20 } + __asm __volatile ("isync"); #else for (i =3D 0; i < 16; i++) mtsrin(i << ADDR_SR_SHFT, = kernel_pmap->pm_sr[i]); Index: /mnt/usr/src/sys/powerpc/aim/slb.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /mnt/usr/src/sys/powerpc/aim/slb.c (revision 347003) +++ /mnt/usr/src/sys/powerpc/aim/slb.c (working copy) @@ -457,7 +457,7 @@ /* If it is for this CPU, put it in the SLB right away */ if (pmap_bootstrapped) { /* slbie not required */ - __asm __volatile ("slbmte %0, %1" ::=20 + __asm __volatile ("isync; slbmte %0, %1; isync" ::=20 "r"(slbcache[i].slbv), "r"(slbcache[i].slbe));=20 } =20 Index: /mnt/usr/src/sys/powerpc/aim/trap_subr64.S =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /mnt/usr/src/sys/powerpc/aim/trap_subr64.S (revision 347003) +++ /mnt/usr/src/sys/powerpc/aim/trap_subr64.S (working copy) @@ -65,6 +65,7 @@ =20 li %r29, 0 /* Set the counter to zero */ =20 + isync slbia slbmfee %r31,%r29 =20 clrrdi %r31,%r31,28 @@ -71,6 +72,7 @@ slbie %r31 1: ld %r31, 0(%r28) /* Load SLB entry pointer */ cmpdi %r31, 0 /* If NULL, stop */ + isync beqlr =20 ld %r30, 0(%r31) /* Load SLBV */ @@ -96,6 +98,7 @@ /* Otherwise, set up SLBs */ li %r29, 0 /* Set the counter to zero */ =20 + isync slbia slbmfee %r31,%r29 =20 clrrdi %r31,%r31,28 @@ -105,6 +108,7 @@ =20 ld %r31, 8(%r28) /* Load SLBE */ cmpdi %r31, 0 /* If SLBE is not valid, stop */ + isync beqlr ld %r30, 0(%r28) /* Load SLBV */ slbmte %r30, %r31 /* Install SLB entry */ @@ -113,6 +117,7 @@ addi %r29, %r29, 1 cmpdi %r29, 64 /* Repeat if we are not at the = end */ blt 1b=20 + isync blr =20 /* Index: /mnt/usr/src/sys/powerpc/include/cpufunc.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= 
--- /mnt/usr/src/sys/powerpc/include/cpufunc.h	(revision 347003)
+++ /mnt/usr/src/sys/powerpc/include/cpufunc.h	(working copy)
@@ -155,15 +155,8 @@
 	return (tb);
 }
 
-static __inline void
-mttb(u_quad_t time)
-{
+// mttb moved to after intr_restore
 
-	mtspr(TBR_TBWL, 0);
-	mtspr(TBR_TBWU, (uint32_t)(time >> 32));
-	mtspr(TBR_TBWL, (uint32_t)(time & 0xffffffff));
-}
-
 static __inline void
 eieio(void)
 {
@@ -202,6 +195,19 @@
 	mtmsr(msr);
 }
 
+static __inline void
+mttb(u_quad_t time)
+{
+	const uint32_t high = time >> 32;
+	const uint32_t low = time & 0xffffffffu;
+
+	const register_t predisable_msr = intr_disable();
+	mtspr(TBR_TBWL, 0);
+	mtspr(TBR_TBWU, high);
+	mtspr(TBR_TBWL, low);
+	intr_restore(predisable_msr);
+}
+
 static __inline struct pcpu *
 get_pcpu(void)
 {
Index: /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c	(revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/mp_machdep.c	(working copy)
@@ -286,8 +286,10 @@
 	if (smp_cpus > 1)
 		atomic_store_rel_int(&smp_started, 1);
 
+printf("cpu_mp_unleash: before DELAY\n");
 	/* Let the APs get into the scheduler */
 	DELAY(10000);
+printf("cpu_mp_unleash: after DELAY\n");
 
 }
 
Index: /mnt/usr/src/sys/powerpc/powerpc/trap.c
===================================================================
--- /mnt/usr/src/sys/powerpc/powerpc/trap.c	(revision 347003)
+++ /mnt/usr/src/sys/powerpc/powerpc/trap.c	(working copy)
@@ -453,8 +453,8 @@
 #if defined(__powerpc64__) && defined(AIM)
 	case EXC_DSE:
 		if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0 &&
-		    (frame->dar & SEGMENT_MASK) == USER_ADDR) {
-			__asm __volatile ("slbmte %0, %1" ::
+		    (frame->dar & SEGMENT_MASK) == USER_ADDR) { // no slbie needed?
+			__asm __volatile ("isync; slbmte %0, %1; isync" ::
 			    "r"(td->td_pcb->pcb_cpu.aim.usr_vsid),
 			    "r"(USER_SLB_SLBE));
 			return;
@@ -712,8 +712,8 @@
 	 * Speculatively restore last user SLB segment, which we know is
 	 * invalid already, since we are likely to do copyin()/copyout().
 	 */
-	if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0)
-		__asm __volatile ("slbmte %0, %1; isync" ::
+	if (td->td_pcb->pcb_cpu.aim.usr_vsid != 0) // no slbie needed?
+		__asm __volatile ("isync; slbmte %0, %1; isync" ::
 		    "r"(td->td_pcb->pcb_cpu.aim.usr_vsid),
 		    "r"(USER_SLB_SLBE));
 #endif
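[One note on the mttb() change in cpufunc.h above: the point of the
intr_disable()/intr_restore() bracketing is to close the window where
an interrupt could land between the three mtspr's. An annotated
sketch of the hazard in the old, unprotected sequence -- my reading
of the risk, not an observed failure:

	mtspr(TBR_TBWL, 0);	/* zero TBL so the TBU write cannot
				 * be disturbed by a carry from TBL */
	mtspr(TBR_TBWU, high);	/* upper half now holds the new value */
	/* <-- an interrupt arriving here runs with a half-written
	 *     time base: new upper 32 bits, zero lower 32 bits */
	mtspr(TBR_TBWL, low);	/* lower half completes the update */

Whether that window was actually being hit I do not know; the change
just takes it off the list of worries.]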
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)