Date: Tue, 16 Apr 2019 14:26:39 -0700 From: Mark Millard <marklmi@yahoo.com> To: Justin Hibbits <chmeeedalf@gmail.com> Cc: freebsd-ppc <freebsd-ppc@freebsd.org> Subject: Re: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .? [mpc85xx_smp_timebase_sync problem too] Message-ID: <984188AE-93A9-4321-99EC-522C2E0CE9A9@yahoo.com> In-Reply-To: <20190306223611.75c8a87e@titan.knownspace> References: <D9B56EE2-35C7-44A2-9229-D9E4AECAD3E1@yahoo.com> <20190306151914.44ea831c@titan.knownspace> <8668AAF7-9E6A-4278-9D1B-2ECDBD3804AA@yahoo.com> <99AD89F8-0F90-48BE-A060-DA12FD7129E6@yahoo.com> <20190306223611.75c8a87e@titan.knownspace>
next in thread | previous in thread | raw e-mail | index | archive | help
[Looks to me like the use and content of mpc85xx_smp_timebase_sync have the same type of problems I noted for the proposed powermac patch.] On 2019-Mar-6, at 20:36, Justin Hibbits <chmeeedalf at gmail.com> wrote: > On Wed, 6 Mar 2019 18:35:42 -0800 > Mark Millard <marklmi at yahoo.com> wrote: > >> [The patch is definitely wrong via a 3rd use of >> platform_smp_timebase_sync that I'd not noted before. Details at the >> end.] >> >> On 2019-Mar-6, at 16:39, Mark Millard <marklmi at yahoo.com> wrote: >> >> >> >>> On 2019-Mar-6, at 13:19, Justin Hibbits <chmeeedalf at gmail.com> >>> wrote: >>>> On Mon, 4 Mar 2019 19:43:09 -0800 >>>> Mark Millard via freebsd-ppc <freebsd-ppc@freebsd.org> wrote: >>>> >>>>> [It is possible that the following is tied to my hack to >>>>> avoid threads ending up stuck-sleeping. But I ask about >>>>> an alternative that I see in the code.] >>>>> >>>>> Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS >>>>> and using usefdt=1 on an old Powermac G5 (2 sockets, 2 cores >>>>> each). Hacks are in use to provide fairly reliable booting >>>>> and to avoid threads getting stuck sleeping. >>>>> >>>>> Before the modern VM_MAX_KERNEL_ADDRESS figure there were only >>>>> 2 or 3 bufspacedaemon-* threads as I remember. Now there are 8 >>>>> (plus bufdaemon and its worker), for example: >>>>> >>>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.39 >>>>> [bufdaemon/bufdaemon] root 23 0.0 0.0 0 288 - >>>>> DL 15:48 0:00.05 [bufdaemon/bufspaced] root 23 0.0 >>>>> 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced] >>>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 >>>>> [bufdaemon/bufspaced] root 23 0.0 0.0 0 288 - >>>>> DL 15:48 0:00.05 [bufdaemon/bufspaced] root 23 0.0 >>>>> 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced] >>>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.07 >>>>> [bufdaemon/bufspaced] root 23 0.0 0.0 0 288 - >>>>> DL 15:48 0:00.05 [bufdaemon/bufspaced] root 23 0.0 >>>>> 0.0 0 288 - DL 15:48 0:00.56 [bufdaemon// worker] >>>>> >>>>> I'm sometimes seeing processes showing [*buffer arena] that >>>>> seemed to wait for a fairly long time with that status, not >>>>> something I'd seen historically for those same types of >>>>> processes for a similar overall load (not much). During such >>>>> times trying to create processes to look around at what is >>>>> going on seems to also wait. (Probably with the same status?) >>>>> >>>> >>>> Hi Mark, >>>> >>>> Can you try the attached patch? It might be overkill in the >>>> synchronization, and I might be using the wrong barriers to be >>>> considered correct, but I think this should narrow the race down, >>>> and synchronize the timebases to within a very small margin. The >>>> real correct fix would be to suspend the timebase on all cores, >>>> which is feasible (there's a GPIO for the G4s, and i2c for G5s), >>>> but that's a non-trivial extra work. >>>> >>>> Be warned, I haven't tested it, I've only compiled it (I don't >>>> have a G5 to test with anymore). >>>> >>> >>> Sure, I'll try it when the G5 is again available: it is doing >>> a time consuming build. >>> >>> I do see one possible oddity: tracing another >>> platform_smp_timebase_sync use in the code . . . >>> >>> DEVMETHOD(cpufreq_drv_set, pmufreq_set) >>> >>> static int >>> pmufreq_set(device_t dev, const struct cf_setting *set) >>> { >>> . . . >>> error = pmu_set_speed(speed_sel); >>> . . . >>> } >>> >>> int >>> pmu_set_speed(int low_speed) >>> { >>> . . . >>> platform_sleep(); >>> . . . >>> } >>> >>> PLATFORMMETHOD(platform_sleep, powermac_sleep), >>> >>> void >>> powermac_sleep(platform_t platform) >>> { >>> >>> *(unsigned long *)0x80 = 0x100; >>> cpu_sleep(); >>> } >>> >>> void >>> cpu_sleep() >>> { >>> . . . >>> platform_smp_timebase_sync(timebase, 0); >>> . . . >>> } >>> >>> PLATFORMMETHOD(platform_smp_timebase_sync, >>> powermac_smp_timebase_sync), >>> >>> The issue: >>> >>> I do not see any matching platform_smp_timebase_sync(timebase, 1) >>> or other CPUs doing a powermac_smp_timebase_sync in this sequence. >>> >>> (If this makes testing the patch inappropriate, let me know.) >>> >> >> More important: There is also a use of: >> >> /* The following is needed for restoring from sleep. */ >> platform_smp_timebase_sync(0, 1); >> >> in cpudep_ap_setup . That in turn happens during cpu_reset_handler >> before machdep_ap_bootstrap is called (which does >> platform_smp_timebase_sync as well) : >> >> cpu_reset_handler: >> GET_TOCBASE(%r2) >> >> ld %r1,TOC_REF(tmpstk)(%r2) /* get new SP */ >> addi %r1,%r1,(TMPSTKSZ-48) >> >> bl CNAME(cpudep_ap_early_bootstrap) /* Set PCPU */ >> nop >> lis %r3,1@l >> bl CNAME(pmap_cpu_bootstrap) /* Turn on virtual >> memory */ nop >> bl CNAME(cpudep_ap_bootstrap) /* Set up PCPU and >> stack */ nop >> mr %r1,%r3 /* Use new stack */ >> bl CNAME(cpudep_ap_setup) >> nop >> GET_CPUINFO(%r5) >> ld %r3,(PC_RESTORE)(%r5) >> cmpldi %cr0,%r3,0 >> beq %cr0,2f >> nop >> li %r4,1 >> bl CNAME(longjmp) >> nop >> 2: >> #ifdef SMP >> bl CNAME(machdep_ap_bootstrap) /* And away! */ >> nop >> #endif >> >> Thus overall for ap's there is the sequence: >> >> platform_smp_timebase_sync(0, 1); >> . . . >> while (ap_letgo == 0) >> __asm __volatile("or 31,31,31"); >> __asm __volatile("or 6,6,6"); >> >> /* >> * Set timebase as soon as possible to meet an implicit >> rendezvous >> * from cpu_mp_unleash(), which sets ap_letgo and then >> immediately >> * sets timebase. >> * >> * Note that this is instrinsically racy and is only relevant >> on >> * platforms that do not support better mechanisms. >> */ >> platform_smp_timebase_sync(ap_timebase, 1); >> >> for each ap . So the (ap) case in powermac_smp_timebase_sync >> will wait with tb==0 (from cpudep_ap_setup) and the later calls >> from machdep_ap_bootstrap will not wait and will be after the >> unleash but not just local to powermac_smp_timebase_sync: >> >> static void >> powermac_smp_timebase_sync(platform_t plat, u_long tb, int ap) >> { >> static int cpus; >> static int unleash; >> >> if (ap) { >> atomic_add_int(&cpus, 1); >> while (!atomic_load_acq_int(&unleash)) >> ; >> } else { >> atomic_add_int(&cpus, 1); >> while (atomic_load_int(&cpus) != mp_ncpus) >> ; >> atomic_store_rel_int(&unleash, 1); >> } >> >> mttb(tb); >> } >> >> In the end cpus will have double counts of the ap cpus instead >> of matching mp_ncpus. >> >> cpufreq_drv_set activity is a seperate, additional issue from this. >> >> === >> Mark Millard >> marklmi at yahoo.com >> ( dsl-only.net went >> away in early 2018-Mar) >> > > As mentioned, I had only compiled it. Your examination of the code > path demonstrates that the patch is insufficient, and would hang at > unleash anyway. The sleep/wake logic probably needs to be updated > anyway. It was written for a G4 powerbook primarily for the PMU-based > cpufreq driver, so some bits might need to be moved around. Orthogonal > to this issue, though. It appears to me that the overall sequence: platform_smp_timebase_sync(0, 1); // called from cpudep_ap_setup . . . // The following are from in machdep_ap_bootstrap . . . while (ap_letgo == 0) __asm __volatile("or 31,31,31"); __asm __volatile("or 6,6,6"); . . . platform_smp_timebase_sync(ap_timebase, 1); calls mpc85xx_smp_timebase_sync twice per ap and that the 2nd call has tb_ready && mp_ncpus<=cpu_done for each such ap, leading to a lack of coordination of the activity that 2nd time for each ap: static void mpc85xx_smp_timebase_sync(platform_t plat, u_long tb, int ap) { static volatile bool tb_ready; static volatile int cpu_done; if (ap) { /* APs. Hold off until we get a stable timebase. */ while (!tb_ready) atomic_thread_fence_seq_cst(); mttb(tb); atomic_add_int(&cpu_done, 1); while (cpu_done < mp_ncpus) atomic_thread_fence_seq_cst(); } else { /* BSP */ freeze_timebase(rcpm_dev, true); tb_ready = true; mttb(tb); atomic_add_int(&cpu_done, 1); while (cpu_done < mp_ncpus) atomic_thread_fence_seq_cst(); freeze_timebase(rcpm_dev, false); } } === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?984188AE-93A9-4321-99EC-522C2E0CE9A9>