Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 19 Jan 2020 14:22:21 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        Francis Little <oggy@farscape.co.uk>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: G5 Quad Fans full speed after 1 min
Message-ID:  <FCED3FE8-0839-4A3D-B730-FE4A10B396A9@yahoo.com>
In-Reply-To: <DC70B788-24D0-4578-8A57-BB8DA753CAD4@yahoo.com>
References:  <CAGSRtz76TC44UiECMepLAZjXE=H33sg4OFLJpmZpNCSEwE=ETQ@mail.gmail.com> <21533290-667C-472E-89F7-D1E7DE773193@yahoo.com> <DC70B788-24D0-4578-8A57-BB8DA753CAD4@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[I forgot to include something for the patches
involved. Added this time.]

On 2020-Jan-19, at 13:00, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Jan-19, at 03:13, Mark Millard <marklmi at yahoo.com> wrote:
>>=20
>> On 2020-Jan-19, at 00:38, Francis Little <oggy at farscape.co.uk> =
wrote:
>>=20
>>> Hi,
>>>=20
>>> My G5 Quad is running current from a few days ago, but this issue =
has been
>>> happening for a long time.
>>>=20
>>> After about 1 min of uptime, the fans go full speed.
>>>=20
>>> As soon as I query anything like CPU temp or fan rpm with sysctl, =
the fans
>>> return to a normal speed.
>>>=20
>>> 1 min later the fans go full speed again.
>>>=20
>>> I've been working round this for some time with a cron job that runs =
sysctl
>>> with one of the cpu temp sensors to calm the system.
>>=20
>> QUOTING an old message:
>>  The mftb figures on the various cores can be so far apart that
>>  threads can end-up stuck sleeping, such as syncr, pmac_thermal,
>>  and buf*deamon*  threads. (This can really mess things up by
>>  not updating the storage correctly.) Such is still true of the=20
>>  ELFv2 context.
>>=20
>>  (Most folks notice this via shutdown timeouts and the fans
>>  going fast unnecessarily. But it is involved earlier as well.)
>> END QUOTE
>>=20
>> Nothing in the boot sequence is forcing the CPUs/Cores to
>> see any particular time relationship to each other and on
>> the multi-socket PowerMacs it can vary widely (G4 and G5).
>> Sometimes it will happen to end up okay, other times not.
>>=20
>> (I've no access to a single-socket, multi-core PowerMac,
>> so I just do not know for that kind of context.)
>>=20
>> I run with patched boot-code that has cross-cpu/core time
>> observations and adjustments to non-bsp time to see the
>> bsp time as between the start and end of a round trip to
>> the bsp from each non-bsp to get the bsp's time. It is
>> based on the mid-point of the start and end times for
>> the non-bsp's round trip vs. the bsp's returned time.
>> With at most 4 cores, each non-bsp is done in sequence.
>> The code only does this on PowerMacs, having no access
>> to other types of PowerPC examples to test.
>>=20
>> I've not seen this type of problem since mid 2019-May on
>> any of:
>>=20
>> 1 G5 with 2 sockets, 1 core each
>> 2 G5's, 2 sockets, 2 cores each
>> 2 G4's, 2 sockets, 1 core each
>>=20
>> (The mid-May time frame is when I adjusted the code to
>> deal with the faster time increment on the slower
>> 32-bit processors for the model that I have access to.
>> I had to be more careful to avoid biasing the approximate
>> symmetry to be less symmetric. On the G5's its been
>> longer since I've seen this problem, based on earlier
>> source code.)
>>=20
>> Unfortunately the "lab" the machines are in is powered
>> down currently.
>>=20
>> FYI: Prior to this technique, I had a pure hack that
>> was observed to avoid the problem. But it changed code
>> used all the time --code that I did not want to have
>> any hack's in if I could avoid it.
>>=20
>> FYI: I also run with other PowerMac related patches.
>> Generally this mftb() time adjustment is one of the
>> newest patches, possibly the newest. So my test
>> context may be biased by the other patches.
>>=20
>>> If I boot to OS X 10.5 and load the system, the fans are stable.
>>=20
>> I've not done any investigation of the issue for the
>> older contexts. But, if I remember right, I did see
>> the problem on occasion back in that time frame.
>>=20
>>> Does anyone else get this?
>>=20
>> My understanding is everyone booting a fairly modern
>> standard FreeBSD gets this sometimes for the kind of
>> context that you specified. (I'm not sure of the
>> variability if the frequency of the problem happening
>> for that kind of context.)
>>=20
>> I certainly saw it before I investigated avoiding it.
>=20
> I got access to the sources with the patches.
> I'll cover both 32-bit and 64-bit for how I
> have avoided the thread-stuck-sleeping problems,
> including for pmac_thermal.
>=20
>=20
> In the following you likely want to avoid my
> use of ap_pcpu being volatile (not a pointer
> to something volatile). I use volatile to
> document that I do not want the code
> generation to avoid storing or accessing
> RAM for what is volatile.
>=20
> -extern void *ap_pcpu;
> +extern void * volatile ap_pcpu;
>=20
> There is one example of this in the diff's
> that I show.
>=20
> I also put powerpc_sync(); soon after any
> ap_pcpu =3D ... assignment. I do not show
> here such declarations or powerpc_sync()
> use that are not in the sources tied to
> the mftb related time measurements and
> adjustments at boot time. (ap_pcpu is
> sometimes not alone in being a variable
> that an added powerpc_sync() is for. An
> example will occur below.)
>=20
> The additional powerpc_sync() use is for
> gathering evidence that a sufficient context
> exists, not trying to be an example of a
> minimalist sufficient implementation. isync()
> use and possibly more have a similar status
> here.
>=20
>=20
> First: I'm paranoid about interrupts messing
> up times, so I use a modified mttb(...):
> (whitespace may not be well preserved)
>=20
> Index: /usr/src/sys/powerpc/include/cpufunc.h
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- /usr/src/sys/powerpc/include/cpufunc.h      (revision 356426)
> +++ /usr/src/sys/powerpc/include/cpufunc.h      (working copy)
> @@ -155,15 +155,8 @@
>        return (tb);
> }
>=20
> -static __inline void
> -mttb(u_quad_t time)
> -{
> +// mttb is listed after intr_disable and intr_restore.
>=20
> -       mtspr(TBR_TBWL, 0);
> -       mtspr(TBR_TBWU, (uint32_t)(time >> 32));
> -       mtspr(TBR_TBWL, (uint32_t)(time & 0xffffffff));
> -}
> -
> static __inline void
> eieio(void)
> {
> @@ -202,6 +195,19 @@
>        mtmsr(msr);
> }
>=20
> +static __inline void
> +mttb(u_quad_t time)
> +{
> +       const uint32_t   high=3D time>>32;
> +       const uint32_t   low=3D  time&0xffffffffu;
> +      =20
> +       const register_t predisable_msr=3D intr_disable();
> +       mtspr(TBR_TBWL, 0);
> +       mtspr(TBR_TBWU, high);
> +       mtspr(TBR_TBWL, low);
> +       intr_restore(predisable_msr);
> +}
> +
> static __inline struct pcpu *
> get_pcpu(void)
> {
>=20
>=20
> I have platform_powermac.c enable what I call
> alternate_timebase_sync_style. I have no place
> else doing so: PowerMac's are my only PowerPC
> test environment.
>=20
> Be WARNED this code is one of various places
> involved in changing ap_pcpu to be volatile
> (not point to volatile). You likely do not want
> that --and, if you use it, various other places
> would need to track. Still, I show my code as
> it is here.
>=20
> Index: /usr/src/sys/powerpc/powermac/platform_powermac.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- /usr/src/sys/powerpc/powermac/platform_powermac.c   (revision =
356426)
> +++ /usr/src/sys/powerpc/powermac/platform_powermac.c   (working copy)
> @@ -55,7 +55,7 @@
>=20
> #include "platform_if.h"
>=20
> -extern void *ap_pcpu;
> +extern void * volatile ap_pcpu;
>=20
> static int powermac_probe(platform_t);
> static int powermac_attach(platform_t);
> @@ -333,6 +333,9 @@
>        return (powermac_smp_fill_cpuref(cpuref, bsp));
> }
>=20
> +// platform_powermac.c is implicitly an AIM context: no explicit AIM =
test.
> +extern volatile int alternate_timebase_sync_style; // 0 indicates old =
style; 1 indicates new style
> +
> static int
> powermac_smp_start_cpu(platform_t plat, struct pcpu *pc)
> {
> @@ -367,6 +370,13 @@
>=20
>        ap_pcpu =3D pc;
>=20
> +       // platform_powermac.c is implicitly an AIM context: no =
explicit AIM test.
> +       // Part of: Attempt a better-than-historical approximately
> +       //          equal timebase value for ap vs. bsp
> +       alternate_timebase_sync_style=3D 1; // So: new style for =
PowerMacs
> +
> +       powerpc_sync(); // for ap_pcpu and =
alternate_timebase_sync_style
> +
>        if (rstvec_virtbase =3D=3D NULL)
>                rstvec_virtbase =3D pmap_mapdev(0x80000000, PAGE_SIZE);
>=20
>=20
>=20
> There is more use of intr_disable() in the code directly
> targeting measuring and adjusting time: some of the
> code in mp_machdep.c . The protocol for going back and
> forth with the bsp here is based on memory flags.
> machdep_ap_bootstrap and cpu_mp_unleash have comments
> about the protocol and its use.
>=20
> In setting this up I had to be careful to make code that
> avoided optimizations from calculating various things
> earlier than appropriate. There is a comment about that
> in machdep_ap_bootstrap.
>=20
> Index: /usr/src/sys/powerpc/powerpc/mp_machdep.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- /usr/src/sys/powerpc/powerpc/mp_machdep.c   (revision 356426)
> +++ /usr/src/sys/powerpc/powerpc/mp_machdep.c   (working copy)
> @@ -69,6 +69,88 @@
> static struct mtx ap_boot_mtx;
> struct pcb stoppcbs[MAXCPU];
>=20
> +#if defined(AIM)
> +// Part of: Attempt a better-than-historical approximately
> +//          equal timebase value for ap vs. bsp
> +
> +volatile int alternate_timebase_sync_style=3D 0; // 0 indicates old =
style; 1 indicates new style.
> +volatile uint64_t bsp_timebase_sample=3D 0u;
> +
> +volatile unsigned int from_bsp_status_flag=3D 0u;
> +// stages: 0u, 1u (bsp ready to start), 2u (bsp tbr value available =
to ap)
> +
> +volatile unsigned int from_ap_status_flag=3D 0u;
> +// stages: 0u, 1u (ap ready for bsp tbr value to be found and sent)
> +#endif
> +
> +static __inline uint64_t
> +mftb_with_no_pointer_use(void)
> +{
> +#ifdef __powerpc64__
> +       uint64_t tb; // not used for 32-bit powerpc
> +       __asm __volatile ("mftb %0" : "=3Dr"(tb));
> +       return tb;
> +#else
> +       uint32_t tbu; // not pointer into tb
> +       uint32_t tbl; // not pointer into tb
> +
> +       do {
> +               tbu=3D mfspr(TBR_TBU);
> +               tbl=3D mfspr(TBR_TBL);
> +       } while (tbu !=3D mfspr(TBR_TBU));
> +
> +       // The construction of the unint64_t value does bias the mttb =
some
> +       // for the round-trip-start side of things.
> +       //
> +       // The pointers into tb technique would involve a pair of =
memory
> +       // writes and a pair of memory reads instead, the writes being
> +       // in the loop.
> +       return ((uint64_t)tbu<<32) | tbl;
> +#endif
> +}
> +
> +static __inline uint64_t
> +mftb_plus_delta(volatile uint64_t* bsp_tbr, int64_t ap_midpoint)
> +       // The return value is used in the mttb as the argument.
> +{     =20
> +#ifdef __powerpc64__
> +       uint64_t tb; // not used for 32-bit powerpc
> +       __asm __volatile ("mftb %0" : "=3Dr"(tb));
> +       // The construction of the unint64_t value does bias the mttb =
some:
> +       // it assignes an earlier time than hoped for, given these =
later
> +       // calculations.
> +       return tb + ((int64_t)*bsp_tbr - ap_midpoint);
> +#else
> +       // Establishes delta_for_approx_match_to_bsp_tbr_values such =
that:
> +       //           =
ap_midpoint+delta_for_approx_match_to_bsp_tbr_values=3D=3D*bsp_tbr
> +       int64_t  delta_for_approx_match_to_bsp_tbr_values;
> +       uint32_t tbu;       // not pointer into tb
> +       uint32_t tbl;       // not pointer into tb
> +
> +       do {
> +               // The below in-loop style is for avoiding the loop
> +               // vs. ap_midpoint's calculation being reversed in
> +               // the code generated: volatile is is being put to
> +               // use here.
> +               delta_for_approx_match_to_bsp_tbr_values=3D =
(int64_t)*bsp_tbr-ap_midpoint;
> +
> +               tbu=3D mfspr(TBR_TBU);
> +               tbl=3D mfspr(TBR_TBL);
> +       } while (tbu !=3D mfspr(TBR_TBU));
> +
> +       // The construction of the unint64_t value does bias the mttb =
some:
> +       // it assignes an earlier time than hoped for, given these =
later
> +       // calculations. Easily observable on the example 7455 based =
PowerMac
> +       // G4. (Faster than G5 tbr increment rate but a slower =
processor,)
> +       // But the overall process is still an improvement.
> +       //
> +       // The pointers into tb technique would involve a pair of =
memory
> +       // writes and a pair of memory reads instead, the writes being
> +       // in the loop. The "+ . . ." would still be involved.
> +       return ( ((uint64_t)tbu<<32) | tbl ) + =
delta_for_approx_match_to_bsp_tbr_values;
> +#endif
> +}
> +
> void
> machdep_ap_bootstrap(void)
> {
> @@ -76,19 +158,76 @@
>        PCPU_SET(awake, 1);
>        __asm __volatile("msync; isync");
>=20
> +#if defined(AIM)
> +       powerpc_sync();
> +       isync();
> +       if (1=3D=3Dalternate_timebase_sync_style)
> +       {
> +               // Part of: Attempt a better-than-historical =
approximately
> +               //          equal timebase value for ap vs. bsp
> +
> +               // No claim to deal with overflow/wraparound of tbr, =
or even
> +               // of the upper bit being on.
> +
> +               register_t oldmsr=3D intr_disable();
> +
> +               while (1u!=3Dfrom_bsp_status_flag)
> +                       ; // spin waiting for bsp to flag that its =
ready to start.
> +
> +               //  Start to measure a round trip:: to the bsp and =
back.
> +
> +               isync(); // Be sure below mftb() result is not from =
earlier speculative execution.
> +               uint64_t const start_round_trip_time_on_ap=3D =
mftb_with_no_pointer_use();
> +               atomic_store_rel_int(&from_ap_status_flag, 1u); // bsp =
waits for such before its mftb().
> +
> +               while (2u!=3Dfrom_bsp_status_flag)
> +                       ; // spin waiting for bsp's tbr value
> +
> +               // Mid-point of ap round trip and the bsp timebase =
value should be approximately equal
> +               // when the tbr's are well matched, absent =
interruptions on both sides.
> +
> +               isync(); // Be sure below mftb() result is not from =
earlier speculative execution.
> +               uint64_t const end_round_trip_time_on_ap=3D =
mftb_with_no_pointer_use();
> +               isync(); // Be sure above mftb() result is not from =
overlapping with the following.
> +
> +               int64_t const approx_round_trip_tbr_delta_on_ap
> +                               =3D (int64_t)end_round_trip_time_on_ap =
- (int64_t)start_round_trip_time_on_ap;
> +               int64_t const ap_midpoint_value
> +                               =3D =
(int64_t)start_round_trip_time_on_ap + =
approx_round_trip_tbr_delta_on_ap/2;
> +
> +               // The mftb_plus_delta use is for helping to the =
control the code order relative
> +               // to tbr access. Such issues are notable for the 7455 =
based 2-socket PowerMacs,
> +               // for example. Faster tbr increment rate than the =
G5's but slower processors
> +               // and such. Still, overall this definitely helps such =
contexts compared to the
> +               // historical style of timebase synchronization.
> +               isync(); // Be sure below mftb() result is not from =
earlier speculative execution.
> +               =
mttb(mftb_plus_delta(&bsp_timebase_sample,ap_midpoint_value));
> +
> +               atomic_store_rel_int(&from_bsp_status_flag, 0u); // =
Get ready for next ap in bsp loop
> +               atomic_store_rel_int(&from_ap_status_flag, 0u);  // =
Flag bsp that this ap is done
> +
> +               mtmsr(oldmsr);
> +       }
> +#endif
> +
>        while (ap_letgo =3D=3D 0)
>                nop_prio_vlow();
>        nop_prio_medium();
>=20
> -       /*
> -        * Set timebase as soon as possible to meet an implicit =
rendezvous
> -        * from cpu_mp_unleash(), which sets ap_letgo and then =
immediately
> -        * sets timebase.
> -        *
> -        * Note that this is instrinsically racy and is only relevant =
on
> -        * platforms that do not support better mechanisms.
> -        */
> -       platform_smp_timebase_sync(ap_timebase, 1);
> +#if defined(AIM)
> +       if (0=3D=3Dalternate_timebase_sync_style)
> +#endif
> +       {
> +               /*
> +                * Set timebase as soon as possible to meet an =
implicit rendezvous
> +                * from cpu_mp_unleash(), which sets ap_letgo and then =
immediately
> +                * sets timebase.
> +                *
> +                * Note that this is instrinsically racy and is only =
relevant on
> +                * platforms that do not support better mechanisms.
> +                */
> +               platform_smp_timebase_sync(ap_timebase, 1);
> +       }
>=20
>        /* Give platform code a chance to do anything else necessary */
>        platform_smp_ap_init();
> @@ -261,6 +400,34 @@
>                                    pc->pc_cpuid, =
(uintmax_t)pc->pc_hwref,
>                                    pc->pc_awake);
>                        smp_cpus++;
> +
> +#if defined(AIM)
> +                       // Part of: Attempt a better-than-historical =
approximately
> +                       //          equal timebase value for ap vs. =
bsp
> +                       powerpc_sync();
> +                       isync();
> +                       if (1=3D=3Dalternate_timebase_sync_style)
> +                       {
> +                               register_t oldmsr=3D intr_disable();
> +
> +                               =
atomic_store_rel_int(&from_bsp_status_flag, 1u); // bsp ready to start.
> +
> +                               while (1u!=3Dfrom_ap_status_flag)
> +                                       ; // spin waiting for ap to =
flag: time to send a tbr.
> +
> +                               isync(); // Be sure below mftb() =
result is not from earlier.
> +                               bsp_timebase_sample=3D =
mftb_with_no_pointer_use();
> +                               =
atomic_store_rel_int(&from_bsp_status_flag, 2u); // bsp tbr available.
> +
> +                               // Most of the rest of the usage is in =
machdep_ap_bootstrap,
> +                               // other than controling =
alternate_timebase_sync_style value.
> +
> +                               while (0u!=3Dfrom_ap_status_flag)
> +                                       ; // spin waiting for ap to be =
done with the sample.
> +
> +                               mtmsr(oldmsr);
> +                       }
> +#endif
>                } else
>                        CPU_SET(pc->pc_cpuid, &stopped_cpus);
>        }
> @@ -267,14 +434,22 @@
>=20
>        ap_awake =3D 1;
>=20
> -       /* Provide our current DEC and TB values for APs */
> -       ap_timebase =3D mftb() + 10;
> -       __asm __volatile("msync; isync");
> +#if defined(AIM)
> +       if (0=3D=3Dalternate_timebase_sync_style)
> +#endif
> +       {
> +               /* Provide our current DEC and TB values for APs */
> +               ap_timebase =3D mftb() + 10;
> +               __asm __volatile("msync; isync");
> +       }
>=20
>        /* Let APs continue */
>        atomic_store_rel_int(&ap_letgo, 1);
>=20
> -       platform_smp_timebase_sync(ap_timebase, 0);
> +#if defined(AIM)
> +       if (0=3D=3Dalternate_timebase_sync_style)
> +#endif
> +               platform_smp_timebase_sync(ap_timebase, 0);
>=20
>        while (ap_awake < smp_cpus)
>                ;
>=20


I forgot to include a patch that was tied to some PowerPC
reference material's notes about the TB register for at
least some PowerPC family members:

"The TB is a volatile resource and must be initialized during reset."

Note:
Here the ap_cpu was simply deleted because it was
not used anywhere in the file.


Index: /usr/src/sys/powerpc/powerpc/machdep.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/sys/powerpc/powerpc/machdep.c      (revision 356426)
+++ /usr/src/sys/powerpc/powerpc/machdep.c      (working copy)
@@ -142,8 +142,6 @@
 extern vm_paddr_t kernload;
 #endif
=20
-extern void *ap_pcpu;
-
 struct pcpu __pcpu[MAXCPU];
 static char init_kenv[2048];

@@ -283,9 +281,12 @@
        cpu_feature_setup();

 #ifdef AIM
+       // May restart the kernel at __start and get back here again.
        aim_early_init(fdt, toc, ofentry, mdp, mdp_cookie);
 #endif

+       mttb(0); // "The TB is a volatile resource and must be =
initialized during reset."
+
        /*
         * Parse metadata if present and fetch parameters.  Must be done
         * before console is inited so cninit gets the right value of


=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FCED3FE8-0839-4A3D-B730-FE4A10B396A9>