FreeBSD Mail Archives

Date:      Mon, 4 Mar 2019 01:40:18 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, Mark Millard via freebsd-hackers <freebsd-hackers@freebsd.org>
Subject:   Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed]
Message-ID:  <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com>
In-Reply-To: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
References:  <B898BF60-2872-4FFC-AD72-A32591BC7D20@yahoo.com> <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>

[I tested threshold values other than the < 0x10 one.]

On 2019-Mar-3, at 13:23, Mark Millard <marklmi at yahoo.com> wrote:

> [So far the hack has been successful. Details given later
> below.]
>
> On 2019-Mar-2, at 21:20, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [This note goes in a different direction compared to my
>> prior evidence report for overflows and the later activity
>> that has been happening for it. This does *not* involve
>> the patches associated with that report.]
>>
>> I view the following as an evidence-gathering hack:
>> showing the change in behavior with the code changes,
>> not as directly what FreeBSD should do for powerpc64.
>> In code for defined(__powerpc64__) && defined(AIM)
>> I freely use knowledge of the PowerMac G5 context
>> instead of attempting general code.
>>
>> Also: the code is set up to record some information
>> that I've been looking at via ddb. The recording is
>> not part of what changes the behavior but I decided
>> to show that code too.
>>
>> It is preliminary, but, so far, the hack has avoided
>> buf*daemon* threads and pmac_thermal getting stuck
>> sleeping (or, at least, far less frequently).
>>
>>
>> The tbr-value hack:
>>
>> From what I see, the various G5 cores each have their tbr running at
>> the same rate but with some offsets as far as the base time
>> goes. cpu_mp_unleash does:
>>
>>       ap_awake = 1;
>>
>>       /* Provide our current DEC and TB values for APs */
>>       ap_timebase = mftb() + 10;
>>       __asm __volatile("msync; isync");
>>
>>       /* Let APs continue */
>>       atomic_store_rel_int(&ap_letgo, 1);
>>
>>       platform_smp_timebase_sync(ap_timebase, 0);
>>
>> and machdep_ap_bootstrap does:
>>
>>       /*
>>        * Set timebase as soon as possible to meet an implicit rendezvous
>>        * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>>        * sets timebase.
>>        *
>>        * Note that this is intrinsically racy and is only relevant on
>>        * platforms that do not support better mechanisms.
>>        */
>>       platform_smp_timebase_sync(ap_timebase, 1);
>>
>>
>> which attempts to set the tbrs appropriately.
>>
>> But on small scales of differences the various tbr
>> values from different cpus end up not well ordered
>> relative to time, synchronizes-with relationships, and the like.
>> Only large enough differences can reliably indicate an
>> ordering of interest.
>>
>> Note: tc->tc_get_timecount(tc) only provides the
>> least significant 32 bits of the tbr value.
>> th->th_offset_count is also 32 bits and based on
>> truncated tbr values.
>>
>> So I made binuptime avoid finishing when it sees
>> a small (<0x10) step backwards for a new
>> tc->tc_get_timecount(tc) value vs. the existing
>> th->th_offset_count value (values strongly tied
>> to powerpc64 tbr values):
>>
>> void
>> binuptime(struct bintime *bt)
>> {
>>       struct timehands *th;
>>       u_int gen;
>>
>>       struct bintime old_bt= *bt; // HACK!!!
>>       struct timecounter *tc; // HACK!!!
>>       u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>>       uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>>
>>       u_int try_cnt= 0ull; // HACK!!!
>>
>>       do {
>>               do { // HACK!!!
>>                   th = timehands;
>>                   tc = th->th_counter;
>>                   gen = atomic_load_acq_int(&th->th_generation);
>>                   tim_cnt= tc->tc_get_timecount(tc);
>>                   tim_offset= th->th_offset_count;
>>               } while (tim_cnt<tim_offset && tim_offset-tim_cnt<0x10);
>>               *bt = th->th_offset;
>>               tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>>               scale_factor= th->th_scale;
>>               diff_scaled= scale_factor * tim_diff;
>>               bintime_addx(bt, diff_scaled);
>>               freq= tc->tc_frequency;
>>               atomic_thread_fence_acq();
>>               try_cnt++;
>>       } while (gen == 0 || gen != th->th_generation);
>>
>>       if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
>>               *(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>>               *(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>>               *(volatile uint64_t*)0xc000000000000030= freq;
>>               *(volatile uint64_t*)0xc000000000000038= scale_factor;
>>               *(volatile uint64_t*)0xc000000000000040= tim_offset;
>>               *(volatile uint64_t*)0xc000000000000048= tim_cnt;
>>               *(volatile uint64_t*)0xc000000000000050= tim_diff;
>>               *(volatile uint64_t*)0xc000000000000058= try_cnt;
>>               *(volatile uint64_t*)0xc000000000000060= diff_scaled;
>>               *(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>>               __asm__ ("sync");
>>       } else if (*(volatile uint64_t*)0xc0000000000000a0==0u && (0xffffffffffffffffull/scale_factor)<tim_diff) { // HACK!!!
>>               *(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>>               *(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>>               *(volatile uint64_t*)0xc0000000000000b0= freq;
>>               *(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>>               *(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>>               *(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>>               *(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>>               *(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>>               *(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>>               *(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>>               __asm__ ("sync");
>>       }
>> }
>> #else
>> . . .
>> #endif
>>
>> So far as I can tell, the FreeBSD code is not designed to deal
>> with small differences in tc->tc_get_timecount(tc) not actually
>> indicating a useful < vs. == vs. > ordering relation uniquely.
>>
>> (I make no claim that the hack is a proper way to deal with
>> such.)
>
> I ran a buildworld buildkernel that took somewhat over 7 hours on the
> PowerMac G5. Overall the G5 has been up over 13 hours and
> none of the buf*daemon* threads have gotten stuck sleeping.
> Nor has pmac_thermal gotten stuck. Similarly for vnlru
> and syncer: "top -HIStopid" still shows them all as
> periodically active.
>
> Previously for this usefdt=1 context (with the modern
> VM_MAX_KERNEL_ADDRESS), going more than a few minutes
> without at least one of those threads getting stuck
> sleeping was rare on the G5 (powerpc64 example).
>
> So this hack has managed to avoid finding sbinuptime()
> in sleepq_timeout being less than the earlier (by call
> structure/code sequencing) sbinuptime() in timercb that
> led to the sleepq_timeout callout being called in the
> first place.
>
> So in the sleepq_timeout callout's:
>
>        if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
>                /*
>                 * The thread does not want a timeout (yet).
>                 */
>        } else . . .
>
> td->td_sleeptimo > sbinuptime() ends up false now for small
> enough original differences.
>
> The "does not want a timeout (yet)" case does not set up another
> timeout, it just leaves the thread stuck sleeping, no longer doing
> periodic activities.
>
> As it stands, what I did (presuming an appropriate definition
> of "small differences in the problematical direction") should
> leave this and other sbinuptime-using code with:
>
> td->td_sleeptimo <= sbinuptime()
>
> for what were originally "small" tbr value differences in the
> problematical direction (in case other places require it in
> some way).
>
> If, instead, just sleepq_timeout's test could allow for
> some slop in the ordering, it could be a cheaper hack than
> looping in binuptime.
>
> At this point I've no clue what a correct/efficient FreeBSD
> design for allowing the sloppy match across tbr's for different
> CPUs would be.
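
The idea of letting just sleepq_timeout's test tolerate some slop could
be sketched roughly as follows. This is only an illustration under
stated assumptions, not FreeBSD code: timeout_due and its slop parameter
are hypothetical names, sbintime_t is redefined locally as a signed
64-bit value so the fragment stands alone, and what slop value would be
appropriate is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's sbintime_t (assumed signed 64-bit). */
typedef int64_t sbintime_t;

/*
 * Hypothetical helper: decide whether a sleep timeout should be treated
 * as expired, tolerating "now" reading up to "slop" earlier than the
 * deadline because of small cross-CPU tbr disagreements.
 */
static bool
timeout_due(sbintime_t sleeptimo, sbintime_t now, sbintime_t slop)
{
	assert(slop >= 0);

	if (sleeptimo == 0)
		return (false);		/* no timeout was requested */
	/* Due when the deadline is at most "slop" ahead of "now". */
	return (sleeptimo - now <= slop);
}
```

With slop set to 0 this reduces to the existing td_sleeptimo <=
sbinuptime() comparison. A nonzero slop would move the tolerance into
the one test that was observed to misbehave, instead of looping in
binuptime for every caller.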

Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried
each of the following and they all failed:

&& tim_offset-tim_cnt<0x2
&& tim_offset-tim_cnt<0x4
&& tim_offset-tim_cnt<0x8
&& tim_offset-tim_cnt<0xc

0x2, 0x4, and 0x8 failed for the first boot attempt,
almost immediately having stuck-in-sleep threads.

0xc seemed to be working for the first boot (including
a buildworld buildkernel that did not have to rebuild
much). But the 2nd boot attempt had a stuck-in-sleep
thread by the time I logged in.

By contrast, for:

&& tim_offset-tim_cnt<0x10

I've not seen it fail so far, after many reboots, a full
buildworld buildkernel, and running over 24 hours
(that included the somewhat over 7 hours for the
buildworld buildkernel). But it might be that some boots
would need a bigger figure.
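
To make the threshold comparison concrete, the loop condition and the
masked delta can be pulled out into standalone helpers. This is a
sketch, not the kernel code: the names small_backward_step,
masked_delta, and scaling_overflows are made up for illustration, and
32-bit counter values are assumed (matching the truncated tbr values
described earlier).

```c
#include <stdbool.h>
#include <stdint.h>

/* True when cnt is behind offset by fewer than "threshold" ticks. */
static bool
small_backward_step(uint32_t cnt, uint32_t offset, uint32_t threshold)
{
	return (cnt < offset && offset - cnt < threshold);
}

/* The delta the hacked binuptime computes for a counter with "mask". */
static uint32_t
masked_delta(uint32_t cnt, uint32_t offset, uint32_t mask)
{
	return ((cnt - offset) & mask);
}

/*
 * True when scale * delta would not fit in 64 bits. Same comparison as
 * the recording code's test, plus a guard against a zero scale.
 */
static bool
scaling_overflows(uint64_t scale, uint32_t delta)
{
	return (scale != 0 && UINT64_MAX / scale < delta);
}
```

For example, a reading that is 3 ticks behind th_offset_count gives a
masked delta of 0xfffffffd with a 0xffffffff mask, so an unfiltered
small backward step looks like a nearly full 32-bit forward step. A step
of 0xd ticks backwards is caught by a 0x10 threshold but not by 0xc,
which fits with 0xc eventually failing, although the actual step sizes
on the G5 were not measured here.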


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?75A8BB07-3273-423E-9436-798395BC8640>
