Date:      Tue, 26 Feb 2019 13:11:52 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, Dennis Clarke <dclarke@blastwave.org>
Subject:   Re: An experimental hack that appears to allow old PowerMacG5 4-core (system total) system to boot reliably (head -r343884 based context)
Message-ID:  <E0F61356-1B3F-4894-9979-A5D27D7E4686@yahoo.com>
In-Reply-To: <466B6E08-5631-41FB-A1FD-263C27519F65@yahoo.com>
References:  <AE42887B-3B50-452F-85AA-CCB382179124@yahoo.com> <CAHSQbTBX3UR0M6V4sOjO9KFMWbi32bCR5TmBj6kF%2BgF9hF_kLg@mail.gmail.com> <466B6E08-5631-41FB-A1FD-263C27519F65@yahoo.com>

[I explicitly note that my hack is racy. It appears that
I've finally had an example.]

On 2019-Feb-24, at 13:50, Mark Millard <marklmi at yahoo.com> wrote:



> On 2019-Feb-24, at 13:07, Justin Hibbits <chmeeedalf at gmail.com> wrote:
>
>> On Sat, Feb 23, 2019 at 1:36 PM Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>> For sys/powerpc/aim/mp_cpudep.c 's cpudep_ap_bootstrap I added as shown below:
>>>
>>> +extern void hack_into_slb_if_needed(void* vap); // HACK!!!
>>> +
>>> uintptr_t
>>> cpudep_ap_bootstrap(void)
>>> {
>>> . . .
>>> +       hack_into_slb_if_needed(pcpup->pc_curpcb); // HACK!!!
>>> +
>>>       sp = pcpup->pc_curpcb->pcb_sp;

In the above, after the implicit slb_insert_kernel, but before
the pcpup->pc_curpcb-> attempt, the slb entry could be replaced
again. There are, after all, other threads in operation before
SI_SUB_SMP starts:

        SI_SUB_KTHREAD_INIT     = 0xe000000,    /* init process*/
        SI_SUB_KTHREAD_PAGE     = 0xe400000,    /* pageout daemon*/
        SI_SUB_KTHREAD_VM       = 0xe800000,    /* vm daemon*/
        SI_SUB_KTHREAD_BUF      = 0xea00000,    /* buffer daemon*/
        SI_SUB_KTHREAD_UPDATE   = 0xec00000,    /* update daemon*/
        SI_SUB_KTHREAD_IDLE     = 0xee00000,    /* idle procs*/
#ifndef EARLY_AP_STARTUP
        SI_SUB_SMP              = 0xf000000,    /* start the APs*/
#endif


I've finally had one boot hang-up, apparently from this happening.
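
As a rough illustration of that window, here is a small user-space C model
(not kernel code) of a check-then-use race on a replacement cache. Every
name prefixed with model_ or MODEL_ is hypothetical, the slot count and
ESID values are made up, and the caller picks the victim slot to stand in
for the random replacement. It only shows that "ensure present" followed
later by "use" is not protected against an intervening replacement of the
same slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_N_SLBS 4	/* hypothetical; not the real n_slbs */

struct model_slb {
	uint64_t esid;
	bool     valid;
};

/* Return true when esid currently has a valid entry in the model cache. */
static bool
model_slb_present(const struct model_slb *cache, uint64_t esid)
{
	for (int i = 0; i < MODEL_N_SLBS; i++)
		if (cache[i].valid && cache[i].esid == esid)
			return (true);
	return (false);
}

/* Overwrite the caller-chosen slot; stands in for random replacement. */
static void
model_slb_insert(struct model_slb *cache, int victim, uint64_t esid)
{
	assert(victim >= 0 && victim < MODEL_N_SLBS);
	cache[victim].esid = esid;
	cache[victim].valid = true;
}

/* Insert esid only when it is missing; return the slot used, or -1. */
static int
model_ensure_present(struct model_slb *cache, int victim, uint64_t esid)
{
	if (model_slb_present(cache, esid))
		return (-1);
	model_slb_insert(cache, victim, esid);
	return (victim);
}

/*
 * Replay the window: ensure the pcb segment is present, let some other
 * activity replace a slot, then report whether the pcb segment is still
 * present for the later dereference.
 */
static bool
model_race_survives(int hack_victim, int other_victim)
{
	struct model_slb cache[MODEL_N_SLBS] = {{0}};
	const uint64_t pcb_esid = 0x10, other_esid = 0x20;

	model_ensure_present(cache, hack_victim, pcb_esid);
	model_slb_insert(cache, other_victim, other_esid);
	return (model_slb_present(cache, pcb_esid));
}
```

In this model, model_race_survives(1, 2) reports the entry as still
present, while model_race_survives(1, 1) reports it as gone, which is the
case that would correspond to the hang-up.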

>>> and in src/sys/powerpc/aim/slb.c I added an implementation:
>>>
>>> +void hack_into_slb_if_needed(void* vap); // HACK!!!
>>> +void hack_into_slb_if_needed(void* vap) // HACK!!!
>>> +{ // HACK!!!
>>> +       struct slb *cache= PCPU_GET(aim.slb);
>>> +       vm_offset_t va=    (vm_offset_t)vap;
>>> +       uint64_t    slbv=  kernel_va_to_slbv(va);
>>> +       uint64_t    esid=  va>>ADDR_SR_SHFT;
>>> +       uint64_t    slbe=  (esid<<SLBE_ESID_SHIFT) | SLBE_VALID;
>>> +       int i;
>>> +
>>> +       for (i = 0; i < n_slbs; i++) {
>>> +               if (i == USER_SLB_SLOT)
>>> +                       continue;
>>> +               if (cache[i].slbe == (slbe | i))
>>> +                       break;
>>> +       }
>>> +
>>> +       if (i==n_slbs)
>>> +               slb_insert_kernel(slbe,slbv);
>>> +} // HACK!!!
>>> +
>>>
>>> So far I've not had any boot hang-ups after this.
>>>
>>> Given the random nature of the hang-ups it will be a
>>> while before I conclude for sure how reliable this
>>> change makes booting, but so far so good.
>>>
>>> (I recognize that the "break" could be "return"
>>> and then the "if (i==n_slbs)" would not be
>>> needed.)
>>>
>>>
>>> Other issues not fixed by this:
>>>
>>> This does not change the buf*daemon* randomly getting
>>> hung up (and so timing out on shutdown). This appears
>>> to be the same issue that leads to the fans sometimes
>>> starting to run full-rate because of pmac_thermal
>>> being hung up.
>>>
>>> For buf*daemon*, "top -SHIopid" before shutdown shows
>>> just the ones that will not hang up. The same goes for
>>> seeing beforehand for pmac_thermal vs. the fans.
>>>
>>> ===
>>> Mark Millard
>>
>> Hi Mark,
>>
>> Fantastic work tracking this down!  So the problem is we now can fault
>> when accessing KVA space.  I think we should allow this, otherwise we
>> can hamper performance with reduced KVA size.  I'll have to think
>> about how best to do this.  Would you be willing to test patches I
>> come up with?
>
> I'll try to test whatever updates you want but there may be some
> issues with timeliness.
>
>
>
> The reason for the "sometimes" boot-failure is that the entry in the
> slb for the PCB/stack for the CPU being added has sometimes been
> replaced already before the CPU the pcb is for has been sufficiently
> configured to allow automatic handling --and other times has not
> yet been replaced: the random slb replacement mechanism.
>
> There already is code to handle slb entry replacements but it does
> not work for a CPU still being set up (at the stage of the
> sometimes failure). At least that is what I expect for:
>
> # grep -r "handle_kernel_slb_spill" /usr/src/sys/powerpc/
> /usr/src/sys/powerpc/aim/trap_subr64.S:	bl	handle_kernel_slb_spill
> /usr/src/sys/powerpc/powerpc/trap.c:       void	handle_kernel_slb_spill(int, register_t, register_t);
> /usr/src/sys/powerpc/powerpc/trap.c:handle_kernel_slb_spill(int type, register_t dar, register_t srr0)
>
> So my hack was to separately do the potential replacement in that
> early time frame to allow the configuration for the CPU to get
> far enough along for the existing mechanism to work. (At least
> that is what I expect that I did.)
>
> So far I've had no boot failures of any kind with the hack.
> I've removed the hacks for reporting information and things
> still work.
>
> But I've not tried anything extensive after booting because
> things like buf*daemon* threads and pmac_thermal are randomly
> hanging up in/at:
>
> mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c
> (mi_switch seems to have called sched_switch based on the
> "+0x134" and the code in that area --but sched_switch is not
> listed)
>
> I've no clue what is safe when one or more buf*daemon* threads
> make no progress.
>
> For shutdown that frequently leads to timeouts for stopping some
> buf*daemon* threads (when all 8 time out it takes about 8 minutes).
> The buf*daemon* threads that fail are the ones that "top -SHIopid" no
> longer shows.
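
For reference, the "return" instead of "break" variant mentioned in the
quoted note could look roughly like the following user-space sketch. It
models only the lookup decision. The MODEL_ constants and model_ names are
hypothetical placeholders chosen for the example and are not taken from
the kernel headers, and the real code would go on to call
slb_insert_kernel when the lookup says an insert is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_N_SLBS        8			/* placeholder slot count */
#define MODEL_USER_SLB_SLOT 0			/* placeholder user slot index */
#define MODEL_SLBE_VALID    0x08000000ULL	/* placeholder valid bit */

struct model_slb {
	uint64_t slbv;
	uint64_t slbe;
};

/*
 * Return false as soon as a matching kernel entry is found, so no
 * separate "i == n_slbs" check is needed after the loop.
 */
static bool
model_needs_insert(const struct model_slb *cache, uint64_t slbe)
{
	assert(cache != NULL);
	for (int i = 0; i < MODEL_N_SLBS; i++) {
		if (i == MODEL_USER_SLB_SLOT)
			continue;	/* the user slot never counts as a match */
		if (cache[i].slbe == (slbe | (uint64_t)i))
			return (false);
	}
	return (true);
}
```

The behavior is meant to match the loop plus the trailing test in the
hack: an insert is requested exactly when no non-user slot holds the
entry.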



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



