Date:      Wed, 1 May 2019 23:21:43 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: How many segments does it take to span from VM_MIN_KERNEL_ADDRESS through VM_MAX_SAFE_KERNEL_ADDRESS? 128 in moea64_late_bootstrap
Message-ID:  <AEC7FFA4-955B-4F4B-91C0-7B3B054C6BC7@yahoo.com>
In-Reply-To: <1B8116F2-9749-4331-95BD-D528AA52A771@yahoo.com>
References:  <3C69CF7C-7F33-4C79-92C0-3493A1294996@yahoo.com> <6159F4A6-9431-4B99-AA62-451B8DF08A6E@yahoo.com> <20190501094029.542c5f46@titan.knownspace> <212E50E5-7EB1-4381-A662-D5EACB1E5D46@yahoo.com> <C01CF848-890B-407D-876A-9C48F5F3CD40@yahoo.com> <20190501165403.7d8d1f8f@titan.knownspace> <1B8116F2-9749-4331-95BD-D528AA52A771@yahoo.com>

[Some results, mixed I'm afraid.]

On 2019-May-1, at 17:22, Mark Millard <marklmi at yahoo.com> wrote:

> On 2019-May-1, at 14:54, Justin Hibbits <chmeeedalf at gmail.com> wrote:
>
>> On Wed, 1 May 2019 14:35:56 -0700
>> Mark Millard <marklmi@yahoo.com> wrote:
>>
>>>>> What happens if you revert all your patches,
>>>>
>>>> Most of the patches in Bugzilla 233863 are not for this
>>>> issue at all and are not tied to starting the non-bsp
>>>> cpus. (The one for improving how close the Time Base
>>>> registers are is tied to starting these cpus.) Only the
>>>> aim/mp_cpudep.c and aim/slb.c changes seem relevant.
>>>>
>>>> Are you worried about some form of interaction that means
>>>> I need to avoid patches for other issues?
>>>>
>>>> Note: for now I'm staying at using head -r345758 as the
>>>> basis for my experiments.
>>>>
>>>>> and change this loop to
>>>>> stop at n_slbs?  So something more akin to:
>>>>>
>>>>> 	int i = 0;
>>>>>
>>>>> 	for (va = virtual_avail; va < virtual_end && i < n_slbs - 1;
>>>>> 	    va += SEGMENT_LENGTH, i++)
>>>>> 		...
>>>>>
>>>>> If it reliably boots with that, then that's fine.  We can prefault
>>>>> as much as we can and leave the rest for on-demand.
>>>>
>>>> I'm happy to experiment with this loop without my hack
>>>> for forcing the slb entry to exist in cpudep_ap_bootstrap.
>>>>
>>>> But, it seems to presume that the pc_curpcb's will
>>>> all always point into the lower address range spanned
>>>> when cpudep_ap_bootstrap is executing on the cpu.
>>>> Does some known property limit the pc_curpcb->
>>>> references to such? Only that would be sure to
>>>> avoid an slb-miss at that stage. Or is this just an
>>>> alternate hack or a means of getting evidence, not a
>>>> proposed solution?
>>>>
>>>> (Again, I'm happy to disable my hack that forces the
>>>> slb entry and to try the loop suggested.)
>> ...
>>> And the patch for the loop looks like:
>>>
>>> 	virtual_end = VM_MAX_SAFE_KERNEL_ADDRESS;
>>>
>>> 	/*
>>> -	 * Map the entire KVA range into the SLB. We must not fault there.
>>> +	 * Map the lower-address part of the KVA range into the SLB. We must not fault there. */
>>> 	#ifdef __powerpc64__
>>> -	for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
>>> +	i = 0;
>>> +	for (va = virtual_avail; va < virtual_end && i<n_slbs-1; va += SEGMENT_LENGTH, i++)
>>> 		moea64_bootstrap_slb_prefault(va, 0);
>>> 	#endif
>>>
>>=20
>> Yep, that's the patch I was going for.
>>=20
>>>
>>> So I've built, installed, and have tested some: it did not go well
>>> overall.
>>>
>>> Using:
>>>
>>> OK set debug.verbose_sysinit=1
>>>
>>> to show better context about where the hangs occur, shows:
>>> (Typed from a screen picture.)
>>>
>>> subsystem a800000
>>> boot_run_interrupt_driven_config_hooks(0)...
>>> . . . (omitted) . . .
>>> done.
>>> vt_upgrade(&vt_consdev). . .
>>>
>>> The "vt_upgrade(&vt_consdev). . ." never says done when booting
>>> hangs with the above changes.
>>>
>>> Trying to boot a bunch of times did produce one
>>> completed boot, all 4 cpus working. Otherwise I'm
>>> using kernel.old to manage to complete a boot.
>>>
>>> I'll note that "vt_upgrade(&vt_consdev). . ." is where
>>> Dennis Clarke reported for the hangups that he was
>>> seeing, without any of my patches being available back
>>> then: 2019-Feb-14.
>>
>> Maybe try the commit that caused the problem back in July?  r334498.
>>
>
> I'd already started down the path of getting materials from:
>
> https://artifact.ci.freebsd.org/snapshot/head/r347003/powerpc/powerpc64/
>
> and putting them on a separate SSD that I sometimes use for artifact.ci
> or snapshot experiments. Also: checking out matching svn sources for
> -r347003 and then doing a buildworld buildkernel with a bootstrap gcc
> 4.2.1 compiler used. I'm verifying that I can build it before making
> the source changes for the kernel. The build is of a debug kernel
> (GENERIC64).
>
> The test buildworld is still in process.
>
> Let me know if this is insufficient for your purposes. I could revert
> to:
>
> https://artifact.ci.freebsd.org/snapshot/head/r334594/powerpc/powerpc64/
>
> (There is no head/r334498/ and the first after that with a
> powerpc64/ is head/r334594/ .)
>
> For either head/r347003/ or head/r334594/ :
>
> Use of artifact materials allows using officially built files for
> every file but some specific file(s) that I replace. It also allows
> comparison/contrast of the behavior of the official files vs. when
> adjusted ones are substituted.
>
> Use of artifact-version materials also means that I know I'm using
> a vintage that actually built --and so I hope to avoid other problems
> getting in the way.

I present without-the-patch results before presenting
with-the-patch results. The end result is mixed, I'm
afraid.



As for the results without any patch,
just artifact materials . . .

Note: "Add debug.verbose_sysinit tunable for VERBOSE_SYSINIT" was
not checked-in until -r335458 .

Booting the unmodified artifact materials (no updates or
rebuilds) shows varying stopping points:

(For debug.verbose_sysinit=1 :)
-r347003 stops sometimes at: vt_upgrade(&vt_consdev). . .
-r347003 stops sometimes at: cpu_mp_unleash(0). . .

-r334594 stops after the ada0 lines (VERBOSE_SYSINIT is not built in)



So I had to build my own -r334594 kernel to see verbose_sysinit
information about the stopping point. Again, no patch here,
I just copied over my build of the /boot/kernel/kernel file:

-r334594 stops sometimes at: vt_upgrade(&vt_consdev). . .
-r334594 stops sometimes at: cpu_mp_unleash(0). . .


Summary thus far:

I did not find any obvious difference between the two versions
in how often each stopping point occurs.

So I'm seeing if the proposed patch changes the behavior of
-r347003 .



Later test of patched -r347003 . . .

The patched kernel is based on:

# svnlite diff /mnt/usr/src/ | more
Index: /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c
===================================================================
--- /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c    (revision 347003)
+++ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c    (working copy)
@@ -959,7 +959,8 @@
         * Map the entire KVA range into the SLB. We must not fault there.
         */
        #ifdef __powerpc64__
-       for (va = virtual_avail; va < virtual_end; va += SEGMENT_LENGTH)
+       i = 0;
+       for (va = virtual_avail; va < virtual_end && i<n_slbs-1; va += SEGMENT_LENGTH, i++)
                moea64_bootstrap_slb_prefault(va, 0);
        #endif
 

So far with the patched code:

-r347003 has never stopped at: vt_upgrade(&vt_consdev). . .
-r347003 stops sometimes at: cpu_mp_unleash(0). . . [but differently!]
-r347003 panics at a particular point the rest of the time

The cpu_mp_unleash hangups report:
(typed from screen pictures)

subsystem f000000
   cpu_mp_unleash(0)... Launching APs 1 2 SMP: 4 CPUs found; 4 CPUs usable; 3 CPUs woken

After that it is hung-up.


As for when that does not happen . . .

I have not even set up /etc/fstab, so I end up at the mountroot>
prompt. When I enter "ufs:/dev/daa0s3" I get a panic for:

panic: mtx_lock of spin mutex WWV @ /mnt/usr/src/sys/powerpc/aim/mmu_oea64.c:2812
(it is a debug-kernel build)

For reference, line 2812 is: PMAP_LOCK(pm);

The panic is reached via an interesting(?) call chain, as the
backtrace shows (typed from screen pictures):

.__mtx_lock_flags+0xd4
.moea64_sync_icache+0x48
.pmap_sync_icache+0x90
.ppc_instr_emulate+0x1b4
.trap+0x10fc
.powerpc_interrupt+0x2cc
user PGM trap by 0x810053bb4: srr1=0x900000000008d032
r1=0x3ffffffffffffcc00 cr=0x20002024 xer=0 ctr=0x1 r2=0x81007bdd0 frame=0xe000000070ca9810

It was thread pid 28 tid 100097

So far these details seem consistent.

But I will note that using Open Firmware via ofwdump -ap
and the like has caused system crashes ever since the
direct map base was moved to high memory addresses
( -r330610 and later ). This is one of the reasons I
want to avoid Open Firmware and use the conversion to
fdt instead. (There is a -r330614 artifact for testing
such crashes; or use a later one that otherwise boots.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



